Time Series Forecasting in Python, the DuckLake R Library, New Tutorials | Issue 76
A weekly curated update on data science and engineering topics and resources.
This week's agenda:
Open Source of the Week - The DuckLake R project
New learning resources - Fine-tuning LLMs, reading files with Python, setting up OpenClaw, and running it with local LLMs
Book of the week - Time Series Forecasting in Python by Marco Peixeiro
The newsletter is also available on LinkedIn and Medium.
Are you interested in learning how to set up automation using GitHub Actions? If so, please check out my course on LinkedIn Learning:
Open Source of the Week
This week’s focus is on the ducklake-r project - an R library by Travis Gerke, ScD, that provides versioned data lake infrastructure for data-intensive workflows. Built on top of DuckDB and the DuckLake open lakehouse format, it helps data practitioners move beyond isolated flat files (CSV, Excel, etc.) and adopt a structured, audit-friendly, transactional data lake inside R. With support for ACID transactions, automatic versioning, time-travel queries, and full audit trails, ducklake-r enables reproducible, collaborative data workflows under a layered architecture familiar to modern data engineering.
Project repo: https://github.com/tgerke/ducklake-r/
Key Features
Provides an R interface to create and manage DuckLake-based data lakes with ACID guarantees and versioning.
Supports medallion (bronze/silver/gold) data architecture for layered, quality-controlled data workflows.
Enables time travel and versioned queries, allowing you to reconstruct historical dataset states.
Includes transactional operations with author attribution and commit messages, improving auditability and collaboration.
Integrates smoothly with dplyr and the R ecosystem, letting you build, transform, and explore datasets directly from R.
Implements utility functions to attach/detach data lakes, create/replace tables, and list snapshots in familiar R workflows.
More details are available in the project documentation.
License: MIT
New Learning Resources
Here are some new learning resources that I came across this week.
Fine-Tune an Open Source LLM with Claude Code/Codex
The following tutorial by Alejandro AO provides an introduction to fine-tuning a small, open-source model with Codex using the Hugging Face model trainer skill and HF Jobs.
Fine-Tune an LLM Locally
Another guide for fine-tuning LLMs, focusing on tuning small models locally.
Reading files with Python
The following concise video provides a step-by-step guide to reading files with Python (e.g., text, data).
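As a rough illustration of the topic (not taken from the video), here is a minimal sketch of reading a plain-text file and a CSV file with Python's standard library. The helper names and the sample files are hypothetical; the script creates the files in a temporary folder so that it can run on its own.

```python
import csv
import tempfile
from pathlib import Path


def read_text(path):
    """Return the full content of a text file as a single string."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def read_csv_rows(path):
    """Return the rows of a CSV file as a list of dictionaries keyed by header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Create two small sample files (made-up content) in a temporary folder
tmp_dir = Path(tempfile.mkdtemp())
text_path = tmp_dir / "notes.txt"
text_path.write_text("hello\nworld\n", encoding="utf-8")
csv_path = tmp_dir / "data.csv"
csv_path.write_text("name,value\na,1\nb,2\n", encoding="utf-8")

text = read_text(text_path)
rows = read_csv_rows(csv_path)

print(text.splitlines())
print(rows)
```

Note that `csv.DictReader` returns every field as a string, so numeric columns need an explicit conversion (for example, `int(row["value"])`).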
Setting Up OpenClaw
The following tutorial by Tech with Tim provides a step-by-step guide for setting up OpenClaw on a server.
Setting Up OpenClaw with Local LLMs
The following tutorial covers configuring OpenClaw with Ollama to run it with local LLMs.
Book of the Week
This week’s focus is on a forecasting book - Time Series Forecasting in Python by Marco Peixeiro. As the name implies, the book focuses on time series forecasting with Python, from classical statistical models to deep learning applications.
What the Book Covers
Introduction to time series forecasting basics — what makes forecasting different from other prediction tasks, setting forecasting goals, and preparing data.
Naive and baseline models — starting with simple methods like last-value and seasonal baselines to establish performance baselines.
Statistical forecasting models — building and tuning classical models such as moving average (MA), autoregressive (AR), ARMA, ARIMA, SARIMA, and VAR models.
Handling seasonality and exogenous variables — incorporating calendar effects and external predictors into forecasting models.
Multivariate forecasting — techniques like VAR to forecast multiple related time series jointly.
Deep learning approaches — preparing data for deep models, implementing neural networks (including LSTMs and CNNs) for forecasting, and hybrid architectures.
Automation and advanced tools — using automated forecasting libraries like Prophet to streamline model selection and evaluation.
Capstone projects — several hands-on projects applying the methods to real datasets, such as stock prices, electricity consumption, and economic indicators.
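To give a flavor of the baseline models mentioned in the list above, here is a minimal sketch of a last-value forecast and a seasonal naive forecast, compared using the mean absolute error. This is not code from the book; the function names and the quarterly series are made up for illustration.

```python
from statistics import mean


def last_value_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season_length):
    """Repeat the most recent full season of observations."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical quarterly series with a repeating seasonal pattern
series = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44]
train, test = series[:8], series[8:]

naive = last_value_forecast(train, len(test))
seasonal = seasonal_naive_forecast(train, len(test), season_length=4)

print("Last-value MAE:", mae(test, naive))
print("Seasonal naive MAE:", mae(test, seasonal))
```

On this toy series, the seasonal baseline scores a much lower error than the last-value baseline, which is the kind of benchmark a more complex model would be expected to beat.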
This book is ideal for data scientists and machine learning practitioners who are familiar with Python and want to learn how to build accurate, scalable time-series forecasting models using both classical statistics and modern deep learning tools such as TensorFlow.
The book is available online on the publisher’s website, and a hard copy can be purchased on Amazon.
Have any questions? Please comment below!
See you next Saturday!
Thanks,
Rami



