The mirai Project, Polars Cookbook, New Learning Resources

Jun 21, 2025

This week's agenda:

Open Source of the Week - The mirai project
New learning resources - Docker for Data Science, AI agents in production, optimize large datasets with Pandas, monitor Python applications with Prometheus and Grafana
Book of the week - Polars Cookbook by Yuki Kakegawa

I share daily updates on Substack, Facebook, Telegram, WhatsApp, and Viber.

Are you interested in learning how to set up automation using GitHub Actions? If so, please check out my course on LinkedIn Learning:

My LinkedIn Learning Course

Open Source of the Week

What I love about R is that it has built-in multicore parallelization through the parallel package, plus a wide ecosystem for multicore computing. This week, I came across the mirai project - an async evaluation framework for R.

Project repo: https://github.com/r-lib/mirai/

mirai (Japanese ミライ = “future”) is a lightweight R package enabling asynchronous, parallel, and distributed evaluation of R expressions. It empowers users to seamlessly delegate computations to background or remote daemons, offering a modern concurrency model built atop nanonext and NNG (Nanomsg Next Gen) for reliable inter-process communication and TLS-secured transport.

The inherently queued architecture of mirai enables it to handle many more tasks than available processes, unlike traditional parallel computing in R.

Async Execution for Shiny

One of the interesting applications of mirai is asynchronous task execution in Shiny applications using Shiny's ExtendedTask class. The following vignette demonstrates how to incorporate it into a Shiny app.

More details are available in the project documentation.

License: MIT

New Learning Resources

Here are some new learning resources that I came across this week.

Getting Started with Docker for Data Science

The slides from my talk about Docker for data science at the University of New Hampshire's Peter T. Paul College of Business and Economics are available here. Code examples are available in the following repo.

Agents Towards Production

The following repository offers a comprehensive set of tutorials that cover the essential components required to build production-level AI agents. This includes topics such as:

Orchestration
Tool integration
Observability
Deployment
Memory
UI & Frontend
Agent Frameworks
Model Customization
Multi-agent Coordination
Security
Evaluation

Optimize Large Datasets with Pandas

The following tutorial by Anna Strahl focuses on methods for efficiently working with medium-sized datasets by optimizing memory usage, processing data in chunks, and loading it into a SQLite database for analysis.
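The three ideas mentioned above (reducing memory usage, reading in chunks, and loading into SQLite) can be sketched roughly as follows. This is a minimal illustration and not code from the tutorial: the synthetic data, the column names, the chunk size, and the `sales` table name are all assumptions made for the example.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical synthetic data standing in for a much larger CSV file.
raw = pd.DataFrame({
    "city": ["Boston", "Austin", "Boston", "Denver"] * 250,
    "units": range(1000),
})

# 1. Reduce memory: repeated strings become categories, integers are downcast.
optimized = raw.assign(
    city=raw["city"].astype("category"),
    units=pd.to_numeric(raw["units"], downcast="integer"),
)
before = raw.memory_usage(deep=True).sum()
after = optimized.memory_usage(deep=True).sum()

# 2. Read the CSV in chunks and append each chunk to a SQLite table.
csv_buffer = io.StringIO(raw.to_csv(index=False))
conn = sqlite3.connect(":memory:")
for chunk in pd.read_csv(csv_buffer, chunksize=250):
    chunk.to_sql("sales", conn, if_exists="append", index=False)

# 3. Let SQLite do the aggregation instead of holding everything in pandas.
totals = pd.read_sql_query(
    "SELECT city, SUM(units) AS total FROM sales GROUP BY city ORDER BY city",
    conn,
)
conn.close()

print(f"memory: {before} -> {after} bytes")
print(totals)
```

With a real file, the in-memory buffer would be replaced by a file path and the in-memory database by a file on disk, so that only one chunk needs to fit in memory at a time.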

Monitor Your Python Applications with Prometheus and Grafana

The following short tutorial by NeuralNine provides an introduction to monitoring with Prometheus and Grafana - open-source tools for collecting metrics and visualizing them.
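To give a feel for what instrumenting a Python application can look like, here is a small sketch using the prometheus_client package. It is not taken from the tutorial: the metric names and the `handle_request` function are hypothetical, and the example renders the metrics as text instead of starting a server so that it stays self-contained.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Histogram,
    generate_latest,
)

# A dedicated registry keeps this example isolated from any global metrics.
registry = CollectorRegistry()

# Hypothetical metric names chosen for illustration.
REQUESTS = Counter(
    "app_requests", "Number of handled requests", registry=registry
)
LATENCY = Histogram(
    "app_request_seconds", "Time spent handling a request", registry=registry
)


def handle_request(n: int) -> int:
    """Stand-in for real application work; counts and times each call."""
    with LATENCY.time():
        REQUESTS.inc()
        return sum(range(n))


for size in (10, 100, 1000):
    handle_request(size)

# Render the metrics in the text format that Prometheus scrapes.
metrics_text = generate_latest(registry).decode("utf-8")
print(metrics_text)
```

In a running service, the usual approach is to expose these metrics over HTTP (for example with `start_http_server` from the same package), have Prometheus scrape that endpoint, and point Grafana at Prometheus as a data source.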

Book of the Week

This week's focus is on Polars, a lightning-fast, Rust-based DataFrame library with a Python API. The Polars Cookbook, by Yuki Kakegawa, is a hands-on guide to using Polars in Python for real-world data processing. The book covers the following topics:

Installing Polars and mastering core concepts like DataFrame vs. LazyFrame, series operations, and method chaining
Loading from and saving to various formats (CSV, Parquet, JSON, Excel), multiple files, delta tables, and database systems
Inspecting and transforming data: type casting, deduplication, masking, outlier detection, and data visualization with Plotly
Aggregations, group‑bys, window functions, UDF usage, and SQL‑style queries within Polars
Handling missing data: detection, removal, and imputation strategies
String operations: filtering, parsing dates, extracting substrings, cleaning, splitting, and concatenation
Working with nested data: list creation and aggregation, struct and JSON manipulation
Time‑series analysis: specialized functions and workflows
Performance techniques: streaming, lazy evaluation, hardware‑efficient patterns (e.g. chunked and parallel execution)
Integration with Python libraries like pandas, NumPy, PyArrow, plus cloud platforms AWS, GCS, BigQuery, Snowflake, and S3

This book is ideal for data analysts, scientists, and engineers who are comfortable with Python (and ideally familiar with pandas or PySpark) and are seeking efficient, scalable workflows with Polars.

The book is available for purchase on Amazon.

Have any questions? Please comment below!

See you next Saturday!

Thanks,

Rami

Rami's Data Newsletter
