The ggplot2 New Release, Regression and Other Stories, Deep Learning for Computer Vision, Introduction to Decision Trees with Python
A weekly curated update on data science and engineering topics and resources.
This week's agenda:
Open Source of the Week - ggplot2 new release
New learning resources - Stanford deep learning for computer vision, MIT real analysis, embedding Gemma model, introduction to decision trees with Python
Book of the week - Regression and Other Stories by Prof. Andrew Gelman, Prof. Jennifer Hill, and Prof. Aki Vehtari
I share daily updates on Substack, Facebook, Telegram, WhatsApp, and Viber.
Are you interested in learning how to set up automation using GitHub Actions? If so, please check out my course on LinkedIn Learning.
Open Source of the Week
This week's focus is on the recent ggplot2 release, version 4.0.0. The ggplot2 project is one of the main open source data visualization projects. This R library follows the principles of the grammar of graphics and provides tools and functions for creating plots and infographics.
Project repo: https://github.com/tidyverse/ggplot2/
Version 4.0.0 Highlights:
Adopting S7 - a major change in the new release is the move from S3 to S7 objects. S7 is the newer R OOP system, and it offers more flexibility than S3.
Theme enhancements and new style defaults
The theme system now also controls layer defaults (things like default point shapes, default colour/fill) - more of the non-data styling is centralized in themes.
The new roles “ink” (foreground), “paper” (background), and “accent” are introduced to distinguish colours and style by role rather than just fill vs colour.
Built-in “complete” themes now include these roles and propagate them to layer defaults.
Improved scale and palette defaults
Default scales (e.g., for discrete, continuous) now have palette = NULL. If the palette is null, the theme’s palette setting is used. This lets themes define palettes for aesthetics (colour, shape, etc.)
New theme settings of the form palette.{aesthetic}.{type} (e.g., discrete vs continuous) make it easier to coordinate theme-wide colour/shape palettes.
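To make the palette fallback concrete, here is a small conceptual sketch in Python. It is not ggplot2 (or plotnine) code: the function resolve_palette, the dictionary-based theme, and the colour values are all hypothetical and only mimic the rule described above, where a scale without its own palette falls back to the theme's palette.{aesthetic}.{type} setting.

```python
# Conceptual sketch only. The names below are hypothetical and are NOT
# part of the ggplot2 or plotnine APIs; they just mirror the fallback rule.
from typing import Dict, List, Optional


def resolve_palette(
    scale_palette: Optional[List[str]],
    theme: Dict[str, List[str]],
    aesthetic: str,
    kind: str,
) -> Optional[List[str]]:
    """Use the scale's own palette if set, otherwise the theme's setting."""
    if scale_palette is not None:
        return scale_palette
    # Fall back to the theme entry named palette.{aesthetic}.{type}
    return theme.get(f"palette.{aesthetic}.{kind}")


# A made-up theme that defines a discrete colour palette
theme = {"palette.colour.discrete": ["#1b9e77", "#d95f02", "#7570b3"]}

# Scale with no palette (the new default): the theme decides
from_theme = resolve_palette(None, theme, "colour", "discrete")

# Scale with an explicit palette: the scale wins
explicit = resolve_palette(["red", "blue"], theme, "colour", "discrete")
```

The point of the design is that a single theme can drive the palettes of every layer, while an explicitly set scale palette still takes priority.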
Facet improvements
facet_wrap() now has a single dir argument with 8 options (combinations of top/bottom/left/right) that encode where the first facet is placed and how filling proceeds. It replaces the older mix of dir + as.table.
For cases where there is only one row or one column, facet_wrap(space = ...) now allows allocating panel widths (or heights) proportional to data ranges (“free space”), similar to what facet_grid() with space does.
New layout argument in facets: gives options for how data are repeated or assigned to panels.
More details are available in the release notes.
If you are using Python, the plotnine project is the Python equivalent of ggplot2.
License: MIT
New Learning Resources
Here are some new learning resources that I came across this week.
Stanford Deep Learning for Computer Vision
Stanford released a new version of one of its most popular deep learning courses - Deep Learning for Computer Vision, taught by Prof. Fei-Fei Li, Prof. Ehsan Adeli, Prof. Justin Johnson, and TA Zane Durante. This full-semester course covers the following topics:
End-to-end models
Image classification, localization, and detection
Implementation, training, and debugging
Learning algorithms, such as backpropagation
Long Short-Term Memory (LSTM)
Recurrent Neural Networks (RNN)
Supervised and unsupervised learning
More details are on the course website: https://cs231n.stanford.edu/
MIT Real Analysis Course
MIT released its Real Analysis course, and I highly recommend it if you wish to learn how to prove mathematical theorems. This full-semester course, by Prof. Tobias Holck Colding, focuses on:
Proving mathematical theorems in analysis
Writing proofs
Proving theorems in calculus in a rigorous way
Embedding Gemma
This tutorial provides a step-by-step guide for on-device RAG with Google’s Embedding Gemma (300M) model.
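If you are new to RAG, the retrieval step boils down to comparing a query embedding against document embeddings. Below is a minimal sketch in plain Python, assuming cosine similarity as the ranking measure. The three-dimensional vectors and document names are made up for illustration; in practice the vectors would come from an embedding model, and this is not the tutorial's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, k=1):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (made up); a real model produces much longer vectors
doc_vectors = {
    "ggplot2 release notes": [0.9, 0.1, 0.0],
    "decision tree tutorial": [0.1, 0.9, 0.1],
    "regression book": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

top_match = retrieve(query_vector, doc_vectors, k=1)
```

The retrieved documents are then passed to the language model as context for generating the answer.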
Introduction to Decision Trees with Python
The following tutorial by Anna Strahl provides an introduction to decision trees and random forests with Python. This one-hour tutorial covers the following topics:
Exploratory Data Analysis (EDA)
Data Cleaning
Machine learning data prep
Building the decision tree
Visualizing and explaining the model
Evaluating the model
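For some intuition on the "building the decision tree" step, here is a minimal sketch in plain Python of how a tree can pick a single split, assuming Gini impurity as the criterion. This is not the tutorial's code, and the data (hours studied vs. passing an exam) is made up; a library such as scikit-learn does this search for every feature and repeats it recursively.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Find the split 'value <= threshold' with the lowest weighted Gini."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split: impurity of the whole group
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up toy data: hours studied and whether the exam was passed
hours = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
passed = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_threshold(hours, passed)
```

On this toy data the two classes are perfectly separable, so the chosen split leaves both sides pure.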
Book of the Week
Following last week's book, this week's focus is on another core statistics book - Regression and Other Stories by Prof. Andrew Gelman, Prof. Jennifer Hill, and Prof. Aki Vehtari. This book emphasizes applied and real-world regression problems such as comparison, estimation, prediction, and causal inference.
Topics covered include:
Fundamental topics such as data and measurement, probability, statistical inference, and simulation
Linear regression theory - single and multiple predictors, fitting models, inference, assumptions, and model diagnostics
Generalized linear models (GLM) - other regression forms, such as logistic regression
Causal inference methods with regression
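As a tiny illustration of the "fitting models" topic, here is a sketch of a single-predictor regression fitted by ordinary least squares in plain Python. This is not code from the book, and the data points are made up so that they lie exactly on the line y = 1 + 2x.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data generated from y = 1 + 2x with no noise
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(x, y)

# Residuals are a basic diagnostic: observed minus fitted values
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

With real, noisy data the residuals will not be zero, and checking them is where the diagnostics topics in the list above come in.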
This book is ideal for anyone interested in practical data analysis — especially students or professionals in statistics, social sciences, public health, and economics — who wants to go beyond black-box tools and develop intuition, diagnostic skills, and judgment when using regression and causal inference in complex real-world situations.
Thanks to the authors, a free online version of the book is available on the website. If you'd like to support the authors or get a physical copy, you can purchase the book on Amazon.
Have any questions? Please comment below!
See you next Saturday!
Thanks,
Rami