This week's agenda:
Open Source of the Week - the Snowflake-AI-Toolkit project
New learning resources - building a transformer, reducing A/B test bias, calculus with Python, bivariate maps
Book of the week - R for Economic Research by J. Renato Leripio
I share daily updates on Substack, Facebook, Telegram, WhatsApp, and Viber.
Are you interested in learning how to set up automation using GitHub Actions? If so, please check out my course on LinkedIn Learning.
Open Source of the Week
This week's focus is on the Snowflake-AI-Toolkit project. The Snowflake AI Toolkit is an open-source, Streamlit-based native app designed to accelerate AI development within the Snowflake ecosystem. It serves as an interactive playground for data scientists, machine learning engineers, and developers to explore, prototype, and deploy AI solutions leveraging Snowflake's Cortex and AI functions.
Project repository: https://github.com/Snowflake-Labs/Snowflake-AI-Toolkit
Key Features
Interactive Playground: Experiment with Snowflake Cortex functions, test prompts, and engage in conversational AI within a user-friendly interface.
Rapid Prototyping: Quickly build and validate AI-driven solutions directly in Snowflake without extensive setup.
Plug-and-Play Deployment: Easily integrate and deploy the toolkit as a native app within your Snowflake environment.
Build: A dedicated section for constructing and deploying data pipelines and workflows using Snowflake Cortex's powerful AI capabilities, enabling seamless integration with your Snowflake databases and tables.
Search: Get a hybrid (vector and keyword) search engine on your text data in minutes.
Agent: Cortex Agents orchestrate across both structured and unstructured data sources to deliver insights.
Thanks to Shankar Narayanan SGS, who is also a contributor to this project, for sharing this project with me.
License: Apache 2.0
New Learning Resources
Here are some new learning resources that I came across this week.
Agent Observability with Opik
The following tutorial is part of a course that focuses on LLMOps. This video focuses on:
Prompt versioning
Agent observability
Evaluation of dataset management
Agentic RAG evals
Build Your Own Transformer
The following workshop by Sheetal Borar, Chuxin Liu, and Shefali Shrivastava from the PyData Global conference dives into the foundations of transformer models, including building a transformer model from scratch using Python and NumPy.
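To give a feel for what "from scratch" involves, here is a minimal sketch of scaled dot-product attention, the core operation of a transformer. It is not taken from the workshop: the function names and the toy inputs are my own, and it uses only the Python standard library to stay dependency-free, whereas the workshop uses NumPy.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention on plain lists of lists (hypothetical helper).

    queries: n_q x d, keys: n_k x d, values: n_k x d_v.
    Returns (outputs, weights), where each row of weights sums to 1.
    """
    scale = math.sqrt(len(keys[0]))
    # Similarity of every query with every key, scaled by sqrt(d).
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the value rows.
    n_cols = len(values[0])
    outputs = [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(n_cols)]
        for row in weights
    ]
    return outputs, weights


# Toy example: one query attending over two tokens.
toy_out, toy_weights = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

In this toy example the query is aligned with the first key, so the first token receives the larger attention weight. A full transformer adds learned projections, multiple heads, and feed-forward layers on top of this step.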
PyData Global 2024 Conference
All the talks and workshops from the PyData Global 2024 conference are now available to watch on YouTube. I really enjoyed the talks I have watched so far, which span topics from core statistics to AI.
Model Context Protocol
A short tutorial for building an MCP server by Shaw Talebi. The tutorial covers how to build a custom MCP server with Python and connect it to Claude Desktop.
Statistics for Data Science
The second article from the newsletter focuses on core statistical concepts for data science. It covers foundational topics such as percentiles and quartiles, z-score, and confidence intervals. Great summary!
Reducing A/B Test Bias: A Machine Learning Approach
The following article by Segev Samuel Gavish from Fiverr discusses how biases like failed randomization, heterogeneous user behavior, and outliers can distort A/B test results, even with proper design. To address this, Fiverr developed "Edison," a machine learning-based tool that detects and corrects biases, ensuring more accurate experiment outcomes.
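As an illustration of the first bias mentioned above, failed randomization, here is a minimal sketch of a sample ratio mismatch (SRM) check. This is a common textbook diagnostic and not a description of how Edison works. The function name, the default 50/50 split, and the 0.001 threshold are assumptions I picked for the example.

```python
import math


def srm_check(n_control, n_treatment, expected_control_share=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) for group sizes.

    Hypothetical helper: returns (chi2, p_value, flagged). flagged is True
    when the observed split is unlikely under the planned allocation.
    """
    if not 0 < expected_control_share < 1:
        raise ValueError("expected_control_share must be between 0 and 1")
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha


# Made-up counts: a planned 50/50 split that ended up 5,000 vs 5,500 users.
chi2, p_value, flagged = srm_check(5000, 5500)
```

A flagged result suggests that the assignment mechanism or the logging should be investigated before the experiment metrics are trusted. Heterogeneous user behavior and outliers, the other two biases in the article, need different tools.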
Calculus with Python
This new course by Ed Pratowski and freeCodeCamp focuses on university-level calculus with Python implementation.
Bivariate Maps Tutorial
A great tutorial by Milos Popovic on creating 2D and 3D bivariate maps using R.
Airflow 3.0 Explained
We will conclude this section with a data engineering resource. The following video provides an introduction to Airflow 3.0.
Book of the Week
This week's focus is on R for Economic Research by J. Renato Leripio. This book is an intermediate-level guide designed for economists and data professionals aiming to enhance their analytical capabilities using R. The book emphasizes practical tools and workflows essential for modern economic analysis.
The book covers the following topics:
Data manipulation and automation with the Tidyverse
Time series techniques: rolling calculations, seasonal adjustment, deflating nominal values, Hodrick-Prescott filter
Forecasting methods: comparative forecasting, simulations
Economic modeling: single and multiple equation models
State-space models: time-varying regression coefficients, dynamic factor models
This book is ideal for economists, analysts, and data scientists with a foundational understanding of R, and it bridges the gap between theoretical concepts and practical application in economic research.
The book is available online and is open and free.
Have any questions? Please comment below!
See you next Saturday!
Thanks,
Rami