The practical, end-to-end online course that teaches you how RL is used to fine-tune LLMs, through clear explanations and hands-on labs.
February 25 - March 14, 2026
Register
Join us to learn how to:
- Explain the full LLM training stack and where RL fits
- Choose between PPO/DPO/GRPO/GSPO for real use cases
- Design reward signals (including verifiable rewards)
- Build a simple tool-using agent and fine-tune it with RL
- Debug RL fine-tuning with practical diagnostics
Who is this course for?
- ML / LLM engineers and applied scientists who want practical RL-for-LLMs skills
- Data scientists who can code but want to understand and implement RLHF-style training
- Builders working on agents, tool use, reasoning, or reliability improvements

You should be comfortable with Python and basic deep learning concepts.
Program Overview
Live virtual sessions
Learn directly from five well-known instructors who teach clearly and build things that work.
All live online sessions take place at 12pm Eastern Time / 9am Pacific Time:
- Wednesday, February 25th
- Saturday, February 28th
- Wednesday, March 4th
- Saturday, March 7th
- Wednesday, March 11th
- Saturday, March 14th
Access to labs and a learning platform
Resources are available for you to learn on your own schedule.
Community
You won’t be learning alone. Expect structured support and a place to ask questions as you go.
You’ll also get access to:
- Step-by-step coding labs (including an agent you’ll build and improve)
- Templates, notebooks, and reference implementations
- Recommended readings + “cheat sheets” for key methods
- Replays (so you can review anything you missed)
Course curriculum
Module 1: The Essential Concepts of Reinforcement Learning
with Josh Starmer
A clean foundation: environments, rewards, policies, and how RL differs from supervised learning. You’ll see RL in action and code a simple example to make optimal decisions under uncertainty.
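To give a flavor of the kind of "decisions under uncertainty" example this module builds toward, here is a minimal epsilon-greedy multi-armed bandit sketch. This is illustrative only, not the course's actual lab code; the arm payout probabilities are made up.

```python
import random

def run_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: explore randomly with probability epsilon,
    otherwise exploit the arm with the best estimated reward so far.
    true_probs are illustrative payout probabilities, not course data."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)    # pulls per arm
    values = [0.0] * len(true_probs)  # running reward estimate per arm
    for _ in range(steps):
        if rng.random() < epsilon:                      # explore
            arm = rng.randrange(len(true_probs))
        else:                                           # exploit
            arm = max(range(len(true_probs)), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # incremental average: update the estimate toward the new reward
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
```

Even this tiny loop exhibits the core RL trade-off the module covers: spend some pulls exploring, and the estimates converge enough to exploit the best arm.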
Module 2: The Evolution of RL for LLMs: RLHF → Verifiable Rewards
with Maarten Grootendorst
Understand common reinforcement learning algorithms, like PPO and GRPO, along with several use cases such as reasoning, tool use, and test-time reinforcement learning.
Module 3: GRPO Deep Dive
with Luis Serrano
A focused, intuitive deep dive into GRPO—what’s happening under the hood, why it works well for verifiable rewards, and what tradeoffs to watch.
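One piece of what is "happening under the hood": GRPO replaces a learned value baseline with a group-relative one, sampling several completions per prompt, scoring each with a (possibly verifiable) reward, and normalizing within the group. A minimal sketch of that normalization, with illustrative rewards (not course material):

```python
def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages in the GRPO style: normalize each
    completion's reward by its own group's mean and std, so no separate
    value network is needed. Rewards here are illustrative."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# e.g. a verifiable 0/1 reward over four sampled answers to one prompt
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group's average get positive advantage and are reinforced; the rest are pushed down.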
Module 4: Labs 1–2: Setting up the Calculator Agent
with Chris McCormick
You’ll build the full scaffold: prompts, tool interface, evaluation loop, and a reward signal that can be verified.
Module 5: Lab 3: Dissecting a GRPO training run
with Jay Alammar
Put it all together: an in-depth look at a full RL training run, the systems involved, and what changes over the course of training.
Â
BONUS Module: Join us for an "Ask Us Anything"
Whether you have technical or career questions, the full team will be available to answer them.
Faculty
Jay Alammar is a machine learning researcher and writer, co-author of Hands-On Large Language Models: Language Understanding and Generation, whose illustrated articles have helped millions visually understand transformers and modern NLP.
Maarten Grootendorst is a data scientist and creator of popular NLP libraries like BERTopic and KeyBERT, and co-author of Hands-On Large Language Models: Language Understanding and Generation, bridging cutting-edge research with practical tools.
Chris McCormick is a leading AI educator and researcher whose deep-dive tutorials on BERT, transformers, and NLP have become go-to references for practitioners worldwide, combining rigorous understanding with clear, implementation-ready code.
Luis Serrano is an ex-Google, ex-Apple AI scientist, educator, and author of Grokking Machine Learning, dedicated to making complex ideas intuitive and accessible through the Serrano Academy platform.
Josh Starmer is the founder of StatQuest and author of The StatQuest Illustrated Guide to Machine Learning and The StatQuest Illustrated Guide to Neural Networks and AI, known for turning intimidating concepts into clear, joyful explanations.
At the end of the course you'll receive a certificate of completion.