Reinforcement Learning

11 Modules · ~36 hours · Intermediate → Advanced

Master Reinforcement Learning from theory to practice: Markov Decision Processes, dynamic programming, Q-learning, policy gradients, actor-critic, and modern deep RL (DQN, PPO, SAC) with PyTorch and Gymnasium.

Course roadmap

| # | Module | Status | Topics |
|---|--------|--------|--------|
| 0 | Setup & RL Vocabulary | Plan ready | Agent, environment, reward, state, action, policy, return, episode |
| 1 | Markov Decision Processes | Plan ready | MDPs, Bellman equations, value functions, policies |
| 2 | Dynamic Programming | Plan ready | Policy iteration, value iteration, model-based RL |
| 3 | Monte Carlo & Temporal Difference | Plan ready | MC prediction, TD(0), SARSA, Q-learning |
| 4 | Function Approximation | Plan ready | Linear FA, neural net FA, deadly triad |
| 5 | Deep Q-Networks | Plan ready | DQN, replay buffer, target net, Double DQN, Dueling DQN |
| 6 | Policy Gradient Methods | Plan ready | REINFORCE, baselines, actor-critic, A2C, A3C |
| 7 | Trust Region Methods | Plan ready | TRPO, PPO, GAE, clipping |
| 8 | Continuous Control | Plan ready | DDPG, TD3, SAC, exploration noise |
| 9 | Advanced Topics | Plan ready | Multi-agent RL, offline RL, model-based RL, RLHF for LLMs |
| 10 | Capstone | Plan ready | Train an agent on a Gymnasium env: MountainCar → LunarLander → custom env |

What's available now

The curriculum plan is published. Content rolls out in the second half of 2026.

Last updated

2026-05 — Curriculum plan published.