Reinforcement Learning
Teach agents to make optimal decisions through rewards, exploration, and policy optimization. The course progresses from Markov Decision Processes to modern deep RL.
Level: Advanced · Category: Reinforcement Learning · Estimated time: 6 hours
Prerequisites
- Neural Networks Deep Dive
Lessons
- MDPs & Value Functions — Markov Decision Processes, state/action values, and the Bellman equation.
- Dynamic Programming — Policy evaluation, policy iteration, and value iteration.
- Monte Carlo & TD Learning — Monte Carlo methods, temporal-difference learning, SARSA, and Q-learning.
- Deep Q-Networks (DQN) — Function approximation, experience replay, target networks, and DQN variants.
- Policy Gradient Methods — REINFORCE, variance reduction, and the policy gradient theorem.
- Actor-Critic & PPO — A2C, A3C, and PPO: combining value-based and policy-based methods for stable training.
- Multi-Agent RL & Applications — Multi-agent environments, self-play, and real-world RL applications.
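
As a taste of the early lessons, here is a minimal sketch of value iteration, which repeatedly applies the Bellman optimality backup until the state values stop changing. It is not taken from the course material: the four-state chain environment, the function names `backup` and `value_iteration`, and the discount factor of 0.9 are all assumptions chosen for illustration, and transitions are deterministic to keep the code short.

```python
# Hypothetical toy MDP: a chain of 4 states, where state 3 is terminal.
# Actions: 0 = left, 1 = right. Entering state 3 from state 2 pays reward 1.
NEXT_STATE = [[0, 1], [0, 2], [1, 3], [3, 3]]  # NEXT_STATE[s][a] -> next state
REWARD = [[0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # REWARD[s][a]


def backup(values, next_state, reward, gamma, s, a):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    return reward[s][a] + gamma * values[next_state[s][a]]


def value_iteration(next_state, reward, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Return (state values, greedy policy) for a deterministic tabular MDP."""
    n_states, n_actions = len(next_state), len(next_state[0])
    values = [0.0] * n_states
    for _ in range(max_iters):
        # Bellman optimality backup: best action value in every state.
        new_values = [
            max(backup(values, next_state, reward, gamma, s, a)
                for a in range(n_actions))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [
        max(range(n_actions),
            key=lambda a: backup(values, next_state, reward, gamma, s, a))
        for s in range(n_states)
    ]
    return values, policy


values, policy = value_iteration(NEXT_STATE, REWARD)
print(values, policy)
```

In this toy chain the greedy policy moves right in every non-terminal state, and each state's value is the reward discounted by its distance from the goal. Stochastic transitions would replace the single next state with an expectation over next states.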
Topics covered
reinforcement-learning, q-learning, policy-gradient, ppo, gym, deep-rl