Reinforcement Learning

Teach agents to make optimal decisions through rewards, exploration, and policy optimization. This course covers reinforcement learning from Markov Decision Processes (MDPs) through to modern deep RL.

Level: Advanced · Category: Reinforcement Learning · Estimated time: 6 hours

Prerequisites

Neural Networks Deep Dive

Lessons

MDPs & Value Functions — Markov Decision Processes, state/action values, and the Bellman equation.
Dynamic Programming — Policy evaluation, policy iteration, and value iteration.
Monte Carlo & TD Learning — Monte Carlo (MC) methods, temporal-difference (TD) learning, SARSA, and Q-learning.
Deep Q-Networks (DQN) — Function approximation, experience replay, target networks, and DQN variants.
Policy Gradient Methods — REINFORCE, variance reduction, and the policy gradient theorem.
Actor-Critic & PPO — A2C, A3C, and PPO: combining value-based and policy-based methods for stable training.
Multi-Agent RL & Applications — Multi-agent environments, self-play, and real-world RL applications.
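
As a taste of the first two lessons, here is a minimal sketch of value iteration, which repeatedly applies the Bellman optimality update V(s) = max_a [r + gamma * V(s')] until the values stop changing. The four-state chain environment, the reward of 1.0 for reaching the last state, the discount of 0.9, and the names step, value_iteration, and greedy_policy are all assumptions made for illustration, not material taken from the course.

```python
# Hypothetical toy MDP: states 0..3 on a line, state 3 is terminal.
GAMMA = 0.9          # discount factor (assumed for this example)
N_STATES = 4
TERMINAL = 3
ACTIONS = (-1, +1)   # move left or move right


def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def backup(state, action, values):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-8):
    """Sweep the Bellman optimality update until the largest change is below tol."""
    values = [0.0] * N_STATES  # the terminal state's value stays 0
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            best = max(backup(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick the action with the highest one-step lookahead in each state."""
    return {
        s: max(ACTIONS, key=lambda a: backup(s, a, values))
        for s in range(N_STATES)
        if s != TERMINAL
    }


if __name__ == "__main__":
    v = value_iteration()
    print([round(x, 4) for x in v])
    print(greedy_policy(v))
```

In this toy setting the values grow toward the goal state and the greedy policy always moves right. Value iteration needs the transition model, which step provides here; model-free methods such as Q-learning estimate similar quantities from sampled experience instead.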

Topics covered

reinforcement-learning, q-learning, policy-gradient, ppo, gym, deep-rl
