Week | Lecture | Date | Topics | Lecture Slides |
1 | 1 | 2/23 | Logistics and Introduction to RL | Lec1 |
1 | 2 | 2/26 | Introduction to RL and MDP | Lec2 |
2 | 3 | 3/2 | Planning for MDPs | Lec3 |
2 | 4 | 3/5 | Planning and Distributional Perspective of MDPs | Lec4 |
3 | 5 | 3/9 | A Distributional Perspective of MDPs and Policy Optimization | Lec5 |
3 | 6 | 3/12 | Policy Optimization and Gradient Descent | Lec6 |
4 | 7 | 3/16 | Policy Gradient | Lec7 |
4 | 8 | 3/19 | Variance Reduction and Model-Free Prediction | Lec8 |
5 | 9 | 3/23 | Model-Free Prediction and Actor-Critic Algorithms | Lec9 |
5 | 10 | 3/26 | Model-Free Prediction and Global Convergence of Policy Gradient | Lec10 |
6 | 11 | 3/30 | Global Convergence of Policy Gradient | Lec11, Lec11 (annotated) |
6 | | 4/2 | Spring Break | |
7 | | 4/6 | Spring Break | |
7 | 12 | 4/9 | Global Convergence of Policy Gradient and Value Function Approximation | Lec12 |
8 | 13 | 4/13 | Value Function Approximation | Lec13 |
8 | 14 | 4/16 | Trust Region Policy Optimization (TRPO) | Lec14 |
9 | 15 | 4/20 | Trust Region Policy Optimization (TRPO) | Lec15 |
9 | 16 | 4/23 | Proximal Policy Optimization (PPO) and Deterministic Policy Gradient (DPG) | Lec16 |
10 | 17 | 4/27 | Off-Policy Learning via Deterministic and Stochastic Policy Gradients | Lec17 |
10 | 18 | 4/30 | Off-Policy Learning via Deterministic and Stochastic Policy Gradients | Lec18 |
11 | 19 | 5/4 | Off-Policy Learning and Value-Based Methods | Lec19, Lec19 (annotated) |
11 | 20 | 5/7 | Value-Based Methods | Lec20 |
12 | 21 | 5/11 | Value-Based Methods - Expected Sarsa and Q-Learning | Lec21, Lec21 (annotated) |
12 | 22 | 5/14 | Value-Based Methods - Q-Learning, Double Q-Learning | Lec22 |
13 | | 5/18 | Rescheduled for Final Presentation | |
13 | 23 | 5/21 | Value-Based Methods - DQN and Double DQN | Lec23 |
14 | | 5/25 | Rescheduled for Final Presentation | |
14 | | 5/28 | Rescheduled to 6/18 | |
15 | 24 | 6/1 | Distributional RL - C51 | Lec24, Lec24 (annotated) |
15 | 25 | 6/4 | Distributional RL - QR-DQN | Lec25, Lec25 (annotated) |
16 | | 6/8 | Rescheduled for Final Presentation (Final Exam Week) | |
16 | | 6/11 | Rescheduled for Final Presentation (Final Exam Week) | |
17 | | 6/15 | No Class | |
17 | 26 | 6/18 | Implicit Quantile Networks and Soft Actor-Critic | Lec26, Lec26 (annotated) |
18 | | 6/23 | Final Presentation | |
18 | | 6/24 | Final Presentation | |