535514 Reinforcement Learning (Principles of Reinforcement Learning)
Grading
Assignments: 30%
Pre-lecture assignments: 15%
Team final project: 55% (Proposal: 6%, Baselines: 12%, Theoretical deep dive: 15%, Poster presentation: 10%, Final report: 12%)
| Week | Lecture | Date | Topics | Lecture Slides |
|------|---------|------|--------|----------------|
| 1 | 1 | 2/18 | Introduction to RL and MDP | Lec1, Lec1 annotated |
| 2 | 2 | 2/25 | MDP and Optimal Control | Lec2, Lec2 annotated |
| 3 | 3 | 3/4 | Policy Iteration, Regularized MDP, and Policy Gradient | Lec3, Lec3 annotated |
| 4 | 4 | 3/11 | Policy Gradient | Lec4, Lec4 annotated |
| 5 | 5 | 3/18 | Variance Reduction and Model-Free Prediction | Lec5, Lec5 annotated |
| 6 | 6 | 3/25 | Value Function Approximation and Optimality of PG | Lec6, Lec6 annotated |
| 7 | 7 | 4/1 | Deterministic Policy Gradient | Lec7, Lec7 annotated |
| 8 | 8 | 4/8 | TRPO and PPO | Lec8, Lec8 annotated |
| 9 | 9 | 4/15 | Value-based RL | Lec9, Lec9 annotated |
| 10 | 10 | 4/22 | Deep Q Network, Stochastic Approximation, and Distributional RL | Lec10, Lec10 annotated |
| 11 | 11 | 5/6 | Distributional RL, SAC, and Model-based RL | Lec11, Lec11 annotated |
| 12 | 12 | 5/13 | Model-based RL | Lec12, Lec12 annotated |
| 13 | 13 | 5/20 | Inverse RL | Lec13, Lec13 annotated |
| 14 | 14 | 5/27 | Multi-Objective RL and Unsupervised RL | Lec14, Lec14 annotated |
| 15 | 15 | 6/3 | Final Poster Presentations | |
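As a minimal taste of the Week 1–2 material (MDPs and optimal control), here is a value-iteration sketch on a hypothetical two-state, two-action MDP; all transition probabilities and rewards below are invented purely for illustration and are not part of the course materials:

```python
import numpy as np

# Hypothetical MDP for illustration only:
# P[s, a, s'] = transition probability, R[s, a] = immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.5, 0.5], [0.1, 0.9]],   # transitions from state 1
])
R = np.array([
    [1.0, 0.0],   # rewards for actions in state 0
    [0.0, 2.0],   # rewards for actions in state 1
])
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality operator
# until the value function stops changing.
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V_new = Q.max(axis=1)        # greedy backup over actions
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)        # greedy policy w.r.t. the converged values
```

In this toy instance the greedy policy steers toward state 1, whose action yields the larger reward while mostly keeping the agent there.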