535514 Reinforcement Learning (強化學習原理)

Instructor: Ping-Chun Hsieh
Email: pinghsieh [AT] nycu [DOT] edu [DOT] tw
References:
- [SB] Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, 2nd edition, 2019
- [AJK] Alekh Agarwal, Nan Jiang, and Sham M. Kakade, Reinforcement Learning: Theory and Algorithms, 2020 (https://rltheorybook.github.io/rl_monograph_AJK.pdf)
- [BCN] Léon Bottou, Frank E. Curtis, and Jorge Nocedal, Optimization Methods for Large-Scale Machine Learning (https://arxiv.org/abs/1606.04838)
- [NW] Jorge Nocedal and Stephen Wright, Numerical Optimization, 2006
- [LS] Tor Lattimore and Csaba Szepesvári, Bandit Algorithms, 2019 (https://tor-lattimore.com/downloads/book/book.pdf)

Grading:
- Assignments: 30%
- Pre-lecture assignments: 15%
- Team final project: 55% (Proposal: 6%, Baselines: 12%, Theoretical deep dive: 15%, Poster presentation: 10%, Final report: 12%)

Week	Lecture	Date	Topics	Lecture Slides
1	1	2/18	Introduction to RL and MDP	Lec1, Lec1 annotated
2	2	2/25	MDP and Optimal Control	Lec2, Lec2 annotated
3	3	3/4	Policy Iteration, Regularized MDP, and Policy Gradient	Lec3, Lec3 annotated
4	4	3/11	Policy Gradient	Lec4, Lec4 annotated
5	5	3/18	Variance Reduction and Model-Free Prediction	Lec5, Lec5 annotated
6	6	3/25	Value Function Approximation and Optimality of PG	Lec6, Lec6 annotated
7	7	4/1	Deterministic Policy Gradient	Lec7, Lec7 annotated
8	8	4/8	TRPO and PPO	Lec8, Lec8 annotated
9	9	4/15	Value-based RL	Lec9, Lec9 annotated
10	10	4/22	Deep Q Network, Stochastic Approximation, and Distributional RL	Lec10, Lec10 annotated
11	11	5/6	Distributional RL, SAC, and Model-based RL	Lec11, Lec11 annotated
12	12	5/13	Model-based RL	Lec12, Lec12 annotated
13	13	5/20	Inverse RL	Lec13, Lec13 annotated
14	14	5/27	Multi-Objective RL and Unsupervised RL	Lec14, Lec14 annotated
15	15	6/3	Final Poster Presentations