535515 Spring 2023 - Reinforcement Learning (強化學習原理)
Instructor: Ping-Chun Hsieh
Email: pinghsieh [AT] nycu [DOT] edu [DOT] tw
Lectures:
Tuesdays 3:30pm-4:20pm @ EC115
Fridays 10:10am-12:00noon @ EC115
Note: The first lecture on 2/14 (Tue.) will be delivered via Webex: Webex Link
Office Hours: 4:30pm-5pm on Tuesdays or by appointment
References:
[SB] Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, 2nd edition, 2018
[AJK] Alekh Agarwal, Nan Jiang, and Sham M. Kakade, Reinforcement Learning: Theory and Algorithms, 2020 (https://rltheorybook.github.io/rl_monograph_AJK.pdf)
[BCN] Léon Bottou, Frank E. Curtis, and Jorge Nocedal, Optimization Methods for Large-Scale Machine Learning (https://arxiv.org/abs/1606.04838)
[NW] Jorge Nocedal and Stephen Wright, Numerical optimization, 2006
[LS] Tor Lattimore and Csaba Szepesvári, Bandit Algorithms, 2019 (https://tor-lattimore.com/downloads/book/book.pdf)
Week | Lecture | Date | Topics | Lecture Slides |
1 | 1 | 2/14 | Logistics and Introduction to RL | |
1 | 2 | 2/17 | Introduction to RL and MDPs | |
2 | 3 | 2/21 | Planning for MDPs | |
2 | 4 | 2/24 | Regularized and Distributional Perspective of MDPs | |
3 | | 2/28 | Peace Memorial Day | |
3 | 5 | 3/3 | Policy Optimization | |
4 | 6 | 3/7 | Policy Optimization and First-Order Optimization Methods | |
4 | 7 | 3/10 | Policy Gradient | |
5 | 8 | 3/14 | Policy Gradient and Stochastic Gradient Descent | |
5 | 9 | 3/17 | Variance Reduction for Stochastic PG | |
6 | 10 | 3/21 | Variance Reduction for Model-Free Prediction | |
6 | 11 | 3/24 | Model-Free Prediction | |
7 | 12 | 3/28 | Global Convergence of PG | |
7 | 13 | 3/31 | Natural PG | |
8 | | 4/4 | Spring Break | |
8 | 14 | 4/7 | Value Function Approximation | |
9 | 15 | 4/11 | Value Function Approximation | |
9 | 16 | 4/14 | Trust Region Policy Optimization (TRPO) | |
10 | 17 | 4/18 | Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) | |
10 | 18 | 4/21 | Deterministic Policy Gradient (DPG) | |
11 | 19 | 4/25 | DPG, DDPG, and Off-Policy Learning | |
11 | 20 | 4/28 | Off-Policy Stochastic PG | |
12 | 21 | 5/2 | Value-Based Methods - Sarsa and Expected Sarsa | |
12 | 22 | 5/5 | Value-Based Methods - Q-Learning and Double Q-Learning | |
13 | 23 | 5/9 | Q-Learning With VFA, DQN and Double DQN | |
13 | 24 | 5/12 | Q-Learning for Continuous Control and Soft Actor-Critic | |
14 | 25 | 5/16 | Distributional RL (C51, QR-DQN, and IQN) | |
14 | 26 | 5/19 | Inverse RL | |
15 | 27 | 5/23 | Inverse RL | |
15 | 28 | 5/26 | Inverse RL | |
16 | | 5/30 | Rescheduled for Final Presentation (Final Exam Week) | |
16 | | 6/2 | Rescheduled for Final Presentation (Final Exam Week) | |
17 | | 6/6 | Final Presentation | |
17 | | 6/9 | Final Presentation | |