535515 Spring 2023 - Reinforcement Learning (強化學習原理)

  • Instructor: Ping-Chun Hsieh

  • Email: pinghsieh [AT] nycu [DOT] edu [DOT] tw

  • Lectures:

    • Tuesdays 3:30pm-4:20pm @ EC115

    • Fridays 10:10am-12:00 noon @ EC115

    • Note: The first lecture on 2/14 (Tue.) will be delivered via Webex: Webex Link

  • Office Hours: 4:30pm-5pm on Tuesdays or by appointment

  • References:

    • [SB] Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, 2nd edition, 2018

    • [AJK] Alekh Agarwal, Nan Jiang, and Sham M. Kakade, Reinforcement Learning: Theory and Algorithms, 2020 (https://rltheorybook.github.io/rl_monograph_AJK.pdf)

    • [BCN] Léon Bottou, Frank E. Curtis, and Jorge Nocedal, Optimization Methods for Large-Scale Machine Learning (https://arxiv.org/abs/1606.04838)

    • [NW] Jorge Nocedal and Stephen Wright, Numerical Optimization, 2006

    • [LS] Tor Lattimore and Csaba Szepesvári, Bandit Algorithms, 2020 (https://tor-lattimore.com/downloads/book/book.pdf)

  • Grading:

    • Assignments: 35%

    • Theory Project: 30%

    • Team Implementation Project: 35% (Report: 20%, Presentation: 15%)

  • Lecture Schedule:

Week  Lecture  Date  Topics
1     1        2/14  Logistics and Introduction to RL
1     2        2/17  Introduction to RL and MDPs
2     3        2/21  Planning for MDPs
2     4        2/24  Regularized and Distributional Perspectives on MDPs
3     -        2/28  Peace Memorial Day (no class)
3     5        3/3   Policy Optimization
4     6        3/7   Policy Optimization and First-Order Optimization Methods
4     7        3/10  Policy Gradient
5     8        3/14  Policy Gradient and Stochastic Gradient Descent
5     9        3/17  Variance Reduction for Stochastic PG
6     10       3/21  Variance Reduction for Model-Free Prediction
6     11       3/24  Model-Free Prediction
7     12       3/28  Global Convergence of PG
7     13       3/31  Natural PG
8     -        4/4   Spring Break (no class)
8     14       4/7   Value Function Approximation
9     15       4/11  Value Function Approximation
9     16       4/14  Trust Region Policy Optimization (TRPO)
10    17       4/18  Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO)
10    18       4/21  Deterministic Policy Gradient (DPG)
11    19       4/25  DPG, DDPG, and Off-Policy Learning
11    20       4/28  Off-Policy Stochastic PG
12    21       5/2   Value-Based Methods - Sarsa and Expected Sarsa
12    22       5/5   Value-Based Methods - Q-Learning and Double Q-Learning
13    23       5/9   Q-Learning with VFA, DQN, and Double DQN
13    24       5/12  Q-Learning for Continuous Control and Soft Actor-Critic
14    25       5/16  Distributional RL (C51, QR-DQN, and IQN)
14    26       5/19  Inverse RL
15    27       5/23  Inverse RL
15    28       5/26  Inverse RL
16    -        5/30  Rescheduled for Final Presentation (Final Exam Week)
16    -        6/2   Rescheduled for Final Presentation (Final Exam Week)
17    -        6/6   Final Presentation
17    -        6/9   Final Presentation