535514 Reinforcement Learning (強化學習原理, Principles of Reinforcement Learning)

  • Instructor: Ping-Chun Hsieh

  • Email: pinghsieh [AT] nycu [DOT] edu [DOT] tw

  • References:

    • [SB] Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, 2nd edition, 2019

    • [AJK] Alekh Agarwal, Nan Jiang, and Sham M. Kakade, Reinforcement Learning: Theory and Algorithms, 2020 (https://rltheorybook.github.io/rl_monograph_AJK.pdf)

    • [BCN] Léon Bottou, Frank E. Curtis, and Jorge Nocedal, Optimization Methods for Large-Scale Machine Learning (https://arxiv.org/abs/1606.04838)

    • [NW] Jorge Nocedal and Stephen Wright, Numerical Optimization, 2006

    • [LS] Tor Lattimore and Csaba Szepesvári, Bandit Algorithms, 2019 (https://tor-lattimore.com/downloads/book/book.pdf)

  • Grading:

    • Assignments: 30%

    • Pre-lecture assignments: 15%

    • Team final project: 55% (Proposal: 6%, Baselines: 12%, Theoretical deep dive: 15%, Poster presentation: 10%, Final report: 12%)

  • Lecture Schedule:

Week | Lecture | Date | Topics | Lecture Slides
1 | 1 | 2/18 | Introduction to RL and MDP | Lec1, Lec1 annotated
2 | 2 | 2/25 | MDP and Optimal Control | Lec2, Lec2 annotated
3 | 3 | 3/4 | Policy Iteration, Regularized MDP, and Policy Gradient | Lec3, Lec3 annotated
4 | 4 | 3/11 | Policy Gradient | Lec4, Lec4 annotated
5 | 5 | 3/18 | Variance Reduction and Model-Free Prediction | Lec5, Lec5 annotated
6 | 6 | 3/25 | Value Function Approximation and Optimality of PG | Lec6, Lec6 annotated
7 | 7 | 4/1 | Deterministic Policy Gradient | Lec7, Lec7 annotated
8 | 8 | 4/8 | TRPO and PPO | Lec8, Lec8 annotated
9 | 9 | 4/15 | Value-based RL | Lec9, Lec9 annotated
10 | 10 | 4/22 | Deep Q Network, Stochastic Approximation, and Distributional RL | Lec10, Lec10 annotated
11 | 11 | 5/6 | Distributional RL, SAC, and Model-based RL | Lec11, Lec11 annotated
12 | 12 | 5/13 | Model-based RL | Lec12, Lec12 annotated
13 | 13 | 5/20 | Inverse RL | Lec13, Lec13 annotated
14 | 14 | 5/27 | Multi-Objective RL and Unsupervised RL | Lec14, Lec14 annotated
15 | 15 | 6/3 | Final Poster Presentations |