RL Pre-Training
Reward-Free RL
Learning to Optimize via RL Pre-Training
Test-Time RL Methods
Test-Time Alignment for Large Language Models via Textual Model Predictive Control
Kuang-Da Wang*, Teng-Ruei Chen*, Yu Heng Hung, Guo-Xun Ko, Shuoyang Ding, Yueh-Hua Wu, Yu-Chiang Frank Wang, Chao-Han Huck Yang, Wen-Chih Peng, and Ping-Chun Hsieh (*: equal contribution)
International Conference on Learning Representations (ICLR), 2026 (Acceptance rate = 28%)
[Project Page]
Single-Task RL
Cross-Domain RL
Action-Constrained RL
Action-Constrained Imitation Learning
Chia-Han Yeh*, Tse-Sheng Nan*, Risto Vuorio, Wei Hung, Hung Yen Wu, Shao-Hua Sun, and Ping-Chun Hsieh (*: equal contribution)
International Conference on Machine Learning (ICML), 2025 (Acceptance rate = 26.9%)
[Code][Video]
Misc (MORL, Offline RL, and Robust RL)
Diffusion-Reward Adversarial Imitation Learning
Chun-Mao Lai*, Hsiang-Chun Wang*, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, and Shao-Hua Sun (*: equal contribution)
Conference on Neural Information Processing Systems (NeurIPS), 2024 (Acceptance rate = 25.8%)
RL and Bandits Theory
Global Convergence of RL
Bandits
Applications