Reinforcement Learning and Bandits Lab

PhD Students

047 陳明宏 (MS: 09/2023–08/2025, PhD: 09/2025–present)
048 王廣達 (MS: 09/2022–08/2024, PhD: 09/2024–present; Co-advised by Prof. Wen-Chih Peng)

MS Students

066 陳奎元 (BS@ NYCU-CS)
- Topic: Zero-Shot RL
067 張家祐 (BS@ NTHU-Science)
- Topic: Policy Gradient
068 柯國勛 (BS@ NTHU-CS)
- Topic: Test-Time Alignment
069 張立宜 (BS@ NCHU-EECS)
- Topic: Model-Predictive Control
070 蔡皓宇 (BS@ NTHU-Math)
- Topic: Zero-Shot RL
049 吳秉澍 (BS@ NYCU-CS; Undergraduate Member: 02/2023–08/2024, MS: 09/2024–present)
- Topic: RL for LLMs
050 林睿騰 (BS@ NYCU-CS; Undergraduate Member: 02/2023–08/2024, MS: 09/2024–present)
- Topic: Restless Bandits
057 林禹亨 (BS@ NYCU-CS)
- Topic: Sequence Modeling for RL
058 皮恩亞 (BS@ NCU-Math)
- Topic: Cross-Domain RL
038 温柏萱 (BS@ NYCU-CS; Undergraduate Member: 09/2022–08/2024, MS: 09/2024–present)
- Topic: RLHF
043 楊竣傑 (BS@ NYCU-CS; Undergraduate Member: 08/2023–08/2024, MS: 09/2024–present)
- Topic: Multi-Objective RL

Research Assistants

056 林楷傑 (BS@ NYCU-CS; Undergraduate Member: 08/2023–07/2025, RA: 08/2025–present)

075 馬楷翔 (BS@ NYCU-CS; RA: 08/2025–present)

Undergraduate Students

Joining in 2020

012 鍾承佑 (now an MS student at CMU)、013 柯秉志、014 徐煜倫

015 王耀德、016 蔡育呈、017 周俊毅

018 張祐銘、019 張祐閤

Joining in 2021

023 鄒翔傑 (now an MS student at UCSD)、024 黃迺絜 (now a PhD student at CMU)

025 洪婕庭、026 陳筱霓 (now at Google)

027 許承壹、028 林浩君

Joining in 2022

037 吳文心、038 温柏萱

039 孟祥蓉、040 廖兆琪

041 沈克軒 (now an MS student at TAMU)、042 陳秉劼

043 楊竣傑

Joining in 2023

049 吳秉澍、050 林睿騰

052 王振倫、053 楊沁瑜、054 施柏江

055 林佑家

056 林楷傑

Joining in 2024

059 徐嘉亨

060 黃佑得

Joining in 2025

061 周世恩、062 陳帛祥

065 廖漢軒

071 劉家琪

072 余逸翔

073 江品寬

074 戴翊宸

PhD Alumni

001 洪偉
- Dissertation: Action-Constrained Reinforcement Learning
002 洪鈺恆
- Dissertation: Learning to Optimize: Bandit Algorithms and Reinforcement Learning for Black-box Decision Problems
029 連云暄 (now Assistant Research Fellow at CITI, Academia Sinica 中研院資創中心助研究員; Co-advised by Prof. Yu-Shuen Wang)
- Dissertation: Bridging the Gap in Reinforcement Learning: Robust Algorithms and Techniques for Practical Applications
030 何國豪 (now at MediaTek 聯發科技; Co-advised by Prof. I-Chen Wu)
- Dissertation: A Study of Deep Reinforcement Learning for Human-Like Behavior Modeling and Scheduling Optimization

RA Alumni

063 許嘉喻 (now a PhD student at University of Michigan; RA: 08/2024–06/2025)

064 秦紫頤 (now a PhD student at Max Planck Institute; RA: 03/2025–07/2025)

MS Alumni

003 林峻立 (09/2019–07/2021, BS@NCU-CS, now at MediaTek)
- Thesis: Frank-Wolfe Policy Optimization for Reinforcement Learning with Action Constraints
004 謝秉瑾 (09/2019–07/2021, BS@NCCU-CS, now at Inventec)
- Thesis: Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization
005 李昕 (09/2019–03/2022, BS@NCKU-CS, now at PHISON)
- Thesis: Accelerating Gaussian Process Regression via Meta-Learned Neural Processes: A Utility-Based MAML Approach
006 蘇信恩 (02/2021–09/2022, BS@NCTU-CS, BS+MS in 5 years, now at Google)
- Thesis: Coordinate Ascent Policy Optimization
007 黃柏愷 (09/2020–09/2022, BS@NCKU-CS, now at Trend Micro Inc.)
- Thesis: Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots
008 郭俊廷 (09/2020–10/2022, BS@NCTU-EE, now at MediaTek)
- Thesis: Adaptive-UCB for Online Restless Bandits
009 黃睿宇 (09/2020–10/2022, BS@NCTU-IEM, now at MediaTek)
- Thesis: Multi-Objective Time-Varying Bayesian Optimization
010 楊上萱 (09/2020–11/2022, BS@NCTU-CS, now at MediaTek)
- Thesis: Variance-Reduced Frank-Wolfe Policy Optimization for Action-Constrained RL
011 歐陽良雋 (09/2020–12/2022, BS@ NCTU-EE, now at MediaTek)
- Thesis: Robustifying Proximal Policy Optimization Against Noisy Critics
020 楊祐維 (09/2021–10/2023, BS@ NCTU-CS, now at MediaTek)
- Thesis: Model Selection for Offline Model-Based RL via Bayesian Optimization
021 吳程畯 (09/2021–12/2023, BS@ NCHU-Applied Math)
- Thesis: Learning From Expert Demonstrations With Incomplete Observations
022 朱文滔 (09/2021–03/2024, BS@ NCU-Math)
- Thesis: Provably Convergent Mixture-of-Experts in Reinforcement Learning
031 陳彥儒 (09/2022–06/2024, BS@ NCTU-Applied Math, now at Synopsys)
- Thesis: Accelerated Policy Gradient: On the Convergence Rates of Nesterov's Momentum for Reinforcement Learning
032 張千祐 (09/2022–06/2024, BS@ NCTU-CS)
- Thesis: Learning From Demonstrative Experts of Diverse Preference
033 潘冠蓁 (09/2022–07/2024, BS@ NCTU-CS, now at TSMC)
- Thesis: Survive on Planet Pandora: Robust Cross-Domain RL Under Distinct State-Action Representations
034 黃亭晅 (09/2022–07/2024, BS@ NCTU-IMF)
- Thesis: Cross-Domain Knowledge Transfer via Preference Consistency
035 葉佳翰 (09/2022–01/2025, BS@ NCTU-CS, now at TronFuture)
- Thesis: Action-Constrained Imitation Learning
036 詹昀銘 (09/2022–05/2025, BS@ NCTU-CS)
- Thesis: Improving Anesthesia Simulators via Reinforcement Learning
044 陳盈圖 (09/2023–07/2025, BS@ NYCU-CS, now at Google)
- Thesis: A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
045 陳騰睿 (09/2023–07/2025, BS@ NCCU-Applied Math)
- Thesis: Plan2Align: Predictive Planning Based Test-Time Preference Alignment in Paragraph-Level Machine Translation
046 陳子安 (09/2023–10/2025, BS@ NYCU-CS)
- Thesis: Plan2Cleanse: Test-Time Backdoor Detection and Mitigation via Monte-Carlo Planning in Deep Reinforcement Learning
051 朱立民 (09/2023–12/2025, BS@ NSYSU-Applied Math)
- Thesis: Semi-Supervised Cross-Domain Imitation Learning