Reinforcement Learning and Bandits Lab

 

PhD Students

  • 001 洪偉

    • Dissertation: Action-Constrained Reinforcement Learning

  • 002 洪鈺恆

    • Dissertation: Learning to Optimize: Bandit Algorithms and Reinforcement Learning for Black-box Decision Problems

  • 047 陳明宏 (MS: 09/2023–08/2025, PhD: 09/2025–present)

  • 048 王廣達 (MS: 09/2022–08/2024, PhD: 09/2024–present; Co-advised by Prof. Wen-Chih Peng)

MS Students

  • 066 陳奎元 (BS@ NYCU-CS)

    • Topic: Zero-Shot RL

  • 067 張家祐 (BS@ NTHU-Science)

    • Topic: Policy Gradient

  • 068 柯國勛 (BS@ NTHU-CS)

    • Topic: Test-Time Alignment

  • 069 張立宜 (BS@ NCHU-EECS)

    • Topic: Model-Predictive Control

  • 070 蔡皓宇 (BS@ NTHU-Math)

    • Topic: Zero-Shot RL

  • 049 吳秉澍 (BS@ NYCU-CS; Undergraduate Member: 02/2023–08/2024, MS: 09/2024–present)

    • Topic: RL for LLMs

  • 050 林睿騰 (BS@ NYCU-CS; Undergraduate Member: 02/2023–08/2024, MS: 09/2024–present)

    • Topic: Restless Bandits

  • 057 林禹亨 (BS@ NYCU-CS)

    • Topic: Sequence Modeling for RL

  • 058 皮恩亞 (BS@ NCU-Math)

    • Topic: Cross-Domain RL

  • 038 温柏萱 (BS@ NYCU-CS; Undergraduate Member: 09/2022–08/2024, MS: 09/2024–present)

    • Topic: RLHF

  • 043 楊竣傑 (BS@ NYCU-CS; Undergraduate Member: 08/2023–08/2024, MS: 09/2024–present)

    • Topic: Multi-Objective RL

  • 051 朱立民 (BS@ NSYSU-Applied Math)

    • Topic: Cross-Domain Imitation Learning

Research Assistants

  • 056 林楷傑 (BS@ NYCU-CS; Undergraduate Member: 08/2023–07/2025, RA: 08/2025–present)

  • 075 馬楷翔 (BS@ NYCU-CS; RA: 08/2025–present)

Undergraduate Students

Joining in 2020

  • 012 鍾承佑 (now MS student at CMU)、013 柯秉志、014 徐煜倫

  • 015 王耀德、016 蔡育呈、017 周俊毅

  • 018 張祐銘、019 張祐閤

Joining in 2021

  • 023 鄒翔傑 (now MS student at UCSD)、024 黃迺絜 (now PhD student at CMU)

  • 025 洪婕庭、026 陳筱霓 (now at Google)

  • 027 許承壹、028 林浩君

Joining in 2022

  • 037 吳文心、038 温柏萱

  • 039 孟祥蓉、040 廖兆琪

  • 041 沈克軒 (now MS student at TAMU)、042 陳秉劼

  • 043 楊竣傑

Joining in 2023

  • 049 吳秉澍、050 林睿騰

  • 052 王振倫、053 楊沁瑜、054 施柏江

  • 055 林佑家

  • 056 林楷傑

Joining in 2024

  • 059 徐嘉亨

  • 060 黃佑得

Joining in 2025

  • 061 周世恩、062 陳帛祥

  • 065 廖漢軒

  • 071 劉家琪

  • 072 余逸翔

  • 073 江品寬

  • 074 戴翊宸

PhD Alumni

  • 029 連云暄 (now Assistant Research Fellow at CITI, Academia Sinica 中研院資創中心助研究員; Co-advised by Prof. Yu-Shuen Wang)

    • Dissertation: Bridging the Gap in Reinforcement Learning: Robust Algorithms and Techniques for Practical Applications

  • 030 何國豪 (now at MediaTek 聯發科技; Co-advised by Prof. I-Chen Wu)

    • Dissertation: A Study of Deep Reinforcement Learning for Human-Like Behavior Modeling and Scheduling Optimization

RA Alumni

  • 063 許嘉喻 (now PhD student at University of Michigan; RA: 08/2024–06/2025)

  • 064 秦紫頤 (now PhD student at Max Planck Institute; RA: 03/2025–07/2025)

MS Alumni

  • 003 林峻立 (09/2019–07/2021, BS@NCU-CS, now at MediaTek)

    • Thesis: Frank-Wolfe Policy Optimization for Reinforcement Learning with Action Constraints

  • 004 謝秉瑾 (09/2019–07/2021, BS@NCCU-CS, now at Inventec)

    • Thesis: Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization

  • 005 李昕 (09/2019–03/2022, BS@NCKU-CS, now at PHISON)

    • Thesis: Accelerating Gaussian Process Regression via Meta-Learned Neural Processes: A Utility-Based MAML Approach

  • 006 蘇信恩 (02/2021–09/2022, BS@NCTU-CS, BS+MS in 5 years, now at Google)

    • Thesis: Coordinate Ascent Policy Optimization

  • 007 黃柏愷 (09/2020–09/2022, BS@NCKU-CS, now at Trend Micro Inc.)

    • Thesis: Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots

  • 008 郭俊廷 (09/2020–10/2022, BS@NCTU-EE, now at MediaTek)

    • Thesis: Adaptive-UCB for Online Restless Bandits

  • 009 黃睿宇 (09/2020–10/2022, BS@NCTU-IEM, now at MediaTek)

    • Thesis: Multi-Objective Time-Varying Bayesian Optimization

  • 010 楊上萱 (09/2020–11/2022, BS@NCTU-CS, now at MediaTek)

    • Thesis: Variance-Reduced Frank-Wolfe Policy Optimization for Action-Constrained RL

  • 011 歐陽良雋 (09/2020–12/2022, BS@ NCTU-EE, now at MediaTek)

    • Thesis: Robustifying Proximal Policy Optimization Against Noisy Critics

  • 020 楊祐維 (09/2021–10/2023, BS@ NCTU-CS, now at MediaTek)

    • Thesis: Model Selection for Offline Model-Based RL via Bayesian Optimization

  • 021 吳程畯 (09/2021–12/2023, BS@ NCHU-Applied Math)

    • Thesis: Learning From Expert Demonstrations With Incomplete Observations

  • 022 朱文滔 (09/2021–03/2024, BS@ NCU-Math)

    • Thesis: Provably Convergent Mixture-of-Experts in Reinforcement Learning

  • 031 陳彥儒 (09/2022–06/2024, BS@ NCTU-Applied Math, now at Synopsys)

    • Thesis: Accelerated Policy Gradient: On the Convergence Rates of Nesterov's Momentum for Reinforcement Learning

  • 032 張千祐 (09/2022–06/2024, BS@ NCTU-CS)

    • Thesis: Learning From Demonstrative Experts of Diverse Preference

  • 033 潘冠蓁 (09/2022–07/2024, BS@ NCTU-CS, now at TSMC)

    • Thesis: Survive on Planet Pandora: Robust Cross-Domain RL Under Distinct State-Action Representations

  • 034 黃亭晅 (09/2022–07/2024, BS@ NCTU-IMF)

    • Thesis: Cross-Domain Knowledge Transfer via Preference Consistency

  • 035 葉佳翰 (09/2022–01/2025, BS@ NCTU-CS, now at TronFuture)

    • Thesis: Action-Constrained Imitation Learning

  • 036 詹昀銘 (09/2022–05/2025, BS@ NCTU-CS)

    • Thesis: Improving Anesthesia Simulators via Reinforcement Learning

  • 044 陳盈圖 (09/2023–07/2025, BS@ NYCU-CS, now at Google)

    • Thesis: A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning

  • 045 陳騰睿 (09/2023–07/2025, BS@ NCCU-Applied Math)

    • Thesis: Plan2Align: Predictive Planning Based Test-Time Preference Alignment in Paragraph-Level Machine Translation

  • 046 陳子安 (09/2023–10/2025, BS@ NYCU-CS)

    • Thesis: Plan2Cleanse: Test-Time Backdoor Detection and Mitigation via Monte-Carlo Planning in Deep Reinforcement Learning