Tengyang Xie

I am an Assistant Professor of Computer Science at the University of Wisconsin-Madison. Before that, I was a Postdoctoral Researcher at Microsoft Research New England (and New York City). I received my Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, where I was fortunate to work with Nan Jiang. I obtained my bachelor's degree in Physics from the University of Science and Technology of China. I have also spent time at the Simons Institute, Amazon AI, Microsoft Research, and Google Research.

Research Interests: I work on Reinforcement Learning / Machine Learning / Artificial Intelligence. The primary goal of my research is to explore the mathematical principles and design efficient algorithms relevant to artificial general intelligence (AGI). My current interests include: 1) emerging interactive learning paradigms with/for large language models (LLMs), 2) the mathematical principles of reinforcement learning (RL) and decision-making, and 3) the algorithmic and systems challenges of scaling up new modalities.

Publications

Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits. [PDF, arXiv]
Fan Chen, Zeyu Jia, Alexander Rakhlin, Tengyang Xie

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning. [PDF, arXiv]
Yurun Yuan, Fan Chen, Zeyu Jia, Alexander Rakhlin, Tengyang Xie

Reinforce LLM Reasoning with Multi-Agent Reflection. [PDF, arXiv]
Yurun Yuan, Tengyang Xie
International Conference on Machine Learning (ICML) 2025

Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective. [PDF, arXiv]
Zeyu Jia, Alexander Rakhlin, Tengyang Xie
International Conference on Machine Learning (ICML) 2025

Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization. [PDF, arXiv]
Audrey Huang, Wenhao Zhan, Tengyang Xie, Jason D. Lee, Wen Sun, Akshay Krishnamurthy, Dylan J. Foster
International Conference on Learning Representations (ICLR) 2025 (Spotlight, top 5.1%)

Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees. [PDF]
Nan Jiang, Tengyang Xie
(STS, invited submission under review)

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts. [PDF, arXiv, Model, Blog]
Haoxiang Wang*, Wei Xiong*, Tengyang Xie, Han Zhao, Tong Zhang
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2024, Findings

Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models. [PDF, arXiv]
Xiang Ji, Sanjeev Kulkarni, Mengdi Wang, Tengyang Xie

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF. [PDF, arXiv, XPO Trainer in TRL]
Tengyang Xie*, Dylan J. Foster*, Akshay Krishnamurthy, Corby Rosset, Ahmed Awadallah, Alexander Rakhlin
International Conference on Learning Representations (ICLR) 2025

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences. [PDF, arXiv]
Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah, Tengyang Xie
Technical Report 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data. [PDF, arXiv, Website]
Fahim Tajwar*, Anikait Singh*, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
International Conference on Machine Learning (ICML) 2024

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples. [PDF, arXiv, Website]
Jianrui Zhang*, Mu Cai*, Tengyang Xie, Yong Jae Lee
Annual Meeting of the Association for Computational Linguistics (ACL) 2024, Findings

Harnessing Density Ratios for Online Reinforcement Learning. [PDF, arXiv]
Philip Amortila*, Dylan J. Foster*, Nan Jiang*, Ayush Sekhari*, Tengyang Xie*
International Conference on Learning Representations (ICLR) 2024 (Spotlight, top 5%)

Towards Principled Representation Learning from Videos for Reinforcement Learning. [PDF, arXiv]
Dipendra Misra*, Akanksha Saran*, Tengyang Xie, Alex Lamb, John Langford
International Conference on Learning Representations (ICLR) 2024 (Spotlight, top 5%)

Adversarial Model for Offline Reinforcement Learning. [PDF, arXiv]
Mohak Bhardwaj*, Tengyang Xie*, Byron Boots, Nan Jiang, Ching-An Cheng
Conference on Neural Information Processing Systems (NeurIPS) 2023

The Role of Coverage in Online Reinforcement Learning. [PDF, arXiv]
Tengyang Xie*, Dylan J. Foster*, Yu Bai, Nan Jiang, Sham M. Kakade
International Conference on Learning Representations (ICLR) 2023 (Notable-top-5% / Oral, top 1.8%)

ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data. [PDF, arXiv]
Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng
Offline RL Workshop at NeurIPS 2022

Interaction-Grounded Learning with Action-Inclusive Feedback. [PDF, arXiv]
Tengyang Xie*, Akanksha Saran*, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford
Conference on Neural Information Processing Systems (NeurIPS) 2022
Complex Feedback in Online Learning Workshop at ICML 2022

Adversarially Trained Actor Critic for Offline Reinforcement Learning. [PDF, arXiv, code, MSR blog]
Ching-An Cheng*, Tengyang Xie*, Nan Jiang, Alekh Agarwal
International Conference on Machine Learning (ICML) 2022 (Outstanding Paper Runner-up Award, top 0.3%)

Bellman-consistent Pessimism for Offline Reinforcement Learning. [PDF, arXiv, slides]
Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
Conference on Neural Information Processing Systems (NeurIPS) 2021 (Oral Presentation, top 0.6%)

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning. [PDF, arXiv]
Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai
Conference on Neural Information Processing Systems (NeurIPS) 2021

Interaction-Grounded Learning. [PDF, arXiv, add'l supplement]
Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad
International Conference on Machine Learning (ICML) 2021

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency. [PDF, arXiv]
Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie
Submitted, 2021.

A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting. [PDF, arXiv]
Philip Amortila*, Nan Jiang*, Tengyang Xie*

Batch Value-function Approximation with Only Realizability. [PDF, arXiv]
Tengyang Xie, Nan Jiang
International Conference on Machine Learning (ICML) 2021

Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison. [PDF, arXiv]
Tengyang Xie, Nan Jiang
Conference on Uncertainty in Artificial Intelligence (UAI) 2020

Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling. [PDF, Poster, arXiv]
Tengyang Xie, Yifei Ma, Yu-Xiang Wang
Conference on Neural Information Processing Systems (NeurIPS) 2019
Spotlight presentation at the NeurIPS 2018 Workshop on Causal Learning.

Provably Efficient Q-Learning with Low Switching Cost. [PDF, Poster, arXiv]
Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang
Conference on Neural Information Processing Systems (NeurIPS) 2019

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization. [PDF, Poster, Link]
Tengyang Xie*, Bo Liu*, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, Daesub Yoon
Conference on Neural Information Processing Systems (NeurIPS) 2018

(* indicates equal contribution or alphabetic ordering.)

Teaching

CS760 Machine Learning: Fall 2024, UW-Madison.

Service

Area Chair: NeurIPS, ACL ARR, RLC, ICML, ICLR
Conference Reviewer/Program Committee: ICML Workshop Proposals, NeurIPS, ICML, AISTATS, AAAI, EWRL
Journal Reviewer: Journal of the American Statistical Association (JASA), Journal of Machine Learning Research (JMLR), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Annals of Statistics, IEEE Transactions on Information Theory, Springer Machine Learning Journal.
Workshop Organizer: Interactive Learning with Implicit Human Feedback @ ICML 2023.
Workshop Program Committee: Optimization Foundations of RL @ NeurIPS 2019, Theoretical Foundations of RL @ ICML 2020 & 2021, Offline RL @ NeurIPS 2020-2022, RL for Real Life @ ICML 2021 & NeurIPS 2022.