Ziqi Wang

Ziqi Wang (王子奇)

I am a Ph.D. student at the University of Illinois Urbana-Champaign, advised by Prof. Heng Ji and Prof. Tong Zhang.

I am currently a part-time intern at Yutori. I was an intern at Meta GenAI in Summer 2024, working with Rui Wang. I also spent two summers at Google working with Dr. Crick Wu and Dr. Le Hou. Prior to my Ph.D. studies, I obtained a Bachelor's degree in Computer Science from Tsinghua University, where I was fortunate to work with Prof. Zhiyuan Liu, Prof. Xiaolin Hu, and Prof. Minlie Huang. I was also fortunate to work with Prof. Xiang Ren at the University of Southern California.

Email / Résumé / Google Scholar / Twitter / LinkedIn

Research

My research goal is to empower AI with strong reasoning capabilities so that it can help humans solve real-world problems reliably and eventually accelerate the process of scientific discovery. My current milestone is to investigate:

(1) The intrinsic bottlenecks of AI reasoning, from both architectural and algorithmic perspectives. [ICLR 2025]

(2) Improving AI reasoning through post-training, especially through reinforcement learning. [ICLR 2024] [Preprint 2025]

During my undergraduate studies, I worked on representation learning and neuro-symbolic learning.

Highlighted Recent Publications (see the full list on Google Scholar)

* denotes equal contribution

	RM-R1: Reward Modeling as Reasoning. Xiusi Chen, Gaotang Li, Ziqi Wang, and 9 other authors. Preprint, 2025. [Paper] Reward modeling with thinking improves reward accuracy.
	Eliminating Position Bias of Language Models: A Mechanistic Approach. Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji. ICLR, 2025. [Paper][Twitter] We propose a method to eliminate position bias in LMs, which helps LMs reason better.
	Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint. Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang. ICML, 2024; ICLR ME-FoMo, 2024 (Oral Presentation). [Paper][Twitter] On-policy data matters for Direct Preference Optimization!
	Enabling Language Models to Implicitly Learn Self-Improvement. Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji. ICLR, 2024. [Paper][Slides][Twitter] Teaching models to self-improve with reinforcement learning.

Education

	University of Illinois Urbana-Champaign. Ph.D. in Computer Science, 2021-2025. Advisors: Prof. Heng Ji and Prof. Tong Zhang
	Tsinghua University. B.E. in Computer Science, 2016-2021. Advisors: Prof. Zhiyuan Liu, Prof. Xiaolin Hu, and Prof. Minlie Huang

Service

Reviewer: ICLR, NeurIPS (Top reviewer in 2024), ICML, ACL, EMNLP, NAACL, Pattern Recognition

Talks

Teaching LMs to Self-Improve by Reinforcement Learning. Cohere AI, 2024. [Slides][Video]
Enabling Language Models to Implicitly Learn Self-Improvement. Objective, Inc., 2023. [Slides]

Miscellanea

In my spare time, I am interested in physics and astronomy (I was a physics student before I trained as a computer science student). I enjoy repair work and DIY projects because they help me understand the underlying mechanisms of things. I am a big fan of John Carmack. I have also learned a lot of life advice from Elon Musk.

This website is adapted from Jon Barron's. Last updated: May 2025.