Ziqi Wang (王子奇)

I am a Ph.D. student at the University of Illinois Urbana-Champaign, advised by Prof. Heng Ji and Prof. Tong Zhang.

I am currently a part-time intern at Yutori AI. I was an intern at Meta GenAI in the summer of 2024, working with Rui Wang, and I spent two summers at Google working with Dr. Crick Wu and Dr. Le Hou. Before my Ph.D., I obtained a Bachelor's degree in Computer Science at Tsinghua University, where I was fortunate to work with Prof. Zhiyuan Liu, Prof. Xiaolin Hu, and Prof. Minlie Huang, as well as with Prof. Xiang Ren at the University of Southern California.

Email  /  Résumé  /  Google Scholar  /  Twitter  /  LinkedIn

profile photo

Research

My research goal is to empower AI with strong reasoning capabilities that help humans solve real-world problems reliably and, eventually, accelerate scientific discovery. My current milestones are to investigate:

(1) the intrinsic bottlenecks of AI reasoning, from both the architecture and the algorithm perspective; [ICLR 2025]

(2) improving AI reasoning through post-training, especially via reinforcement learning. [ICLR 2024]

Previously, I worked on representation learning and neuro-symbolic learning during my undergraduate studies.

Highlighted Recent Publications (see the full list on Google Scholar)

* denotes equal contribution

Eliminating Position Bias of Language Models: A Mechanistic Approach
Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji
ICLR, 2025  
Paper / Twitter

We propose a method to eliminate position bias in LMs, which helps them reason better.

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong*, Hanze Dong*, Chenlu Ye*, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
ICML, 2024; ICLR ME-FoMo, 2024   (Oral Presentation)
Paper / Twitter

On-policy sampling matters for Direct Preference Optimization!

Enabling Language Models to Implicitly Learn Self-Improvement
Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji
ICLR, 2024  
Paper / Slides / Twitter

Teaching models self-improvement with reinforcement learning.

Education

University of Illinois Urbana-Champaign
Ph.D. in Computer Science, 2021-2025
Advisors: Prof. Heng Ji and Prof. Tong Zhang

Tsinghua University
B.E. in Computer Science, 2016-2021
Advisors: Prof. Zhiyuan Liu, Prof. Xiaolin Hu, and Prof. Minlie Huang

Service

Reviewer: ICLR, NeurIPS (Top reviewer in 2024), ICML, ACL, EMNLP, NAACL, Pattern Recognition

Talks

Teaching LMs to Self-Improve by Reinforcement Learning. Cohere AI, 2024. [Slides][Video]
Enabling Language Models to Implicitly Learn Self-Improvement. Objective, Inc., 2023. [Slides]

Miscellanea

In my spare time, I am interested in physics and astronomy (I was a physics student before being trained as a computer scientist). I also enjoy repairing things and DIY projects, since they help me understand the underlying mechanisms. I am a big fan of John Carmack, and I have learned a lot of life advice from Elon Musk.

This website is adapted from Jon Barron's. Last updated: February 2025.