profile photo

Selected Work

Automated Weak-to-Strong Researcher
Jiaxin Wen*, Liang Qiu*, Joe Benton, Jan Hendrik Kirchner, Jan Leike
Anthropic blog post 2026
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen, Chenglei Si, Chen Yueh-han, He He, Shi Feng
NeurIPS 2025
Unsupervised Elicitation of Language Models
Jiaxin Wen, Zachary Ankner, Arushi Somani, Peter Hase, Samuel Marks, Jacob Goldman-Wetzler, Linda Petrini, Henry Sleight, Collin Burns, He He, Shi Feng, Ethan Perez, Jan Leike
Anthropic blog post 2025
Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, Shi Feng
ICLR 2025
Learning Task Decomposition to Assist Humans in Competitive Programming
Jiaxin Wen, Ruiqi Zhong, Pei Ke, Zhihong Shao, Hongning Wang, Minlie Huang
ACL 2024