Jiaxin Wen

I'm a CS PhD student at UC Berkeley. I'm also a part-time research scientist at Anthropic.

Before that, I was a visiting scholar at NYU. I received my bachelor's and master's degrees from Tsinghua University.

Email  /  Google Scholar  /  GitHub  /  Twitter


Research Overview

My past work focused on building AI to tackle hard tasks beyond human reach, addressing four key questions:

I'm now exploring some butterfly ideas.

Selected Papers

Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, Shi Feng
ICLR 2025
[paper]
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen, Chenglei Si, Chen Yueh-han, He He, Shi Feng
preprint 2025
[paper]
Unsupervised Elicitation of Language Models
Jiaxin Wen, Zachary Ankner, Arushi Somani, Peter Hase, Samuel Marks, Jacob Goldman-Wetzler, Linda Petrini, Henry Sleight, Collin Burns, He He, Shi Feng, Ethan Perez, Jan Leike
preprint 2025
[paper]
Learning Task Decomposition to Assist Humans in Competitive Programming
Jiaxin Wen, Ruiqi Zhong, Pei Ke, Zhihong Shao, Hongning Wang, Minlie Huang
ACL 2024
[paper] [poster]

Others

Measuring Harmfulness of Computer-Using Agents
Aaron Xuxiang Tian, Ruofan Zhang, Janet Tang, Jiaxin Wen
preprint 2025
[paper]
Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
Jiaxin Wen*, Jian Guan*, Hongning Wang, Wei Wu, Minlie Huang
ICLR 2025
[paper]
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Jiaxin Wen*, Vivek Hebbar*, Caleb Larson*, Aryan Bhatt, Ansh Radhakrishnan, Mrinank Sharma, Henry Sleight, Shi Feng, He He, Ethan Perez, Buck Shlegeris, Akbir Khan
ICLR 2025
[paper]
AdaptiveBackdoor: Backdoored Language Model Agents that Detect Human Overseers
Heng Wang, Ruiqi Zhong, Jiaxin Wen, Jacob Steinhardt
ICML 2024 Next Generation of AI Safety Workshop
[paper]
Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Changfei Li, Jinfeng Bai, Minlie Huang
EMNLP 2023
[paper] [code]
ETHICIST: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Zhexin Zhang, Jiaxin Wen, Minlie Huang
ACL 2023
[paper] [code]
Re3Dial: Retrieve, Reorganize and Rescale Dialogue Corpus for Long-Turn Open-Domain Dialogue Pre-training
Jiaxin Wen, Hao Zhou, Jian Guan, Minlie Huang
EMNLP 2023
[paper] [code]
AUGESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation
Chujie Zheng, Sahand Sabour, Jiaxin Wen, Zheng Zhang, Minlie Huang
Findings of ACL 2023
[paper] [code] [dataset]
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training
Yuxian Gu*, Jiaxin Wen*, Hao Sun*, Yi Song, Pei Ke, Chujie Zheng, Zheng Zhang, Jianzhu Yao, Xiaoyan Zhu, Jie Tang, Minlie Huang
Machine Intelligence Research
[paper] [code] [poster]
AutoCAD: Automatically Generate Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen, Yeshuang Zhu, Jinchao Zhang, Jie Zhou, Minlie Huang
Findings of EMNLP 2022
[paper] [code]
Persona-Guided Planning for Controlling the Protagonist’s Persona in Story Generation
Jiaxin Wen*, Zhexin Zhang*, Jian Guan, Minlie Huang
NAACL 2022
[paper] [code]
Robustness Testing of Language Understanding in Task-Oriented Dialog
Jiexi Liu*, Ryuichi Takanobu*, Jiaxin Wen, Dazhen Wan, Hongguang Li, Weiran Nie, Cheng Li, Wei Peng, Minlie Huang
ACL 2021
[paper] [code]

Selected Honors

  • Outstanding Graduate Thesis (Top-1), Tsinghua University   2025
  • Beijing Outstanding Graduate (Top-1), Tsinghua University   2025
  • Outstanding Undergraduate Thesis (Top-5), Tsinghua University   2022
  • Outstanding Graduate, Department of Computer Science and Technology, Tsinghua University   2022

Website design from Jon Barron.