Jiaxin Wen
I'm a CS PhD student at UC Berkeley. I'm also a part-time research scientist at Anthropic.
Before that, I was a visiting scholar at NYU. I received my bachelor's and master's degrees from Tsinghua University.
Email / Google Scholar / GitHub / Twitter
Research Overview
My past work focused on building AI to tackle hard tasks beyond human reach, addressing four key questions:
I'm now exploring some butterfly ideas.
|
Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen,
Ruiqi Zhong,
Akbir Khan,
Ethan Perez,
Jacob Steinhardt,
Minlie Huang,
Samuel R. Bowman,
He He,
Shi Feng
ICLR 2025
[paper]
|
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen,
Chenglei Si,
Chen Yueh-han,
He He,
Shi Feng
preprint 2025
[paper]
|
Unsupervised Elicitation of Language Models
Jiaxin Wen,
Zachary Ankner,
Arushi Somani,
Peter Hase,
Samuel Marks,
Jacob Goldman-Wetzler,
Linda Petrini,
Henry Sleight,
Collin Burns,
He He,
Shi Feng,
Ethan Perez,
Jan Leike
preprint 2025
[paper]
|
Learning Task Decomposition to Assist Humans in Competitive Programming
Jiaxin Wen,
Ruiqi Zhong,
Pei Ke,
Zhihong Shao,
Hongning Wang,
Minlie Huang
ACL 2024
[paper]
[poster]
|
Measuring Harmfulness of Computer-Using Agents
Aaron Xuxiang Tian,
Ruofan Zhang,
Janet Tang,
Jiaxin Wen
preprint 2025
[paper]
|
Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
Jiaxin Wen*,
Jian Guan*,
Hongning Wang,
Wei Wu,
Minlie Huang
ICLR 2025
[paper]
|
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Jiaxin Wen*,
Vivek Hebbar*,
Caleb Larson*,
Aryan Bhatt,
Ansh Radhakrishnan,
Mrinank Sharma,
Henry Sleight,
Shi Feng,
He He,
Ethan Perez,
Buck Shlegeris,
Akbir Khan
ICLR 2025
[paper]
|
AdaptiveBackdoor: Backdoored Language Model Agents that Detect Human Overseers
Heng Wang,
Ruiqi Zhong,
Jiaxin Wen,
Jacob Steinhardt
ICML 2024 Next Generation of AI Safety Workshop
[paper]
|
Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen,
Pei Ke,
Hao Sun,
Zhexin Zhang,
Changfei Li,
Jinfeng Bai,
Minlie Huang
EMNLP 2023
[paper]
[code]
|
ETHICIST: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Zhexin Zhang,
Jiaxin Wen,
Minlie Huang
ACL 2023
[paper]
[code]
|
Re3Dial: Retrieve, Reorganize and Rescale Dialogue Corpus for Long-Turn Open-Domain Dialogue Pre-training
Jiaxin Wen,
Hao Zhou,
Jian Guan,
Minlie Huang
EMNLP 2023
[paper]
[code]
|
AUGESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation
Chujie Zheng,
Sahand Sabour,
Jiaxin Wen,
Zheng Zhang,
Minlie Huang
ACL 2023 Findings
[paper]
[code]
[dataset]
|
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training
Yuxian Gu*,
Jiaxin Wen*,
Hao Sun*,
Yi Song,
Pei Ke,
Chujie Zheng,
Zheng Zhang,
Jianzhu Yao,
Xiaoyan Zhu,
Jie Tang,
Minlie Huang
Machine Intelligence Research
[paper]
[code]
[poster]
|
AutoCAD: Automatically Generate Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen,
Yeshuang Zhu,
Jinchao Zhang,
Jie Zhou,
Minlie Huang
EMNLP 2022 Findings
[paper]
[code]
|
Persona-Guided Planning for Controlling the Protagonist’s Persona in Story Generation
Jiaxin Wen*,
Zhexin Zhang*,
Jian Guan,
Minlie Huang
NAACL 2022
[paper]
[code]
|
Robustness Testing of Language Understanding in Task-Oriented Dialog
Jiexi Liu*,
Ryuichi Takanobu*,
Jiaxin Wen,
Dazhen Wan,
Hongguang Li,
Weiran Nie,
Cheng Li,
Wei Peng,
Minlie Huang
ACL 2021
[paper]
[code]
|
Outstanding Graduate Thesis (Top-1), Tsinghua University   2025
Beijing Outstanding Graduate (Top-1), Tsinghua University   2025
Outstanding Undergraduate Thesis (Top-5), Tsinghua University   2022
Outstanding Graduate, Department of Computer Science and Technology, Tsinghua University   2022