Jiaxin Wen

I'm an incoming CS PhD student at [?]. I'm also a part-time research scientist at Anthropic, working with Ethan Perez and Jan Leike.

Before that, I was a visiting scholar at NYU, advised by He He. I completed my undergraduate and master's degrees at Tsinghua University, advised by Minlie Huang.

Email  /  CV  /  Google Scholar  /  GitHub  /  Twitter


Research Overview

I want to build AI that can tackle challenging tasks beyond human reach. Recently, I have been thinking about the following questions:

Early in my career, I worked on improving robustness, long-context modeling, and planning. I also (co-)led the development of multiple pre-trained LMs (EVA, OPD) and demos (Empathetic chatbot, Role-play chatbot, and ChatGLM3 Code Interpreter), which together received millions of online queries.

Selected Papers

Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, Shi Feng
ICLR 2025
[paper]
Learning Task Decomposition to Assist Humans in Competitive Programming
Jiaxin Wen, Ruiqi Zhong, Pei Ke, Zhihong Shao, Hongning Wang, Minlie Huang
ACL 2024
[paper] [poster]
Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Changfei Li, Jinfeng Bai, Minlie Huang
EMNLP 2023
[paper] [code]

Others

Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
Jiaxin Wen*, Jian Guan*, Hongning Wang, Wei Wu, Minlie Huang
ICLR 2025
[paper]
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Jiaxin Wen*, Vivek Hebbar*, Caleb Larson*, Aryan Bhatt, Ansh Radhakrishnan, Mrinank Sharma, Henry Sleight, Shi Feng, He He, Ethan Perez, Buck Shlegeris, Akbir Khan
ICLR 2025
[paper]
AdaptiveBackdoor: Backdoored Language Model Agents that Detect Human Overseers
Heng Wang, Ruiqi Zhong, Jiaxin Wen, Jacob Steinhardt
ICML 2024 Next Generation of AI Safety Workshop
[paper]
ETHICIST: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Zhexin Zhang, Jiaxin Wen, Minlie Huang
ACL 2023
[paper] [code]
Re3Dial: Retrieve, Reorganize and Rescale Dialogue Corpus for Long-Turn Open-Domain Dialogue Pre-training
Jiaxin Wen, Hao Zhou, Jian Guan, Minlie Huang
EMNLP 2023
[paper] [code]
AUGESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation
Chujie Zheng, Sahand Sabour, Jiaxin Wen, Zheng Zhang, Minlie Huang
ACL 2023 Findings
[paper] [code] [dataset]
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training
Yuxian Gu*, Jiaxin Wen*, Hao Sun*, Yi Song, Pei Ke, Chujie Zheng, Zheng Zhang, Jianzhu Yao, Xiaoyan Zhu, Jie Tang, Minlie Huang
Machine Intelligence Research
[paper] [code] [poster]
AutoCAD: Automatically Generate Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen, Yeshuang Zhu, Jinchao Zhang, Jie Zhou, Minlie Huang
EMNLP 2022 Findings
[paper] [code]
Persona-Guided Planning for Controlling the Protagonist’s Persona in Story Generation
Jiaxin Wen*, Zhexin Zhang*, Jian Guan, Minlie Huang
NAACL 2022
[paper] [code]
Robustness Testing of Language Understanding in Task-Oriented Dialog
Jiexi Liu*, Ryuichi Takanobu*, Jiaxin Wen, Dazhen Wan, Hongguang Li, Weiran Nie, Cheng Li, Wei Peng, Minlie Huang
ACL 2021
[paper] [code]

Honors & Awards

  • Comprehensive Scholarship (Top-2), Tsinghua University   2024
  • Research Excellence Scholarship, Tsinghua University   2023
  • Outstanding Undergraduate Thesis (Top-5), Tsinghua University   2022
  • Outstanding Graduate, Department of Computer Science and Technology, Tsinghua University   2022

Miscellaneous

I am passionate about a wide variety of fields and am constantly exploring new areas. Some of my major interests include:

  • Sports: Recently, I have been enjoying bodybuilding and CrossFit. I'm aiming to run my first (half-)marathon this year, although it has been a month since my last 30 km practice run, since that one took me a week to recover from :(. I'm also a member of the Tsinghua hiking club and rugby team.
  • Literature: I have always been an avid reader of literature. I served as the teaching assistant for the course "Writing and Communication" in 2021. My favorite authors are Hermann Karl Hesse, Albert Camus, and Jerome David Salinger.

Website design from Jon Barron.