Jiaxin Wen

I'm a second-year Master's student at Tsinghua University, where I work with Minlie Huang and Hongning Wang.

I also work with and draw inspiration from a number of responsible AI researchers dedicated to developing human-centered artificial intelligence, such as Ruiqi Zhong, He He, Shi Feng, Ethan Perez, and Akbir Khan.

I will graduate in 2025 and am seeking PhD or research scientist positions in superalignment. Please reach out if you think I would be a good fit!

Email  /  CV  /  Google Scholar  /  Github  /  Twitter

Research Overview

I have a broad interest in AI safety and alignment.

  • Identify emergent risks of super models (EMNLP 2023, ACL 2023, ICML 2024 workshop): Are there any safety risks that can arise in LLMs and evade modern oversight mechanisms?
  • Provide accurate and scalable supervision of super models
    • Robustness of Reward Model (EMNLP 2022 Findings): When a reward model is used as a proxy for human values, it may rely on spurious features and lead to a degenerate policy model. How can we automatically identify shortcuts and mitigate shortcut learning?
    • Synthetic Data (EMNLP 2023): Can we directly create high-quality synthetic data to supervise models without human involvement? In particular, I'm interested in synthesizing data by grounding in the physical world or symbolic engines, instead of distilling from LLMs.
    • Scalable Oversight (ACL 2024): Can we leverage AI systems to help humans effectively and efficiently supervise models on tasks where humans alone struggle, such as generating complex mathematical proofs, complex programs, and frontier scientific research?
I'm also interested in building large-scale pre-trained foundation models.
  • Dialogue Model: I've led the development of a series of popular large-scale pre-trained dialogue models, including open-domain chit-chat (EVA, OPD), empathetic dialogue (Emohaa), and role-play dialogue (AI-Topia).
  • Tool-augmented Model: I've co-led the development of ChatGLM3 Code Interpreter, particularly in mathematical reasoning.

Selected Papers

Learning Task Decomposition to Assist Humans in Competitive Programming
Jiaxin Wen, Ruiqi Zhong, Pei Ke, Zhihong Shao, Hongning Wang, Minlie Huang
ACL 2024
[paper] [poster]
Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Changfei Li, Jinfeng Bai, Minlie Huang
EMNLP 2023
[paper] [code]

Others

AdaptiveBackdoor: Backdoored Language Model Agents that Detect Human Overseers
Heng Wang, Ruiqi Zhong, Jiaxin Wen, Jacob Steinhardt
ICML 2024 Next Generation of AI Safety Workshop
ETHICIST: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Zhexin Zhang, Jiaxin Wen, Minlie Huang
ACL 2023
[paper] [code]
Re3Dial: Retrieve, Reorganize and Rescale Dialogue Corpus for Long-Turn Open-Domain Dialogue Pre-training
Jiaxin Wen, Hao Zhou, Jian Guan, Minlie Huang
EMNLP 2023
[paper] [code]
AUGESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation
Chujie Zheng, Sahand Sabour, Jiaxin Wen, Zheng Zhang, Minlie Huang
ACL 2023 Findings
[paper] [code] [dataset]
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training
Yuxian Gu*, Jiaxin Wen*, Hao Sun*, Yi Song, Pei Ke, Chujie Zheng, Zheng Zhang, Jianzhu Yao, Xiaoyan Zhu, Jie Tang, Minlie Huang
Machine Intelligence Research
[paper] [code] [poster]
AutoCAD: Automatically Generate Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen, Yeshuang Zhu, Jinchao Zhang, Jie Zhou, Minlie Huang
EMNLP 2022 Findings
[paper] [code]
Persona-Guided Planning for Controlling the Protagonist’s Persona in Story Generation
Jiaxin Wen*, Zhexin Zhang*, Jian Guan, Minlie Huang
NAACL 2022
[paper] [code]
Robustness Testing of Language Understanding in Task-Oriented Dialog
Jiexi Liu*, Ryuichi Takanobu*, Jiaxin Wen, Dazhen Wan, Hongguang Li, Weiran Nie, Cheng Li, Wei Peng, Minlie Huang
ACL 2021
[paper] [code]

Service

  • 2024: ACL (Safety, LLM for Programming, Dialogue), COLM
  • 2023: EMNLP (Dialogue, Safety)
  • 2023: ACL (Large-scale Pre-training)
  • 2022: EMNLP (Dialogue and Interactive Systems)

Experiences

  • Mar. 2024 - Jun. 2024. Research Intern, LM Reasoning Team, Ant Research Group.
  • Jun. 2023 - Nov. 2023. Research Intern, Foundation Model Team, Zhipu AI.
  • Jun. 2021 - Dec. 2021. Research Intern, WeChat AI Team, Tencent.
  • Jun. 2020 - Oct. 2020. Algorithm Intern, WeChat AI Team, Tencent.

Awards

  • Global AI Innovation Contest (6th out of 5000)   2023
  • Outstanding Undergraduate Thesis, Tsinghua University (Top-5 score in the DCST)   2022
  • Outstanding Graduate, Department of Computer Science and Technology, Tsinghua University   2022
  • Global AI Innovation Contest Runner-up (2nd out of 5000)   2021
  • Tsinghua Academic Excellence Award   2020
  • Tsinghua Volunteer Excellence Award   2019-2020
  • Tsinghua Philobiblion Award   2019

Miscellaneous

    I am passionate about a wide variety of fields, and I am constantly exploring new areas and possibilities. Some of my major interests include:
    • Sports: I've recently been enjoying bodybuilding and CrossFit. I'm aiming to run my first (half-)marathon this year, although it's been a month since my last 30 km practice run because it took me a week to recover :(. I'm also a member of the Tsinghua hiking club and rugby team.
    • Literature: I have always been a reader of literature. I served as a teaching assistant for the course "Writing and Communication" in 2021. My favorite authors are Hermann Karl Hesse, Albert Camus, and Jerome David Salinger.

    Website design from Jon Barron.