Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen,
Ruiqi Zhong,
Akbir Khan,
Ethan Perez,
Jacob Steinhardt,
Minlie Huang,
Samuel R. Bowman,
He He,
Shi Feng
ICLR 2025
[paper]
|
Learning Task Decomposition to Assist Humans in Competitive Programming
Jiaxin Wen,
Ruiqi Zhong,
Pei Ke,
Zhihong Shao,
Hongning Wang,
Minlie Huang
ACL 2024
[paper]
[poster]
|
Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen,
Pei Ke,
Hao Sun,
Zhexin Zhang,
Changfei Li,
Jinfeng Bai,
Minlie Huang
EMNLP 2023
[paper]
[code]
|
Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
Jiaxin Wen*,
Jian Guan*,
Hongning Wang,
Wei Wu,
Minlie Huang
ICLR 2025
[paper]
|
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Jiaxin Wen*,
Vivek Hebbar*, Caleb Larson*, Aryan Bhatt, Ansh Radhakrishnan, Mrinank Sharma, Henry Sleight, Shi Feng, He He, Ethan Perez, Buck Shlegeris, Akbir Khan
ICLR 2025
[paper]
|
AdaptiveBackdoor: Backdoored Language Model Agents that Detect Human Overseers
Heng Wang,
Ruiqi Zhong,
Jiaxin Wen,
Jacob Steinhardt
ICML 2024 Next Generation of AI Safety Workshop
[paper]
|
ETHICIST: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Zhexin Zhang,
Jiaxin Wen,
Minlie Huang
ACL 2023
[paper]
[code]
|
Re3Dial: Retrieve, Reorganize and Rescale Dialogue Corpus for Long-Turn Open-Domain Dialogue Pre-training
Jiaxin Wen,
Hao Zhou,
Jian Guan,
Minlie Huang
EMNLP 2023
[paper]
[code]
|
AUGESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation
Chujie Zheng,
Sahand Sabour,
Jiaxin Wen,
Zheng Zhang,
Minlie Huang
ACL 2023 Findings
[paper]
[code]
[dataset]
|
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training
Yuxian Gu*,
Jiaxin Wen*,
Hao Sun*,
Yi Song, Pei Ke, Chujie Zheng, Zheng Zhang, Jianzhu Yao, Xiaoyan Zhu, Jie Tang, Minlie Huang
Machine Intelligence Research
[paper]
[code]
[poster]
|
AutoCAD: Automatically Generate Counterfactuals for Mitigating Shortcut Learning
Jiaxin Wen,
Yeshuang Zhu,
Jinchao Zhang,
Jie Zhou,
Minlie Huang
EMNLP 2022 Findings
[paper]
[code]
|
Persona-Guided Planning for Controlling the Protagonist’s Persona in Story Generation
Jiaxin Wen*,
Zhexin Zhang*,
Jian Guan,
Minlie Huang
NAACL 2022
[paper]
[code]
|
Robustness Testing of Language Understanding in Task-Oriented Dialog
Jiexi Liu*,
Ryuichi Takanobu*,
Jiaxin Wen,
Dazhen Wan, Hongguang Li, Weiran Nie, Cheng Li, Wei Peng, Minlie Huang
ACL 2021
[paper]
[code]
|
Comprehensive Scholarship (Top-2), Tsinghua University   2024
Research Excellence Scholarship, Tsinghua University   2023
Outstanding Undergraduate Thesis (Top-5), Tsinghua University   2022
Outstanding Graduate, Department of Computer Science and Technology, Tsinghua University   2022
|
I am passionate about a wide variety of fields, and I am constantly exploring new areas and possibilities. Some of my major interests include:
- Sports: I have recently been enjoying bodybuilding and CrossFit. I'm aiming to run my first (half-)marathon this year, although it's been a month since my last 30 km practice run because it took me a week to recover :(. I'm also a member of the Tsinghua hiking club and rugby team.
- Literature: I have always been an avid reader of literature. I served as a teaching assistant for the course "Writing and Communication" in 2021. My favorite authors are Hermann Karl Hesse, Albert Camus, and Jerome David Salinger.
|