Gabrielle Kaili-May Liu
Auto-ARGUE: LLM-Based Report Generation Evaluation
Report generation (RG) is a RAG task that aims to produce a long-form, citation-attributed response to a complex user query. We present the first public, automated, LLM-based implementation of the ARGUE evaluation framework for RG.
William Gantt Walden, Marc Mason, Orion Weller, Laura Dietz, John Conroy, Neil Molino, Hannah Recknor, Bryan Li, Gabrielle Kaili-May Liu, Yu Hou, Dawn Lawrie, James Mayfield, Eugene Yang
2026
Submission to ECIR
PDF
Cite
MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
For LLMs to be deployed reliably and responsibly, it is essential that their linguistically expressed confidence faithfully reflect their internal uncertainty. This paper presents the first study to systematically and comprehensively benchmark faithful calibration of LLMs and proposes MetaFaith, the first method to improve faithful calibration of any instruction-following LLM in a task-agnostic manner.
Gabrielle Kaili-May Liu, Gal Yona, Avi Caciularu, Idan Szpektor, Tim G. J. Rudner, Arman Cohan
2025
EMNLP Main
PDF
Cite
Code
Poster
Slides
Perspectives on the Social Impacts of Reinforcement Learning with Human Feedback
This paper examines the social implications of reinforcement learning with human feedback (RLHF), identifying key social and ethical issues and their impacts on stakeholders. Seven impact areas are examined: misinformation, AI value alignment, bias, AI access, cross-cultural dialogue, industry, and the workforce.
Gabrielle Kaili-May Liu
2023
Envisioning the Future of Computing Prize, MIT Schwarzman College of Computing
PDF
Cite
Slides
Weight Friction: A Simple Method to Overcome Catastrophic Forgetting and Enable Continual Learning
Weight friction draws on principles from neuroscience and physics to help mitigate catastrophic forgetting in neural networks. It operates across multiple task domains and achieves convergence comparable to SGD.
Gabrielle Kaili-May Liu
2019
Preprint
PDF
Cite
Evaluating Gammatone Frequency Cepstral Coefficients with Neural Networks for Emotion Recognition from Speech
Gammatone frequency cepstral coefficients (GFCCs) can be powerful features for speech emotion recognition. This preliminary study suggests they are better suited than mel-frequency cepstral coefficients (MFCCs) to emotion and intensity classification tasks with neural networks.
Gabrielle Kaili-May Liu
2018
Preprint
PDF
Cite