Gabrielle Kaili-May Liu
Evaluating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries
We present the first pipeline for automatic, difficulty-controlled creation of uncheatable, realistic, unanswerable, and multi-hop queries (CRUMQs), adaptable to any corpus and domain. This offers a simple way to enhance benchmark difficulty and realism and drive development of more capable RAG systems.
Gabrielle Kaili-May Liu, Bryan Li, Arman Cohan, William Gantt Walden, Eugene Yang
2026
ECIR
PDF
Cite
Code
Incorporating Q&A Nuggets into Retrieval-Augmented Generation
Nugget-based evaluation methods have emerged as the standard for measuring relevance of long-form RAG responses. We argue that nuggets are also valuable for guiding retrieval and generation, and we present a nugget-centric RAG system that automatically constructs its own nugget bank and uses it as a control signal throughout the pipeline.
Laura Dietz, Bryan Li, Gabrielle Kaili-May Liu, Jia-Huei Ju, Eugene Yang, Dawn Lawrie, William Gantt Walden, James Mayfield
2026
ECIR
Cite
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
LLM benchmarks are essential for tracking progress and ensuring safety in AI, but our systematic review of 445 LLM benchmarks from top AI conferences suggests that most do not measure what matters. We build a taxonomy of these failures and translate it into an operational checklist to help future benchmark authors demonstrate construct validity.
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou, Franziska Sofia Hafner, Harry Mayne, Others, Gabrielle Kaili-May Liu, Adam Mahdi
2025
NeurIPS
PDF
Cite
Code
Project
HF Repo
Checklist
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Multi-document (MD) processing is crucial for LLMs to handle real-world tasks across large sets of documents. This paper introduces MDCure, an effective and scalable procedure for generating high-quality multi-document instruction tuning data to improve MD capabilities of any base LLM.
Gabrielle Kaili-May Liu, Bowen Shi, Avi Caciularu, Idan Szpektor, Arman Cohan
2025
ACL Main
PDF
Cite
Code
Poster
Slides
HF Repo
Streaming Inference for Infinite Non-Stationary Clustering
We define the Dynamical Chinese Restaurant Process (Dynamical CRP), a novel stochastic process that provides a non-stationary prior over cluster assignments and yields an efficient streaming variational inference algorithm. Experiments show the Dynamical CRP can be applied to diverse synthetic and real data with Gaussian and non-Gaussian likelihoods.
Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete
2022
ICLR Workshop on Agent Learning in Open-Endedness & Conference on Lifelong Learning Agents
PDF
Cite
Poster
Streaming Inference for Infinite Feature Models
R-IBP is a novel recursive form of the Indian Buffet Process that makes feature models applicable to streaming data. It enables creation of new features online and in a probabilistic, principled manner. As a prior for feature models, R-IBP yields efficient inference over an unbounded number of latent features, with quasilinear average time complexity and logarithmic average space complexity.
Rylan Schaeffer, Yilun Du, Gabrielle Kaili-May Liu, Ila Rani Fiete
2022
ICML
PDF
Cite
Poster
I know I’m happy, and I’m right: Metacognition of emotion
We present the first experimentally quantitative index of metacognition of emotion, advancing understanding of awareness of subjective feelings.
Hsing-Hao Lee, Gabrielle Kaili-May Liu, Su-Ling Yeh
2021
European Conference on Visual Perception
PDF
Cite
Poster