The LLM4Eval workshop is colocated with SIGIR 2025 in Padua, Italy, and takes place on July 17, 2025.
| Time | Agenda |
|---|---|
| 9:00 - 9:15 | Opening Remarks |
| 9:15 - 10:00 | Keynote Talk: Chirag Shah |
| 10:00 - 10:30 | Coffee Break + Poster Presentations |
| 10:30 - 12:30 | Paper Presentations |
| 12:30 - 14:00 | Lunch Break + Poster Presentations |
| 14:00 - 15:30 | Breakout Discussion |
| 15:30 - 16:00 | Coffee Break |
| 16:00 - 16:45 | Breakout Discussion Reports |
| 16:45 - 17:00 | Closing Remarks |
Keynote Talk
From Relevance to Reality: Scaling Human-Centered Evaluation in the LLM Era
Chirag Shah, University of Washington
Abstract. The widespread adoption of LLMs as evaluators represents a fundamental shift in how we assess information systems, but our current approaches remain rooted in ad-hoc prompt engineering rather than systematic scientific methodology. Drawing from our recent work on usefulness evaluation and Theory of Mind reasoning in LLMs, I’ll demonstrate that while LLMs show remarkable capabilities as evaluators, their outputs demand the same level of scrutiny we apply to any scientific instrument. Earning that trust requires three foundational advances. First, we must move beyond static evaluation paradigms toward dynamic benchmark generation that can evolve with rapidly advancing models. Second, we need systematic verification methods that can validate LLM judgments without falling into infinite recursive loops. Most critically, we must abandon prompt engineering’s trial-and-error approach in favor of “prompt science”, a methodology that brings transparency, replicability, and rigor to evaluation tasks. The future of LLM-based evaluation lies not in more sophisticated prompts, but in more sophisticated science.
Bio. Chirag Shah is Professor of Information and Computer Science at the University of Washington (UW) in Seattle. He is the Founding Director of the InfoSeeking Lab and Founding Co-Director of the Center for Responsibility in AI Systems & Experiences (RAISE). His research focuses on building, auditing, and correcting agentic information access systems. Shah is a Distinguished Member of both ACM and ASIS&T, and a Senior Member of IEEE. He has published nearly 200 peer-reviewed articles and authored seven books, including textbooks on data science and machine learning. He also works closely with industrial research labs on cutting-edge problems, typically as a visiting researcher. His most recent engagements include Amazon, ByteDance, Getty Images, Microsoft Research, and Spotify.