The LLM4Eval workshop will be co-located with SIGIR 2025 in Padua, Italy, and will take place on July 17, 2025.

Time	Agenda
9:00 - 9:15	Opening Remarks
9:15 - 10:00	Keynote Talk: Chirag Shah
10:00 - 10:30	Paper Presentations Session 1
10:30 - 11:00	Coffee break
11:00 - 12:30	Paper Presentations Session 2
12:30 - 14:30	Lunch break + Poster Presentations
14:30 - 16:00	Breakout Discussion
16:00 - 16:30	Coffee break
16:30 - 17:15	Breakout Discussion Reports
17:15 - 17:30	Closing Remarks

Keynote Talk

From Relevance to Reality: Scaling Human-Centered Evaluation in the LLM Era

Chirag Shah, University of Washington

Abstract. The widespread adoption of LLMs as evaluators represents a fundamental shift in how we assess information systems, but our current approaches remain rooted in ad-hoc prompt engineering rather than systematic scientific methodology. Drawing from our recent work on usefulness evaluation and Theory of Mind reasoning in LLMs, I’ll demonstrate that while LLMs show remarkable capabilities as evaluators, their outputs demand the same level of scrutiny we apply to any scientific instrument. Trusting these outputs requires three foundational advances. First, we must move beyond static evaluation paradigms toward dynamic benchmark generation that can evolve with rapidly advancing models. Second, we need systematic verification methods that can validate LLM judgments without falling into infinite recursive loops. Third, and most critically, we must abandon prompt engineering’s trial-and-error approach in favor of “prompt science”, a methodology that brings transparency, replicability, and rigor to evaluation tasks. The future of LLM-based evaluation lies not in more sophisticated prompts, but in more sophisticated science.

Bio. Chirag Shah is Professor of Information and Computer Science at the University of Washington (UW) in Seattle. He is the Founding Director of the InfoSeeking Lab and Founding Co-Director of the Center for Responsibility in AI Systems & Experiences (RAISE). His research focuses on building, auditing, and correcting agentic information access systems. Shah is a Distinguished Member of ACM as well as ASIS&T, and a Senior Member of IEEE. He has published nearly 200 peer-reviewed articles and authored seven books, including textbooks on data science and machine learning. He also works closely with industrial research labs on cutting-edge problems, typically as a visiting researcher. His most recent engagements include Amazon, ByteDance, Getty Images, Microsoft Research, and Spotify.
