LLM4Eval is colocated with WSDM 2025 in Hannover, Germany, which takes place on March 10th-14th, 2025. All times in the table below are given in the local time zone.
| Time | Agenda |
| --- | --- |
| 9:00 - 9:15 | Opening Remarks |
| 9:15 - 10:00 | Keynote 1: Edgar Meij |
| 10:00 - 11:00 | Booster Talks 1 |
| 11:00 - 11:30 | Coffee Break |
| 11:30 - 12:30 | Booster Talks 2 |
| 12:30 - 13:30 | Lunch |
| 13:30 - 14:15 | Keynote 2: Sean MacAvaney |
| 14:15 - 14:30 | LLMJudge Presentation |
| 14:30 - 15:00 | Breakout Discussion |
| 15:00 - 15:30 | Coffee Break |
| 15:30 - 16:00 | Breakout Discussion |
| 16:00 - 16:55 | Panel Discussion |
| 16:55 - 17:00 | Closing |
Keynotes
Synthetic Evaluations and GenAI Application Development
Edgar Meij, Head of AI Platforms, Bloomberg
Abstract. The key to successfully building AI-driven applications is evaluation at all stages of the development lifecycle, ranging from ideation and development to post-release monitoring. At the same time, evaluation typically remains a largely manual process, and LLMs hold promise for faster, easier, and (sometimes) more accurate judgments and annotations.
This keynote focuses on evaluation in an industrial setting, the emergence of synthetic evaluations, the implications of their intersection in light of recent work, and open challenges in this emerging field. The rise of synthetic judgments for evaluation creates many opportunities to augment, support, or replace manual processes, enabling more effective and more efficient application development. Examples include using explicit or additional context in a more programmatic way, as well as support for evaluating session simulations and additional evaluation dimensions, such as safety, succinctness, multi-linguality, model-based feedback, and tool selection.
Bio. Edgar Meij is the head of the AI Platforms division in Bloomberg’s Artificial Intelligence (AI) group, where he leads 10+ teams of engineers and researchers who are responsible for all key AI, NLP, ML, LLM/GenAI, and Search technology platforms used across the company. Edgar holds a Ph.D. in computer science from the University of Amsterdam and has an extensive track record in artificial intelligence, information retrieval, natural language processing, machine learning, large-scale computing infrastructures, knowledge graphs, and semantic search. He has published more than 150 papers in top international venues, which have been cited more than 3,000 times. He is also a (Senior) Program Committee member of virtually every major conference in the field (including The Web Conference, WSDM, SIGIR, CIKM, ACL, and EMNLP), has organized tutorials and workshops at those same conferences, has served as sponsorship co-chair for The Web Conference and ECIR, and served as co-chair for the Industry Track (SIRIP) at SIGIR 2024.
Ambiguity is King. Down with the King!
Sean MacAvaney, University of Glasgow
Abstract. Ambiguity is a central challenge in relevance estimation, whether as a component of a retrieval system or in IR system evaluation. Even seemingly straightforward queries can represent many underlying intents, making relevance estimation inherently ambiguous. I will argue that LLM-based evaluations offer us the chance to (partially) overcome these challenges in ambiguity and provide techniques for doing so.
Bio. Sean is a Lecturer in Machine Learning at the University of Glasgow and a member of the Terrier Team. He did his PhD at Georgetown University under Nazli Goharian and Ophir Frieder, where he was a member of the IR Lab and an ARCS Endowed Scholar. His research focuses on applying machine learning to problems in IR and NLP. He has done research internships at AI2, CNR, Adobe, and MPI-INF. Before graduate school, he worked as a software developer at IIT/SourceTech.
Panelists
Jaap Kamps, University of Amsterdam
Edgar Meij, Bloomberg
Simone Filice, Technology Innovation Institute
Sean MacAvaney, University of Glasgow