- Expanding Relevance Judgments for Medical Case-based Retrieval Task with Multimodal LLMs
- Catarina Pires, Sérgio Nunes and Luís Filipe Teixeira
- When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search
- William A. Ingram, Bipasha Banerjee and Edward A. Fox
- Let’s FACE the Evaluation of Conversations that Go with the Flow
- Hideaki Joko and Faegheh Hasibi
- Evaluating Causal Explanation in Medical Reports with LLM-Based and Human-Aligned Metrics
- Yousang Cho and Key-Sun Choi
- Deeper Than the Pool: Are Language Models Really Suited for Completion of Unlabeled Query–Document Judgments?
- Saber Zahhar, Nazim Kazar and Alexandre Dahan
- LLM-based relevance assessment still can’t replace human relevance assessment
- Charles Clarke and Laura Dietz
- CCRS: A Zero-Shot LLM-as-a-Judge Framework for Comprehensive RAG Evaluation
- Aashiq Muhamed
- Re-Rankers are Effective Relevance Judgment Predictors
- Chuan Meng, Jiqun Liu, Mohammad Aliannejadi, Fengran Mo and Maarten de Rijke
- LLM-Driven Usefulness Labeling for IR Evaluation
- Mouly Dewan, Jiqun Liu and Chirag Shah
- Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement
- Maryam Mousavian, Zahra Abbasiantaeb, Mohammad Aliannejadi and Fabio Crestani