Accepted Papers

Expanding Relevance Judgments for Medical Case-based Retrieval Task with Multimodal LLMs
- Catarina Pires, Sérgio Nunes and Luís Filipe Teixeira
When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search
- William A. Ingram, Bipasha Banerjee and Edward A. Fox
FACE: A Fine-grained Reference Free Evaluator for Conversational Recommender Systems
- Hideaki Joko and Faegheh Hasibi
Evaluating Causal Explanation in Medical Reports with LLM-Based and Human-Aligned Metrics
- Yousang Cho and Key-Sun Choi
Deeper Than the Pool: Are Language Models Really Suited for Completion of Unlabeled Query–Document Judgments?
- Saber Zahhar, Nazim Kazar and Alexandre Dahan
LLM-based relevance assessment still can’t replace human relevance assessment
- Charles Clarke and Laura Dietz
CCRS: A Zero-Shot LLM-as-a-Judge Framework for Comprehensive RAG Evaluation
- Aashiq Muhamed
Re-Rankers are Effective Relevance Judgment Predictors
- Chuan Meng, Jiqun Liu, Mohammad Aliannejadi, Fengran Mo and Maarten de Rijke
LLM-Driven Usefulness Labeling for IR Evaluation
- Mouly Dewan, Jiqun Liu and Chirag Shah
Towards Fair Rankings: Leveraging LLMs for Gender Bias Detection and Measurement
- Maryam Mousavian, Zahra Abbasiantaeb, Mohammad Aliannejadi and Fabio Crestani

LLM4Eval