The Third Workshop on Large Language Models (LLMs) for Evaluation in Information Retrieval
Large language models (LLMs) have increasingly demonstrated task-solving abilities not present in smaller models. Leveraging these capabilities for automated evaluation (LLM4Eval) has recently attracted considerable attention across multiple research communities. Building on the success of previous workshops, which established foundations in automated judgments and RAG evaluation, this third iteration aims to address emerging challenges as IR systems become increasingly personalized and interactive. The main goal of the third LLM4Eval workshop is to bring together researchers from industry and academia to explore three critical areas: evaluating personalized IR systems while maintaining fairness, delineating the boundaries between automated and human assessment in subjective scenarios, and developing evaluation methodologies for systems that combine multiple IR paradigms (search, recommendation, and dialogue).