Accepted Papers

Improving Search Quality with LLMs: Predicting Relevance Labels in Data-Scarce Domains
- Ishitwa Viranchi, Subhadip Maji and Mahima Chandwani
Validating LLM-Generated Relevance Labels for Educational Resource Search
- Ratan Sebastian and Anett Hoppe
Generating Diverse Q&A Benchmarks for RAG Evaluation with DataMorgana
- Simone Filice, Guy Horowitz, David Carmel, Zohar Karnin, Liane Lewin-Eytan and Yoelle Maarek
CRPA dataset: Abstract Screening in Systematic Reviews
- Mohammad Shariful Islam and Mohammad Abu Tareq Rony
Large Language Model Relevance Assessors Agree With One Another More Than With Human Assessors
- Maik Fröbe, Andrew Parry, Ferdinand Schlatt, Sean MacAvaney, Benno Stein, Martin Potthast and Matthias Hagen
LLM-as-a-Judge: Evaluating LLM Recommendations for Industrial Safety
- Sumit Koundanya, Shubham Kumbhar, Siddharth Tumre and Sangameshwar Patil
Identifying IR Data Labeling errors using LLMs
- Sean D Rosario
Augmented Relevance Datasets with Fine-Tuned Small LLMs
- Quentin Fitte-Rey, Matyas Amrouche and Romain Deveaud

LLM4Eval