LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
Jan 1, 2024
Anna Bavaresco
Raffaella Bernardi
Leonardo Bertolazzi
Desmond Elliott
Raquel Fernández
Albert Gatt
Esam Ghaleb
Mario Giulianelli
Michael Hanna
Alexander Koller
André F. T. Martins
Philipp Mondorf
Vera Neplenbroek
Sandro Pezzelle
Barbara Plank
David Schlangen
Alessandro Suglia
Aditya K. Surikuchi
Ece Takmaz
Alberto Testoni