LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Jan 1, 2024
Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K. Surikuchi, Ece Takmaz, Alberto Testoni
Publication: CoRR