Filippo Momentè, Alessandro Suglia, Mario Giulianelli, Ambra Ferrari, Alexander Koller, Oliver Lemon, David Schlangen, Raquel Fernández, Raffaella Bernardi (2025).
Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests.
Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, November 4-9, 2025.
Nicola Horst, Davide Mazzaccara, Antonia Schmidt, Michael Sullivan, Filippo Momentè, Luca Franceschetti, Philipp Sadler, Sherzod Hakimov, Alberto Testoni, Raffaella Bernardi, Raquel Fernández, Alexander Koller, Oliver Lemon, David Schlangen, Mario Giulianelli, Alessandro Suglia (2025).
Playpen: An Environment for Exploring Learning From Dialogue Game Feedback.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9, 2025.
Emmanouil Zaranis, António Farinhas, Saul Santos, Beatriz Canaverde, Miguel Moura Ramos, Aditya K Surikuchi, André Viveiros, Baohao Liao, Elena Bueno-Benito, Nithin Sivakumaran, et al. (2025).
Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding.
arXiv preprint arXiv:2506.06275.