Publications

(2024). Visually Grounded Language Learning: A Review of Language Games, Datasets, Tasks, and Models. J. Artif. Intell. Res.
(2024). Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling. CoRR.
(2024). Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12-16, 2024.
(2024). Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models. CoRR.
(2024). Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12-16, 2024.
(2024). PIXAR: Auto-Regressive Language Modeling in Pixel Space. CoRR.
(2024). PIXAR: Auto-Regressive Language Modeling in Pixel Space. Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024.
(2024). Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers. CoRR.
(2024). Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, NAACL 2024, Mexico City, Mexico, June 16-21, 2024.
(2024). LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks. CoRR.