Publications

(2026). VoyagerVision: Investigating the Role of Multi-modal Information for Open-ended Learning Systems. Advances in Intelligent Systems and Computing (AISC, volume 1468).
(2026). VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models. ICML 2026.
(2026). Same Answer, Different Representations: Hidden instability in VLMs. CoRR.
(2026). Retrievit: In-context Retrieval Capabilities of Transformers, State Space Models, and Hybrid Architectures. CoRR.
(2025). CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts. NAACL 2025.
(2025). Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests. Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, November 4-9, 2025.
(2025). Playpen: An Environment for Exploring Learning From Dialogue Game Feedback. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9, 2025.
(2025). Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding. arXiv preprint arXiv:2506.06275.
(2024). Visually Grounded Language Learning: A Review of Language Games, Datasets, Tasks, and Models. J. Artif. Intell. Res.