Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling

Jan 1, 2024 · Georgios Pantazopoulos, Malvina Nikandrou, Alessandro Suglia, Oliver Lemon, Arash Eshghi

Type: Journal article
Publication: CoRR
Last updated on Jan 1, 2024