Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers

Jan 1, 2024 · Georgios Pantazopoulos, Alessandro Suglia, Oliver Lemon, Arash Eshghi
Type: Journal article
Publication: CoRR

