
Grounding and reasoning over space and time in Vision-Language Models (VLM)

Artificial intelligence & Data intelligence, Computer science and software, Engineering sciences, Technological challenges

Abstract

Recent Vision-Language Models (VLMs) such as BLIP, LLaVA, and Qwen-VL achieve impressive results on multimodal tasks but remain limited in genuine spatial and temporal reasoning. Many current benchmarks conflate visual reasoning with general knowledge and involve only shallow reasoning. Moreover, these models often struggle to understand complex spatial relations and dynamic scenes, owing to suboptimal use of visual features. To address these limitations, recent approaches such as SpatialRGPT, SpaceVLLM, VPD, and ST-VLM have introduced techniques including 3D scene-graph integration, spatio-temporal queries, and kinematic instruction tuning to improve reasoning over space and time. This thesis proposes to build on these advances by developing new instruction-tuned models with improved data representations and architectural innovations. The goal is to enable robust spatio-temporal reasoning for applications in robotics, video analysis, and the understanding of dynamic environments.
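One way to picture the kind of spatial and kinematic instruction data mentioned above is to turn 3D object annotations into templated question-answer pairs. The sketch below is a minimal illustration under stated assumptions only: the `Track` annotation format, the camera-coordinate convention (x to the right, y up, z away from the camera, in metres), the frame rate parameter `fps`, and the question templates are all hypothetical. It is not the data pipeline of SpatialRGPT, ST-VLM, or this thesis.

```python
import math
from dataclasses import dataclass

# Hypothetical convention: camera coordinates in metres (x right, y up, z away from camera).
Point = tuple[float, float, float]


@dataclass
class Track:
    """Hypothetical annotation: an object label and its 3D centroid, one entry per frame."""
    label: str
    positions: list[Point]


def spatial_qa(a: Track, b: Track, frame: int = 0) -> list[dict[str, str]]:
    """Templated QA pairs about static spatial relations between two objects at one frame."""
    pa, pb = a.positions[frame], b.positions[frame]
    side = "to the left of" if pa[0] < pb[0] else "to the right of"  # compare x
    depth = "closer to" if pa[2] < pb[2] else "farther from"         # compare z (depth)
    dist = math.dist(pa, pb)                                         # Euclidean distance
    return [
        {"question": f"Is the {a.label} to the left or right of the {b.label}?",
         "answer": f"The {a.label} is {side} the {b.label}."},
        {"question": f"Which is closer to the camera, the {a.label} or the {b.label}?",
         "answer": f"The {a.label} is {depth} the camera than the {b.label}."},
        {"question": f"How far apart are the {a.label} and the {b.label}?",
         "answer": f"About {dist:.1f} m apart."},
    ]


def kinematic_qa(t: Track, fps: float) -> list[dict[str, str]]:
    """Templated QA pairs about an object's motion over the whole track (needs >= 2 frames)."""
    if len(t.positions) < 2 or fps <= 0:
        raise ValueError("need at least two frames and a positive frame rate")
    # Distance travelled: sum of straight-line segments between consecutive frames.
    travelled = sum(math.dist(p, q) for p, q in zip(t.positions, t.positions[1:]))
    duration = (len(t.positions) - 1) / fps   # seconds between first and last frame
    speed = travelled / duration              # average speed along the path
    # Dominant direction from net displacement; vertical (y) motion is ignored here.
    dx = t.positions[-1][0] - t.positions[0][0]
    dz = t.positions[-1][2] - t.positions[0][2]
    if dx == 0 and dz == 0:
        direction = "It shows no net displacement."
    elif abs(dx) >= abs(dz):
        direction = "Mainly to the right." if dx > 0 else "Mainly to the left."
    else:
        direction = "Mainly away from the camera." if dz > 0 else "Mainly toward the camera."
    return [
        {"question": f"How far does the {t.label} travel?",
         "answer": f"It travels about {travelled:.1f} m."},
        {"question": f"What is the average speed of the {t.label}?",
         "answer": f"About {speed:.1f} m/s."},
        {"question": f"In which direction does the {t.label} move?",
         "answer": direction},
    ]


if __name__ == "__main__":
    car = Track("car", [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (2.0, 0.0, 10.0)])
    person = Track("person", [(3.0, 0.0, 5.0)] * 3)
    for qa in spatial_qa(car, person) + kinematic_qa(car, fps=2.0):
        print(qa["question"], "->", qa["answer"])
```

Templated generation is used here only because it keeps the example short and deterministic; the resulting pairs would still need to be paired with the corresponding images or video frames before being used for instruction tuning.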

Laboratory

Département Intelligence Ambiante et Systèmes Interactifs (LIST)
Service Intelligence Artificielle pour le Langage et la Vision
Laboratoire Vision et Apprentissage pour l’analyse de scènes
Paris-Saclay