This PhD project focuses on enhancing Large Multimodal Models (LMMs) by integrating fine-grained and spatio-temporal information into their training datasets. Although current LMMs such as CLIP and Flamingo show strong performance, they are trained on noisy, coarse-grained image-text pairs and often lack spatial or temporal grounding. The thesis aims to develop automatic pipelines that enrich image datasets with geographic and temporal metadata, refine captions with fine-grained semantic descriptors, and balance dataset diversity against compactness by controlling class-wise sample sizes.
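As a rough illustration of what such an enrichment pipeline could look like, the sketch below attaches geographic and temporal metadata to image-text records and caps per-class sample counts. The record schema, the `metadata_lookup` source, and the `MAX_PER_CLASS` threshold are illustrative assumptions, not details from the proposal.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative record structure; field names are assumptions,
# not the proposal's actual schema.
@dataclass
class Sample:
    image_id: str
    caption: str
    latitude: float | None = None    # geographic metadata, if recoverable
    longitude: float | None = None
    timestamp: str | None = None     # temporal metadata, e.g. an EXIF date
    fine_labels: list[str] = field(default_factory=list)

MAX_PER_CLASS = 500  # hypothetical cap trading diversity against compactness

def enrich_and_balance(samples, metadata_lookup):
    """Attach spatio-temporal metadata and cap class-wise sample sizes."""
    per_class = defaultdict(list)
    for s in samples:
        meta = metadata_lookup.get(s.image_id, {})
        s.latitude = meta.get("lat")
        s.longitude = meta.get("lon")
        s.timestamp = meta.get("time")
        # Bucket by the most specific fine-grained descriptor, falling
        # back to a generic bucket when none is available.
        key = s.fine_labels[0] if s.fine_labels else "unlabelled"
        if len(per_class[key]) < MAX_PER_CLASS:
            per_class[key].append(s)
    return [s for bucket in per_class.values() for s in bucket]
```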
Training strategies will incorporate hierarchical class structures and adapt training protocols to improve the alignment between caption elements and image regions. The work will also explore joint training regimes that combine the fine-grained, spatial, and temporal dimensions, and will propose set-based inference to increase the diversity of generated outputs; illustrative sketches of the hierarchical and set-based ideas follow below. The enriched datasets and models will be evaluated on existing or newly developed benchmarks targeting contextual relevance and output diversity. The project also addresses challenges in metadata accuracy, efficient model adaptation, and benchmarking methodologies for multi-dimensional model evaluation.
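One minimal way to exploit a hierarchical class structure during training, assuming a two-level fine-to-coarse label hierarchy, is to add a coarse-level loss term on top of the fine-grained one. The `FINE_TO_COARSE` mapping and the `alpha` weight here are placeholder assumptions, not the thesis's actual design.

```python
import torch
import torch.nn.functional as F

# Hypothetical two-level hierarchy: each fine class has a coarse parent
# (here, 5 fine classes mapped onto 3 coarse ones).
FINE_TO_COARSE = torch.tensor([0, 0, 1, 1, 2])

def hierarchical_loss(fine_logits, fine_targets, alpha=0.5):
    """Cross-entropy on fine classes plus a coarse-level consistency term.

    Coarse probabilities are obtained by summing fine-class probabilities
    within each parent; `alpha` weights the two levels (assumed value).
    """
    fine_loss = F.cross_entropy(fine_logits, fine_targets)
    probs = fine_logits.softmax(dim=-1)
    n_coarse = int(FINE_TO_COARSE.max()) + 1
    coarse_probs = torch.zeros(probs.shape[0], n_coarse).index_add_(
        1, FINE_TO_COARSE, probs
    )
    coarse_targets = FINE_TO_COARSE[fine_targets]
    coarse_loss = F.nll_loss(torch.log(coarse_probs + 1e-9), coarse_targets)
    return fine_loss + alpha * coarse_loss
```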
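Set-based inference could, under one simple reading, mean over-generating candidate captions and greedily retaining a maximally dissimilar subset. The sketch below assumes a stochastic `generate(prompt)` call to the caption model and uses token-level Jaccard distance as a stand-in dissimilarity measure; both are illustrative placeholders.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Token-level dissimilarity between two captions (illustrative)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(ta & tb) / max(len(ta | tb), 1)

def set_based_inference(generate, prompt, n_candidates=20, k=5):
    """Over-generate captions, then greedily select a diverse subset.

    `generate(prompt)` is a hypothetical sampling call to the caption
    model; any stochastic decoder would fit here.
    """
    candidates = [generate(prompt) for _ in range(n_candidates)]
    selected = [candidates[0]]
    while len(selected) < min(k, len(candidates)):
        # Keep the candidate farthest (on average) from those selected so far.
        best = max(
            (c for c in candidates if c not in selected),
            key=lambda c: sum(jaccard_distance(c, s) for s in selected),
        )
        selected.append(best)
    return selected
```

The greedy farthest-first selection is one common heuristic for diversity; the thesis may ultimately adopt a different set-selection criterion.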
Applications include improved synthetic data generation for autonomous driving, enhanced annotation of media archives through contextual captioning, and better visual reasoning in industrial simulation scenarios.