About us
Espace utilisateur
Education
INSTN offers more than 40 diplomas from operator level to post-graduate degree level. 30% of our students are international students.
Professionnal development
Professionnal development
Find a training course
INSTN delivers off-the-self or tailor-made training courses to support the operational excellence of your talents.
Human capital solutions
At INSTN, we are committed to providing our partners with the best human capital solutions to develop and deliver safe & sustainable projects.
Thesis
Home   /   Thesis   /   Efficient Multimodal Vision Transformers for Embedded System

Efficient Multimodal Vision Transformers for Embedded System

Artificial intelligence & Data intelligence Computer science and software Engineering sciences Technological challenges

Abstract

The proposed thesis focuses on the optimization of multimodal vision transformers (ViT) for panoptic object segmentation, exploring two main directions. The first is to develop a versatile fusion pipeline to integrate multimodal data (RGB, IR, depth, events, point clouds) by leveraging inter-modal alignment relationships. The second is to investigate an approach combining pruning and mixed-precision quantization. The overall goal is to design lightweight multimodal ViT models, tailored to the constraints of embedded systems, while optimizing their performance and reducing computational complexity.

Laboratory

Département Systèmes et Circuits Intégrés Numériques (LIST)
DSCIN
Laboratoire Intelligence Artificielle Embarquée
Evry Val d’Essonne
Top envelopegraduation-hatlicensebookuserusersmap-markercalendar-fullbubblecrossmenuarrow-down