The CEA welcomes 1,600 PhD students to its laboratories each year.
Efficient Multimodal Vision Transformers for Embedded System
Artificial intelligence & Data intelligence; Computer science and software; Engineering sciences; Technological challenges
Abstract
The proposed thesis focuses on optimizing multimodal vision transformers (ViTs) for panoptic segmentation and explores two main directions. The first is to develop a versatile fusion pipeline that integrates multimodal data (RGB, IR, depth, events, point clouds) by exploiting the alignment relationships between modalities. The second is to investigate an approach that combines pruning with mixed-precision quantization. The overall goal is to design lightweight multimodal ViT models tailored to the constraints of embedded systems, preserving their performance while reducing computational complexity.
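The two compression techniques named in the abstract can be illustrated with a minimal sketch. It assumes unstructured magnitude pruning and symmetric per-tensor uniform quantization, with bit widths picked by hand for each layer. These are generic textbook techniques chosen only for illustration, not the method the thesis will develop. The layer names (`attn.qkv`, `mlp.fc1`), the weights, and the bit widths are hypothetical.

```python
from typing import Dict, List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by increasing |w|; the first n_prune are set to zero.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_dequantize(weights: List[float], bits: int) -> List[float]:
    """Symmetric uniform per-tensor quantization to `bits` bits, mapped back to float."""
    if bits < 2:
        raise ValueError("bits must be >= 2 for symmetric quantization")
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0] * len(weights)
    scale = max_abs / qmax
    out = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # integer code in [-qmax, qmax]
        out.append(q * scale)
    return out


def compress_model(
    layers: Dict[str, List[float]], sparsity: float, bit_widths: Dict[str, int]
) -> Dict[str, List[float]]:
    """Prune each layer, then quantize it with its own (mixed) bit width."""
    return {
        name: quantize_dequantize(magnitude_prune(w, sparsity), bit_widths[name])
        for name, w in layers.items()
    }


def storage_bits(layers: Dict[str, List[float]], bit_widths: Dict[str, int]) -> int:
    """Bits needed for the non-zero weights (ignores sparse-index overhead)."""
    return sum(
        bit_widths[name] * sum(1 for x in w if x != 0.0) for name, w in layers.items()
    )


if __name__ == "__main__":
    # Hypothetical toy layers: attention kept at 8 bits, MLP pushed down to 4 bits.
    layers = {"attn.qkv": [0.9, -0.05, 0.4, -0.7], "mlp.fc1": [0.02, -0.3, 0.6, 0.1]}
    bit_widths = {"attn.qkv": 8, "mlp.fc1": 4}
    compressed = compress_model(layers, sparsity=0.25, bit_widths=bit_widths)
    print(compressed)
    print("storage bits:", storage_bits(compressed, bit_widths))
```

In this toy setting, each layer loses its smallest weight, and the model needs 36 bits for its non-zero weights (3 × 8 + 3 × 4). Leaving all eight weights dense at 32 bits would take 256. In practice, the per-layer bit widths would be chosen by a search or a sensitivity analysis rather than fixed by hand.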
Laboratory
Département Systèmes et Circuits Intégrés Numériques (LIST)