Automatic modelling language variations for socially responsive chatbots
Conversational agents are increasingly present in our daily lives thanks to advances in natural language processing and artificial intelligence, and they are attracting growing interest. However, their ability to understand human communication in all its complexity remains a major challenge. This PhD project aims to model linguistic variation in order to develop agents capable of socially adaptive interactions, taking into account the socio-demographic profile and emotional state of their interlocutors. It also focuses on evaluating linguistic cues at different levels, leveraging both spoken and written language varieties, and on assessing the generalization capacity of models trained on multilingual and multi-situational data, with the goal of improving interaction modeling with conversational agents.
Compositional Generalization of Visual Language Models
The advent of foundation models has raised state-of-the-art performance on a large number of tasks across several fields of AI, in particular computer vision and natural language processing. However, despite the huge amount of data used to train them, these models remain limited in their ability to generalize, particularly for use cases of interest that lie in a specific domain not well represented on the Web. A way to formalize this issue is compositional generalization, i.e. generalizing to a new, unseen concept from concepts learned during training. This generalization is the ability to learn disentangled concepts and to recombine them into unseen compositions when the model is in production. The proposed thesis will address this issue, aiming to propose visual representations that enable generic visual language models to generalize compositionally within specific domains. It will investigate strategies to reduce shortcut learning, promoting a deeper understanding of compositional structures in multimodal data. It will also address the problem of compositional generalization beyond simple attribute–object pairs, capturing more subtle and complex semantics. The proposed thesis aims at progress at a rather theoretical level but has many potential practical applications, in fields such as health, administration and services, security and defense, manufacturing and agriculture.
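As a toy illustration of the idea (not part of the thesis proposal), the sketch below shows how disentangled concept embeddings would let a model recognize an attribute–object pair never seen jointly during training, simply by recombining known concepts. The embeddings and the `compose`/`classify` helpers are entirely hypothetical:

```python
import numpy as np

# Hypothetical disentangled embeddings for attributes and objects.
ATTRS = {"red": np.array([1.0, 0.0]), "small": np.array([0.0, 1.0])}
OBJS = {"car": np.array([1.0, 0.0]), "cat": np.array([0.0, 1.0])}

def compose(attr, obj):
    """Build a pair embedding by concatenating disentangled concept embeddings."""
    return np.concatenate([ATTRS[attr], OBJS[obj]])

def classify(query, candidate_pairs):
    """Score a query against composed embeddings, including unseen pairs."""
    scores = {pair: float(query @ compose(*pair)) for pair in candidate_pairs}
    return max(scores, key=scores.get)
```

If the concept embeddings are truly disentangled, an unseen composition such as ("red", "cat") scores highest for a matching query even though only ("red", "car") and ("small", "cat") were seen; shortcut learning is precisely what breaks this disentanglement in practice.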
Artificial Intelligence for Integrated Electronics Design
As technology fabrication processes advance towards nanometer-scale nodes, it becomes more and more complex to maintain the performance increase foreseen by Moore's Law. To cope with this issue, technology processes provide various enhancement features. Moreover, elementary components such as logic gates have become legion. Providing a relevant design framework thus becomes a huge manual development task. As AI matures, it has shown its ability to support decision making, and hence component design, making it a promising candidate for automating the design flow. In this PhD subject, you will work on an AI model (LLM) capable of understanding electronic components. The work ultimately aims at developing a generation engine for electronic components.
Throughout this PhD, interdisciplinary research will encompass a broad spectrum of knowledge around integrated electronics design, spanning microelectronics processes, electronic functions and logic gate implementation, neural network architectures, large language models and generative AI.
Out-of-Distribution Detection with Vision Foundation Models and Post-hoc Methods
The thesis focuses on improving the reliability of deep learning models, particularly in detecting out-of-distribution (OoD) samples, which are data points that differ from the training data and can lead to incorrect predictions. This is especially important in critical fields like healthcare and autonomous vehicles, where errors can have serious consequences. The research leverages vision foundation models (VFMs) like CLIP and DINO, which have revolutionized computer vision by enabling learning from limited data. The proposed work aims to develop methods that maintain the robustness of these models during fine-tuning, ensuring they can still effectively detect OoD samples. Additionally, the thesis will explore solutions for handling changing data distributions over time, a common challenge in real-world applications. The expected results include new techniques for OoD detection and adaptive methods for dynamic environments, ultimately enhancing the safety and reliability of AI systems in practical scenarios.
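For context, post-hoc OoD detection methods typically reduce a model's output to a scalar score that is then thresholded. A minimal sketch of two widely used scores, maximum softmax probability and the energy score, computed here on hand-made logit vectors rather than the output of a real vision foundation model:

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability: higher means more in-distribution."""
    z = logits - logits.max()              # stabilize the softmax numerically
    p = np.exp(z) / np.exp(z).sum()
    return p.max()

def energy_score(logits, T=1.0):
    """Energy score: higher means more out-of-distribution."""
    return -T * np.log(np.exp(logits / T).sum())

id_logits = np.array([10.0, 0.0, 0.0])     # confident, in-distribution-like
ood_logits = np.array([1.0, 1.0, 1.0])     # flat, uncertain, OoD-like
```

Both scores are post-hoc in the sense that they require no retraining, which is exactly why preserving the backbone's feature quality during fine-tuning matters for them.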
Point Spread Function Modelling for Space Telescopes with a Differentiable Optical Model
Context
Weak gravitational lensing [1] is a powerful probe of the Large Scale Structure of our Universe. Cosmologists use weak lensing to study the nature of dark matter and its spatial distribution. Weak lensing missions require highly accurate shape measurements of galaxy images. The instrumental response of the telescope, called the point spread function (PSF), produces a deformation of the observed images. This deformation can be mistaken for the effects of weak lensing in the galaxy images, thus being one of the primary sources of systematic error when doing weak lensing science. Therefore, estimating a reliable and accurate PSF model is crucial for the success of any weak lensing mission [2]. The PSF field can be interpreted as a convolutional kernel that affects each of our observations of interest, which varies spatially, spectrally, and temporally. The PSF model needs to be able to cope with each of these variations. We use specific stars considered point sources in the field of view to constrain our PSF model. These stars, which are unresolved objects, provide us with degraded samples of the PSF field. The observations go through different degradations depending on the properties of the telescope. These degradations include undersampling, integration over the instrument passband, and additive noise. We finally build the PSF model using these degraded observations and then use the model to infer the PSF at the position of galaxies. This procedure constitutes the ill-posed inverse problem of PSF modelling. See [3] for a recent review on PSF modelling.
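The degradation chain described above (integration over the passband, undersampling, additive noise) can be sketched as a toy forward model. Array shapes, parameter names, and the degradation order are illustrative only, not those of a real instrument or of WaveDiff:

```python
import numpy as np

def observe_star(mono_psfs, sed_weights, downsample=2, noise_sigma=0.01, rng=None):
    """Toy forward model of a star observation: integrate monochromatic PSFs
    over the passband weighted by the star's SED, undersample, add noise."""
    # Passband integration: SED-weighted sum of monochromatic PSFs.
    broadband = sum(w * p for w, p in zip(sed_weights, mono_psfs))
    # Undersampling: keep every `downsample`-th pixel in each direction.
    undersampled = broadband[::downsample, ::downsample]
    # Additive noise.
    rng = np.random.default_rng(0) if rng is None else rng
    return undersampled + rng.normal(0.0, noise_sigma, undersampled.shape)
```

PSF modelling is the inverse problem: recover the well-sampled, chromatic PSF field from many such degraded star observations.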
The recently launched Euclid survey represents one of the most complex challenges for PSF modelling. Because of the very broad passband of Euclid’s visible imager (VIS) ranging from 550nm to 900nm, PSF models need to capture not only the PSF field spatial variations but also its chromatic variations. Each star observation is integrated with the object’s spectral energy distribution (SED) over the whole VIS passband. As the observations are undersampled, a super-resolution step is also required. A recent model coined WaveDiff [4] was proposed to tackle the PSF modelling problem for Euclid and is based on a differentiable optical model. WaveDiff achieved state-of-the-art performance and is currently being tested with recent observations from the Euclid survey.
The James Webb Space Telescope (JWST) was recently launched and is producing outstanding observations. The COSMOS-Web collaboration [5] is a wide-field JWST treasury program that maps a contiguous 0.6 deg2 field. The COSMOS-Web observations are available and provide a unique opportunity to test and develop a precise PSF model for JWST. In this context, several science cases, on top of weak gravitational lensing studies, can vastly profit from a precise PSF model. Examples include strong gravitational lensing [6], where the PSF plays a crucial role in reconstruction, and exoplanet imaging [7], where the PSF speckles can mimic the appearance of exoplanets; subtracting an accurate and precise PSF model is therefore essential to improve their imaging and detection.
PhD project
The candidate will aim to develop more accurate and performant PSF models for space-based telescopes by exploiting a differentiable optical framework, focusing the effort on Euclid and JWST.
The WaveDiff model is based on the wavefront space and does not consider pixel-based or detector-level effects. These pixel errors cannot be modelled accurately in the wavefront, as they arise directly on the detectors and are unrelated to the telescope's optical aberrations. Therefore, as a first direction, we will extend the PSF modelling approach to take detector-level effects into account by combining a parametric and a data-driven (learned) approach. We will exploit the automatic differentiation capabilities of the machine learning frameworks (e.g. TensorFlow, PyTorch, JAX) underlying the WaveDiff PSF model to accomplish this objective.
As a second direction, we will consider the joint estimation of the PSF field and the stellar spectral energy distributions (SEDs) by exploiting repeated exposures, or dithers. The goal is to improve and calibrate the original SED estimation by exploiting the PSF modelling information. We will rely on our PSF model and on the fact that repeated observations of the same object change the star image (as it is imaged at different focal-plane positions) while sharing the same SED.
Another direction will be to extend WaveDiff for more general astronomical observatories like JWST with smaller fields of view. We will need to constrain the PSF model with observations from several bands to build a unique PSF model constrained by more information. The objective is to develop the next PSF model for JWST that is available for widespread use, which we will validate with the available real data from the COSMOS-Web JWST program.
A further direction will be to extend the performance of WaveDiff by including a continuous field in the form of an implicit neural representation [8], or neural field (e.g. NeRF [9]), to address the spatial variations of the PSF in the wavefront space with a more powerful and flexible model.
Finally, throughout the PhD, the candidate will collaborate on Euclid’s data-driven PSF modelling effort, which consists of applying WaveDiff to real Euclid data, and the COSMOS-Web collaboration to exploit JWST observations.
References
[1] R. Mandelbaum. “Weak Lensing for Precision Cosmology”. In: Annual Review of Astronomy and Astrophysics 56 (2018), pp. 393–433. doi: 10.1146/annurev-astro-081817-051928. arXiv: 1710.03235.
[2] T. I. Liaudat et al. “Multi-CCD modelling of the point spread function”. In: A&A 646 (2021), A27. doi: 10.1051/0004-6361/202039584.
[3] T. I. Liaudat, J.-L. Starck, and M. Kilbinger. “Point spread function modelling for astronomical telescopes: a review focused on weak gravitational lensing studies”. In: Frontiers in Astronomy and Space Sciences 10 (2023). doi: 10.3389/fspas.2023.1158213.
[4] T. I. Liaudat, J.-L. Starck, M. Kilbinger, and P.-A. Frugier. “Rethinking data-driven point spread function modeling with a differentiable optical model”. In: Inverse Problems 39.3 (Feb. 2023), p. 035008. doi: 10.1088/1361-6420/acb664.
[5] C. M. Casey et al. “COSMOS-Web: An Overview of the JWST Cosmic Origins Survey”. In: The Astrophysical Journal 954.1 (Aug. 2023), p. 31. doi: 10.3847/1538-4357/acc2bc.
[6] A. Acebron et al. “The Next Step in Galaxy Cluster Strong Lensing: Modeling the Surface Brightness of Multiply Imaged Sources”. In: ApJ 976.1 (Nov. 2024), p. 110. doi: 10.3847/1538-4357/ad8343. arXiv: 2410.01883 [astro-ph.GA].
[7] B. Y. Feng et al. “Exoplanet Imaging via Differentiable Rendering”. In: IEEE Transactions on Computational Imaging 11 (2025), pp. 36–51. doi: 10.1109/TCI.2025.3525971.
[8] Y. Xie et al. “Neural Fields in Visual Computing and Beyond”. In: arXiv e-prints (Nov. 2021). doi: 10.48550/arXiv.2111.11426. arXiv: 2111.11426 [cs.CV].
[9] B. Mildenhall et al. “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”. In: arXiv e-prints (Mar. 2020). doi: 10.48550/arXiv.2003.08934. arXiv: 2003.08934 [cs.CV].
AI-assisted generation of Instruction Set Simulators
The simulation tools for digital architectures rely on various types of models with different levels of abstraction to meet the requirements of hardware/software co-design and co-validation. Among these models, higher-level ones enable rapid functional validation of software on target architectures.
Developing these functional models often involves a manual process, which is both tedious and error-prone. When low-level RTL (Register Transfer Level) descriptions are available, they serve as a foundation for deriving higher-level models, such as functional ones. Preliminary work at CEA has resulted in an initial prototype based on MLIR (Multi-Level Intermediate Representation), demonstrating promising results in generating instruction execution functions from RTL descriptions.
The goal of this thesis is to build on these initial efforts and subsequently automate the extraction of architectural states, leveraging the latest advancements in machine learning for EDA. The expected result is a comprehensive workflow for the automatic generation of functional simulators (a.k.a. Instruction Set Simulators) from RTL, ensuring by construction the semantic consistency between the two abstraction levels.
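To make the target concrete: at its core, a functional simulator is an instruction execution function that updates an architectural state (registers, program counter). A hand-written toy sketch for a hypothetical three-instruction ISA follows; the thesis aims to generate such functions automatically from RTL rather than write them by hand:

```python
def step(state, instr):
    """Execute one instruction of a hypothetical 3-operand ISA.
    `state` holds the architectural state an ISS must track."""
    op, rd, rs1, rs2 = instr
    regs = state["regs"]
    if op == "add":
        regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF
    elif op == "sub":
        regs[rd] = (regs[rs1] - regs[rs2]) & 0xFFFFFFFF
    elif op == "xor":
        regs[rd] = regs[rs1] ^ regs[rs2]
    else:
        raise ValueError(f"unknown opcode: {op}")
    state["pc"] += 4    # fixed-width instructions assumed
    return state
```

The two automation problems the thesis targets map directly onto this sketch: deriving the per-opcode semantics from RTL, and extracting which fields belong in `state`.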
Attention-based Binarized Visual Encoder for LLM-driven Visual Question Answering
In the context of smart image sensors, there is an increasing demand to go beyond simple inferences such as classification or object detection, towards more complex applications enabling a semantic understanding of the scene. Among these applications, Visual Question Answering (VQA) enables AI systems to answer questions by analyzing images. This project aims to develop an efficient VQA system combining a visual encoder based on Binary Neural Networks (BNNs) with a compact language model (tiny LLM). Although LLMs are still far from a complete hardware implementation, this project represents a significant step in this direction by using a BNN to analyze the context and relationships between objects in the scene. This encoder processes images with low resource consumption, allowing real-time deployment on edge devices. Attention mechanisms can be considered to extract the semantic information necessary for scene understanding. The language model can be stored locally and adjusted jointly with the BNN to generate precise and contextually relevant answers.
This project offers an opportunity for candidates interested in Tiny Deep Learning and LLMs. It opens a broad field of research with room for significant contributions and results relevant to concrete applications. The work will consist of developing a robust BNN topology for semantic scene analysis under hardware constraints (memory and computation), and of integrating and jointly optimizing the BNN encoder with the LLM, while ensuring a coherent and performant VQA system across different types of queries.
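To illustrate why BNNs suit constrained hardware: once weights and activations are binarized to {-1, +1}, a dot product reduces to an XNOR followed by a popcount. A minimal numerical sketch of that equivalence (not a full BNN layer, and without the straight-through estimator needed for training):

```python
import numpy as np

def binarize(x):
    """Deterministic sign binarization to {-1, +1} (the BNN forward pass)."""
    return np.where(x >= 0, 1, -1).astype(np.int32)

def xnor_dot(a, w):
    """Dot product of two {-1,+1} vectors via XNOR + popcount,
    the bitwise trick that makes BNN inference cheap in hardware."""
    a_bits = a > 0
    w_bits = w > 0
    matches = np.count_nonzero(~(a_bits ^ w_bits))  # XNOR, then popcount
    return 2 * matches - a.size
```

On dedicated hardware the boolean arrays become packed bit words, so a 64-element dot product costs one XNOR and one popcount instruction instead of 64 multiply-accumulates.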
Predictive Diagnosis and Ageing Trajectory Estimation of New Generation Batteries through Multi-modality Fusion and Physics-Informed Machine Learning
Context:
Lithium-ion and emerging Sodium-ion batteries are crucial for energy transition and transportation electrification. Ensuring battery longevity, performance, and safety requires understanding degradation mechanisms at multiple scales.
Research Objective:
Develop innovative battery diagnostic and prognostic methodologies by leveraging multi-sensor data fusion (acoustic sensors, strain gauge sensors, thermal sensors, electrical sensors, optical sensors) and Physics-Informed Machine Learning (PIML) approaches, combining physical battery models with deep learning algorithms.
Scientific Approach:
Establish correlations between multi-physical measurements and battery degradation mechanisms
Explore hybrid PIML approaches for multi-physical data fusion
Develop learning architectures integrating physical constraints while processing heterogeneous data
Extend methodologies to emerging Na-Ion battery technologies
Methodology:
The research will utilize an extensive multi-instrumented cell database, analyzing measurement signatures and developing innovative PIML algorithms that optimize multi-sensor data fusion and validate performance using real-world data.
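A minimal sketch of the PIML idea behind these algorithms: the training objective combines a data-fit term with the residual of a governing equation. Here the physics is a deliberately simplified capacity-fade model dQ/dt = -kQ with a polynomial surrogate; the model, parameters, and loss weighting are illustrative only:

```python
import numpy as np

def physics_informed_loss(coeffs, t, q_obs, k=0.1, lam=1.0):
    """Toy PIML objective: MSE data fit plus the residual of the ODE
    dQ/dt + k*Q = 0 for a polynomial surrogate Q(t) = polyval(coeffs, t)."""
    q = np.polyval(coeffs, t)                  # surrogate capacity
    dq = np.polyval(np.polyder(coeffs), t)     # its exact derivative
    data_term = np.mean((q - q_obs) ** 2)      # fit to measurements
    physics_term = np.mean((dq + k * q) ** 2)  # penalize physics violation
    return data_term + lam * physics_term
```

In the thesis, the surrogate would be a deep network fed with multi-sensor features, and the physics term would come from electrochemical degradation models rather than this one-line ODE; the structure of the objective stays the same.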
Expected Outcomes:
The thesis aims to provide valuable recommendations for battery system instrumentation, develop advanced diagnostic algorithms, and contribute significantly to improving the reliability and sustainability of electrochemical storage systems, with potential academic and industrial impacts.
Defense of scene analysis models against adversarial attacks
In many applications, scene analysis modules such as object detection and recognition, or pose recognition, are required. Deep neural networks are nowadays among the most efficient models for performing a large number of vision tasks, sometimes simultaneously in the case of multitask learning. However, it has been shown that they are vulnerable to adversarial attacks: it is possible to add to the input data perturbations, imperceptible to the human eye, that undermine the results during inference by the neural network. Yet a guarantee of reliable results is essential for applications such as autonomous vehicles or person search in video surveillance, where security is critical. Different types of adversarial attacks and defenses have been proposed, most often for the classification problem (of images, in particular). Some works have addressed attacks on embeddings optimized by metric learning, used especially for open-set tasks such as object re-identification, facial recognition or content-based image retrieval. The types of attacks have multiplied: some are universal, others optimized on a particular instance. The proposed defenses must deal with new threats without sacrificing too much of the model's initial performance. Protecting input data from adversarial attacks is essential for decision systems where security vulnerabilities are critical. Therefore, the objective will be to study and propose different attacks and defenses applicable to scene analysis modules, especially those for object detection and object instance search in images.
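As a concrete example of the simplest attack family, the fast gradient sign method (FGSM) perturbs the input along the sign of the loss gradient. The sketch below applies it to a toy logistic model with an analytic gradient, not to an actual scene analysis network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_linear(x, y, w, eps):
    """FGSM on a logistic model p = sigmoid(w @ x):
    x_adv = x + eps * sign(dL/dx) for the cross-entropy loss L."""
    p = sigmoid(w @ x)
    grad_x = (p - y) * w               # analytic input gradient of the loss
    return x + eps * np.sign(grad_x)   # bounded perturbation, one step
```

The sign operation caps the per-pixel perturbation at eps, which is what makes the change imperceptible for small eps; stronger iterative and universal attacks, and the defenses against them, build on this same gradient-based principle.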
Learning world models for advanced autonomous agents
World models are internal representations of the external environment that an agent can use to interact with the real world. They are essential for understanding the physics that govern real-world dynamics, making predictions, and planning long-horizon actions. World models can be used to simulate real-world interactions and enhance the interpretability and explainability of an agent's behavior within this environment, making them key components for advanced autonomous agent models.
Nevertheless, building an accurate world model remains challenging. The goal of this PhD is to develop methodologies for learning world models and to study their use in the context of autonomous driving, particularly for motion forecasting and for developing autonomous agents for navigation.
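As a minimal illustration of the concept, a world model can be as simple as a learned transition function s' ≈ f(s, a) that is rolled out to imagine future trajectories without touching the real environment. The linear least-squares version below is a toy stand-in for the deep models this thesis targets:

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares world model: s' ≈ [s, a] @ A."""
    X = np.hstack([states, actions])
    A, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return A

def rollout(A, s0, actions):
    """Imagine a trajectory by feeding the model its own predictions."""
    traj = [s0]
    for a in actions:
        traj.append(np.concatenate([traj[-1], a]) @ A)
    return np.array(traj)
```

The same fit/rollout split carries over to the deep setting: a learned dynamics model enables planning and long-horizon motion forecasting, while prediction error on imagined rollouts exposes where the model's understanding of real-world physics breaks down.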