Towards a better understanding of membrane proteins through AI

Despite remarkable advances in artificial intelligence (AI), particularly with tools such as AlphaFold, predicting membrane protein structures remains a major challenge in structural biology. These proteins, which represent 30% of the proteome and 60% of therapeutic targets, are still strongly underrepresented in the Protein Data Bank (PDB), accounting for only 3% of resolved structures. This scarcity stems from the difficulty of maintaining them in their native state in an amphiphilic environment, which complicates their study, especially with classical structural techniques.

This PhD project aims to overcome these challenges by combining the predictive capabilities of AlphaFold with experimental small-angle X-ray and neutron scattering (SAXS/SANS) data obtained under physiological conditions. The study will focus on the translocator protein TSPO, a key neuroimaging marker for several serious pathologies (cancers, neurodegenerative diseases) owing to its high affinity for various pharmacological ligands.

The work will involve predicting the structure of TSPO, both in the presence and absence of ligands, acquiring SAXS/SANS data of the TSPO/amphiphile complex, and refining the models using advanced modeling tools (MolPlay, Chai-1) and molecular dynamics simulations. By deepening the understanding of TSPO’s structure and function, this project could contribute to the design of new ligands for diagnostic and therapeutic purposes.
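
To illustrate how a predicted model can be confronted with SAXS data, the sketch below computes a scattering profile with the Debye formula, I(q) = sum_i sum_j f_i f_j sin(q r_ij) / (q r_ij), and then a reduced chi-squared against an experimental curve after fitting a scale factor. It is a minimal sketch under strong assumptions: every scatterer shares one q-independent form factor, and there is no hydration shell, solvent contrast or amphiphile belt, all of which dedicated programs such as CRYSOL or FoXS, and any realistic TSPO/amphiphile model, must account for. The function names and the four-point toy model are hypothetical and are not part of MolPlay or Chai-1.

```python
import math

def debye_intensity(coords, q_values, f=1.0):
    """Debye formula with a single, q-independent form factor f for every scatterer."""
    n = len(coords)
    profile = []
    for q in q_values:
        total = n * f * f  # self terms (i == j), where sin(x)/x -> 1
        for i in range(n):
            for j in range(i + 1, n):
                x = q * math.dist(coords[i], coords[j])
                # each unordered pair appears twice in the double sum
                total += 2 * f * f * (math.sin(x) / x if x > 0 else 1.0)
        profile.append(total)
    return profile

def reduced_chi2(model, data, sigma):
    """Fit the scale c minimizing sum(((d - c*m)/s)^2), then return (reduced chi^2, c)."""
    w = [1.0 / s**2 for s in sigma]
    c = sum(wi * m * d for wi, m, d in zip(w, model, data)) / sum(wi * m * m for wi, m in zip(w, model))
    chi2 = sum(wi * (d - c * m) ** 2 for wi, m, d in zip(w, model, data))
    return chi2 / (len(data) - 1), c  # one fitted parameter (the scale)

# Hypothetical toy model: four scatterers on a square (coordinates in nm).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
q = [0.1 * k for k in range(30)]  # q in nm^-1
model = debye_intensity(coords, q)
data = [2.0 * m for m in model]  # synthetic "experimental" curve, exactly twice the model
sigma = [0.01 * d for d in data]  # hypothetical 1 % error bars
chi2, scale = reduced_chi2(model, data, sigma)
print(f"I(0) = {model[0]:.0f}, chi2 = {chi2:.2e}, scale = {scale:.3f}")
```

At q = 0 the Debye sum reduces to the square of the number of scatterers (16 here), and since the synthetic data are an exact multiple of the model, the fit recovers a scale of 2 with a chi-squared close to zero.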

Online analysis of actinide surrogates in solution by LIBS and AI for nuclear fuel reprocessing processes

The construction of new nuclear reactors in the coming years will require increased fuel reprocessing capacity. This evolution calls for scientific and technological developments to modernize process monitoring equipment. One of the parameters that must be monitored continuously is the actinide content of process solutions, which is essential for process control and is currently measured using obsolete technologies. We therefore propose to develop LIBS (laser-induced breakdown spectroscopy), a technique well suited to quantitative online elemental analysis, for this application. As actinide spectra are particularly complex, we will use multivariate data processing approaches, including several artificial intelligence (AI) techniques, to extract quantitative information from LIBS data and characterize measurement uncertainty.
The aim of this thesis is therefore to evaluate the performance of online analysis of actinides in solution using LIBS and AI. In particular, we aim to improve the characterisation of uncertainties using machine learning techniques, so as to significantly reduce them and meet the monitoring needs of the future reprocessing plant.
Experimental work will be carried out on non-radioactive actinide simulants using commercial LIBS equipment. The resulting spectroscopic data will feed the data-processing part of the thesis and the determination of the uncertainties associated with the different quantification models.
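
As an illustration of multivariate quantification with an uncertainty estimate, the sketch below fits a ridge regression from two emission-line intensities to concentration and uses a percentile bootstrap to attach an interval to a prediction. Ridge regression is swapped in here for simplicity; partial least squares (PLS) and the AI models mentioned above are more typical for LIBS. The calibration data are synthetic and all function names are hypothetical. Note that this bootstrap only captures the uncertainty of the fitted calibration model, not the shot-to-shot noise of the new measurement.

```python
import random

def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_ridge(X, y, lam=1e-3):
    """Ridge regression y ~ b0 + X b, with the intercept b0 left unpenalized."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
          for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(A, rhs)

def predict(coef, x):
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))

def bootstrap_interval(X, y, x_new, n_boot=200, seed=0):
    """Percentile bootstrap interval (about 95 %) on the predicted concentration."""
    rng = random.Random(seed)
    n = len(y)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample calibration spectra
        coef = fit_ridge([X[i] for i in idx], [y[i] for i in idx])
        preds.append(predict(coef, x_new))
    preds.sort()
    return preds[int(0.025 * n_boot)], preds[int(0.975 * n_boot) - 1]

# Hypothetical calibration set: two emission-line intensities per spectrum,
# both proportional to the surrogate concentration, plus measurement noise.
rng = random.Random(1)
conc = [0.5 * k for k in range(1, 31)]  # arbitrary concentration units
X = [[2.0 * c + rng.gauss(0, 0.1), 0.5 * c + rng.gauss(0, 0.1)] for c in conc]
lo, hi = bootstrap_interval(X, conc, [10.0, 2.5])  # noise-free features of a sample at 5 units
print(f"predicted concentration, 95% bootstrap interval: [{lo:.2f}, {hi:.2f}]")
```
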
The results are expected to lead to at least two or three articles in peer-reviewed journals, and possibly to patent filings. In the longer term, the aim is to increase the maturity of the method and instrumentation and to move gradually towards implementation on a pilot line representative of a reprocessing process.

Generative AI for Robust Uncertainty Quantification in Astrophysical Inverse Problems

Context
Inverse problems, i.e. estimating underlying signals from corrupted observations, are ubiquitous in astrophysics, and our ability to solve them accurately is critical to the scientific interpretation of the data. Examples of such problems include inferring the distribution of dark matter in the Universe from gravitational lensing effects [1], or component separation in radio interferometric imaging [2].

Thanks to recent advances in deep learning, and in particular in deep generative modeling (e.g. diffusion models), it is now possible not only to estimate the solution of these inverse problems but also to perform uncertainty quantification by estimating the full Bayesian posterior, i.e. characterizing all the solutions that are compatible with the data and plausible under prior knowledge.

In particular, our team has pioneered such Bayesian methods, which combine knowledge of the physics of the problem, in the form of an explicit likelihood term, with data-driven priors implemented as generative models. This physics-constrained approach ensures that solutions remain compatible with the data and prevents the “hallucinations” that plague many generative AI applications.
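
The combination of an explicit likelihood with a data-driven prior can be sketched with the unadjusted Langevin algorithm, which only needs the gradient of the log posterior, i.e. the sum of the likelihood score and the prior score. The example below is a toy under stated assumptions: a scalar linear problem y = a x + n with Gaussian noise, and an analytic Gaussian prior score standing in for a learned generative model (such as the score of a diffusion model). All names and values are hypothetical. Because everything is Gaussian here, the exact posterior (mean 0.8, variance 0.2 for these settings) is known and serves as a check.

```python
import random

def likelihood_score(x, y, a=1.0, sigma=0.5):
    # grad_x log N(y; a x, sigma^2): the explicit, physics-based likelihood term
    return a * (y - a * x) / sigma**2

def prior_score(x, tau=1.0):
    # grad_x log N(x; 0, tau^2): hypothetical stand-in for a learned score network
    return -x / tau**2

def langevin_posterior(y, n_steps=60000, burn_in=10000, step=0.01, seed=0):
    """Unadjusted Langevin algorithm targeting p(x | y), proportional to p(y | x) p(x)."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for t in range(n_steps):
        grad = likelihood_score(x, y) + prior_score(x)  # posterior score
        x = x + step * grad + (2 * step) ** 0.5 * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            samples.append(x)
    return samples

samples = langevin_posterior(y=1.0)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"posterior mean ~ {mean:.3f} (exact 0.8), variance ~ {var:.3f} (exact 0.2)")
```

Without a Metropolis correction, Langevin sampling has a small step-size-dependent bias; with a learned prior, the quality of the score estimate also limits the accuracy of the samples.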

However, despite remarkable progress in recent years, several challenges remain in this framework, most notably:

[Imperfect or distributionally shifted prior data] Building data-driven priors typically requires access to examples of uncorrupted data, which in many cases do not exist (e.g. all astronomical images are observed with noise and some amount of blurring), or which may exist but exhibit distribution shifts relative to the problems to which we would like to apply the prior.
This mismatch can bias estimations and lead to incorrect scientific conclusions. Therefore, the adaptation, or calibration, of data-driven priors from incomplete and noisy observations becomes crucial for working with real data in astrophysical applications.

[Efficient sampling of high-dimensional posteriors] Even when the likelihood and the data-driven prior are available, sampling correctly and efficiently from non-convex, multimodal probability distributions in such high dimensions remains challenging. The most effective methods to date are based on diffusion models, but they rely on approximations and can be expensive at inference time when accurate posterior estimates are required.

The stringent requirements of scientific applications are a powerful driver for improved methodologies; beyond the astrophysical context motivating this research, these tools are also broadly applicable in other domains, including medical imaging [3].

PhD project
The candidate will address these limitations of current methodologies, with the overall goal of making uncertainty quantification for large-scale inverse problems faster and more accurate.
As a first research direction, we will extend recent methodology developed concurrently by our team and our Ciela collaborators [4,5], based on Expectation-Maximization, to iteratively learn (or adapt) diffusion-based priors from data observed under some amount of corruption. This strategy has been shown to correct distribution shifts in the prior effectively (thereby yielding well-calibrated posteriors). However, it remains expensive, as it requires iteratively solving inverse problems and retraining the diffusion models, and it depends critically on the quality of the inverse problem solver. We will explore several strategies, including variational inference and improved inverse problem sampling, to address these issues.
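
The Expectation-Maximization idea can be illustrated in a setting where every step is exact. In the toy sketch below, which is not the method of [4,5], the prior is a one-dimensional Gaussian N(mu, s2) rather than a diffusion model, and the observations are y = x + n with known noise variance, so the E-step posterior is available in closed form. In [4,5] this step instead requires solving inverse problems with the current diffusion prior, which is what makes each iteration expensive. The function name em_gaussian_prior and all numerical values are hypothetical.

```python
import random

def em_gaussian_prior(y, noise_var, n_iter=200):
    """EM for a Gaussian prior N(mu, s2) on x, observed only through y = x + n, n ~ N(0, noise_var)."""
    mu, s2 = 0.0, 1.0  # arbitrary initial prior
    for _ in range(n_iter):
        # E-step: Gaussian posterior p(x_i | y_i) under the current prior
        post_var = s2 * noise_var / (s2 + noise_var)
        post_mean = [(s2 * yi + noise_var * mu) / (s2 + noise_var) for yi in y]
        # M-step: refit the prior to the posterior moments
        mu = sum(post_mean) / len(y)
        s2 = sum((m - mu) ** 2 for m in post_mean) / len(y) + post_var
    return mu, s2

rng = random.Random(0)
true_mu, true_s2, noise_var = 2.0, 1.0, 0.25
x = [rng.gauss(true_mu, true_s2 ** 0.5) for _ in range(5000)]  # never observed directly
y = [xi + rng.gauss(0.0, noise_var ** 0.5) for xi in x]          # corrupted observations
mu_hat, s2_hat = em_gaussian_prior(y, noise_var)
print(f"learned prior: mean {mu_hat:.2f}, variance {s2_hat:.2f} (true 2.0, 1.0)")
```

Fitting the prior directly to the noisy observations would overestimate its variance (here by the noise variance, 0.25); the EM iterations remove this bias by accounting for the corruption process.
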
As a second, connected direction, we will develop general methodologies for sampling complex posteriors (multimodal, or with complex geometries) of non-linear inverse problems. Specifically, we will investigate strategies based on posterior annealing, inspired by diffusion model sampling, that apply when explicit likelihoods and priors are available.
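
One way to picture annealing inspired by diffusion sampling is annealed Langevin dynamics: the target is convolved with Gaussian noise of decreasing standard deviation, and Langevin steps are run at each level, so that chains can move between modes while the landscape is still smooth. The sketch below assumes a one-dimensional, two-mode Gaussian mixture as the "posterior", for which the smoothed score is analytic; for real inverse problems this score is not available in closed form, which is where approximations enter. It illustrates the general idea only, not the strategy the project will develop; all names, weights and schedule settings are hypothetical.

```python
import math
import random

WEIGHTS, MEANS, SD = (0.3, 0.7), (-3.0, 3.0), 0.5  # toy two-mode "posterior"

def smoothed_score(x, noise_sd):
    """Score of the target convolved with N(0, noise_sd^2): each mode's variance grows to SD^2 + noise_sd^2."""
    v = SD**2 + noise_sd**2
    logs = [math.log(w) - (x - m) ** 2 / (2 * v) for w, m in zip(WEIGHTS, MEANS)]
    top = max(logs)
    resp = [math.exp(lg - top) for lg in logs]  # unnormalized mode responsibilities
    z = sum(resp)
    return sum(r / z * (m - x) / v for r, m in zip(resp, MEANS))

def langevin(x, noise_sd, n_steps, rng):
    """Unadjusted Langevin steps on the smoothed target; step scaled to the local mode variance."""
    step = 0.1 * (SD**2 + noise_sd**2)
    for _ in range(n_steps):
        x += step * smoothed_score(x, noise_sd) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
    return x

def annealed_chain(rng, sigma_max=5.0, sigma_min=0.01, n_levels=20, steps_per_level=50):
    """Run Langevin at geometrically decreasing noise levels, from sigma_max to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1 / (n_levels - 1))
    x = rng.gauss(0.0, sigma_max)
    for k in range(n_levels):
        x = langevin(x, sigma_max * ratio**k, steps_per_level, rng)
    return x

rng = random.Random(0)
annealed = [annealed_chain(rng) for _ in range(300)]
plain = [langevin(-3.0, 0.0, 300, rng) for _ in range(100)]  # no annealing, started in the left mode
frac_annealed = sum(x > 0 for x in annealed) / len(annealed)
frac_plain = sum(x > 0 for x in plain) / len(plain)
print(f"fraction of chains in the right mode (weight 0.7): annealed {frac_annealed:.2f}, plain {frac_plain:.2f}")
```

Started in one mode, plain Langevin chains stay there, whereas the annealed chains populate both modes, with the heavier mode more populated; how closely the fractions match the true weights depends on the annealing schedule.
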
Finally, we will apply these methodologies to challenging, high-impact inverse problems in astrophysics. In particular, in collaboration with our colleagues from the Ciela institute, we will aim to improve source and lens reconstruction for strong gravitational lensing systems.
Publications in top machine learning conferences are expected (NeurIPS, ICML), as well as publications of the applications of these methodologies in astrophysical journals.

References
[1] Benjamin Remy, François Lanusse, Niall Jeffrey, Jia Liu, Jean-Luc Starck, Ken Osato, Tim Schrabback, Probabilistic Mass Mapping with Neural Score Estimation, Astronomy & Astrophysics, 2023, https://www.aanda.org/articles/aa/abs/2023/04/aa43054-22/aa43054-22.html

[2] Tobías I Liaudat, Matthijs Mars, Matthew A Price, Marcelo Pereyra, Marta M Betcke, Jason D McEwen, Scalable Bayesian uncertainty quantification with data-driven priors for radio interferometric imaging, RAS Techniques and Instruments, Volume 3, Issue 1, January 2024, Pages 505–534, https://doi.org/10.1093/rasti/rzae030

[3] Zaccharie Ramzi, Benjamin Remy, François Lanusse, Jean-Luc Starck, Philippe Ciuciu, Denoising Score-Matching for Uncertainty Quantification in Inverse Problems, https://arxiv.org/abs/2011.08698

[4] François Rozet, Gérôme Andry, François Lanusse, Gilles Louppe, Learning Diffusion Priors from Observations by Expectation Maximization, NeurIPS 2024, https://arxiv.org/abs/2405.13712

[5] Gabriel Missael Barco, Alexandre Adam, Connor Stone, Yashar Hezaveh, Laurence Perreault-Levasseur, Tackling the Problem of Distributional Shifts: Correcting Misspecified, High-Dimensional Data-Driven Priors for Inverse Problems, https://arxiv.org/abs/2407.17667

Prediction of Soiling on PV modules/systems through Real-World Environment Modeling and Data Fusion

Photovoltaic (PV) systems, particularly those installed in regions prone to soiling such as arid areas, coastal sites, and agricultural zones, can experience energy losses of up to 20–30% annually. These losses translate to financial impacts exceeding €10 billion in 2023.
This thesis aims to develop a robust and comprehensive method to predict soiling accumulation on PV modules and systems by combining real-world environmental modeling with operational PV data (electrical, thermal, optical). The research will follow a bottom-up approach in three stages:

1. Component/Module Level: Reproduction and modeling of soiling accumulation in laboratory conditions, followed by experimental validation. This stage will leverage the CEA’s expertise in degradation modeling, including accelerated testing.

2. Module/System Level: Implementation of monitoring campaigns to collect meteorological, operational, and imaging data, combined with field soiling tests on a pilot site. The data will validate and enhance CEA diagnostic tools by introducing innovative features such as AI-driven soiling propagation prediction.

3. System/Operational Level: Validation of the proposed method on commercial PV modules in utility-scale PV plants, aiming to demonstrate scalability and real-world applicability.
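
As a point of reference for the modeling in stage 1, a very simple soiling model assumes that losses accumulate linearly on dry days and that sufficiently heavy rain fully cleans the modules, which gives a sawtooth soiling ratio (SR = 1 for a clean module). The sketch below implements this baseline; the accumulation rate (0.2 % per day), the 5 mm rain threshold and the rain record are hypothetical values, and the data-fusion approach of the thesis would replace them with quantities estimated from environmental and operational data.

```python
def soiling_ratio(rain_mm, rate_per_day=0.002, rain_threshold_mm=5.0, cleaned_ratio=1.0):
    """Daily soiling ratio SR (1 = clean) under a linear-accumulation, rain-reset model."""
    sr, series = cleaned_ratio, []
    for rain in rain_mm:
        if rain >= rain_threshold_mm:
            sr = cleaned_ratio  # a heavy enough rain event is assumed to clean the modules
        else:
            sr = max(0.0, sr - rate_per_day)  # losses grow linearly on dry days
        series.append(sr)
    return series

rain = [0.0] * 30 + [12.0] + [0.0] * 29  # hypothetical 60-day record with one rain event
sr = soiling_ratio(rain)
loss = 1 - sum(sr) / len(sr)  # mean relative energy loss due to soiling over the period
print(f"SR before rain: {sr[29]:.3f}, after rain: {sr[30]:.3f}, mean loss: {loss:.1%}")
```

For this record the mean soiling ratio is 0.97, i.e. an average loss of 3 % over the period; comparing such a loss with the cost of a cleaning operation is the kind of trade-off that cleaning optimization addresses.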

The outcomes of this thesis will contribute to the development of an innovative tool/method for comprehensive soiling diagnostics and prognostics in PV installations, enabling the minimization of energy losses while anticipating and optimizing cleaning strategies for PV plants.

Embedded systems for natural acoustic signal analysis while preserving privacy

This PhD topic aims to develop embedded systems that record and analyze natural acoustic signals. For deployment in cities, privacy becomes an issue: how can a satisfactory level of analysis be maintained without ever recording or transmitting human voices?
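
One possible privacy-by-design direction, given here only as a hypothetical sketch, is to compute coarse spectral features on the device and discard the raw audio frame by frame, so that only feature vectors are ever stored or transmitted. The example uses the Goertzel algorithm, which computes the power of a few selected DFT bins at low cost on small processors. The frame length, the chosen frequencies and the function names are assumptions; whether such features are coarse enough to make speech unintelligible while remaining informative for natural sounds would have to be evaluated.

```python
import math

def goertzel_power(frame, sample_rate, freq):
    """Power of one DFT bin (the bin nearest to freq) computed with the Goertzel recurrence."""
    n = len(frame)
    k = round(n * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2

def band_features(samples, sample_rate, frame_len, freqs):
    """Yield one coarse feature vector per frame; the raw frame is not stored or returned."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        yield [goertzel_power(frame, sample_rate, f) for f in freqs]

# Hypothetical usage: a 1 kHz tone sampled at 16 kHz, analysed in 256-sample frames.
fs, freqs = 16000, [250, 500, 1000, 2000, 4000]
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs // 10)]
features = list(band_features(tone, fs, 256, freqs))
loudest = freqs[max(range(len(freqs)), key=lambda i: features[0][i])]
print(len(features), loudest)
```
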

Learning Interpretable Models for Stress Corrosion of Stainless Steels Exposed in the Primary Environment of PWRs

Stress corrosion cracking (SCC) of austenitic alloys in water-cooled nuclear reactors is one of the most significant component degradation phenomena. SCC results from the synergistic effects of tensile stress, environment and material susceptibility, and understanding this mechanism is essential for extending reactor lifetime. SCC is most often investigated experimentally, which requires lengthy and costly tests lasting several thousand hours. Furthermore, the large number of critical parameters that influence SCC susceptibility, together with their coupling effects, has made test matrices increasingly long and complex. This thesis proposes a novel approach based on interpretable models built with fuzzy logic, an artificial intelligence technique. The aim is to reduce the duration and cost of research by focusing on the tests and parameters most relevant to improving performance in the primary environment. The key challenge will be to combine artificial intelligence with the experimental approach in order to define susceptibility domains for SCC crack initiation as a function of the critical parameters identified by the model, and to provide data for the development of new materials produced by additive manufacturing. The thesis will develop a numerical model that can guide decision-making regarding the stress corrosion mechanism. The PhD student will also carry out experimental work to validate this new numerical approach.
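
To give an idea of what an interpretable fuzzy-logic model looks like, the sketch below implements zero-order Takagi-Sugeno inference with triangular membership functions and a readable rule table. The two inputs (applied stress and cold work, both normalized to [0, 1]), the three linguistic levels and the susceptibility values in the rule table are purely hypothetical placeholders, not SCC knowledge; in the thesis, inputs and rules would come from the identified critical parameters and the experimental data.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at b (a == b or b == c gives a shoulder)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

LEVELS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}

# Hypothetical rule base: RULES[stress_level][cold_work_level] = susceptibility index in [0, 1].
RULES = {
    "low":    {"low": 0.0, "medium": 0.2, "high": 0.4},
    "medium": {"low": 0.2, "medium": 0.5, "high": 0.7},
    "high":   {"low": 0.4, "medium": 0.7, "high": 1.0},
}

def susceptibility(stress, cold_work):
    """Zero-order Takagi-Sugeno inference on inputs clipped to [0, 1]."""
    stress, cold_work = min(max(stress, 0.0), 1.0), min(max(cold_work, 0.0), 1.0)
    num = den = 0.0
    for s_name, s_mf in LEVELS.items():
        for c_name, c_mf in LEVELS.items():
            strength = min(tri(stress, *s_mf), tri(cold_work, *c_mf))  # fuzzy AND = min
            num += strength * RULES[s_name][c_name]
            den += strength
    return num / den  # weighted average of the rule outputs

print(susceptibility(0.0, 0.0), susceptibility(0.5, 0.5), susceptibility(1.0, 1.0), susceptibility(0.75, 0.25))
```

Each output can be traced back to the rules that fired and their strengths; for example, the last call combines four rules with equal weight, which is what makes this type of model interpretable.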

Scalable NoC-based Programmable Cluster Architecture for future AI applications

Context
Artificial Intelligence (AI) has emerged as a major field impacting various sectors, including healthcare, automotive, robotics, and more. Hardware architectures must now meet increasingly demanding requirements in terms of computational power, low latency, and flexibility. Network-on-Chip (NoC) technology is a key enabler in addressing these challenges, providing efficient and scalable interconnections within multiprocessor systems. However, despite its benefits, designing NoCs poses significant challenges, particularly in optimizing latency, energy consumption, and scalability.
Programmable cluster architectures hold great promise for AI as they enable resource adaptation to meet the specific needs of deep learning algorithms and other compute-intensive AI applications. By combining the modularity of clusters with the advantages of NoCs, it becomes possible to design systems capable of handling ever-increasing AI workloads while ensuring maximum energy efficiency and flexibility.
Summary of the Thesis Topic
This PhD project aims to design a scalable, programmable cluster architecture based on a Network-on-Chip tailored for future AI applications. The primary objective will be to design and optimize a NoC architecture capable of meeting the high demands of AI applications in terms of intensive computing and efficient data transfer between processing clusters.
The research will focus on the following key areas:
1. NoC Architecture Design: Developing a scalable and programmable NoC to effectively connect various AI processing clusters.
2. Performance and Energy Efficiency Optimization: Defining mechanisms to optimize system latency and energy consumption based on the nature of AI workloads.
3. Cluster Flexibility and Programmability: Proposing a modular and programmable architecture that dynamically allocates resources based on the specific needs of each AI application.
4. Experimental Evaluation: Implementing and testing prototypes of the proposed architecture to validate its performance on real-world use cases, such as image classification, object detection, and real-time data processing.
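
As a baseline for the NoC design work, the sketch below implements dimension-ordered (XY) routing on a 2D mesh, a common deadlock-free choice, together with a first-order zero-load latency model for wormhole switching. The 4 x 4 mesh, the router and link delays and the function names are hypothetical; the architecture developed in the thesis may use other topologies and routing schemes.

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) routing on a 2D mesh: travel along X first, then along Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

def zero_load_latency(hops, flits, router_cycles=2, link_cycles=1):
    """Simplified wormhole zero-load latency in cycles: per-hop head delay plus tail serialization."""
    return hops * (router_cycles + link_cycles) + (flits - 1)

k = 4  # hypothetical 4 x 4 mesh of processing clusters
nodes = [(x, y) for x in range(k) for y in range(k)]
hops = [len(xy_route(s, d)) - 1 for s in nodes for d in nodes]
avg_hops = sum(hops) / len(hops)
path = xy_route((0, 0), (2, 1))
latency = zero_load_latency(len(path) - 1, flits=4)
print(path, avg_hops, latency)
```

For uniform traffic over all source/destination pairs of a 4 x 4 mesh (including a node sending to itself), the average path length is 2.5 hops; workload-aware mechanisms such as those targeted in item 2 would aim to reduce the effective latency below such a baseline.
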
The outcomes of this research may contribute to the development of cutting-edge embedded systems and AI solutions optimized for the next generation of AI applications and algorithms.

The work performed during this thesis will be presented at international conferences and published in scientific journals. Some results may be patented.

Hardware-aware Optimizations for Efficient Generative AI with Mamba Networks

Generative AI has the potential to transform many industries. However, current state-of-the-art models such as transformers face significant computational and memory-efficiency challenges, especially when deployed on resource-constrained hardware. This PhD research aims to address these limitations through hardware-aware optimization of Mamba networks. Mamba networks, built on selective state space models, offer a promising alternative: instead of self-attention, whose cost grows quadratically with sequence length, they use an input-dependent recurrent scan whose cost grows linearly with sequence length and whose state has a fixed size during generation. As a result, they can generate high-quality data with significantly lower resource demands. The research will focus on hardware-aware optimizations that improve the efficiency of Mamba networks, making them suitable for real-time applications and edge devices. This includes reducing training and inference times and exploring potential hardware acceleration. The goal is to advance the practical deployment of generative AI in resource-constrained domains, contributing to its broader adoption and impact.
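
The core of a Mamba layer is a selective state space recurrence, in which the step size delta_t and the matrices B_t and C_t depend on the current input. The single-channel sketch below shows why the cost grows linearly with sequence length and why the state has a fixed size. It is a simplification under stated assumptions: it omits the convolution, gating and multi-channel projections of a full Mamba block, and it runs the recurrence as a plain sequential loop rather than the hardware-aware parallel scan that optimized implementations rely on. Parameter values and function names are hypothetical.

```python
import math
import random

def softplus(z):
    """Numerically stable softplus, log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def selective_scan(xs, a, w_b, w_c, w_delta, b_delta):
    """Single-channel selective SSM recurrence: O(L) time in sequence length, O(N) state.

    a: negative diagonal state matrix (length N); w_b, w_c: projections producing the
    input-dependent B_t and C_t; w_delta, b_delta: projection producing the step size delta_t.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # input-dependent step size
        b_t = [wb * x for wb in w_b]             # input-dependent B_t
        c_t = [wc * x for wc in w_c]             # input-dependent C_t
        # zero-order hold for A, simplified (Euler) discretization for B
        h = [math.exp(delta * ai) * hi + delta * bi * x for ai, hi, bi in zip(a, h, b_t)]
        ys.append(sum(ci * hi for ci, hi in zip(c_t, h)))
    return ys

rng = random.Random(0)
N = 4  # hypothetical state size
params = dict(
    a=[-(n + 1.0) for n in range(N)],
    w_b=[rng.gauss(0, 1) for _ in range(N)],
    w_c=[rng.gauss(0, 1) for _ in range(N)],
    w_delta=0.5,
    b_delta=-1.0,
)
y1 = selective_scan([1.0, 2.0, 3.0, 4.0], **params)
y2 = selective_scan([1.0, 2.0, 3.0, -4.0], **params)
print(y1[:3] == y2[:3])  # changing the last input leaves earlier outputs unchanged
```

The final check illustrates causality: only the current input and a fixed-size state are needed at each step, which is what allows constant-memory, step-by-step generation.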

Fast parameter inference of gravitational waves for the LISA space mission

Context
In 2016, the announcement of the first direct detection of gravitational waves ushered in an era in which the universe can be probed in an unprecedented way. At the same time, the full success of the LISA Pathfinder mission validated some of the technologies selected for the LISA (Laser Interferometer Space Antenna) project. The year 2024 began with the adoption of the LISA mission by the European Space Agency (ESA), in partnership with NASA. This unprecedented space-based gravitational wave observatory will consist of three satellites 2.5 million kilometres apart and will enable the direct detection of gravitational waves at frequencies inaccessible to ground-based interferometers. ESA plans a launch in 2035.
In parallel with the technical aspects, the LISA mission introduces several data analysis challenges that must be addressed for the mission to succeed. The mission needs to demonstrate, using simulations, that the scientific community will be able to identify and characterise the detected gravitational wave signals. Data analysis involves several stages, one of which is the rapid analysis pipeline, whose role is to detect new events and characterise them. The latter task involves rapidly estimating the sky position of the gravitational wave source and its characteristic time, such as the coalescence time of a black hole merger.
These analysis tools form the low-latency analysis pipeline. Beyond its value for LISA itself, this pipeline plays a vital role in enabling multi-messenger astronomy, which relies on the rapid follow-up of detected events by electromagnetic observatories (ground-based or space-based, from radio waves to gamma rays).

PhD project
The PhD project focuses on the development of event detection and identification tools for the low-latency alert pipeline (LLAP) of LISA. This pipeline will be an essential part of the LISA analysis workflow, providing rapid detection of massive black hole binaries together with fast and accurate estimates of the sources’ sky localization and coalescence time. This information is key for multi-messenger follow-up and for the global analysis of the LISA data.
While rapid analysis methods have been developed for ground-based interferometers, the case of space-based interferometers such as LISA remains largely unexplored. Suitable data processing will have to account for the fact that data are transmitted in packets, which makes it necessary to detect events from incomplete data. Working with data affected by artefacts such as glitches or missing data packets, these methods should enable the detection, discrimination and analysis of various sources: black hole mergers, EMRIs (extreme mass-ratio inspirals), bursts, and compact-object binaries. A final and crucial source of complexity is the speed of analysis, which places a strong constraint on the methods to be developed.
To this end, the thesis will tackle the following problems:
1. The fast parameter inference of the gravitational waves, notably the sky position and the coalescence time. Two of the main difficulties lie in the multimodality of the posterior probability distribution of the target parameters and in the stringent computing-time requirements. To address them, we will consider several advanced inference strategies, including:
(a) Using gradient-based sampling algorithms like Langevin diffusions or Hamiltonian Monte Carlo methods adapted to LISA’s gravitational wave problem,
(b) Using machine learning-assisted methods to accelerate the sampling (e.g. normalising flows),
(c) Using variational inference techniques.
2. The early detection of black hole mergers.
3. The increasing complexity of LISA data, including, among others, realistic noise, realistic instrument response, glitches, data gaps, and overlapping sources.
4. The online processing of the incoming 5-minute data packets with the developed fast inference framework.
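
Strategy 1(a) can be illustrated with a minimal Hamiltonian Monte Carlo sampler: leapfrog integration of Hamiltonian dynamics followed by a Metropolis accept/reject step. The sketch samples a correlated 2D Gaussian that stands in for two strongly coupled parameters; it involves no LISA waveforms or noise models, and all names and settings are hypothetical. Plain HMC explores a single mode well but does not by itself solve the multimodality problem mentioned above, which is one motivation for strategies (b) and (c).

```python
import math
import random

# Toy target: a correlated 2D Gaussian standing in for two strongly coupled parameters.
MU = (1.0, -2.0)
SD = (1.0, 0.5)
RHO = 0.9

def potential(q):
    """Negative log posterior (up to a constant)."""
    z0, z1 = (q[0] - MU[0]) / SD[0], (q[1] - MU[1]) / SD[1]
    return (z0 * z0 - 2 * RHO * z0 * z1 + z1 * z1) / (2 * (1 - RHO**2))

def grad_potential(q):
    z0, z1 = (q[0] - MU[0]) / SD[0], (q[1] - MU[1]) / SD[1]
    c = 1.0 / (1 - RHO**2)
    return [c * (z0 - RHO * z1) / SD[0], c * (z1 - RHO * z0) / SD[1]]

def hmc_step(q, step, n_leapfrog, rng):
    """One HMC transition: leapfrog integration followed by a Metropolis correction."""
    p0 = [rng.gauss(0.0, 1.0) for _ in q]
    q_new, p = list(q), list(p0)
    for _ in range(n_leapfrog):
        g = grad_potential(q_new)
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]    # half kick
        q_new = [qi + step * pi for qi, pi in zip(q_new, p)]  # drift
        g = grad_potential(q_new)
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]    # half kick
    h_old = potential(q) + 0.5 * sum(pi * pi for pi in p0)
    h_new = potential(q_new) + 0.5 * sum(pi * pi for pi in p)
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new, True
    return q, False

rng = random.Random(42)
q, samples, accepted = [0.0, 0.0], [], 0
for t in range(5500):
    # a random number of leapfrog steps avoids periodic, poorly mixing trajectories
    q, ok = hmc_step(q, 0.1, rng.randint(10, 30), rng)
    accepted += ok
    if t >= 500:  # discard burn-in
        samples.append(q)

n = len(samples)
m0 = sum(s[0] for s in samples) / n
m1 = sum(s[1] for s in samples) / n
v0 = sum((s[0] - m0) ** 2 for s in samples) / n
v1 = sum((s[1] - m1) ** 2 for s in samples) / n
corr = sum((s[0] - m0) * (s[1] - m1) for s in samples) / (n * math.sqrt(v0 * v1))
print(f"means ({m0:.2f}, {m1:.2f}), correlation {corr:.2f}, acceptance {accepted / 5500:.2f}")
```
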
This thesis will rely on Bayesian and statistical methods for data analysis and on machine learning. However, an effort on the physics side is also necessary, both to understand the simulations and the different waveforms considered (with their underlying assumptions) and to interpret the results in terms of the detectability of black hole merger signals in the rapid analysis of LISA data.

Spectrometry and Artificial Intelligence: development of explainable, sober and reliable AI models for materials analysis

The discovery of new materials is crucial to meeting many current societal challenges. One of the pillars of this discovery capacity is the availability of characterization methods that are rapid and reliable, and whose measurement uncertainties are assessed and, ideally, quantified.

This PhD project is part of this approach and aims to significantly improve the various ion beam analysis (IBA) techniques using advanced artificial intelligence (AI) methods, by developing explainable, frugal (i.e. resource-efficient) and reliable AI models for materials analysis.
The PhD project proposed here has three main objectives:

- Develop an uncertainty model using probabilistic machine learning techniques in order to quantify the uncertainties associated with a prediction.
- Given the very large number of possible combinatorially generated configurations, it is important to understand the intrinsic dimensionality of the problem. We wish to implement massive dimensionality reduction, in particular non-linear methods such as autoencoders, together with PIML (Physics-Informed Machine Learning) concepts.
- Evaluate the possibility of generalization of this methodology to other spectroscopic techniques.
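
A first, linear look at intrinsic dimensionality can be obtained with principal component analysis: the fraction of variance explained by the leading components indicates how many degrees of freedom the spectra really have. The sketch below, a baseline rather than the autoencoder or PIML approaches targeted by the project, estimates the leading eigenvalues of the covariance matrix by power iteration with deflation. The synthetic spectra (two independent peaks plus noise) and the function names are hypothetical.

```python
import math
import random

def covariance(rows):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cen = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in cen) / (n - 1) for j in range(d)] for i in range(d)]

def top_eigenvalues(mat, k, n_iter=300, seed=0):
    """Leading eigenvalues of a symmetric PSD matrix by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(mat)
    a = [row[:] for row in mat]
    values = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(n_iter):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))  # Rayleigh quotient
        values.append(lam)
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]  # remove found component
    return values

# Hypothetical spectra: 30 channels, two independent Gaussian peaks with random amplitudes, plus noise.
rng = random.Random(1)
channels = range(30)
peak1 = [math.exp(-((i - 8) ** 2) / 8.0) for i in channels]
peak2 = [math.exp(-((i - 22) ** 2) / 8.0) for i in channels]
spectra = []
for _ in range(200):
    a1, a2 = rng.random(), rng.random()
    spectra.append([a1 * p + a2 * r + rng.gauss(0.0, 0.01) for p, r in zip(peak1, peak2)])
cov = covariance(spectra)
total = sum(cov[i][i] for i in range(len(cov)))
explained = [lam / total for lam in top_eigenvalues(cov, 3)]
print([round(e, 3) for e in explained])
```

Here two components explain nearly all of the variance, matching the two independent peaks; non-linear structure, which PCA cannot capture, is what motivates autoencoders.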
