Towards a better understanding of membrane proteins through AI

Despite the remarkable advances in artificial intelligence (AI), particularly with tools like AlphaFold, the prediction of membrane protein structures remains a major challenge in structural biology. These proteins, which represent 30% of the proteome and 60% of therapeutic targets, are still significantly underrepresented in the Protein Data Bank (PDB), accounting for only 3% of resolved structures. This scarcity stems from the difficulty of maintaining their native state, which requires an amphiphilic environment and complicates their study, especially with classical structural techniques.

This PhD project aims to overcome these challenges by combining the predictive capabilities of AlphaFold with experimental small-angle scattering (SAXS/SANS) data acquired under physiological conditions. The study will focus on the translocator protein TSPO, a key neuroimaging marker for several serious pathologies (cancers, neurodegenerative diseases) thanks to its strong affinity for various pharmacological ligands.

The work will involve predicting the structure of TSPO in both the presence and absence of ligands, acquiring SAXS/SANS data on the TSPO/amphiphile complex, and refining the models using advanced modeling tools (MolPlay, Chai-1) and molecular dynamics simulations. By deepening the understanding of TSPO's structure and function, this project could contribute to the design of new ligands for diagnostic and therapeutic purposes.
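To give an idea of how a predicted model can be confronted with scattering data, here is a minimal sketch, assuming a coarse-grained bead model extracted from an AlphaFold prediction and a measured curve with error bars; the function names and the unit form factors are illustrative simplifications, not the project's actual tools:

```python
import numpy as np

def debye_saxs(coords, q_values):
    """Theoretical SAXS curve I(q) from bead coordinates via the Debye formula,
    I(q) = sum_ij f_i f_j sin(q r_ij) / (q r_ij), here with unit form factors."""
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise bead distances
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(qr/pi) = sin(qr)/(qr);
    # it also handles the r = 0 diagonal terms correctly (limit = 1).
    return np.array([np.sinc(q * r / np.pi).sum() for q in q_values])

def chi2(i_model, i_exp, sigma):
    """Reduced chi-square between model and experiment, with the overall
    scale factor fitted analytically by least squares."""
    scale = np.sum(i_model * i_exp / sigma**2) / np.sum(i_model**2 / sigma**2)
    return np.mean(((scale * i_model - i_exp) / sigma) ** 2)

# Toy usage: score a random 100-bead model against a fake noisy measurement.
rng = np.random.default_rng(0)
coords = rng.normal(scale=2.0, size=(100, 3))       # nm
q = np.linspace(0.05, 3.0, 60)                      # nm^-1
i_exp = debye_saxs(coords, q) * (1 + 0.01 * rng.normal(size=q.size))
print("chi2 of the generating model:", chi2(debye_saxs(coords, q), i_exp, 0.01 * i_exp))
```

In the actual refinement loop, candidate conformations from molecular dynamics would be ranked by such a goodness-of-fit score against the measured SAXS/SANS curves.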

Spectrometry and Artificial Intelligence: development of explainable, frugal and reliable AI models for materials analysis

The discovery of new materials is crucial to meeting many current societal challenges. One pillar of this discovery capability is having characterization methods that are rapid, reliable, and whose measurement uncertainties are qualified, or even quantified.

This PhD project is part of this approach and aims to significantly improve ion beam analysis (IBA) spectrometry techniques using advanced artificial intelligence (AI) methods, by developing explainable, frugal and reliable AI models for materials analysis.
The PhD project proposed here has three main objectives:

- Develop an uncertainty model using probabilistic machine learning techniques in order to quantify the uncertainties associated with a prediction (a minimal sketch follows this list).
- Due to the very large number of possible combinatorially generated configurations, it is important to understand the intrinsic dimensionality of the problem. We wish to implement massive dimensionality reduction, in particular non-linear methods such as autoencoders, as well as PIML (Physics-Informed Machine Learning) concepts.
- Evaluate whether this methodology can be generalized to other spectroscopic techniques.
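As an illustration of the first objective, a Gaussian process regressor provides a per-prediction uncertainty out of the box; the sketch below uses random stand-ins for reduced IBA spectra, so all data and dimensions are hypothetical:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical training set: 16-dimensional reduced spectra -> one material
# parameter (all values here are random stand-ins for real IBA data).
X_train = rng.normal(size=(200, 16))
y_train = X_train[:, 0] + 0.1 * rng.normal(size=200)

# An RBF kernel plus a white-noise term that absorbs measurement noise.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_train, y_train)

# Each prediction comes with its own 1-sigma predictive uncertainty.
mean, std = gp.predict(rng.normal(size=(5, 16)), return_std=True)
for m, s in zip(mean, std):
    print(f"prediction: {m:+.3f} +/- {s:.3f}")
```

The white-noise kernel term separates measurement noise from model uncertainty, which is one way to make the reported error bars interpretable.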

Fast parameter inference of gravitational waves for the LISA space mission

Context
In 2016, the announcement of the first direct detection of gravitational waves ushered in an era in which the universe can be probed in an unprecedented way. At the same time, the complete success of the LISA Pathfinder mission validated key technologies selected for the LISA (Laser Interferometer Space Antenna) project. The year 2024 started with the adoption of the LISA mission by the European Space Agency (ESA), in partnership with NASA. This unprecedented gravitational-wave space observatory will consist of three satellites 2.5 million kilometres apart and will enable the direct detection of gravitational waves at frequencies undetectable by terrestrial interferometers. ESA plans a launch in 2035.
In parallel with the technical aspects, the LISA mission introduces several data analysis challenges that need to be addressed for the mission's success. The mission must demonstrate, through simulations, that the scientific community will be able to identify and characterise the detected gravitational-wave signals. Data analysis involves various stages, one of which is the rapid analysis pipeline, whose role is to detect new events and characterise them. The latter involves rapidly estimating the sky position of the gravitational-wave source and its characteristic time, such as the coalescence time for a black-hole merger.
These analysis tools form the low-latency analysis pipeline. Beyond its interest for LISA itself, this pipeline plays a vital role in enabling multi-messenger astronomy, which relies on rapid electromagnetic follow-up of detected events by ground-based or space-based observatories, from radio waves to gamma rays.

PhD project
The PhD project focuses on the development of event detection and identification tools for the low-latency alert pipeline (LLAP) of LISA. This pipeline will be an essential part of the LISA analysis workflow, providing rapid detection of massive black hole binaries, as well as fast and accurate estimation of the sources' sky localization and coalescence time. This information is key for multi-messenger follow-up as well as for the global analysis of LISA data.
While rapid analysis methods have been developed for ground-based interferometers, the case of space-based interferometers such as LISA remains largely unexplored. Adapted data processing will have to account for the fact that data is transmitted in packets, making it necessary to detect events from incomplete data. Working with data marred by artefacts such as glitches or missing data packages, these methods should enable the detection, discrimination and analysis of various sources: black hole mergers, EMRIs (extreme mass-ratio inspirals), bursts and compact-object binaries. A final and crucial element of complexity is the speed of analysis, which constitutes a strong constraint on the methods to be developed.
To this end, the problems we will be tackling during this thesis will be:
1. The fast parameter inference of the gravitational waves, notably the sky position and the coalescence time. Two of the main difficulties reside in the multimodality of the posterior probability distribution of the target parameters and the stringent computing-time requirements. To that end, we will consider different advanced inference strategies including:
(a) Using gradient-based sampling algorithms like Langevin diffusions or Hamiltonian Monte Carlo methods adapted to LISA's gravitational wave problem (see the sketch after this list),
(b) Using machine learning-assisted methods to accelerate the sampling (e.g. normalising flows),
(c) Using variational inference techniques.
2. The early detection of black hole mergers.
3. The increasing complexity of LISA data, including, among others, realistic noise, realistic instrument response, glitches, data gaps, and overlapping sources.
4. The online handling of the incoming 5-minute data packages with the developed fast inference framework.
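To make strategy (a) concrete, here is a minimal sketch of a Metropolis-adjusted Langevin algorithm (MALA) run on a toy bimodal posterior standing in for the multimodal sky-position distribution; a production version would target the actual LISA likelihood and obtain gradients by automatic differentiation rather than finite differences:

```python
import numpy as np

def log_post(x):
    """Toy bimodal log-posterior: an equal-weight mixture of two unit
    Gaussians (stand-in for a multimodal sky-position posterior)."""
    l1 = -0.5 * np.sum((x - np.array([-1.5, 0.0])) ** 2)
    l2 = -0.5 * np.sum((x - np.array([+1.5, 0.0])) ** 2)
    return np.logaddexp(l1, l2)

def grad_log_post(x, eps=1e-5):
    """Finite-difference gradient; a real pipeline would use autodiff."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (log_post(x + d) - log_post(x - d)) / (2 * eps)
    return g

def mala(n_steps=5000, step=0.3, seed=0):
    """Metropolis-adjusted Langevin: gradient-informed proposals plus an
    accept/reject correction that keeps the chain exactly on target."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    chain = np.empty((n_steps, 2))
    for n in range(n_steps):
        mean_x = x + 0.5 * step * grad_log_post(x)
        y = mean_x + np.sqrt(step) * rng.normal(size=2)
        mean_y = y + 0.5 * step * grad_log_post(y)
        log_q_xy = -np.sum((y - mean_x) ** 2) / (2 * step)   # q(y | x)
        log_q_yx = -np.sum((x - mean_y) ** 2) / (2 * step)   # q(x | y)
        if np.log(rng.uniform()) < log_post(y) - log_post(x) + log_q_yx - log_q_xy:
            x = y
        chain[n] = x
    return chain

samples = mala()
print("fraction of samples per mode:", (samples[:, 0] < 0).mean(), (samples[:, 0] >= 0).mean())
```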
This thesis will rely on Bayesian and statistical methods for data analysis and machine learning. However, an effort on the physics side is also necessary, both to understand the simulations and the different waveforms considered (with their underlying hypotheses) and to interpret the results regarding the detectability of black hole merger signals in the context of the rapid analysis of LISA data.

Bayesian Inference with Differentiable Simulators for the Joint Analysis of Galaxy Clustering and CMB Lensing

The goal of this PhD project is to develop a novel joint analysis of DESI galaxy clustering and Planck PR4/ACT CMB lensing data, based on numerical simulations of the surveys and state-of-the-art machine learning and statistical inference techniques. The aim is to overcome many of the limitations of traditional approaches and improve the recovery of cosmological parameters. The joint galaxy clustering - CMB lensing inference will significantly improve constraints on the growth of structure over DESI-only analyses and further refine tests of general relativity.
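The key ingredient is that the simulator is differentiable, so gradients of the joint likelihood can flow back to the latent density field shared by both probes. Below is a deliberately simplified sketch with a linear toy "simulator" and analytic gradients; the real analysis would use a differentiable cosmological simulator with automatic differentiation, and all sizes and names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "simulator": two linear survey responses applied to one shared latent
# density field (a stand-in for a differentiable cosmological simulator).
n_latent, n_data = 32, 24
A_gal = rng.normal(size=(n_data, n_latent)) / np.sqrt(n_latent)  # galaxy clustering
A_cmb = rng.normal(size=(n_data, n_latent)) / np.sqrt(n_latent)  # CMB lensing
s_true = rng.normal(size=n_latent)
sigma = 0.1
d_gal = A_gal @ s_true + sigma * rng.normal(size=n_data)
d_cmb = A_cmb @ s_true + sigma * rng.normal(size=n_data)

def grad_log_post(s):
    """Gradient of the joint Gaussian log-posterior over the latent field:
    both probes constrain the same field, plus a standard-normal prior."""
    g = A_gal.T @ (d_gal - A_gal @ s) / sigma**2
    g += A_cmb.T @ (d_cmb - A_cmb @ s) / sigma**2
    return g - s

# Gradient ascent to the joint MAP estimate of the latent field.
s = np.zeros(n_latent)
for _ in range(2000):
    s += 1e-3 * grad_log_post(s)
print("relative reconstruction error:", np.linalg.norm(s - s_true) / np.linalg.norm(s_true))
```

Each probe alone under-constrains the field here (24 data points for 32 latent modes); jointly they pin it down, which is the same mechanism by which the combined analysis tightens growth-of-structure constraints.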

X-ray scattering assisted by artificial intelligence: the problem of the representativeness of synthetic databases and the indistinguishability of predictions

The advent of artificial intelligence makes it possible to accelerate and democratize the processing of small-angle X-ray scattering (SAXS) data, an expert technique for characterizing nanomaterials that determines the specific surface area, volume fraction and characteristic sizes of structures between 0.5 and 200 nm.

However, AI-assisted SAXS faces a double problem: 1) the scarcity of data requires training the models on synthetic data, which raises the question of their representativeness of real data, and 2) the laws of physics imply that several candidate nanostructures can correspond to the same SAXS measurement, which raises the problem of the indistinguishability of predictions. This thesis therefore aims to build an artificial intelligence model adapted to SAXS, trained on experimentally validated synthetic data, and relying on expert feedback to weight the categorization of predictions by their degree of indistinguishability.
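Both issues can be illustrated in a few lines. The sketch below builds a synthetic database from the analytic sphere form factor and flags every candidate whose curve falls within the noise band of a simulated measurement; the noise level and the chi-square threshold are assumptions chosen for illustration:

```python
import numpy as np

def sphere_intensity(q, radius):
    """Homogeneous-sphere form factor: P(q) = [3(sin(qR) - qR cos(qR)) / (qR)^3]^2."""
    qr = q * radius
    return (3 * (np.sin(qr) - qr * np.cos(qr)) / qr**3) ** 2

q = np.linspace(0.01, 1.0, 200)     # scattering vector, nm^-1
noise_level = 0.02                  # assumed relative experimental uncertainty

# Synthetic database: one noiseless curve per candidate radius.
radii = np.linspace(1.0, 20.0, 100)
curves = np.array([sphere_intensity(q, r) for r in radii])

def indistinguishable(i_meas, curves, noise_level, tol=1.5):
    """Indices of all candidates whose curve lies within the noise band of the
    measurement (chi2 close to 1 means 'compatible up to noise')."""
    chi2 = np.mean(((curves - i_meas) / (noise_level * i_meas)) ** 2, axis=1)
    return np.flatnonzero(chi2 < tol)

# A noisy "measurement" generated from a 10 nm sphere.
rng = np.random.default_rng(0)
i_meas = sphere_intensity(q, 10.0) * (1 + noise_level * rng.normal(size=q.size))
print("radii compatible with the measurement (nm):",
      radii[indistinguishable(i_meas, curves, noise_level)])
```

Several neighbouring radii typically pass the test, which is exactly the set of indistinguishable predictions that the envisioned model would report to the expert instead of a single answer.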

Integrity, availability and confidentiality of embedded AI in post-training stages

In a strong context of AI regulation at the European scale, several requirements have been proposed for the "cybersecurity of AI", and more particularly to increase the security of complex modern AI systems. Indeed, we are experiencing an impressive development of large models (so-called "Foundation" models) that are deployed at large scale and adapted to specific tasks on a wide variety of platforms and devices. Today, models are optimized to be deployed, and even fine-tuned, on constrained platforms (memory, energy, latency) such as smartphones and many connected devices (home, health, industry…).

However, securing such AI systems is a complex process with multiple attack vectors against their integrity (fooling predictions), availability (degrading performance, adding latency) and confidentiality (reverse engineering, privacy leakage).

In the past decade, the adversarial machine learning and privacy-preserving machine learning communities have reached important milestones by characterizing attacks and proposing defense schemes. Essentially, these threats focus on the training and inference stages. However, new threats are surfacing related to the use of pre-trained models, their insecure deployment, and their adaptation (fine-tuning).

Moreover, additional security issues arise because the deployment and adaptation stages may be "on-device" processes, for instance with cross-device federated learning. In that context, models are compressed and optimized with state-of-the-art techniques (e.g., quantization, pruning, Low-Rank Adaptation), whose influence on security needs to be assessed.
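As an illustration of why compression interacts with integrity, the sketch below shows standard 8-bit affine quantization and the effect of flipping a single stored bit of one weight, which is the mechanism behind bit-flip attacks; the tensor and its values are purely illustrative:

```python
import numpy as np

def quantize(w, n_bits=8):
    """Affine (asymmetric) quantization of a float tensor to unsigned integers."""
    qmax = 2 ** n_bits - 1
    scale = (w.max() - w.min()) / qmax
    zero_point = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map stored integers back to the floats used at inference time."""
    return scale * (q.astype(np.float32) - zero_point)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=16).astype(np.float32)
q, scale, zp = quantize(w)

# Integrity attack: flip the most significant bit of one stored weight.
q_attacked = q.copy()
q_attacked[0] ^= 0x80
print("original weight:", dequantize(q, scale, zp)[0])
print("after bit flip: ", dequantize(q_attacked, scale, zp)[0])
```

A single flipped bit moves the weight across half of the quantization range, which is why the compressed, on-device representation of a model is itself a relevant attack surface.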

The objectives are:
(1) Propose threat models and risk analyses for the critical post-training steps, typically the deployment and adaptation (continuous training) of large foundation models on embedded systems (e.g., advanced microcontrollers with HW accelerators, SoCs).
(2) Demonstrate and characterize attacks, with a focus on model-based poisoning.
(3) Propose and develop protection schemes and sound evaluation protocols.

Embedded systems for natural acoustic signal analysis while preserving privacy

The PhD topic aims at developing embedded systems to record and analyze natural acoustic signals. When targeting city deployment, the privacy issue arises: how can we keep a satisfactory level of analysis while never recording or transmitting human voices?
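One possible direction, sketched below under simple assumptions, is to compute only coarse spectral features on-device and discard the raw waveform, suppressing the typical speech band (roughly 300-3400 Hz) before anything leaves the sensor; this is only an illustration, not the project's chosen design:

```python
import numpy as np

SPEECH_BAND = (300.0, 3400.0)   # Hz, typical voice band (assumption)

def privacy_features(frame, sample_rate, n_bands=16):
    """Coarse log-energy features with the speech band zeroed out.

    The raw frame is never stored or transmitted; only n_bands scalars
    leave the device, from which speech cannot be reconstructed.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame.size))) ** 2
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sample_rate)
    spectrum[(freqs >= SPEECH_BAND[0]) & (freqs <= SPEECH_BAND[1])] = 0.0
    # Aggregate what remains into a handful of log-energy bands.
    edges = np.linspace(0, freqs[-1], n_bands + 1)
    return np.array([np.log1p(spectrum[(freqs >= lo) & (freqs < hi)].sum())
                     for lo, hi in zip(edges[:-1], edges[1:])])

# 46 ms frame at 44.1 kHz, e.g. from a microphone buffer.
frame = np.random.default_rng(0).normal(size=2048)
print(privacy_features(frame, 44100))
```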
