Few-shot and zero-shot models for Information Extraction

Information Extraction aims to identify concepts or facts in texts and to structure this information. A major challenge in this field is to design high-performance models using only a few annotated examples (few-shot), or even no annotated data at all (zero-shot). The proposed PhD topic falls within this framework and will focus in particular on exploiting the capabilities of large pre-trained language models (LLMs) for this task. More specifically, the avenues explored could cover approaches for distilling large models in order to produce training data for information extraction, a study of possible synergies between large-scale model pre-training and episodic meta-learning, or the proposal of new methods for building pre-training data, for example using distant supervision from structured knowledge bases.

Towards trustworthy business process management with blockchain

Blockchain and distributed ledgers are promising technologies for managing inter-organizational business processes, particularly among participants who do not trust each other. Operating in a decentralized and distributed manner, they eliminate the need for a central authority, enabling secure and efficient interactions.

The deployment of blockchain-based architectures relies on specific components such as smart contracts, as well as external services like cloud-based data storage and web services called through oracles. This ecosystem requires deep expertise to define and implement needs related to trust and traceability.

The objective of this thesis is to develop a tool to aid in designing trustworthy business process management applications. A no-code/low-code approach will enable the specification and generation of the corresponding blockchain-based architecture. The use of large language models (LLMs) to support model-based engineering will be considered. The generated architectures will aim to leverage blockchain in a frugal manner, thus minimizing overall energy consumption. Additionally, the reliability of the generated smart contracts will be ensured through formal verification approaches.

Privacy-preserving federated learning over vertically partitioned data from heterogeneous participants

Federated learning enables multiple participants to collaboratively train a global model without sharing their data: only model parameters are exchanged between the participants and the server. In vertical federated learning (VFL), the participants’ datasets cover overlapping samples but have different features. For instance, companies and institutions from different fields that own data with different features for overlapping samples can collaborate to solve a machine learning task. Although the data remain private, VFL is still vulnerable to attacks such as label inference and feature inference attacks. Various privacy measures (e.g., differential privacy, homomorphic encryption) have been investigated to prevent privacy leakage. Choosing the appropriate measures is challenging, as the choice depends on the VFL architecture and the desired level of privacy (e.g., local models, intermediate results, learned models). The variability of each participant’s system can also result in high latency and asynchronous updates, affecting training efficiency and model effectiveness.
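As an illustration of the vertical partitioning described above, here is a minimal sketch, assuming a simple logistic model split across two parties. All names, feature values, and weights are hypothetical and do not come from the thesis proposal. Each party computes a partial score from its own features, and only these intermediate results reach the server.

```python
import math

# Hypothetical toy setup: two parties hold different features of the same three samples.
party_a_features = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.0]]
party_b_features = [[2.0], [0.0], [1.0]]
party_a_weights = [0.4, -0.2]
party_b_weights = [0.3]
bias = 0.1  # assumed to be held by the server


def partial_scores(features, weights):
    """Computed locally: only these scalars leave the party, not the raw features."""
    return [sum(x * w for x, w in zip(row, weights)) for row in features]


def server_predict(partials_per_party, bias):
    """The server sums the partial scores per sample and applies a sigmoid."""
    logits = [sum(parts) + bias for parts in zip(*partials_per_party)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


scores_a = partial_scores(party_a_features, party_a_weights)
scores_b = partial_scores(party_b_features, party_b_weights)
predictions = server_predict([scores_a, scores_b], bias)
```

The partial scores are an example of the intermediate results that may still leak information, which is why a privacy measure such as noise addition or encryption would typically be applied before they are sent. This sketch omits any such protection.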

The aim of this thesis is to propose methods to enable privacy-preserving VFL, taking into account the heterogeneity of the participants. First, the candidate will study the architectures of VFL models and the available privacy measures in order to propose privacy-preserving protocols for VFL. Second, the candidate will investigate the impact of the heterogeneity of the participants’ systems, such as their computation and communication resources, in order to devise solutions that make the VFL protocols robust to such heterogeneity. Third, the trade-offs among effectiveness, privacy, and efficiency in VFL will be explored to propose a practical framework for adjusting the protocols according to the requirements of a given machine learning problem.

Modeling and Simulation of Human Behavior for Human-Centric Digital Twins

Thanks to their synchronized virtual representation, digital twins are a means to produce analyses, predictions and optimizations of real-world systems. However, some of these systems engage so tightly with humans that the human role is decisive in the system’s operation. This is the case, for example, in contexts such as Industry 5.0 or the control of critical systems, where the quality of collaboration between humans and machines depends on anticipating their respective actions, interactions and decisions. Thus, to improve the accuracy of predictions and broaden applicability across fields, it is necessary to develop digital twins that, drawing on knowledge from the human and social sciences, account for the complexity and richness of human behaviors (decision-making processes, interactions, emotions, etc.). These behavioral models may notably rely on machine learning, data mining, agent-based modeling and knowledge engineering. After identifying the useful human behavior models, we will study their conceptual articulation and their technical integration with the models of cyber-physical entities in the digital twin system. Additionally, we will explore how digital twin services are impacted and can be revised to account for these human-centric aspects. Finally, we will evaluate the effectiveness of human-centric digital twins in various applications by running experiments on representative real cases.
This research work aims to make the following contributions:
• The development of an approach based on human behavior models to achieve human-centric digital twins.
• New knowledge on the impact of human behavior on the control of a system and vice versa.
• Practical applications and guidelines for using human-centric digital twins in real-world scenarios.
This PhD will be carried out in Grenoble.

Quantum Machine Learning in the era of NISQ: can QML provide an advantage for the learning part of Neural Networks?

Quantum computing (QC) is believed to offer a future advantage for a variety of algorithms, including some that are challenging for classical computers (e.g., prime factorization). However, in an era where noisy quantum computers are the norm, practical applications of QC are likely to center on optimization approaches and energy efficiency rather than purely algorithmic performance.

In this context, this PhD thesis aims to address the utilization of QC to enhance the learning process of Neural Networks (NN). The learning phase of NN is arguably the most power-hungry aspect with traditional approaches. Leveraging quantum optimization techniques or quantum linear system solving could potentially yield an energy advantage, coupled with the ability to perform the learning phase with a less extensive set of training examples.

Deep Learning Inverse Problem Solving Applied to Interferometry

In-memory analog computing for AI attention mechanisms

The aim of this thesis is to explore the execution of attention mechanisms for Artificial Intelligence directly within a cutting-edge Non-Volatile Memory (NVM) technology.

Attention mechanisms are a breakthrough in Artificial Intelligence (AI) algorithms and the performance booster behind Transformer neural networks.
Initially designed for natural language processing, as in ChatGPT, these mechanisms are now widely employed in embedded application domains such as predicting demand in an energy/heat network, predictive maintenance, and monitoring of transport infrastructures or industrial sites.
Despite their widespread use, attention-based workloads demand extensive data access and computing power, resulting in high power consumption that can make them impractical for embedded hardware systems.
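To make the data-access and compute cost concrete, the following is a minimal sketch of scaled dot-product attention, the standard form used in Transformers, written in plain Python. The toy input sequence is hypothetical. Every query is compared with every key, so the number of dot products grows with the square of the sequence length.

```python
import math


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # One score per (query, key) pair: quadratic in the sequence length.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Hypothetical 3-step time series embedded in 2 dimensions (self-attention).
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = attention(sequence, sequence, sequence)
```

The inner dot products and weighted sums are linear algebra operations of the kind that analog in-memory computing aims to accelerate. The softmax step is not a linear operation and would need separate treatment.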

The non-volatile memristor technology offers a promising solution by enabling analog computing functions with minimal power consumption while serving as non-volatile storage for AI model parameters. Massive linear algebra algorithms can be executed faster, at an ultra-low energy cost, when compared with their fully-digital implementation.
However, the technology comes with limitations, e.g., variability, the number of bits to encode model parameters (i.e. quantization), the maximum size of vectors processed in parallel, etc.

This thesis focuses on overcoming these challenges in the context of embedded time-series analysis and prediction.
The key task is exploring the mapping of attention-based mechanisms to a spin-based memristor technology developed by the SPINTEC Laboratory.
This involves quantizing and partitioning AI models to align with the hardware architecture without compromising the performance of the prediction, and exploring the implementation of particular AI blocks into the memristor analog fabric.
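As a rough illustration of the quantization step, here is a minimal sketch assuming uniform symmetric quantization of model parameters to a small number of bits. The scheme, the bit width, and the weight values are assumptions made for illustration. The thesis may use a different method suited to the memristor hardware.

```python
def quantize(weights, bits):
    """Map floats to signed integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


weights = [0.82, -0.31, 0.05, -0.67]  # hypothetical model parameters
codes, scale = quantize(weights, bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

With this scheme the rounding error per parameter is at most half a quantization step (`scale / 2`), so fewer bits mean a coarser step and a larger potential loss in prediction quality.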

This thesis is part of a collaboration between CEA List, Laboratoire d’Intelligence Intégrée Multi-Capteur, the Grenoble Institute of Engineering and Management and the SPINTEC Laboratory.
Joining this research presents a unique opportunity to work within an interdisciplinary and dynamic team at the forefront of the AI ecosystem in France, with strong connections to influential industrial players in the field.

CLIP approach for improving energy efficiency of hardware embedding combinations

In a global context of task automation, artificial neural networks are currently used in many domains requiring the processing of data from sensors: vision, sound, vibration.
Depending on the constraints, the information processing can be done in the Cloud (SIRI, AWS, TPU) or in an embedded way (NVidia's Jetson platform, Movidius, CEA-LIST's PNeuro/DNeuro). In the latter case, many hardware constraints must be taken into account when dimensioning the algorithm. To ease porting to hardware platforms, LIST has developed innovative state-of-the-art methods that aggressively quantize the parameters of a neural network and modify the coding of the activations to reduce the number of calculations to be performed.
The energy efficiency of neuromorphic architectures with equivalent technology is constrained by the classic trade-off between flexibility and efficiency. In other words, the more different tasks (and networks) an architecture is capable of performing, the less energy-efficient it becomes. While this relationship cannot be circumvented for a wide variety of algorithms, neural networks are parametric functions, learned for one task and therefore potentially adaptable to other tasks by partial modification of the topology and/or parameters.
One technique, CLIP, seems to provide an answer, with a strong capacity for adaptation to a variety of tasks and the possibility of using multimodality. In its original form, CLIP is presented as a method for matching text and images to create a classification task.
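The matching step can be sketched as follows, assuming the image and text embeddings have already been produced by trained encoders. The embedding values, the labels, and the temperature value are hypothetical and chosen only for illustration. Classification reduces to comparing one image embedding against the text embedding of each candidate label.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def classify(image_embedding, text_embeddings, temperature=0.07):
    """Return label probabilities from cosine similarity between image and text embeddings."""
    image = normalize(image_embedding)
    sims = {label: sum(a * b for a, b in zip(image, normalize(vec)))
            for label, vec in text_embeddings.items()}
    peak = max(sims.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp((s - peak) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical embeddings; in CLIP they come from trained image and text encoders.
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]
probabilities = classify(image_embedding, text_embeddings)
best_label = max(probabilities, key=probabilities.get)
```

Changing the set of text labels changes the classification task without retraining the encoders, which is one way to read the adaptability mentioned above.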
The aim of this thesis is to study the hardware implementation of CLIP by proposing a dedicated architecture. The thesis is organized into three main phases. The first is a study of CLIP's mechanisms, the operations to be performed and the consequences for embedding networks. The second covers hardware optimizations applicable to CLIP, such as quantization (or others), together with an assessment of the trade-off between flexibility and applicative generality. The third is an architectural and implementation proposal used to measure energy efficiency.

Artificial Intelligence for Mass Measurement of Exotic Isotopes

Artificial intelligence opens new perspectives for basic science. Nuclear structure, studied at the extremes of the nuclear chart by the Super Separator Spectrometer (S3) under construction at GANIL-SPIRAL2, is no exception. The Piège à Ions Linéaire du Ganil pour la Résolution des Isotopes en Masse (PILGRIM) is a Multi-Reflection time-of-flight Mass Spectrometer (MR-ToF-MS) with state-of-the-art performance that can only be fully exploited through a joint development with the FASTER (http://faster.in2p3.fr/) data acquisition system at LPC Caen. The PhD thesis will consist of carrying out this development with the FASTER developers and the physicist in charge of PILGRIM. Machine learning techniques will have to be employed to recognize patterns in the time of flight of ions extracted as bunches from the S3 Low Energy Branch. For each individual ion, the time of flight will have to be determined with sub-nanosecond precision, correcting for effects due to pile-up, gain and baseline fluctuations. This development should lead to the determination of masses of exotic nuclei with exquisite precision, enabling tests of nuclear physics models in previously uncharted territories.

Advanced Artificial Intelligence Techniques for Particle Reconstruction in the CMS Detector Using Precision Timing and Attention Mechanism

Particle reconstruction in collider detectors is a multidimensional problem where machine learning algorithms offer the potential for significant improvements over traditional techniques. In the Compact Muon Solenoid (CMS) detector at the Large Hadron Collider (LHC), photons and electrons produced by the collisions at the interaction point are recorded by the CMS Electromagnetic Calorimeter (ECAL). The large number of collisions, coupled with the detector's complex geometry, makes the reconstruction of clusters in the calorimeter a formidable challenge. Traditional algorithms struggle to distinguish between overlapping clusters created by proximate particles. In contrast, it has been shown that graph neural networks offer significant advantages, providing better differentiation between overlapping clusters without being negatively affected by the sparse topology of the events. However, it is crucial to understand which extracted features contribute to this superior performance and what kind of physics information they contain. This understanding is particularly important for testing the robustness of the algorithms under different operating conditions and for preventing any biases the network may introduce due to differences between data and the simulated samples used to train the network.
In this project, we propose to use Gradient-weighted Class Activation Mapping (Grad-CAM) and its attention-mechanism-aware derivatives to interpret the algorithm's decisions. By evaluating the extracted features, we aim to derive analytical relationships that can be used to modify existing lightweight traditional algorithms.
Furthermore, with the upcoming High Luminosity upgrade of the LHC, events involving overlapping clusters are expected to become even more frequent, thereby increasing the need for advanced deep learning techniques. Additionally, precision timing information of the order of 30 ps will be made available to aid in particle reconstruction. In this PhD project, we also aim to explore deep learning techniques that utilize Graph and Attention mechanisms (Graph Attention Networks) to resolve spatially proximate clusters using timing information. We will integrate position and energy deposition data from the ECAL with precision timing measurements from both the ECAL and the new MIP Timing Detector (MTD). Ultimately, the developed techniques will be tested in the analysis of a Higgs boson decaying into two beyond-the-standard-model scalar particles.

We are seeking an enthusiastic PhD candidate who holds an MSc degree in particle physics and is eager to explore cutting-edge artificial intelligence techniques. The selected candidate will also work on the upgrade of the CMS detector for the high-luminosity LHC.
