AI-assisted generation of Instruction Set Simulators
The simulation tools for digital architectures rely on various types of models with different levels of abstraction to meet the requirements of hardware/software co-design and co-validation. Among these models, higher-level ones enable rapid functional validation of software on target architectures.
Developing these functional models often involves a manual process, which is both tedious and error-prone. When low-level RTL (Register Transfer Level) descriptions are available, they serve as a foundation for deriving higher-level models, such as functional ones. Preliminary work at CEA has resulted in an initial prototype based on MLIR (Multi-Level Intermediate Representation), demonstrating promising results in generating instruction execution functions from RTL descriptions.
The goal of this thesis is to further explore these initial efforts and subsequently automate the extraction of architectural states, leveraging the latest advancements in machine learning for EDA. The expected result is a comprehensive workflow for the automatic generation of functional simulators (a.k.a Instruction Set Simulators) from RTL, ensuring by construction the semantic consistency between the two abstraction levels.
Design and Analysis of Side-Channel Feedback for Vulnerability Discovery
Fuzzing is a dynamic testing technique that enables vulnerabilities to be discovered very efficiently. Hundreds or even thousands of vulnerabilities are detected (and repaired) every year in the software we use. When we try to transpose the fuzzing approach to embedded systems, we are faced with a number of problems: the source code is not always available, very little information is available about the behaviour of the system at runtime and, finally, it is difficult to detect whether a bug has appeared. For several years now, the LTSO laboratory has been developing state-of-the-art techniques for analysing auxiliary channels, in particular the electromagnetic radiation produced by systems during operation. These measurements make it possible to infer information (data, executed code) about the behaviour of the system in a non-intrusive way. The aim of this thesis is therefore to determine whether these side-channel measurements can be used to improve the fuzzing process on embedded systems. The use of this new source of information also opens the door to the discovery of new classes of vulnerabilities, such as micro-architectural vulnerabilities.
The PhD will take place at CEA Grenoble, within the LETI institute, in a research team dedicated to the study and development of solutions for the security of present and future electronic systems (http://www.leti-cea.com/cea-tech/leti/english/Pages/Applied-Research/Facilities/cyber-security-platform.aspx).
Translated with www.DeepL.com/Translator (free version)
Generative artificial intelligence algorithms for understanding and countering online polarization
Digital platforms enable the widespread dissemination of information, but their engagement-centric business models often promote the spread of ideologically homogeneous or controversial political content. These models can lead to the polarization of political opinions and impede the healthy functioning of democratic systems. The PhD will investigate innovative generative AI models devised for a deep understanding of political polarization and for countering its effects. It will mobilize several areas of AI: generative learning, frugal AI, continual learning, and multimedia learning. Advances will be associated with the following challenges:
-the modeling of political polarization, and the translation of the obtained domain model into actionable implementation requirements that will be used as inputs of AI algorithms;
-the curation of massive and diversified multimodal political data to ensure topical and temporal coverage, and to map these data to a common semantic representation space;
-the training of politics-oriented generative models to encode relevant knowledge effectively and efficiently and to generate labeled training data for downstream tasks;
-the specialization of the models for the specific tasks needed for a fine-grained understanding of polarization (topic detection, entity recognition, sentiment analysis);
-the continual update of the politics-oriented generative models and polarization-specific tasks to keep pace with the evolution of political events and news.
Combining over and under-approximations for low-level code analysis
Because program analysis is an undecidable problem, static analyzes fall into 2 categories:
- Sound analyses, which calculate an over-approximation of all the program's behaviors, and make it possible to verify the absence of errors.
- Complete analyses, which calculate an under-approximation of the possible behaviors of the program, and allow errors to be found.
The aim of the thesis is to study this combination of techniques for the verification of low-level programs, in particular binary executables, by targeting in particular the analyzes of memories and control flow, which are the capital information for understanding the possible behaviors of a binary executable.
[POPL'23]: SSA Translation is an Abstract Interpretation, M. Lemerre (Distinguished paper award)
[VMCAI'22]: Lightweight Shape Analysis Based on Physical Types, O. Nicole, M. Lemerre, X. Rival
[RTAS'21]: No Crash, No Exploit: Automated Verification of Embedded Kernels, O. Nicole, M. Lemerre, S. Bardin, X. Rival (best paper award)
[LPAR'18] Arrays Made Simpler: An Efficient, Scalable and Thorough Preprocessing B. Farinier, S. Bardin, R. David, M. Lemerre
Proof of functional equivalence of binary codes in the context of embedded program hardening
The general context of this thesis is the cyber-security of embedded
systems. The research background of this thesis is tied to the
automatic application of counter-measures against the so-called physical
attacks, which encompass observation attacks (side-channel attacks) and
perturbation attacks (fault-injection attacks).
The CEA List is working on COGITO, a compiler toolchain based on LLVM
for the automatic application of software counter-measures against
physical attacks. Given a source-level implementation of an unprotected
program, our toolchain produces an optimised binary program including
targeted counter-measures, such that the compiled program is hardened
against a specified threat model. Two key points are today crucial to
trust the compiled programs:
1. the proof of robustness of programs produced by our toolchain,
2. the proof that adding counter-measures does not alter the
functionality of the target programs.
This thesis will target the second point: bringing formal guarantees
about the functional correctness of the secured programs. We will use
sound and exhaustive symbolic reasoning, supported by BINSEC
(). BINSEC is an open-source toolset
developed at CEA List to help improve software security at the binary
level. It relies on cutting-edge research in binary code analysis, at
the intersection of formal methods, program analysis, security and
software engineering.
The PhD thesis will be hosted at the CEA in Saclay, within the BINSEC team.
Short-term stays
at CEA Grenoble will be planned throughout the three
years of the thesis to collaborate with experts and developers of
COGITO.
Monte Carlo simulation of the reactor transfer function to improve neutron noise measurement analyses
The neutron population in a reactor fluctuates due to the random nature of neutron emission and various sources of mechanical vibrations, which can impact macroscopic neutron cross sections. The reactor can be seen as a system with a transfer function that connects an excitation (such as a vibration or the random nature of neutron emissions from fission) to the neutron population. The study and measurement of this transfer function allow us to deduce essential neutron parameters related to the kinetics of delayed neutron emission or even the source of thge vibrations. However, the theoretical expression of this transfer function is often based on the kinetics of the point reactor, which in some cases does not reliably exploit the measurements.
In this thesis work, we propose to study various extensions of the neutron transfer function formalism using Monte Carlo simulations. First, we will simulate fluctuations using a simplified C++ model to confirm the assumptions of theoretical equations for "neutron noise" that can be used to "measure" the effective fraction of delayed neutrons. We will then seek to optimize the positioning of detectors in a reactor and interpret certain effects related to positioning already observed in past experiments conducted by CEA.
Portable GPU-based parallel algorithms for nuclear fuel simulation on exascale supercomputers
Dans le cadre de cette thèse, nous cherchons un candidat francophone uniquement.
Assisted generation of complex computational kernels in solid mechanics
The behavior laws used in numerical simulations describe the physical characteristics of simulated materials. As our understanding of these materials evolves, the complexity of these laws increases. Integrating these laws is a critical step for the performance and robustness of scientific computations. Therefore, this step can lead to intrusive and complex developments in the code.
Many digital platforms, such as FEniCS, FireDrake, FreeFEM, and Comsol, offer Just-In-Time (JIT) code generation techniques to handle various physics. This JIT approach significantly reduces the time required to implement new simulations, providing great versatility to the user. Additionally, it allows for optimization specific to the cases being treated and facilitates porting to various architectures (CPU or GPU). Finally, this approach hides implementation details; any changes in these details are invisible to the user and absorbed by the code generation layer.
However, these techniques are generally limited to the assembly steps of the linear systems to be solved and do not include the crucial step of integrating behavior laws.
Inspired by the successful experience of the open-source project mgis.fenics [1], this thesis aims to develop a Just-In-Time code generation solution dedicated to the next-generation structural mechanics code Manta [2], developed by CEA. The objective is to enable strong coupling with behavior laws generated by MFront [3], thereby improving the flexibility, performance, and robustness of numerical simulations.
The doctoral student will benefit from guidance from the developers of MFront and Manta (CEA), as well as the developers of the A-Set code (a collaboration between Mines-Paris Tech, Onera, and Safran). This collaboration within a multidisciplinary team will provide a stimulating and enriching environment for the candidate.
Furthermore, the thesis work will be enhanced by the opportunity to participate in conferences and publish articles in peer-reviewed scientific journals, offering national and international visibility to the thesis results.
[1] https://thelfer.github.io/mgis/web/mgis_fenics.html
[2] MANTA : un code HPC généraliste pour la simulation de problèmes complexes en mécanique. https://hal.science/hal-03688160
[3] https://thelfer.github.io/tfel/web/index.html
Automatic reverse-engineering of BIM models using Machine Learning
Today, BIM (Building Information Modelling) has become a standard for information management in factories, buildings and industrial facilities. The BIM approach is particularly well-suited to complex industrial environments, and nuclear facilities in particular, as it proves to be a relevant tool throughout the facility's lifecycle, from design to construction, operation and decommissioning, by offering shared, smart and structured modelling.
This approach is centered on a 3D model, generally produced from a point cloud obtained by lasergrammetry. In most cases, panoramic photos can be acquired at the same time. From this point cloud, a 3D model is generally rebuilt to represent the equipment present as set of single solid objects. This 3D model reconstruction stage is often long and tedious, and is currently carried out manually by a CAD designer.
This thesis proposes to develop an automatic method for reconstructing BIM models from point clouds using machine learning and image analysis, exploiting both the point cloud and available panoramic photos. The environments of nuclear facilities are composed of very specific steel or alloy processes, and mainly include piping equipment. By combining machine learning and computer vision, using both clustering and classification methods on the one hand, and shape and image recognition on the other, the work consists in directly identifying in the point cloud objects belonging to business object families such as pipes, elbows, valves, supports, fittings, tanks, etc., as well as some of their metadata: the material they are made of, their geometric properties (diameter, thickness, length), their volume and mass.
Artful guidance of test generation tools
Fuzzing is an automatic test generation technique. It consists in repeatedly executing a program with automatically generated inputs, in order to trigger crashes, symptoms of underlying bugs in the code, which can then be fixed. A major challenge in this area is moving from indiscriminate exploration of how programs work to artful guidance towards the discovery of important bugs. Taking advantage of the expertise of our team and our previous work, the objective of the thesis is to propose, implement and evaluate means to meet this challenge, taking advantage of finer-grained guidance mechanisms.