Post-training neural architecture optimization for small language models

Generative AI, and in particular large language models (LLMs), has sparked a new revolution in AI with applications across all domains. However, LLMs are highly resource-intensive and therefore difficult to deploy on autonomous embedded systems. One way to optimize LLMs is to modify their architecture, replacing heavy Transformer layers with lighter alternatives. Given the difficulty of training LLMs from scratch, this thesis aims to develop post-training neural architecture optimization methods applicable to small language models (SLMs). Additionally, the thesis seeks to propose performance metrics for the different layers of an SLM and their alternatives, to guide layer replacement, and thus to define a comprehensive methodology for optimizing SLMs under hardware constraints. The work will be disseminated through publications in major AI conferences and journals, and the resulting code and methods could be integrated into the tools developed at CEA.

Reconciling predictability and performance in processor architectures for critical systems

Critical systems have both functional and timing requirements, the latter ensuring that deadlines are always met during operation; failure to do so may lead to catastrophic consequences. The critical nature of such systems demands specialized hardware and software solutions. This PhD thesis topic focuses on the development of computer architecture designs for critical systems, known as predictable architectures, capable of providing the necessary timing guarantees. Several such architectures exist, typically based on in-order pipelines and incorporating behavioral restrictions (e.g., disabling complex speculation mechanisms) or structural specializations (e.g., redesigned caches or deterministic arbitration for shared resources). These restrictions and specializations inevitably impact performance, and the design of predictable architectures must therefore address the predictability–performance tradeoff directly. This PhD thesis aims to explore this tradeoff in a novel way, by adapting a high-performance variant of an in-order processor (CVA6) and developing top-down techniques to make it predictable. Performance in such processors is usually achieved through mechanisms like branch prediction, prefetching, and value prediction, implemented via specialized storage elements (e.g., buffers) and supported by control mechanisms such as rollback on misprediction. Within this context, the goal of the thesis is to define a general predictability scheme for speculative execution, covering both storage organization and rollback behavior.

Software support for computing accelerators and memory transfer accelerators

For energy reasons, future computers will have to use accelerators for both computation and memory access (GPUs, TPUs, NPUs, smart DMAs). AI applications have intensive computational requirements in terms of both computing power and memory throughput.

These accelerators are not driven by a conventional instruction set architecture (ISA); they break the von Neumann model and require specialized code to be written manually.

Furthermore, it is difficult to compare the use of these accelerators with code using a non-specialized processor, as the initial source codes are very different.

HybroLang is a hardware-oriented programming language that allows programs to exploit all of a processor's computing capabilities, while also allowing code to be specialized based on data known only at runtime.

The HybroGen compiler has already demonstrated its ability to program in-memory computing accelerators, as well as to apply novel optimizations on conventional CPUs.
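The kind of runtime specialization mentioned above can be sketched in plain Python. This is only an illustration of the principle, not HybroLang syntax or the HybroGen pipeline: once a value becomes known at runtime, code with that value baked in as a constant is generated and compiled.

```python
# Illustrative sketch of runtime code specialization (not HybroLang itself):
# generate and compile a function once a runtime value is known.

def specialize_scale(factor):
    """Build a function specialized for a constant known only at runtime."""
    # Bake the runtime constant into the source, then compile it.
    src = f"def scaled_sum(xs):\n    return sum(x * {factor} for x in xs)\n"
    namespace = {}
    exec(compile(src, "<specialized>", "exec"), namespace)
    return namespace["scaled_sum"]

scale3 = specialize_scale(3)
print(scale3([1, 2, 4]))  # 21
```

A real specializing compiler would emit machine code for the target accelerator rather than Python source, but the workflow (runtime value in, specialized code out) is the same.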

This thesis proposes to extend the HybroLang language in order to

- facilitate the programming of AI applications by providing support for complex data: stencils, convolution, sparse computing

- enable code generation both on CPUs and with hardware accelerators currently under development at the CEA (sparse computing, in-memory computing, memory access)

- allow benchmarking of different computing architectures from the same initial source code

Ideally, a candidate should have knowledge of computer architecture, programming language implementation, code optimization and compilation.

Development of a machine learning algorithm to optimize the control of absorption machines

The Thermal and Solar Technologies Laboratory (L2TS) and the Energy Systems for Territories Laboratory (LSET), located at the CEA LITEN site in Le Bourget-du-Lac, are offering a cross-disciplinary PhD thesis combining thermodynamics and optimization using Artificial Intelligence.

Specifically, this doctoral research project involves developing a machine learning algorithm to optimize the control of absorption machines. These machines implement thermodynamic cycles that can produce heat or cold from an intermediate heat input, thus offering a way to valorize industrial waste heat or renewable energy sources such as solar thermal. Heat exchange is made possible by the absorption and desorption of a gaseous refrigerant in a fluid; specifically, the NH3-H2O mixture will be used. The dynamic operation of these cycles is extremely complex because the operational variables, physical parameters, and hydrodynamic aspects are highly intertwined. The use of a neural network is therefore particularly relevant for establishing an adaptive control strategy for these machines.

The thesis will have a theoretical aspect, involving the study and selection of the most suitable algorithm to address the problem, and an experimental aspect of validation on a prototype absorption machine. The project will also involve the design of a controller for implementation.

The thesis will take place in a CEA laboratory in Le Bourget-du-Lac.

Model-Driven DevOps for Cloud Orchestration: Bridging Design-Time and Runtime Guarantees

Model-Driven Engineering (MDE) has traditionally relied on a clear separation between design and runtime, but this boundary no longer holds in today's cloud-native and edge environments, where infrastructures are heterogeneous, dynamic, and continuously evolving. Assumptions validated at design time may become invalid during execution, and modern orchestration platforms such as Kubernetes or OpenStack, while effective, remain weakly connected to architectural modeling environments. This results in a structural gap between architectural specification and actual operational behavior. To bridge this gap, this thesis proposes to develop a formal modeling framework for placement constraints across heterogeneous orchestration platforms, ensuring continuity between design-time validation and runtime guarantees. This framework would elevate placement constraints — resource locality, affinity, network latency, security isolation, and quality-of-service objectives — to first-class modeling constructs. At design time, it would enable static feasibility analysis and automated generation of deployment artifacts; at runtime, it would ensure continuous compliance monitoring and adaptive reconfiguration in response to violations. Expected contributions include a formal modeling language, bidirectional transformations between design-time models and runtime representations, and integration with Papyrus-based tooling. The ultimate goal is to ensure that architectural intent remains consistent and verifiable throughout the entire system lifecycle, from initial design through to production operation.
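As a toy illustration of what elevating placement constraints to first-class constructs and performing static feasibility analysis at design time could look like, the following Python sketch encodes node capacities and an anti-affinity rule, then searches exhaustively for a feasible placement. All names and the constraint model are hypothetical; they are not the framework proposed in the thesis.

```python
# Hypothetical sketch: design-time feasibility check for placement constraints.
from itertools import product

nodes = {"n1": {"cpu": 4}, "n2": {"cpu": 2}}           # capacities
services = {"db": {"cpu": 2}, "api": {"cpu": 2}, "cache": {"cpu": 1}}
anti_affinity = {("db", "cache")}                      # must not share a node

def feasible(placement):
    # per-node capacity check
    for node, caps in nodes.items():
        used = sum(services[s]["cpu"] for s, n in placement.items() if n == node)
        if used > caps["cpu"]:
            return False
    # anti-affinity check
    return all(placement[a] != placement[b] for a, b in anti_affinity)

def find_placement():
    names = list(services)
    for combo in product(nodes, repeat=len(names)):
        placement = dict(zip(names, combo))
        if feasible(placement):
            return placement
    return None  # statically infeasible: report at design time

print(find_placement())
```

In the envisioned framework, such checks would be derived automatically from the models rather than hand-coded, and the same constraints would also drive runtime compliance monitoring.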

Learning Mechanisms for Detecting Abnormal Behaviors in Embedded Systems

Embedded systems are increasingly used in critical infrastructures (e.g., energy production networks) and are therefore prime targets for malicious actors. The use of intrusion detection systems (IDS) that dynamically analyze the system's state is becoming necessary to detect an attack before its impacts become harmful.
The IDS of interest here rely on machine-learning anomaly detection: they learn the normal behavior of a system and raise an alert at the slightest deviation. However, the model learns this normal behavior only once, offline, on a static dataset, even though the embedded systems considered can evolve over time, with updates affecting their nominal behavior or the addition of new behaviors deemed legitimate.
The subject of this thesis therefore focuses on studying re-learning mechanisms for anomaly detection models to update the model's knowledge of normal behavior without losing information about its prior knowledge. Other learning paradigms, such as reinforcement learning or federated learning, may also be studied to improve the performance of IDS and enable learning from the behavior of multiple systems.
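As a minimal, hypothetical illustration of the re-learning idea (not an actual IDS model), the sketch below maintains a running statistical model of "normal" using Welford's online algorithm, so newly observed legitimate behavior can be incorporated incrementally without retraining from scratch or discarding prior knowledge:

```python
import math

class OnlineAnomalyDetector:
    """Toy z-score detector whose model of normal behavior can be updated
    incrementally (Welford's algorithm) instead of relearned from scratch."""

    def __init__(self, threshold=3.0):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.threshold = threshold

    def update(self, x):
        # incorporate a sample deemed legitimate into the normal model
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_anomalous(self, x):
        if self.n < 2:
            return False  # not enough data to judge
        std = math.sqrt(self.m2 / (self.n - 1))
        return std > 0 and abs(x - self.mean) / std > self.threshold

det = OnlineAnomalyDetector()
for v in [10, 11, 9, 10, 10, 11, 9, 10]:   # observed nominal behavior
    det.update(v)
print(det.is_anomalous(10.5), det.is_anomalous(50))  # False True
```

A real IDS would of course use richer features and models (e.g., neural or ensemble detectors); the point here is only that the notion of "normal" is a live, updatable state rather than a frozen training artifact.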

Vulnerability analysis of protocols on hardware devices

The Information Technology Security Evaluation Facility (ITSEF) conducts activities in the field of security evaluation of electronic systems and embedded software components, either within the framework of certification schemes, for example the one led by the Agence Nationale de la Sécurité des Systèmes d'Information (ANSSI), or at the direct request of developers.
In the context of security evaluations conducted by the ITSEF, evaluators are required, among other things, to test the resistance of cryptographic mechanisms embedded on smart cards against physical attacks, such as chip tampering attacks or attacks by observing compromising signals. In an application context (banking, healthcare, identity), these mechanisms are used within cryptographic protocols, such as key exchanges or authentications. When a vulnerability is detected in a product, the evaluator must analyze its impact on the protocol. Currently, this analysis relies on the evaluator's expertise, but the use of formal methods would be advantageous for tracing attack paths or for providing greater assurance that the vulnerability will not be exploited.
Initially, this thesis will focus on studying existing verification tools (e.g., Tamarin [1]) in order to test them on the protocols used in commonly evaluated applications. The thesis will then aim to examine the different ways in which a vulnerability can be expressed within the protocol, and to evaluate the tool's ability to formally analyze its impacts by identifying attack paths. Finally, the PhD student will be required to enhance the tool with new components to address the identified needs.
References
[1] Tamarin: https://github.com/tamarin-prover/tamarin-prover

Lightweight CNN and Causal GNN for scene understanding

Scene understanding is a major challenge in computer vision, with recent approaches dominated by transformers (ViT, LLM, MLLM), which offer high performance but at a significant computational cost. This thesis proposes an innovative alternative combining lightweight convolutional neural networks (Lightweight CNN) and causal graph neural networks (Causal GNN) for efficient spatio-temporal analysis while optimizing computational resources. Lightweight CNNs enable high-performance extraction of visual features, while causal GNNs model dynamic relationships between objects in a scene graph, addressing challenges in object detection and relationship prediction in complex environments. Unlike current transformer-based models, this approach aims to reduce computational complexity while maintaining competitive accuracy, with potential applications in embedded vision and real-time systems.

Implementation of TFHE on RISC-V based embedded systems

Fully Homomorphic Encryption (FHE) is a technology that allows computations to be performed directly on encrypted data, meaning that we can process information without ever knowing its actual content. For example, it could enable online searches where the server never sees what you are looking for, or AI inference tasks on private data that remain fully confidential. Despite its potential, current FHE implementations remain computationally intensive and require substantial processing power, typically relying on high-end CPUs or GPUs with significant energy consumption. In particular, the bootstrapping operation represents a major performance bottleneck that prevents large-scale adoption. Existing CPU-based FHE implementations can take over 20 seconds on standard x86 architectures, while custom ASIC solutions, although faster, are prohibitively expensive, often exceeding 150 mm² in silicon area. This PhD project aims to accelerate the TFHE scheme, a more lightweight and efficient variant of FHE. The objective is to design and prototype innovative implementations of TFHE on RISC-V–based systems, targeting a significant reduction in bootstrapping latency. The research will explore synergies between hardware acceleration techniques developed for post-quantum cryptography and those applicable to TFHE, as well as tightly coupled acceleration approaches between RISC-V cores and dedicated accelerators. Finally, the project will investigate the potential for integrating a fully homomorphic computation domain directly within the processor’s instruction set architecture (ISA).
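TFHE itself is far more involved (LWE-based ciphertexts, noise management, programmable bootstrapping), but the core idea of computing on encrypted data can be illustrated with a deliberately simplified, insecure, additively homomorphic toy scheme; everything below is a pedagogical sketch, not TFHE:

```python
# Toy (insecure) additively homomorphic scheme for illustration only.
import random

Q = 2**16  # toy modulus; real TFHE works over torus/ring structures

def keygen():
    return random.randrange(Q)

def encrypt(m, k):
    return (m + k) % Q

def decrypt(c, k):
    return (c - k) % Q

# Homomorphic addition: adding ciphertexts adds the plaintexts,
# provided the keys are combined the same way at decryption.
k1, k2 = keygen(), keygen()
c1, c2 = encrypt(12, k1), encrypt(30, k2)
c_sum = (c1 + c2) % Q
print(decrypt(c_sum, (k1 + k2) % Q))  # 42
```

In real FHE schemes each ciphertext also carries noise that grows with every operation, which is precisely why the bootstrapping step targeted by this project exists: it refreshes a ciphertext's noise so that computation can continue indefinitely.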

Investigation of polytopal methods applied to CFD and optimized for GPU architectures

This research proposal focuses on the study and implementation of polytopal methods for solving the equations of fluid mechanics. These methods aim to handle the most general meshes possible, overcoming geometric constraints or those inherited from CAD operations such as extrusions or assemblies that introduce non-conformities. This work also falls within the scope of high-performance computing, addressing the increase in computational resources and, in particular, the development of massively parallel computing on GPUs.

The objective of this thesis is to build upon existing polytopal methods already implemented in the TRUST software, specifically the Compatible Discrete Operator (CDO) and Discontinuous Galerkin (DG) methods. The study will be extended to include convection operators and will investigate other methods from the literature, such as Hybrid High Order (HHO), Hybridizable Discontinuous Galerkin (HDG), and Virtual Element Method (VEM).

The main goals are to evaluate:
1. The numerical behavior of these different methods on the Stokes/Navier-Stokes equations;
2. The adaptability of these methods to heterogeneous architectures such as GPUs.
