Software support for computation and memory-transfer accelerators

For energy reasons, future computers will have to rely on accelerators for both computation and memory access (GPUs, TPUs, NPUs, smart DMAs). AI applications place intensive demands on both computing power and memory throughput.

These accelerators are not driven by a simple instruction set architecture (ISA): they break the von Neumann model and require specialized code to be written manually.

Furthermore, it is difficult to compare the use of these accelerators against code running on a general-purpose processor, because the initial source code differs substantially.

HybroLang is a hardware-close programming language that exposes all of a processor's computing capabilities while allowing code to be specialized using data known only at runtime.

The HybroGen compiler has already demonstrated its ability to program in-memory computing accelerators, as well as to optimize code on conventional CPUs by performing innovative optimizations.

This thesis proposes to extend the HybroLang language in order to:

- facilitate the programming of AI applications by providing support for complex data patterns: stencils, convolutions, sparse computation

- enable code generation both on CPUs and with hardware accelerators currently under development at the CEA (sparse computing, in-memory computing, memory access)

- allow different computing architectures to be benchmarked from the same initial source code
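HybroLang syntax is not shown in this abstract, so as a point of reference only, a 5-point Jacobi stencil, one of the access patterns the planned support would target, can be sketched in plain NumPy:

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi iteration of the 5-point Laplace stencil.

    Each interior point is replaced by the average of its four
    neighbours; boundary values are left untouched.
    """
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                            + u[1:-1, :-2] + u[1:-1, 2:])
    return v

# A 2D convolution generalizes this pattern: the fixed 0.25 weights
# become an arbitrary (possibly learned) kernel swept over the array.
```

The point of language-level support is that the compiler sees the neighbourhood pattern explicitly, instead of having to recover it from index arithmetic, and can then map it to an accelerator's memory-access engine.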

Ideally, a candidate should have knowledge of computer architecture, programming language implementation, code optimization and compilation.

LLM-Assisted Generation of Functional and Formal Hardware Models

Modern hardware systems, such as RISC-V processors and hardware accelerators, rely on functional simulators and formal verification models to ensure correct, reliable, and secure operation. Today, these models are mostly developed manually from design specifications, which is time-consuming and increasingly difficult as hardware architectures become more complex.

This PhD proposes to explore how Large Language Models (LLMs) can be used to assist the automatic generation of functional and formal hardware models from design specifications. The work will focus on defining a methodology that produces consistent and executable models while increasing confidence in their correctness. To achieve this, the approach will combine LLM-based generation with feedback from simulation and formal verification tools, possibly using reinforcement learning to refine the generation process.

The expected outcomes include a significant reduction in manual modeling effort, improved consistency between functional and formal models, and experimental validation on realistic hardware case studies, particularly RISC-V architectures and hardware accelerators.

Sustainable development of digital circuits and systems: Taking planetary boundaries into account

Technological developments in the electronics sector are experiencing rapid growth, accompanied by increasing interest in accounting for their environmental impacts. However, current approaches remain largely focused on relative impact reductions (energy efficiency, resource optimization), without ensuring compatibility with planetary boundaries. In this context, the concept of absolute sustainability emerges as an essential framework for guiding future developments of electronic systems.

This PhD thesis addresses several major scientific challenges: how can carrying capacities and sharing principles (core concepts of absolute sustainability) be identified for the electronics sector and consistently translated down to the levels of digital systems and integrated circuits? How can planetary boundaries be concretely integrated into the design of systems and circuits?

The main objective of the thesis is to move from a logic of relative environmental impact reduction toward designs that are compatible with planetary boundaries. It aims to define socio-technical scenarios to identify sharing principles, to conduct the first absolute life cycle assessment of a digital system, and to propose the first design of a circuit based on absolute limits, paving the way for sustainable development in electronics.

Lightweight CNN and Causal GNN for scene understanding

Scene understanding is a major challenge in computer vision, with recent approaches dominated by transformers (ViT, LLM, MLLM), which offer high performance but at a significant computational cost. This thesis proposes an innovative alternative combining lightweight convolutional neural networks (Lightweight CNN) and causal graph neural networks (Causal GNN) for efficient spatio-temporal analysis while optimizing computational resources. Lightweight CNNs enable high-performance extraction of visual features, while causal GNNs model dynamic relationships between objects in a scene graph, addressing challenges in object detection and relationship prediction in complex environments. Unlike current transformer-based models, this approach aims to reduce computational complexity while maintaining competitive accuracy, with potential applications in embedded vision and real-time systems.
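The abstract does not fix a particular lightweight-CNN architecture, but depthwise separable convolutions (popularized by MobileNet) are a typical building block; a small sketch of the parameter savings they offer over a standard convolution:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution layer (biases ignored)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Weight count of a depthwise separable convolution: a depthwise
    k x k pass (c_in * k * k weights) followed by a pointwise 1 x 1
    convolution (c_in * c_out weights)."""
    return c_in * k * k + c_in * c_out
```

For a 3 x 3 layer with 32 input and 64 output channels, this drops the weight count from 18432 to 2336, roughly an 8x reduction, which is the kind of saving that makes embedded and real-time deployment plausible.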

Implementation of TFHE on RISC-V based embedded systems

Fully Homomorphic Encryption (FHE) is a technology that allows computations to be performed directly on encrypted data, meaning that we can process information without ever knowing its actual content. For example, it could enable online searches where the server never sees what you are looking for, or AI inference tasks on private data that remain fully confidential. Despite its potential, current FHE implementations remain computationally intensive and require substantial processing power, typically relying on high-end CPUs or GPUs with significant energy consumption. In particular, the bootstrapping operation represents a major performance bottleneck that prevents large-scale adoption. Existing CPU-based FHE implementations can take over 20 seconds on standard x86 architectures, while custom ASIC solutions, although faster, are prohibitively expensive, often exceeding 150 mm² in silicon area.

This PhD project aims to accelerate the TFHE scheme, a more lightweight and efficient variant of FHE. The objective is to design and prototype innovative implementations of TFHE on RISC-V-based systems, targeting a significant reduction in bootstrapping latency. The research will explore synergies between hardware acceleration techniques developed for post-quantum cryptography and those applicable to TFHE, as well as tightly coupled acceleration approaches between RISC-V cores and dedicated accelerators. Finally, the project will investigate the potential for integrating a fully homomorphic computation domain directly within the processor's instruction set architecture (ISA).
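TFHE itself is far beyond a short snippet, but the core idea of computing on data without seeing it can be illustrated with a toy *partially* homomorphic scheme, Paillier, where multiplying two ciphertexts yields an encryption of the sum of the plaintexts (deliberately tiny parameters, illustration only):

```python
import math
import random

# Toy Paillier cryptosystem: additively homomorphic, i.e.
# decrypt(enc(a) * enc(b) mod n^2) == a + b mod n.
# These primes are absurdly small; real deployments use thousands of bits.
P, Q = 17, 19
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)          # decryption helper, valid because g = N + 1

def encrypt(m):
    r = random.randrange(2, N)
    while math.gcd(r, N) != 1:          # r must be invertible mod N
        r = random.randrange(2, N)
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N) * MU % N
```

Paillier only supports addition under encryption; the promise of FHE schemes such as TFHE is arbitrary computation, at the cost of the expensive bootstrapping step this project targets.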

CORTEX: Container Orchestration for Real-Time, Embedded/edge, miXed-critical applications

This PhD proposal will develop a container orchestration scheme for applications deployed on a continuum of heterogeneous computing resources across the embedded-edge-cloud space, with a specific focus on applications that require real-time guarantees.

Applications such as autonomous vehicles, environmental monitoring, and industrial automation traditionally require high predictability with real-time guarantees, but they increasingly demand greater runtime flexibility as well as minimization of their overall environmental footprint.

For these applications, a novel adaptive runtime strategy is required, one that can dynamically optimize the deployment of software payloads onto hardware nodes under a mixed-criticality objective combining real-time guarantees with minimization of the environmental footprint.

Adaptive Orchestration for Proactive Security in Distributed Systems

Modern distributed architectures are becoming increasingly heterogeneous and dynamic, expanding the attack surface and challenging traditional, static security mechanisms. To address these challenges, proactive defense approaches, and particularly Moving Target Defense (MTD), have been introduced to disrupt attackers by regularly modifying the system configuration, for instance by randomizing network addresses, reallocating containers, or deploying decoy services. However, most existing strategies remain static, rely on a single defense mechanism, and ignore the underlying hardware state. In parallel, hardware-level countermeasures such as cache partitioning, randomization, and scheduling have been proposed against side-channel attacks, yet they are seldom integrated into the decision logic of orchestration frameworks.

The objective of this PhD is to design an adaptive MTD orchestration framework that is aware of the underlying hardware state, capable of dynamically adjusting defense strategies according to system load, performance, and observed vulnerability. The central idea is to feed a reinforcement learning (RL) agent with information derived from hardware performance counters and local security metrics linked to shared cache dynamics, enabling it to select the optimal combination of MTD strategies based on the current system context.
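The exact state encoding and action set are open research questions for the thesis; purely as a sketch (the action names and the counter-derived state buckets are hypothetical), the decision agent could start from plain tabular Q-learning over MTD strategies:

```python
import random

# Hypothetical MTD action set; a real orchestrator would expose its own.
ACTIONS = ["rotate_addresses", "migrate_container",
           "deploy_decoy", "repartition_cache"]

class MTDAgent:
    """Tabular epsilon-greedy Q-learning agent.

    `state` is assumed to be a small integer obtained by discretizing
    hardware performance counters (e.g. shared-cache miss-rate buckets);
    `reward` would blend the security metric with performance/energy cost.
    """
    def __init__(self, n_states, epsilon=0.1, alpha=0.5, gamma=0.9):
        self.q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def act(self, state):
        if random.random() < self.epsilon:       # explore
            return random.randrange(len(ACTIONS))
        row = self.q[state]                       # exploit best known action
        return row.index(max(row))

    def update(self, s, a, reward, s_next):
        # Standard one-step Q-learning update.
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```

A production agent would likely need function approximation and a carefully shaped multi-criteria reward, but the loop structure (observe counters, pick an MTD action, score the outcome, update) is the one described above.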

The expected contributions include the definition of a hardware-informed local security metric capturing cache behavior, the graph-based modeling of dependencies between services, resources, and attack surfaces, the design of a unified RL-based decision agent for adaptive MTD selection, and a multi-criteria evaluation (security, performance, energy) on a realistic automotive use case.

This thesis aims to bridge system-level and hardware-level perspectives to build trustworthy orchestrators capable of anticipating and adapting defenses against evolving threats, paving the way toward intelligent and hardware-aware proactive security in distributed systems.
