Software support for sparse computation

In AI, HPC and embedded computing, computer performance has become limited by data movement. Hardware accelerators exist to handle data movement in an energy-efficient way, but no programming language allows their use to be expressed in the code that performs the computation.

It is therefore up to the programmer to explicitly configure DMAs, insert function calls for data transfers, and perform program analysis to identify memory bottlenecks.

In addition, compilers were designed in the 1980s, when memories ran at the same frequency as computing cores.

The aim of this thesis is to integrate into a compiler the ability to perform optimizations driven by data transfers.
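As an illustration of the kind of transfer/compute overlap that programmers currently have to write by hand, and that a data-movement-aware compiler could generate automatically, here is a minimal Python sketch of double buffering; load_tile, process and the tile size are hypothetical stand-ins for an explicit DMA transfer and a compute kernel.

```python
import threading
import numpy as np

TILE = 4096  # hypothetical tile size; a real DMA would move one such block per transfer

def load_tile(src, i):
    """Stand-in for an explicit DMA transfer: copy one tile from 'far' memory."""
    return np.array(src[i * TILE:(i + 1) * TILE])

def process(tile):
    """Stand-in for the compute kernel that should overlap with the next transfer."""
    return float(np.sum(tile * tile))

def run(src, n_tiles):
    total = 0.0
    nxt = load_tile(src, 0)            # prefetch the first tile
    for i in range(n_tiles):
        cur, box = nxt, {}
        if i + 1 < n_tiles:            # keep one transfer in flight (double buffering)
            worker = threading.Thread(
                target=lambda: box.update(tile=load_tile(src, i + 1)))
            worker.start()
        total += process(cur)          # compute on the tile that is already local
        if i + 1 < n_tiles:
            worker.join()
            nxt = box["tile"]
    return total

if __name__ == "__main__":
    data = np.arange(8 * TILE, dtype=np.float64)
    print(run(data, 8))
```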

Scalable thermodynamic computing architectures

Large-scale optimisation problems are increasingly prevalent in industries such as finance, materials development, logistics and artificial intelligence. These algorithms are typically run on hardware comprising clusters of CPUs and GPUs. At scale, however, this quickly translates into latencies, energy consumption and financial costs that are not sustainable. Thermodynamic computing is a new computing paradigm in which analogue components are coupled together in a physical network. It promises extremely efficient implementations of algorithms such as simulated annealing, stochastic gradient descent and Markov chain Monte Carlo by exploiting the intrinsic physics of the system. However, no clear vision exists of how to build a realistic, programmable and scalable thermodynamic computer. It is this ambitious challenge that will be addressed in this PhD topic.

Aspects ranging from the development of computing macroblocks, their partitioning and interfacing to a digital system, to the adaptation and compilation of algorithms for thermodynamic hardware may be considered. Particular emphasis will be put on understanding the trade-offs required to maximise the scalability and programmability of thermodynamic computers on large-scale optimisation benchmarks, and on their comparison to implementations on conventional digital hardware.
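One of the algorithms named above, simulated annealing, gives a sense of what a thermodynamic computer would realise natively through coupled analogue components and thermal noise; a minimal digital sketch on a toy Ising-style problem is given below (problem size, couplings and schedule are illustrative).

```python
import math
import random

def simulated_annealing(energy, neighbour, x0, t0=1.0, cooling=0.999, steps=20000):
    """Minimal simulated annealing loop; a thermodynamic computer would obtain the
    same stochastic descent directly from the physics of its analogue network."""
    x, e = x0, energy(x0)
    t = t0
    for _ in range(steps):
        cand = neighbour(x)
        e_cand = energy(cand)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_cand <= e or random.random() < math.exp((e - e_cand) / t):
            x, e = cand, e_cand
        t *= cooling  # annealing schedule
    return x, e

if __name__ == "__main__":
    random.seed(0)
    n = 30  # toy Ising-style problem: n spins in {-1, +1} with random couplings
    J = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
    def energy(s):
        return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    def neighbour(s):
        k = random.randrange(len(s))
        return s[:k] + [-s[k]] + s[k + 1:]
    best, best_energy = simulated_annealing(energy, neighbour, [1] * n)
    print(best_energy)
```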

HW/SW Contracts for Security Analysis Against Fault Injection Attacks on Open-source Processors

This thesis focuses on the cybersecurity of embedded systems, particularly the vulnerability of processors and programs to fault injection attacks. These attacks disrupt the normal functioning of systems, allowing attackers to exploit weaknesses to access sensitive information. Although formal methods have been developed to analyze the robustness of systems, they often limit their analyses to hardware or software separately, overlooking the interaction between the two.
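To make the threat concrete, the toy model below (entirely hypothetical and purely at software level) simulates a single fault on the result of an authentication check and shows how one fault can bypass it; realistic analyses must model faults at the micro-architectural level, which is exactly where hardware/software interaction matters.

```python
def check_pin(entered, stored):
    """Naive authentication check."""
    return entered == stored

def run_with_fault(entered, stored, flip_result):
    ok = check_pin(entered, stored)
    if flip_result:
        ok = not ok  # model a single injected fault flipping the comparison outcome
    return "ACCESS GRANTED" if ok else "ACCESS DENIED"

if __name__ == "__main__":
    print(run_with_fault("0000", "1234", flip_result=False))  # denied, as expected
    print(run_with_fault("0000", "1234", flip_result=True))   # one fault grants access
```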

The proposed work aims to formalize hardware/software (HW/SW) contracts specifically for security analysis against fault injection. Building on a hardware partitioning approach, this research seeks to mitigate scalability issues related to the complexity of microarchitecture models. Expected outcomes include the development of techniques and tools for effective security verification of embedded systems, as well as the creation of contracts that facilitate the assessment of compliance for both hardware and software implementations. This approach could also reduce the time-to-market for secure systems.

Combined Software and Hardware Approaches for Large Scale Sparse Matrix Acceleration

Computational physics, artificial intelligence and graph analytics are important compute domains that depend on processing sparse matrices of huge dimensions. This PhD thesis focuses on the challenges of efficiently processing such sparse matrices, by applying a systematic software and hardware approach.

Although the processing of sparse matrices has been studied from a purely software perspective for decades, many dedicated and highly specific hardware accelerators for sparse data have been proposed in recent years. What is missing is a vision of how to properly exploit these accelerators, as well as standard hardware such as GPUs, to efficiently solve a full problem. Prior to solving a matrix problem, it is common to pre-process the matrix. This can include techniques to improve numerical stability, to adjust the form of the matrix, and to divide it into smaller sub-matrices (tiling) that can be distributed to processing cores. In the past, this pre-processing has assumed homogeneous compute cores. New approaches are needed to take advantage of heterogeneous cores, which can include dedicated accelerators and GPUs. For example, it may make sense to dispatch the sparsest regions to specialized accelerators and to use GPUs for the denser regions, although this has yet to be demonstrated.

The purpose of this PhD thesis is to take a broad view of the processing of sparse matrices and to analyze which software techniques are required to exploit existing and future accelerators. The candidate will build on an existing multi-core platform based on RISC-V cores and an open-source GPU to develop a full framework, and will study which strategies best exploit the available hardware.
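As a sketch of the kind of pre-processing dispatch envisaged here (the tile size, density threshold and device names are illustrative assumptions, not a proposed design), the snippet below tiles a sparse matrix and routes each tile to a different class of compute unit according to its non-zero density.

```python
import numpy as np
import scipy.sparse as sp

def tile_and_dispatch(A, tile=256, dense_threshold=0.05):
    """Split a CSR matrix into square tiles and assign each non-empty tile to a
    device class based on its non-zero density (thresholds are illustrative)."""
    A = sp.csr_matrix(A)
    n_rows, n_cols = A.shape
    plan = []
    for r in range(0, n_rows, tile):
        for c in range(0, n_cols, tile):
            block = A[r:r + tile, c:c + tile]
            if block.nnz == 0:
                continue                      # nothing to compute for empty tiles
            density = block.nnz / (block.shape[0] * block.shape[1])
            target = "gpu" if density >= dense_threshold else "sparse_accel"
            plan.append(((r, c), density, target))
    return plan

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = sp.random(1024, 1024, density=0.01, random_state=rng, format="csr")
    for (r, c), d, dev in tile_and_dispatch(A)[:5]:
        print(f"tile({r},{c}) density={d:.4f} -> {dev}")
```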

Embedded systems for natural acoustic signal analysis while preserving privacy

This PhD topic aims at developing embedded systems to record and analyze natural acoustic signals. When targeting deployment in cities, the privacy issue arises: how can we maintain a satisfactory level of analysis while never recording or transmitting human voices?

CCA-secure constructions for FHE

Fully Homomorphic Encryption (FHE) is a corpus of cryptographic techniques that allow computation directly over encrypted data. Since its inception around 15 years ago, FHE has been the subject of considerable research towards greater efficiency and practicality. From a security perspective, however, FHE still raises a number of questions and challenges. In particular, all the FHE schemes used in practice, mainly BFV, BGV, CKKS and TFHE, achieve only CPA security, which is sometimes referred to as security against passive adversaries.
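To fix intuition about computing on ciphertexts, and about why malleability puts full CCA2 security out of reach, the toy sketch below uses unpadded textbook RSA, which is homomorphic for multiplication; it is deliberately insecure, is not an FHE scheme, and its tiny parameters are purely illustrative.

```python
# Toy, insecure textbook RSA with tiny parameters, used only to illustrate
# homomorphic malleability; real FHE schemes (BFV, BGV, CKKS, TFHE) are very different.
p, q = 61, 53
n = p * q                 # 3233
e = 17
d = 2753                  # modular inverse of e mod (p-1)(q-1)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

if __name__ == "__main__":
    a, b = 7, 12
    ca, cb = enc(a), enc(b)
    # Anyone can combine ciphertexts without the key: Enc(a) * Enc(b) = Enc(a*b mod n).
    c_prod = (ca * cb) % n
    assert dec(c_prod) == (a * b) % n
    # The same property lets an active adversary transform ciphertexts at will,
    # which is why malleable schemes cannot be CCA2-secure.
    print(dec(c_prod))    # 84
```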

Over the last few years, a number of works have investigated the security of FHE in the beyond-CPA regime with new security notions (CPAD, FuncCPA, vCCA, vCCAD, and others) being proposed and studied, leading to new attacks and constructions and, overall, a better understanding of FHE security in that regime.

With respect to CCA security, recent works (2024) have defined new security notions that are stronger than CCA1 and have been shown to be achievable by both exact and approximate FHE schemes. Building on these advances, the present thesis will aim to design practical FHE-style malleable schemes enforcing CCA security properties, at least for specific applications.

Scalable NoC-based Programmable Cluster Architecture for future AI applications

Context
Artificial Intelligence (AI) has emerged as a major field impacting various sectors, including healthcare, automotive, robotics, and more. Hardware architectures must now meet increasingly demanding requirements in terms of computational power, low latency, and flexibility. Network-on-Chip (NoC) technology is a key enabler in addressing these challenges, providing efficient and scalable interconnections within multiprocessor systems. However, despite its benefits, designing NoCs poses significant challenges, particularly in optimizing latency, energy consumption, and scalability.
Programmable cluster architectures hold great promise for AI as they enable resource adaptation to meet the specific needs of deep learning algorithms and other compute-intensive AI applications. By combining the modularity of clusters with the advantages of NoCs, it becomes possible to design systems capable of handling ever-increasing AI workloads while ensuring maximum energy efficiency and flexibility.
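To illustrate why topology and scale drive NoC latency (the mesh sizes and routing policy are illustrative assumptions, not a proposed design), the short sketch below computes the average hop count of a 2D mesh under XY routing and uniform traffic, a first-order proxy for communication latency.

```python
def average_hops_2d_mesh(k):
    """Average Manhattan distance (hop count) between distinct nodes of a k x k mesh
    under XY routing and uniform random traffic: a first-order latency proxy."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    total, pairs = 0, 0
    for sx, sy in nodes:
        for dx, dy in nodes:
            if (sx, sy) != (dx, dy):
                total += abs(sx - dx) + abs(sy - dy)
                pairs += 1
    return total / pairs

if __name__ == "__main__":
    for k in (2, 4, 8, 16):
        print(f"{k}x{k} mesh: {average_hops_2d_mesh(k):.2f} average hops")
```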
Summary of the Thesis Topic
This PhD project aims to design a scalable, programmable cluster architecture based on a Network-on-Chip tailored for future AI applications. The primary objective will be to design and optimize a NoC architecture capable of meeting the high demands of AI applications in terms of intensive computing and efficient data transfer between processing clusters.
The research will focus on the following key areas:
1. NoC Architecture Design: Developing a scalable and programmable NoC to effectively connect various AI processing clusters.
2. Performance and Energy Efficiency Optimization: Defining mechanisms to optimize system latency and energy consumption based on the nature of AI workloads.
3. Cluster Flexibility and Programmability: Proposing a modular and programmable architecture that dynamically allocates resources based on the specific needs of each AI application.
4. Experimental Evaluation: Implementing and testing prototypes of the proposed architecture to validate its performance on real-world use cases, such as image classification, object detection, and real-time data processing.
The outcomes of this research may contribute to the development of cutting-edge embedded systems and AI solutions optimized for the next generation of AI applications and algorithms.

The work performed during this thesis will be presented at international conferences and published in scientific journals. Certain results may be patented.

Hardware-aware Optimizations for Efficient Generative AI with Mamba Networks

Generative AI has the potential to transform various industries. However, current state-of-the-art models such as transformers face significant challenges in computational and memory efficiency, especially when deployed on resource-constrained hardware. This PhD research aims to address these limitations by optimizing Mamba networks for hardware-aware applications. Mamba networks offer a promising alternative by avoiding the quadratic complexity of self-attention mechanisms through innovative architectural choices. By replacing attention with selective state-space layers whose cost scales linearly with sequence length, Mamba networks can generate high-quality outputs with significantly lower resource demands.

The research will focus on implementing hardware-aware optimizations to enhance the efficiency of Mamba networks, making them suitable for real-time applications and edge devices. This includes optimizing training and inference times, as well as exploring potential hardware acceleration. The goal is to advance the practical deployment of generative AI in resource-constrained domains, contributing to its broader adoption and impact.
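As a rough illustration of the efficiency argument, the sketch below contrasts a generic diagonal state-space recurrence, whose cost grows linearly with sequence length, against the quadratic score matrix of dense self-attention; this is a didactic sketch, not the actual Mamba implementation, and all parameter values are arbitrary.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Generic diagonal state-space recurrence h_t = A*h_{t-1} + B*x_t, y_t = C.h_t.
    Cost grows linearly with sequence length L (one state update per step)."""
    L, d = x.shape
    h = np.zeros(d)
    y = np.empty(L)
    for t in range(L):
        h = A * h + B * x[t]      # element-wise update (diagonal A), O(d) per step
        y[t] = C @ h
    return y

def attention_scores(x):
    """Dense self-attention scores: every token attends to every token, O(L^2)."""
    s = x @ x.T / np.sqrt(x.shape[1])
    s = np.exp(s - s.max(axis=-1, keepdims=True))
    return s / s.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d = 512, 16
    x = rng.standard_normal((L, d))
    A = np.full(d, 0.9); B = np.ones(d); C = rng.standard_normal(d)
    print(ssm_scan(x, A, B, C).shape)   # (512,)     linear-time scan
    print(attention_scores(x).shape)    # (512, 512) quadratic score matrix
```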

Exploration of unsupervised approaches for modeling the environment from RADAR data

Radar technologies have gained significant interest in recent years, particularly with the emergence of MIMO radars and 4D imaging radars. This new generation of radar offers both opportunities and challenges for the development of perception algorithms. Traditional algorithms such as FFT, CFAR, and DOA estimation are effective for detecting moving targets, but the generated point clouds are still too sparse for precise modeling of the environment. This is a critical issue for autonomous vehicles and robotics.
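For reference, a minimal cell-averaging CFAR detector over a 1D power profile is sketched below (window sizes and the scaling factor are illustrative); this kind of thresholding keeps only a handful of detections per frame, which is precisely what makes the resulting point clouds sparse.

```python
import numpy as np

def ca_cfar(power, guard=2, train=8, scale=5.0):
    """Cell-averaging CFAR on a 1D power profile: a cell is declared a detection
    if it exceeds 'scale' times the mean of the surrounding training cells."""
    n = len(power)
    detections = []
    for i in range(train + guard, n - train - guard):
        left = power[i - guard - train:i - guard]
        right = power[i + guard + 1:i + guard + train + 1]
        noise = np.mean(np.concatenate([left, right]))
        if power[i] > scale * noise:
            detections.append(i)
    return detections

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    profile = rng.exponential(1.0, 200)   # noise floor
    profile[60] += 40.0                   # synthetic target
    profile[140] += 25.0
    print(ca_cfar(profile))               # indices of detected cells, e.g. [60, 140]
```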

This thesis proposes to explore unsupervised Machine Learning techniques to improve environment modeling from radar data. The objective is to produce a richer model of the environment, enhancing data density and scene description while keeping computational costs compatible with real-time computing. The thesis will address the question of which types of radar data are best suited as inputs for the algorithms and for representing the environment. The candidate will need to explore unsupervised algorithmic solutions and seek computational optimizations to make these solutions compatible with real-time execution.
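One possible unsupervised starting point, given purely as an illustration rather than as the approach to be followed, is density-based clustering of the radar point cloud, grouping detections into object candidates without any labels; the synthetic data and DBSCAN parameters below are arbitrary.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic sparse radar point cloud: (x, y, radial_velocity) per detection.
rng = np.random.default_rng(2)
car = rng.normal([10.0, 3.0, 5.0], [0.4, 0.4, 0.2], size=(15, 3))
wall = rng.normal([20.0, -5.0, 0.0], [1.5, 0.2, 0.05], size=(25, 3))
clutter = rng.uniform([-5, -10, -1], [30, 10, 1], size=(10, 3))
points = np.vstack([car, wall, clutter])

# Group detections into object candidates without labels (parameters illustrative).
labels = DBSCAN(eps=1.5, min_samples=4).fit_predict(points)
for cluster_id in sorted(set(labels)):
    members = points[labels == cluster_id]
    tag = "noise" if cluster_id == -1 else f"object {cluster_id}"
    print(f"{tag}: {len(members)} detections, centroid {members[:, :2].mean(axis=0).round(1)}")
```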

Ultimately, these solutions must be designed to be embedded as close as possible to the sensor, allowing them to be executed on constrained targets.

CORTEX: Container Orchestration for Real-Time, Embedded/edge, miXed-critical applications

This PhD proposal will develop a container orchestration scheme for applications deployed on a continuum of heterogeneous computing resources across the embedded-edge-cloud space, with a specific focus on applications that require real-time guarantees.

Applications in domains such as autonomous vehicles, environmental monitoring, and industrial automation traditionally require high predictability with real-time guarantees, but they increasingly call for more runtime flexibility as well as a minimization of their overall environmental footprint.

For these applications, a novel adaptive runtime strategy is required, one that can dynamically optimize the deployment of software payloads onto hardware nodes with a mixed-criticality objective combining real-time guarantees with minimization of the environmental footprint.
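As a purely illustrative sketch of such a mixed-criticality placement decision (node names, metrics and weights are hypothetical, not part of this proposal), the snippet below keeps only the nodes that can guarantee a task's deadline and then picks the one with the lowest estimated environmental cost.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    spare_cpu: float         # fraction of CPU still available
    wcet_scale: float        # how much the task's WCET stretches on this node
    power_w: float           # average power draw attributable to the task
    carbon_g_per_kwh: float  # carbon intensity of the node's energy supply

def place(task_util, task_wcet_ms, deadline_ms, nodes, weight_energy=1.0):
    """Keep only nodes where the real-time constraint holds, then pick the one
    with the lowest estimated environmental cost (weights are illustrative)."""
    feasible = [n for n in nodes
                if n.spare_cpu >= task_util and task_wcet_ms * n.wcet_scale <= deadline_ms]
    if not feasible:
        return None  # no node can guarantee the deadline: reject or degrade gracefully
    def footprint(n):
        energy_kwh = n.power_w * (task_wcet_ms / 3.6e6)
        return weight_energy * energy_kwh * n.carbon_g_per_kwh
    return min(feasible, key=footprint)

if __name__ == "__main__":
    nodes = [Node("embedded-0", 0.3, 2.5, 4.0, 50.0),
             Node("edge-1", 0.6, 1.2, 25.0, 50.0),
             Node("cloud-2", 0.9, 1.0, 120.0, 300.0)]
    chosen = place(task_util=0.4, task_wcet_ms=8.0, deadline_ms=12.0, nodes=nodes)
    print(chosen.name if chosen else "no feasible node")
```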
