Internalisation of external knowledge by foundation models
To perform an unfamiliar task, a subject (human or robot) must consult external information, which incurs a cognitive cost. After several similar experiences, the subject masters the situation and can act automatically. The 1980s and 1990s saw AI explorations using conceptual graphs and schemas, but their large-scale implementation was limited by the technology available at the time.
Today's neural models, including transformers and LLMs/VLMs, learn universal representations through pre-training on huge amounts of data. At inference time, prompts can supply local context, and fine-tuning allows these models to be specialised for specific tasks.
RAG and GraphRAG methods can exploit external knowledge, but relying on them at inference time is resource-intensive. This thesis proposes a cognitivist approach in which the system learns continuously: it consults external sources during inference and periodically uses this information to refine itself, much as the brain consolidates knowledge during sleep. This method aims to improve performance and reduce resource consumption.
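As a toy sketch of this "retrieve, then internalise" loop (all names here, such as Agent and consolidate, are illustrative stand-ins, not an existing API, and the consolidation step stands in for periodic fine-tuning):

```python
class Agent:
    def __init__(self, external_kb):
        self.external_kb = external_kb      # costly to consult (cf. RAG)
        self.internal = {}                  # knowledge absorbed during "sleep"
        self.retrieval_log = {}             # how often each fact was needed

    def answer(self, query):
        # Fast path: knowledge already internalised, no retrieval cost.
        if query in self.internal:
            return self.internal[query], "internal"
        # Slow path: consult the external source at inference time.
        fact = self.external_kb.get(query)
        if fact is not None:
            self.retrieval_log[query] = self.retrieval_log.get(query, 0) + 1
        return fact, "external"

    def consolidate(self, min_hits=2):
        # Periodic "sleep" phase: facts retrieved often enough are
        # internalised (a stand-in for fine-tuning on the retrieval log).
        for query, hits in self.retrieval_log.items():
            if hits >= min_hits:
                self.internal[query] = self.external_kb[query]
        self.retrieval_log.clear()

kb = {"capital of France": "Paris", "boiling point of water": "100 °C"}
agent = Agent(kb)
agent.answer("capital of France")
agent.answer("capital of France")
agent.consolidate()
print(agent.answer("capital of France"))   # now served internally
```

After consolidation, repeated queries bypass retrieval entirely, which is the resource saving the thesis targets.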
In humans, these processes are linked to the spatial organisation of the brain. The thesis will also study network architectures inspired by this organisation, with dedicated but interconnected "zones", such as vision-language and language models.
These concepts can be applied to the Astir and Ridder projects, which aim to exploit foundation models for software engineering in robotics and the development of generative AI methods for the safe control of robots.
Fine-grained and spatio-temporally grounded large multimodal models
This PhD project focuses on enhancing Large Multimodal Models (LMMs) through the integration of fine-grained and spatio-temporal information into training datasets. While current LMMs such as CLIP and Flamingo show strong performance, they rely on noisy and coarse-grained image-text pairs and often lack spatial or temporal grounding. The thesis aims to develop automatic pipelines to enrich image datasets with geographic and temporal metadata, refine captions using fine-grained semantic descriptors, and balance dataset diversity and compactness by controlling class-wise sample sizes.
Training strategies will incorporate hierarchical class structures and adapt protocols to improve alignment between caption elements and image regions. The work will also explore joint training regimes that integrate fine-grained, spatial, and temporal dimensions, and propose set-based inference to improve the diversity of generated outputs. The enriched datasets and models will be evaluated using existing or newly developed benchmarks targeting contextual relevance and output diversity. The project also addresses challenges in metadata accuracy, efficient model adaptation, and benchmarking methodologies for multi-dimensional model evaluation.
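A hypothetical sketch of the enrichment pipeline described above: fold spatio-temporal metadata into captions and cap class-wise sample counts to balance diversity against compactness. The field names ("caption", "place", "year", "label") are assumptions, not a fixed schema.

```python
from collections import defaultdict

def enrich_caption(sample):
    # Fold geographic and temporal metadata into the caption text.
    parts = [sample["caption"]]
    if sample.get("place"):
        parts.append(f"in {sample['place']}")
    if sample.get("year"):
        parts.append(f"({sample['year']})")
    return " ".join(parts)

def balance(samples, max_per_class=2):
    # Control class-wise sample sizes: keep at most max_per_class per label.
    kept, counts = [], defaultdict(int)
    for s in samples:
        if counts[s["label"]] < max_per_class:
            kept.append(s)
            counts[s["label"]] += 1
    return kept

data = [
    {"caption": "a street scene", "place": "Paris", "year": 2021, "label": "city"},
    {"caption": "a harbour at dusk", "place": "Marseille", "year": 2019, "label": "city"},
    {"caption": "a crowded market", "place": None, "year": 2020, "label": "city"},
    {"caption": "a wheat field", "place": "Beauce", "year": 2022, "label": "rural"},
]
balanced = balance(data)
print([enrich_caption(s) for s in balanced])
```

In the real pipeline the metadata would be extracted automatically (and may be noisy), which is exactly the metadata-accuracy challenge the project mentions.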
Applications include improved synthetic data generation for autonomous driving, enhanced annotation of media archives through contextual captioning, and better visual reasoning in industrial simulation scenarios.
Machine Learning-Accelerated Electron Density Calculations
Density Functional Theory (DFT) in the Kohn-Sham formalism is one of the most widespread methods for simulating microscopic properties in solid-state physics and chemistry. Its main advantage lies in its ability to strike a favorable balance between accuracy and computational cost. The continuous evolution of increasingly efficient numerical techniques has constantly broadened the scope of its applicability.
Among the techniques that can be combined with DFT, machine learning is used more and more. Today, a very common application consists of producing interatomic potentials via supervised learning models trained on properties computed by DFT.
The objective of the project proposed as part of this thesis is to use machine learning techniques at a deeper level, notably to predict the electronic density in crystals or molecules. Compared to predicting properties such as interatomic forces, predicting the electronic density presents specific challenges: the density is high-dimensional, since it must be represented throughout space, and its characteristics vary strongly from one material to another (metals, insulators, charge transfer, etc.). Ultimately, this can represent a significant computational cost. There are several options to reduce the dimensionality of the electronic density, such as computing projections or using localization functions.
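As a toy illustration of the projection idea (not the thesis method), a 1-D "density" sampled on a fine grid can be compressed into a handful of coefficients on an orthonormal basis (cosine modes here; a real workflow would use atom-centred or localised functions in 3-D):

```python
import math

N = 200                                   # grid points over [0, 1]
xs = [(i + 0.5) / N for i in range(N)]
# Toy density: two smooth peaks, loosely mimicking charge accumulation.
rho = [math.exp(-80 * (x - 0.3) ** 2) + 0.5 * math.exp(-80 * (x - 0.7) ** 2)
       for x in xs]

def basis(k, x):
    # Orthonormal cosine basis on [0, 1].
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(math.pi * k * x)

K = 12                                    # retained coefficients (K << N)
coeff = [sum(basis(k, x) * r for x, r in zip(xs, rho)) / N for k in range(K)]

# Reconstruct the density from the compressed representation.
recon = [sum(c * basis(k, x) for k, c in enumerate(coeff)) for x in xs]
err = max(abs(a - b) for a, b in zip(rho, recon))
print(f"max reconstruction error with {K} coefficients: {err:.4f}")
```

A learning model would then predict the K coefficients rather than the full grid, which is where the dimensionality reduction pays off.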
The final goal of this project is to predict the electronic density with the highest possible accuracy, in order to use it directly as a prediction or as a starting point for calculations of electron-specific properties (magnetism or band structure, for example).
In a first stage, the candidate will implement methods recently proposed in the literature; in a second stage, they will propose new ideas. Finally, the implemented method will be used to accelerate the prediction of properties of large systems involving charge transfer, such as defect migration in crystals.
Automatic modelling language variations for socially responsive chatbots
Conversational agents are increasingly present in our daily lives thanks to advances in natural language processing and artificial intelligence, and they are attracting growing interest. However, their ability to understand human communication in all its complexity remains a major challenge. This PhD project aims to model linguistic variation to develop agents capable of socially adaptive interactions, taking into account the socio-demographic profile and emotional state of their interlocutors. It also focuses on evaluating linguistic cues at different levels, leveraging both spoken and written language varieties, and assessing the generalization capacity of models trained on multilingual and multi-situational data, with the goal of improving interaction modeling with conversational agents.
Compositional Generalization of Visual Language Models
The advent of foundation models has raised state-of-the-art performance on a large number of tasks in several fields of AI, in particular computer vision and natural language processing. However, despite the huge amount of data used to train them, these models are still limited in their ability to generalize, in particular for use cases in a specific domain that is not well represented on the Web. A way to formalize this issue is compositional generalization, i.e. generalizing to a new, unseen concept from concepts learned during training. This generalization is the ability to learn disentangled concepts and to recombine them into unseen compositions when the model is in production. The proposed thesis will address this issue, aiming to propose visual representations that enable generic visual language models to generalize compositionally within specific domains. It will investigate strategies to reduce shortcut learning, promoting a deeper understanding of compositional structures in multimodal data. It will also address compositional generalization beyond simple attribute–object pairs, capturing more subtle and complex semantics. The proposed thesis aims at progress at a rather theoretical level, but it has many potential practical applications in the health, administration and services, security and defense, manufacturing, and agriculture sectors.
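A minimal illustration of what a compositional split looks like in the attribute–object setting: every primitive (attribute or object) appears during training, yet some of their combinations are held out for evaluation, so only a model with disentangled concepts can handle them.

```python
from itertools import product

attributes = ["red", "sliced", "wooden"]
objects = ["apple", "table", "car"]
all_pairs = set(product(attributes, objects))

# Seen during training: every primitive occurs, but not every combination.
train_pairs = {("red", "apple"), ("sliced", "apple"), ("wooden", "table"),
               ("red", "car"), ("sliced", "table"), ("wooden", "car")}
unseen_pairs = all_pairs - train_pairs

# Sanity checks: each attribute and each object is individually seen...
seen_attrs = {a for a, _ in train_pairs}
seen_objs = {o for _, o in train_pairs}
assert seen_attrs == set(attributes) and seen_objs == set(objects)
# ...yet some compositions never appear at training time.
print(sorted(unseen_pairs))
```

Benchmarks in this area evaluate accuracy on `unseen_pairs` specifically; the thesis aims to go beyond such pairs toward richer compositional semantics.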
Towards a Sustainable Blockchain: Reducing Energy Consumption While Ensuring Security and Integrity
Blockchain technology, a key component of distributed ledger systems, enables decentralized digital interactions without the need for a central authority but raises environmental concerns due to its energy consumption, particularly with proof-of-work (PoW) mechanisms like Bitcoin. The literature highlights the sustainability challenges associated with this energy consumption. Several strategies have been proposed to mitigate these impacts, such as optimizing cryptographic puzzles, implementing two-round mining processes, and integrating renewable energy sources. Alternative consensus mechanisms like Proof-of-Stake (PoS) and Proof-of-Authority (PoA) are also explored. This research project aims to evaluate the energy consumption profiles of existing blockchain systems and propose new, more efficient consensus algorithms. It also focuses on integrating renewable energy sources and optimizing smart contracts to reduce their resource consumption. A thorough security analysis will ensure that energy efficiency improvements do not compromise network security and decentralization. Using simulation tools, this research will quantify the improvements brought by new algorithms and strategies, contributing to the sustainability and broader adoption of blockchain technology in an environmentally conscious manner.
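To make the PoW energy problem concrete, here is a toy miner in which the number of hash evaluations (a rough proxy for energy spent) grows roughly as 16**difficulty for a hex-prefix target; real networks tune difficulty so the whole network spends enormous aggregate hashing effort per block.

```python
import hashlib

def mine(block_data, difficulty):
    """Find a nonce whose SHA-256 digest starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, nonce + 1          # winning nonce, attempts used
        nonce += 1

for d in (1, 2, 3):
    _, attempts = mine("block-42", d)
    print(f"difficulty {d}: {attempts} hash evaluations")
```

Alternatives such as PoS replace this brute-force search with stake-weighted validator selection, which is why their energy profile is orders of magnitude lower.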
Attention-based Binarized Visual Encoder for LLM-driven Visual Question Answering
In the context of smart image sensors, there is an increasing demand to go beyond simple inferences such as classification or object detection, and to add more complex applications enabling a semantic understanding of the scene. Among these applications, Visual Question Answering (VQA) enables AI systems to answer questions by analyzing images. This project aims to develop an efficient VQA system combining a visual encoder based on Binary Neural Networks (BNN) with a compact language model (tiny LLM). Although LLMs are still far from a complete hardware implementation, this project represents a significant step in that direction by using a BNN to analyze the context and relationships between objects in the scene. This encoder processes images with low resource consumption, allowing real-time deployment on edge devices. Attention mechanisms can be incorporated to extract the semantic information necessary for scene understanding. The language model can be stored locally and adjusted jointly with the BNN to generate precise and contextually relevant answers.
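The efficiency of BNNs comes from the core arithmetic trick sketched below: with weights and activations binarized to ±1, a dot product reduces to XNOR plus popcount on bit-packed words (shown here unpacked, in plain Python, for clarity):

```python
def binarize(values):
    # sign(x) with 0 mapped to +1; encode +1 as bit 1 and -1 as bit 0.
    return [1 if v >= 0 else 0 for v in values]

def bin_dot(a_bits, w_bits):
    # For ±1 vectors: dot = matches - mismatches = 2*popcount(XNOR) - n.
    n = len(a_bits)
    matches = sum(1 for a, w in zip(a_bits, w_bits) if a == w)  # popcount of XNOR
    return 2 * matches - n

activations = [0.7, -1.2, 0.1, -0.4]
weights = [0.3, -0.8, -0.5, 0.9]
a_bits, w_bits = binarize(activations), binarize(weights)
print(bin_dot(a_bits, w_bits))
```

In hardware, 64 such multiply-accumulates collapse into one XNOR and one popcount instruction, which is what makes real-time edge inference plausible for the visual encoder.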
This project offers an opportunity for candidates interested in Tiny Deep Learning and LLMs. It offers a broad field of research, with room for significant contributions and results applicable to concrete use cases. The work will consist of developing a robust BNN topology for semantic scene analysis under hardware constraints (memory and computation), and of integrating and jointly optimizing the BNN encoder with the LLM, while ensuring a coherent and performant VQA system across different types of queries.
Scalability of the Network Digital Twin in Complex Communication Networks
Communication networks are experiencing exponential growth, both in the deployment of network infrastructures (particularly visible in the gradual and sustained evolution towards 6G networks) and in the number of machines, covering a wide range of devices from Cloud servers to lightweight embedded IoT components (e.g. Systems on Chip: SoC), including mobile terminals such as smartphones.
This ecosystem also encompasses a variety of software components, ranging from applications (e.g. A/V streaming) to protocols at the different communication network layers. Furthermore, such an ecosystem is intrinsically dynamic because of the following features:
- Changes in network topology: due, for example, to hardware/software failures, user mobility, or operator network resource management policies.
- Changes in the usage/consumption of network resources (bandwidth, memory, CPU, battery, etc.): driven by user needs and operator network resource management policies.
To ensure effective supervision and management of communication networks, whether fine-grained or at an abstract level, various network management services/platforms, such as SNMP, CMIP, LWM2M, CoMI, and SDN, have been proposed and documented in the networking literature and in standards bodies. These platforms have seen broad acceptance and use among network operators, service providers, and industry; they often incorporate advanced features, including automated control loops (e.g. rule-based, expert-system-based, ML-based), further enhancing their ability to optimize network management operations.
Despite the extensive exploration and exploitation of these network management platforms, they do not guarantee an effective (re)configuration without intrinsic risks/errors, which can cause serious outages to network applications and services. This is particularly true when the objective of the (re)configuration is real-time optimization of the network, analysis/tests in operational mode (what-if analysis), or planning updates/modernizations/extensions of the communication network. For such (re)configuration objectives, a new network management paradigm has to be designed.
In recent years, the communication network research community has started exploring the adoption of the digital twin concept in the networking context (Network Digital Twin: NDT). The objective is to support the management of the communication network for various purposes, including those mentioned in the previous paragraph.
The NDT is a digital twin of the real/physical communication network (Physical Twin Network: PTN), making it possible to manipulate a digital copy of the real network without risk. This allows, in particular, visualizing and predicting the evolution (or the behavior and state) of the real network if a given configuration were to be applied. Beyond this aspect, the NDT and the PTN exchange information via one or more communication interfaces, with the aim of keeping the states of the NDT and the PTN synchronized.
Nonetheless, setting up an NDT is not a simple task. Frequent, real-time PTN-NDT synchronization poses a scalability problem for complex networks, where every piece of network information is liable to be reported to the NDT (e.g. a very large number of network entities, highly dynamic topologies, a large volume of information per node or per link).
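A simple non-ML baseline makes the bottleneck concrete: instead of pushing every metric of every node to the NDT at each cycle, report only values that drifted beyond a tolerance since the last synchronization. All names below are illustrative; the thesis would replace this fixed threshold with learned selection and prediction models.

```python
def delta_sync(previous, current, tolerance=0.05):
    """Return only the (node, metric) values worth sending to the NDT."""
    updates = {}
    for key, value in current.items():
        old = previous.get(key)
        if old is None or abs(value - old) / max(abs(old), 1e-9) > tolerance:
            updates[key] = value
    return updates

# NDT's last known state vs. the live PTN measurements.
ndt_state = {("r1", "cpu"): 0.40, ("r1", "bw"): 100.0, ("r2", "cpu"): 0.10}
ptn_state = {("r1", "cpu"): 0.41, ("r1", "bw"): 55.0, ("r2", "cpu"): 0.10}
updates = delta_sync(ndt_state, ptn_state)
ndt_state.update(updates)
print(updates)   # only the bandwidth change crosses the threshold
```

Here only one of three metrics is transmitted; an ML model could go further by predicting stable metrics at the NDT side and requesting only the unpredictable ones.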
Various scientific contributions have attempted to address the question of the NDT. State-of-the-art contributions focus on establishing scenarios, requirements, and architectures for the NDT. Nevertheless, the literature does not tackle its scalability problem.
The objective of this PhD thesis is to address the scalability problem of network digital twins by exploring new machine learning models for network information selection and prediction.
Learning world models for advanced autonomous agents
World models are internal representations of the external environment that an agent can use to interact with the real world. They are essential for understanding the physics that govern real-world dynamics, making predictions, and planning long-horizon actions. World models can be used to simulate real-world interactions and enhance the interpretability and explainability of an agent's behavior within this environment, making them key components for advanced autonomous agent models.
Nevertheless, building an accurate world model remains challenging. The goal of this PhD is to develop methodologies for learning world models and to study their use in the context of autonomous driving, particularly for motion forecasting and for developing autonomous navigation agents.
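A conceptual sketch of how a world model supports planning: the agent rolls out candidate action sequences inside the model and picks the one whose predicted final state is closest to a goal. The point-mass dynamics below are a hand-specified stand-in for the learned model the thesis targets.

```python
from itertools import product

def world_model(state, action):
    # Toy point-mass dynamics: state = (position, velocity), action = acceleration.
    pos, vel = state
    vel = vel + action
    pos = pos + vel
    return (pos, vel)

def plan(state, goal, horizon=3, actions=(-1.0, 0.0, 1.0)):
    """Exhaustively roll out short action sequences inside the model."""
    best_seq, best_err = None, float("inf")
    for seq in product(actions, repeat=horizon):
        s = state
        for a in seq:
            s = world_model(s, a)           # imagined, not executed
        err = abs(s[0] - goal)
        if err < best_err:
            best_seq, best_err = seq, err
    return best_seq, best_err

seq, err = plan(state=(0.0, 0.0), goal=5.0)
print(seq, err)
```

The rollouts are "imagined" interactions: nothing is executed in the real world until a plan is chosen, which is also what makes the agent's behavior inspectable and explainable.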
Secure and Agile Hardware/Software Implementation of new Post-Quantum Cryptography Digital Signature Algorithms
Cryptography plays a fundamental role in securing modern communication systems by ensuring confidentiality, integrity, and authenticity. Public-key cryptography, in particular, has become indispensable for secure data exchange and authentication processes. However, the advent of quantum computing poses an existential threat to many of the traditional public-key cryptographic algorithms, such as RSA, DSA, and ECC, which rely on problems like integer factorization and discrete logarithms that quantum computers can solve efficiently. Recognizing this imminent challenge, the National Institute of Standards and Technology (NIST) initiated in 2016 a global effort to develop and standardize Post-Quantum Cryptography (PQC). After three rigorous rounds of evaluation, NIST announced its first set of standardized algorithms in 2022. While these algorithms represent significant progress, NIST has expressed an explicit need for additional digital signature schemes that leverage alternative security assumptions, emphasizing the importance of schemes that offer shorter signatures and faster verification times to enhance practical applicability in resource-constrained environments. Building on this foundation, NIST opened a new competition to identify additional general-purpose signature schemes. The second-round candidates, announced in October 2024, reflect a diverse array of cryptographic families.
This research focuses on the critical intersection of post-quantum digital signature algorithms and hardware implementations. As the cryptographic community moves toward adoption, the challenge lies not only in selecting robust algorithms but also in deploying them efficiently in real-world systems. Hardware implementations, in particular, must address stringent requirements for performance, power consumption, and security, while also providing the flexibility to adapt to multiple algorithms—both those standardized and those still under evaluation. Such agility is essential to future-proof systems against the uncertainty inherent in cryptographic transitions. The primary objective of this PhD research is to design and develop hardware-agile implementations for post-quantum digital signature algorithms. The focus will be on supporting multiple algorithms within a unified hardware framework, enabling seamless adaptability to the diverse needs of evolving cryptographic standards. This involves an in-depth study of the leading candidates from NIST's additional-signature competition, as well as those already standardized, to understand their unique computational requirements and security properties. Special attention will be given to designing modular architectures that can support different signatures, ensuring versatility and extensibility. The proposed research will also explore optimizations for resource efficiency, balancing trade-offs between performance, power consumption, and area utilization. Additionally, resilience against physical attacks (side-channel attacks and fault injection attacks) will be a key consideration in the design process. This PhD project will be conducted within the PEPR PQ-TLS project in collaboration with the TIMA laboratory (Grenoble), the Agence nationale de la sécurité des systèmes d'information (ANSSI) and INRIA.
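For intuition on one of the cryptographic families involved, here is Lamport's one-time signature, a classic hash-based scheme. It is not one of the NIST candidates, but it illustrates why hash-based designs resist quantum attacks (security reduces to hash preimage resistance) and why signature size is a practical concern: even for a 256-bit digest, the signature is 256 hash-sized values.

```python
import hashlib, secrets

H = lambda b: hashlib.sha256(b).digest()

def keygen(bits=256):
    # Secret key: a random pair per message bit; public key: their hashes.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(bits)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def message_bits(msg, bits=256):
    digest = int.from_bytes(H(msg), "big")
    return [(digest >> i) & 1 for i in range(bits)]

def sign(msg, sk):
    # Reveal one preimage per bit of the message digest (one-time use!).
    return [pair[bit] for pair, bit in zip(sk, message_bits(msg))]

def verify(msg, sig, pk):
    return all(H(s) == pair[bit]
               for s, pair, bit in zip(sig, pk, message_bits(msg)))

sk, pk = keygen()
sig = sign(b"firmware-update-v1", sk)
print(verify(b"firmware-update-v1", sig, pk))   # True
print(verify(b"tampered", sig, pk))             # False
```

Modern hash-based candidates shrink keys and allow many signatures via Merkle trees, but the size/speed tension visible here is exactly the trade-off NIST's call for shorter signatures and faster verification addresses.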