Few-shot event and complex relation extraction from text applied to scientific literature

Artificial intelligence & Data intelligence; Computer science and software; Engineering sciences; Technological challenges

Abstract

Information extraction from text, a subfield of Natural Language Processing, has been the subject of research for many years. This work has focused primarily on Named Entity Recognition, relation extraction between entities, and, in its most complex form, event extraction, a task typically formulated as filling predefined templates from unstructured text. Within this framework, the objective of this thesis is to design, develop, and evaluate event extraction models operating on scientific articles. In this context, an "event" may correspond to a set of entities and relations characterizing, for instance, a chemical reaction or an experiment. Furthermore, these models must be trainable from a very small set of annotated data, so that they can be adapted rapidly to new scientific domains.
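To make the template-filling formulation concrete, the following minimal Python sketch represents an event as a typed template whose roles are filled by entities. Every name in it (`Entity`, `EventTemplate`, `Event`, the role names, and the entity labels) is a hypothetical illustration and not the schema that the thesis or the AIKO project will use.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """A text span with a type, as produced by Named Entity Recognition."""
    text: str
    label: str  # hypothetical label set, e.g. "CHEMICAL" or "TEMPERATURE"


@dataclass
class Event:
    """A filled template: each role maps to an entity, or None if unfilled."""
    event_type: str
    arguments: Dict[str, Optional[Entity]]

    def missing_roles(self) -> List[str]:
        # Roles for which no entity was found in the text.
        return [role for role, ent in self.arguments.items() if ent is None]


@dataclass
class EventTemplate:
    """A predefined template: an event type and the roles it expects."""
    event_type: str
    roles: Tuple[str, ...]

    def fill(self, **arguments: Entity) -> Event:
        # Reject arguments that do not correspond to a declared role.
        unknown = set(arguments) - set(self.roles)
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        return Event(self.event_type, {r: arguments.get(r) for r in self.roles})


# Hypothetical template for a chemical reaction event.
REACTION = EventTemplate(
    "ChemicalReaction", ("reactant", "product", "catalyst", "temperature")
)

event = REACTION.fill(
    reactant=Entity("benzaldehyde", "CHEMICAL"),
    product=Entity("benzyl alcohol", "CHEMICAL"),
    temperature=Entity("25 °C", "TEMPERATURE"),
)
print(event.missing_roles())  # ['catalyst']
```

In this representation, an extraction model's job is to produce the arguments passed to `fill` from raw text, and adapting to a new scientific domain amounts to declaring new templates and providing a handful of annotated examples for them.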

From a methodological standpoint, the proposed thesis seeks to move beyond the current, almost reflexive tendency to rely exclusively on Large Language Models (LLMs). Instead, it advocates for a potential synergy between LLMs and smaller encoder-based models within a few-shot context. In this synergy, the former are leveraged, through the generation of synthetic data and annotations, to build the resources necessary to implement the latter via pre-training mechanisms. This thesis will be conducted within the framework of the AIKO project of the Digital Programs Agency, which focuses on knowledge extraction from scientific publications.

Laboratory

Département Intelligence Ambiante et Systèmes Interactifs (LIST)

Service Intelligence Artificielle pour le Langage et la Vision

Laboratoire Analyse Sémantique Textes et Images

Paris-Saclay

Practical information

Pre-requisite:

Master's degree (M2) or engineering school degree with a specialization in natural language processing and machine learning

University - graduate school:

Sciences et Technologies de l’Information et de la Communication (STIC)

Paris-Saclay

Starting date:

01-10-2026

Place:

Saclay

Contact Person

Olivier FERRET

CEA

DRT/DIASI/SIALV/LASTI

Tel: 01 69 08 01 47

Email: olivier.ferret@cea.fr

Thesis supervisor

Olivier FERRET

CEA

DRT/DIASI/SIALV/LASTI