The race to exascale has led to the emergence of highly heterogeneous supercomputers. The Modular Supercomputing Architecture (MSA) model is one result of this evolution. An MSA system is composed of several "modules", each of which is itself a smaller supercomputer with a specific architecture that addresses specific computational needs. These modules are linked together by a fast interconnect and share a common software stack, which makes it possible to launch a single job across multiple modules. The compute units and the interconnect inside each module may differ from those of the other modules.
Having multiple interconnection networks puts increased pressure on internode communication libraries such as MPI implementations. Indeed, if an application can run on multiple modules at the same time, the MPI implementation must be able to route messages between two MPI processes located on two distinct modules. The MPI implementation therefore needs to 1) support all the networks involved and 2) carry a single message across several networks.
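To illustrate the second requirement, the following minimal sketch searches for a route between two nodes that do not share a network. It is not taken from MPC or any MPI implementation: the node names, the network names, the gateway node `gw0`, and the function `find_route` are all hypothetical, and a plain breadth-first search stands in for whatever routing strategy a real library would use.

```python
from collections import deque

# Hypothetical topology: each node lists the networks it is attached to.
# All names here are illustrative assumptions, not a real machine.
NODE_NETWORKS = {
    "cpu0": {"net_a"},
    "cpu1": {"net_a"},
    "gw0": {"net_a", "net_b"},  # gateway attached to both networks
    "gpu0": {"net_b"},
    "gpu1": {"net_b"},
}


def find_route(src, dst, node_networks):
    """Return a list of (network, next_node) hops from src to dst, or None.

    Two nodes can exchange a message directly only if they share a network,
    so a breadth-first search over that relation yields a shortest route.
    """
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for other, nets in node_networks.items():
            shared = node_networks[node] & nets
            if other not in previous and shared:
                # min() only makes the network choice deterministic.
                previous[other] = (node, min(shared))
                queue.append(other)
    if dst not in previous:
        return None
    # Walk back from the destination to rebuild the hop list.
    hops = []
    node = dst
    while previous[node] is not None:
        parent, net = previous[node]
        hops.append((net, node))
        node = parent
    hops.reverse()
    return hops
```

Under these assumptions, a message from `cpu0` to `gpu1` first crosses `net_a` to reach the gateway and then `net_b` to reach its destination, whereas two nodes of the same module communicate in a single hop.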
The purpose of this thesis is to analyze the features and constraints imposed by the MSA model, and to propose solutions for running applications efficiently on such supercomputers.
The Ph.D. candidate will rely on the expertise of the advising team, along with the network support in the MPC framework, which offers multi-rail and multi-network capabilities.
Through this thesis, we aim to study and provide solutions for:
- A user interface providing a multi-module job launcher
- Gathering cross-module network topology information and providing a comprehensive and useful representation of such networks
- Analyzing, developing and implementing hierarchical algorithms adapted to MSA that take into account the underlying network topology of the allocated resources
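As an illustration of the last item, the sketch below simulates a hierarchical all-reduce (sum) that uses a rank-to-module mapping to limit traffic between modules. It is a toy model under stated assumptions, not an algorithm from MPC or from the thesis: the function names are hypothetical, the leader of a module is arbitrarily taken to be its lowest rank, and both the hierarchical and the flat variants are assumed to use a simple linear reduce to one root followed by a linear broadcast.

```python
from collections import defaultdict


def hierarchical_allreduce(values, module_of):
    """Sum `values` (indexed by rank) in three phases.

    Returns (per-rank results, number of cross-module messages).
    `module_of[rank]` is the module hosting that rank.
    """
    ranks_by_module = defaultdict(list)
    for rank, module in enumerate(module_of):
        ranks_by_module[module].append(rank)

    # Phase 1: reduce inside each module to its leader (lowest rank).
    partial = {ranks[0]: sum(values[r] for r in ranks)
               for ranks in ranks_by_module.values()}

    # Phase 2: leaders reduce to one root leader, which sends the total back.
    # Only this phase crosses module boundaries.
    leaders = sorted(partial)
    total = sum(partial[leader] for leader in leaders)
    cross_module_messages = 2 * (len(leaders) - 1)

    # Phase 3: each leader broadcasts the total inside its own module.
    results = [total] * len(values)
    return results, cross_module_messages


def flat_cross_module_messages(module_of):
    """Cross-module messages for a linear reduce to rank 0 plus a broadcast."""
    remote = sum(1 for module in module_of if module != module_of[0])
    return 2 * remote
```

With six ranks split evenly over two modules, this model exchanges two messages between modules in the hierarchical case and six in the flat case. The gap grows with the number of ranks per module, which is the motivation for making collective algorithms aware of the module layout.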