Romain Couillet

GSTATS DataScience Chair @ University Grenoble-Alpes

MIAI LargeDATA Chair @ University Grenoble-Alpes

The Project

At the core of Artificial Intelligence (AI) lies a set of elaborate non-linear, data-driven or implicitly-defined machine learning methods and algorithms. These, however, largely rely on "small dimensional intuitions" and heuristics which have recently been shown to be mostly inappropriate and to behave strikingly differently in large dimensions (see for instance the case of kernel spectral clustering in Fig. 1, or semi-supervised learning in Fig. 2). Recent advances in large dimensional statistics, random matrix theory and statistical physics have provided a series of answers to this curse of dimensionality: they offer a renewed understanding of elementary ML methods for big data, and striking improvements through novel algorithms (in the context of community detection, graph semi-supervised learning, subspace clustering, etc.). Of particular interest is the random matrix analysis of simple neural network structures.
More importantly, while mostly relying on simple modelling (iid Gaussian, simple mixture models, etc.), these tools are adequate for and resilient to realistic datasets, as they provably exhibit universality features. Precisely, leveraging a new approach to the concentration of measure theory, these results fully explain the behavior of realistic advanced ML algorithms, such as deep learners and GANs (see Fig. 3).
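
As a minimal illustration of how small dimensional intuition can fail, the sketch below measures how much pairwise distances vary between iid standard Gaussian points as the dimension p grows. The data model, the function name `relative_distance_spread` and the values of n and p are assumptions chosen for illustration only; this is not the project's code. Under this model the relative spread of squared distances is of order sqrt(2/p), so in large dimensions all points are nearly equidistant, which is one reason distance-based heuristics behind kernel methods need to be revisited.

```python
import numpy as np

def relative_distance_spread(p, n=200, seed=0):
    """Std/mean of pairwise squared distances between n iid N(0, I_p) points.

    Illustrative assumption: iid standard Gaussian data (not a real dataset).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))
    sq_norms = np.sum(x**2, axis=1)
    # Squared Euclidean distances, normalized by the dimension p.
    d2 = (sq_norms[:, None] + sq_norms[None, :] - 2 * x @ x.T) / p
    # Keep only distances between distinct points.
    off_diag = d2[~np.eye(n, dtype=bool)]
    return off_diag.std() / off_diag.mean()

spread_small = relative_distance_spread(p=2)     # small dimension
spread_large = relative_distance_spread(p=2000)  # large dimension
print(f"p=2:    relative spread = {spread_small:.3f}")
print(f"p=2000: relative spread = {spread_large:.3f}")
```

With p = 2 the distances vary widely from pair to pair, whereas with p = 2000 they are almost identical, so a kernel evaluated on them returns nearly constant values.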

The GSTATS and MIAI LargeDATA chairs aim at gathering these findings into a coherent new random matrix paradigm for big data machine learning. In particular, the project relies on three key theoretical directions:
  • (i) large dimensional statistics (random matrix theory) for the analysis and improvement of non-linear optimization, kernel methods, generalized linear mixed models, etc.,
  • (ii) concentration of measure theory and universality for deep learning understanding,
  • (iii) statistical physics methods for sparse graph mining, clustering, and neural network analysis.

Research Topics

From a mathematical perspective, the activity of the GSTATS and LargeDATA chairs can be divided into the following three main domains.

Collaborative Projects

Publications within the MIAI LargeDATA project

The Team

Main Findings