At the core of Artificial Intelligence (AI) lies a set of elaborate non-linear, data-driven or implicitly-defined machine learning methods and algorithms. These methods, however, largely rely on "small-dimensional intuitions" and heuristics which have recently been shown to be *mostly inappropriate and to behave strikingly differently in large dimensions* (see for instance the case of kernel spectral clustering in Fig. 1, or semi-supervised learning in Fig. 2). Recent advances in tools from large dimensional statistics, random matrix theory and statistical physics have provided a series of answers to this curse of dimensionality, proposing a renewed understanding of elementary ML methods for big data and *striking improvements through novel algorithms* (in the context of community detection, graph semi-supervised learning, subspace clustering, etc.). Of particular interest is the *random matrix analysis of simple neural network structures*.
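The breakdown of small-dimensional intuitions can be illustrated numerically. The following minimal sketch (with arbitrary mixture parameters, not taken from the chair's experiments) shows the "concentration of distances" phenomenon: in a two-class Gaussian mixture, the relative spread of pairwise distances shrinks as the dimension grows, so that all points become nearly equidistant and classical kernel heuristics lose their footing.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(p, n=100):
    """Relative spread (std/mean) of normalized pairwise squared
    distances ||x_i - x_j||^2 / p in a two-class Gaussian mixture."""
    mu = np.zeros(p)
    mu[0] = 2.0  # arbitrary class-mean separation (illustrative choice)
    X = np.vstack([rng.normal(size=(n // 2, p)) + mu,
                   rng.normal(size=(n // 2, p)) - mu])
    nrm = (X ** 2).sum(axis=1)
    sq = (nrm[:, None] + nrm[None, :] - 2 * X @ X.T) / p  # pairwise distances
    off = sq[~np.eye(n, dtype=bool)]  # discard the zero diagonal
    return off.std() / off.mean()

for p in (5, 50, 500):
    print(f"p = {p:4d}  relative spread = {distance_spread(p):.3f}")
```

As `p` grows, the relative spread collapses toward zero: distances "concentrate" around their mean, which is exactly the regime the project's random matrix tools are designed for.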

More importantly, while mostly relying on simple modelling (i.i.d. Gaussian, simple mixture models, etc.), these tools are adequate for and resilient to realistic datasets, as they provably exhibit *universality features*. Precisely, leveraging a new approach to concentration of measure theory, these results *fully explain the behavior of realistic advanced ML algorithms, such as deep learners and GANs* (see Fig. 3).

The GSTATS chair aims at gathering these findings into a coherent new *random matrix paradigm for big data machine learning*. In particular, the project relies on the following innovative key theoretical directions:


- (i) large dimensional statistics (random matrix theory) for the analysis and improvement of non-linear optimization, kernel methods, generalized linear mixed models, etc.,
- (ii) concentration of measure theory and universality for deep learning understanding,
- (iii) statistical physics methods for sparse graph mining, clustering, and neural network analysis,

**1) Random Matrix Theory for AI:**

- RMT analysis and improvement of ML methods in large dimensional regimes (kernel random matrices, spectral methods, random neural nets)
- Asymptotics of optimization problems in machine learning, generalized linear mixed models
- Large dimensional estimation and detection
- Statistical learning on large dimensional graphs

**2) Statistical Physics Approaches:**

- Statistical physics for large and sparse data and graphs
- Neural network asymptotics

**3) Universality Results: from Theory to Practice:**

- Universality through concentration of measure advances for ML
- Universal models and performance in applied areas (from electrical engineering to computer vision, statistical biology, finance, BCI, etc.).

**Kernel Methods don't behave the same in Large Dimensions.** A first key finding consists in demonstrating that, under a "non-trivial" Gaussian mixture model (that is, for not too easily separable mixtures), as the dimension of the data grows large, kernel matrices behave strikingly differently from what small-dimensional intuitions suggest (see Fig. 1).

**Standard Semi-Supervised Learning Methods are Suboptimal but can be Improved.** A consequence of the large dimensional "concentration" of distances lies in the inappropriateness of many classical machine learning methods which, initially developed to tackle finite dimensional (small) data, become suboptimal in large dimensions; a careful random matrix analysis, however, allows one to correct and improve them (see Fig. 2).

**Gaussian Mixtures are Universal Models.** A main frustration of large dimensional statistics versus practice often lies in the inaccuracy of modelling real datasets through basic Gaussian mixture models. We showed that this state of affairs is much less true for large dimensional data, which behave much more like Gaussian random variables than small dimensional data do. We theoretically support this statement as follows: (i) random matrix universality results occur in large dimensional data which, in particular, make the asymptotics of kernel and neural network classification depend only on the first- and second-order statistics of the data.
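A minimal numerical sketch of this universality (with arbitrary matrix sizes, not the chair's own experiments): the spectrum of the sample covariance matrix of strongly non-Gaussian data (here Rademacher, i.e. random ±1 entries) matches that of Gaussian data, both agreeing with the Marchenko-Pastur law that random matrix theory predicts from second-order statistics alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 500  # n samples of dimension p, ratio c = p/n = 0.25

def cov_eigs(X):
    # eigenvalues of the sample covariance matrix (1/n) X^T X
    return np.linalg.eigvalsh(X.T @ X / X.shape[0])

gauss = rng.normal(size=(n, p))              # Gaussian entries
rade = rng.choice([-1.0, 1.0], size=(n, p))  # non-Gaussian, same mean/variance

c = p / n
edge = (1 + np.sqrt(c)) ** 2  # Marchenko-Pastur right spectrum edge
print(f"Gaussian   largest eigenvalue: {cov_eigs(gauss).max():.3f}")
print(f"Rademacher largest eigenvalue: {cov_eigs(rade).max():.3f}")
print(f"Marchenko-Pastur edge (1+sqrt(c))^2: {edge:.3f}")
```

Both largest eigenvalues land on the same Marchenko-Pastur edge, even though the entry distributions differ markedly: only the first two moments matter asymptotically.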