High-Dimensional Data Analysis

High-dimensional statistics focuses on problem settings in which the number of features is comparable to, or larger than, the number of observations. Problems of this type present a variety of new challenges, since classical theory and methodology can break down in unexpected ways.

Berkeley researchers study both the statistical and computational challenges that arise in the high-dimensional setting. On the theoretical side, they draw on a range of techniques from statistics, probability, and information theory, including empirical process theory, concentration inequalities, random matrix theory, and free probability. Methodological innovations include new estimators for high-dimensional regression, classification, and multivariate analysis; randomized algorithms for optimization; and techniques for prediction, inference, and decision-making in sequential settings. The work is motivated by, and applied to, a range of scientific and engineering disciplines, including computational biology, astronomy, financial time series, epidemic forecasting, and climate forecasting.

Researchers

Peter Bickel

statistics, machine learning, semiparametric models, asymptotic theory, hidden Markov models, applications to molecular biology

Jennifer Chayes

phase transitions, networks, graphs, graphons, algorithmic game theory, machine learning, applications in cancer immunotherapy, ethical decision-making, climate change, materials science

Sandrine Dudoit

high-dimensional statistical learning, statistical computing, computational biology and genomics, precision medicine and health

Will Fithian

selective inference, multiple testing, multivariate analysis, risks of artificial intelligence, ecological statistics

Vadim Gorin

integrable probability, 2d statistical mechanics, random matrices, interacting particle systems, asymptotic representation theory, high-dimensional statistics

Aditya Guntuboyina

nonparametric estimation, shape-constrained estimation, high-dimensional statistics, Bayesian and empirical Bayes methods

Haiyan Huang

high-dimensional and integrative genomic data analysis, network modeling, hierarchical classification, translational bioinformatics

Michael Mahoney

scientific/engineering machine learning, randomized numerical linear algebra, random matrix theory, stochastic optimization, spectral graph theory, time series forecasting, fluid, solid, subsurface, and chemistry/physics applications, internet and social media analysis

Song Mei

language models and diffusion models, deep learning theory, reinforcement learning theory, high-dimensional statistics, quantum algorithms, and uncertainty quantification

Headshot

AI and machine learning, applied probability, computational biology, computational genomics, evolutionary biology, human genetics

P.B. Stark

uncertainty quantification and inference, inverse problems, nonparametrics, risk assessment, elections, geophysics, astrophysics, cosmology, litigation, health

Alexander Strang

Bayesian inference, inverse problems, stochastic processes, biological systems, empirical game theory, nonequilibrium thermodynamics, optimization, and computational topology

Ryan Tibshirani

high-dimensional statistics, nonparametric estimation, distribution-free inference, machine learning, optimization, numerical methods, probabilistic forecasting, computational epidemiology 

Nikita

nonparametric estimation, hypothesis testing, applied probability, statistical learning theory, online learning