Faculty Research Interests

(Joint with Mathematics, Member of the Probability Group, Graduate Group in Communication, Computation & Statistics)

I do research in mathematical probability. A central theme is the study of large finite random structures, obtaining asymptotic behavior as the size tends to infinity via consideration of some suitable infinite random structure. Much current work involves random network models. For more detailed prose see my research page. You will find a recent talk on this style of research here.

Peter Bartlett

(Joint with EECS, Member of the Statistical Machine Learning Group, Graduate Group in Communication, Computation & Statistics)

My research interests are in the areas of machine learning, statistical learning theory, and reinforcement learning. I work on the theoretical analysis of computationally efficient methods for large or otherwise complex prediction problems. One example is structured prediction problems, where there is considerable complexity to the space of possible predictions. Such methods are important in a variety of application areas, including natural language processing, computer vision, and bioinformatics. A second area of interest is the analysis of prediction methods in a deterministic, game-theoretic setting. As well as being of interest in areas such as computer security, where an adversarial environment is a reasonable model, this analysis also provides insight into the design and understanding of prediction methods in a probabilistic setting. A third area of interest is the design of methods for large-scale sequential decision problems, such as control of Markov decision processes. Again, computational efficiency is a crucial requirement. This is a common feature in all of these areas: the interplay between the constraint of computational efficiency and the statistical properties of a method.

Peter Bickel

(Member of the Biostatistics Group, Graduate Group in Computational & Genomic Biology, Statistical Machine Learning Group, Graduate Group in Communication, Computation & Statistics)

My main theoretical interest is in understanding why we are able to do statistics as well as we do on very high dimensional datasets without knowing much, even though least favorable (malicious God) formulations suggest we should not be able to do anything. Currently this has led me to focus on estimation of covariance matrices and their eigenstructures in high dimensions. Parallel applied interests are in:

  1. Computational biology, specifically at the moment regulatory networks in the cell. To my surprise some of the methods flowing out of my primary interest are relevant to this one.
  2. Atmospheric sciences ... in part as a ready source of questions based on very high dimensional data.

David Brillinger

(Member of the Biostatistics Group)

My recent research focuses on statistical methods for random processes, random process data analysis and applications in engineering and science generally. Particular topics include modelling the motion of animals and other entities, and risk analysis for earthquakes, wildfires, floods, and similar phenomena. I also do some work on sports statistics, am President of The International Environmetrics Society (TIES) and am Deputy Editor of the journal Environmetrics.

Ching-Shui Cheng

Statistical design of experiments, which originated in agricultural applications, is used extensively in a wide range of scientific and industrial investigations. Experiments need to be properly designed so that valid information can be extracted at a lower cost. I am interested in efficient experimental designs and the related construction and combinatorial problems. Currently, I work mostly on the design of experiments in which the response depends on a large number of factors (variables), the so-called factorial designs. When a large number of factors have to be studied but the experimental runs are expensive, it is not feasible to observe all possible combinations of the factors. For example, with just two settings for each factor, an experiment with 10 factors requires 2^10 = 1024 runs to observe all the combinations. One aspect of my research deals with how to choose a "good" small subset of the factor combinations. There are interesting connections with combinatorics, coding theory and finite geometry.
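
As a rough illustration of the counting above, the following Python sketch enumerates every combination of 10 two-level factors and then keeps one simple half-size subset. It is a minimal sketch under stated assumptions: the -1/+1 coding, the function names `full_factorial` and `half_fraction`, and the choice of subset (runs whose coded levels multiply to +1) are illustrative textbook conventions, not a description of the designs studied in this research.

```python
from itertools import product
from math import prod


def full_factorial(k):
    """Return all 2**k runs for k two-level factors, coded as -1 / +1."""
    return list(product((-1, 1), repeat=k))


def half_fraction(k):
    """Keep the runs whose coded levels multiply to +1.

    This is one illustrative way to pick half of the runs; it is an
    assumption made for this example, not a recommended design.
    """
    return [run for run in full_factorial(k) if prod(run) == 1]


full = full_factorial(10)
half = half_fraction(10)

print(len(full))  # number of runs in the full design
print(len(half))  # number of runs in the half-size subset
```

In this particular subset each factor still appears at its low and high setting equally often, which is one informal sense in which a subset of runs can be "good"; deciding which subsets are best under a given criterion is the harder design question.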

Sandrine Dudoit

(Joint with Biostatistics, Member of the Graduate Group in Computational & Genomic Biology)

My research and teaching activities concern the development and application of statistical and computational methods to address problems in biomedical and genomic research. The statistical inference questions are truly multivariate and involve the joint analysis of multiple, diverse, and high-dimensional datasets. High-throughput assays such as microarrays and next-generation sequencers allow biologists to monitor expression levels for entire genomes. A challenging task is to relate these genome-wide genotypes to biological and clinical covariates (e.g., age, sex, environmental exposure) and outcomes (e.g., cell type/state, affectedness/unaffectedness, survival time, response to treatment) as well as to the wealth of biological annotation metadata available on the web (e.g., Gene Ontology, KEGG pathways, PubMed literature).

Motivated by these biological challenges, my methodological research interests fall in the following two areas: loss-based estimation with cross-validation (parametric and non-parametric density estimation and regression, variable selection) and resampling-based multiple hypothesis testing. I am also interested in statistical computing and I am a core developer of the Bioconductor Project, an open-source and open-development software project for the analysis of biomedical and genomic data.

Noureddine El Karoui

(Member of the Graduate Group in Communication, Computation & Statistics)

I'm mostly interested in statistical problems for high-dimensional data. More specifically, I have been working on covariance estimation for high-dimensional data. It's an important practical topic because a lot of data analysis relies on having good estimates of covariance. It's also very interesting because a lot of classical results break down (rather badly) in high dimensions. So on the theory side, I study questions motivated by applications and ask myself how badly standard things break down and how I can describe what sort of "strange" phenomena happen. More practically, I try to use this understanding to come up with new methods that are computationally feasible and fix the problems I've identified. At the end of the day, this covers a wide range of things, from data analysis, to algorithms, to questions that straddle theoretical statistics and modern probability theory. Right now, I'm starting to use these insights in some problems arising in finance, having mostly to do with portfolio optimization. We have a good group of people in the department working on different aspects of these questions of high-dimensional data analysis, and it's really exciting to be doing this at Berkeley right now!

Steven Evans

(Joint with Mathematics, Member of the Probability Group, Graduate Group in Computational & Genomic Biology)

I am a probabilist and statistician working in the general area of stochastic processes and their applications. In the past, I have collaborated with Persi Diaconis and others on random matrices and various other aspects of probability on algebraic structures. I have numerous publications with Martin Barlow, Ed Perkins, Klaus Fleischmann, Tom Kurtz, Xiaowen Zhou, and Peter Donnelly on Dawson-Watanabe superprocesses and other measure-valued processes that arise in population biology, as well as with Jim Pitman on various coalescent models that appear in biology, chemistry and astrophysics. In the past, I have worked with Terry Speed, Mary Sara McPeek, Xiaowen Zhou, and others on phylogenetic invariants and interference regarding recombination.

I share an ongoing interest in biodemography with David Steinsaltz and Ken Wachter that has resulted in papers on fitness landscapes, mutation-selection balance, stochastic PDE models of bacteria and yeast aging, and applications of quasistationarity to mortality modeling.

I continue research on probability and real trees, particularly applications of ideas from metric geometry such as the Gromov-Hausdorff metric, some of it in collaboration with Tye Lidman, Jim Pitman, and Anita Winter. I am investigating tree statistics and most recent common ancestors in diploid populations with Erick Matsen. Monty Slatkin and I are researching allele frequency spectra for time-varying population sizes.

I am in the middle of an extensive project involving Tandy Warnow, Don Ringe, Luay Nakhleh, and Francois Barbancon on several aspects of phylogenetic inference - particularly applications of computational phylogenetic methods in historical linguistics.

I currently have students working on stepping stone models and coalescent sticky flows, the population genetics of hybrid zones, random matrices associated with Coxeter groups, random matrices arising from random trees and random networks, infinite-dimensional dynamical systems applied to mutation-selection balance, and connections between matrix-valued orthogonal polynomials and queuing theory.

Lisa Goldberg

My primary interest is the development of a broad, widely applicable, statistically sound, quantitative framework for measuring and managing financial risk. This is very topical and important, given the turbulence that has plagued financial markets during the last twelve months. With colleagues at MSCI Barra, I am working on extreme risk attribution, generalized portfolio optimization, and the development of downside-safe financial indices. Many of my research articles can be found in the MSCI Barra Research Library, and some are posted on the Social Sciences Research Network.

Leo Goodman

(Joint with Sociology)

My research interests include the development of statistical methods for the analysis of data that are qualitative or categorical, and statistical methodology in the social sciences. I have contributed to the theory and development of log-linear models, latent-structure models, association models, and correspondence analysis models. Articles that I have published recently include "On the assignment of individuals to latent classes", "Statistical magic and/or statistical serendipity: An age of progress in the analysis of categorical data", "Contributions to the statistical analysis of contingency tables: ...", and "Latent class analysis: The empirical study of latent types, latent variables, and latent structures".

Haiyan Huang

(Member of the Biostatistics Group, Graduate Group in Computational & Genomic Biology)

My research areas are computational biology and applied statistics. In particular, I am interested in solving practical problems in emerging bio data-intensive systems, and in understanding and developing the theoretical principles behind the practical methods. My current focuses are: 1) to develop statistical methods that provide a consistent formulation between the statistical modeling and the biological nature of the data, 2) to understand and solve the problem of unreliable estimates in analyzing high-dimensional structured data, and 3) to tackle the challenges posed by the high level of noise and the lack of reproducibility in datasets from different resources.

Nick Jewell

(Joint with Biostatistics, Member of the Graduate Group in Computational & Genomic Biology)

As a biostatistician, my research focuses on the application of novel statistical techniques to the design, analysis and interpretation of data arising from public health and human disease studies. From a statistical point of view I am interested in survival analysis, particularly current status data; statistical methods for epidemiology, particularly for (i) infectious diseases and (ii) environmental exposures; causal inference in intervention studies; survey methods and longitudinal data analysis; and applications of statistics to molecular and cell biology. My work is motivated by the application of such statistical techniques, most recently to studies of HIV disease and AIDS, including intervention trials to reduce HIV transmission in Africa; the impact of pesticide exposure on pregnancy outcomes and infant neurodevelopment; measuring key factors in epidemic growth in situations such as SARS; assessing drug safety, with particular interest in the adverse cardiovascular side effects of Cox-2 inhibitors; the measurement of PBDEs in peregrine falcon eggs in California; and the assessment of civilian casualties in times of conflict.

Michael Jordan

(Joint with EECS, Member of the Statistical Machine Learning Group, Graduate Group in Communication, Computation & Statistics, Graduate Group in Computational & Genomic Biology)

I have interests that span the spectrum from theory to algorithms to applications. I'm most interested in problems that arise when working with non-traditional data types; examples I've worked with include document corpora, graphs, protein structures, phylogenies and multi-media signals. Working with these kinds of data types often leads one to work on problems of an unusually large scale, where classical methods can be infeasible on computational grounds. I've thus been interested in new computational methods for large-scale problems; specifically I've worked on the development of novel estimators using tools from constrained optimization theory and convex analysis. I'm also interested in the interface between probability theory and nonparametric statistics, particularly in the setting known as "nonparametric Bayes", where the prior distribution is a general stochastic process. Here ideas familiar in modern probability theory, such as the Chinese restaurant process and stick-breaking distributions, yield novel statistical models and novel inference procedures. These methods have numerous applications in areas such as statistical genetics, image processing and natural language processing. I'm quite interested in pursuing these applications, particularly in collaboration with biologists and computer scientists.

Cari Kaufman

My research is motivated by scientific questions about complex systems. In many cases, existing scientific knowledge about the system is available, sometimes represented in a deterministic computer model, and this knowledge needs to be appropriately incorporated into the statistical inference. One such project involves comparing sources of variability in regional climate model experiments. These computer models of the climate system have been developed to provide high-resolution simulations over limited areas, and their boundary conditions are often provided by lower-resolution, global climate models. Using a Bayesian functional ANOVA model, I am exploring to what degree, and in which regions, the global-scale forcing contributes to the overall variability in these regional models.

Much of my applied work concerns environmental and climate problems, and my theoretical interests lie primarily in the realm of spatial statistics. In particular, I am interested in developing good estimators and predictors for spatial processes. When the data are large, the usual likelihood methods become computationally infeasible. I have developed estimators which are computationally more efficient and shown that they share desirable asymptotic properties with the MLE. I continue to be interested in the properties of so-called "plug-in prediction" for spatial fields, in which one uses the same data both to estimate the covariance structure of the spatial field and then to predict values of the field at unknown locations.

Michael Klass

(Joint with Mathematics, Member of the Probability Group, Biostatistics Group)

My research interests are in probability theory. I have done research on sums of independent random variables, laws of the iterated logarithm, approximations of tail probabilities, operator limit theorems for sums of independent random vectors, expectations of functions of sums of fixed and random numbers of random variables, as well as arbitrary self-normalized sums.

Elchanan Mossel

(Joint with CS, Member of the Probability Group)

I like studying problems that involve probability, algorithms and discrete mathematics. In particular, I am drawn to problems with an applied flavor coming from the study of algorithms on typical instances and from the study of evolution, voting, and game-theoretic and economic models.

Rasmus Nielsen

(Joint with Integrative Biology, Member of the Graduate Group in Computational & Genomic Biology)

My work focuses on the development and application of statistical methods in genomics. Most of it concentrates on making inferences regarding function and evolution from molecular and genetic data. Some of the projects that I am currently involved in are in the areas of human population genetics, comparative evolutionary genomics, coalescent theory, and statistical methods in molecular ecology. Examples include evolutionary analyses of whole genome data from a diverse set of organisms including bacteria, the Giant Panda, the Rhesus Macaque monkey, humans, and chimpanzees, development of methods for association mapping which can accommodate non-linear interactions, and the development of MCMC methods for inferring demographic parameters in population genetics.

Deborah Nolan

One of my current interests is in the area of cyber-infrastructure for education. I am studying problems surrounding the design of electronic documents which provide ways for authors and readers to interact with and dynamically view a data analysis process/statistics research activity. I am also interested in problems in more traditional areas of statistics that are related to high-dimensional modeling and model selection.

Jim Pitman

(Joint with Mathematics, Member of the Probability Group)

I have been interested in interfaces between the traditional theory of stochastic processes and other areas of mathematics, especially combinatorics. I have studied various random combinatorial objects, such as permutations, partitions, and trees, and how the asymptotic behaviour of such structures over a large number of elements can be described in probabilistic terms, most often involving Brownian motion and related processes. This has led to the study of various measure-valued and partition-valued Markov processes whose behaviour may be understood in terms of combinatorial constructions involving random trees. I am at present engaged in developing various ideas related to random partitions, random trees, irreversible processes of coalescence, and their time reversals which provide models for random splitting or fragmentation.

I view this line of research largely as pure mathematics, but mathematics of a concrete kind which is often motivated and influenced by applications. Stochastic models with a natural probabilistic structure typically turn up in different disguises in diverse fields. The study of their mathematical structure allows ideas and results developed in one context to be transferred to another.

Elizabeth Purdom

My interests are in statistical problems arising in the fields of molecular biology and genetics, and in particular in high-throughput genomic experiments. I am currently working on methodologies for analyzing gene expression and alternative splicing using microarrays and ultra high-throughput sequencing. Methodologically, I am interested in high-dimensional and multivariate techniques that are relevant in this context.

John Rice

(Member of the Biostatistics Group, Graduate Group in Communication, Computation & Statistics)

I was drawn to the discipline of statistics by a fascination with randomness, and by the way it blends mathematics and scientific content. I have enjoyed interactions with researchers in many areas, especially the natural sciences, which have given me opportunities to learn about these fields and to make contributions to them. Underlying these separate analyses are paradigms of statistical methodology that have been developed over the last 100 years. This evolution accelerates as statisticians confront new challenges in the information age.

I am especially interested in developing methods for analyzing data that arise in the form of random functions, such as time series, and which involve large quantities of data and computationally intensive analysis. Much of my recent work has centered around two projects in astronomy: detecting objects in the outer regions of the solar system (the Kuiper Belt) and detecting gamma-ray pulsars.

Alistair Sinclair

(Joint with EECS, Member of the Graduate Group in Communication, Computation & Statistics)

My research interests are in theoretical computer science, algorithms, randomized computation, Markov chains, phase transitions, statistical physics, and combinatorial optimization. Most of my work involves applying probabilistic ideas in some way, usually to design or analyze algorithms.

Yun Song

(Joint with EECS, Member of the Graduate Group in Computational & Genomic Biology)

My research centers around computational biology and mathematical population genetics. I am generally interested in developing methods, using techniques from statistics and computer science, to address problems that arise from evolutionary molecular biology. I am also interested in combinatorial optimization, algorithms, and Monte Carlo methods.

Terry Speed

(Joint with Walter & Eliza Hall Institute of Medical Research (Australia), Member of the Biostatistics Group, Graduate Group in Computational & Genomic Biology, Speed Berkeley Research Group)

My research concerns the application of statistics to problems in genetics and molecular biology. These have provided many novel challenges of both an applied and a theoretical nature. My major interests within this area are in the mapping of genes in mice and humans, including disease genes and genes contributing to the variation of quantitative traits. The Human Genome Project was a stimulus for a number of the problems I have investigated with my students. Other areas of interest include the analysis of DNA and protein sequences, for example, finding genes or motifs in DNA sequence, and the analysis of microarray data.

Philip Stark

My research centers on inference (inverse) problems, primarily in physical science. I am especially interested in confidence procedures tailored for specific goals and in quantifying the uncertainty in inferences that rely on simulations of complex physical systems. I've done research on the internal structure of the Sun and Earth, earthquake prediction, the Big Bang, the geomagnetic field, election auditing, geriatric hearing loss, the U.S. census, the effectiveness of Internet content filters, spectrum estimation and information retrieval. I am interested in numerical optimization, and have published some software.

I've consulted in product liability litigation, truth in advertising, equal protection under the law, jury selection, trade secret litigation, employment discrimination litigation, import restrictions, insurance litigation, natural resource legislation, environmental litigation, patent litigation, sampling in litigation, wage and hour class actions, product liability class actions, consumer class actions, the U.S. census, clinical trials, signal processing, geochemistry, IC mask quality control, targeted marketing, water treatment, sampling the web, risk assessment, and oil exploration.

Bernd Sturmfels

(Joint with Mathematics and EECS, Member of the Graduate Group in Computational & Genomic Biology)

I am a mathematician with interests ranging from combinatorics and algebraic geometry to optimization and computational biology. I have always been fascinated with numerical experiments and data analysis, and this led me quite naturally into the emerging field of algebraic statistics. Here I have worked on phylogenetics, Markov bases, symbolic computation of Bayesian integrals, likelihood inference, and the geometry of conditional independence models for discrete and Gaussian random variables.

Mark van der Laan

(Joint with Biostatistics, Member of the Graduate Group in Computational & Genomic Biology, Computational Biology & Causality Group)

The goal of my research group is to develop statistical methods to estimate/learn causal and non-causal parameters of interest, based on potentially complex and high dimensional data from randomized clinical trials or observational longitudinal studies, or from cross-sectional (e.g., case-control sampling) studies. The model assumptions under which these methods are valid should be clearly formulated, so that they can be subject to scrutiny. The estimates should be accompanied by confidence regions for the true parameter values or other types of confidence measures (e.g., variability/reproducibility of clusters as measured by the bootstrap). The longitudinal data structures may involve high dimensional measurements such as whole genome profiles at various points in time; censoring and missingness of data due to a subject not responding well to treatment (or not feeling well); and changes of treatment at various points in time, based on variables related to the outcome of interest. Our methods are designed to rely on as few assumptions as possible on nuisance parameters so that they provide maximally objective statistical inference and testing procedures. To develop and refine these methods, we work with simulated and real data in collaboration with biologists, medical researchers, epidemiologists, and others.

Ken Wachter

(Joint with Demography)

As a mathematical demographer and statistician, I study systematic constraints and random influences that shape the structure of human populations. I helped develop methods of computer simulation to understand the rarity of coresident family members in pre-industrial English households. With these methods, I am now forecasting the kin and family support available to new generations of elderly in the twenty-first century. Working in "non-linear" demography, I have identified mechanisms that give rise to specific kinds of cycles in fertility and population growth. I am currently interested in patterns of mortality at extreme ages shared between humans and other species, trying to reconcile them with statistical models for long-term processes of evolutionary change.

Martin Wainwright

(Joint with EECS, Member of the Statistical Machine Learning Group, Graduate Group in Communication, Computation & Statistics)

In broad terms, I am interested in problems at the interface between computation and statistics. Part of my research focuses on algorithms and Markov random fields, a class of probabilistic models based on graphs used to capture dependencies in multivariate data (e.g., image models, data compression, computational biology). In general, exact solutions to inference problems in such models are computationally intractable, so there is a great deal of interest in approximate methods for statistical inference. I am also interested in studying the effect of decentralization and communication constraints in statistical inference problems. A final area of interest is methodology and theory for high-dimensional inference problems, in which the model dimension is of the same order as (or larger than) the sample size.

Bin Yu

(Joint with EECS, Member of the Biostatistics Group, Statistical Machine Learning Group, Graduate Group in Communication, Computation & Statistics, Yu Group)

I am currently working on statistical methodologies and models involving large data sets from remote sensing, data networks (internet and sensor networks), neuroscience, finance, and bioinformatics. Together with my students and collaborators, I have been working on different areas of statistical machine learning, both theoretical and computational. These areas include boosting, Lasso, support vector machines (SVM), and semi-supervised learning. On the computational side, we have developed algorithms such as BLasso and iCAP for sparse modeling. My past research areas have also included empirical processes, Markov chain Monte Carlo, signal processing, the minimum description length principle (MDL), and information theory.
