Faculty Research Interests
Analysis of algorithms, applied probability, complex networks, entropy, mathematical probability, phylogenetic trees, random networks, spatial networks. Also popularization of probability.
I do research in mathematical probability. A central theme is the study of large finite random structures, obtaining asymptotic behavior as the size tends to infinity via consideration of some suitable infinite random structure. Much current work involves random network models.
My research interests are in the areas of machine learning, statistical learning theory, and reinforcement learning. I work on the theoretical analysis of computationally efficient methods for large or otherwise complex prediction problems. One example is structured prediction problems, where there is considerable complexity to the space of possible predictions. Such methods are important in a variety of application areas, including natural language processing, computer vision, and bioinformatics. A second area of interest is the analysis of prediction methods in a deterministic, game-theoretic setting. As well as being of interest in areas such as computer security, where an adversarial environment is a reasonable model, this analysis also provides insight into the design and understanding of prediction methods in a probabilistic setting. A third area of interest is the design of methods for large scale sequential decision problems, such as control of Markov decision processes. Again, computational efficiency is a crucial requirement. This is a common feature in all of these areas: the interplay between the constraint of computational efficiency and the statistical properties of a method.
My main theoretical interest is in understanding why we are able to do statistics as well as we do on very high dimensional datasets without knowing much, even though least favorable (malicious God) formulations suggest we should not be able to do anything. Currently this has led me to focus on estimation of covariance matrices and their eigenstructures in high dimensions. Parallel applied interests are in:
- Computational biology, specifically at the moment regulatory networks in the cell. To my surprise some of the methods flowing out of my primary interest are relevant to this one.
- Atmospheric sciences ... in part as a ready source of questions based on very high dimensional data.
Random process theory and data analysis, risk analysis, spatial-temporal trajectory modeling, sports statistics, applications to ecology, forestry, marine biology, neuroscience, seismology and engineering.
My recent research focuses on statistical methods for random processes, random process data analysis and applications in engineering and science generally. Particular topics include modelling the motion of animals and other entities, and risk analysis for earthquakes, wildfires, floods, and similar phenomena. I also do some work on sports statistics, am President of The International Environmetrics Society (TIES), and am Deputy Editor of the journal Environmetrics.
Statistical design of experiments, which originated in agricultural applications, is used extensively in a wide range of scientific and industrial investigations. Experiments need to be properly designed so that valid information can be extracted at a lower cost. I am interested in efficient experimental designs and the related construction and combinatorial problems. Currently, I work mostly on design of experiments in the situation where the response depends on a large number of factors (variables), the so-called factorial design. When a large number of factors have to be studied, but the experimental runs are expensive, it is not feasible to observe all possible combinations of the factors. For example, with just two settings for each factor, an experiment with 10 factors requires 2^10 = 1024 runs to observe all the combinations. One aspect of my research deals with how to choose a "good" small subset of the factor combinations. There are interesting connections with combinatorics, coding theory and finite geometry.
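To make the run-count arithmetic concrete, here is a minimal Python sketch. It enumerates the full two-level design for 10 factors and then builds one textbook way of halving it, a regular half fraction in which the last factor is set to the product of the others. The function names and the -1/+1 coding are illustrative assumptions; this is not a description of the specific designs studied in the research above.

```python
from itertools import product


def full_factorial(num_factors):
    """Return every combination of two-level factors, coded as -1/+1."""
    return list(product((-1, 1), repeat=num_factors))


def half_fraction(num_factors):
    """Return a regular half fraction of the two-level full factorial.

    Assumption for illustration: the last factor's level is the product
    of the levels of all other factors, so only half the runs are kept.
    """
    runs = []
    for base in product((-1, 1), repeat=num_factors - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


full = full_factorial(10)
half = half_fraction(10)
print(len(full))  # 2**10 runs in the full design
print(len(half))  # 2**9 runs in the half fraction
```

Which fraction counts as "good" depends on which effects can still be estimated separately. The sketch only shows how quickly the full design grows and that structured subsets exist.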
causal inference in experiments and observational studies, missing data
My research and teaching activities concern the development and application of statistical and computational methods to address problems in biomedical and genomic research. The statistical inference questions are truly multivariate and involve the joint analysis of multiple, diverse, and high-dimensional datasets. High-throughput assays such as microarrays and next-generation sequencers allow biologists to monitor expression levels for entire genomes. A challenging task is to relate these genome-wide genotypes to biological and clinical covariates (e.g., age, sex, environmental exposure) and outcomes (e.g., cell type/state, affectedness/unaffectedness, survival time, response to treatment) as well as to the wealth of biological annotation metadata available on the web (e.g., Gene Ontology, KEGG pathways, PubMed literature).
Motivated by these biological challenges, my methodological research interests fall in the following two areas: loss-based estimation with cross-validation (parametric and non-parametric density estimation and regression, variable selection) and resampling-based multiple hypothesis testing. I am also interested in statistical computing and I am a core developer of the Bioconductor Project, an open-source and open-development software project for the analysis of biomedical and genomic data.
High-dimensional statistics, random matrices, high-dimensional robust regression, high-dimensional M-estimation, the bootstrap and resampling in high dimensions, limit theorems and statistical inference, applied statistics
I'm mostly interested in statistical problems for high-dimensional data. More specifically, I have been working on covariance estimation for high-dimensional data. It's an important practical topic because a lot of data analysis relies on having good estimates of covariance. It's also very interesting because a lot of classical results break down (rather badly) in high dimensions. So on the theory side, I study questions motivated by applications and ask myself how badly standard things break down and how I can describe what sort of "strange" phenomena happen. More practically, I try to use this understanding to come up with new methods that are computationally feasible and fix the problems I've identified. At the end of the day, this covers a wide range of things, from data analysis, to algorithms, to questions that straddle theoretical statistics and modern probability theory. Right now, I'm starting to use these insights in some problems arising in finance, having mostly to do with portfolio optimization. We have a good group of people in the department working on different aspects of these questions of high-dimensional data analysis, and it's really exciting to be doing this at Berkeley right now!
I am a probabilist and statistician working in the general area of stochastic processes and their applications. In the past, I have collaborated with Persi Diaconis and others on random matrices and various other aspects of probability on algebraic structures. I have numerous publications with Martin Barlow, Ed Perkins, Klaus Fleischmann, Tom Kurtz, Xiaowen Zhou, and Peter Donnelly on Dawson-Watanabe superprocesses and other measure-valued processes that arise in population biology, as well as with Jim Pitman on various coalescent models that appear in biology, chemistry and astrophysics. In the past, I have worked with Terry Speed, Mary Sara McPeek, Xiaowen Zhou, and others on phylogenetic invariants and interference regarding recombination.
I share an ongoing interest in biodemography with David Steinsaltz and Ken Wachter that has resulted in papers on fitness landscapes, mutation-selection balance, stochastic PDE models of bacteria and yeast aging, and applications of quasistationarity to mortality modeling.
I continue research on probability and real trees, particularly applications of ideas from metric geometry such as the Gromov-Hausdorff metric, some of it in collaboration with Tye Lidman, Jim Pitman, and Anita Winter. I am investigating tree statistics and most recent common ancestors in diploid populations with Erick Matsen. Monty Slatkin and I are researching allele frequency spectra for time-varying population sizes.
I am in the middle of an extensive project involving Tandy Warnow, Don Ringe, Luay Nakhleh, and Francois Barbancon on several aspects of phylogenetic inference - particularly applications of computational phylogenetic methods in historical linguistics.
I currently have students working on stepping stone models and coalescent sticky flows, the population genetics of hybrid zones, random matrices associated with Coxeter groups, random matrices arising from random trees and random networks, infinite-dimensional dynamical systems applied to mutation-selection balance, and connections between matrix-valued orthogonal polynomials and queuing theory.
Models of Percolation, Phase transitions in Statistical Mechanics, Mixing time of Markov chains, Random walk on graphs, Counting problems in non-linear sparse settings.
Financial economics, statistical evaluation of investment strategies, asset allocation, credit and counterparty risk, socially responsible investing
My primary interest is the development of a broad, widely applicable, statistically sound, quantitative framework for measuring and managing financial risk. This is very topical and important, given the turbulence that has plagued financial markets during the last twelve months. With colleagues at MSCI Barra, I am working on extreme risk attribution, generalized portfolio optimization, and the development of downside-safe financial indices. Many of my research articles can be found in the MSCI Barra Research Library, and some are posted on the Social Science Research Network.
Statistical mechanics, studied rigorously via modern techniques from mathematical probability
High dimensional and integrative genomic data analysis; network modeling; hierarchical multi-label classification; translational bioinformatics
My research areas are in Computational Biology and Applied Statistics. In particular, I am interested in solving practical problems in emerging data-intensive biological systems, and in understanding and developing the theoretical principles behind the practical methods. My current focuses are to: 1) develop statistical methods that provide a consistent formulation between the statistical modeling and the biological nature of data, 2) understand and solve the problem of unreliable estimates in analyzing high dimensional structured data, and 3) tackle the challenges posed by the high level of noise and the lack of reproducibility in the datasets from different resources.
Infectious diseases, specifically HIV; chronic disease epidemiology; environmental epidemiology; survival analysis; human rights statistics
As a biostatistician, my research focuses on the application of novel statistical techniques to the design, analysis and interpretation of data arising from public health and human disease studies. From a statistical point of view, I am interested in survival analysis, particularly current status data; statistical methods for epidemiology, particularly for (i) infectious diseases and (ii) environmental exposures; causal inference in intervention studies; survey methods and longitudinal data analysis; and applications of statistics to molecular and cell biology. My work is motivated by application of such statistical techniques most recently to studies of HIV disease and AIDS including intervention trials to reduce HIV transmission in Africa, the impact of pesticide exposure on pregnancy outcomes and infant neurodevelopment, measuring key factors in epidemic growth in situations such as SARS, assessing drug safety with particular interest in the adverse cardiovascular side effects of Cox-2 inhibitors, the measurement of PBDEs in peregrine falcon eggs in California, and the assessment of civilian casualties in times of conflict.
I have interests that span the spectrum from theory to algorithms to applications. I'm most interested in problems that arise when working with non-traditional data types; examples I've worked with include document corpora, graphs, protein structures, phylogenies and multi-media signals. Working with these kinds of data types often leads one to work on problems of an unusually large scale, where classical methods can be infeasible on computational grounds. I've thus been interested in new computational methods for large-scale problems; specifically I've worked on the development of novel estimators using tools from constrained optimization theory and convex analysis. I'm also interested in the interface between probability theory and nonparametric statistics, particularly in the setting known as "nonparametric Bayes", where the prior distribution is a general stochastic process. Here ideas familiar in modern probability theory, such as the Chinese restaurant process and stick-breaking distributions, yield novel statistical models and novel inference procedures. These methods have numerous applications in areas such as statistical genetics, image processing and natural language processing. I'm quite interested in pursuing these applications, particularly in collaboration with biologists and computer scientists.
My research interests are in probability theory. I have done research on sums of independent random variables, laws of the iterated logarithm, approximations of tail probabilities, operator limit theorems for sums of independent random vectors, expectations of functions of sums of fixed and random numbers of random variables, as well as arbitrary self-normalized sums.
machine learning, statistical prediction, variational inference, statistical computing, optimization
My work focuses on the development and application of statistical methods in genomics. Most of it concentrates on making inferences regarding function and evolution from molecular and genetic data. Some of the projects that I am currently involved in are in the areas of human population genetics, comparative evolutionary genomics, coalescent theory, and statistical methods in molecular ecology. Examples include evolutionary analyses of whole genome data from a diverse set of organisms including bacteria, the Giant Panda, the Rhesus Macaque monkey, humans, and chimpanzees, development of methods for association mapping which can accommodate non-linear interactions, and the development of MCMC methods for inferring demographic parameters in population genetics.
One of my current interests is in the area of cyber-infrastructure for education. I am studying problems surrounding the design of electronic documents which provide ways for authors and readers to interact with and dynamically view a data analysis process/statistics research activity. I am also interested in problems in more traditional areas of statistics that are related to high-dimensional modeling and model selection.
environmental statistics, statistical computing, spatial statistics, Bayesian statistics
I have been interested in interfaces between the traditional theory of stochastic processes and other areas of mathematics, especially combinatorics. I have studied various random combinatorial objects, such as permutations, partitions, and trees, and how the asymptotic behaviour of such structures over a large number of elements can be described in probabilistic terms, most often involving Brownian motion and related processes. This has led to the study of various measure-valued and partition-valued Markov processes whose behaviour may be understood in terms of combinatorial constructions involving random trees. I am at present engaged in developing various ideas related to random partitions, random trees, irreversible processes of coalescence, and their time reversals which provide models for random splitting or fragmentation.
I view this line of research largely as pure mathematics, but mathematics of a concrete kind which is often motivated and influenced by applications. Stochastic models with a natural probabilistic structure typically turn up in different disguises in diverse fields. The study of their mathematical structure allows ideas and results developed in one context to be transferred to another.
My interests are in statistical problems arising in the field of molecular biology and genetics and in particular high-throughput genomic experiments. I am currently working on methodologies for analyzing gene expression and alternative splicing using microarrays and ultra high-throughput sequencing. Methodologically, I am interested in high-dimensional and multivariate techniques that are relevant in this context.
I was drawn to the discipline of statistics by a fascination with randomness, and by the way it blends mathematics and scientific content. I have enjoyed interactions with researchers in many areas, especially the natural sciences, which have given me opportunities to learn about these fields and to make contributions to them. Underlying these separate analyses are paradigms of statistical methodology that have been developed over the last 100 years. This evolution accelerates as statisticians confront new challenges in the information age.
I am especially interested in developing methods for analyzing data that arise in the form of random functions, such as time series, and which involve large quantities of data and computationally intensive analysis. Much of my recent work has centered around two projects in astronomy: detecting objects in the outer regions of the solar system (the Kuiper Belt) and detecting gamma-ray pulsars.
My research interests are in theoretical computer science, algorithms, randomized computation, Markov Chains, phase transitions, statistical physics, and combinatorial optimization. Most of my work involves applying probabilistic ideas in some way, usually to design or analyze algorithms.
Computational biology, statistical genetics, applied probability
My research centers around computational biology and mathematical population genetics. I am generally interested in developing methods, using techniques from statistics and computer science, to address problems that arise from evolutionary molecular biology. I am also interested in combinatorial optimization, algorithms, and Monte Carlo methods.
uncertainty quantification and inference, inverse problems, nonparametrics, risk assessment, earthquake prediction, election auditing, geomagnetism, cosmology, litigation, food/nutrition
My research centers on inference (inverse) problems, primarily in physical science. I am especially interested in confidence procedures tailored for specific goals and in quantifying the uncertainty in inferences that rely on simulations of complex physical systems. I've done research on the internal structure of the Sun and Earth, climate modeling, earthquake prediction, the Big Bang, the geomagnetic field, election auditing, geriatric hearing loss, the U.S. census, the effectiveness of Internet content filters, endangered species, spectrum estimation, urban foraging, and information retrieval. I am interested in numerical optimization, and have published some software.
I've consulted in product liability litigation, truth in advertising, equal protection under the law, jury selection, trade secret litigation, employment discrimination litigation, import restrictions, insurance litigation, natural resource legislation, environmental litigation, patent litigation, sampling in litigation, wage and hour class actions, product liability class actions, consumer class actions, the U.S. census, clinical trials, signal processing, geochemistry, IC mask quality control, targeted marketing, water treatment, sampling the web, risk assessment, and oil exploration.
I am a mathematician with a wide range of interests ranging from combinatorics and algebraic geometry to optimization and computational biology. I have always been fascinated with numerical experiments and data analysis, and this led me quite naturally into the emerging field of algebraic statistics. Here I have worked on phylogenetics, Markov bases, symbolic computation of Bayesian integrals, likelihood inference, and the geometry of conditional independence models for discrete and Gaussian random variables.
The goal of my research group is to develop statistical methods to estimate/learn causal and non-causal parameters of interest, based on potentially complex and high dimensional data from randomized clinical trials or observational longitudinal studies, or from cross-sectional (e.g., case-control sampling) studies. The model assumptions under which these methods are valid should be clearly formulated, so that they can be subject to scrutiny. The estimates should be accompanied by confidence regions for the true parameter values or other types of confidence measures (e.g., variability/reproducibility of clusters as measured by the bootstrap). The longitudinal data structures may involve high dimensional measurements such as whole genome profiles at various points in time; censoring and missingness of data due to a subject not responding well to treatment (or not feeling well); and changes of treatment at various points in time, based on variables related to the outcome of interest. Our methods are designed to rely on as few assumptions as possible on nuisance parameters so that they provide maximally objective statistical inference and testing procedures. To develop and refine these methods, we work with simulated and real data in collaboration with biologists, medical researchers, epidemiologists, and others.
Mathematical demography, models for the evolution of aging, simulation.
As a mathematical demographer and statistician, I study systematic constraints and random influences that shape the structure of human populations. I helped develop methods of computer simulation to understand the rarity of coresident family members in pre-industrial English households. With these methods, I am now forecasting the kin and family support available to new generations of elderly in the twenty-first century. Working in "non-linear" demography, I have identified mechanisms that give rise to specific kinds of cycles in fertility and population growth. I am currently interested in patterns of mortality at extreme ages shared between humans and other species, trying to reconcile them with statistical models for long-term processes of evolutionary change.
In broad terms, I am interested in problems at the interface between computation and statistics. Part of my research focuses on algorithms and Markov random fields, a class of probabilistic models based on graphs used to capture dependencies in multivariate data (e.g., image models, data compression, computational biology). In general, exact solutions to inference problems in such models are computationally intractable, so there is a great deal of interest in approximate methods for statistical inference. I am also interested in studying the effect of decentralization and communication constraints in statistical inference problems. A final area of interest is methodology and theory for high-dimensional inference problems, in which the model dimension is of the same order as (or larger than) the sample size.
Statistical inference for high dimensional data and interdisciplinary research in neuroscience, remote sensing, and text summarization.
I am currently working on statistical methodologies and models involving large data sets from remote sensing, data networks (internet and sensor networks), neuroscience, finance, and bioinformatics. Together with my students and collaborators, I have been working on different areas of statistical machine learning, theoretical and computational. These areas include boosting, Lasso, support vector machines (SVM), and semi-supervised learning. On the computational side, we have developed algorithms such as BLasso and iCAP for sparse modeling. My past research areas have also included empirical processes, Markov Chain Monte Carlo, signal processing, the minimum description length principle (MDL), and information theory.