Faculty Research Interests
Analysis of algorithms, applied probability, complex networks, entropy, mathematical probability, phylogenetic trees, random networks, spatial networks. Also popularization of probability.
I do research in mathematical probability. A central theme is the study of large finite random structures, obtaining asymptotic behavior as the size tends to infinity via consideration of some suitable infinite random structure. Much current work involves random network models.
My research interests are in the areas of machine learning, statistical learning theory, and reinforcement learning. I work on the theoretical analysis of computationally efficient methods for large or otherwise complex prediction problems. One example is structured prediction problems, where there is considerable complexity to the space of possible predictions. Such methods are important in a variety of application areas, including natural language processing, computer vision, and bioinformatics. A second area of interest is the analysis of prediction methods in a deterministic, game-theoretic setting. As well as being of interest in areas such as computer security, where an adversarial environment is a reasonable model, this analysis also provides insight into the design and understanding of prediction methods in a probabilistic setting. A third area of interest is the design of methods for large-scale sequential decision problems, such as control of Markov decision processes. Again, computational efficiency is a crucial requirement. This is a common feature in all of these areas: the interplay between the constraint of computational efficiency and the statistical properties of a method.
My main theoretical interest is in understanding why we are able to do statistics as well as we do on very high dimensional datasets without knowing much, even though least favorable (malicious God) formulations suggest we should not be able to do anything. Currently this has led me to focus on estimation of covariance matrices and their eigenstructures in high dimensions. Parallel applied interests are in:
- Computational biology, specifically at the moment regulatory networks in the cell. To my surprise some of the methods flowing out of my primary interest are relevant to this one.
- Atmospheric sciences ... in part as a ready source of questions based on very high dimensional data.
Random process theory and data analysis, risk analysis, spatial-temporal trajectory modeling, sports statistics, applications to ecology, forestry, marine biology, neuroscience, seismology and engineering.
My recent research focuses on statistical methods for random processes, random process data analysis and applications in engineering and science generally. Particular topics include modelling the motion of animals and other entities, and risk analysis for earthquakes, wildfires, floods, and similar phenomena. I also do some work on sports statistics, am President of The International Environmetrics Society (TIES), and am Deputy Editor of the journal Environmetrics.
Statistical design of experiments, which originated in agricultural applications, is used extensively in a wide range of scientific and industrial investigations. I am interested in efficient experimental designs and the related construction and combinatorial problems. Currently, I work mostly on design of experiments in the situation where the response depends on a large number of factors (variables), the so-called factorial design. When a large number of factors have to be studied, but the experimental runs are expensive, it is not feasible to observe all possible combinations of the factors. One aspect of my research deals with how to choose a "good" small subset of the factor combinations. There are interesting connections with combinatorics, coding theory and finite geometry.
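As a rough illustration of choosing a small subset of factor combinations, the sketch below builds a textbook half-fraction of a two-level factorial design. It is a standard construction used here only as an example, not a method taken from the research described above; the function names `full_factorial` and `half_fraction`, and the -1/+1 coding of levels, are assumptions made for the illustration.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a two-level design with k factors, levels coded -1/+1."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def half_fraction(k):
    """A 2**(k-1)-run half-fraction of the two-level design with k factors.

    The first k-1 factors form a full factorial; the level of the last
    factor is set to the product of the others, so every selected run
    satisfies the defining relation (product of all k levels equals +1).
    """
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(list(base) + [last])
    return runs


# Four factors: 16 possible combinations, of which 8 are selected.
design = half_fraction(4)
for run in design:
    print(run)
```

With four factors this selects 8 of the 16 possible runs while keeping every factor balanced and every pair of factor columns orthogonal, which is one simple sense in which a subset can be "good".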
causal inference in experiments and observational studies, missing data
My methodological research interests regard high-dimensional inference and include exploratory data analysis (EDA), dimensionality reduction, visualization, loss-based estimation with cross-validation (e.g., density estimation, classification, regression, model selection), cluster analysis, and multiple hypothesis testing.
Much of my methodological work is motivated by statistical inference questions arising in biological research and, in particular, the design and analysis of high-throughput microarray and sequencing gene expression experiments, e.g., single-cell transcriptome sequencing (RNA-Seq) for discovering novel cell types and for the study of stem cell differentiation. My contributions include: exploratory data analysis, normalization and expression quantitation, differential expression analysis, class discovery, prediction, inference of cell lineages, integration of biological annotation metadata (e.g., Gene Ontology (GO) annotation).
I am also interested in statistical computing and, in particular, reproducible research. I am a founding core developer of the Bioconductor Project, an open-source and open-development software project for the analysis of biomedical and genomic data.
High-dimensional statistics, random matrices, high-dimensional robust regression, high-dimensional M-estimation, the bootstrap and resampling in high-dimension, limit theorems and statistical inference, applied statistics. Recent interests include auction theory from the bidder standpoint.
I am a probabilist and statistician working in the general area of stochastic processes and their applications.
Particular areas of interest are:
- random matrices
- probability on algebraic structures
- Dawson-Watanabe superprocesses and other measure-valued processes arising in biology
- coalescent models that appear in biology, chemistry and astrophysics
- phylogenetic invariants and phylogenetic inference in both biology and historical linguistics
- biodemography (e.g. fitness landscapes, mutation-selection balance, stochastic PDE models of bacteria and yeast aging, and applications of quasistationarity to mortality modeling)
- probability and real trees, particularly applications of ideas from metric geometry such as the Gromov-Hausdorff metric
- population genetics (e.g. allele frequency spectra for time-varying population sizes and inference from ancient DNA)
- metric measure spaces
- Lévy processes and Brownian motion
Models of Percolation, Phase transitions in Statistical Mechanics, Mixing time of Markov chains, Random walk on graphs, Counting problems in non-linear sparse settings.
Financial economics, statistical evaluation of investment strategies, asset allocation, credit and counterparty risk, socially responsible investing, tax-aware investing, causal inference, random matrix theory, sports statistics.
Statistical mechanics, studied rigorously via modern techniques from mathematical probability
High dimensional and integrative genomic data analysis; network modeling; hierarchical multi-label classification; translational bioinformatics
My research areas are in Computational Biology and Applied Statistics. Particularly, I am interested in solving practical problems in emerging bio data-intensive systems, and in understanding and developing theoretical principles of the practical methods. My current focuses are: 1) developing statistical methods that provide a consistent formulation between the statistical modeling and the biological nature of data, 2) understanding and solving the problem of unreliable estimates in analyzing high dimensional structured data, and 3) tackling the challenges posed by the high level of noise and the lack of reproducibility in the datasets from different resources.
Infectious diseases, specifically HIV; chronic disease epidemiology; environmental epidemiology; survival analysis; human rights statistics
As a biostatistician, my research focuses on the application of novel statistical techniques to the design, analysis and interpretation of data arising from public health and human disease studies. From a statistical point of view I am interested in survival analysis (particularly current status data); statistical methods for epidemiology, particularly for (i) infectious diseases and (ii) environmental exposures; causal inference in intervention studies; survey methods and longitudinal data analysis; and applications of statistics to molecular and cell biology. My work is motivated by application of such statistical techniques, most recently to studies of HIV disease and AIDS, including intervention trials to reduce HIV transmission in Africa; the impact of pesticide exposure on pregnancy outcomes and infant neurodevelopment; measuring key factors in epidemic growth in situations such as SARS; assessing drug safety, with particular interest in the adverse cardiovascular side effects of Cox-2 inhibitors; the measurement of PBDEs in peregrine falcon eggs in California; and the assessment of civilian casualties in times of conflict.
I have interests that span the spectrum from theory to algorithms to applications. I'm most interested in problems that arise when working with non-traditional data types; examples I've worked with include document corpora, graphs, protein structures, phylogenies and multi-media signals. Working with these kinds of data types often leads one to work on problems of an unusually large scale, where classical methods can be infeasible on computational grounds. I've thus been interested in new computational methods for large-scale problems; specifically I've worked on the development of novel estimators using tools from constrained optimization theory and convex analysis. I'm also interested in the interface between probability theory and nonparametric statistics, particularly in the setting known as "nonparametric Bayes", where the prior distribution is a general stochastic process. Here ideas familiar in modern probability theory, such as the Chinese restaurant process and stick-breaking distributions, yield novel statistical models and novel inference procedures. These methods have numerous applications in areas such as statistical genetics, image processing and natural language processing. I'm quite interested in pursuing these applications, particularly in collaboration with biologists and computer scientists.
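For readers unfamiliar with the two constructions named above, the following is a minimal sketch of each, using only the Python standard library. It illustrates the textbook definitions and is not code from the research described; the function names, the concentration parameter `alpha`, and the truncation level `n_sticks` are assumptions chosen for the example.

```python
import random


def stick_breaking(alpha, n_sticks, rng):
    """First n_sticks weights of a stick-breaking sequence.

    Each step breaks off a Beta(1, alpha) fraction of the remaining stick.
    Returns the weights and the length of stick left over after truncation.
    """
    weights = []
    remaining = 1.0
    for _ in range(n_sticks):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining


def chinese_restaurant_process(alpha, n_customers, rng):
    """Seat customers one at a time and return the list of table sizes.

    With n customers already seated, the next customer joins an existing
    table with probability proportional to its size, or opens a new table
    with probability proportional to alpha.
    """
    tables = []
    for n in range(n_customers):
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        for j, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[j] += 1
                break
        else:
            # r fell in the final interval of length alpha: open a new table.
            tables.append(1)
    return tables


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights, leftover = stick_breaking(alpha=2.0, n_sticks=10, rng=rng)
tables = chinese_restaurant_process(alpha=2.0, n_customers=100, rng=rng)
print(weights, leftover)
print(tables)
```

In both functions a larger `alpha` tends to spread mass over more components: more small sticks in the first, more occupied tables in the second.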
My research interests are in probability theory. I have done research on sums of independent random variables, laws of the iterated logarithm, approximations of tail probabilities, operator limit theorems for sums of independent random vectors, expectations of functions of sums of fixed and random numbers of random variables, as well as arbitrary self-normalized sums.
machine learning, statistical prediction, variational inference, statistical computing, optimization
My work focuses on the development and application of statistical methods in genomics. Most of it concentrates on making inferences regarding function and evolution from molecular and genetic data. Some of the projects that I am currently involved in are in the areas of human population genetics, comparative evolutionary genomics, coalescent theory, and statistical methods in molecular ecology. Examples include evolutionary analyses of whole genome data from a diverse set of organisms including bacteria, the Giant Panda, the Rhesus Macaque monkey, humans, and chimpanzees, development of methods for association mapping which can accommodate non-linear interactions, and the development of MCMC methods for inferring demographic parameters in population genetics.
One of my current interests is in the area of cyber-infrastructure for education. I am studying problems surrounding the design of electronic documents which provide ways for authors and readers to interact with and dynamically view a data analysis process/statistics research activity. I am also interested in problems in more traditional areas of statistics that are related to high-dimensional modeling and model selection.
environmental statistics, statistical computing, spatial statistics, Bayesian statistics
I have been interested in interfaces between the traditional theory of stochastic processes and other areas of mathematics, especially combinatorics. I have studied various random combinatorial objects, such as permutations, partitions, and trees, and how the asymptotic behaviour of such structures over a large number of elements can be described in probabilistic terms, most often involving Brownian motion and related processes. This has led to the study of various measure-valued and partition-valued Markov processes whose behaviour may be understood in terms of combinatorial constructions involving random trees. I am at present engaged in developing various ideas related to random partitions, random trees, irreversible processes of coalescence, and their time reversals which provide models for random splitting or fragmentation.
I view this line of research largely as pure mathematics, but mathematics of a concrete kind which is often motivated and influenced by applications. Stochastic models with a natural probabilistic structure typically turn up in different disguises in diverse fields. The study of their mathematical structure allows ideas and results developed in one context to be transferred to another.
My interests are in statistical problems arising in the field of molecular biology and genetics and in particular high-throughput genomic experiments. I am currently working on methodologies for analyzing gene expression and alternative splicing using microarrays and ultra high-throughput sequencing. Methodologically, I am interested in high-dimensional and multivariate techniques that are relevant in this context.
I was drawn to the discipline of statistics by a fascination with randomness, and by the way it blends mathematics and scientific content. I have enjoyed interactions with researchers in many areas, especially the natural sciences, which have given me opportunities to learn about these fields and to make contributions to them. Underlying these separate analyses are paradigms of statistical methodology that have been developed over the last 100 years. This evolution accelerates as statisticians confront new challenges in the information age.
I am especially interested in developing methods for analyzing data that arise in the form of random functions, such as time series, and which involve large quantities of data and computationally intensive analysis. Much of my recent work has centered around two projects in astronomy: detecting objects in the outer regions of the solar system (the Kuiper Belt) and detecting gamma-ray pulsars.
My research interests are in theoretical computer science, algorithms, randomized computation, Markov chains, phase transitions, statistical physics, and combinatorial optimization. Most of my work involves applying probabilistic ideas in some way, usually to design or analyze algorithms.
Computational biology, statistical genetics, applied probability
My research centers around computational biology and mathematical population genetics. I am generally interested in developing methods, using techniques from statistics and computer science, to address problems that arise from evolutionary molecular biology. I am also interested in combinatorial optimization, algorithms, and Monte Carlo methods.
uncertainty quantification and inference, inverse problems, nonparametrics, risk assessment, earthquake prediction, election auditing, geomagnetism, cosmology, litigation, food/nutrition
My research centers on inference (inverse) problems, primarily in physical science. I am especially interested in confidence procedures tailored for specific goals and in quantifying the uncertainty in inferences that rely on simulations of complex physical systems. I've done research on the internal structure of the Sun and Earth, climate modeling, earthquake prediction, the Big Bang, the geomagnetic field, election auditing, geriatric hearing loss, the U.S. census, the effectiveness of Internet content filters, endangered species, spectrum estimation, urban foraging, and information retrieval. I am interested in numerical optimization, and have published some software.
I've consulted in product liability litigation, truth in advertising, equal protection under the law, jury selection, trade secret litigation, employment discrimination litigation, import restrictions, insurance litigation, natural resource legislation, environmental litigation, patent litigation, sampling in litigation, wage and hour class actions, product liability class actions, consumer class actions, the U.S. census, clinical trials, signal processing, geochemistry, IC mask quality control, targeted marketing, water treatment, sampling the web, risk assessment, and oil exploration.
I am a mathematician with a wide range of interests ranging from combinatorics and algebraic geometry to optimization and computational biology. I have always been fascinated with numerical experiments and data analysis, and this led me quite naturally into the emerging field of algebraic statistics. Here I have worked on phylogenetics, Markov bases, symbolic computation of Bayesian integrals, likelihood inference, and the geometry of conditional independence models for discrete and Gaussian random variables.
The goal of my research group is to develop statistical methods to estimate/learn causal and non-causal parameters of interest, based on potentially complex and high dimensional data from randomized clinical trials or observational longitudinal studies, or from cross-sectional (e.g., case-control sampling) studies. The model assumptions under which these methods are valid should be clearly formulated, so that they can be subject to scrutiny. The estimates should be accompanied by confidence regions for the true parameter values or other types of confidence measures (e.g., variability/reproducibility of clusters as measured by the bootstrap). The longitudinal data structures may involve high dimensional measurements such as whole genome profiles at various points in time; censoring and missingness of data due to a subject not responding well to treatment (or not feeling well); and changes of treatment at various points in time, based on variables related to the outcome of interest. Our methods are designed to rely on as few assumptions as possible on nuisance parameters so that they provide maximally objective statistical inference and testing procedures. To develop and refine these methods, we work with simulated and real data in collaboration with biologists, medical researchers, epidemiologists, and others.
Mathematical demography, models for the evolution of aging, simulation.
As a mathematical demographer and statistician, I study systematic constraints and random influences that shape the structure of human populations. I helped develop methods of computer simulation to understand the rarity of coresident family members in pre-industrial English households. With these methods, I am now forecasting the kin and family support available to new generations of elderly in the twenty-first century. Working in "non-linear" demography, I have identified mechanisms that give rise to specific kinds of cycles in fertility and population growth. I am currently interested in patterns of mortality at extreme ages shared between humans and other species, trying to reconcile them with statistical models for long-term processes of evolutionary change.
In broad terms, I am interested in problems at the interface between computation and statistics. Part of my research focuses on algorithms and Markov random fields, a class of probabilistic models based on graphs used to capture dependencies in multivariate data (e.g., image models, data compression, computational biology). In general, exact solutions to inference problems in such models are computationally intractable, so there is a great deal of interest in approximate methods for statistical inference. I am also interested in studying the effect of decentralization and communication constraints in statistical inference problems. A final area of interest is methodology and theory for high-dimensional inference problems, in which the model dimension is of the same order as (or larger than) the sample size.
Statistical inference for high dimensional data and interdisciplinary research in neuroscience, remote sensing, and text summarization.
I am currently working on statistical methodologies and models involving large data sets from remote sensing, data networks (internet and sensor networks), neuroscience, finance, and bioinformatics. Together with my students and collaborators, I have been working on different areas of statistical machine learning, theoretical and computational. These areas include boosting, Lasso, support vector machines (SVM), and semi-supervised learning. On the computational side, we have developed algorithms such as BLasso and iCAP for sparse modeling. My past research areas have also included empirical processes, Markov Chain Monte Carlo, signal processing, the minimum description length principle (MDL), and information theory.