Sandrine Dudoit

Professor

Status

Current

Website

http://www.stat.berkeley.edu/~sandrine

Office / Location

327 Evans Hall

Email

sandrine@stat.berkeley.edu

Research Expertise and Interests

statistics, applied statistics, data science, statistical computing, computational biology and genomics

Research Description

https://vcresearch.berkeley.edu/faculty/sandrine-dudoit

My research and teaching activities concern the development and application of statistical methods and software for the analysis of biomedical and genomic data.

Statistical methodology. My methodological research centers on high-dimensional inference and includes exploratory data analysis (EDA), visualization, loss-based estimation with cross-validation (e.g., density estimation, regression, model selection), and multiple hypothesis testing.

Applications to biomedical and genomic research. Much of my methodological work is motivated by statistical inference questions arising in biological research and, in particular, the design and analysis of high-throughput microarray and sequencing gene expression experiments. These assays allow biologists to monitor expression levels for entire genomes. A challenging task is to relate these genome-wide genotypes to biological and clinical covariates (e.g., age, sex, environmental exposure) and outcomes (e.g., cell type/state, affectedness/unaffectedness, survival time, response to treatment), as well as to the wealth of biological annotation metadata available on the web (e.g., Gene Ontology (GO), KEGG pathways, PubMed literature). My recent focus has been on single-cell transcriptome sequencing (RNA-Seq) for discovering novel cell types and for studying stem cell differentiation. My contributions include exploratory data analysis, normalization and expression quantitation, differential expression analysis, class discovery, prediction, cell lineage inference, and integration of biological annotation metadata.

Statistical computing. I am also interested in statistical computing and, in particular, reproducible research. I am a founding core developer of the Bioconductor Project (http://www.bioconductor.org), an open-source and open-development software project for the analysis of biomedical and genomic data.

Research Areas

Statistical Computing

Applications in Biology and Medicine

High Dimensional Data Analysis

Non-Parametric Inference

Artificial Intelligence/Machine Learning