statistics, applied statistics, data science, statistical computing, computational biology and genomics
My research and teaching activities concern the development and application of statistical methods and software for the analysis of biomedical and genomic data.
Statistical methodology. My methodological research interests concern high-dimensional inference and include exploratory data analysis (EDA), visualization, loss-based estimation with cross-validation (e.g., density estimation, regression, model selection), and multiple hypothesis testing.
Applications to biomedical and genomic research. Much of my methodological work is motivated by statistical inference questions arising in biological research and, in particular, in the design and analysis of high-throughput microarray and sequencing gene expression experiments. These assays allow biologists to monitor expression levels for entire genomes. A challenging task is to relate these genome-wide expression measures to biological and clinical covariates (e.g., age, sex, environmental exposure) and outcomes (e.g., cell type/state, affected/unaffected status, survival time, response to treatment), as well as to the wealth of biological annotation metadata available on the web (e.g., Gene Ontology (GO), KEGG pathways, PubMed literature). My recent focus has been on single-cell transcriptome sequencing (RNA-Seq) for discovering novel cell types and for studying stem cell differentiation. My contributions include exploratory data analysis, normalization and expression quantitation, differential expression analysis, class discovery, prediction, cell lineage inference, and integration of biological annotation metadata.
Statistical computing. I am also interested in statistical computing and, in particular, reproducible research. I am a founding core developer of the Bioconductor Project (http://www.bioconductor.org), an open-source and open-development software project for the analysis of biomedical and genomic data.