My research and teaching activities concern the development and application of statistical and computational methods to address problems in biomedical and genomic research. The statistical inference questions that arise are inherently multivariate and involve the joint analysis of multiple, diverse, and high-dimensional datasets. High-throughput assays such as microarrays and next-generation sequencers allow biologists to monitor expression levels for entire genomes. A challenging task is to relate these genome-wide measurements to biological and clinical covariates (e.g., age, sex, environmental exposure) and outcomes (e.g., cell type/state, affectedness/unaffectedness, survival time, response to treatment), as well as to the wealth of biological annotation metadata available on the web (e.g., Gene Ontology, KEGG pathways, PubMed literature).
Motivated by these biological challenges, my methodological research interests fall into two areas: loss-based estimation with cross-validation (parametric and non-parametric density estimation and regression, variable selection) and resampling-based multiple hypothesis testing. I am also interested in statistical computing, and I am a core developer of the Bioconductor Project, an open-source and open-development software project for the analysis of biomedical and genomic data.