Applications of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method

Report Number
600
Authors
David A. Freedman
Abstract

The burgeoning field of genomics, and in particular microarray experiments, has revived interest in both discriminant and cluster analysis by raising new methodological and computational challenges. The present paper discusses applications of resampling methods to problems in cluster analysis. A resampling method, known as bagging in discriminant analysis, is applied to increase clustering accuracy and to assess the confidence of cluster assignments for individual observations. A novel prediction-based resampling method is also proposed to estimate the number of clusters, if any, in a dataset. The performance of the proposed and existing methods is compared using simulated data and gene expression data from four recently published cancer microarray studies.
