Using Random Forest to Learn Imbalanced Data

Report Number
666
Authors
Chao Chen, Andy Liaw and Leo Breiman
Abstract

In this paper we propose two ways to deal with the imbalanced data classification problem using random forest. One is based on cost-sensitive learning, and the other is based on a sampling technique. Performance metrics such as precision and recall, false positive rate and false negative rate, $F$-measure, and weighted accuracy are computed. Both methods are shown to improve the prediction accuracy of the minority class and to perform favorably compared with existing algorithms.
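The metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not code from the report: the function name `imbalance_metrics`, the convention that the minority class is the positive class, the balanced form of the $F$-measure, and the definition of weighted accuracy as `beta * TPR + (1 - beta) * TNR` with a default `beta = 0.5` are all assumptions made for illustration.

```python
def imbalance_metrics(tp, fp, tn, fn, beta=0.5):
    """Compute imbalance-aware metrics from confusion-matrix counts.

    Assumes the minority class is the positive class. `beta` is a
    hypothetical weight on the minority-class accuracy.
    """
    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # true positive rate
    tnr = ratio(tn, tn + fp)           # true negative rate
    fpr = ratio(fp, fp + tn)           # false positive rate
    fnr = ratio(fn, fn + tp)           # false negative rate
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = ratio(2 * precision * recall, precision + recall)
    # Assumed definition: weighted average of the per-class accuracies.
    weighted_accuracy = beta * recall + (1 - beta) * tnr
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fpr,
        "false_negative_rate": fnr,
        "f_measure": f_measure,
        "weighted_accuracy": weighted_accuracy,
        "accuracy": accuracy,
    }


# Illustrative counts only: 50 minority cases among 1000 examples.
metrics = imbalance_metrics(tp=30, fp=20, tn=930, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, overall accuracy is high even though 40% of the minority cases are missed, which is why accuracy alone is a poor summary on imbalanced data.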
