Using Random Forest to Learn Imbalanced Data

July, 2004
Report Number: 666
Authors: Chao Chen, Andy Liaw and Leo Breiman
Abstract: 

In this paper we propose two ways to deal with the imbalanced data classification problem using random forest. One is based on cost-sensitive learning, and the other is based on a sampling technique. Performance metrics such as precision and recall, false positive rate and false negative rate, $F$-measure, and weighted accuracy are computed. Both methods are shown to improve the prediction accuracy of the minority class and to perform favorably compared to existing algorithms.
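
As a rough illustration of the metrics named in the abstract, the sketch below computes them from the four cells of a binary confusion matrix, treating the minority class as the positive class. The function name `imbalance_metrics`, the example counts, and the `beta` parameter are hypothetical; the abstract does not define weighted accuracy, so the form used here (a `beta`-weighted average of the true positive and true negative rates, with `beta = 0.5`) is an assumption, and the $F$-measure is taken to be the usual harmonic mean of precision and recall.

```python
def imbalance_metrics(tp, fp, tn, fn, beta=0.5):
    """Metrics for a binary problem with the minority class as 'positive'.

    Assumes every denominator below is nonzero (both classes are present
    and at least one case is predicted positive).
    """
    recall = tp / (tp + fn)        # true positive rate
    tnr = tn / (tn + fp)           # true negative rate
    precision = tp / (tp + fp)
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "f_measure": 2 * precision * recall / (precision + recall),
        # Assumed definition: beta weights the minority-class accuracy.
        "weighted_accuracy": beta * recall + (1 - beta) * tnr,
        "overall_accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts: 50 minority cases among 1000 observations.
m = imbalance_metrics(tp=30, fp=20, tn=930, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the overall accuracy is 0.96 even though only 60% of the minority cases are recovered, which is why metrics focused on the minority class are reported rather than accuracy alone.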
