Using Convex Pseudo-Data to Increase Prediction Accuracy

March 1, 1998

Report Number

513

Authors

Leo Breiman

Citation

Electronic Journal of Probability, Vol. 5 (2000) Paper no. 2, pages 1-18

Abstract

A prediction algorithm is consistent if, given a large enough sample of instances from the underlying distribution, it can achieve nearly optimal generalization accuracy. In practice, the training set is finite and does not adequately represent the underlying distribution. Our work is based on a simple method for generating additional data from the existing data. Using this new data (convex pseudo-data), it is shown empirically that the prediction accuracy of an algorithm can be significantly improved on a variety of data sets. This is shown first in classification using the CART algorithm. Similar results are shown in regression. Pseudo-data is then applied to bagged CART. Although CART is used as a test bed, the idea of generating convex pseudo-data can be applied to any prediction method.
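
The abstract does not spell out how the convex pseudo-data are generated, so the following Python fragment is only a minimal sketch of one plausible reading: each pseudo-instance is a convex combination of two randomly chosen training instances and inherits the label of the instance it stays closer to. The function name `convex_pseudo_data`, the mixing bound `d`, and the labeling rule are assumptions made for illustration and are not taken from the report.

```python
import random


def convex_pseudo_data(X, y, n_new, d=0.5, seed=0):
    """Generate n_new pseudo-instances from training pairs.

    Assumptions (hypothetical, not from the abstract): a pseudo-instance is
    (1 - v) * x_i + v * x_j with v drawn uniformly from [0, d], and it
    inherits the label of x_i, which it stays closest to when d <= 0.5.
    """
    if not 0.0 <= d <= 0.5:
        raise ValueError("d must lie in [0, 0.5]")
    if len(X) < 2:
        raise ValueError("need at least two training instances")

    rng = random.Random(seed)
    new_X, new_y = [], []
    for _ in range(n_new):
        # Pick two distinct training instances at random.
        i, j = rng.sample(range(len(X)), 2)
        v = rng.uniform(0.0, d)
        # Convex combination, coordinate by coordinate.
        new_X.append([(1 - v) * a + v * b for a, b in zip(X[i], X[j])])
        new_y.append(y[i])
    return new_X, new_y


# Toy training set: three 2-D instances with binary labels.
X_train = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
y_train = [0, 1, 1]
X_pseudo, y_pseudo = convex_pseudo_data(X_train, y_train, n_new=5, d=0.3)
```

Under these assumptions, the pseudo-instances would be appended to the original training set before fitting any learner, CART or otherwise; the value of `d` controls how far a new point may drift from its parent instance.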

PDF File

513.pdf

Postscript File

513.ps.Z