Bivariate variable selection for classification problem

Report Number
692
Authors
Vivian Ng and Leo Breiman
Abstract

In recent years, a great deal of attention has been paid to variable (feature) selection in many domains, and a variety of variable selection methods have been proposed in the literature. However, most of them focus on univariate variable selection: methods that select relevant variables one at a time. Currently, there is little emphasis on selecting pairs of variables, even though such an emphasis is reasonable, since researchers in industry have been asked to identify relevant pairs of variables. Univariate variable selection works well for identifying variables that are individually significant, but a pair of individually important variables is not the same as a pair of variables that has a joint effect. Therefore, univariate variable selection methods are not applicable to selecting pairs of linked variables. To overcome this obstacle, Professor Breiman and I propose a bivariate variable selection method that detects linked pairs of variables. It is equally important to understand how each linked pair relates to the response variable. To this end, a graphical tool is designed for visualizing the relationships uncovered by the proposed bivariate variable selection method.
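The distinction between individually important variables and a linked pair can be seen on a toy example. The sketch below is not the bivariate selection method proposed in this report, whose details the abstract does not give. It is a hypothetical illustration that assumes the class label is the XOR of two binary variables, with a third variable as pure noise. It scores variables with a stand-in criterion, the accuracy of predicting the majority class within each cell. The helper name `majority_rule_accuracy` and the data are invented for illustration.

```python
from collections import Counter
from itertools import combinations, product


def majority_rule_accuracy(rows, labels, cols):
    """Hypothetical score: accuracy of predicting the majority label in each
    cell formed by the values of the variables listed in `cols`."""
    cells = {}
    for row, label in zip(rows, labels):
        key = tuple(row[c] for c in cols)
        cells.setdefault(key, Counter())[label] += 1
    # Each cell contributes the count of its most frequent label.
    correct = sum(counter.most_common(1)[0][1] for counter in cells.values())
    return correct / len(labels)


# Hypothetical toy data: every combination of three binary variables.
# The class is x0 XOR x1; x2 is unrelated noise.
rows = [list(bits) for bits in product([0, 1], repeat=3)]
labels = [r[0] ^ r[1] for r in rows]

# Score each variable alone, then each pair jointly.
univariate = {j: majority_rule_accuracy(rows, labels, [j]) for j in range(3)}
bivariate = {pair: majority_rule_accuracy(rows, labels, list(pair))
             for pair in combinations(range(3), 2)}

print(univariate)  # {0: 0.5, 1: 0.5, 2: 0.5}
print(bivariate)   # {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.5}
```

Every variable scores at chance on its own, so a one-at-a-time screen would discard x0 and x1. Only the joint score of the pair (x0, x1) reveals the linked effect on the response.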
