Binning in Gaussian Kernel Regularization

Report Number
689
Authors
Tao Shi and Bin Yu
Abstract

Gaussian kernel regularization is widely used in the machine learning literature and has proven successful in many empirical experiments. The periodic version of Gaussian kernel regularization has been shown to be minimax rate optimal for estimating functions in any finite-order Sobolev space. However, for a data set with $n$ points, the computational complexity of the Gaussian kernel regularization method is of order $O(n^3)$.
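As a rough illustration of where the $O(n^3)$ cost comes from, the sketch below (not the authors' implementation; the bandwidth omega, penalty lam, and function names are illustrative placeholders) fits a Gaussian kernel ridge regressor on one-dimensional inputs by solving an $n \times n$ linear system:

    # Minimal sketch of Gaussian kernel ridge regression; the O(n^3)
    # cost comes from solving an n x n linear system at fit time.
    # omega (bandwidth) and lam (penalty) are illustrative, not values
    # from the paper.
    import numpy as np

    def gaussian_kernel(x, z, omega=1.0):
        # K(x, z) = exp(-(x - z)^2 / (2 omega^2)) for 1-d input arrays
        return np.exp(-(x[:, None] - z[None, :]) ** 2 / (2 * omega ** 2))

    def fit_kernel_ridge(x, y, lam=1e-3, omega=1.0):
        n = len(x)
        K = gaussian_kernel(x, x, omega)                      # n x n Gram matrix
        alpha = np.linalg.solve(K + lam * n * np.eye(n), y)   # O(n^3) solve
        return alpha

    def predict(x_train, alpha, x_new, omega=1.0):
        return gaussian_kernel(x_new, x_train, omega) @ alpha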

In this paper we propose to use binning to reduce the computation of Gaussian kernel regularization in both regression and classification. For periodic Gaussian kernel regression, we show that the binned estimator achieves the same minimax rates as the unbinned estimator, while the computation is reduced to $O(m^3)$, where $m$ is the number of bins. To achieve the minimax rate in the $k$-th order Sobolev space, $m$ needs to be of order $O(kn^{1/(2k+1)})$, which makes the computation of the binned estimator of order $O(n)$ for $k=1$ and even less for larger $k$. Our simulations show that the binned estimator (binning 120 data points into 20 bins in our simulation) provides almost the same accuracy with only 0.4\% of the computation time.
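A hedged sketch of the binning idea for regression, assuming equal-width bins on a one-dimensional input and an unweighted fit (bin counts could also be used as weights; this is illustrative only, not the paper's exact estimator):

    # Group the n points into m equal-width bins, replace each occupied
    # bin by its center and mean response, then fit the same kernel
    # ridge estimator on the m summaries: the solve is now O(m^3).
    import numpy as np

    def bin_data(x, y, m):
        edges = np.linspace(x.min(), x.max(), m + 1)
        idx = np.clip(np.digitize(x, edges) - 1, 0, m - 1)
        centers, y_bar = [], []
        for j in range(m):
            mask = idx == j
            if mask.any():                                 # skip empty bins
                centers.append(0.5 * (edges[j] + edges[j + 1]))
                y_bar.append(y[mask].mean())
        return np.array(centers), np.array(y_bar)

    # x_b, y_b = bin_data(x, y, m=20)
    # alpha = fit_kernel_ridge(x_b, y_b)   # an m x m system instead of n x n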

For classification, binning with the $L_2$-loss Gaussian kernel regularization and the Gaussian kernel Support Vector Machine (SVM) is tested on a polar cloud detection problem. With essentially the same computation time, the $L_2$-loss Gaussian kernel regularization on 966 bins achieves a better test classification rate (79.22\%) than on 966 randomly sampled data points (71.40\%). Using the OSU-SVM Matlab package, the SVM trained on 966 bins has a test classification rate comparable to that of the SVM trained on all 27,179 samples, but reduces the training time from 5.99 hours to 2.56 minutes. The SVM trained on 966 randomly selected samples has a similar training time to, and a slightly worse test classification rate than, the SVM on 966 bins, but it has 67\% more support vectors and therefore takes 67\% longer to predict on a new data point. The SVM trained on 512 cluster centers from the k-means algorithm reports almost the same test classification rate and a similar number of support vectors as the SVM on 512 bins, but the k-means clustering itself takes 375 times more computation time than binning.
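A rough sketch of binning before SVM training, using scikit-learn's SVC as a stand-in for the OSU-SVM Matlab package used in the paper; the per-axis grid size bins_per_dim, the mean-vector representatives, and the majority-vote labels are illustrative assumptions, not the paper's procedure:

    # Split each feature axis into intervals, represent every occupied
    # multi-dimensional cell by its mean feature vector and majority
    # 0/1 label, and train the SVM on the cell representatives.
    import numpy as np
    from sklearn.svm import SVC

    def bin_for_classification(X, y, bins_per_dim=8):
        lo, hi = X.min(axis=0), X.max(axis=0)
        cells = np.floor((X - lo) / (hi - lo + 1e-12) * bins_per_dim).astype(int)
        cells = np.clip(cells, 0, bins_per_dim - 1)
        reps, labels = [], []
        for key in np.unique(cells, axis=0):
            mask = (cells == key).all(axis=1)
            reps.append(X[mask].mean(axis=0))            # cell representative
            labels.append(int(round(y[mask].mean())))    # majority label, ties to 0
        return np.array(reps), np.array(labels)

    # Xb, yb = bin_for_classification(X, y)
    # clf = SVC(kernel="rbf").fit(Xb, yb)   # trained on bins, not raw points

Training on bin representatives shrinks both the kernel matrix at training time and, as the abstract notes for the random-sample comparison, the number of support vectors that must be evaluated at prediction time.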
