Lasso-type recovery of sparse representations for high-dimensional data

Report Number: 720
Authors: Nicolai Meinshausen and Bin Yu
Abstract

The Lasso (Tibshirani, 1996) is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables p is potentially much larger than the number of samples n. However, it was recently discovered (Zhao and Yu, 2006; Zou, 2005; Meinshausen and Bühlmann, 2006) that the sparsity pattern of the Lasso estimator can be asymptotically identical to the true sparsity pattern only if the design matrix satisfies the so-called irrepresentable condition. This condition can easily be violated in applications due to the presence of highly correlated variables. Here we examine the behavior of the Lasso estimator when the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the l_2-norm sense for fixed designs, under conditions on (a) the number s(n) of non-zero components of the vector beta(n) and (b) the minimal singular values of the design matrices induced by selecting on the order of s(n) variables. The results are extended to vectors beta in weak l_q-balls with 0 < q < 1. Our results imply that, with high probability, all important variables are selected; the set of selected variables thus constitutes a meaningful reduction of the original set of p(n) > n variables. Finally, our results are illustrated with the detection of closely adjacent frequencies, a problem encountered in astrophysics.
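As a rough numerical illustration (not part of the report): the sketch below builds a fixed design in which one irrelevant predictor is nearly a linear combination of the active ones, so the irrepresentable condition of Zhao and Yu (2006), ||C21 C11^{-1} sign(beta_1)||_inf < 1 with C = X'X/n partitioned by the true active set, is typically violated. The Lasso may then pick up false positives, yet its l_2 estimation error stays small and the important variables tend to be selected, in the spirit of the results above. All parameter choices (s = 5, alpha = 0.1, the noise level) are illustrative assumptions, and scikit-learn's Lasso merely stands in for the estimator studied in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5  # illustrative sizes, p >> n

# Fixed design: the irrelevant variable at index s is nearly a linear
# combination of the s active ones, which tends to violate the
# irrepresentable condition.
X = rng.standard_normal((n, p))
X[:, s] = X[:, :s].sum(axis=1) / np.sqrt(s) + 0.1 * rng.standard_normal(n)

beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# Irrepresentable statistic for the true active set {0, ..., s-1}:
# values above 1 mean correct sign/sparsity recovery is impossible.
C = X.T @ X / n
C11, C21 = C[:s, :s], C[s:, :s]
irr = np.max(np.abs(C21 @ np.linalg.solve(C11, np.sign(beta[:s]))))
print(f"irrepresentable statistic: {irr:.2f} (>1 means condition violated)")

fit = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(fit.coef_)
print("false positives:", np.setdiff1d(selected, np.arange(s)))
print("l2 estimation error:", np.linalg.norm(fit.coef_ - beta))
print("all important variables selected:", set(range(s)) <= set(selected))
```

In this construction the statistic is roughly sqrt(s) > 1, so sparsity-pattern recovery fails by design, while the reported l_2 error remains small: a toy version of the gap between sign consistency and l_2 consistency that the paper analyzes.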
