Simultaneous support recovery in high dimensions: Benefits and perils of block $\ell_1/\ell_\infty$-regularization

Report Number
774
Authors
S. Negahban and M. J. Wainwright
Abstract

Given a collection of $r \geq 2$ linear regression problems in $p$ dimensions, suppose that the regression coefficients share partially common supports. This set-up suggests the use of $\ell_{1}/\ell_{\infty}$-regularized regression for joint estimation of the $p \times r$ matrix of regression coefficients. We analyze the high-dimensional scaling of $\ell_1/\ell_\infty$-regularized quadratic programming, considering both consistency rates in $\ell_\infty$-norm, and also how the minimal sample size $n$ required for performing variable selection grows as a function of the model dimension, sparsity, and overlap between the supports. We begin by establishing bounds on the $\ell_\infty$-error as well as sufficient conditions for exact variable selection, both for fixed design matrices and for random designs drawn from general Gaussian ensembles. Our second set of results applies to $r = 2$ linear regression problems with standard Gaussian designs whose supports overlap in a fraction $\alpha \in [0,1]$ of their entries: for this problem class, we prove that the $\ell_{1}/\ell_{\infty}$-regularized method undergoes a phase transition---that is, a sharp change from failure to success---characterized by the rescaled sample size $\theta_{1,\infty}(n, p, s, \alpha) = n/\{(4 - 3 \alpha) s \log(p-(2- \alpha) \, s)\}$. More precisely, given sequences of problems specified by $(n, p, s, \alpha)$, for any $\delta > 0$, the probability of successfully recovering both supports converges to $1$ if $\theta_{1, \infty}(n, p, s, \alpha) > 1+\delta$, and converges to $0$ for problem sequences for which $\theta_{1,\infty}(n,p,s, \alpha) < 1 - \delta$. An implication of this threshold is that the use of $\ell_1 / \ell_{\infty}$-regularization yields improved statistical efficiency if the overlap parameter is large enough ($\alpha > 2/3$), but has \emph{worse} statistical efficiency than a naive Lasso-based approach for moderate to small overlap ($\alpha < 2/3$). Empirical simulations illustrate the close agreement between these theoretical predictions and the actual behavior in practice. These results indicate that some caution needs to be exercised in the application of $\ell_1/\ell_\infty$ block regularization: if the data does not match its structure closely enough, it can impair statistical performance relative to computationally less expensive schemes.
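To see where the $\alpha = 2/3$ crossover comes from, one can compare the sample-size requirement implied by the threshold above with the standard per-problem Lasso scaling $n > 2 s \log(p - s)$; the latter figure is quoted here only as a point of comparison and is not part of this report's statements:
\begin{align*}
\text{block } \ell_1/\ell_\infty: \quad & n > (4 - 3\alpha)\, s \log\bigl(p - (2 - \alpha)\, s\bigr), \\
\text{separate Lassos (each problem)}: \quad & n > 2\, s \log(p - s).
\end{align*}
For $p \gg s$ the two logarithmic factors are comparable, so to leading order the block method requires fewer samples exactly when $4 - 3\alpha < 2$, that is $\alpha > 2/3$, and more samples when $\alpha < 2/3$.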
