Minimax-optimal rates for sparse additive models over kernel classes via convex programming
Sparse additive models are families of $d$-variate functions that have the additive decomposition \mbox{$f^* = \sum_{j \in S} f^*_j$,} where $S$ is an unknown subset of cardinality $s \ll d$. We consider the case where each component function $f^*_j$ lies in a reproducing kernel Hilbert space, and analyze a simple kernel-based convex program for estimating the unknown function $f^*$. Working within a high-dimensional framework that allows both the dimension $d$ and the sparsity $s$ to scale with the sample size $n$, we derive convergence rates in the $L^2(\mathbb{P})$ and $L^2(\mathbb{P}_n)$ norms. These rates consist of two terms: a \emph{subset selection} term of the order $\frac{s \log d}{n}$, corresponding to the difficulty of finding the unknown $s$-sized subset, and an \emph{estimation error} term of the order $s \, \nu_n^2$, where $\nu_n^2$ is the optimal rate for estimating a univariate function within the RKHS. We complement these achievability results by deriving minimax lower bounds on the $L^2(\mathbb{P})$ error, thereby showing that our method is optimal up to constant factors for sub-linear sparsity $s = o(d)$. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, finite-rank kernel classes, and Sobolev smoothness classes.
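For illustration, the following is a minimal sketch (not necessarily the exact estimator analyzed here) of what a kernel-based convex program for sparse additive models can look like: each component $f_j$ is represented through its per-coordinate kernel matrix via the representer theorem, and sparsity across coordinates is encouraged by group penalties on the empirical and Hilbert norms of the components. The Gaussian kernel, the specific double-penalty form, and the parameter names (`lam`, `rho`, `bandwidth`) are assumptions made for this sketch.

```python
import numpy as np
import cvxpy as cp

def fit_sparse_additive(X, y, lam, rho, bandwidth=0.5):
    """Sketch of a kernel-based convex program for sparse additive models.

    Each component is parameterized as f_j(.) = sum_i alpha_{ij} k(x_{ij}, .),
    and coordinate-wise sparsity is induced by group penalties on the
    empirical norm ||f_j||_n and the RKHS norm ||f_j||_H.
    (Illustrative sketch; penalty form and kernel choice are assumptions.)
    """
    n, d = X.shape

    # Per-coordinate Gaussian kernel matrices (any univariate RKHS kernel works).
    kernels = []
    for j in range(d):
        diff = X[:, j][:, None] - X[:, j][None, :]
        kernels.append(np.exp(-diff ** 2 / (2.0 * bandwidth ** 2)))

    alphas = [cp.Variable(n) for _ in range(d)]
    fitted = sum(kernels[j] @ alphas[j] for j in range(d))
    loss = cp.sum_squares(y - fitted) / (2.0 * n)

    # Group penalty on empirical norms: ||f_j||_n = ||K_j alpha_j||_2 / sqrt(n).
    emp_pen = sum(cp.norm(kernels[j] @ alphas[j], 2) for j in range(d)) / np.sqrt(n)

    # Group penalty on RKHS norms: ||f_j||_H = ||L_j^T alpha_j||_2 with K_j = L_j L_j^T.
    hil_pen = 0
    for j in range(d):
        L = np.linalg.cholesky(kernels[j] + 1e-8 * np.eye(n))
        hil_pen += cp.norm(L.T @ alphas[j], 2)

    problem = cp.Problem(cp.Minimize(loss + lam * emp_pen + rho * hil_pen))
    problem.solve()
    return [a.value for a in alphas], kernels

if __name__ == "__main__":
    # Toy data: d = 20 coordinates, only the first two are active (s = 2).
    rng = np.random.default_rng(0)
    n, d = 100, 20
    X = rng.uniform(-1, 1, size=(n, d))
    y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)
    alphas, kernels = fit_sparse_additive(X, y, lam=0.05, rho=0.05)
    comp_norms = [np.linalg.norm(kernels[j] @ alphas[j]) / np.sqrt(n) for j in range(d)]
    print("empirical norms of fitted components:", np.round(comp_norms, 3))
```

In a sketch of this kind the group penalties drive most fitted components to zero, so the printed empirical norms are (approximately) supported on the active coordinates; the combination of an empirical-norm and a Hilbert-norm penalty is what yields the two-term rate $\frac{s \log d}{n} + s\,\nu_n^2$ described above.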