Over the past decades, technological advances have made it feasible to collect and store very large quantities of data. As a result, many fields now face the problem of analyzing high-dimensional data sets. Often the number of variables involved in the analysis is large, while the number of observations cannot be increased much because of constraints such as time and cost. Regularization methods are a powerful and versatile way of adding structural information to the estimation procedure.
In this paper, we introduce the Composite Absolute Penalties (CAP) family of penalty functions. They are convex, highly customizable, and enable users to incorporate side knowledge about grouping and hierarchical structures into the regularization procedure. Natural groupings and hierarchies among the variables arise in many situations: in ANOVA, for example, categorical variables give rise to groups and interaction terms give rise to hierarchies.
We provide a Bayesian interpretation of the penalties in the CAP family, algorithms for computing the regularization path and, for some particular cases, unbiased estimates of the degrees of freedom of the estimates along that path.
The CAP estimation procedure is then illustrated through simulation examples and an application to the problem of cloud detection over the Arctic cap.