TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods and can handle large problems and can also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include l(1) (the lasso), l(2) (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
TL;DR: Using a coordinate descent procedure for the lasso, a simple algorithm is developed that solves a 1000-node problem in at most a minute and is 30-4000 times faster than competing methods.
Abstract: We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm--the graphical lasso--that is remarkably fast: It solves a 1000-node problem ( approximately 500,000 parameters) in at most a minute and is 30-4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Buhlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
TL;DR: Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.
Abstract: Discover New Methods for Dealing with High-Dimensional Data A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of 1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.
TL;DR: In this article, the convergence properties of a block coordinate descent method applied to minimize a non-convex function f(x1,.., x 2, N 3 ) with certain separability and regularity properties were studied.
Abstract: We study the convergence properties of a (block) coordinate descent method applied to minimize a nondifferentiable (nonconvex) function f(x
1, . . . , x
N
) with certain separability and regularity properties. Assuming that f is continuous on a compact level set, the subsequence convergence of the iterates to a stationary point is shown when either f is pseudoconvex in every pair of coordinate blocks from among N-1 coordinate blocks or f has at most one minimum in each of N-2 coordinate blocks. If f is quasiconvex and hemivariate in every coordinate block, then the assumptions of continuity of f and compactness of the level set may be relaxed further. These results are applied to derive new (and old) convergence results for the proximal minimization algorithm, an algorithm of Arimoto and Blahut, and an algorithm of Han. They are applied also to a problem of blind source separation.
TL;DR: This work introduces a pathwise algorithm for the Cox proportional hazards model, regularized by convex combinations of ℓ1 andℓ2 penalties (elastic net), and employs warm starts to find a solution along a regularization path.
Abstract: We introduce a pathwise algorithm for the Cox proportional hazards model, regularized by convex combinations of l1 and l2 penalties (elastic net). Our algorithm fits via cyclical coordinate descent, and employs warm starts to find a solution along a regularization path. We demonstrate the efficacy of our algorithm on real and simulated data sets, and find considerable speedup between our algorithm and competing methods.