Open Access • Posted Content
Generalized Conditional Gradient for Sparse Estimation
TL;DR: This paper investigates the generalized conditional gradient (GCG) algorithm for solving structured sparse optimization problems and shows that, with some enhancements, it can provide a more efficient alternative to current state-of-the-art approaches.
Abstract: Structured sparsity is an important modeling tool that expands the applicability of convex formulations for data analysis; however, it also creates significant challenges for efficient algorithm design. In this paper we investigate the generalized conditional gradient (GCG) algorithm for solving structured sparse optimization problems, demonstrating that, with some enhancements, it can provide a more efficient alternative to current state-of-the-art approaches. After providing a comprehensive overview of the convergence properties of GCG, we develop efficient methods for evaluating polar operators, a subroutine that is required in each GCG iteration. In particular, we show how the polar operator can be efficiently evaluated in two important scenarios: dictionary learning and structured sparse estimation. A further improvement is achieved by interleaving GCG with fixed-rank local subspace optimization. A series of experiments on matrix completion, multi-class classification, multi-view dictionary learning, and overlapping group lasso shows that the proposed method can significantly reduce the training cost of current alternatives.
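To make the role of the polar operator concrete, the following is a minimal sketch of the classical conditional gradient (Frank-Wolfe) iteration on a toy problem, not the paper's GCG method. It assumes a least-squares loss constrained to an l1 ball, for which the polar-operator step reduces to picking the coordinate with the largest absolute gradient. The function names `l1_polar_atom` and `conditional_gradient_lasso`, the `radius` parameter, and the exact line search are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def l1_polar_atom(g):
    """Return the signed unit vector s in {+e_i, -e_i} that maximizes <s, -g>.

    For the l1 ball this is the linear minimization (polar) step:
    only the coordinate with the largest |g_i| matters.
    """
    i = int(np.argmax(np.abs(g)))
    s = np.zeros_like(g, dtype=float)
    s[i] = -np.sign(g[i])
    return s


def conditional_gradient_lasso(A, b, radius, n_iters=200):
    """Approximately minimize 0.5 * ||A x - b||^2 subject to ||x||_1 <= radius.

    Assumption: classical constrained Frank-Wolfe with exact line search,
    used here only to illustrate where the polar step enters each iteration.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)            # gradient of the smooth loss
        s = radius * l1_polar_atom(grad)    # best atom, scaled to the ball
        d = s - x                           # move toward the chosen atom
        Ad = A @ d
        denom = float(Ad @ Ad)
        if denom == 0.0:
            gamma = 0.0                     # direction does not change the loss
        else:
            # exact minimizer of the quadratic along d, clipped to [0, 1]
            gamma = float(np.clip(-(grad @ d) / denom, 0.0, 1.0))
        x = x + gamma * d                   # convex combination stays feasible
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 10))
    x_true = np.zeros(10)
    x_true[2], x_true[7] = 1.5, -0.5
    b = A @ x_true
    x_hat = conditional_gradient_lasso(A, b, radius=2.0)
    print("l1 norm of estimate:", np.abs(x_hat).sum())
```

Each iteration touches a single atom, so the iterate after k steps has at most k nonzero entries. For richer structured norms the per-iteration cost is dominated by the polar step, which is the subroutine the abstract singles out.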
Citations
Stochastic Frank-Wolfe methods for nonconvex optimization
Sashank J. Reddi,Suvrit Sra,Barnabás Póczos,Alexander J. Smola +3 more
- 01 Sep 2016
TL;DR: For objective functions that decompose into a finite sum, ideas from variance reduction for convex optimization are leveraged to obtain new variance-reduced nonconvex Frank-Wolfe methods that have provably faster convergence than the classical Frank-Wolfe method.
225
Complexity Bounds for Primal-dual Methods Minimizing the Model of Objective Function
TL;DR: This work provides the Frank–Wolfe method with a convergence analysis that makes it possible to approach a primal-dual solution of a convex optimization problem with a composite objective function, and it justifies a new variant of this method, which can be seen as a trust-region scheme applied to the linear model of the objective function.
Communication-Efficient Distributed Optimization of Self-concordant Empirical Loss
Yuchen Zhang,Lin Xiao +1 more
TL;DR: A communication-efficient distributed algorithm to minimize the overall empirical loss, which is the average of the local empirical losses of the distributed computing system, based on an inexact damped Newton method.
•Posted Content
Structured Nonconvex and Nonsmooth Optimization: Algorithms and Iteration Complexity Analysis
TL;DR: In this article, the authors consider constrained nonconvex optimization models in block decision variables, with or without coupled affine constraints, and show a sublinear rate of convergence to an $\epsilon$-stationary solution in the form of variational inequality for a generalized conditional gradient method.
62
•Proceedings Article
On Frank-Wolfe and Equilibrium Computation
Jacob Abernethy,Jun-Kun Wang +1 more
- 01 Jan 2017
TL;DR: This paper considers the Frank-Wolfe method for constrained convex optimization, and shows that this classical technique can be interpreted from a different perspective: FW emerges as the computation of an equilibrium (saddle point) of a special convex-concave zero sum game.
References
Regression Shrinkage and Selection via the Lasso
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Support-Vector Networks
Corinna Cortes,Vladimir Vapnik +1 more
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
Amir Beck,Marc Teboulle +1 more
TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically.
14.3K
Learning the parts of objects by non-negative matrix factorization
TL;DR: An algorithm for non-negative matrix factorization is demonstrated that is able to learn parts of faces and semantic features of text and is in contrast to other methods that learn holistic, not parts-based, representations.
14.2K
•Book
Optimization and nonsmooth analysis
Frank H. Clarke
- 01 Jan 1983
TL;DR: This book develops a theory of generalized gradients for nonsmooth functions and applies it to problems in the calculus of variations and optimal control.