Non-convex Optimization for Machine Learning
Prateek Jain,Purushottam Kar +1 more
TL;DR: This monograph covers direct approaches to non-convex optimization in machine learning, such as projected gradient descent and alternating minimization, and presents recent advances in analyzing their convergence.
Abstract: A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks.
The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround has been to relax non-convex problems to convex ones and use traditional methods to solve the relaxed (convex) optimization problems. However, this approach may be lossy, and it still presents significant challenges for large-scale optimization.
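To make the relaxation workaround concrete, here is a standard textbook illustration (not taken from the abstract itself): sparse least squares and its common convex relaxation. The symbols A (design matrix), y (observations), s (sparsity level), and \tau (radius of the \ell_1 ball) are assumptions introduced only for this sketch.

```latex
% Non-convex problem: least squares under a cardinality (sparsity) constraint
\min_{x \in \mathbb{R}^n} \; \|Ax - y\|_2^2
\quad \text{s.t.} \quad \|x\|_0 \le s

% Convex relaxation: the \ell_0 constraint is replaced by an \ell_1 constraint
\min_{x \in \mathbb{R}^n} \; \|Ax - y\|_2^2
\quad \text{s.t.} \quad \|x\|_1 \le \tau
```

A solution of the relaxed problem need not be s-sparse, which is one sense in which a relaxation can be lossy.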
On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques. Popular heuristics include projected gradient descent and alternating minimization. However, these are often poorly understood in terms of their convergence and other properties.
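The two heuristics named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the monograph's own code: projected gradient descent is shown on sparse least squares (often called iterative hard thresholding), and alternating minimization on a fully observed low-rank factorization. The function names, the step size 1/||A||_2^2, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def hard_threshold(x, s):
    """Project x onto the set of s-sparse vectors (assumes s >= 1)."""
    z = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]  # indices of the s largest magnitudes
    z[keep] = x[keep]
    return z

def iht(A, y, s, step=None, iters=200):
    """Projected gradient descent for min 0.5*||Ax - y||^2 s.t. ||x||_0 <= s."""
    if step is None:
        # Assumed step size: 1/L, where L = ||A||_2^2 is the gradient's
        # Lipschitz constant. This keeps the objective from increasing.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)                # gradient of the least-squares loss
        x = hard_threshold(x - step * grad, s)  # gradient step, then projection
    return x

def alt_min(M, r, iters=10, seed=0):
    """Alternating least squares for min ||M - U V^T||_F^2 with rank-r factors."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((M.shape[1], r))    # random initialization
    for _ in range(iters):
        U = np.linalg.lstsq(V, M.T, rcond=None)[0].T  # fix V, solve for U
        V = np.linalg.lstsq(U, M, rcond=None)[0].T    # fix U, solve for V
    return U, V

# Toy usage on synthetic data (all values are illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 200)) / np.sqrt(100)
x_true = np.zeros(200)
x_true[[3, 50, 120]] = [1.0, -2.0, 1.5]
x_hat = iht(A, A @ x_true, s=3)
print("nonzeros in estimate:", np.count_nonzero(x_hat))

M = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
U, V = alt_min(M, r=2)
print("factorization error:", np.linalg.norm(M - U @ V.T))
```

Each step is simple: the constraint set is non-convex, but the projection (keeping the largest entries) is cheap, and each half-step of alternating minimization is an ordinary least-squares solve. Whether such iterations reach a good solution depends on the problem instance, which is the kind of question the convergence analyses address.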
This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. The monograph will lead the reader through several widely used non-convex optimization techniques, as well as applications thereof. The goal of this monograph is both to introduce the rich literature in this area and to equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems.
Citations
A Survey of Optimization Methods From a Machine Learning Perspective
TL;DR: This article gives a systematic retrospective and summary of optimization methods from the perspective of machine learning, which can offer guidance for developments in both optimization and machine learning research.
573
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Yuejie Chi,Yue Lu,Yuxin Chen +2 more
TL;DR: This tutorial-style overview highlights the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees and reviews two contrasting approaches: two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and global landscape analysis and initialization-free algorithms.
542
Non-convex Optimization for Machine Learning
Prateek Jain,Purushottam Kar +1 more
TL;DR: This monograph covers direct approaches to non-convex optimization in machine learning, such as projected gradient descent and alternating minimization, and presents recent advances in analyzing their convergence.
516
A Sufficient Condition for Convergences of Adam and RMSProp
Fangyu Zou,Li Shen,Zequn Jie,Weizhong Zhang,Wei Liu +4 more
- 15 Jun 2019
TL;DR: In this paper, an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, was proposed to guarantee the global convergence of generic Adam/RMSProp for solving large-scale non-convex stochastic optimization.
- Posted Content
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
TL;DR: This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.
References
Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography
TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form that provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing.
Least squares quantization in PCM
TL;DR: In this article, the authors derived necessary conditions for any finite number of quanta and associated quantization intervals of an optimum finite quantization scheme to achieve minimum average quantization noise power.
Matrix Factorization Techniques for Recommender Systems
TL;DR: As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
Robust Face Recognition via Sparse Representation
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by ℓ1-minimization.