Non-convex Optimization for Machine Learning

doi:10.1561/2200000058

Open Access • Journal Article • 10.1561/2200000058

Prateek Jain, +1 more

- 21 Dec 2017

- arXiv: Machine Learning

512

TL;DR: This monograph surveys direct approaches to non-convex optimization in machine learning, such as projected gradient descent and alternating minimization, and presents recent advances in understanding their convergence and other properties.

Abstract: A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed, or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround has been to relax non-convex problems to convex ones and use traditional methods to solve the (convex) relaxed optimization problems. However, this approach may be lossy and nevertheless presents significant challenges for large-scale optimization. On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques; popular heuristics include projected gradient descent and alternating minimization. However, these are often poorly understood in terms of their convergence and other properties. This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. The monograph will lead the reader through several widely used non-convex optimization techniques, as well as applications thereof. The goal of this monograph is both to introduce the rich literature in this area and to equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems.
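
As a rough illustration of one heuristic the abstract names, the sketch below runs projected gradient descent on a sparsity-constrained least-squares problem: minimize 0.5 * ||Ax - y||^2 subject to x having at most s non-zero entries. This is not taken from the monograph. The function names, the step-size rule (1 / ||A||_F^2, a conservative choice), and the toy data are all assumptions made for illustration.

```python
def residual(A, x, y):
    """Return A x - y for a matrix A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]


def objective(A, x, y):
    """Least-squares objective 0.5 * ||A x - y||^2."""
    return 0.5 * sum(r * r for r in residual(A, x, y))


def hard_threshold(x, s):
    """Project x onto the (non-convex) set of s-sparse vectors by
    keeping the s largest-magnitude entries and zeroing the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def projected_gradient_descent(A, y, s, iters=200):
    """Alternate a gradient step on the objective with a projection
    onto the sparsity constraint (hypothetical helper, illustration only)."""
    m, n = len(A), len(A[0])
    # Assumed step size: ||A||_F^2 upper-bounds the largest eigenvalue
    # of A^T A, so 1 / ||A||_F^2 is a safe (if slow) choice.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = residual(A, x, y)
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = hard_threshold([xj - step * gj for xj, gj in zip(x, grad)], s)
    return x


# Toy data (assumed): identity design, a 2-sparse signal plus one small entry.
A_demo = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
y_demo = [0.1, 3.0, 0.0, -2.0]
x_demo = projected_gradient_descent(A_demo, y_demo, s=2)
```

On this toy problem the projection discards the small first entry at every iteration, so the iterate settles on the two dominant coordinates. The constraint set is non-convex, yet the projection itself is cheap to compute, which is part of why such direct methods are attractive in practice.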

Citations

•Journal Article•10.1109/TCYB.2019.2950779

A Survey of Optimization Methods From a Machine Learning Perspective

Shiliang Sun, +3 more

- 01 Aug 2020

- IEEE Transactions on Systems, Man, and C...

TL;DR: A systematic retrospect and summary of the optimization methods from the perspective of machine learning can be found in this article, which can offer guidance for both developments of optimization and machine learning research.

573

•Journal Article•10.1109/TSP.2019.2937282

Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

Yuejie Chi, +2 more

- 15 Oct 2019

- IEEE Transactions on Signal Processing

TL;DR: This tutorial-style overview highlights the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees and reviews two contrasting approaches: two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and global landscape analysis and initialization-free algorithms.

542

•Proceedings Article•10.1109/CVPR.2019.01138

A Sufficient Condition for Convergences of Adam and RMSProp

Fangyu Zou, +4 more

- 15 Jun 2019

TL;DR: In this paper, an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, was proposed to guarantee the global convergence of generic Adam/RMSProp for solving large-scale non-convex stochastic optimization.

390

•Posted Content

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

Cong Fang, +3 more

- 04 Jul 2018

- arXiv: Optimization and Control

TL;DR: This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.

363

References

Journal Article•10.1111/J.2517-6161.1977.TB01600.X

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster, +2 more

- 01 Sep 1977

- Journal of the royal statistical society...

55.2K

Journal Article•10.1145/358669.358692

Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography

Martin A. Fischler, +1 more

- 01 Jun 1981

- Communications of The ACM

TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form that provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing.

27.9K

•Journal Article•10.1109/TIT.1982.1056489

Least squares quantization in PCM

S. P. Lloyd

- 01 Mar 1982

- IEEE Transactions on Information Theory

TL;DR: In this article, the authors derived necessary conditions for any finite number of quanta and associated quantization intervals of an optimum finite quantization scheme to achieve minimum average quantization noise power.

16K

Journal Article•10.1109/MC.2009.263

Matrix Factorization Techniques for Recommender Systems

Yehuda Koren, +2 more

- 01 Aug 2009

- IEEE Computer

TL;DR: As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.

12.5K

•Journal Article•10.1109/TPAMI.2008.79

Robust Face Recognition via Sparse Representation

John Wright, +4 more

- 01 Feb 2009

- IEEE Transactions on Pattern Analysis an...

TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by ℓ1-minimization.

10.5K

Related Papers (5)

Practical methods for convex multi-view reconstruction

Introductory Lectures on Convex Optimization: A Basic Course

Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization

Practical Applications in Constrained Evolutionary Multi-objective Optimization

Derivative-Free Optimization