From average case complexity to improper learning complexity
Amit Daniely, Nati Linial, Shai Shalev-Shwartz
- 31 May 2014
- pp 441-448
TL;DR: The authors introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average, and combine it with a generalization of Feige's assumption about the complexity of refuting random constraint satisfaction problems.
Abstract: The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of improper learning (a.k.a. representation independent learning). The difficulty in proving lower bounds for improper learning is that the standard reductions from NP-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in [21] and relies on cryptographic assumptions. We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption [13] about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far-reaching implications. In particular:
• Learning DNFs is hard.
• Agnostically learning halfspaces with a constant approximation ratio is hard.
• Learning an intersection of ω(1) halfspaces is hard.
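As a rough illustration of what "refuting a random constraint satisfaction problem" means, the following minimal sketch samples a random 3-SAT formula and certifies by exhaustive search that no assignment satisfies every clause. It is not taken from the paper: the choice of 3-SAT, the function names (`random_3sat`, `satisfied_fraction`, `max_satisfied_fraction`), and the parameters `n_vars` and `n_clauses` are all assumptions made for illustration.

```python
import itertools
import random


def random_3sat(n_vars, n_clauses, rng):
    """Sample clauses uniformly: 3 distinct variables, each negated with prob. 1/2."""
    formula = []
    for _ in range(n_clauses):
        variables = rng.sample(range(n_vars), 3)
        # Each literal is a pair (variable index, is_negated).
        clause = [(v, rng.random() < 0.5) for v in variables]
        formula.append(clause)
    return formula


def satisfied_fraction(formula, assignment):
    """Fraction of clauses that contain at least one true literal."""
    hits = sum(
        any(assignment[v] != negated for v, negated in clause)
        for clause in formula
    )
    return hits / len(formula)


def max_satisfied_fraction(formula, n_vars):
    """Brute-force check over all 2^n assignments (illustrative, not efficient)."""
    return max(
        satisfied_fraction(formula, bits)
        for bits in itertools.product([False, True], repeat=n_vars)
    )


# Hypothetical toy parameters: many more clauses than variables.
rng = random.Random(0)
n_vars, n_clauses = 10, 400
formula = random_3sat(n_vars, n_clauses, rng)
best = max_satisfied_fraction(formula, n_vars)
print(f"best assignment satisfies {best:.3f} of the clauses")
```

A value of `best` below 1 is a certificate that this particular formula is unsatisfiable. The exhaustive search here takes time exponential in `n_vars`; the hardness assumption mentioned in the abstract concerns whether such refutations can be found efficiently.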
Citations
•Proceedings Article
In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
Behnam Neyshabur, Ryota Tomioka, Nathan Srebro
- 20 Dec 2014
TL;DR: The authors show that without any regularization, even with zero training error (and zero approximation error), increasing the number of hidden units reduces estimation error.
563
Statistical Query Lower Bounds for Robust Estimation of High-Dimensional Gaussians and Gaussian Mixtures
Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart
- 01 Oct 2017
TL;DR: In particular, this paper showed that the complexity of learning a Gaussian mixture model is exponential in the dimension of the latent space, and showed that statistical query algorithms can be implemented in polynomial time.
282
•Proceedings Article
Globally optimal gradient descent for a ConvNet with Gaussian inputs
Alon Brutzkus, Amir Globerson
- 06 Aug 2017
TL;DR: This work provides the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations, and shows that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time.
•Posted Content
Geometry of Optimization and Implicit Regularization in Deep Learning.
TL;DR: This work argues that optimization plays a crucial role in the generalization of deep learning models through implicit regularization, and demonstrates how changing the empirical optimization procedure can improve generalization, even if actual optimization quality is not affected.
150
•Proceedings Article
Noisy Tensor Completion via the Sum-of-Squares Hierarchy
Boaz Barak, Ankur Moitra
- 06 Jun 2016
TL;DR: The main technical result is in characterizing the Rademacher complexity of the sequence of norms that arise in the sum-of-squares relaxations to the tensor nuclear norm.
References
Statistical learning theory
Vladimir Vapnik
- 01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
30.4K
•Book
Neural networks for pattern recognition
Christopher M. Bishop
- 01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Neural Networks for Pattern Recognition
Suresh Kothari, Heekuck Oh
TL;DR: The chapter discusses two important directions of research to improve learning algorithms: the dynamic node generation, which is used by the cascade correlation algorithm; and designing learning algorithms where the choice of parameters is not an issue.
14.5K
The perceptron: a probabilistic model for information storage and organization in the brain.
TL;DR: This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory.
10.6K
•Book
The perceptron: a probabilistic model for information storage and organization in the brain
F. Rosenblatt
- 01 Jan 1988
TL;DR: The second and third questions are still subject to a vast amount of speculation, and the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory.
9.3K