Open Access
Relating Data Compression and Learnability
Nick Littlestone, Manfred K. Warmuth
- 01 Jan 2003
322
TL;DR: It is demonstrated that the existence of a suitable data compression scheme is sufficient to ensure learnability and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.
Abstract: We explore the learnability of two-valued functions from samples using the paradigm of Data Compression. A first algorithm (compression) chooses a small subset of the sample, called the kernel. A second algorithm predicts future values of the function from the kernel, i.e. the algorithm acts as a hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sufficient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated. This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. The proofs are simpler and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.
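The two-algorithm setup in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the concept class (closed intervals on the real line), the function names `compress`, `reconstruct`, and `failure_bound`, and the sample data are all assumptions chosen for illustration. The bound computed at the end is the form in which the result is commonly quoted for kernels of size at most k, and should be checked against the paper before being relied on.

```python
from math import comb
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (point, label), label is 0 or 1


def compress(sample: List[Example]) -> List[Example]:
    """Compression algorithm: keep the extreme positive examples as the kernel.

    Assumes the labels come from some closed interval [a, b] (hypothetical
    concept class, not from the paper). The kernel has at most 2 points.
    """
    positives = sorted(x for x, y in sample if y == 1)
    if not positives:
        return []
    lo, hi = positives[0], positives[-1]
    return [(lo, 1)] if lo == hi else [(lo, 1), (hi, 1)]


def reconstruct(kernel: List[Example]) -> Callable[[float], int]:
    """Reconstruction algorithm: turn the kernel into a hypothesis."""
    if not kernel:
        return lambda x: 0  # no positive example seen: predict 0 everywhere
    lo = min(x for x, _ in kernel)
    hi = max(x for x, _ in kernel)
    return lambda x: 1 if lo <= x <= hi else 0


def failure_bound(m: int, k: int, eps: float) -> float:
    """Assumed form of the bound for kernels of size at most k:
    sum over i <= k of C(m, i) * (1 - eps)^(m - i), an upper bound on the
    probability that the hypothesis has error greater than eps.
    """
    return sum(comb(m, i) * (1 - eps) ** (m - i) for i in range(k + 1))


# Illustrative sample labeled by the interval [2, 5] (made-up data).
sample = [(0.5, 0), (2.1, 1), (3.0, 1), (4.7, 1), (6.2, 0), (1.9, 0)]
kernel = compress(sample)
hypothesis = reconstruct(kernel)

# The hypothesis must reproduce every label of the original sample.
consistent = all(hypothesis(x) == y for x, y in sample)
print("kernel:", kernel)
print("consistent on sample:", consistent)
print("bound for m=200, k=2, eps=0.1:", failure_bound(200, 2, 0.1))
```

The bound depends only on the sample size m and the kernel size k, not on the distribution that generates the points, which matches the abstract's claim that no distributional assumptions are needed.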
Citations
•Book
Understanding Machine Learning: From Theory To Algorithms
Shai Shalev-Shwartz, Shai Ben-David
- 01 Jan 2015
TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.
Real and Complex Analysis. By W. Rudin. Pp. 412. 84s. 1966. (McGraw-Hill, New York.)
TL;DR: In this paper, the Riesz representation theorem is used to describe the regularity properties of Borel measures and their relation to the Radon-Nikodym theorem of continuous functions.
3.5K
•Journal Article
Fast Kernel Classifiers with Online and Active Learning
TL;DR: This contribution presents an online SVM algorithm based on the premise that active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
•Book
Boosting: Foundations and Algorithms
Robert E. Schapire, Yoav Freund
- 18 May 2012
TL;DR: This book begins with a general introduction to machine learning algorithms and their analysis; then explores the core theory of boosting, especially its ability to generalize; examines some of the myriad other theoretical viewpoints that help to explain and understand boosting; provides practical extensions of boosting for more complex learning problems; and finally presents a number of advanced theoretical topics.
Structural risk minimization over data-dependent hierarchies
TL;DR: A result is presented that allows one to trade off errors on the training sample against improved generalization performance, and a more general result in terms of "luckiness" functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets.
638
References
•Book
Real and complex analysis
Walter Rudin
- 01 Jan 1966
TL;DR: In this paper, the Riesz representation theorem is used to describe the regularity properties of Borel measures and their relation to the Radon-Nikodym theorem of continuous functions.
A theory of the learnable
Leslie G. Valiant
- 05 Nov 1984
TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.
On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities
TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
4.3K
Approximation algorithms for combinatorial problems
TL;DR: For the problem of finding the maximum clique in a graph, no algorithm has been found for which the ratio does not grow at least as fast as n^ε, where n is the problem size and ε > 0 depends on the algorithm.
2.5K