Active Property Testing
Maria-Florina Balcan, Eric Blais, Avrim Blum, Liu Yang
- 20 Oct 2012
- Vol. 2012, pp 21-30
TL;DR: The authors show that testing unions of d intervals can be done with O(1) label requests in the active testing setting, whereas learning is known to require Omega(d) labeled examples (and passive testing, where the algorithm must pay for every example drawn from D, requires Omega(sqrt{d}) [KR00]).
Abstract: One motivation for property testing of boolean functions is the idea that testing can provide a fast preprocessing step before learning. However, in most machine learning applications, it is not possible to request labels of arbitrary examples constructed by an algorithm. Instead, the dominant query paradigm in applied machine learning, called *active learning*, is one where the algorithm may query for labels, but only on points in a given (polynomial-sized) unlabeled sample, drawn from some underlying distribution D. In this work, we bring this well-studied model to the domain of testing. We develop both general results for this *active testing* model as well as efficient testing algorithms for several important properties for learning, demonstrating that testing can still yield substantial benefits in this restricted setting. For example, we show that testing unions of d intervals can be done with O(1) label requests in our setting, whereas it is known to require Omega(d) labeled examples for learning (and Omega(sqrt{d}) for passive testing [KR00] where the algorithm must pay for every example drawn from D). In fact, our results for testing unions of intervals also yield improvements on prior work in both the classic query model (where any point in the domain can be queried) and the passive testing model. For the problem of testing linear separators in R^n over the Gaussian distribution, we show that both active and passive testing can be done with O(sqrt{n}) queries, substantially fewer than the Omega(n) needed for learning, with near-matching lower bounds. We also present a general combination result in this model for building testable properties out of others, which we then use to provide testers for a number of assumptions used in semi-supervised learning.
In addition to the above results, we also develop a general notion of the *testing dimension* of a given property with respect to a given distribution, that we show characterizes (up to constant factors) the intrinsic number of label requests needed to test that property. We develop such notions for both the active and passive testing models. We then use these dimensions to prove a number of lower bounds, including for linear separators and the class of dictator functions.
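The query model described in the abstract can be illustrated with a minimal sketch. It is not the paper's tester and carries none of its guarantees. It only shows the interaction pattern: the algorithm sees an unlabeled pool drawn from D for free, and pays only for the labels it requests on pool points. The class `ActiveOracle`, the function `close_pair_disagreement`, and the parameters `delta` and `num_pairs` are hypothetical names chosen for illustration. The statistic loosely reflects the intuition that a union of few intervals on the line rarely changes label between nearby points.

```python
import random
from typing import Callable, List


class ActiveOracle:
    """Unlabeled pool drawn from D; labels are revealed only on request.

    Hypothetical helper: it counts label requests so the cost of a
    tester can be inspected afterwards.
    """

    def __init__(self, pool: List[float], f: Callable[[float], int]) -> None:
        self.pool = list(pool)
        self._f = f  # hidden target function, never read directly by a tester
        self.label_requests = 0

    def query(self, i: int) -> int:
        """Return the label of pool point i and charge one label request."""
        self.label_requests += 1
        return self._f(self.pool[i])


def close_pair_disagreement(
    oracle: ActiveOracle, delta: float, num_pairs: int, rng: random.Random
) -> float:
    """Estimate how often delta-close neighbouring pool points get different labels.

    Uses exactly 2 * num_pairs label requests, independent of the pool size
    (or none at all if the pool contains no delta-close neighbours).
    """
    # Sorting and pairing use only the unlabeled points, which are free.
    order = sorted(range(len(oracle.pool)), key=lambda i: oracle.pool[i])
    pairs = [
        (a, b)
        for a, b in zip(order, order[1:])
        if oracle.pool[b] - oracle.pool[a] <= delta
    ]
    if not pairs:
        return 0.0
    disagreements = 0
    for _ in range(num_pairs):
        a, b = rng.choice(pairs)  # sample a close pair uniformly, with replacement
        if oracle.query(a) != oracle.query(b):
            disagreements += 1
    return disagreements / num_pairs


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [rng.random() for _ in range(2000)]  # D assumed uniform on [0, 1]
    oracle = ActiveOracle(pool, lambda x: int(0.3 <= x <= 0.7))  # one interval
    estimate = close_pair_disagreement(oracle, delta=0.01, num_pairs=50, rng=rng)
    print(estimate, oracle.label_requests)
```

The point of the sketch is the accounting: the unlabeled pool may be large, but the number of label requests is fixed by `num_pairs` alone. Any accept/reject threshold applied to the estimate would be a further assumption not taken from the source.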
Citations
•Book
Introduction to Property Testing
Oded Goldreich
- 01 Nov 2017
TL;DR: In this article, a wide range of algorithmic techniques for the design and analysis of tests for algebraic properties, properties of Boolean functions, graph properties, and properties of distributions are presented.
440
On Sample-Based Testers
Oded Goldreich, Dana Ron
TL;DR: This work advances the study of sample-based property testers by providing several general positive results as well as by revealing relations between variants of this testing model, and shows that certain types of query-based testers yield sample-based testers of sublinear sample complexity.
49
Erasure-Resilient Property Testing
TL;DR: This work begins a study of property testers that are resilient to the presence of adversarially erased function values and identifies an $\alpha$-erasure-resilient $\var...$ that is resistant to being erased by an adversary.
Testing surface area
Pravesh K. Kothari, Amir Nayyeri, Ryan O'Donnell, Chenggang Wu
- 05 Jan 2014
TL;DR: The surface area of an unknown n-dimensional set F given membership oracle access is considered, and the algorithm completely evades the "curse of dimensionality": for any n and any κ > 4/π ≈ 1.27, the "approximation factor" of the testing algorithm.
30
References
Statistical learning theory
Vladimir Vapnik
- 01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
30.4K
Nonlinear dimensionality reduction by locally linear embedding.
Sam T. Roweis, Lawrence K. Saul
TL;DR: Locally linear embedding (LLE) is introduced, an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs that learns the global structure of nonlinear manifolds.
A global geometric framework for nonlinear dimensionality reduction.
TL;DR: An approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set and efficiently computes a globally optimal solution, and is guaranteed to converge asymptotically to the true structure.
On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities
TL;DR: This chapter reproduces the English translation by B. Seckler of the paper by Vapnik and Chervonenkis in which they gave proofs for the innovative results they had obtained in a draft form in July 1966 and announced in 1968 in their note in Soviet Mathematics Doklady.
4.3K
Semi-Supervised Learning
Olivier Chapelle, Bernhard Schölkopf, Alexander Zien
- 31 Mar 2010
TL;DR: Semi-supervised learning (SSL) as discussed by the authors is the middle ground between supervised learning (in which all training examples are labeled) and unsupervised training (where no label data are given).