Improving classification performance through selective instance completion

doi:10.1007/S10994-015-5500-5

Open Access•Journal Article•10.1007/S10994-015-5500-5

Amit Dhurandhar, +1 more

- 01 Sep 2015

- Machine Learning

- Vol. 100, Iss: 2, pp 425-447

15

TL;DR: A principled framework is proposed which motivates a generally applicable yet efficient meta-technique for choosing k instances to query from a much larger universe of N incomplete instances so as to learn the most accurate classifier.

Abstract: In multiple domains, actively acquiring missing input information at a reasonable cost in order to improve our understanding of the input-output relationships is of increasing importance. This problem has gained prominence in healthcare, public policy making, education, and in the targeted advertising industry which tries to best match people to products. In this paper we tackle an important variant of this problem: Instance completion, where we want to choose the best k incomplete instances to query from a much larger universe of N ($$\gg k$$) incomplete instances so as to learn the most accurate classifier. We propose a principled framework which motivates a generally applicable yet efficient meta-technique for choosing k such instances. Since we cannot know a priori the classifier that will result from the completed dataset, i.e. the final classifier, our method chooses the k instances based on a derived upper bound on the expectation of the distance between the next classifier and the final classifier. We additionally derive a sufficient condition for these two solutions to match. We then empirically evaluate the performance of our method relative to the state-of-the-art methods on four UCI datasets as well as three proprietary e-commerce datasets used in previous studies. In these experiments, we also demonstrate how close we are likely to be to the optimal solution, by quantifying the extent to which our sufficient condition is satisfied. Lastly, we show that our method is easily extensible to the setting where we have a non-uniform cost associated with acquiring the missing information.
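The selection problem stated in the abstract (pick k of N incomplete instances to query) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `choose_k_instances` and the per-instance `bound_estimate` callable are hypothetical placeholders, and the abstract does not give the form of the derived upper bound. The sketch also assumes each candidate can be scored independently, which may not hold for the actual method.

```python
import heapq
from typing import Callable, Hashable, Iterable, List


def choose_k_instances(
    incomplete_ids: Iterable[Hashable],
    bound_estimate: Callable[[Hashable], float],
    k: int,
) -> List[Hashable]:
    """Return the k instance ids with the smallest estimated bound value.

    `bound_estimate` is a hypothetical stand-in for a per-instance estimate
    of how far the next classifier would be from the final classifier if
    that instance were completed. Smaller is assumed to be better.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # nsmallest returns results in ascending order of the key.
    return heapq.nsmallest(k, incomplete_ids, key=bound_estimate)


# Toy usage: N = 6 incomplete instances, k = 2, with made-up scores.
toy_scores = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.1, "e": 0.7, "f": 0.4}
chosen = choose_k_instances(toy_scores, toy_scores.get, k=2)
```

With these made-up scores, `chosen` holds the two lowest-scoring ids. Any real use would replace the toy dictionary with an estimate computed from the current classifier and the incomplete data.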

Citations

Data Mining - Concepts and Techniques.

Petra Perner

- 01 Jan 2002

14.6K

•Posted Content

Active Feature Acquisition with Supervised Matrix Completion

Sheng-Jun Huang, +5 more

- 15 Feb 2018

- arXiv: Learning

TL;DR: This paper aims to train an effective classification model at the least acquisition cost by jointly performing active feature querying and supervised matrix completion; a bi-objective optimization method is presented for cost-aware active selection when features bear different acquisition costs.

38

Journal Article•10.1109/TKDE.2019.2897307

ASCENT: Active Supervision for Semi-Supervised Learning

Yanchao Li, +5 more

- 01 May 2020

- IEEE Transactions on Knowledge and Data ...

TL;DR: This paper proposes a novel active learning method that makes embeddings of labeled examples to those of unlabeled ones and back via deep neural networks, considers both the informativeness and representativeness of examples, and is robust to noisy labels.

25

Journal Article•10.1016/j.artint.2023.104059

Exploratory machine learning with unknown unknowns

Peng Zhao, +3 more

- 01 Feb 2024

- Artificial Intelligence

TL;DR: Exploratory machine learning discovers hidden classes in training data by actively augmenting the feature space.

5

Journal Article•10.1007/S10489-020-01953-4

Batch mode active learning via adaptive criteria weights

Hao Li, +5 more

- 01 Jun 2021

- Applied Intelligence

TL;DR: A novel active learning method, abbreviated ACW, which dynamically combines the example selection criteria together to select critical examples for semi-supervised classification and is the first attempt to explore adaptive criteria weights in the context of active learning.

3

References

Statistical learning theory

Vladimir Vapnik

- 01 Jan 1998

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

30.4K

•Book

Data Mining: Concepts and Techniques

Jiawei Han, +2 more

- 08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

29.9K

•Book

The Elements of Statistical Learning

Trevor Hastie, +2 more

- 01 Jan 2001

29.4K

Journal Article•10.1145/1656274.1656278

The WEKA data mining software: an update

Mark Hall, +5 more

- 16 Nov 2009

- Sigkdd Explorations

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

21.2K

Journal Article•10.1198/TECH.2003.S770

The Elements of Statistical Learning

Eric R. Ziegel

- 01 Aug 2003

- Technometrics

TL;DR: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.

15.5K