Improving classification performance through selective instance completion
TL;DR: A principled framework is proposed which motivates a generally applicable yet efficient meta-technique for choosing k instances to query from a much larger universe of N incomplete instances so as to learn the most accurate classifier.
Abstract: In multiple domains, actively acquiring missing input information at a reasonable cost in order to improve our understanding of the input-output relationships is of increasing importance. This problem has gained prominence in healthcare, public policy making, education, and in the targeted advertising industry, which tries to best match people to products. In this paper we tackle an important variant of this problem: instance completion, where we want to choose the best k incomplete instances to query from a much larger universe of N (N ≫ k) incomplete instances so as to learn the most accurate classifier. We propose a principled framework which motivates a generally applicable yet efficient meta-technique for choosing k such instances. Since we cannot know a priori the classifier that will result from the completed dataset, i.e., the final classifier, our method chooses the k instances based on a derived upper bound on the expectation of the distance between the next classifier and the final classifier. We additionally derive a sufficient condition for these two solutions to match. We then empirically evaluate the performance of our method relative to state-of-the-art methods on four UCI datasets as well as three proprietary e-commerce datasets used in previous studies. In these experiments, we also demonstrate how close we are likely to be to the optimal solution by quantifying the extent to which our sufficient condition is satisfied. Lastly, we show that our method is easily extensible to the setting where a non-uniform cost is associated with acquiring the missing information.
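The selection step described in the abstract can be pictured as scoring each of the N incomplete instances and querying the best k; when acquisition costs differ, a common budgeted heuristic ranks instances by score per unit cost. The sketch below is a minimal illustration under loud assumptions: the per-instance `scores` stand in for whatever selection criterion is used and are not the paper's derived upper bound (whose form is not given here), the helper names `select_top_k` and `select_within_budget` are hypothetical, and the greedy ratio rule is a generic knapsack-style heuristic swapped in for illustration, not necessarily the authors' cost extension.

```python
from typing import List, Sequence


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring incomplete instances.

    ASSUMPTION: `scores` is a placeholder criterion supplied by the caller;
    it is not the upper bound derived in the paper.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must satisfy 0 <= k <= N")
    # Stable sort in descending order: ties keep the lower index first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def select_within_budget(
    scores: Sequence[float], costs: Sequence[float], budget: float
) -> List[int]:
    """Greedy cost-aware selection by score per unit acquisition cost.

    ASSUMPTION: a generic knapsack-style heuristic used for illustration,
    not necessarily the paper's extension to non-uniform costs.
    """
    if len(scores) != len(costs):
        raise ValueError("scores and costs must have the same length")
    if any(c <= 0 for c in costs):
        raise ValueError("acquisition costs must be positive")
    # Rank instances by value per unit cost, best first.
    ranked = sorted(
        range(len(scores)), key=lambda i: scores[i] / costs[i], reverse=True
    )
    chosen: List[int] = []
    spent = 0.0
    for i in ranked:
        # Query an instance only if it still fits in the remaining budget.
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return chosen


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.7]  # hypothetical per-instance scores
    costs = [3.0, 1.0, 1.0, 2.0]   # hypothetical acquisition costs
    print(select_top_k(scores, 2))                   # [0, 3]
    print(select_within_budget(scores, costs, 3.0))  # [2, 3]
```

With every cost equal to 1 and a budget of k, the budgeted variant picks the same instances as plain top-k selection, which is why the uniform-cost setting can be seen as a special case of the cost-aware one in this sketch.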
Citations
Active Feature Acquisition with Supervised Matrix Completion
TL;DR: This paper trains an effective classification model at minimal acquisition cost by jointly performing active feature querying and supervised matrix completion; a bi-objective optimization method is presented for cost-aware active selection when features have different acquisition costs.
Citations: 38
ASCENT: Active Supervision for Semi-Supervised Learning
TL;DR: A novel active learning method is proposed that maps embeddings of labeled examples to those of unlabeled ones and back via deep neural networks; it considers both the informativeness and the representativeness of examples and is robust to noisy labels.
Citations: 25
Exploratory machine learning with unknown unknowns
Peng Zhao, Sanli Jin, Yujie Zhang, Zhi-Hua Zhou, et al.
TL;DR: Exploratory machine learning discovers hidden classes in training data by actively augmenting the feature space.
Citations: 5
Batch mode active learning via adaptive criteria weights
TL;DR: ACW, a novel active learning method, dynamically combines example selection criteria to select critical examples for semi-supervised classification; it is presented as the first attempt to explore adaptive criteria weights in the context of active learning.
Citations: 3