Open Access • Journal Article
Handling Missing Values when Applying Classification Models
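TL;DR: A comparison of predictive value imputation, the distribution-based imputation used by C4.5, and reduced models for handling missing values at prediction time, showing that reduced models consistently outperform the other two and introducing hybrid approaches that trade accuracy against computation and storage.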
Abstract: Much work has studied the effect of different treatments of missing values on model induction, but little work has analyzed treatments for the common case of missing values at prediction time. This paper first compares several different methods (predictive value imputation, the distribution-based imputation used by C4.5, and using reduced models) for applying classification trees to instances with missing values, and also shows evidence that the results generalize to bagged trees and to logistic regression. The results show that for the two most popular treatments, each is preferable under different conditions. Strikingly, the reduced-models approach, seldom mentioned or used, consistently outperforms the other two methods, sometimes by a large margin. The lack of attention to reduced modeling may be due in part to its (perceived) expense in terms of computation or storage. Therefore, we then introduce and evaluate alternative, hybrid approaches that allow users to balance between more accurate but computationally expensive reduced modeling and the other, less accurate but less computationally expensive treatments. The results show that the hybrid methods can scale gracefully to the amount of investment in computation/storage, and that they outperform imputation even for small investments.
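To make the difference between imputation and reduced modeling concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: `TableClassifier`, `impute_mode`, and `ReducedModels` are hypothetical names, the classifier is a toy lookup table rather than a classification tree, and mode imputation stands in as a simpler substitute for predictive value imputation. Distribution-based imputation is not shown.

```python
from collections import Counter, defaultdict

MISSING = None  # marker for an unknown feature value (assumption of this sketch)


class TableClassifier:
    """Toy model: majority class for each combination of the chosen features."""

    def __init__(self, feature_idx):
        self.feature_idx = tuple(feature_idx)
        self.table = defaultdict(Counter)
        self.prior = Counter()

    def fit(self, X, y):
        for row, label in zip(X, y):
            key = tuple(row[i] for i in self.feature_idx)
            self.table[key][label] += 1
            self.prior[label] += 1
        return self

    def predict(self, row):
        key = tuple(row[i] for i in self.feature_idx)
        # Unseen combinations fall back to the overall class counts.
        counts = self.table.get(key) or self.prior
        return counts.most_common(1)[0][0]


def impute_mode(X_train, row):
    """Value imputation: fill each missing entry with the training mode."""
    filled = list(row)
    for i, value in enumerate(row):
        if value is MISSING:
            filled[i] = Counter(r[i] for r in X_train).most_common(1)[0][0]
    return filled


class ReducedModels:
    """Reduced modeling: one model per pattern of observed features.

    Models are trained lazily and cached, so storage grows only with the
    missingness patterns actually seen at prediction time.
    """

    def __init__(self, X_train, y_train):
        self.X, self.y = X_train, y_train
        self.cache = {}

    def predict(self, row):
        observed = tuple(i for i, v in enumerate(row) if v is not MISSING)
        if observed not in self.cache:
            self.cache[observed] = TableClassifier(observed).fit(self.X, self.y)
        return self.cache[observed].predict(row)


# Made-up training data: two categorical features and a class label.
X_train = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
y_train = ["pos", "pos", "pos", "neg", "neg"]

full_model = TableClassifier((0, 1)).fit(X_train, y_train)
reduced = ReducedModels(X_train, y_train)

query = ("b", MISSING)  # the second feature is missing at prediction time
imputed_prediction = full_model.predict(impute_mode(X_train, query))
reduced_prediction = reduced.predict(query)

print("imputation + full model:", imputed_prediction)
print("reduced model:", reduced_prediction)
```

On this made-up data the imputed value produces a feature combination the full model never saw, while the reduced model uses only the observed feature. The example shows the mechanics only and says nothing about how the methods compare on real data.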
Citations
•Book
Applied Predictive Modeling
Max Kuhn, Kjell Johnson
- 17 May 2013
TL;DR: A practical guide to the predictive modeling process, covering data preprocessing, data splitting, model tuning, and regression and classification techniques, with an emphasis on real-world problems.
5.9K
To Explain or to Predict
TL;DR: This paper discusses the distinction between explanatory and predictive models and its practical implications for each step of the modeling process, including the differences that arise when modeling for an explanatory versus a predictive goal.
To Explain or to Predict
TL;DR: The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.
1.7K
•Posted Content
Stealing Machine Learning Models via Prediction APIs
TL;DR: In this article, the authors investigate model extraction attacks on ML-as-a-service (MLaaS) systems, in which an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model.
1.4K
•Proceedings Article
Stealing machine learning models via prediction APIs
Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, Thomas Ristenpart
- 10 Aug 2016
TL;DR: In this paper, the authors investigate model extraction attacks on ML-as-a-service (MLaaS) systems, in which an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model.
References
•Book
The Elements of Statistical Learning
Trevor Hastie, Robert Tibshirani, Jerome H. Friedman
- 01 Jan 2001
29.4K
•Book
C4.5: Programs for Machine Learning
J. Ross Quinlan
- 15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
27.2K
•Book
Multiple imputation for nonresponse in surveys
Donald B. Rubin
- 01 Jan 1987
TL;DR: Presents multiple imputation as a way to handle nonresponse in sample surveys, in which each missing value is replaced by two or more plausible values so that analyses reflect the uncertainty due to the missing data.
18.8K