Open Access, Proceedings Article
Imputation of missing data using machine learning techniques
Kamakshi Lakshminarayan, Steven A. Harp, Robert P. Goldman, Tariq Samad
- 02 Aug 1996
- pp 140-145
TL;DR: It is argued that the choice between unsupervised and supervised classification techniques should be influenced by the motivation for solving the missing data problem, and potential applications for the procedures developed are discussed.
Abstract: A serious problem in mining industrial databases is that they are often incomplete, and a significant amount of data is missing or erroneously entered. This paper explores the use of machine-learning based alternatives to standard statistical data completion (data imputation) methods for dealing with missing data. We have approached the data completion problem using two well-known machine learning techniques. The first is an unsupervised clustering strategy which uses a Bayesian approach to cluster the data into classes. The classes so obtained are then used to predict multiple choices for the attribute of interest. The second technique involves modeling missing variables by supervised induction of a decision tree-based classifier. This predicts the most likely value for the attribute of interest. Empirical tests using extracts from industrial databases maintained by Honeywell customers have been done in order to compare the two techniques. These tests show that both approaches are useful and that each has advantages and disadvantages. We argue that the choice between unsupervised and supervised classification techniques should be influenced by the motivation for solving the missing data problem, and discuss potential applications for the procedures we are developing.
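The supervised approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the full decision-tree classifier for a one-level tree (a decision stump) chosen by lowest weighted entropy, it assumes all predictor attributes are categorical and fully observed, and the records, attribute names, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def best_split(rows, target, candidates):
    """Pick the attribute whose split leaves the lowest weighted target entropy."""
    def remainder(attr):
        groups = defaultdict(list)
        for r in rows:
            groups[r[attr]].append(r[target])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return min(candidates, key=remainder)


def fit_stump(rows, target):
    """Learn a one-level tree from the rows where `target` is observed."""
    complete = [r for r in rows if r[target] is not None]
    candidates = [a for a in complete[0] if a != target]
    attr = best_split(complete, target, candidates)
    branches = defaultdict(Counter)
    for r in complete:
        branches[r[attr]][r[target]] += 1
    # Fallback for attribute values never seen with an observed target.
    default = Counter(r[target] for r in complete).most_common(1)[0][0]
    leaves = {v: c.most_common(1)[0][0] for v, c in branches.items()}
    return attr, leaves, default


def impute(rows, target):
    """Return copies of rows with missing `target` set to the most likely value."""
    attr, leaves, default = fit_stump(rows, target)
    return [
        dict(r, **{target: leaves.get(r[attr], default)}) if r[target] is None else r
        for r in rows
    ]


# Hypothetical toy records; None marks the missing value to impute.
records = [
    {"unit": "boiler", "site": "north", "status": "ok"},
    {"unit": "boiler", "site": "south", "status": "ok"},
    {"unit": "chiller", "site": "north", "status": "fault"},
    {"unit": "chiller", "site": "south", "status": "fault"},
    {"unit": "chiller", "site": "north", "status": None},
]
filled = impute(records, "status")
```

As in the abstract's description of the supervised technique, each missing entry receives a single most likely value. The clustering-based alternative would instead return several candidate values per missing entry, which this sketch does not attempt.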
Citations
Statistical Analysis with Missing Data
TL;DR: Part I: Overview and Basic Approaches. Theory of Inference Based on the Likelihood Function.
7K
Missing data imputation with adversarially-trained graph convolutional networks
TL;DR: A more general framework for MDI, leveraging recent work in the field of graph neural networks (GNNs), is proposed in terms of a graph denoising autoencoder, where each edge of the graph encodes the similarity between two patterns.
154
Missing value imputation based on data clustering
Shichao Zhang, Jilian Zhang, Xiaofeng Zhu, Yongsong Qin, Chengqi Zhang
- 01 Jan 2008
TL;DR: An efficient nonparametric missing value imputation method based on clustering, called CMI (Clustering-based Missing value Imputation), for dealing with missing values in target attributes.
123
Data-centric Artificial Intelligence: A Survey
TL;DR: This paper surveys data-centric AI, giving a global view of a spectrum of tasks across the stages of the data lifecycle and equipping readers with techniques and further research ideas for systematically engineering data to build AI systems.
Missing data imputation by utilizing information within incomplete instances
TL;DR: The utilization of information within incomplete instances is of benefit to easily capture the distribution of a dataset, and the NIIA method outperforms the existing methods in accuracy, and this advantage is clearly highlighted when datasets have a high missing ratio.
97
References
•Book
C4.5: Programs for Machine Learning
J. Ross Quinlan
- 15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
27.2K
•Book
Statistical Analysis with Missing Data
Roderick J. A. Little, Donald B. Rubin
- 01 Jan 1987
TL;DR: Covers maximum likelihood for general patterns of missing data, including theory with ignorable nonresponse and large-sample inference based on maximum likelihood estimates.
18.3K
Statistical Analysis with Missing Data
TL;DR: Part I: Overview and Basic Approaches. Theory of Inference Based on the Likelihood Function.
7K
Statistical Analysis With Missing Data
TL;DR: Presents methods for statistical analysis with missing data, including data augmentation.
4K