Journal Article, DOI: 10.4018/ijdsst.292446
Missing Data Imputation – A Survey
6
TL;DR: This article reviews approaches to the missing data problem and discusses the effectiveness of three imputation methods, namely Multiple Linear Regression (MLR), Predictive Mean Matching (PMM), and Classification And Regression Tree (CART), in the context of subspace clustering.
Abstract: Many real-world datasets contain missing values for various reasons. Such incomplete datasets can pose severe problems for the underlying machine learning algorithms and decision support systems, and may result in high computational cost, skewed output, and invalid deductions. Various solutions exist to mitigate this issue; the most popular strategy is to estimate the missing values by applying inferential techniques such as linear regression, decision trees, or Bayesian inference. This paper discusses the missing data problem in detail and comprehensively reviews the approaches to tackle it. It concludes with a discussion of the effectiveness of three imputation methods, namely Multiple Linear Regression (MLR), Predictive Mean Matching (PMM), and Classification And Regression Tree (CART), in the context of subspace clustering. Experimental results on real benchmark datasets and high-dimensional synthetic datasets indicate that the MLR-based imputation method is more efficient on high-dimensional incomplete datasets.
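To make two of the imputation ideas named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single complete predictor `x` and one incomplete column `y` (so it is a one-predictor simplification of MLR), and it uses a simplified form of PMM that skips the usual random draw of regression parameters. The function names `fit_line`, `regression_impute`, and `pmm_impute` and the donor count `k` are hypothetical choices for illustration. CART-based imputation is not shown.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x (assumes xs are not all equal)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def regression_impute(x, y):
    """Regression imputation: replace each None in y with the fitted value a + b * x."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_line([p[0] for p in observed], [p[1] for p in observed])
    return [yi if yi is not None else a + b * xi for xi, yi in zip(x, y)]


def pmm_impute(x, y, k=3, rng=None):
    """Simplified predictive mean matching.

    For each missing y, find the k observed rows whose predicted values are
    closest to the prediction for the missing row, then copy the observed y
    of one donor picked at random. Imputed values are therefore always
    values that actually occur in the data.
    """
    rng = rng or random.Random(0)  # fixed seed (assumption) for repeatability
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_line([p[0] for p in observed], [p[1] for p in observed])
    result = []
    for xi, yi in zip(x, y):
        if yi is not None:
            result.append(yi)
            continue
        target = a + b * xi
        donors = sorted(observed, key=lambda p: abs((a + b * p[0]) - target))[:k]
        result.append(rng.choice(donors)[1])
    return result


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, None, 8, 10]
    print(regression_impute(x, y))
    print(pmm_impute(x, y, k=2))
```

In this toy data the observed points lie exactly on y = 2x, so regression imputation fills the gap with the fitted value, while the PMM sketch fills it with one of the neighbouring observed values. That difference (model-based values versus borrowed observed values) is the main design distinction between the two approaches.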
Citations
A Noise-Aware Multiple Imputation Algorithm for Missing Data
Fangfan Li, Hui Sun, Yu Gu, Ge Yu
TL;DR: The authors propose NPMI, a noise-aware multiple imputation algorithm for missing data in static data, and give a method for determining the imputation order when several variables have missing values.
A Comprehensive Bibliometric Analysis of Missing Value imputation
Heru Nugroho, Kridanto Surendro
TL;DR: To systematically explore various aspects of missing data imputation, a conceptual framework was used to uncover potential research directions and underlying themes and a thematic map serves as a valuable tool for providing a comprehensive understanding.
2
A Novel Algorithm for Imputing the Missing Values in Incomplete Datasets
16 Jun 2022
TL;DR: In this article, a splitting-based IMV-RE algorithm is proposed to estimate missing values within a dataset; an upper limit is set for every class containing missing values, which helps the algorithm predict the missing values more accurately.
A novel algorithm for imputing the missing values in incomplete datasets
Hutashan Vishal Bhagat, Manminder Singh
TL;DR: A new algorithm, IMV-RE (imputing the missing values in real-time environment), is proposed; it is based on a novel approach and outperforms existing techniques in terms of sensitivity to accuracy, root mean square error (RMSE), and coefficient of determination (R^2).
Improving Effort Estimation Accuracy in Software Development Projects Using Multiple Imputation Techniques for Missing Data Handling
S. Hayat, Wajahat Akbar, Tariq Hussain, Muhammad Inam Ul Haq, Altaf Hussian, Irshad Khalil, Muhammad Nawaz Khan, S. Diana
- 12 Nov 2024
TL;DR: This study improves effort estimation accuracy in software development projects by employing Multiple Imputation (MI) to handle missing data, enhancing the Analogy-Based Effort Estimation (ABEE) model's performance and providing more accurate and efficient outcomes.
References
•Book
Classification and regression trees
Leo Breiman
- 01 Jan 1983
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
22.7K
•Book
Multiple imputation for nonresponse in surveys
Donald B. Rubin
- 01 Jan 1987
TL;DR: This book presents multiple imputation as a method for handling nonresponse in sample surveys.
18.8K
Inference and missing data
TL;DR: In this article, it was shown that ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are missing at random and the observed data are observed at random, and then such inferences are generally conditional on the observed pattern of missing data.
10K
Multiple Imputation by Chained Equations: What is it and how does it work?
TL;DR: This paper provides an introduction to the MICE method with a focus on practical aspects and challenges in using this method.
3K
How Many Imputations are Really Needed? Some Practical Clarifications of Multiple Imputation Theory
TL;DR: It is recommended that researchers using MI should perform many more imputations than previously considered sufficient, based on γ, and take into consideration one’s tolerance for a preventable power falloff due to using too few imputations.