Journal Article · DOI: 10.1007/S10489-006-0032-0
Semi-parametric optimization for missing data imputation
TL;DR: This paper proposes a new and efficient method for imputing a particular kind of missing data, semi-parametric data, and demonstrates that it substantially outperforms existing deterministic semi-parametric regression imputation in both efficiency and effectiveness.
Abstract: Missing data imputation is an important issue in machine learning and data mining. In this paper, we propose a new and efficient imputation method for a particular kind of missing data: semi-parametric data. Our imputation method aims to be optimal with respect to three criteria evaluated after the missing data are imputed: the Root Mean Square Error (RMSE), the distribution function, and quantiles. We evaluate our approaches experimentally on both simulated and real data, and demonstrate that our stochastic semi-parametric regression imputation is much better than existing deterministic semi-parametric regression imputation in efficiency and effectiveness.
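The abstract contrasts deterministic and stochastic semi-parametric regression imputation but does not spell out either procedure. The Python sketch below illustrates the distinction under stated assumptions; it is not the authors' algorithm. It assumes a partially linear model y = βx + g(t) + ε (one common semi-parametric form), estimates β with a Robinson-style double-residual estimator and g with a Gaussian-kernel Nadaraya-Watson smoother, and then fills each missing response either with the fitted value (deterministic) or with the fitted value plus a residual drawn at random from the complete cases (stochastic). The three criteria named in the abstract are computed as pointwise RMSE on the imputed entries, the empirical distribution function at a point, and an empirical quantile. All function names, the bandwidth h = 0.1, the 30% missing-completely-at-random rate, and the simulated data are hypothetical choices.

```python
import math
import random


def nw_smooth(t0, ts, vs, h):
    """Nadaraya-Watson estimate of E[v | t = t0] using a Gaussian kernel of bandwidth h."""
    w = [math.exp(-0.5 * ((t0 - t) / h) ** 2) for t in ts]
    return sum(wi * vi for wi, vi in zip(w, vs)) / sum(w)


def fit_partially_linear(x, t, y, h):
    """Robinson-style fit of y = beta * x + g(t) + e on the complete cases (assumed model)."""
    mx = [nw_smooth(ti, t, x, h) for ti in t]  # estimate of E[x | t]
    my = [nw_smooth(ti, t, y, h) for ti in t]  # estimate of E[y | t]
    num = sum((xi - a) * (yi - b) for xi, yi, a, b in zip(x, y, mx, my))
    den = sum((xi - a) ** 2 for xi, a in zip(x, mx))
    beta = num / den
    partial = [yi - beta * xi for xi, yi in zip(x, y)]  # y - beta*x = g(t) + noise

    def g(t0):
        return nw_smooth(t0, t, partial, h)

    resid = [yi - beta * xi - g(ti) for xi, ti, yi in zip(x, t, y)]
    return beta, g, resid


def impute(data, h=0.1, stochastic=True, rng=None):
    """Fill responses that are None; the stochastic variant adds a resampled residual."""
    if rng is None:
        rng = random.Random(0)
    x, t, y = map(list, zip(*[(xi, ti, yi) for xi, ti, yi in data if yi is not None]))
    beta, g, resid = fit_partially_linear(x, t, y, h)
    filled = []
    for xi, ti, yi in data:
        if yi is None:
            yi = beta * xi + g(ti)  # deterministic fill: fitted conditional mean
            if stochastic:
                yi += rng.choice(resid)  # stochastic fill: add an observed residual
        filled.append(yi)
    return filled


def rmse(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def ecdf(values, c):
    """Empirical distribution function F(c): fraction of values <= c."""
    return sum(v <= c for v in values) / len(values)


def quantile(values, p):
    """Simple empirical p-quantile (no interpolation)."""
    s = sorted(values)
    return s[min(len(s) - 1, int(p * len(s)))]


def simulate(n=300, miss_rate=0.3, seed=42):
    """Hypothetical data: y = 2x + sin(2*pi*t) + N(0, 0.5^2), responses missing completely at random."""
    rng = random.Random(seed)
    truth, data = [], []
    for _ in range(n):
        x, t = rng.gauss(0, 1), rng.random()
        y = 2.0 * x + math.sin(2 * math.pi * t) + rng.gauss(0, 0.5)
        truth.append(y)
        data.append((x, t, None if rng.random() < miss_rate else y))
    return truth, data


truth, data = simulate()
missing = [i for i, (_, _, y) in enumerate(data) if y is None]
for name, flag in [("deterministic", False), ("stochastic", True)]:
    filled = impute(data, stochastic=flag)
    print(name,
          "RMSE on missing:", round(rmse([filled[i] for i in missing], [truth[i] for i in missing]), 3),
          "F(0):", round(ecdf(filled, 0.0), 3),
          "q0.9:", round(quantile(filled, 0.9), 3))
print("complete data", "F(0):", round(ecdf(truth, 0.0), 3), "q0.9:", round(quantile(truth, 0.9), 3))
```

Adding a residual restores the spread that a fitted value alone removes, and that spread is what the distribution-function and quantile criteria are sensitive to. It typically raises the pointwise error of individual imputed values, so the script prints all three criteria for both fills rather than a single score.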
Citations
Statistical Analysis with Missing Data
TL;DR: Contents include a preface; Part I, Overview and Basic Approaches; Theory of Inference Based on the Likelihood Function; and subject and author indexes.
Cited by: 7K
Efficient kNN Classification With Different Numbers of Nearest Neighbors
TL;DR: An improved version of the kTree method is proposed, which enables kNN classification to be conducted using a subset of the training samples in the leaf nodes rather than all the training samples used by newly proposed kNN methods.
Cited by: 1.2K
Learning k for kNN Classification
TL;DR: Experimental results showed that the proposed Correlation Matrix kNN (CM-kNN) classification was more accurate and efficient than existing kNN methods in data-mining applications, such as classification, regression, and missing data imputation.
Cited by: 592
A survey on missing data in machine learning
Tlamelo Emmanuel, Thabiso M. Maupong, Dimane Mpoeleng, Thabo Semong, Banyatsang Mphago, Oteng Tabona
TL;DR: This paper aggregates some of the literature on missing data, focusing particularly on machine-learning techniques, and gives insight into how these approaches work by highlighting their key features, how they perform, their limitations, and the kinds of data for which they are most suitable.
References
• Book
Data Mining: Concepts and Techniques
Jiawei Han, Micheline Kamber, Jian Pei
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
• Book
C4.5: Programs for Machine Learning
J. Ross Quinlan
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Cited by: 27.2K
• Book
Statistical Analysis with Missing Data
Roderick J. A. Little, Donald B. Rubin
01 Jan 1987
TL;DR: Topics include maximum likelihood for general patterns of missing data, with an introduction and theory for ignorable nonresponse, and large-sample inference based on maximum likelihood estimates.
Cited by: 18.3K
Density Estimation for Statistics and Data Analysis
Bernard W. Silverman
01 Jan 1986
TL;DR: Density estimation, as discussed in this book, is the construction of an estimate of an unknown probability density function from observed data. Topics include the kernel method for multivariate data, three important methods, and density estimation in action.
Cited by: 14.7K
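Kernel smoothing of the kind Silverman describes is a common building block of semi-parametric regression. As a hedged illustration, the Python sketch below computes a Gaussian kernel density estimate f_hat(x) = (1/(n*h)) * sum_i K((x - X_i)/h), with K the standard normal density, using the rule-of-thumb bandwidth h = 0.9 * min(sd, IQR/1.34) * n^(-1/5) from Silverman's book. The function names and the sample are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth 0.9 * min(sd, IQR / 1.34) * n^(-1/5); needs at least 2 points."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(data, h=None):
    """Return f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h) with a standard normal kernel K."""
    if h is None:
        h = silverman_bandwidth(data)
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f_hat


sample = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5, 2.1]
f = gaussian_kde(sample)
print("bandwidth:", round(silverman_bandwidth(sample), 3), "f_hat(0):", round(f(0.0), 3))
```

Taking the smaller of the standard deviation and the scaled interquartile range keeps the bandwidth from becoming too wide when the sample is skewed or heavy-tailed.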