Book Chapter · DOI: 10.1007/978-3-030-29407-6_17
Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning
Shivani Tyagi, Sangeeta Mittal
- 01 Jan 2020
- pp 209-221
TL;DR: It has been observed that the adaptive synthetic oversampling approach best improves the imbalance ratio as well as the classification results; however, undersampling approaches gave better overall performance on all datasets.
Abstract: Real-world datasets in many domains, such as medicine, intrusion detection, fraudulent transactions and bioinformatics, are highly imbalanced. In classification problems, imbalanced datasets negatively affect the accuracy of class predictions. This skewness can be handled either by oversampling minority-class examples or by undersampling the majority class. In this work, popular methods from both categories have been evaluated for their capability to improve the imbalance ratio of five highly imbalanced datasets from different application domains. The effect of balancing on classification results has also been investigated. It has been observed that the adaptive synthetic oversampling approach best improves the imbalance ratio as well as the classification results. However, undersampling approaches gave better overall performance on all datasets.
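The two balancing strategies named in the abstract can be illustrated with a short Python sketch. It is not the chapter's code: it assumes a binary problem, takes the imbalance ratio to be the common definition (majority-class count divided by minority-class count), and uses plain random duplication and random removal, the simplest members of each family. The function names are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority examples until the two classes are equal in size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_majority - len(minority_idx))]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


def random_undersample(X, y, minority, seed=0):
    """Keep every minority example plus an equally sized random subset of the majority class."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    keep = sorted(minority_idx + rng.sample(majority_idx, len(minority_idx)))
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy binary data set: 95 majority (label 0) vs 5 minority (label 1) examples.
X = [[float(i)] for i in range(100)]
y = [1 if i < 5 else 0 for i in range(100)]
print(imbalance_ratio(y))                      # 19.0
X_over, y_over = random_oversample(X, y, minority=1)
print(imbalance_ratio(y_over), len(y_over))    # 1.0 190
X_under, y_under = random_undersample(X, y, minority=1)
print(imbalance_ratio(y_under), len(y_under))  # 1.0 10
```

Both routines drive the ratio to 1.0, but oversampling grows the training set while undersampling discards 90 of the 95 majority examples. This trade-off is why the two families can rank differently on imbalance ratio and on final classification performance.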
Citations
Posted Content
Deep Learning based Vulnerability Detection: Are We There Yet?
TL;DR: In this article, the authors examine how state-of-the-art DL-based techniques perform in a real-world vulnerability prediction scenario and find that their performance drops by more than 50%.
I-SiamIDS: an improved Siam-IDS for handling class imbalance in network-based intrusion detection systems
TL;DR: Improved Siam-IDS (I-SiamIDS) uses an ensemble of binary eXtreme Gradient Boosting (b-XGBoost), a Siamese Neural Network (Siamese-NN) and a deep neural network (DNN) to handle the class imbalance problem.
A broad review on class imbalance learning techniques
Salim Rezvani, Xizhao Wang
TL;DR: In this article, existing methods for dealing with class imbalance learning are reviewed, and a taxonomy of class imbalance learning techniques is proposed with three categories: (1) Data pre-processing, (2) Algorithmic structures, and (3) Hybrid techniques.
Using Variational Auto Encoding in Credit Card Fraud Detection
TL;DR: Experimental results suggest that the VAE-based oversampling method can be applied effectively to imbalanced classification problems and performs better than synthetic minority oversampling techniques and traditional deep neural network methods.
I-SiamIDS: an improved Siam-IDS for handling class imbalance in network-based intrusion detection systems
TL;DR: This paper proposes an algorithm-level approach called Improved Siam-IDS (I-SiamIDS), a two-layer ensemble for handling the class imbalance problem, which showed significant improvements in Accuracy, Recall, Precision, F1-score and Area Under the Curve (AUC) on both the NSL-KDD and CIDDS-001 datasets.
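The evaluation metrics named in the I-SiamIDS summaries (Accuracy, Recall, Precision, F1-score and AUC) behave very differently on skewed data, which is why balancing matters. The Python sketch below treats the minority class as positive and uses hypothetical helper names. It sets precision, recall and F1 to 0.0 when they are undefined, which is a convention rather than a rule. The sketch shows that a classifier that always predicts the majority class reaches 95% accuracy on a 95:5 split while its recall and F1 are zero. AUC is computed from its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with the minority class as `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    # Convention (assumption): undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """AUC as P(random positive outscores random negative); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1] * 5 + [0] * 95
always_majority = [0] * 100
print(binary_metrics(y_true, always_majority))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```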
References
ADASYN: Adaptive synthetic sampling approach for imbalanced learning
Haibo He, Yang Bai, E.A. Garcia, Shutao Li
- 01 Jun 2008
TL;DR: Simulation analyses on several machine learning data sets show the effectiveness of the ADASYN sampling approach across five evaluation metrics.
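The ADASYN procedure can be sketched in Python as follows. The sketch rests on several assumptions: binary labels, Euclidean distance and brute-force neighbour search, and it omits the paper's preliminary imbalance-degree threshold check. `adasyn` and its parameter names are hypothetical and are not the imbalanced-learn API. Each minority example x_i gets a weight r_i, the share of majority examples among its k nearest neighbours. The G = (majority count - minority count) × β synthetic points are then distributed in proportion to r_i, so examples in majority-dominated regions receive more of them. Each new point is an interpolation between x_i and one of its k nearest minority neighbours.

```python
import math
import random


def adasyn(X, y, minority, k=5, beta=1.0, seed=0):
    """Return synthetic minority examples generated in the spirit of ADASYN (sketch)."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    if len(min_idx) < 2:
        raise ValueError("need at least two minority examples to interpolate")
    n_major = len(y) - len(min_idx)
    G = int((n_major - len(min_idx)) * beta)  # total number of synthetic examples wanted
    if G <= 0:
        return []

    def neighbours(i, pool):
        """Indices of the k examples in pool nearest to X[i] (Euclidean), excluding i."""
        others = sorted((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # r_i: share of majority examples among the k nearest neighbours in the whole data set.
    r = []
    for i in min_idx:
        neigh = neighbours(i, range(len(y)))
        r.append(sum(y[j] != minority for j in neigh) / len(neigh))
    total = sum(r)
    if total == 0:
        raise ValueError("no minority example has majority neighbours; weights are undefined")
    # g_i: harder (more majority-surrounded) examples receive more synthetic samples.
    g = [round(ri / total * G) for ri in r]

    synthetic = []
    for i, g_i in zip(min_idx, g):
        minority_neigh = neighbours(i, min_idx)  # k nearest minority neighbours of x_i
        for _ in range(g_i):
            z = rng.choice(minority_neigh)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
    return synthetic


# 20 majority points on 0..19; two minority points inside that range, three far away.
X = [[float(i)] for i in range(20)] + [[4.5], [5.5], [30.0], [31.0], [32.0]]
y = [0] * 20 + [1] * 5
new_points = adasyn(X, y, minority=1)
print(len(new_points))
```

Because each g_i is rounded, the total number of generated points can differ slightly from G. In this toy set, the minority points at 4.5 and 5.5, which sit among majority examples, receive more synthetic neighbours than the isolated group at 30 to 32.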
Proceedings Article
Addressing the Curse of Imbalanced Training Sets: One-Sided Selection.
Miroslav Kubat, Stan Matwin
- 01 Jan 1997
TL;DR: Criteria for evaluating the utility of classifiers induced from such imbalanced training sets are discussed, an explanation of the poor behavior of some learners under these circumstances is given, and a simple technique called one-sided selection of examples is suggested.
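The one-sided selection idea can be sketched in Python as below. This is one reading of the procedure, not the authors' code. It starts from all minority examples plus one random majority example, then adds every example that a 1-NN rule over that initial set misclassifies. Finally, it deletes the majority examples that form Tomek links, which are opposite-class pairs that are each other's nearest neighbour. Euclidean distance, binary labels and the function names are assumptions.

```python
import math
import random


def nearest(i, pool, X):
    """Index in pool (other than i) closest to X[i] under Euclidean distance."""
    return min((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))


def tomek_links(pool, X, y):
    """Pairs (a, b), a < b, of opposite-class examples that are mutual nearest neighbours in pool."""
    links = []
    for a in pool:
        b = nearest(a, pool, X)
        if a < b and y[a] != y[b] and nearest(b, pool, X) == a:
            links.append((a, b))
    return links


def one_sided_selection(X, y, minority, seed=0):
    """Indices of the reduced training set produced by a one-sided-selection sketch."""
    rng = random.Random(seed)
    n = len(y)
    # Step 1: C = all minority examples plus one randomly chosen majority example.
    C = [i for i in range(n) if y[i] == minority]
    C.append(rng.choice([i for i in range(n) if y[i] != minority]))
    # Step 2: classify the rest with 1-NN over the initial C; misclassified ones join C.
    initial = list(C)
    in_C = set(C)
    C += [i for i in range(n) if i not in in_C and y[nearest(i, initial, X)] != y[i]]
    # Step 3: drop the majority member of every Tomek link found inside C.
    drop = {a if y[a] != minority else b for a, b in tomek_links(C, X, y)}
    return sorted(i for i in C if i not in drop)


# Ten redundant majority points far from two minority points.
X = [[float(i)] for i in range(10)] + [[20.0], [21.0]]
y = [0] * 10 + [1, 1]
print(len(one_sided_selection(X, y, minority=1)))  # 3
```

Only the majority class is ever reduced, which is what makes the selection "one-sided". In the toy example, the nine redundant majority points are already classified correctly by the one retained majority example, so they are discarded.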
Exploratory Undersampling for Class-Imbalance Learning
Xu-Ying Liu, Jianxin Wu, Zhi-Hua Zhou
- 01 Apr 2009
TL;DR: Experiments show that the proposed algorithms, BalanceCascade and EasyEnsemble, have better AUC scores than many existing class-imbalance learning methods and have approximately the same training time as that of under-sampling, which trains significantly faster than other methods.
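The core of EasyEnsemble is repeated, independent undersampling of the majority class, with one learner trained on each balanced subset. The Python sketch below shows only that structure. To stay dependency-free, it swaps in a nearest-centroid rule for the AdaBoost learners used in the paper. It also combines the subsets by majority vote rather than by summing boosted scores. It is therefore an EasyEnsemble-style illustration, not the published algorithm, and all names are hypothetical.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def easy_ensemble_fit(X, y, minority, n_subsets=10, seed=0):
    """Fit one stand-in learner per balanced subset (all minority + equal-size majority sample)."""
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == minority]
    neg = [x for x, label in zip(X, y) if label != minority]
    models = []
    for _ in range(n_subsets):
        neg_subset = rng.sample(neg, len(pos))  # independent random undersampling of the majority
        # Stand-in learner (assumption): a nearest-centroid rule, not AdaBoost as in the paper.
        models.append((centroid(pos), centroid(neg_subset)))
    return models


def easy_ensemble_predict(models, x):
    """True (minority) if most subset learners put x nearer the minority centroid."""
    votes = sum(math.dist(x, c_pos) < math.dist(x, c_neg) for c_pos, c_neg in models)
    return 2 * votes > len(models)


X = [[i / 10] for i in range(50)] + [[10.0], [10.5], [11.0]]
y = [0] * 50 + [1] * 3
models = easy_ensemble_fit(X, y, minority=1)
print(easy_ensemble_predict(models, [10.2]), easy_ensemble_predict(models, [0.5]))  # True False
```

Each subset sees only as many majority examples as there are minority examples. Together, however, the subsets cover far more of the majority class than a single undersampled set would. This is how the method keeps undersampling's short training time without discarding as much information.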
Asymptotic Properties of Nearest Neighbor Rules Using Edited Data
Dennis L. Wilson
- 01 Jul 1972
TL;DR: The convergence properties of a nearest neighbor rule that uses an editing procedure to reduce the number of preclassified samples and to improve the performance of the rule are developed.
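Wilson's editing step can be sketched in Python. Each example is classified by a k-NN vote (k = 3 here) over the remaining examples, and any example whose vote disagrees with its own label is removed. The edited set is then used by an ordinary 1-NN rule. Euclidean distance and the function name are assumptions. The optional `removable` parameter is a hypothetical addition that restricts deletions to one class, which is how edited-nearest-neighbour rules are commonly applied as undersampling methods for imbalanced data.

```python
import math
from collections import Counter


def wilson_edit(X, y, k=3, removable=None):
    """Indices kept by Wilson's editing rule (sketch).

    Each example is classified by a k-NN vote over all other examples (Euclidean)
    and is dropped when the vote disagrees with its own label. All decisions use the
    original data set. If `removable` is given (hypothetical option), only examples
    with that label may be dropped.
    """
    keep = []
    for i in range(len(y)):
        others = sorted((j for j in range(len(y)) if j != i), key=lambda j: math.dist(X[i], X[j]))
        vote = Counter(y[j] for j in others[:k]).most_common(1)[0][0]
        if vote == y[i] or (removable is not None and y[i] != removable):
            keep.append(i)
    return keep


# A label-1 point (index 3) sitting inside the label-0 cluster is treated as noise.
X = [[0.0], [0.1], [0.2], [0.15], [5.0], [5.1], [5.2]]
y = [0, 0, 0, 1, 1, 1, 1]
print(wilson_edit(X, y))               # [0, 1, 2, 4, 5, 6]
print(wilson_edit(X, y, removable=0))  # [0, 1, 2, 3, 4, 5, 6]
```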