Proceedings Article, DOI: 10.1145/2857546.2857643
A New Under-Sampling Method Using Genetic Algorithm for Imbalanced Data Classification
Jihyun Ha, Jongmin Lee +1 more
- 04 Jan 2016
- pp 95
57
TL;DR: The proposed GAUS (genetic algorithm based under-sampling) tries to maximize the performance of a prototype classifier such that the prototypes minimize the loss between distributions of original and undersampled majority objects.
Abstract: The class imbalance problem arises in many real-world domains, where traditional classifiers often fail to detect minority-class objects because they pay too little attention to them. To address this problem, this paper proposes a new under-sampling technique, GAUS (genetic algorithm based under-sampling). GAUS is designed to overcome several limitations of existing methods, such as performance instability and loss of information about the data distribution. To select informative majority objects, GAUS tries to maximize the performance of a prototype classifier such that the prototypes minimize the loss between the distributions of the original and undersampled majority objects. We confirmed the effectiveness of the proposed GAUS on real-world datasets.
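The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the fitness function, so the sketch assumes a leave-one-out 1-NN prototype classifier scored by the geometric mean of per-class recall, and it omits any explicit distribution-loss term. The names `gaus_sketch`, `fitness`, and `predict`, as well as the crossover and mutation operators and all default parameters, are hypothetical choices made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def predict(x, prototypes):
    """1-NN label of x; x itself is skipped if it is a prototype (leave-one-out)."""
    candidates = [(sq_dist(x, p), label) for p, label in prototypes if p is not x]
    return min(candidates)[1]


def fitness(subset, majority, minority):
    """ASSUMED fitness: geometric mean of majority and minority recall of a
    1-NN classifier whose prototypes are the selected majority objects plus
    all minority objects. The paper's actual objective may differ."""
    prototypes = [(majority[i], 0) for i in subset] + [(m, 1) for m in minority]
    maj_recall = sum(predict(x, prototypes) == 0 for x in majority) / len(majority)
    min_recall = sum(predict(x, prototypes) == 1 for x in minority) / len(minority)
    return (maj_recall * min_recall) ** 0.5


def gaus_sketch(majority, minority, k=None, pop_size=20, generations=30,
                mutation_rate=0.2, seed=0):
    """Search for k majority indices with a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(majority)
    k = k or len(minority)  # default: balance the two classes
    cache = {}

    def score(subset):
        key = frozenset(subset)
        if key not in cache:  # avoid re-evaluating identical subsets
            cache[key] = fitness(subset, majority, minority)
        return cache[key]

    # Each chromosome is a list of k distinct majority indices.
    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = ranked[:2]  # elitism: keep the two best subsets unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            a = max(rng.sample(ranked, 3), key=score)
            b = max(rng.sample(ranked, 3), key=score)
            # Crossover: draw k distinct indices from the parents' union.
            child = rng.sample(sorted(set(a) | set(b)), k)
            # Mutation: swap one selected index for an unselected one.
            if rng.random() < mutation_rate:
                outside = [i for i in range(n) if i not in child]
                if outside:
                    child[rng.randrange(k)] = rng.choice(outside)
            next_gen.append(child)
        population = next_gen
    return sorted(max(population, key=score))
```

Calling `gaus_sketch(majority, minority)` with two lists of feature tuples returns the sorted indices of the retained majority objects. The elitism step keeps the best fitness from decreasing between generations, which is one simple way to reduce the run-to-run instability of purely random under-sampling.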
Citations
Learning from class-imbalanced data
TL;DR: An in-depth review of rare event detection from an imbalanced learning perspective and a comprehensive taxonomy of the existing application domains of imbalanced learning are provided.
2K
Learning from class-imbalanced data: Review of methods and applications
Haixiang Guo, Yijing Li, Jennifer Shang, Mingyun Gu, Yuanyue Huang, Gong Bing +5 more
TL;DR: An in-depth review of rare event detection from an imbalanced learning perspective and a comprehensive taxonomy of the existing application domains of imbalanced learning are provided.
932
Handling data irregularities in classification: Foundations, trends, and future challenges
TL;DR: This article provides a bird's eye view of data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities, and discusses the notable and recent approaches that have been taken to make the existing stand-alone as well as ensemble classifiers robust against such irregularities.
198
Consensus Clustering-Based Undersampling Approach to Imbalanced Learning
TL;DR: The empirical results indicate that the proposed heterogeneous consensus clustering-based undersampling scheme yields better predictive performance.
A Classification Method Based on Feature Selection for Imbalanced Data
TL;DR: An ensemble classification method is proposed that combines evolutionary under-sampling with feature selection on the resampled data; the resulting ensemble achieves better classification performance than other algorithms, especially on high-dimensional imbalanced data.
References
Learning from Imbalanced Data
Haibo He, E.A. Garcia +1 more
TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
8.2K
The use of the area under the ROC curve in the evaluation of machine learning algorithms
TL;DR: AUC exhibits a number of desirable properties when compared to overall accuracy: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreased as both AUC and the number of test samples increased; decision threshold independent; and it is invariant to a priori class probabilities.
7K
A study of the behavior of several methods for balancing machine learning training data
TL;DR: This work performs a broad experimental evaluation involving ten methods, three of them proposed by the authors, to deal with the class imbalance problem in thirteen UCI data sets, and shows that, in general, over-sampling methods provide more accurate results than under-sampling methods considering the area under the ROC curve (AUC).
Learning From Imbalanced Data
Lincy Meera Mathews, Seetha Hari +1 more
- 01 Jan 2018
TL;DR: This chapter aims to highlight the existence of imbalance in all real world data and the need to focus on the inherent characteristics present in imbalanced data that can degrade the performance of classifiers.
2.7K
A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches
Mikel Galar, Alberto Fernández, Edurne Barrenechea, Humberto Bustince, Francisco Herrera +4 more
- 01 Jul 2012
TL;DR: A taxonomy of ensemble-based methods for addressing class imbalance is proposed, in which each proposal is categorized by the inner ensemble methodology on which it is based, and a thorough empirical comparison of the most significant published approaches is developed to show whether any of them makes a difference.
2.7K