Journal Article, DOI: 10.1109/TSMCC.2011.2161285
A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches
Mikel Galar, Alberto Fernández, Edurne Barrenechea, Humberto Bustince, Francisco Herrera
- 01 Jul 2012
- Vol. 42, Iss: 4, pp 463-484
2.6K
TL;DR: This paper proposes a taxonomy for ensemble-based methods that address class imbalance, in which each proposal is categorized by the ensemble methodology it is based on, and develops a thorough empirical comparison of the most significant published approaches to show whether any of them makes a difference.
Abstract: Classifier learning with data-sets that suffer from imbalanced class distributions is a challenging problem in the data mining community. This issue occurs when the number of examples that represent one class is much lower than that of the other classes. Its presence in many real-world applications has attracted growing attention from researchers. In machine learning, ensembles of classifiers are known to increase the accuracy of single classifiers by combining several of them, but these learning techniques alone do not solve the class imbalance problem; to deal with this issue, ensemble learning algorithms have to be designed specifically. In this paper, our aim is to review the state of the art on ensemble techniques in the framework of imbalanced data-sets, with a focus on two-class problems. We propose a taxonomy for ensemble-based methods that address class imbalance, in which each proposal can be categorized depending on the inner ensemble methodology on which it is based. In addition, we develop a thorough empirical comparison of the most significant published approaches, within the families of the proposed taxonomy, to show whether any of them makes a difference. This comparison has shown the good behavior of the simplest approaches, which combine random undersampling techniques with bagging or boosting ensembles. In addition, the positive synergy between sampling techniques and bagging has stood out. Furthermore, our results show empirically that ensemble-based algorithms are worthwhile, since they outperform the mere use of preprocessing techniques before learning the classifier, thereby justifying the increase in complexity by a significant enhancement of the results.
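The combination the abstract singles out, random undersampling inside a bagging ensemble, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the function names (`balanced_bootstrap`, `fit_stump`, `under_bagging`), the decision-stump base learner, the choice to sample each class with replacement down to the minority-class size, and the majority vote are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def balanced_bootstrap(X, y, rng):
    """Draw one bag with equal counts per class (assumed scheme: sample
    each class with replacement, down to the minority-class size)."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    bag_X, bag_y = [], []
    for label, rows in by_class.items():
        for row in rng.choices(rows, k=n_min):
            bag_X.append(row)
            bag_y.append(label)
    return bag_X, bag_y


def fit_stump(X, y):
    """Fit a one-feature threshold classifier by exhaustive search.
    Stands in for any base learner; assumes a two-class problem."""
    labels = sorted(set(y))
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # Try both orientations of the split.
            for lo, hi in ((labels[0], labels[-1]), (labels[-1], labels[0])):
                errors = sum(
                    (lo if row[j] <= t else hi) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, lo, hi)
    _, j, t, lo, hi = best
    return lambda row: lo if row[j] <= t else hi


def under_bagging(X, y, n_estimators=11, seed=0):
    """Train one stump per balanced bag and combine them by majority vote."""
    rng = random.Random(seed)
    models = [
        fit_stump(*balanced_bootstrap(X, y, rng)) for _ in range(n_estimators)
    ]

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy imbalanced data: 20 majority examples (class 0), 3 minority (class 1).
X = [[float(v)] for v in range(20)] + [[25.0], [26.0], [27.0]]
y = [0] * 20 + [1] * 3
predict = under_bagging(X, y)
```

Because every bag is rebalanced before training, each base learner sees the minority class as often as the majority class, while the ensemble as a whole still draws on many different majority examples across bags.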
Citations
Predictive Models with Resampling: A Comparative Study of Machine Learning Algorithms and their Performances on Handling Imbalanced Datasets
Adithi Deborah Chakravarthy, Sindhura Bonthu, Zhengxin Chen, Qiuming Zhu
- 01 Dec 2019
TL;DR: The results of this study show that the effectiveness of these resampling techniques is a multivariate function relative to both the learning algorithms and the resampling ratios, as well as the coherent characteristics of datasets.
16
•Posted Content
Annealing Genetic GAN for Minority Oversampling
TL;DR: This work renovates the training of GANs as an evolutionary process that incorporates the mechanism of simulated annealing, and proposes an Annealing Genetic GAN method, which aims to reproduce the distributions closest to the ones of the minority classes using only limited data samples.
Adaptive Condensed Nearest Neighbor for Imbalance Data Classification
TL;DR: This paper addresses the issues faced by kNN by developing Adaptive-Condensed NN (Ada-CNN). The Ada-CNN classifier utilizes the distribution and density of a test point's neighborhood and learns an appropriate point-explicit k by using artificial neural systems.
Machine learning models perform better than traditional empirical models for stomatal conductance when applied to multiple tree species across different forest biomes
Alta Saunders, David M. Drew, Willie Brink
- 01 Dec 2021
TL;DR: In this paper, various machine learning (ML) models were able to capture stomatal responses of multiple tree species, including a random forest model with an R² of 75%, compared to the empirical Ball-Berry stomatal conductance model (BWB) (R² = 41%).
16
Recognizing Induced Emotions With Only One Feature: A Novel Color Histogram-Based System
TL;DR: This study shows that the HSV color space is better suited than the RGB color space for REVC systems, and proposes a new optimization algorithm called Optimizing Parameters of Ensemble RUSboosted Tree (OPERT) to boost the performance of the REVC system.
References
SMOTE: synthetic minority over-sampling technique
TL;DR: In this article, a method of over-sampling the minority class by creating synthetic minority class examples is presented and evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
•Book
C4.5: Programs for Machine Learning
J. Ross Quinlan
- 15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
27.2K
A Simple Sequentially Rejective Multiple Test Procedure
TL;DR: In this paper, a simple and widely accepted multiple test procedure of the sequentially rejective type is presented, i.e., hypotheses are rejected one at a time until no further rejections can be made.
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Yoav Freund, Robert E. Schapire
- 01 Aug 1997
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Bagging predictors
Leo Breiman
- 01 Aug 1996
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.