A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches

Journal Article • doi:10.1109/TSMCC.2011.2161285

Mikel Galar, +4 more

- 01 Jul 2012

- Vol. 42, Iss: 4, pp 463-484

2.6K

TL;DR: Proposes a taxonomy of ensemble-based methods for the class imbalance problem, in which each proposal is categorized by the inner ensemble methodology it is based on, and carries out a thorough empirical comparison of the most significant published approaches to show whether any of them makes a difference.

Abstract: Classifier learning with data sets that suffer from imbalanced class distributions is a challenging problem in the data mining community. This issue occurs when the number of examples that represent one class is much lower than that of the other classes. Its presence in many real-world applications has drawn growing attention from researchers. In machine learning, ensembles of classifiers are known to increase the accuracy of single classifiers by combining several of them, but neither of these learning techniques alone solves the class imbalance problem; to deal with this issue, ensemble learning algorithms have to be designed specifically. In this paper, our aim is to review the state of the art on ensemble techniques in the framework of imbalanced data sets, with a focus on two-class problems. We propose a taxonomy for ensemble-based methods to address the class imbalance, where each proposal can be categorized depending on the inner ensemble methodology on which it is based. In addition, we develop a thorough empirical comparison of the most significant published approaches, within the families of the proposed taxonomy, to show whether any of them makes a difference. This comparison has shown the good behavior of the simplest approaches, which combine random undersampling techniques with bagging or boosting ensembles. In addition, the positive synergy between sampling techniques and bagging has stood out. Furthermore, our results show empirically that ensemble-based algorithms are worthwhile, since they outperform the mere use of preprocessing techniques before learning the classifier, therefore justifying the increase in complexity by means of a significant enhancement of the results.
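The combination the abstract singles out, random undersampling inside a bagging ensemble, can be illustrated with a minimal sketch. This is not the paper's code: the names `train_stump` and `under_bagging` are hypothetical, the decision stump is only a stand-in base learner, labels are assumed to be 0/1, and bootstrapping the minority class in each bag is one possible design choice among several.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Fit a one-feature threshold classifier by exhaustive search (labels 0/1)."""
    best = None  # (errors, feature, threshold, label_if_leq, label_if_gt)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[f] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    _, f, t, left, right = best
    return lambda row: left if row[f] <= t else right


def under_bagging(X, y, n_estimators=11, seed=0):
    """Train an ensemble in which every bag is class-balanced by undersampling."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]

    models = []
    for _ in range(n_estimators):
        # Random undersampling: draw as many majority examples as there are
        # minority examples, without replacement.
        sampled_maj = rng.sample(maj_idx, len(min_idx))
        # Assumption of this sketch: the minority class is bootstrapped
        # (sampled with replacement) so that bags differ on both classes.
        sampled_min = [rng.choice(min_idx) for _ in min_idx]
        idx = sampled_maj + sampled_min
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        # Majority vote over the base classifiers.
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data: 20 majority examples (label 0) and 4 minority examples (label 1).
X = [[float(i)] for i in range(20)] + [[30.0], [31.0], [32.0], [33.0]]
y = [0] * 20 + [1] * 4
predict = under_bagging(X, y)
print(predict([2.0]), predict([35.0]))
```

Because each base classifier sees a balanced bag, no single model is dominated by the majority class, while the ensemble as a whole still uses most of the majority examples across bags.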

Citations

Journal Article•10.1016/J.ENGAPPAI.2020.103500

Combined weighted multi-objective optimizer for instance reduction in two-class imbalanced data problem

Javad Hamidzadeh, +2 more

- 01 Apr 2020

- Engineering Applications of Artificial I...

TL;DR: A new instance reduction method is introduced that preserves between-class distributions in the balanced data, efficiently handles minority-class instance reduction in two-class imbalanced data, and outperforms state-of-the-art methods in terms of classification accuracy, G-mean, reduction rates, and computational time.

23

Journal Article•10.1007/S11280-019-00711-Y

Adaptive knowledge subgraph ensemble for robust and trustworthy knowledge graph completion

Guojia Wan, +3 more

- 01 Jan 2020

- World Wide Web

TL;DR: Experimental results show that the ensemble framework is more robust than existing knowledge graph embedding approaches on manually injected noise as well as on inherently noisy extracted KGs.

23

Journal Article•10.1007/S12559-014-9256-1

A Kernel Clustering-Based Possibilistic Fuzzy Extreme Learning Machine for Class Imbalance Learning

Shixiong Xia, +3 more

- 01 Feb 2015

- Cognitive Computation

TL;DR: A kernel possibilistic fuzzy c-means clustering-based ELM algorithm for class imbalance learning (CIL) is developed to handle the class imbalance problem in the presence of outliers and noise, and its performance is compared with some typical CIL methods.

23

•Journal Article•10.3389/FONC.2020.00490

Electron Density and Biologically Effective Dose (BED) Radiomics-Based Machine Learning Models to Predict Late Radiation-Induced Subcutaneous Fibrosis

Michele Avanzo, +12 more

- 21 Apr 2020

- Frontiers in Oncology

TL;DR: Textures extracted from 3D Biologically Effective Dose and 3D-RED in the breast and PTV can predict late RIF and may help better select patient candidates for exclusive PBI.

23

Book Chapter•10.1007/978-3-319-98842-9_4

Patch Before Exploited: An Approach to Identify Targeted Software Vulnerabilities

Mohammed Almukaynizi, +5 more

- 01 Jan 2019

TL;DR: In this chapter, an exploit prediction model is presented, which predicts whether a vulnerability will likely be exploited, and is proven to be much more robust than adversarial examples—postings authored by adversaries in the attempt to induce the model to produce incorrect predictions.

23

References

•Journal Article•10.1613/JAIR.953

SMOTE: synthetic minority over-sampling technique

Nitesh V. Chawla, +3 more

- 01 Jan 2002

- Journal of Artificial Intelligence Resea...

TL;DR: In this article, a method of over-sampling the minority class that involves creating synthetic minority class examples is proposed and evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.

27.7K

•Book

C4.5: Programs for Machine Learning

J. Ross Quinlan

- 15 Oct 1992

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.

27.2K

•Journal Article•10.2307/4615733

A Simple Sequentially Rejective Multiple Test Procedure

Sture Holm

- 01 Jan 1979

- Scandinavian Journal of Statistics

TL;DR: In this paper, a simple and widely accepted multiple test procedure of the sequentially rejective type is presented, i.e., hypotheses are rejected one at a time until no further rejections can be made.

23.4K

•Journal Article•10.1006/JCSS.1997.1504

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

Yoav Freund, +1 more

- 01 Aug 1997

TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.

18.6K

•Journal Article•10.1023/A:1018054314350

Bagging predictors

Leo Breiman

- 01 Aug 1996

TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.

16.6K

Related Papers (5)

SMOTE: synthetic minority over-sampling technique

Learning from Imbalanced Data

A study of the behavior of several methods for balancing machine learning training data

Bagging predictors

Random Forests