An automatic sampling ratio detection method based on genetic algorithm for imbalanced data classification

doi:10.1016/J.KNOSYS.2021.106800

Journal Article•10.1016/J.KNOSYS.2021.106800

Ming Zheng, +7 more

- 15 Mar 2021

- Knowledge Based Systems

- Vol. 216, pp 106800

24

TL;DR: In this article, the authors proposed three algorithms to automatically determine the sampling ratios for oversampling, undersampling and hybrid sampling methods, based on a genetic algorithm, to obtain satisfactory and stable classification performance.

Abstract: Imbalanced data are a common phenomenon in both theoretical research and real-world applications. At a data level, standard classification algorithms cannot effectively learn and make predictions from imbalanced data, and this problem is generally solved by using oversampling, undersampling, or hybrid sampling methods. However, most of the current sampling methods use random sampling ratios, and the resulting classification performance can be undesirable and unstable. To obtain satisfactory and stable classification performance, we proposed three algorithms to automatically determine the sampling ratios for oversampling, undersampling, and hybrid sampling methods, based on a genetic algorithm. Experiments were performed to test the algorithms’ effectiveness by utilizing five widely used standard classification algorithms on 14 different imbalanced datasets using two oversampling, two undersampling, and four hybrid sampling methods. The statistical test results showed that for all five standard classification algorithms, sampling methods that used our proposed algorithms achieved the best classification results. Using area under the receiver operating characteristic curve (AUC) as the evaluation metric, it was demonstrated that the proposed algorithms for automatically determining the sampling ratio outperformed the random sampling ratio.
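The abstract does not spell out how the genetic algorithm is configured, so the following is only a minimal sketch of the general idea: treat the sampling ratio as a single real-valued gene and let a genetic algorithm search for the value that maximizes a fitness score. Everything here is an assumption made for illustration, not the authors' method: the function names `ga_search_ratio` and `toy_fitness`, the ratio bounds, the binary tournament selection, the blend crossover, the Gaussian mutation, and the elitism step. In a real experiment the fitness would plausibly be the cross-validated AUC of a classifier trained on data resampled at the candidate ratio; `toy_fitness` is a hypothetical stand-in with a known optimum so the sketch runs on its own.

```python
import random
from typing import Callable, List, Tuple


def ga_search_ratio(
    fitness: Callable[[float], float],
    low: float = 0.1,
    high: float = 1.0,
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> float:
    """Return the sampling ratio in [low, high] with the best fitness found.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Each individual is one real-valued gene: a candidate sampling ratio.
    population: List[float] = [rng.uniform(low, high) for _ in range(pop_size)]

    def tournament(scored: List[Tuple[float, float]]) -> float:
        # Binary tournament: draw two individuals, keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(r), r) for r in population]
        elite = max(scored)[1]
        # Elitism: carry the best individual over unchanged.
        next_pop = [elite]
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # Blend crossover: the child lies between its two parents.
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2
            # Gaussian mutation, scaled to the width of the search range.
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (high - low))
            # Clip so the ratio stays inside the allowed bounds.
            next_pop.append(min(high, max(low, child)))
        population = next_pop

    return max(population, key=fitness)


def toy_fitness(ratio: float) -> float:
    # Hypothetical stand-in for cross-validated AUC, peaking at ratio = 0.7.
    return 1.0 - (ratio - 0.7) ** 2


if __name__ == "__main__":
    print("best ratio found:", ga_search_ratio(toy_fitness))
```

Because the fitness is passed in as a callable, the same search loop could in principle be reused for oversampling, undersampling, or hybrid sampling by swapping in a fitness function that resamples accordingly and scores the resulting classifier.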

Citations

•Journal Article•10.3390/app13064006

Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks

Geoffrey John McLachlan

- 21 Mar 2023

- Applied Sciences

TL;DR: In this article, the authors investigated the effectiveness of methods based on deep neural networks and convolutional neural networks combined with a variety of well-known imbalanced data solutions, namely oversampling and undersampling.

49

Journal Article•10.1016/j.knosys.2022.109050

A novel Random Forest integrated model for imbalanced data classification problem

Qinghua Xu, +3 more

- 01 May 2022

- Knowledge Based Systems

TL;DR: In this paper, the authors proposed an equilibrium ensemble method (DCI-ISSA) with two novel techniques to overcome the shortcomings of the over-sampling strategy, which increases the training complexity and causes an overfitting problem.

38

•Journal Article•10.1016/j.knosys.2021.107687

Synthetic sampling from small datasets: A modified mega-trend diffusion approach using k-nearest neighbors

Ozgur Mete

- 01 Jan 2022

- Knowledge Based Systems

TL;DR: In this paper, a modified Mega-Trend Diffusion (MTD) approach, kNNMTD, is proposed to address the data generation for only one of the tasks: supervised or unsupervised.

18

•Journal Article•10.1007/S00500-021-06103-7

Hybrid intelligent model for classifying chest X-ray images of COVID-19 patients using genetic algorithm and neutrosophic logic.

Sameh H. Basha, +4 more

- 18 Aug 2021

TL;DR: In this article, the authors proposed a neutrosophic model to diagnose COVID-19 patients based on their chest X-ray images, which can be used for real-time automatic early recognition of COVID-19.

17

•Journal Article•10.32604/cmc.2022.025865

Imbalanced Classification in Diabetics Using Ensembled Machine Learning

Mechiri Sandeep Kumar, +5 more

- 01 Jan 2022

TL;DR: In this paper, an embedded-based machine learning model was proposed that combines the split-vote method and instance duplication to leverage an imbalanced dataset called PIMA Indian and increase the prediction of diabetics.

10

References

•Journal Article•10.1023/A:1010933404324

Random Forests

Leo Breiman

- 01 Oct 2001

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

113.1K

Journal Article•10.1145/1961189.1961199

LIBSVM: A library for support vector machines

Chih-Chung Chang, +1 more

- 06 May 2011

- ACM Transactions on Intelligent Systems ...

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

46.3K

Journal Article•10.1109/4235.996017

A fast and elitist multiobjective genetic algorithm: NSGA-II

Kalyanmoy Deb, +3 more

- 01 Apr 2002

- IEEE Transactions on Evolutionary Comput...

TL;DR: This paper suggests a non-dominated sorting-based MOEA, called NSGA-II (Non-dominated Sorting Genetic Algorithm II), which alleviates all of the above three difficulties, and modifies the definition of dominance in order to solve constrained multi-objective problems efficiently.

44.7K

•Book

Adaptation in natural and artificial systems

John H. Holland

- 01 Jan 1975

TL;DR: Names of founding work in the area of Adaptation and modification, which aims to mimic biological optimization, and some (Non-GA) branches of AI.

40.3K