Journal Article • DOI: 10.1016/J.KNOSYS.2021.106800
An automatic sampling ratio detection method based on genetic algorithm for imbalanced data classification
24
TL;DR: In this article, the authors proposed three algorithms to automatically determine the sampling ratios for oversampling, undersampling and hybrid sampling methods, based on a genetic algorithm, to obtain satisfactory and stable classification performance.
Abstract: Imbalanced data are a common phenomenon in both theoretical research and real-world applications. At a data level, standard classification algorithms cannot effectively learn and make predictions from imbalanced data, and this problem is generally solved by using oversampling, undersampling, or hybrid sampling methods. However, most of the current sampling methods use random sampling ratios, and the resulting classification performance can be undesirable and unstable. To obtain satisfactory and stable classification performance, we proposed three algorithms to automatically determine the sampling ratios for oversampling, undersampling, and hybrid sampling methods, based on a genetic algorithm. Experiments were performed to test the algorithms’ effectiveness by utilizing five widely used standard classification algorithms on 14 different imbalanced datasets using two oversampling, two undersampling, and four hybrid sampling methods. The statistical test results showed that for all five standard classification algorithms, sampling methods that used our proposed algorithms achieved the best classification results. Using area under the receiver operating characteristic curve (AUC) as the evaluation metric, it was demonstrated that the proposed algorithms for automatically determining the sampling ratio outperformed the random sampling ratio.
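The abstract does not spell out the three proposed algorithms, so the following is only a minimal sketch of the general idea: a genetic algorithm searches for an oversampling ratio that maximizes validation AUC. Everything here is an assumption made for illustration, not the authors' method: the function names (`auc`, `oversample`, `knn_scores`, `ga_ratio`), the synthetic one-dimensional data, the k-nearest-neighbour scorer, random duplication as the oversampler, and the GA operators (truncation selection, arithmetic crossover, Gaussian mutation, elitism).

```python
import random

def auc(scores, labels):
    """AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def oversample(X, y, ratio, rng):
    """Duplicate random minority (label 1) rows until minority is about ratio * majority."""
    minority = [x for x, t in zip(X, y) if t == 1]
    n_major = len(y) - len(minority)
    extra = max(0, round(ratio * n_major) - len(minority))
    dup = [rng.choice(minority) for _ in range(extra)]
    return X + dup, y + [1] * extra

def knn_scores(X_train, y_train, X_test, k=5):
    """Score each test point by the share of minority labels among its k nearest neighbours."""
    out = []
    for q in X_test:
        nearest = sorted(zip(X_train, y_train), key=lambda p: abs(p[0] - q))[:k]
        out.append(sum(t for _, t in nearest) / k)
    return out

def fitness(ratio, train, valid, seed=0):
    """Validation AUC after oversampling the training set at the given ratio.
    A fixed seed keeps the fitness deterministic for a given ratio."""
    X, y = oversample(train[0], train[1], ratio, random.Random(seed))
    return auc(knn_scores(X, y, valid[0]), valid[1])

def ga_ratio(fit, rng, pop_size=12, generations=10, low=0.1, high=1.0, sigma=0.1):
    """Hypothetical GA over a single gene (the sampling ratio)."""
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        history.append(fit(ranked[0]))
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = w * a + (1 - w) * b + rng.gauss(0, sigma)  # crossover + mutation
            children.append(min(high, max(low, child)))        # clamp to the bounds
        pop = children
    best = max(pop, key=fit)
    history.append(fit(best))
    return best, history

def make_data(n_major, n_minor, rng):
    """Synthetic 1-D data: majority centred at 0.0, minority centred at 1.5."""
    X = [rng.gauss(0.0, 1.0) for _ in range(n_major)]
    X += [rng.gauss(1.5, 1.0) for _ in range(n_minor)]
    return X, [0] * n_major + [1] * n_minor

rng = random.Random(42)
train = make_data(180, 20, rng)
valid = make_data(54, 6, rng)
best_ratio, history = ga_ratio(lambda r: fitness(r, train, valid), rng)
print(f"best ratio={best_ratio:.2f}, validation AUC={history[-1]:.3f}")
```

Because the best individual is carried over unchanged and the fitness is deterministic, the best validation AUC in `history` never decreases across generations. In practice, the fitness would more likely be averaged over cross-validation folds to reduce overfitting to a single validation split.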
Citations
Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks
TL;DR: In this article, the authors investigated the effectiveness of methods based on deep neural networks and convolutional neural networks combined with a variety of well-known imbalanced-data solutions, namely oversampling and undersampling.
A novel Random Forest integrated model for imbalanced data classification problem
TL;DR: In this paper, the authors proposed an equilibrium ensemble method (DCI-ISSA) with two novel techniques to overcome the shortcomings of the oversampling strategy, which increases training complexity and causes overfitting.
38
Synthetic sampling from small datasets: A modified mega-trend diffusion approach using k-nearest neighbors
TL;DR: In this paper, a modified Mega-Trend Diffusion (MTD) approach, kNNMTD, is proposed to address the data generation for only one of the tasks: supervised or unsupervised.
18
Hybrid intelligent model for classifying chest X-ray images of COVID-19 patients using genetic algorithm and neutrosophic logic.
Sameh H. Basha, Ahmed M. Anter, Aboul Ella Hassanien, Areeg Abdalla
- 18 Aug 2021
TL;DR: In this article, the authors proposed a neutrosophic model to diagnose COVID-19 patients based on their chest X-ray images, which can be used for real-time automatic early recognition of COVID-19.
Imbalanced Classification in Diabetics Using Ensembled Machine Learning
Mechiri Sandeep Kumar, Mohammad Zubair Khan, Sukumar Rajendran, Ayman Noor, A. Stephen Dass, J Prabhu
- 01 Jan 2022
TL;DR: In this paper, an embedded-based machine learning model was proposed that combines the split-vote method and instance duplication to improve the prediction of diabetes on an imbalanced dataset called PIMA Indian.
10
References
Random Forests
Leo Breiman
- 01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
•Journal Article
Scikit-learn: Machine Learning in Python
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Edouard Duchesnay
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
A fast and elitist multiobjective genetic algorithm: NSGA-II
TL;DR: This paper suggests a non-dominated sorting-based MOEA, called NSGA-II (Non-dominated Sorting Genetic Algorithm II), which alleviates all of the above three difficulties, and modifies the definition of dominance in order to solve constrained multi-objective problems efficiently.
•Book
Adaptation in natural and artificial systems
John H. Holland
- 01 Jan 1975
TL;DR: Names of founding work in the area of adaptation and modification, which aims to mimic biological optimization, and some (non-GA) branches of AI.