A Hybrid Sampling SVM Approach to Imbalanced Data Classification
TL;DR: A hybrid sampling SVM approach is proposed that combines an oversampling technique and an undersampling technique to address the imbalanced data classification problem; it generates a balanced training dataset to replace the original imbalanced one.
Abstract: Imbalanced datasets occur frequently in real applications. Resampling is an effective remedy because it produces a relatively balanced class distribution. In this paper, a hybrid sampling SVM approach is proposed that combines an oversampling technique and an undersampling technique to address the imbalanced data classification problem. The proposed approach first uses an undersampling technique to delete majority-class samples that carry less classification information and then applies an oversampling technique to gradually create new positive (minority-class) samples. A balanced training dataset is thus generated to replace the original imbalanced training dataset. Experimental results on real-world datasets show that the proposed approach can identify informative samples and deal with the imbalanced data classification problem.
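The two resampling stages described in the abstract can be illustrated with a minimal Python sketch. It covers only the resampling step, not the SVM itself, and it rests on assumptions that are not from the paper: distance to the nearest minority sample stands in for "classification information" (the abstract does not say how this is measured), new positive samples are made by SMOTE-style interpolation between a minority sample and its nearest minority neighbour, and all function names and the `target_size` parameter are hypothetical.

```python
import math
import random


def nearest_distance(point, others):
    """Euclidean distance from `point` to its closest sample in `others`."""
    return min(math.dist(point, other) for other in others)


def undersample_majority(majority, minority, n_keep):
    """Keep the n_keep majority samples closest to the minority class.

    Assumption: closeness to the other class is a stand-in for
    "classification information"; the paper's criterion may differ.
    """
    ranked = sorted(majority, key=lambda p: nearest_distance(p, minority))
    return ranked[:n_keep]


def oversample_minority(minority, n_new, rng):
    """Create n_new synthetic samples by SMOTE-style interpolation."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest other minority sample to the chosen base sample.
        neighbour = min((p for p in minority if p is not base),
                        key=lambda p: math.dist(base, p))
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def hybrid_resample(majority, minority, target_size, seed=0):
    """Undersample, then oversample, so both classes have target_size samples."""
    if len(minority) < 2 or not len(minority) <= target_size <= len(majority):
        raise ValueError("need >= 2 minority samples and "
                         "len(minority) <= target_size <= len(majority)")
    rng = random.Random(seed)
    kept = undersample_majority(majority, minority, target_size)
    new_points = oversample_minority(minority, target_size - len(minority), rng)
    return kept, minority + new_points


# Toy data: 20 majority samples on a grid, 4 minority samples to the right.
majority = [(float(x), float(y)) for x in range(5) for y in range(4)]
minority = [(5.0, 1.0), (5.5, 2.0), (6.0, 1.5), (5.2, 2.5)]
kept, balanced_minority = hybrid_resample(majority, minority, target_size=10)
print(len(kept), len(balanced_minority))  # prints "10 10"
```

The balanced lists would then replace the original training set when fitting an SVM, for example with scikit-learn's `SVC`. Undersampling runs first here, as in the abstract, so the synthetic samples are created against the already reduced majority class.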
Citations
Handling data irregularities in classification: Foundations, trends, and future challenges
TL;DR: This article provides a bird's eye view of data irregularities, beginning with a taxonomy and characterization of various distribution-based and feature-based irregularities, and discusses the notable and recent approaches that have been taken to make the existing stand-alone as well as ensemble classifiers robust against such irregularities.
198
Near-Bayesian Support Vector Machines for imbalanced data classification with equal or unequal misclassification costs
Shounak Datta,Swagatam Das +1 more
TL;DR: A Near-Bayesian Support Vector Machine (NBSVM) is proposed for such imbalanced classification problems, by combining the philosophies of decision boundary shift and unequal regularization costs.
164
Comparing the Behavior of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise
Prabhjot Kaur,Anjana Gosain +1 more
- 01 Jan 2018
TL;DR: This paper compares the oversampling and undersampling approaches to class imbalance learning in a noisy environment and tries to determine which approach is better in that case.
136
A robust fuzzy least squares twin support vector machine for class imbalance learning
TL;DR: This paper proposes a robust fuzzy least squares twin support vector machine for class imbalance learning, termed RFLSTSVM-CIL, which uses the 2-norm of the slack variables to make the optimization problem strongly convex.
76
A Data-Mining Approach to Identification of Risk Factors in Safety Management Systems
TL;DR: A data-mining approach to incident risk factor identification and analysis using data from the Aviation Safety Reporting System is presented, in an attempt to overcome obstacles related to labor intensive manual identification of risk factors as well as incomplete data.
70
References
•Book
The Nature of Statistical Learning Theory
Vladimir Vapnik
- 01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
46K
SMOTE: synthetic minority over-sampling technique
TL;DR: A method of over-sampling the minority class by creating synthetic minority class examples is presented and evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
A training algorithm for optimal margin classifiers
Bernhard E. Boser,Isabelle Guyon,Vladimir Vapnik +2 more
- 01 Jul 1992
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented; it is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Learning from Imbalanced Data
Haibo He,E.A. Garcia +1 more
TL;DR: A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.
8.2K