Journal Article • DOI: 10.1109/TKDE.2012.232
MWMOTE--Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning
1.1K
TL;DR: A new method, called Majority Weighted Minority Oversampling TEchnique (MWMOTE), is presented for efficiently handling imbalanced learning problems and is better than or comparable with some other existing methods in terms of various assessment metrics.
Abstract: Imbalanced learning problems contain an unequal distribution of data samples among different classes and pose a challenge to any classifier, because the minority class samples become hard to learn. Synthetic oversampling methods address this problem by generating synthetic minority class samples to balance the distribution between the majority and minority classes. This paper identifies that most existing oversampling methods may generate wrong synthetic minority samples in some scenarios and thereby make learning tasks harder. To this end, a new method, called Majority Weighted Minority Oversampling TEchnique (MWMOTE), is presented for efficiently handling imbalanced learning problems. MWMOTE first identifies the hard-to-learn informative minority class samples and assigns them weights according to their Euclidean distance from the nearest majority class samples. It then generates the synthetic samples from the weighted informative minority class samples using a clustering approach. This is done in such a way that all the generated samples lie inside some minority class cluster. MWMOTE has been evaluated extensively on four artificial and 20 real-world data sets. The simulation results show that the method is better than or comparable with some other existing methods in terms of various assessment metrics, such as geometric mean (G-mean) and area under the receiver operating characteristic (ROC) curve, usually known as area under curve (AUC).
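The two stages described in the abstract (weight minority samples by their distance to the nearest majority sample, then interpolate only between minority samples of the same cluster) can be illustrated with a small sketch. This is not the authors' algorithm: the inverse-distance weight, the threshold-based single-linkage grouping, and the names `oversample`, `cluster_minority`, and `threshold` are all simplifying assumptions made for illustration.

```python
import math
import random


def nearest_majority_distance(x, majority):
    """Euclidean distance from x to its nearest majority-class sample."""
    return min(math.dist(x, m) for m in majority)


def cluster_minority(minority, threshold):
    """Assumed stand-in for the paper's clustering step: points closer
    than `threshold` end up in the same cluster (single linkage)."""
    parent = list(range(len(minority)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(minority)):
        for j in range(i + 1, len(minority)):
            if math.dist(minority[i], minority[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(minority))]


def oversample(minority, majority, n_new, threshold, seed=0):
    """Generate n_new synthetic minority samples.

    Seeds are drawn with probability proportional to an assumed weight
    (inverse distance to the nearest majority sample), and each seed is
    interpolated with a partner from its own cluster only.
    """
    rng = random.Random(seed)
    weights = [1.0 / (nearest_majority_distance(x, majority) + 1e-12)
               for x in minority]
    clusters = cluster_minority(minority, threshold)
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(minority)), weights=weights, k=1)[0]
        mates = [j for j, c in enumerate(clusters) if c == clusters[i]]
        j = rng.choice(mates)
        alpha = rng.random()
        synthetic.append(tuple(a + alpha * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
majority = [(5.0, 5.0), (3.0, 3.0)]
new_points = oversample(minority, majority, n_new=20, threshold=2.0)
```

Because both endpoints of every interpolation belong to the same cluster, no synthetic point in this toy example falls in the gap between the two minority groups, which is the property the abstract emphasizes.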
Citations
An efficient novel approach for iris recognition based on stylometric features and machine learning techniques
Saša Adamović, Vladislav Miškovic, Nemanja Macek, Milan Milosavljević, Marko Šarac, Muzafer Saračević, Milan Gnjatović
TL;DR: A novel iris recognition system based on machine learning methods to reach virtually perfect classification accuracy, eliminate false acceptance rates, and cancel the possibility of recreating an iris image from a generated template.
Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique
Gayeong Eom, Haewon Byeon
TL;DR: The study explored optimal oversampling techniques for imbalanced data using generative adversarial networks (GANs) and synthetic minority over-sampling technique (SMOTE) on a medical dataset. CGAN and CTGAN showed better classification performance than traditional oversampling techniques.
A Hybrid Cluster-Borderline SMOTE Method for Imbalanced Data of Rock Groutability Classification
Li Kai, Ren Bingyu, Guan Tao, Wang Jiajun, Yu Jia, Wang Kexiang, Huang Jicun
TL;DR: A hybrid cluster-borderline SMOTE method (HCBS) is proposed to address imbalanced data in rock groutability classification, improving classification precision by reducing redundant samples and noise labels, and outperforming existing methods with optimized random forest and grey wolf optimization.
Synthetic Sampling Approach Based on Model-Based Clustering for Imbalanced Data
Shaukat Ali Shahee, Usha Ananthakumar
TL;DR: This paper proposes a synthetic sampling approach using model-based clustering to address both between-class and within-class imbalance in datasets, improving classifier performance by identifying and oversampling sub-clusters with varying example counts.
Imbalanced fault diagnosis based on sample-weighted counterfactual
Wei Zheng, Chunfei Gu, Hao Pan, Xin Fan, Sixiang Fu, Xiaoheng Ji
References
SMOTE: synthetic minority over-sampling technique
TL;DR: This article presents a method of over-sampling the minority class by creating synthetic minority class examples; the method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
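The idea of creating synthetic minority examples can be sketched as interpolation between a minority sample and one of its nearest minority neighbours. The sketch below is a minimal illustration under stated assumptions, not the reference implementation: the function name `smote_sketch`, the parameter `k`, and the uniform choice of seed samples are illustrative.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating a randomly chosen
    minority sample with one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(x, neighbour)))
    return synthetic


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
generated = smote_sketch(points, n_new=10, k=2)
```

Each generated point lies on a line segment between two existing minority samples, so it stays inside the bounding box of the minority class in this example.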
•Book
C4.5: Programs for Machine Learning
J. Ross Quinlan
- 15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
27.2K
An introduction to ROC analysis
TL;DR: The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.
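As a small illustration of the quantity that ROC analysis summarizes, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `auc_from_scores` and the sample scores below are assumptions made for this sketch.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random
    negative (pairwise comparison; ties count as half a win)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


perfect = auc_from_scores([0.9, 0.8], [0.1, 0.2])  # every positive ranked first
mixed = auc_from_scores([0.9, 0.4], [0.6, 0.1])    # one pair out of four misordered
```

Because the value depends only on how examples are ranked, not on a decision threshold or the class ratio, it is a common choice for imbalanced problems.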
21.3K
Induction of Decision Trees
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Yoav Freund, Robert E. Schapire
- 01 Aug 1997
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.