Multiple Optimized Ensemble Learning for High-Dimensional Imbalanced Credit Scoring Datasets
03 Apr 2023
TL;DR: In this paper, a novel multiple-optimized ensemble learning (MOEL) method is proposed to build a reliable and accurate credit scoring model; it first generates multiple diverse optimized subsets from various weighted random forests (WRFs) and then selects the more effective and relevant features from each subset.
Abstract: Banks determine the financial credibility, or credit score, of applicants before allocating loans to them. In recent decades, several machine learning algorithms have been developed to automate this decision-making process by constructing effective credit scoring models. However, high-dimensional and imbalanced credit datasets significantly degrade the models' classification ability. To overcome these issues, this study proposes a novel multiple-optimized ensemble learning (MOEL) method to build a reliable and accurate credit scoring model. MOEL first generates multiple diverse optimized subsets from various weighted random forests (WRFs), and from each subset the more effective and relevant features are selected. A new evaluation measure is then applied to each subset to determine which subsets are more effectively optimized for the ensemble learning process. A novel oversampling strategy is then applied to the subsets to provide balanced subsets for the base classifier, which lessens the detrimental effects of imbalanced datasets. Finally, to further improve the performance of the base classifier, a stacking-based ensemble method is applied to the balanced subsets. Six credit-scoring datasets were used to evaluate the model's efficacy using the F1 score and G-mean metrics. The empirical results on these datasets demonstrate that MOEL achieves the best F1 score and G-mean values, with mean rankings of 1.5 and 1.333, respectively.
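The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions (F1 as the harmonic mean of precision and recall, G-mean as the square root of sensitivity times specificity) and binary labels with 1 as the minority class; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy labels: 2 minority cases (1) among 10 applicants.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(f1(y_true, y_pred))      # 0.5
print(g_mean(y_true, y_pred))  # about 0.661
```

Both metrics drop to zero for a classifier that predicts only the majority class, even though such a classifier would reach 80% accuracy on the toy labels above, which is why they suit imbalanced data better than accuracy.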
References
SMOTE: synthetic minority over-sampling technique
TL;DR: This article proposes over-sampling the minority class by creating synthetic minority class examples; the method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
ADASYN: Adaptive synthetic sampling approach for imbalanced learning
Haibo He, Yang Bai, E.A. Garcia, Shutao Li
- 01 Jun 2008
TL;DR: Simulation analyses on several machine learning data sets show the effectiveness of the ADASYN sampling approach across five evaluation metrics.
Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning
Hui Han, Wenyuan Wang, Binghuan Mao
- 23 Aug 2005
TL;DR: Two new minority over-sampling methods are presented, borderline-SMOTE1 and borderline-SMOTE2, in which only the minority examples near the borderline are over-sampled; they achieve better TP rate and F-value than SMOTE and random over-sampling methods.
A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches
Mikel Galar, Alberto Fernández, Edurne Barrenechea, Humberto Bustince, Francisco Herrera
- 01 Jul 2012
TL;DR: A taxonomy of ensemble-based methods for addressing class imbalance is proposed, in which each proposal is categorized by the inner ensemble methodology on which it is based, and a thorough empirical comparison of the most significant published approaches is carried out to show whether any of them makes a difference.
Exploratory Undersampling for Class-Imbalance Learning
Xu-Ying Liu, Jianxin Wu, Zhi-Hua Zhou
- 01 Apr 2009
TL;DR: Experiments show that the proposed algorithms, BalanceCascade and EasyEnsemble, have better AUC scores than many existing class-imbalance learning methods and have approximately the same training time as that of under-sampling, which trains significantly faster than other methods.