Multiple Optimized Ensemble Learning for High-Dimensional Imbalanced Credit Scoring Datasets

doi:10.21203/rs.3.rs-2757867/v1

Open Access • Posted Content


03 Apr 2023

TL;DR: This paper proposes a novel multiple-optimized ensemble learning (MOEL) method for building a reliable and accurate credit scoring model. MOEL first generates multiple diverse optimized subsets from various weighted random forests (WRFs), then selects the more effective and relevant features from each subset.

Abstract: Banks determine the financial credibility, or credit score, of applicants before allocating loans to them. In recent decades, several machine learning algorithms have been developed to automate this decision-making process by constructing effective credit scoring models. However, high-dimensional and imbalanced credit datasets significantly degrade the models' classification ability. To overcome these issues, this study proposes a novel multiple-optimized ensemble learning (MOEL) method for building a reliable and accurate credit scoring model. MOEL first generates multiple diverse optimized subsets from various weighted random forests (WRFs), and from each subset the more effective and relevant features are selected. A new evaluation measure is then applied to each subset to determine which subsets are more effectively optimized for the ensemble learning process. The selected subsets are then passed through a novel oversampling strategy that provides balanced subsets for the base classifier, which lessens the detrimental effects of class imbalance. Finally, to further improve the performance of the base classifier, a stacking-based ensemble method is applied to the balanced subsets. Six credit scoring datasets were used to evaluate the model's efficacy with the F1 score and G-mean metrics. The empirical results on these datasets demonstrate that MOEL achieves the best F1 score and G-mean, with mean rankings of 1.5 and 1.333, respectively.
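The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual binary definitions: F1 as the harmonic mean of precision and recall, and G-mean as the geometric mean of sensitivity and specificity. The abstract does not give the authors' exact formulation, so the function names, the choice of label `1` as the positive (minority) class, and the toy labels below are all illustrative assumptions rather than the paper's implementation.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem.

    `positive` marks the class treated as positive; using 1 for the
    minority class is an assumption made for this sketch.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when the denominator is zero."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy imbalanced sample: 2 positives out of 10 (hypothetical labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
one_hit_one_false_alarm = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(f1_score(y_true, always_negative), g_mean(y_true, always_negative))
print(f1_score(y_true, one_hit_one_false_alarm),
      g_mean(y_true, one_hit_one_false_alarm))
```

A classifier that always predicts the majority class is correct on 8 of 10 samples here, yet both metrics are zero because it never finds a positive. This is why such metrics are commonly preferred over plain accuracy on imbalanced data.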

