Comparing Different Resampling Methods in Predicting Students’ Performance Using Machine Learning Techniques

doi:10.1109/ACCESS.2020.2986809

Open Access • Journal Article

Ramin Ghorbani and one co-author

- 13 Apr 2020

- IEEE Access

- Vol. 8, pp. 67899-67911

TL;DR: This paper compares various resampling techniques for handling the imbalanced data problem when predicting students’ performance on two different datasets; the Random Forest classifier achieved the best result among all models when SVM-SMOTE was used as the resampling method.
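
SVM-SMOTE and the other SMOTE variants compared here build on one shared step: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbors. The pure-Python sketch below illustrates only that plain SMOTE interpolation. It is not the authors' code and not SVM-SMOTE, which additionally uses the support vectors of an SVM to decide where new samples are generated. The function name `smote`, the default `k=5`, and the random choice of base sample (the original SMOTE formulation instead generates a fixed number of samples per minority point) are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_synthetic minority samples by interpolating between minority neighbors.

    Hypothetical helper for illustration; plain SMOTE only, not SVM-SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other minority points
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        # Pick a base minority sample at random (a simplification, see the lead-in).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Its k nearest minority-class neighbors by Euclidean distance, excluding itself.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # New point = base + gap * (neighbor - base), with gap drawn uniformly from [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbor)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    for point in smote(minority, n_synthetic=4, k=2):
        print(point)
```

Each returned point lies between two of the given minority points, so the class is enlarged without simply duplicating rows (which is what a random over-sampler does). Whichever resampler is used, it is normally applied to the training split only, so that synthetic points derived from test samples cannot leak into the evaluation.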

Abstract: With the advancement of technology, predicting students' performance has become one of the most beneficial and essential research topics. Data mining is extremely helpful in education, especially for analyzing students' performance. However, predicting students' performance has become a serious challenge because datasets in this field are imbalanced, and the literature lacks a comparison of different resampling methods. This paper compares various resampling techniques, namely Borderline SMOTE, Random Over Sampler, SMOTE, SMOTE-ENN, SVM-SMOTE, and SMOTE-Tomek, for handling the imbalanced data problem while predicting students' performance on two different datasets. Moreover, the difference between multiclass and binary classification and the structure of the features are examined. To better assess how well the resampling methods solve the imbalance problem, this paper uses various machine learning classifiers, including Random Forest, K-Nearest Neighbors, Artificial Neural Network, XGBoost, Support Vector Machine (Radial Basis Function kernel), Decision Tree, Logistic Regression, and Naive Bayes. Furthermore, random hold-out and shuffled 5-fold cross-validation are used as model validation techniques. The results, measured with several evaluation metrics, indicate that fewer classes and nominal features lead models to better performance. Classifiers also perform poorly on imbalanced data, so solving this problem is necessary, and their performance improves on balanced datasets. Additionally, the results of the Friedman test, a statistical significance test, confirm that SVM-SMOTE is more efficient than the other resampling methods. Finally, the Random Forest classifier achieved the best result among all models when SVM-SMOTE was used as the resampling method.
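
The Friedman test ranks the competing methods separately within each block of results (for example, each dataset, fold, or classifier) and checks whether their average ranks differ more than chance would allow. With $N$ blocks, $k$ methods, and $R_j$ the average rank of method $j$, the statistic is $\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$, compared against a chi-square distribution with $k-1$ degrees of freedom. The pure-Python sketch below computes it without a tie correction; the function names, the table layout, and the example scores are hypothetical, and the abstract does not say how the authors arranged their blocks.

```python
from typing import List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank the methods within one block: rank 1 = highest score; ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Grow the group of methods tied with the one at position `start`.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_statistic(table: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic (no tie correction).

    Rows are blocks (N datasets, folds, ...); columns are the k methods being compared.
    """
    if not table:
        raise ValueError("need at least one block")
    n, k = len(table), len(table[0])
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("every row must score the same k >= 2 methods")
    mean_ranks = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            mean_ranks[j] += rank / n
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)


if __name__ == "__main__":
    # Hypothetical scores: 4 blocks x 3 resampling methods (higher is better).
    scores = [
        [0.91, 0.88, 0.85],
        [0.87, 0.86, 0.80],
        [0.93, 0.90, 0.89],
        [0.89, 0.84, 0.83],
    ]
    print(friedman_statistic(scores))  # 8.0: the same method ranks first in every block
```

In the example the first method wins every block, giving the maximum possible value $N(k-1) = 8$. SciPy's `scipy.stats.friedmanchisquare` takes one sequence of scores per method, returns the statistic together with its p-value, and corrects for ties, so its result can differ slightly from this sketch when tied scores occur.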
