Anomaly detection using unsupervised machine learning algorithms: A simulation study

doi:10.1016/j.sciaf.2024.e02386

Journal Article•10.1016/j.sciaf.2024.e02386


Edmund Fosu Agyemang

- 01 Sep 2024

- Scientific African

- pp e02386-e02386

7

About: This article was published in Scientific African on 01 Sep 2024. It focuses on the topics: Anomaly detection & Computer science.


Citations

Journal Article•10.62411/jcta.11638

Outlier Detection Using Gaussian Mixture Model Clustering to Optimize XGBoost for Credit Approval Prediction

De Rosal Ignatius Moses Setiadi, +5 more

- 01 Nov 2024

- Journal of Computing Theories and Applic...

TL;DR: This study proposes a credit approval prediction method combining Gaussian Mixture Model (GMM) outlier detection with XGBoost, achieving 95.493% accuracy, 91.650% recall, and 95.145% AUC, outperforming other models on an imbalanced dataset.


2

Journal Article•10.1016/j.compbiomed.2024.109367

A Gaussian Process Regression and Wavelet Transform Time Series approaches to modeling Influenza A

Edmund Fosu Agyemang

- 16 Nov 2024

- Computers in Biology and Medicine

1

Journal Article•10.1186/s40537-025-01248-w

Scalable unsupervised labeling with SHAP feature selection for fraud detection in imbalanced data

Mary Anne Walauskis, +1 more

- 22 Oct 2025

- Journal of Big Data

Abstract: There is a growing need for labeled data, yet manual annotation is costly, error-prone, and often infeasible in privacy-sensitive, highly imbalanced domains such as fraud detection. We introduce a fully unsupervised framework that combines unsupervised SHapley Additive exPlanations (SHAP) feature selection with our novel unsupervised labeling method. We apply unsupervised SHAP to the Kaggle Credit Card Fraud Detection and Medicare Part D datasets to produce high-impact feature subsets, and then label the datasets with our unsupervised labeling approach. To effectively evaluate the labels generated by our novel methodology, we apply a baseline unsupervised learner, Isolation Forest (IF), to both the original datasets and their subsets. We calculate the Matthews Correlation Coefficient (MCC), Jaccard Index (JI), Precision, Recall, and F1-score by comparing our generated labels against the ground truth labels. Note that the ground truth labels were used solely for evaluation. Our empirical results surpass the results obtained with the full feature dataset and baseline. By improving label quality while reducing computational complexity and preserving privacy, our approach offers a practical solution for learning from unlabeled, severely imbalanced data.
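As an illustration of the evaluation step described in this abstract, the five reported metrics can all be derived from one binary confusion matrix. The sketch below is not the authors' code: the function names `confusion_counts` and `label_quality`, the 0/1 label encoding (1 = fraud), and the toy label vectors are assumptions made for the example.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 marks the anomaly class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def label_quality(y_true, y_pred):
    """Compare generated labels with ground truth (used for evaluation only)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Jaccard index of the positive class: overlap over union of flagged sets.
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    # MCC uses all four cells, so it stays informative under heavy imbalance.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "jaccard": jaccard, "mcc": mcc}


# Hypothetical toy data: 10 transactions, 2 of them truly fraudulent.
ground_truth = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
generated = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = label_quality(ground_truth, generated)
print(scores)
```

Accuracy is left out on purpose: on severely imbalanced data a labeler that never flags anything would still score highly on it, which is why metrics such as MCC and the Jaccard index are the more telling choice here.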


Journal Article•10.1109/jsen.2025.3607873

A Robust Anomaly Detection Framework in Industrial Internet of Things

Rubina Riaz, +4 more

- 01 Jan 2025

- IEEE sensors journal

TL;DR: This study presents a robust anomaly detection framework, iForest-WGANs, combining Wasserstein GANs with isolation forest algorithms to address missing data in IIoT, achieving 97.2% detection accuracy and 93.9% recall, outperforming conventional benchmarks by 5-12%.


Journal Article•10.64534/commer.2025.511

Unsupervised Machine Learning Based Anomaly Detection in High Frequency Data: Evidence from Cryptocurrency Market

Muhammad Nouman Latif, +2 more

- 30 Sep 2025

- Pakistan Journal of Commerce and Social ...

Abstract: The rapid integration of cryptocurrencies into the global financial ecosystem has introduced unprecedented challenges in market surveillance, risk management, and anomaly detection. While conventional statistical models such as ARIMA (Autoregressive Integrated Moving Average) and GARCH (Generalized Autoregressive Conditional Heteroscedasticity) have been widely used for anomaly detection, their reliance on assumptions of normality and stationarity often fails to capture the complexities of high-frequency, non-linear cryptocurrency trading. Furthermore, traditional risk metrics including down-to-up volatility, negative conditional skewness, and relative frequency may overlook short-term anomalies due to data aggregation limitations. To address these issues, this paper proposes a machine-learning model for detecting anomalies in cryptocurrency markets using Jupyter Notebook. We compare four advanced unsupervised machine learning models, namely Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest (iForest), One-Class Support Vector Machine (OC-SVM), and Local Outlier Factor (LOF), for anomaly detection using Monte Carlo simulations. The findings indicate that DBSCAN has the highest precision (79.7%) with the fewest false positives, making it ideal for supervisory monitoring. However, the high false positive rates of OC-SVM and Isolation Forest limit their use. Using data on six well-known cryptocurrencies at three different temporal resolutions (daily, hourly, and 15-minute), the performance of these four unsupervised learning techniques was also examined, confirming that the anomalies identified by DBSCAN are consistent with those of the other three methods. Additionally, for robustness of results, we use UpSet plots to show the shared anomalies found across the three unsupervised learning methods. The number of anomalies also depends on the volatility and time interval of the cryptocurrencies: more volatile, higher-frequency data yield more anomalies. The study presents a sound methodological approach for facilitating financial monitoring and mitigating risks in the cryptocurrency market, and provides useful information for market players, analysts, and policymakers. These results emphasize the importance of choosing algorithms based on specific surveillance targets to promote greater stability in digital asset environments.


References

Journal Article•10.1145/2133360.2133363

Isolation-Based Anomaly Detection

Fei Tony Liu, +2 more

- 01 Mar 2012

- ACM Transactions on Knowledge Discovery ...

TL;DR: This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure---fundamentally different from all existing methods.


1.9K
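The isolation idea summarized in this entry can be sketched in a few lines of plain Python: points are split recursively on a random feature at a random threshold, and points that end up alone after only a few splits receive a high anomaly score. This is a simplified illustration rather than the reference implementation. The helper names (`c_factor`, `build_tree`, `path_length`, `anomaly_scores`), the parameter defaults, and the toy data are assumptions made for the example; the normalization constant and the score 2^(-E[h]/c(n)) follow the commonly cited iForest formulation.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all values equal on this feature, so no split is possible
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(point, node, depth=0):
    """Depth at which the point lands, adjusted for unsplit leaf contents."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1] per point; higher means easier to isolate."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    scores = []
    for p in points:
        mean_depth = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Hypothetical toy data: a Gaussian cluster plus one distant point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
data.append((10.0, 10.0))
scores = anomaly_scores(data)
print(scores.index(max(scores)))  # index of the most anomalous point
```

No distance or density is computed anywhere in the sketch: the score depends only on how many random splits are needed before a point sits alone, which is the property the entry above highlights.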

Journal Article•10.1016/J.INS.2014.03.128

Numerical solution of systems of second-order boundary value problems using continuous genetic algorithm

Omar Abu Arqub, +1 more

- 20 Sep 2014

- Information Sciences

TL;DR: The numerical results show that the proposed continuous genetic algorithm is a robust and accurate procedure for solving systems of second-order boundary value problems and the obtained accuracy for the solutions using CGA is much better than the results obtained using some modern methods.


515

Journal Article•10.1613/JAIR.3623

Toward supervised anomaly detection

Nico Görnitz, +3 more

- 01 Jan 2013

- Journal of Artificial Intelligence Resea...

TL;DR: It is argued that semi-supervised anomaly detection needs to ground on the unsupervised learning paradigm and devise a novel algorithm that meets this requirement and it is shown that the optimization problem has a convex equivalent under relatively mild assumptions.


428

Journal Article•10.1109/JIOT.2020.3011726

Deep Anomaly Detection for Time-Series Data in Industrial IoT: A Communication-Efficient On-Device Federated Learning Approach

Yi Liu, +6 more

- 15 Apr 2021

- IEEE Internet of Things Journal

TL;DR: In this article, the authors proposed an attention mechanism-based convolutional neural network-long short-term memory (AMCNN-LSTM) model to accurately detect anomalies.


373

Journal Article•10.1186/S40537-020-00320-X

A comprehensive survey of anomaly detection techniques for high dimensional big data

Srikanth Thudumu, +3 more

- 01 Dec 2020

- Journal of Big Data

TL;DR: This survey aims to document the state of anomaly detection in high dimensional big data by representing the unique challenges using a triangular model of vertices: the problem, techniques/algorithms, and tools (big data applications/frameworks).


347

...
