Journal Article (DOI: 10.1016/j.sciaf.2024.e02386)
Anomaly detection using unsupervised machine learning algorithms: A simulation study
Edmund Fosu Agyemang
About: This article was published in Scientific African on 1 September 2024. Topics: anomaly detection and computer science.
Citations
Outlier Detection Using Gaussian Mixture Model Clustering to Optimize XGBoost for Credit Approval Prediction
De Rosal Ignatius Moses Setiadi, Ahmad Rofiqul Muslikh, Syahroni Wahyu Iriananda, Warto Warto, Jutono Gondohanindijo, Arnold Adimabua Ojugo, et al.
TL;DR: This study proposes a credit approval prediction method combining Gaussian Mixture Model (GMM) outlier detection with XGBoost, achieving 95.493% accuracy, 91.650% recall, and 95.145% AUC, outperforming other models on an imbalanced dataset.
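The GMM outlier-filtering idea summarized above (fit a mixture, then treat low-likelihood points as outliers) can be sketched in plain Python. This is a minimal, hypothetical 1-D illustration, not the authors' pipeline: the two-component mixture, the quantile initialization, the 100 EM iterations, the variance floor and the `contamination` fraction are all assumptions, and the downstream XGBoost classifier is omitted.

```python
import math

def log_normal(x, mu, var):
    """Log-density of the 1-D Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def component_log_terms(x, weights, mus, vars_):
    """log(w_j * N(x; mu_j, var_j)) for every component j."""
    return [math.log(w) + log_normal(x, m, v) for w, m, v in zip(weights, mus, vars_)]

def logsumexp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def fit_gmm_1d(xs, k=2, iters=100, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture with plain EM (assumed settings)."""
    n = len(xs)
    ordered = sorted(xs)
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialization
    mean = sum(xs) / n
    vars_ = [max(sum((x - mean) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            terms = component_log_terms(x, weights, mus, vars_)
            total = logsumexp(terms)
            resp.append([math.exp(t - total) for t in terms])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, var_floor)
    return weights, mus, vars_

def gmm_outlier_flags(xs, k=2, contamination=0.05):
    """Flag the round(contamination * n) points with the lowest mixture log-likelihood."""
    weights, mus, vars_ = fit_gmm_1d(xs, k)
    scores = [logsumexp(component_log_terms(x, weights, mus, vars_)) for x in xs]
    n_out = max(1, round(contamination * len(xs)))
    cutoff = sorted(scores)[n_out - 1]
    return [s <= cutoff for s in scores]

# Usage: two tight clusters plus one point sitting between them.
cluster_a = [0.1 * i for i in range(-10, 11)]       # values around 0
cluster_b = [10 + 0.1 * i for i in range(-10, 11)]  # values around 10
data = cluster_a + [5.0] + cluster_b
flags = gmm_outlier_flags(data, k=2, contamination=0.02)
print([x for x, f in zip(data, flags) if f])  # [5.0]
```

In this toy example the flagged value 5.0 equals the overall sample mean, so a single Gaussian would treat it as the most typical point; the mixture instead places it in the low-density valley between the two components.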
Scalable unsupervised labeling with SHAP feature selection for fraud detection in imbalanced data
Mary Anne Walauskis, Taghi M. Khoshgoftaar, et al.
Abstract: There is a growing need for labeled data, yet manual annotation is costly, error-prone, and often infeasible in privacy-sensitive, highly imbalanced domains such as fraud detection. We introduce a fully unsupervised framework that combines unsupervised SHapley Additive exPlanations (SHAP) feature selection with our novel unsupervised labeling method. We apply unsupervised SHAP to the Kaggle Credit Card Fraud Detection and Medicare Part D datasets to produce high-impact feature subsets, and then label the datasets with our unsupervised labeling approach. To evaluate the labels generated by our methodology, we apply a baseline unsupervised learner, Isolation Forest (IF), to both the original datasets and their subsets. We calculate the Matthews Correlation Coefficient (MCC), Jaccard Index (JI), precision, recall, and F1-score by comparing our generated labels against the ground-truth labels. Note that the ground-truth labels were used solely for evaluation. Our empirical results surpass those obtained with the full-feature dataset and the baseline. By improving label quality while reducing computational complexity and preserving privacy, our approach offers a practical solution for learning from unlabeled, severely imbalanced data.
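The evaluation metrics listed in this abstract can all be derived from the four confusion-matrix counts. The following is a minimal stdlib sketch, assuming binary labels where 1 marks fraud; the function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are our own choices, not the authors' code.

```python
import math

def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = fraud/anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Jaccard index of the predicted and true positive sets: TP / (TP + FP + FN).
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    # Matthews Correlation Coefficient uses all four counts.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "jaccard": jaccard, "mcc": mcc}

# Usage: 10 hypothetical transactions, 3 of them truly fraudulent.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))  # TP=2, TN=6, FP=1, FN=1, so MCC = 11/21
```

Because MCC draws on true negatives as well as the positive-class counts, it is a common choice alongside precision and recall when the positive class is rare.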
A Robust Anomaly Detection Framework in Industrial Internet of Things
Rubina Riaz, Guangjie Han, Kamran Shaukat, Naimat Ullah Khan, Lei Wang, et al.
TL;DR: This study presents a robust anomaly detection framework, iForest-WGANs, combining Wasserstein GANs with isolation forest algorithms to address missing data in IIoT, achieving 97.2% detection accuracy and 93.9% recall, outperforming conventional benchmarks by 5-12%.
Unsupervised Machine Learning Based Anomaly Detection in High Frequency Data: Evidence from Cryptocurrency Market
Muhammad Nouman Latif, Muhittin Kaplan, Asad ul Islam Khan, et al.
Abstract: The rapid integration of cryptocurrencies into the global financial ecosystem has introduced unprecedented challenges in market surveillance, risk management, and anomaly detection. While conventional statistical models such as ARIMA (Autoregressive Integrated Moving Average) and GARCH (Generalized Autoregressive Conditional Heteroscedasticity) have been widely used for anomaly detection, their reliance on assumptions of normality and stationarity often fails to capture the complexities of high-frequency, non-linear cryptocurrency trading. Furthermore, traditional risk metrics, including down-to-up volatility, negative conditional skewness, and relative frequency, may overlook short-term anomalies because of data-aggregation limitations. To address these issues, this paper proposes a machine-learning model for detecting anomalies in cryptocurrency markets, implemented in Jupyter Notebook. We compare four advanced unsupervised machine learning models, i.e., Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest (iForest), One-Class Support Vector Machine (OC-SVM), and Local Outlier Factor (LOF), for anomaly detection using Monte Carlo simulations. The findings indicate that DBSCAN has the highest precision (79.7%) with the fewest false positives, making it ideal for supervisory monitoring, whereas the high false-positive rates of OC-SVM and Isolation Forest limit their use. Using data on six well-known cryptocurrencies at three temporal resolutions (daily, hourly, and 15-minute), we also examined the performance of these four unsupervised learning techniques and confirmed that the anomalies identified by DBSCAN are consistent with those found by the other three methods. Additionally, to check the robustness of the results, we use UpSet plots to show the anomalies shared across the three unsupervised learning methods.
The number of anomalies also depends on the volatility and time interval of each cryptocurrency: more volatile or higher-frequency series yield more anomalies. The study presents a sound methodological approach for facilitating financial monitoring and mitigating risks in cryptocurrency markets, and provides useful information for market players, analysts, and policymakers. These results emphasize the importance of choosing algorithms based on specific surveillance targets to promote greater stability in digital asset environments.
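When DBSCAN is used for anomaly detection, as in the comparison above, the anomalies are the points it labels as noise. A point is a core point if at least `min_samples` points (itself included) lie within distance `eps`; a non-core point within `eps` of a core point is a border point, and every remaining point is noise. The sketch below computes only that noise set, in O(n²) stdlib Python; the one-dimensional return values, `eps=0.03` and `min_samples=3` are illustrative assumptions, not the study's data or settings.

```python
import math

def dbscan_noise(points, eps, min_samples):
    """Indices of the points DBSCAN would label as noise.

    A point is core if at least `min_samples` points (itself included) lie
    within distance `eps`. Non-core points within `eps` of a core point are
    border points; every remaining point is noise.
    """
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_samples for nb in neighbors]
    return [i for i in range(n)
            if not core[i] and not any(core[j] for j in neighbors[i])]

# Usage: hypothetical one-dimensional returns with a single spike.
returns = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02, -0.015, 0.25, 0.005]
points = [(r,) for r in returns]
print(dbscan_noise(points, eps=0.03, min_samples=3))  # [7]
```

Full DBSCAN also expands clusters from core points, but that step does not change which points end up as noise, so it is skipped here. scikit-learn's `DBSCAN` exposes the same `eps` and `min_samples` parameters and marks noise with the label -1.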
References
Isolation-Based Anomaly Detection
TL;DR: This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure, an approach fundamentally different from all existing methods.
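The isolation idea summarized above can be made concrete with a small pure-Python sketch that follows the recipe described for iForest. Each tree is grown on a random subsample of size ψ by choosing a random feature and a random split value between that feature's minimum and maximum, and tree height is capped at ⌈log2 ψ⌉. A point's path length is the number of edges to its leaf plus c(size), where c(n) = 2H(n-1) - 2(n-1)/n (with H(i) ≈ ln i + 0.5772156649) is the average path length of an unsuccessful binary-search-tree search. The score is s = 2^(-E[h(x)]/c(ψ)): values near 1 suggest anomalies, and values well below 0.5 suggest normal points. The defaults (100 trees, ψ = 256) follow the values suggested for iForest, but this is an illustrative sketch, not the reference or scikit-learn implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_itree(data, depth, max_depth, rng):
    """Grow an isolation tree; each leaf stores how many points reached it."""
    if depth >= max_depth or len(data) <= 1:
        return ("leaf", len(data))
    q = rng.randrange(len(data[0]))              # random feature index
    lo = min(p[q] for p in data)
    hi = max(p[q] for p in data)
    if lo == hi:                                 # identical values cannot be split
        return ("leaf", len(data))
    split = rng.uniform(lo, hi)                  # random split value between min and max
    left = [p for p in data if p[q] < split]
    right = [p for p in data if p[q] >= split]
    return ("node", q, split,
            build_itree(left, depth + 1, max_depth, rng),
            build_itree(right, depth + 1, max_depth, rng))

def path_length(x, tree, depth=0):
    """Edges traversed to x's leaf, plus c(size) for the unexpanded leaf."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, q, split, left, right = tree
    return path_length(x, left if x[q] < split else right, depth + 1)

def iforest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Score every point as 2 ** (-mean path length / c(psi)); assumes len(data) >= 2."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))        # height limit ceil(log2 psi)
    trees = [build_itree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / c(psi)))
    return scores

# Usage: twenty closely spaced points and one distant point.
data = [(0.1 * i,) for i in range(20)] + [(10.0,)]
scores = iforest_scores(data)
print(max(range(len(data)), key=scores.__getitem__))  # index of the highest score
```

The distant point is usually separated by the first random split, so its average path is short and its score is the highest; no distance or density computation is involved.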
Numerical solution of systems of second-order boundary value problems using continuous genetic algorithm
TL;DR: The numerical results show that the proposed continuous genetic algorithm is a robust and accurate procedure for solving systems of second-order boundary value problems and the obtained accuracy for the solutions using CGA is much better than the results obtained using some modern methods.
Toward supervised anomaly detection
TL;DR: It is argued that semi-supervised anomaly detection needs to ground on the unsupervised learning paradigm and devise a novel algorithm that meets this requirement and it is shown that the optimization problem has a convex equivalent under relatively mild assumptions.
Deep Anomaly Detection for Time-Series Data in Industrial IoT: A Communication-Efficient On-Device Federated Learning Approach
TL;DR: In this article, the authors propose an attention-mechanism-based convolutional neural network with long short-term memory (AMCNN-LSTM) model to detect anomalies accurately.
A comprehensive survey of anomaly detection techniques for high dimensional big data
TL;DR: This survey aims to document the state of anomaly detection in high dimensional big data by representing the unique challenges using a triangular model of vertices: the problem, techniques/algorithms, and tools (big data applications/frameworks).