Efficient Distributed Preprocessing Model for Machine Learning-Based Anomaly Detection over Large-Scale Cybersecurity Datasets
Xavier Larriva-Novo, Mario Vega-Barbas, Víctor A. Villagrá, Diego Rivera, Manuel Alvarez-Campana, Julio Berrocal +5 more
TL;DR: The paper presents a new data preprocessing model based on a novel distributed computing architecture aimed at large-scale datasets such as UGR’16, and shows the suitability of decision tree algorithms, compared with a multilayer perceptron neural network, for training a machine learning model on a large dataset.
Abstract: New computational and technological paradigms that currently guide developments in the information society, e.g., the Internet of Things, pervasive technology, or Ubicomp, favor the appearance of new intrusion vectors that can directly affect people’s daily lives. Together with advances in the techniques and methods used to develop new cyber-attacks, this exponentially increases the number of cyber threats affecting the information society. As a result, developing and improving technology that helps cybersecurity experts prevent and detect attacks has become a fundamental pillar of the field. Specifically, intrusion detection systems are now a fundamental tool in the provision of services through the internet. However, these systems have certain limitations, e.g., false positives and real-time analytics, which require their operation to be supervised. It is therefore necessary to offer architectures and systems that favor efficient analysis of the data handled by these tools. To that end, this paper presents a new model of data preprocessing based on a novel distributed computing architecture focused on large-scale datasets such as UGR’16. In addition, the paper analyzes the use of machine learning techniques to improve the response and efficiency of the proposed preprocessing model. The solution developed achieves good results in terms of computational performance. Finally, the proposal shows the suitability of decision tree algorithms, compared with a multilayer perceptron neural network, for training a machine learning model on a large dataset.
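The abstract does not describe the preprocessing steps or the architecture in detail, so the following is only an illustrative sketch of what partition-wise preprocessing can look like, not the authors' method. It assumes min-max scaling as the preprocessing step, non-empty partitions of numeric rows, and a local thread pool standing in for distributed workers; all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def partition_stats(rows):
    """Map step: per-feature minima and maxima for one non-empty partition."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def merge_stats(stats):
    """Reduce step: combine partition-level extrema into global extrema."""
    mins_list, maxs_list = zip(*stats)
    mins = [min(c) for c in zip(*mins_list)]
    maxs = [max(c) for c in zip(*maxs_list)]
    return mins, maxs


def scale_partition(rows, mins, maxs):
    """Scale every feature to [0, 1]; constant features map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def distributed_min_max(partitions, workers=4):
    """Two-pass min-max scaling: global statistics first, then scaling.

    Assumption: a thread pool stands in for the distributed workers; a
    real deployment would run each step on separate processes or nodes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mins, maxs = merge_stats(list(pool.map(partition_stats, partitions)))
        scale = partial(scale_partition, mins=mins, maxs=maxs)
        return list(pool.map(scale, partitions))
```

Only the small per-partition summaries cross partition boundaries in the first pass, which is what makes this style of preprocessing easy to spread over many workers.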
Citations
Development of method for identification the computer system state based on the decision tree with multidimensional nodes
TL;DR: The experiments confirmed the efficiency of the proposed method for constructing a decision tree, which makes it possible to recommend it for practical use to improve the accuracy of identifying the state of a computer system.
IoT Network Intrusion Detection Using Machine Learning Techniques
Bauyrzhan Omarov, Omirlan Auelbekov, Tursynay Koishiyeva, Ruslan Sadybekov, Yerkebulan Uxikbayev, Aizhan Bazarbayeva +5 more
- 28 Apr 2022
TL;DR: This article compares the most widely used approaches for detecting anomalies in data, using the following criteria: the speed of implementation and the model's data needs.
Machine Learning-Based Anomaly Detection in Cloud Virtual Machine Resource Usage
15 Jun 2023
TL;DR: Anomaly detection is an important activity in cloud computing systems because it aids in identifying odd behaviours or actions that may result in software glitches, security breaches, and performance difficulties.
Network Intrusion Detection and Prevention System Using Hybrid Machine Learning with Supervised Ensemble Stacking Model
Godfrey A. Mills, Daniel K. Acquah, Robert A. Sowah +2 more
TL;DR: This paper proposes a hybrid intrusion detection system combining supervised and unsupervised learning models via ensemble stacking, achieving 99.84% and 99.90% detection accuracy on NSL-KDD and CIC-DDoS2019 datasets, respectively, with low false positive rates.