Efficient Distributed Preprocessing Model for Machine Learning-Based Anomaly Detection over Large-Scale Cybersecurity Datasets

doi:10.3390/APP10103430

Open Access•Journal Article•10.3390/APP10103430

Xavier Larriva-Novo, +5 more

- 15 May 2020

- Applied Sciences

- Vol. 10, Iss: 10, pp 3430

30

TL;DR: This paper presents a new data preprocessing model based on a novel distributed computing architecture aimed at large-scale datasets such as UGR’16, and shows the suitability of decision tree algorithms for training a machine learning model on a large dataset when compared with a multilayer perceptron neural network.

Abstract: New computational and technological paradigms that currently guide developments in the information society, e.g., the Internet of Things, pervasive technology, or Ubicomp, favor the appearance of new intrusion vectors that can directly affect people’s daily lives. This, together with advances in the techniques and methods used to develop new cyber-attacks, exponentially increases the number of cyber threats that affect the information society. Because of this, the development and improvement of technology that helps cybersecurity experts prevent and detect attacks has become a fundamental pillar of the field of cybersecurity. Specifically, intrusion detection systems are now a fundamental tool in the provision of services through the internet. However, these systems have certain limitations, such as false positives and the demands of real-time analytics, which require their operation to be supervised. Therefore, it is necessary to offer architectures and systems that favor an efficient analysis of the data handled by these tools. In this sense, this paper presents a new model of data preprocessing based on a novel distributed computing architecture focused on large-scale datasets such as UGR’16. In addition, the paper analyzes the use of machine learning techniques in order to improve the response and efficiency of the proposed preprocessing model. The solution developed achieves good results in terms of computational performance. Finally, the proposal shows the suitability of decision tree algorithms for training a machine learning model on a large dataset when compared with a multilayer perceptron neural network.
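The abstract does not describe the architecture itself, so the following is only a generic illustration of partition-wise preprocessing, not the authors' method. It assumes numeric feature rows that are already split into non-empty partitions, uses min-max scaling as an assumed example of a preprocessing step, and lets a local thread pool stand in for distributed workers. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_stats(rows):
    """Map step: per-column (min, max) for one non-empty partition."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def merge_stats(stats_list):
    """Reduce step: combine per-partition (min, max) pairs into global ones."""
    merged = []
    for column_stats in zip(*stats_list):
        lows, highs = zip(*column_stats)
        merged.append((min(lows), max(highs)))
    return merged


def scale_partition(rows, stats):
    """Min-max scale one partition to [0, 1] using the global statistics."""
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no range; map it to 0.0 by convention.
            0.0 if hi == lo else (value - lo) / (hi - lo)
            for value, (lo, hi) in zip(row, stats)
        ])
    return scaled


def preprocess(partitions, workers=4):
    """Two passes over the partitions: gather statistics, then scale."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stats = merge_stats(list(pool.map(partition_stats, partitions)))
        return list(pool.map(lambda part: scale_partition(part, stats), partitions))
```

Each partition is only ever read by one worker at a time, and the only data exchanged between passes is the small list of per-column statistics. That property is what would let the same two-pass structure run across machines, under the assumptions stated above.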

Citations

Journal Article•10.3390/fi15020083

Analysis of Cyber Security Attacks and Its Solutions for the Smart grid Using Machine Learning and Blockchain Methods

Tehseen Mazhar, +5 more

- 19 Feb 2023

- Future Internet

TL;DR: In this paper, the authors look at the many risks and flaws that can affect the safety of critical, innovative grid network components, propose security solutions using different methods, and provide recommendations for reducing the chance that these three categories of cyberattacks may occur.

78

•Journal Article•10.3390/S21020656

An IoT-Focused Intrusion Detection System Approach Based on Preprocessing Characterization for Cybersecurity Datasets.

Xavier Larriva-Novo, +4 more

- 19 Jan 2021

- Sensors

TL;DR: In this paper, the authors proposed the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm for intrusion detection in IoT networks, and evaluated these preprocessing models in accordance with scalar and normalization functions.

70

•Journal Article•10.1109/ACCESS.2021.3118361

An Agile Approach to Identify Single and Hybrid Normalization for Enhancing Machine Learning-Based Network Intrusion Detection

Murtaza Ahmed Siddiqi, +1 more

- 06 Oct 2021

- IEEE Access

TL;DR: In this article, a statistical method is proposed that identifies the normalization method best suited to a dataset, i.e., the one that gives the highest accuracy for an intrusion detection system. The proposed method can also identify hybrid normalizations that achieve even better intrusion detection results.

57

•Journal Article•10.3390/jsan11030047

Edge Intelligence in Smart Grids: A Survey on Architectures, Offloading Models, Cyber Security Measures, and Challenges

Daisy Nkele Molokomme, +2 more

- 21 Aug 2022

- Journal of Sensor and Actuator Networks

TL;DR: It is concluded that most of the viable architectures for EI in smart grids often consist of three layers: device, edge, and cloud, and it is crucial that computation offloading techniques must be framed as optimization problems and addressed effectively in order to increase system performance.

22

•Journal Article•10.1016/J.JNCA.2021.103106

Prepare for trouble and make it double! Supervised – Unsupervised stacking for anomaly-based intrusion detection

Tommaso Zoppi, +1 more

- 01 Sep 2021

- Journal of Network and Computer Applicat...

TL;DR: In this paper, a two-layer Stacker is proposed to detect unknown zero-day attacks by combining supervised and unsupervised algorithms, which is more effective in detecting unknown attacks than supervised algorithms.

22
