Efficient Distributed Preprocessing Model for Machine Learning-Based Anomaly Detection over Large-Scale Cybersecurity Datasets
Xavier Larriva-Novo, Mario Vega-Barbas, Víctor A. Villagrá, Diego Rivera, Manuel Alvarez-Campana, Julio Berrocal
TL;DR: The paper presents a new data preprocessing model based on a novel distributed computing architecture for large-scale datasets such as UGR’16, and shows the adequacy of decision tree algorithms for training a machine learning model on a large dataset when compared with a multilayer perceptron neural network.
Abstract: New computational and technological paradigms that currently guide developments in the information society, e.g., the Internet of Things, pervasive technology, or Ubicomp, favor the appearance of new intrusion vectors that can directly affect people’s daily lives. This, together with advances in the techniques and methods used to develop new cyber-attacks, exponentially increases the number of cyber threats affecting the information society. As a result, developing and improving technology that helps cybersecurity experts prevent and detect attacks has become a fundamental pillar of the field of cybersecurity. Specifically, intrusion detection systems are now a fundamental tool in the provision of services through the internet. However, these systems have certain limitations, e.g., false positives and real-time analytics, which require their operation to be supervised. Therefore, it is necessary to offer architectures and systems that favor an efficient analysis of the data handled by these tools. To that end, this paper presents a new model of data preprocessing based on a novel distributed computing architecture focused on large-scale datasets such as UGR’16. In addition, the paper analyzes the use of machine learning techniques to improve the response and efficiency of the proposed preprocessing model. The solution developed achieves good results in terms of computational performance. Finally, the proposal shows the adequacy of decision tree algorithms for training a machine learning model on a large dataset when compared with a multilayer perceptron neural network.
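The abstract does not describe the internals of the distributed preprocessing architecture, so the following is only a minimal sketch of one common way to split a preprocessing step across workers, not the authors' method. It assumes the step is min-max normalization of numeric features, and it uses a local thread pool as a stand-in for a real cluster. All function names (`chunked`, `partial_min_max`, `merge_min_max`, `scale_chunk`, `distributed_min_max`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(rows, size):
    """Split a list of rows into consecutive chunks of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_min_max(chunk):
    """Map step: compute (min, max) per column for a single chunk."""
    columns = zip(*chunk)
    return [(min(col), max(col)) for col in columns]


def merge_min_max(partials):
    """Reduce step: merge per-chunk statistics into global (min, max) pairs."""
    merged = partials[0]
    for part in partials[1:]:
        merged = [(min(a[0], b[0]), max(a[1], b[1])) for a, b in zip(merged, part)]
    return merged


def scale_chunk(chunk, stats):
    """Second map step: rescale every value of a chunk into [0, 1]."""
    scaled = []
    for row in chunk:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0.0
            for value, (lo, hi) in zip(row, stats)
        ])
    return scaled


def distributed_min_max(rows, chunk_size=2, workers=4):
    """Normalize `rows` chunk by chunk; threads stand in for remote workers."""
    if not rows:
        return []
    chunks = chunked(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stats = merge_min_max(list(pool.map(partial_min_max, chunks)))
        scaled = list(pool.map(lambda chunk: scale_chunk(chunk, stats), chunks))
    # pool.map preserves input order, so rows come back in their original order.
    return [row for chunk in scaled for row in chunk]


if __name__ == "__main__":
    # Hypothetical toy records with two numeric features each.
    print(distributed_min_max([[0, 10], [5, 20], [10, 30], [2, 10]]))
```

The two-pass design (gather per-chunk statistics, merge them, then scale each chunk) means no single worker needs the full dataset in memory, which is the property that matters for datasets the size of UGR’16.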
Citations
Analysis of Cyber Security Attacks and Its Solutions for the Smart grid Using Machine Learning and Blockchain Methods
TL;DR: In this paper, the authors examine the many risks and flaws that can affect the safety of critical, innovative grid network components, propose security solutions using different methods, and provide recommendations for reducing the chance that these three categories of cyberattacks occur.
An IoT-Focused Intrusion Detection System Approach Based on Preprocessing Characterization for Cybersecurity Datasets.
TL;DR: In this paper, the authors proposed the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm for intrusion detection in IoT networks, and evaluated these preprocessing models in accordance with scalar and normalization functions.
An Agile Approach to Identify Single and Hybrid Normalization for Enhancing Machine Learning-Based Network Intrusion Detection
TL;DR: In this article, a statistical method is proposed that can identify the most suitable normalization method for the dataset, which gives the highest accuracy for an intrusion detection system, and the proposed method is also able to identify hybrid normalizations to achieve even improved intrusion detection results.
Edge Intelligence in Smart Grids: A Survey on Architectures, Offloading Models, Cyber Security Measures, and Challenges
TL;DR: It is concluded that most viable architectures for EI in smart grids consist of three layers (device, edge, and cloud), and that computation offloading techniques must be framed as optimization problems and addressed effectively in order to increase system performance.
Prepare for trouble and make it double! Supervised – Unsupervised stacking for anomaly-based intrusion detection
Tommaso Zoppi, Andrea Ceccarelli
TL;DR: In this paper, a two-layer Stacker is proposed to detect unknown zero-day attacks by combining supervised and unsupervised algorithms, which is more effective in detecting unknown attacks than supervised algorithms.