Efficient Distributed Preprocessing Model for Machine Learning-Based Anomaly Detection over Large-Scale Cybersecurity Datasets
Xavier Larriva-Novo, Mario Vega-Barbas, Víctor A. Villagrá, Diego Rivera, Manuel Alvarez-Campana, Julio Berrocal
TL;DR: This paper presents a new data preprocessing model based on a novel distributed computing architecture aimed at large-scale datasets such as UGR’16, and shows the suitability of decision tree algorithms, compared with a multilayer perceptron neural network, for training a machine learning model on a large dataset.
Abstract: The computational and technological paradigms that currently guide developments in the information society, e.g., the Internet of Things, pervasive technology, and ubiquitous computing (Ubicomp), favor the appearance of new intrusion vectors that can directly affect people’s daily lives. This, together with advances in the techniques and methods used to develop new cyber-attacks, exponentially increases the number of cyber threats affecting the information society. Consequently, developing and improving technology that helps cybersecurity experts prevent and detect attacks has become a fundamental pillar of the field. In particular, intrusion detection systems are now a fundamental tool in the provision of services over the Internet. However, these systems have limitations, e.g., false positives and constraints on real-time analytics, that require their operation to be supervised. It is therefore necessary to offer architectures and systems that support efficient analysis of the data these tools handle. To that end, this paper presents a new data preprocessing model based on a novel distributed computing architecture aimed at large-scale datasets such as UGR’16. The paper also analyzes the use of machine learning techniques to improve the response and efficiency of the proposed preprocessing model. The resulting solution achieves good results in terms of computational performance. Finally, the proposal shows that decision tree algorithms are suitable for training a machine learning model on a large dataset when compared with a multilayer perceptron neural network.
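The abstract does not describe the internals of the distributed architecture, so the following is only a minimal sketch of the general idea of partitioned preprocessing, not the authors' design. It assumes a map/reduce-style min-max scaling step: each partition reports per-column ranges, the ranges are merged, and every partition is then scaled with the global ranges. All function names (`partition`, `partial_min_max`, `merge_min_max`, `scale_chunk`, `preprocess`) and the toy `flows` records are hypothetical, and a local thread pool stands in for a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def partial_min_max(chunk):
    """Map step: per-column (min, max) for one chunk (empty chunk -> [])."""
    return [(min(col), max(col)) for col in zip(*chunk)]


def merge_min_max(partials):
    """Reduce step: combine per-chunk ranges into global per-column ranges."""
    non_empty = [p for p in partials if p]
    return [
        (min(lo for lo, _ in col), max(hi for _, hi in col))
        for col in zip(*non_empty)
    ]


def scale_chunk(chunk, ranges):
    """Scale every value to [0, 1] using the global column ranges."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in chunk
    ]


def preprocess(rows, n_workers=4):
    """Two-pass preprocessing: gather global ranges, then scale each chunk."""
    chunks = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Pass 1: each worker summarizes its own chunk independently.
        ranges = merge_min_max(list(pool.map(partial_min_max, chunks)))
        # Pass 2: each worker scales its chunk with the shared global ranges.
        scaled_chunks = list(pool.map(lambda c: scale_chunk(c, ranges), chunks))
    # pool.map preserves chunk order, so the original row order is kept.
    return [row for chunk in scaled_chunks for row in chunk]


# Toy records with two numeric features (hypothetical values).
flows = [[10, 200], [20, 400], [30, 600], [40, 800], [50, 1000]]
scaled = preprocess(flows, n_workers=2)
```

The two-pass structure is the design point here: only small per-chunk summaries cross worker boundaries, so the full dataset never has to sit in one place. In a real deployment the thread pool would be replaced by whatever distributed runtime is in use.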
Citations
Urban Flood Disaster Mitigation through Image Classification Using Transfer Learning Method with MobileNet Fine-tuning
Andi Sadri Agung, Satria Gunawan Zain, Fhatiah Adiba, Dyah Darma Andayani, Andi Baso Kaswar
- 14 Nov 2023
TL;DR: This research comprises several stages: data collection, data pre-processing, the application of fine-tuned transfer learning techniques, and evaluation using a confusion matrix.
Extraction of Minimal Set of Traffic Features Using Ensemble of Classifiers and Rank Aggregation for Network Intrusion Detection Systems
Jacek Krupski, Marcin Iwanowski, Waldemar Graniszewski
TL;DR: This study proposes a feature selection method using ensemble classifiers and rank aggregation to identify a minimal set of traffic features for network intrusion detection systems, achieving high accuracy and computational efficiency with just 8 key features.
Cyber security in healthcare system: a systematic approach of modern threads and development
Prof. J. I. Nandalwar, P. P. Pandhare, Prof. V. V. Shirashyad, Prof. T. S. Deshmukh
- 16 Dec 2023
TL;DR: The healthcare industry provides medical devices and pharmaceuticals, and third-party vendors can also pose a risk to organizations, so healthcare organizations should implement a range of security measures.
Effective Pre-processing of Datasets by Removing Anomalies with OptiContamFinder on Isolation Forest for Stock Market forecasting and Nowcasting
Jayaraman Kumarappan, Elakkiya R
Use of traffic sampling in anomaly detection for high-throughput network links
Marek Bolanowski, Andrzej Paszkiewicz, Hubert Mazur
- 26 Sep 2023
TL;DR: Limiting the sampling frequency of network traffic parameters is proposed as a technique to reduce the computational complexity of anomaly detection methods, and it has been verified in a real network link monitoring system for a medium-sized ISP.