Efficient Log-based Anomaly Detection with Knowledge Distillation

doi:10.1109/icws62655.2024.00078


Huy-Trung Nguyen, +4 more

- 07 Jul 2024

pp 578-589

TL;DR: This paper proposes DistilLog, a lightweight log-based anomaly detection method that uses Knowledge Distillation to make deep learning detectors deployable on resource-constrained devices, achieving F-measures of 0.964 and 0.961 on the HDFS and BGL datasets.

Abstract: Logs are produced by many systems for troubleshooting purposes. Detecting abnormal events is crucial to maintaining regular operations and ensuring the security of systems. Despite the achievements of deep learning models on anomaly detection, it remains challenging to apply these models in some scenarios; a common case is deployment on resource-constrained devices such as IoT devices, which have limited computational resources. We identify two main problems in adopting these deep learning models in practice: (1) large models cannot be deployed on resource-constrained devices because of their size and the time they need to analyze data, and (2) simple models cannot achieve satisfactory detection accuracy. In this work, we propose DistilLog, a novel lightweight method for detecting anomalies in system logs, to overcome these problems. DistilLog uses a pretrained word2vec model to represent log event templates as semantic vectors, combined with the PCA dimensionality reduction algorithm to minimize the computational and storage burden. The Knowledge Distillation technique is applied to reduce the size of the detection model while maintaining high detection accuracy. The experimental results show that DistilLog achieves high F-measures of 0.964 and 0.961 on the HDFS and BGL datasets, respectively, while maintaining a minimal model size and the fastest detection speed. This effectiveness and efficiency show that the proposed model can be deployed on resource-constrained systems and demonstrate its potential for widespread use.
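The abstract names two building blocks, PCA reduction of the template vectors and Knowledge Distillation, without giving details. The sketch below illustrates both in NumPy and is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the function names, the 300-dimensional random vectors standing in for word2vec embeddings, the target dimension of 20, the two-class (normal/anomalous) logits, and the temperature and weighting values. The loss is the common soft-target formulation in the style of Hinton et al.; the paper's actual loss may differ.

```python
import numpy as np


def pca_reduce(vectors, n_components):
    """Project row vectors onto their top principal components (via SVD)."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature scaling, shifted for stability."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    eps = 1e-12
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions, scaled by T^2.
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
                axis=1)
    soft_term = (temperature ** 2) * kl.mean()
    # Ordinary cross-entropy of the student against the true labels (T = 1).
    p_hard = softmax(student_logits)
    hard_term = -np.log(p_hard[np.arange(len(labels)), labels] + eps).mean()
    return alpha * soft_term + (1.0 - alpha) * hard_term


rng = np.random.default_rng(0)
# Stand-in for pretrained word2vec template vectors (hypothetical sizes).
embeddings = rng.normal(size=(50, 300))
reduced = pca_reduce(embeddings, 20)

# Hypothetical logits for 8 log sequences over two classes.
teacher_logits = rng.normal(size=(8, 2))
student_logits = rng.normal(size=(8, 2))
labels = rng.integers(0, 2, size=8)
loss = distillation_loss(student_logits, teacher_logits, labels)
print(reduced.shape, float(loss))
```

In a real pipeline the student would be a smaller network trained by gradient descent on this loss, with the teacher's logits computed once from the larger model. The soft term falls to zero when the student reproduces the teacher's softened outputs exactly.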

