Log2vec: A Heterogeneous Graph Embedding Based Approach for Detecting Cyber Threats within Enterprise

doi:10.1145/3319535.3363224

Proceedings Article•10.1145/3319535.3363224

Fucheng Liu, +5 more

- 06 Nov 2019

- pp 1777-1794

258

TL;DR: This work proposes log2vec, a modularized method based on heterogeneous graph embedding that markedly outperforms state-of-the-art approaches, such as deep learning and hidden Markov models (HMMs), and detects malicious events in various attack scenarios.

Abstract: Conventional insider attacks and emerging advanced persistent threats (APTs) are both major threats to organizational information systems. Existing detection methods mainly focus on user behavior and usually analyze the logs that record users' operations in an information system. Most of these methods consider only the sequential relationships among log entries and model users' sequential behavior. They ignore other relationships, which inevitably leads to unsatisfactory performance across attack scenarios. We propose log2vec, a modularized method based on heterogeneous graph embedding. First, a heuristic approach converts log entries into a heterogeneous graph according to the diverse relationships among them. Next, an improved graph embedding suited to this heterogeneous graph automatically represents each log entry as a low-dimensional vector. Third, a practical detection algorithm separates malicious and benign log entries into different clusters and identifies the malicious ones. We implement a prototype of log2vec. Our evaluation demonstrates that log2vec markedly outperforms state-of-the-art approaches, such as deep learning and hidden Markov models (HMMs). Log2vec also detects malicious events in a variety of attack scenarios.
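The first two components described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the log fields, the two edge rules (`seq` and `host`), the `type_weight` values, and the function names `build_graph` and `random_walk` are all assumptions made for illustration. Graph embeddings of this kind are commonly obtained by feeding random walks to a skip-gram model and then clustering the resulting vectors; the sketch stops at the walks and does not train an embedding or run detection.

```python
import random
from collections import defaultdict

# Toy log entries: (entry_id, user, host, hour). All values are invented.
LOGS = [
    (0, "alice", "pc1", 9),
    (1, "alice", "pc1", 10),
    (2, "alice", "pc1", 11),
    (3, "bob", "pc2", 9),
    (4, "bob", "pc2", 10),
    (5, "bob", "pc9", 3),  # off-hours activity on an unusual host
]


def build_graph(logs):
    """Link log entries with typed edges (hypothetical rules).

    'seq'  connects time-adjacent entries of the same user.
    'host' connects every pair of entries recorded on the same host.
    """
    graph = defaultdict(list)  # node -> list of (neighbor, edge_type)
    by_user, by_host = defaultdict(list), defaultdict(list)
    for entry_id, user, host, hour in logs:
        by_user[user].append((hour, entry_id))
        by_host[host].append(entry_id)
    for entries in by_user.values():
        ordered = [entry_id for _, entry_id in sorted(entries)]
        for a, b in zip(ordered, ordered[1:]):
            graph[a].append((b, "seq"))
            graph[b].append((a, "seq"))
    for ids in by_host.values():
        for a in ids:
            for b in ids:
                if a != b:
                    graph[a].append((b, "host"))
    return graph


def random_walk(graph, start, length, type_weight, rng):
    """Walk the graph, choosing each step in proportion to its edge-type weight."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = graph.get(walk[-1], [])
        if not neighbors:
            break  # isolated node: the walk ends early
        weights = [type_weight[edge_type] for _, edge_type in neighbors]
        nodes = [node for node, _ in neighbors]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


if __name__ == "__main__":
    graph = build_graph(LOGS)
    rng = random.Random(0)
    weights = {"seq": 1.0, "host": 0.5}  # assumed preference for sequential edges
    for entry_id, *_ in LOGS:
        print(entry_id, random_walk(graph, entry_id, 6, weights, rng))
```

In this toy graph, entry 5 has a single edge while the other entries are densely linked, which is the kind of structural difference an embedding and clustering stage could then pick up.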

Citations

Journal Article•10.70588/suluahpasaman.v1i2.141

Interpersonal Communication in the Character Building of Students in Islamic Boarding Schools

Dafrizal Dafrizal, +4 more

- 28 Oct 2023

TL;DR: This qualitative study explores how teachers' interpersonal communication approaches and strategies shape the character of Islamic boarding school students in Indonesia, identifying three approaches and two strategies used to form students' characters through teacher-student interactions.

Proceedings Article•10.1109/noms54207.2022.9789921

Pikachu: Temporal Walk Based Dynamic Graph Embedding for Network Anomaly Detection

25 Apr 2022

TL;DR: PIKACHU is an unsupervised, temporal walk-based dynamic network embedding technique that captures both network topology and highly granular temporal information.

Journal Article•10.1109/tifs.2025.3618381

SauronEyes: Disentangling Voluminous Logs to Unveil Camouflaged Attack Intentions

Wei Qiao, +11 more

- 01 Jan 2025

- IEEE Transactions on Information Forensi...

TL;DR: This paper introduces SauronEyes, an APT detection system addressing sparsity and camouflaged attack intentions in voluminous logs, leveraging graph learning and self-supervised contrastive learning to achieve 99% detection accuracy in real-world and simulated scenarios.

Journal Article•10.48550/arxiv.2409.11890

Log2graphs: An Unsupervised Framework for Log Anomaly Detection with Efficient Feature Extraction

Caihong Wang, +2 more

- 18 Sep 2024

- arXiv.org

TL;DR: This study proposes Log2graphs, an unsupervised log anomaly detection framework, leveraging DualGCN-LogAE for efficient feature extraction, which adapts to various scenarios and identifies abnormal logs without labeled data, outperforming existing methods in detection accuracy and clustering quality.

Proceedings Article•10.1145/3433210.3453098

Recompose Event Sequences vs. Predict Next Events: A Novel Anomaly Detection Approach for Discrete Event Logs

Lun-Pin Yuan, +2 more

- 24 May 2021

TL;DR: DabLog is an LSTM-based deep autoencoder anomaly detection method for discrete event logs that determines whether a sequence is normal or abnormal by analyzing (encoding) and reconstructing it.

Related Papers (5)

DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning

Detecting large-scale system problems by mining console logs

Experience Report: System Log Analysis for Anomaly Detection

HOLMES: Real-Time APT Detection through Correlation of Suspicious Information Flows

Log clustering based problem identification for online service systems