Log2vec: A Heterogeneous Graph Embedding Based Approach for Detecting Cyber Threats within Enterprise

doi:10.1145/3319535.3363224

Proceedings Article•10.1145/3319535.3363224

Fucheng Liu, +5 more

- 06 Nov 2019

- pp 1777-1794

258

TL;DR: This work proposes log2vec, a modularized method based on heterogeneous graph embedding that remarkably outperforms state-of-the-art approaches, such as deep learning and hidden Markov models (HMMs), and can detect malicious events in various attack scenarios.

Abstract: Conventional insider attacks and emerging advanced persistent threats (APTs) are both major threats to organizational information systems. Existing detection methods mainly concentrate on users' behavior and usually analyze logs that record their operations in an information system. In general, most of these methods consider the sequential relationship among log entries and model users' sequential behavior. However, they ignore other relationships, which inevitably leads to unsatisfactory performance across attack scenarios. We propose log2vec, a modularized method based on heterogeneous graph embedding. First, it uses a heuristic approach that converts log entries into a heterogeneous graph according to the diverse relationships among them. Next, it applies an improved graph embedding suited to this heterogeneous graph, which automatically represents each log entry as a low-dimensional vector. The third component of log2vec is a practical detection algorithm that separates malicious and benign log entries into different clusters and identifies the malicious ones. We implement a prototype of log2vec. Our evaluation demonstrates that log2vec remarkably outperforms state-of-the-art approaches, such as deep learning and hidden Markov models (HMMs). In addition, log2vec can detect malicious events in various attack scenarios.
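
The first two stages described in the abstract (turning log entries into a heterogeneous graph, then sampling that graph to prepare for embedding) might be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `LogEntry` fields, the two edge rules (`same_user_sequence`, `same_host`), and the `type_weights` parameter are all assumptions made for this example. The walks it produces would then be passed to an embedding model (for example a skip-gram model, as in DeepWalk-style methods) and a clustering step, both omitted here.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical log schema; the real fields depend on the log source.
    entry_id: int
    user: str
    host: str
    timestamp: int  # seconds since an arbitrary epoch
    operation: str


def build_graph(entries):
    """Build an undirected graph: node id -> list of (neighbor id, edge type).

    Two assumed rules stand in for the paper's heuristics:
      - same_user_sequence: consecutive entries of the same user
      - same_host: consecutive entries on one host from different users
    """
    graph = defaultdict(list)

    def link(a, b, edge_type):
        graph[a.entry_id].append((b.entry_id, edge_type))
        graph[b.entry_id].append((a.entry_id, edge_type))

    last_by_user, last_by_host = {}, {}
    for entry in sorted(entries, key=lambda e: e.timestamp):
        graph[entry.entry_id]  # register the node even if it stays isolated
        prev = last_by_user.get(entry.user)
        if prev is not None:
            link(prev, entry, "same_user_sequence")
        last_by_user[entry.user] = entry

        prev = last_by_host.get(entry.host)
        if prev is not None and prev.user != entry.user:
            link(prev, entry, "same_host")
        last_by_host[entry.host] = entry
    return dict(graph)


def random_walk(graph, start, length, type_weights, rng):
    """Sample one walk, biasing each step by the weight of the edge type.

    type_weights maps edge type -> positive weight (unlisted types get 1.0).
    """
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:
            break  # isolated node: the walk ends early
        weights = [type_weights.get(edge_type, 1.0) for _, edge_type in neighbors]
        next_node, _ = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(next_node)
    return walk


entries = [
    LogEntry(0, "alice", "pc1", 0, "logon"),
    LogEntry(1, "alice", "pc1", 60, "file_open"),
    LogEntry(2, "bob", "pc1", 120, "logon"),
    LogEntry(3, "bob", "pc2", 180, "logon"),
]
graph = build_graph(entries)
rng = random.Random(0)
weights = {"same_user_sequence": 2.0, "same_host": 1.0}
walks = [random_walk(graph, node, 5, weights, rng) for node in graph]
for walk in walks:
    print(walk)
```

Each walk is a sequence of log-entry ids that plays the role of a sentence for the embedding model. Weighting edge types differently is one simple way to let some relationships influence the walks more than others.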

Citations

•Book

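My Taiwan, Seeing the Homeland of the Soul = 2009 Lin Pang-Soong Art and Design Exhibition (我的台灣, 看見心靈的故鄉 =2009林磐聳藝術與設計展)

Hsiao Chong-ray (蕭瓊瑞), +1 more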

- 01 Jan 2009

951

•Posted Content

A Survey on Automated Log Analysis for Reliability Engineering.

Shilin He, +5 more

- 15 Sep 2020

- arXiv: Software Engineering

TL;DR: This survey presents a detailed overview of automated log analysis research, including how to automate and assist the writing of logging statements, how to compress logs, how to parse logs into structured event templates, and how to employ logs to detect anomalies, predict failures, and facilitate diagnosis.

161

•Proceedings Article•10.1145/3510003.3510155

Log-based Anomaly Detection with Deep Learning: How Far Are We?

Van-Hoang Le, +1 more

- 09 Feb 2022

TL;DR: An in-depth analysis of five state-of-the-art deep learning-based models for detecting system anomalies on four public log datasets, focusing on several aspects of model evaluation, including training data selection, data grouping, class distribution, data noise, and early detection ability.

158

•Posted Content

LogBERT: Log Anomaly Detection via BERT

Haixuan Guo, +2 more

- 07 Mar 2021

- arXiv: Cryptography and Security

TL;DR: This paper proposes LogBERT, a self-supervised framework for log anomaly detection based on Bidirectional Encoder Representations from Transformers (BERT), which is able to detect anomalies where the underlying patterns deviate from normal log sequences.

131

•Journal Article•10.1145/3460345

A Survey on Automated Log Analysis for Reliability Engineering

Shilin He, +5 more

- 13 Jul 2021

- ACM Computing Surveys

TL;DR: This paper gives a detailed overview of automated log analysis research, including how to assist the writing of logging statements, how to compress logs, and how to parse logs into structured event templates, and presents several promising future directions toward real-world and next-generation automated log analysis.

128

References

•Journal Article•10.1126/SCIENCE.290.5500.2323

Nonlinear dimensionality reduction by locally linear embedding.

Sam T. Roweis, +1 more

- 22 Dec 2000

- Science

TL;DR: This paper introduces locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs and learns the global structure of nonlinear manifolds.

17.4K

•Proceedings Article•10.1145/2623330.2623732

DeepWalk: online learning of social representations

Bryan Perozzi, +2 more

- 24 Aug 2014

TL;DR: DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. These representations encode social relations in a continuous vector space that is easily exploited by statistical models.

11.4K

•Proceedings Article•10.5555/1283383.1283494

k-means++: the advantages of careful seeding

David Arthur, +1 more

- 07 Jan 2007

TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.

9.5K

•Proceedings Article•10.1145/2939672.2939754

node2vec: Scalable Feature Learning for Networks

Aditya Grover, +1 more

- 13 Aug 2016

TL;DR: node2vec learns a mapping of nodes to a low-dimensional feature space that maximizes the likelihood of preserving the network neighborhoods of nodes, using a biased random walk procedure.

9.5K

•Journal Article•10.1109/TNN.2008.2005605

The Graph Neural Network Model

Franco Scarselli, +4 more

- 01 Jan 2009

- IEEE Transactions on Neural Networks

TL;DR: A new neural network model, called the graph neural network (GNN) model, extends existing neural network methods to process data represented in graph domains. It implements a function τ(G, n) ∈ ℝ^m that maps a graph G and one of its nodes n into an m-dimensional Euclidean space.

9.4K

Related Papers (5)

DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning

Detecting large-scale system problems by mining console logs

Experience Report: System Log Analysis for Anomaly Detection

HOLMES: Real-Time APT Detection through Correlation of Suspicious Information Flows

Log clustering based problem identification for online service systems