KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on
  Graph Data

doi:10.48550/arxiv.2410.08390

Journal Article10.48550/arxiv.2410.08390

KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data

Andy Zhou, +6 more

- 10 Oct 2024

1

TL;DR: KnowGraph integrates domain knowledge with data-driven learning for enhanced graph-based anomaly detection, outperforming state-of-the-art baselines in transductive and inductive settings, with substantial gains in average precision and improved detection performance under extreme class imbalance.

Abstract: Graph-based anomaly detection is pivotal in diverse security applications, such as fraud detection in transaction networks and intrusion detection for network traffic. Standard approaches, including Graph Neural Networks (GNNs), often struggle to generalize across shifting data distributions. Meanwhile, real-world domain knowledge is more stable and a common existing component of real-world detection strategies. To explicitly integrate such knowledge into data-driven models such as GCNs, we propose KnowGraph, which integrates domain knowledge with data-driven learning for enhanced graph-based anomaly detection. KnowGraph comprises two principal components: (1) a statistical learning component that utilizes a main model for the overarching detection task, augmented by multiple specialized knowledge models that predict domain-specific semantic entities; (2) a reasoning component that employs probabilistic graphical models to execute logical inferences based on model outputs, encoding domain knowledge through weighted first-order logic formulas. Extensive experiments on these large-scale real-world datasets show that KnowGraph consistently outperforms state-of-the-art baselines in both transductive and inductive settings, achieving substantial gains in average precision when generalizing to completely unseen test graphs. Further ablation studies demonstrate the effectiveness of the proposed reasoning component in improving detection performance, especially under extreme class imbalance. These results highlight the potential of integrating domain knowledge into data-driven models for high-stakes, graph-based security applications.

Chat with Paper

AI Agents for this Paper

Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps

Citations

Journal Article•10.3390/bdcc9100257

Tester-Guided Graph Learning with End-to-End Detection Certificates for Triangle-Based Anomalies

Manuel J. C. S. Reis

- 12 Oct 2025

- Big data and cognitive computing

Abstract: We investigate anomaly detection in complex networks through a property-testing-guided graph neural model (PT-GNN) that provides an end-to-end miss-probability certificate (δ+α). The method combines (i) a wedge-sampling tester that estimates triangle-closure frequency and derives a concentration bound (δ) via Bernstein’s inequality, with (ii) a lightweight classifier over structural features whose validation error contributes (α). The overall certificate is given by the sum (δ+α), quantifying the probability of missed anomalies under bounded sampling. On synthetic communication graphs with n = 1000, edge probability p = 0.01, and anomalous subgraph size k = 120, PT-GNN achieves perfect detection performance (AUC = 1.0, F1 = 1.0) across all tested regimes. Moreover, the miss-probability certificate tightens systematically as the tester budget m increases (e.g., for ε = 0.06, enlarging m from 2000 to 8000 reduces (δ+α) from ≈0.87 to ≈0.49). These results demonstrate that PT-GNN effectively couples graph learning with property testing, offering both strong empirical detection and formally verifiable guarantees in anomaly detection tasks.

...read moreread less

References

•Proceedings Article•10.1145/2939672.2939785

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, +1 more

- 09 Mar 2016

- arXiv: Learning

TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.

...read moreread less

32.8K

•Journal Article•10.1613/JAIR.953

SMOTE: synthetic minority over-sampling technique

Nitesh V. Chawla, +3 more

- 01 Jan 2002

- Journal of Artificial Intelligence Resea...

TL;DR: In this article, a method of over-sampling the minority class involves creating synthetic minority class examples, which is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.

...read moreread less

27.7K

•Posted Content

Focal Loss for Dense Object Detection

Tsung-Yi Lin, +4 more

- 07 Aug 2017

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

...read moreread less

16.7K

•Posted Content

Inductive Representation Learning on Large Graphs

William L. Hamilton, +2 more

- 07 Jun 2017

- arXiv: Social and Information Networks

TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.

...read moreread less

11.9K

•Posted Content

node2vec: Scalable Feature Learning for Networks

Aditya Grover, +1 more

- 03 Jul 2016

- arXiv: Social and Information Networks

TL;DR: In node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks, a flexible notion of a node's network neighborhood is defined and a biased random walk procedure is designed, which efficiently explores diverse neighborhoods.

...read moreread less

6.6K

...

Expand