Journal Article, DOI: 10.48550/arxiv.2410.08390
KnowGraph: Knowledge-Enabled Anomaly Detection via Logical Reasoning on Graph Data
Andy Zhou, Xiaojun Xu, Ramesh Raghunathan, Alok Lal, Xiaohong Guan, Bin Yu, Bo Li +6 more
10 Oct 2024
TL;DR: KnowGraph integrates domain knowledge with data-driven learning for enhanced graph-based anomaly detection, outperforming state-of-the-art baselines in transductive and inductive settings, with substantial gains in average precision and improved detection performance under extreme class imbalance.
Abstract: Graph-based anomaly detection is pivotal in diverse security applications, such as fraud detection in transaction networks and intrusion detection for network traffic. Standard approaches, including Graph Neural Networks (GNNs), often struggle to generalize across shifting data distributions. Real-world domain knowledge, by contrast, is more stable and is already a common component of real-world detection strategies. To explicitly integrate such knowledge into data-driven models such as GCNs, we propose KnowGraph, which combines domain knowledge with data-driven learning for enhanced graph-based anomaly detection. KnowGraph comprises two principal components: (1) a statistical learning component that uses a main model for the overarching detection task, augmented by multiple specialized knowledge models that predict domain-specific semantic entities; (2) a reasoning component that employs probabilistic graphical models to perform logical inference over the model outputs, encoding domain knowledge through weighted first-order logic formulas. Extensive experiments on large-scale real-world datasets show that KnowGraph consistently outperforms state-of-the-art baselines in both transductive and inductive settings, achieving substantial gains in average precision when generalizing to completely unseen test graphs. Further ablation studies demonstrate the effectiveness of the proposed reasoning component in improving detection performance, especially under extreme class imbalance. These results highlight the potential of integrating domain knowledge into data-driven models for high-stakes, graph-based security applications.
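The reasoning component described in the abstract (weighted logic formulas combined with model outputs in a probabilistic graphical model) can be illustrated with a toy sketch. This is not the authors' implementation: the variable names (`anomaly`, `unusual_login`), the rule, its weight, and the brute-force enumeration are all assumptions chosen for illustration, in the spirit of a Markov-logic-style factor model over a handful of binary variables.

```python
import itertools
import math


def marginal_anomaly_probability(model_probs, rules, target="anomaly"):
    """Posterior P(target = 1), computed by enumerating every binary world.

    model_probs: dict mapping a variable name to a model's predicted
                 probability that the variable is true (hypothetical names).
    rules:       list of (weight, formula) pairs; formula takes a world
                 (dict of name -> 0/1) and returns True when satisfied.
    """
    names = list(model_probs)
    total = 0.0
    target_mass = 0.0
    for values in itertools.product([0, 1], repeat=len(names)):
        world = dict(zip(names, values))
        # Factor 1: agreement of the world with each model's prediction.
        score = 1.0
        for name, p in model_probs.items():
            score *= p if world[name] else 1.0 - p
        # Factor 2: multiply by exp(weight) for every satisfied rule.
        for weight, formula in rules:
            if formula(world):
                score *= math.exp(weight)
        total += score
        if world[target]:
            target_mass += score
    return target_mass / total


# Hypothetical outputs: the main model is unsure, a knowledge model is confident.
model_probs = {"anomaly": 0.40, "unusual_login": 0.90}
# Hypothetical rule "unusual_login => anomaly" with an assumed weight of 2.0.
rules = [(2.0, lambda w: (not w["unusual_login"]) or bool(w["anomaly"]))]

posterior = marginal_anomaly_probability(model_probs, rules)
baseline = marginal_anomaly_probability(model_probs, [])
```

In this toy setting the rule raises the anomaly probability above the main model's raw output, because worlds that violate the implication receive less weight. Exhaustive enumeration is exponential in the number of variables, so it is only suitable for small illustrations like this one.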
Citations
Tester-Guided Graph Learning with End-to-End Detection Certificates for Triangle-Based Anomalies
Manuel J. C. S. Reis
Abstract: We investigate anomaly detection in complex networks through a property-testing-guided graph neural model (PT-GNN) that provides an end-to-end miss-probability certificate (δ+α). The method combines (i) a wedge-sampling tester that estimates triangle-closure frequency and derives a concentration bound (δ) via Bernstein’s inequality, with (ii) a lightweight classifier over structural features whose validation error contributes (α). The overall certificate is given by the sum (δ+α), quantifying the probability of missed anomalies under bounded sampling. On synthetic communication graphs with n = 1000, edge probability p = 0.01, and anomalous subgraph size k = 120, PT-GNN achieves perfect detection performance (AUC = 1.0, F1 = 1.0) across all tested regimes. Moreover, the miss-probability certificate tightens systematically as the tester budget m increases (e.g., for ε = 0.06, enlarging m from 2000 to 8000 reduces (δ+α) from ≈0.87 to ≈0.49). These results demonstrate that PT-GNN effectively couples graph learning with property testing, offering both strong empirical detection and formally verifiable guarantees in anomaly detection tasks.
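The two ingredients named in this abstract, a wedge-sampling estimate of triangle-closure frequency and a Bernstein-type concentration bound, can be sketched as follows. This is an illustrative reading, not the paper's procedure: the function names, the adjacency-dict representation, and the worst-case variance of 0.25 are assumptions, and the sketch makes no attempt to reproduce the certificate values reported above, whose exact constants are not given here.

```python
import math
import random


def wedge_closure_estimate(adj, m, rng):
    """Estimate the fraction of closed wedges from m uniformly sampled wedges.

    adj: dict mapping each node to the set of its neighbors (undirected).
    Assumes the graph contains at least one node of degree >= 2.
    """
    centers = [v for v in adj if len(adj[v]) >= 2]
    # The number of wedges centered at v is C(deg(v), 2).
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centers]
    closed = 0
    for center in rng.choices(centers, weights=weights, k=m):
        a, b = rng.sample(sorted(adj[center]), 2)
        closed += b in adj[a]  # the wedge a-center-b is closed if a-b is an edge
    return closed / m


def bernstein_delta(m, eps, variance=0.25):
    """Two-sided Bernstein tail bound for the mean of m i.i.d. values in [0, 1].

    Bounds P(|estimate - true mean| >= eps); variance=0.25 is the worst case
    for a 0/1 indicator and is an assumption of this sketch.
    """
    exponent = m * eps ** 2 / (2.0 * variance + 2.0 * eps / 3.0)
    return min(1.0, 2.0 * math.exp(-exponent))


# Tiny hypothetical graphs: a triangle (all wedges closed) and a path (none closed).
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
rng = random.Random(0)
triangle_rate = wedge_closure_estimate(triangle, 50, rng)
path_rate = wedge_closure_estimate(path, 50, rng)
```

Under this reading, a combined certificate would add the classifier's validation error α to `bernstein_delta(m, eps)`, and the bound shrinks as the sampling budget `m` grows.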
References
XGBoost: A Scalable Tree Boosting System
Tianqi Chen, Carlos Guestrin +1 more
TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
SMOTE: synthetic minority over-sampling technique
TL;DR: This article presents a method of over-sampling the minority class by creating synthetic minority-class examples, and evaluates it using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
Posted Content
Focal Loss for Dense Object Detection
TL;DR: This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Posted Content
Inductive Representation Learning on Large Graphs
TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.
Posted Content
node2vec: Scalable Feature Learning for Networks
Aditya Grover, Jure Leskovec +1 more
TL;DR: node2vec is an algorithmic framework for learning continuous feature representations for nodes in networks; it defines a flexible notion of a node's network neighborhood and designs a biased random walk procedure that efficiently explores diverse neighborhoods.
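The biased random walk mentioned in the node2vec summary can be sketched in a few lines. The return parameter `p` and in-out parameter `q` follow the node2vec paper; the function name, the adjacency-dict representation, and the example graph are assumptions for illustration, and this simple version recomputes transition weights at each step instead of precomputing them.

```python
import random


def node2vec_walk(adj, start, length, p, q, rng):
    """Second-order biased random walk of up to `length` nodes.

    adj: dict mapping each node to the set of its neighbors (undirected).
    p:   return parameter (larger p makes stepping back less likely).
    q:   in-out parameter (smaller q favors moving away from the previous node).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move to distance 2 from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph: a square 0-1-2-3 with one diagonal 0-2.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
walk = node2vec_walk(graph, start=0, length=10, p=1.0, q=0.5, rng=random.Random(42))
```

Walks generated this way are typically fed to a skip-gram style embedding model; that step is omitted here.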