GraphCage: Cache Aware Graph Processing on GPUs.

Open Access • Posted Content

- 03 Apr 2019

- arXiv: Distributed, Parallel, and Cluste...

7

TL;DR: GraphCage is a cache-centric optimization framework for highly efficient graph processing on GPUs that can improve performance by 2x to 4x compared to hand-optimized implementations and state-of-the-art frameworks, with less memory consumption than CuSha.

Abstract: Efficient graph processing is challenging because of the irregularity of graph algorithms. Accelerating irregular graph algorithms efficiently on GPUs is even more difficult, since the GPU's highly structured SIMT architecture is not a natural fit for irregular applications. Despite considerable previous effort spent on carefully mapping graph algorithms onto the GPU, the performance of graph processing on GPUs is still highly memory-latency bound, leading to low utilization of compute resources. Random memory accesses generated by the sparse graph data structure are the major cause of this significant memory access latency. Simply applying the conventional cache blocking technique proposed for matrix computation has limited benefit because of its significant overhead on the GPU. We propose GraphCage, a cache-centric optimization framework for highly efficient graph processing on GPUs. We first present a throughput-oriented cache blocking scheme (TOCAB) in both push and pull directions. Compared with conventional cache blocking, which suffers from repeated accesses when processing large graphs on GPUs, TOCAB is specifically optimized for the GPU architecture to reduce this overhead and improve memory access efficiency. To integrate our scheme into state-of-the-art implementations without significant overhead, we coordinate TOCAB with load balancing strategies by considering the sparsity of subgraphs. To enable cache blocking for traversal-based algorithms, we consider the benefit and overhead in different iterations with different working set sizes, and apply TOCAB to topology-driven kernels in the pull direction. Evaluation shows that GraphCage can improve performance by 2x to 4x compared to hand-optimized implementations and state-of-the-art frameworks (e.g., CuSha and Gunrock), with less memory consumption than CuSha.
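The cache blocking idea the abstract builds on can be illustrated with a small CPU-side sketch. This is not the authors' TOCAB implementation and it does not model a GPU: it only shows conventional cache blocking for a pull-style gather, where edges are grouped by source-vertex range so that each subgraph reads a single contiguous slice of the vertex values. The function names, the edge-list representation, and the `block_size` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def pull_gather(num_vertices, edges, values):
    """Unblocked gather: each destination sums the values of its in-neighbors.

    Reads of values[src] are scattered across the whole array.
    """
    sums = [0.0] * num_vertices
    for src, dst in edges:
        sums[dst] += values[src]
    return sums


def blocked_pull_gather(num_vertices, edges, values, block_size):
    """Cache-blocked gather (illustrative sketch, hypothetical names).

    Edges are bucketed by the block their source vertex falls into, so
    each subgraph only reads one block_size-wide slice of `values`.
    """
    # Preprocessing: build one edge list (subgraph) per source block.
    subgraphs = defaultdict(list)
    for src, dst in edges:
        subgraphs[src // block_size].append((src, dst))

    sums = [0.0] * num_vertices
    for block_id in sorted(subgraphs):
        lo = block_id * block_size
        # Stand-in for the slice that would stay resident in cache.
        window = values[lo:lo + block_size]
        for src, dst in subgraphs[block_id]:
            sums[dst] += window[src - lo]
    return sums
```

Both functions compute the same per-vertex sums; only the order in which `values` is read changes. In this sketch `sums` is revisited once per block, and no attempt is made to model how TOCAB handles that kind of overhead on a GPU.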

Citations

•Journal Article•10.1186/S40537-021-00443-9

An analysis of the graph processing landscape

Miguel E. Coimbra, +2 more

- 26 Nov 2019

- arXiv: Distributed, Parallel, and Cluste...

TL;DR: For the use case of performing global computations over a graph, the graph is first ingested into a graph processing system from one of many digital representations; processing can then be done with single-machine systems (with varying approaches to hardware usage), distributed systems (either homogeneous or heterogeneous groups of machines), and systems dedicated to high-performance computing (HPC).

7

•Journal Article•10.1186/S40537-021-00443-9

An analysis of the graph processing landscape

Miguel E. Coimbra, +5 more

- 01 Jan 2021

- Journal of Big Data

TL;DR: This survey provides an overview of different aspects of the graph processing landscape and describes classes of systems along a set of dimensions, including paradigms for expressing graph processing, the types of systems to use, coordination and communication models in distributed graph processing, and partitioning techniques.

2

•Proceedings Article•10.1109/icde53745.2022.00246

GX-Plug: a Middleware for Plugging Accelerators to Distributed Graph Processing

- 01 May 2022

TL;DR: GX-Plug is a middleware for large-scale graph processing that integrates the merits of distributed graph processing and high-performance accelerators by plugging accelerators into distributed graph systems.

2

•Proceedings Article•10.48550/arXiv.2203.13005

GX-Plug: a Middleware for Plugging Accelerators to Distributed Graph Processing

Kai Zou, +3 more

- 24 Mar 2022

TL;DR: To improve middleware performance, a series of techniques, including pipeline shuffle, synchronization caching and skipping, and workload balancing, are studied for intra-, inter-, and beyond-iteration optimizations, respectively.

1

•10.22061/JECEI.2020.7453.393

NodeFetch: High Performance Graph Processing using Processing in Memory

M. Mosayebi, +1 more

- 01 Jan 2021

TL;DR: NodeFetch, a new method for accessing nodes and their neighbors while processing a graph by adding a new command to the HMC system, is proposed as a way of handling large-scale graph processing in light of recent advances in the field.

1

References

•Proceedings Article

The PageRank Citation Ranking: Bringing Order to the Web

Lawrence Page, +3 more

- 11 Nov 1999

TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.

16.4K

•Journal Article•10.1145/2049662.2049663

The University of Florida sparse matrix collection

Timothy A. Davis, +1 more

- 07 Dec 2011

- ACM Transactions on Mathematical Softwar...

TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described, and a new multilevel coarsening scheme is proposed to facilitate graph visualization of the matrices.

4.3K

•Proceedings Article•10.1145/1807167.1807184

Pregel: a system for large-scale graph processing

Grzegorz Malewicz, +6 more

- 06 Jun 2010

TL;DR: Pregel is a model for processing large graphs, designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.

4.1K

•Proceedings Article•10.1145/2487788.2488173

KONECT: the Koblenz network collection

Jérôme Kunegis

- 13 May 2013

TL;DR: Describes KONECT's taxonomy of network datasets, gives an overview of the datasets included, reviews the supported statistics and plots, and discusses the project's role in web science and network science.

1.7K

•Journal Article•10.14778/2212351.2212354

Distributed GraphLab: a framework for machine learning and data mining in the cloud

Yucheng Low, +5 more

- 01 Apr 2012

TL;DR: Distributed GraphLab extends the shared-memory GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees, and uses techniques that reduce network congestion and mitigate the effect of network latency.

1.6K