Gunrock: GPU Graph Analytics

doi:10.1145/3108140

Journal Article•10.1145/3108140

Yangzihao Wang, +10 more

- 23 Aug 2017

- Vol. 4, Iss: 1, pp 3

156

TL;DR: The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries, and better performance than any other GPU high-level graph library.

Abstract: For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. “Gunrock,” our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock’s overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries, such as Ligra and Galois, and better performance than any other GPU high-level graph library.
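
The frontier-focused, bulk-synchronous style the abstract describes can be illustrated with a minimal sequential sketch of breadth-first search. This is not Gunrock's API and it does not run on a GPU: the names `advance`, `filter_unvisited`, and `bfs` are hypothetical, the graph is assumed to be a plain adjacency-list dictionary, and each loop iteration stands in for one bulk-synchronous step over the current vertex frontier.

```python
from typing import Dict, List

# Assumption: the graph is an adjacency list mapping a vertex id to its neighbors.
Graph = Dict[int, List[int]]


def advance(graph: Graph, frontier: List[int]) -> List[int]:
    """Expand every frontier vertex to its neighbors (duplicates are allowed)."""
    return [dst for src in frontier for dst in graph.get(src, [])]


def filter_unvisited(candidates: List[int], depth: Dict[int, int], level: int) -> List[int]:
    """Keep each candidate the first time it is seen and record its depth."""
    next_frontier = []
    for v in candidates:
        if v not in depth:
            depth[v] = level
            next_frontier.append(v)
    return next_frontier


def bfs(graph: Graph, source: int) -> Dict[int, int]:
    """Return the BFS depth of every vertex reachable from source."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:  # one iteration models one bulk-synchronous step
        level += 1
        candidates = advance(graph, frontier)
        frontier = filter_unvisited(candidates, depth, level)
    return depth


if __name__ == "__main__":
    example = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs(example, 0))
```

In a GPU setting, the per-vertex work inside each step would presumably be done in parallel; the sketch only shows how an algorithm can be phrased as repeated operations on a frontier.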

Citations

•Journal Article•10.1145/3381449

JGraphT—A Java Library for Graph Data Structures and Algorithms

Dimitrios Michail, +3 more

- 18 May 2020

- ACM Transactions on Mathematical Softwar...

TL;DR: JGraphT is a programming library that contains very efficient and generic graph data structures along with a large collection of state-of-the-art algorithms, such as shortest paths, spanning trees, graph and subgraph isomorphism, matching and flow problems, and approximation algorithms for NP-hard problems such as independent set and the traveling salesman problem.

140

Proceedings Article•10.1109/ISCA45697.2020.00044

GaaS-X: graph analytics accelerator supporting sparse data representation using crossbar architectures

Nagadastagiri Challapalle, +7 more

- 30 May 2020

TL;DR: This work presents GaaS-X, a graph analytics accelerator that inherently supports sparse graph data representations using in-situ compute-enabled crossbar memory architectures and alleviates the overheads of redundant writes, sparse-to-dense conversions, and redundant computations on invalid edges that are present in state-of-the-art crossbar-based PIM accelerators.

65

Proceedings Article•10.1145/3318464.3389745

Application Driven Graph Partitioning

Wenfei Fan, +8 more

- 11 Jun 2020

TL;DR: This paper proposes an application-driven hybrid partitioning strategy that, given a graph algorithm A, learns a cost model for A as a polynomial regression and develops partitioners that, given the learned cost model, refine an edge-cut or vertex-cut partition to a hybrid partition and reduce the parallel cost of A.

64

Journal Article•10.1109/TCAD.2020.2971531

ABCDPlace: Accelerated Batch-Based Concurrent Detailed Placement on Multithreaded CPUs and GPUs

Yibo Lin, +5 more

- 04 Feb 2020

- IEEE Transactions on Computer-Aided Desi...

TL;DR: This article presents a concurrent detailed placement framework, ABCDPlace, that exploits multithreading and graphics processing unit (GPU) acceleration, and proposes batch-based concurrent algorithms for widely adopted sequential detailed placement techniques, such as independent set matching, global swap, and local reordering.

59

Proceedings Article•10.1145/3293883.3295716

A pattern based algorithmic autotuner for graph processing on GPUs

Ke Meng, +3 more

- 16 Feb 2019

TL;DR: Gswitch is a pattern-based algorithmic auto-tuning system that dynamically switches between optimization variants with negligible overhead and provides a simple programming interface that conceals low-level tuning details from the user.

59

References

•Proceedings Article•10.1145/1772690.1772751

What is Twitter, a social network or a news media?

Haewoon Kwak, +3 more

- 26 Apr 2010

TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.

7.5K

•Journal Article•10.1080/0022250X.2001.9990249

A faster algorithm for betweenness centrality

Ulrik Brandes

- 01 Jun 2001

- Journal of Mathematical Sociology

TL;DR: New algorithms for betweenness are introduced in this paper and require O(n + m) space and run in O(nm) and O(nm + n^2 log n) time on unweighted and weighted networks, respectively, where m is the number of links.

5.2K

SNAP Datasets: Stanford Large Network Dataset Collection

Jure Leskovec, +1 more

- 01 Jun 2014

TL;DR: A collection of more than 50 large network datasets, ranging from tens of thousands of nodes and edges to tens of millions of nodes and edges, that includes social networks, web graphs, road networks, internet networks, citation networks, collaboration networks, and communication networks.

4.2K

Journal Article•10.1145/79173.79181

A bridging model for parallel computation

Leslie G. Valiant

- 01 Aug 1990

- Communications of The ACM

TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate bridging model for parallel computation, and results are given quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.

4.1K

Proceedings Article•10.1145/1807167.1807184

Pregel: a system for large-scale graph processing

Grzegorz Malewicz, +6 more

- 06 Jun 2010

TL;DR: A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.

4.1K

Related Papers (5)

Ligra: a lightweight graph processing framework for shared memory

The University of Florida sparse matrix collection

Pregel: a system for large-scale graph processing

Scalable GPU graph traversal

A lightweight infrastructure for graph analytics