Journal Article, DOI: 10.1145/3108140
Gunrock: GPU Graph Analytics
Yangzihao Wang,Yuechao Pan,Andrew Davidson,Yuduo Wu,Carl Yang,Leyuan Wang,Muhammad Osama,Chenshan Yuan,Weitang Liu,Andy Riffel,John D. Owens +10 more
- 23 Aug 2017
- Vol. 4, Iss: 1, pp 3
TL;DR: The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries, and better performance than any other GPU high-level graph library.
Abstract: For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. “Gunrock,” our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock’s overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries, such as Ligra and Galois, and better performance than any other GPU high-level graph library.
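The frontier-centric, bulk-synchronous style described in the abstract can be illustrated with a small CPU-only sketch of breadth-first search. This is not Gunrock's API or implementation: the names `advance`, `filter_frontier`, and `bfs` are hypothetical, the code runs sequentially in plain Python, and it only shows the general idea of repeatedly transforming a vertex frontier one step at a time.

```python
from typing import Dict, List

# Hypothetical adjacency-list representation: vertex -> list of neighbors.
Graph = Dict[int, List[int]]


def advance(graph: Graph, frontier: List[int]) -> List[int]:
    """Expand each frontier vertex to its neighbors (duplicates are possible)."""
    return [dst for src in frontier for dst in graph.get(src, [])]


def filter_frontier(candidates: List[int], depth: Dict[int, int], level: int) -> List[int]:
    """Keep only unvisited vertices and label each with the current depth."""
    next_frontier = []
    for v in candidates:
        if v not in depth:
            depth[v] = level
            next_frontier.append(v)
    return next_frontier


def bfs(graph: Graph, source: int) -> Dict[int, int]:
    """Frontier-based BFS: each loop iteration is one bulk-synchronous step."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        candidates = advance(graph, frontier)                  # expand the frontier
        frontier = filter_frontier(candidates, depth, level)   # prune visited vertices
    return depth


# Small illustrative graph (an assumption, not data from the paper).
example_graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
depths = bfs(example_graph, 0)
print(depths)
```

On a GPU, the work inside each step would be distributed across many threads; the sketch keeps only the step-by-step frontier structure.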
Citations
JGraphT—A Java Library for Graph Data Structures and Algorithms
TL;DR: JGraphT is a programming library that contains very efficient and generic graph data structures along with a large collection of state-of-the-art algorithms, such as shortest paths, spanning trees, graph and subgraph isomorphism, matching and flow problems, and approximation algorithms for NP-hard problems such as independent set and the traveling salesman problem.
140
GaaS-X: graph analytics accelerator supporting sparse data representation using crossbar architectures
Nagadastagiri Challapalle,Sahithi Rampalli,Linghao Song,Nandhini Chandramoorthy,Karthik Swaminathan,John Sampson,Yi Chen,Vijaykrishnan Narayanan +7 more
- 30 May 2020
TL;DR: This work presents GaaS-X, a graph analytics accelerator that inherently supports sparse graph data representations using in-situ compute-enabled crossbar memory architectures and alleviates the overheads of redundant writes, sparse-to-dense conversions, and redundant computations on the invalid edges that are present in state-of-the-art crossbar-based PIM accelerators.
65
Application Driven Graph Partitioning
Wenfei Fan,Ruochun Jin,Muyang Liu,Ping Lu,Luo Xiaojian,Ruiqi Xu,Qiang Yin,Wenyuan Yu,Jingren Zhou +8 more
- 11 Jun 2020
TL;DR: This paper proposes an application-driven hybrid partitioning strategy that, given a graph algorithm A, learns a cost model for A as a polynomial regression and develops partitioners that, given the learned cost model, refine an edge-cut or vertex-cut partition to a hybrid partition and reduce the parallel cost of A.
64
ABCDPlace: Accelerated Batch-Based Concurrent Detailed Placement on Multithreaded CPUs and GPUs
TL;DR: This article presents a concurrent detailed placement framework, ABCDPlace, that exploits multithreading and graphics processing unit (GPU) acceleration, and proposes batch-based concurrent algorithms for widely adopted sequential detailed placement techniques, such as independent set matching, global swap, and local reordering.
59
A pattern based algorithmic autotuner for graph processing on GPUs
Ke Meng,Jiajia Li,Guangming Tan,Ninghui Sun +3 more
- 16 Feb 2019
TL;DR: Gswitch is a pattern-based algorithmic auto-tuning system that dynamically switches between optimization variants with negligible overhead and provides a simple programming interface that conceals low-level tuning details from the user.
59
References
What is Twitter, a social network or a news media?
Haewoon Kwak,Changhyun Lee,Hosung Park,Sue Moon +3 more
- 26 Apr 2010
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
7.5K
A faster algorithm for betweenness centrality
TL;DR: New algorithms for betweenness are introduced in this paper; they require O(n + m) space and run in O(nm) and O(nm + n^2 log n) time on unweighted and weighted networks, respectively, where n is the number of nodes and m is the number of links.
SNAP Datasets: Stanford Large Network Dataset Collection
Jure Leskovec,Andrej Krevl +1 more
- 01 Jun 2014
TL;DR: A collection of more than 50 large network datasets, ranging from tens of thousands of nodes and edges to tens of millions of nodes and edges, that includes social networks, web graphs, road networks, internet networks, citation networks, collaboration networks, and communication networks.
4.2K
A bridging model for parallel computation
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate bridging model between software and hardware for parallel computation, and results are given that quantify its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
4.1K
Pregel: a system for large-scale graph processing
Grzegorz Malewicz,Matthew H. Austern,Aart J. C. Bik,James C. Dehnert,Ilan Horn,Naty Leiser,Grzegorz Czajkowski +6 more
- 06 Jun 2010
TL;DR: A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.