XBFS: eXploring Runtime Optimizations for Breadth-First Search on GPUs
Anil Gaihre, Zhenlin Wu, Fan Yao, Hang Liu
- 17 Jun 2019
- pp 121-131
TL;DR: XBFS applies runtime optimizations on GPUs to cope with the nondeterministic characteristics of BFS through three techniques: adaptive frontier queue generation across BFS levels, new workload-balancing strategies for bottom-up traversal, and an asynchronous bottom-up traversal.
Abstract: Attracted by the enormous potential of Graphics Processing Units (GPUs), a wave of efforts has sought to deploy Breadth-First Search (BFS) on GPUs. These efforts, however, often rely on static mechanisms to address challenges that are dynamic in nature, and this mismatch prevents graph traversal offloaded to GPUs from reaching optimal performance. To this end, we propose XBFS, which leverages runtime optimizations on GPUs to cope with the nondeterministic characteristics of BFS through three techniques. First, XBFS adaptively exploits four frontier queue generation designs, either new or optimized, to accommodate BFS levels that present dissimilar features. Second, motivated by the observation that in bottom-up traversal the workload of each vertex is not proportional to its degree, we design three new strategies to better balance the workload. Third, XBFS introduces the first truly asynchronous bottom-up traversal, which allows BFS to visit vertices of multiple levels in a single iteration, with both theoretical soundness and practical benefits. Taken together, XBFS is on average 3.5×, 4.9×, 11.2×, and 6.1× faster than the state-of-the-art Enterprise, Tigr, and Gunrock on a Quadro P6000 GPU and Ligra on a 24-core Intel Xeon Platinum 8175M CPU, respectively. Note that the CPU used for Ligra is more expensive than the GPU used for XBFS.
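The abstract does not spell out XBFS's four frontier queue generation designs or its three balancing strategies, so the following is only a rough, hypothetical illustration of the ideas it builds on, written as sequential Python rather than GPU code. It is a generic direction-optimizing BFS over a CSR graph, not the authors' method. It builds each frontier queue with an exclusive prefix sum over a per-vertex status array (one common scan-based design, assumed here), picks top-down or bottom-up per level with a simplified edge-count heuristic (the `alpha` threshold is an assumption borrowed from direction-optimizing BFS, not XBFS's policy), and counts the edges each vertex inspects during bottom-up steps. All function names are hypothetical, and XBFS's asynchronous bottom-up traversal is not modeled.

```python
from itertools import accumulate


def build_csr(num_vertices, edges):
    """Build an undirected graph in CSR form (offsets, neighbors) from an edge list."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    offsets = list(accumulate((len(a) for a in adj), initial=0))
    neighbors = [w for a in adj for w in a]
    return offsets, neighbors


def frontier_from_status(level, target):
    """Scan-based frontier queue generation (stream compaction).

    Flag every vertex whose level equals `target`; an exclusive prefix sum
    over the flags gives each flagged vertex its slot in the output queue.
    """
    flags = [1 if lv == target else 0 for lv in level]
    slots = list(accumulate(flags, initial=0))  # slots[v]: exclusive scan; slots[-1]: total
    queue = [0] * slots[-1]
    for v, flag in enumerate(flags):
        if flag:
            queue[slots[v]] = v
    return queue


def direction_optimizing_bfs(offsets, neighbors, source, alpha=14):
    """Level-synchronous BFS choosing top-down or bottom-up at each level.

    Returns (level, inspected): the BFS depth of each vertex (-1 if unreachable)
    and how many edges each vertex examined during bottom-up steps.
    `alpha` is a hypothetical switching threshold, not taken from XBFS.
    """
    n = len(offsets) - 1
    degree = [offsets[v + 1] - offsets[v] for v in range(n)]
    level = [-1] * n
    level[source] = 0
    inspected = [0] * n
    frontier, depth = [source], 0
    while frontier:
        frontier_edges = sum(degree[v] for v in frontier)
        unvisited_edges = sum(degree[v] for v in range(n) if level[v] == -1)
        if frontier_edges * alpha > unvisited_edges:
            # Bottom-up: each unvisited vertex searches its neighbors for a
            # parent in the current frontier and stops at the first hit.
            for v in range(n):
                if level[v] != -1:
                    continue
                for i in range(offsets[v], offsets[v + 1]):
                    inspected[v] += 1
                    if level[neighbors[i]] == depth:
                        level[v] = depth + 1
                        break
        else:
            # Top-down: each frontier vertex claims its unvisited neighbors.
            for u in frontier:
                for i in range(offsets[u], offsets[u + 1]):
                    if level[neighbors[i]] == -1:
                        level[neighbors[i]] = depth + 1
        depth += 1
        frontier = frontier_from_status(level, depth)
    return level, inspected


if __name__ == "__main__":
    offsets, neighbors = build_csr(6, [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (3, 4), (4, 5)])
    level, inspected = direction_optimizing_bfs(offsets, neighbors, source=0)
    print(level)      # [0, 1, 1, 1, 2, 3]
    print(inspected)  # [0, 1, 1, 1, 5, 3]
```

On this toy graph every level runs bottom-up. Vertices 1 to 3 have degree 2 but each inspects only one edge before finding a parent, while vertex 5 (degree 1) inspects three edges across three bottom-up levels. This is a small instance of the observation that bottom-up work per vertex is not proportional to degree.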
Citations
Subway: minimizing data transfer during out-of-GPU-memory graph processing
Amir Hossein Nodehi Sabet, Zhijia Zhao, Rajiv Gupta
- 15 Apr 2020
TL;DR: This work designs a fast subgraph generation algorithm with a simple yet efficient subgraph representation and a GPU-accelerated implementation, and brings asynchrony to the subgraph processing, delaying the synchronization between a subgraph in the GPU memory and the rest of the graph in the CPU memory.
70
Seastar: vertex-centric programming for graph neural networks
Yidi Wu, Kaihao Ma, Zhenkun Cai, Tatiana Jin, Boyang Li, Chenguang Zheng, James Cheng, Fan Yu
- 21 Apr 2021
TL;DR: Seastar is a vertex-centric programming model for GNN training on GPUs that provides idiomatic Python constructs for easily developing novel homogeneous and heterogeneous GNN models.
62
Optimization Techniques for GPU Programming
TL;DR: This survey discusses optimization techniques found in 450 articles published over the last 14 years and analyzes them from different perspectives, showing that the optimizations are highly interrelated and explaining the need for techniques such as auto-tuning.
54
C-SAW: A Framework for Graph Sampling and Random Walk on GPUs
TL;DR: This work introduces C-SAW, the first framework that accelerates graph sampling and random walk on GPUs, and provides a generic API that lets users easily implement a wide range of sampling and random walk algorithms.
46
EMOGI: efficient memory-access for out-of-memory graph-traversal in GPUs
Seungwon Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-mei W. Hwu
- 01 Oct 2020
TL;DR: This paper addresses the open question of whether a sufficiently large number of overlapping cacheline-sized accesses can be sustained to tolerate the long latency to host memory, fully utilize the available bandwidth, and achieve favorable execution performance. It proposes EMOGI, an alternative approach to traversing graphs that do not fit in GPU memory by using direct cacheline-sized accesses to data stored in host memory.
References
•Book
LAPACK Users' Guide
Ed Anderson
- 01 Feb 1995
TL;DR: The third edition of LAPACK provides a guide to troubleshooting and installing the routines, along with examples of how to convert from LINPACK or EISPACK to BLAS.
3.2K
•Book
LINPACK Users' Guide
Jack Dongarra, Cleve B. Moler, J. R. Bunch, G. W. Stewart
- 01 Jan 1987
TL;DR: The guide covers general, band, positive definite, positive definite band, symmetric indefinite, triangular, and tridiagonal matrices, as well as the Cholesky decomposition and the QR decomposition, up to and including the singular value decomposition.
1.7K
Ligra: a lightweight graph processing framework for shared memory
Julian Shun, Guy E. Blelloch
- 23 Feb 2013
TL;DR: This paper presents a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write and significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
964
Parallel Prefix Sum (Scan) with CUDA
Mark J. Harris
- 01 Jan 2011
788
Scan primitives for GPU computing
Shubhabrata Sengupta, Mark J. Harris, Yao Zhang, John D. Owens
- 04 Aug 2007
TL;DR: Using scan primitives, this work presents novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyzes the performance of the scan primitives, of several sort algorithms built on them, and of a graphical shallow-water fluid simulation that uses the scan framework for a tridiagonal matrix solver.
Related Papers
Julian Shun, Guy E. Blelloch
- 23 Feb 2013
Duane Merrill, Michael Garland, Andrew S. Grimshaw
- 25 Feb 2012
Farzad Khorasani, Keval Vora, Rajiv Gupta, Laxmi N. Bhuyan
- 23 Jun 2014