GraphPEG: Accelerating Graph Processing on GPUs
TL;DR: GraphPEG improves the performance of graph processing on GPUs by coupling automatic edge gathering with fine-grain work distribution, motivated by the observation that many graph algorithms share a common graph-traversal pattern.
Abstract: Owing to their massive thread-level parallelism, GPUs have become an attractive platform for accelerating large-scale data-parallel computations such as graph processing. However, achieving high performance for graph processing on GPUs is non-trivial. Processing graphs on GPUs introduces several problems, such as load imbalance, low utilization of hardware units, and memory divergence. Although previous work has proposed several software strategies to optimize graph processing on GPUs, several issues remain beyond what software techniques can address. In this article, we present GraphPEG, a graph processing engine for efficient graph processing on GPUs. Inspired by the observation that many graph algorithms share a common graph-traversal pattern, GraphPEG improves the performance of graph processing by coupling automatic edge gathering with fine-grain work distribution. GraphPEG can also adapt to various input graph datasets and simplifies the software design of graph processing through hardware-assisted graph traversal. Simulation results show that, compared with two representative, highly efficient GPU graph processing software frameworks, Gunrock and SEP-Graph, GraphPEG improves graph processing throughput by 2.8× and 2.5× on average, and by up to 7.3× and 7.0×, respectively, across six graph algorithm benchmarks on six graph datasets, at marginal hardware cost.
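The abstract does not describe GraphPEG's hardware internals, so the following is only a software-level Python sketch of the two ideas it names, assuming a CSR (compressed sparse row) graph layout. `gather_frontier_edges` shows the common traversal step (collect every out-edge of the active frontier). The other two functions contrast a coarse-grain mapping (one frontier vertex per lane) with a fine-grain, edge-balanced mapping that splits the frontier's edges evenly across lanes using a prefix sum of degrees plus binary search. All function names, the lane abstraction, and the toy graph are hypothetical illustrations, not the paper's design.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical toy graph in CSR form (not from the paper): vertex 0 is a hub with
# 8 out-edges, vertices 1-3 have one out-edge each, vertices 4-8 have none.
ROW_OFFSETS = [0, 8, 9, 10, 11, 11, 11, 11, 11, 11]  # out-edges of v: ROW_OFFSETS[v]..ROW_OFFSETS[v+1]
COL_INDICES = [1, 2, 3, 4, 5, 6, 7, 8, 2, 3, 4]      # destination vertex of each edge
FRONTIER = [0, 1, 2, 3]                                # active vertices in this iteration


def gather_frontier_edges(row_offsets, col_indices, frontier):
    """Common traversal step: list (src, dst) for every out-edge of every frontier vertex."""
    return [
        (v, col_indices[i])
        for v in frontier
        for i in range(row_offsets[v], row_offsets[v + 1])
    ]


def vertex_per_lane_load(row_offsets, frontier, num_lanes):
    """Coarse-grain mapping: frontier vertex k goes to lane k % num_lanes.
    Returns the number of edges each lane must process."""
    load = [0] * num_lanes
    for k, v in enumerate(frontier):
        load[k % num_lanes] += row_offsets[v + 1] - row_offsets[v]
    return load


def edge_balanced_assignment(row_offsets, col_indices, frontier, num_lanes):
    """Fine-grain mapping: split the frontier's edges into near-equal contiguous chunks,
    one chunk per lane. Each lane finds the owner of a global edge id by binary search
    over the exclusive prefix sum of frontier degrees."""
    degrees = [row_offsets[v + 1] - row_offsets[v] for v in frontier]
    scan = [0] + list(accumulate(degrees))  # scan[k] = global id of frontier[k]'s first edge
    total = scan[-1]
    chunk = -(-total // num_lanes)          # ceiling division
    lanes = []
    for lane in range(num_lanes):
        work = []
        for e in range(lane * chunk, min((lane + 1) * chunk, total)):
            k = bisect_right(scan, e) - 1   # skips zero-degree vertices automatically
            v = frontier[k]
            work.append((v, col_indices[row_offsets[v] + e - scan[k]]))
        lanes.append(work)
    return lanes


if __name__ == "__main__":
    print(vertex_per_lane_load(ROW_OFFSETS, FRONTIER, 4))  # [8, 1, 1, 1]
    lanes = edge_balanced_assignment(ROW_OFFSETS, COL_INDICES, FRONTIER, 4)
    print([len(w) for w in lanes])                         # [3, 3, 3, 2]
```

With one vertex per lane, the lane holding the hub performs 8 of the 11 edge visits while the other lanes sit mostly idle, which is the load imbalance the abstract describes; the edge-balanced split caps every lane at 3 while visiting exactly the same edges. On a real GPU the scan and search would themselves run in parallel across threads; according to the abstract, GraphPEG instead provides hardware-assisted traversal, whose mechanism this sketch does not model.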
Citations
Software/Hardware Co-design of 3D NoC-based GPU Architectures for Accelerated Graph Computations
Dwaipayan Choudhury, Reet Barik, Aravind Sukumaran Rajam, Ananth Kalyanaraman, Partha Pratim Pande
TL;DR: This paper proposes design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors and the memory controllers follow a power-law distribution, and proposes a software/hardware co-design framework for accelerating graph computations.
Triangle Dropping: An Occluded-geometry Predictor for Energy-efficient Mobile GPUs
David Corbalán-Navarro, Juan L. Aragón, Marti Anglada, Joan M. Parcerisa, Antonio González
TL;DR: A novel microarchitectural approach for mobile GPUs is proposed that removes occluded geometry early in a scene by leveraging frame-to-frame coherence, thereby reducing overall energy consumption and achieving a speedup.
Analyzing GCN Aggregation on GPU
01 Jan 2022
TL;DR: In this paper, the performance of graph convolutional neural network (GCN) aggregation kernels is investigated on real GPU hardware and on a cycle-accurate GPU simulator, showing that performance can be significantly influenced by kernel design approaches and feature density.
GPU-Accelerated Batch-Dynamic Subgraph Matching
Linshan Qiu, Lu Chen, Hailiang Jie, Xiangyu Ke, Yunjun Gao, Yang Liu, Zetao Zhang
13 May 2024
References
Pregel: a system for large-scale graph processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski
06 Jun 2010
TL;DR: A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.
Analyzing CUDA workloads using a detailed GPU simulator
Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, Tor M. Aamodt
26 Apr 2009
TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
A scalable processing-in-memory accelerator for parallel graph processing
Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, Kiyoung Choi
13 Jun 2015
TL;DR: This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.
Accelerating large graph algorithms on the GPU using CUDA
Pawan Harish, P. J. Narayanan
18 Dec 2007
TL;DR: This work presents CUDA implementations of several fundamental graph algorithms, including breadth-first search, single-source shortest path, and all-pairs shortest path, for large graphs on NVIDIA's G80 line of GPUs.
A lightweight infrastructure for graph analytics
Donald Nguyen, Andrew Lenharth, Keshav Pingali
03 Nov 2013
TL;DR: This paper argues that existing DSLs can be implemented on top of a general-purpose infrastructure that supports very fine-grain tasks, implements autonomous, speculative execution of these tasks, and allows application-specific control of task scheduling policies.
Related Papers
Shuai Che
01 Sep 2014
Robert Kessl, Nilothpal Talukder, Pranay Anchuri, Mohammed J. Zaki
24 Aug 2014
Jianlong Zhong, Bingsheng He
02 Dec 2013