Open Access · Posted Content
GraphCage: Cache Aware Graph Processing on GPUs.
TL;DR: GraphCage is a cache-centric optimization framework for highly efficient graph processing on GPUs that improves performance by 2x to 4x compared to hand-optimized implementations and state-of-the-art frameworks, with less memory consumption than CuSha.
Abstract: Efficient graph processing is challenging because of the irregularity of graph algorithms. Using GPUs to accelerate irregular graph algorithms efficiently is even more difficult, since the GPU's highly structured SIMT architecture is not a natural fit for irregular applications. Despite considerable previous effort spent on carefully mapping graph algorithms onto the GPU, the performance of graph processing on GPUs is still highly memory-latency bound, leading to low utilization of compute resources. Random memory accesses generated by the sparse graph data structure are the major cause of this significant memory access latency. Simply applying the conventional cache blocking technique proposed for matrix computation has limited benefit due to its significant overhead on the GPU. We propose GraphCage, a cache-centric optimization framework for highly efficient graph processing on GPUs. We first present a throughput-oriented cache blocking scheme (TOCAB) in both push and pull directions. Compared with conventional cache blocking, which suffers from repeated accesses when processing large graphs on GPUs, TOCAB is specifically optimized for the GPU architecture to reduce this overhead and improve memory access efficiency. To integrate our scheme into state-of-the-art implementations without significant overhead, we coordinate TOCAB with load balancing strategies by considering the sparsity of subgraphs. To enable cache blocking for traversal-based algorithms, we consider the benefit and overhead in different iterations with different working set sizes, and apply TOCAB to topology-driven kernels in the pull direction. Evaluation shows that GraphCage can improve performance by 2x to 4x compared to hand-optimized implementations and state-of-the-art frameworks (e.g., CuSha and Gunrock), with less memory consumption than CuSha.
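To make the cache blocking idea in the abstract concrete, the following is a minimal CPU-side sketch in Python of conventional cache blocking for a pull-style gather. It is not TOCAB and not the authors' GPU implementation. The function names, the edge-list representation, and the `block_size` parameter are all assumptions introduced for illustration. Edges are grouped into subgraphs by source-vertex range, so the random reads of `values[src]` within one subgraph stay inside a window small enough to fit in cache.

```python
def partition_edges_by_source(edges, num_vertices, block_size):
    """Group (src, dst) edges into subgraphs by source-vertex range.

    Hypothetical helper: block b holds edges whose source lies in
    [b * block_size, (b + 1) * block_size).
    """
    num_blocks = (num_vertices + block_size - 1) // block_size
    blocks = [[] for _ in range(num_blocks)]
    for src, dst in edges:
        blocks[src // block_size].append((src, dst))
    return blocks


def blocked_pull(edges, values, block_size):
    """Each vertex sums the values of its in-neighbors, one subgraph at a time."""
    num_vertices = len(values)
    sums = [0.0] * num_vertices
    for block in partition_edges_by_source(edges, num_vertices, block_size):
        # Reads of values[src] in this loop touch at most block_size
        # consecutive entries, which is the point of blocking.
        for src, dst in block:
            # sums[dst] may be revisited in every block; this is the
            # repeated-access cost that blocking introduces.
            sums[dst] += values[src]
    return sums


def unblocked_pull(edges, values):
    """Reference version without blocking, for comparison."""
    sums = [0.0] * len(values)
    for src, dst in edges:
        sums[dst] += values[src]
    return sums


edges = [(0, 1), (2, 1), (3, 0), (1, 3), (3, 2), (0, 2)]
values = [1.0, 2.0, 3.0, 4.0]
result = blocked_pull(edges, values, block_size=2)
```

In this sketch every subgraph may update the same entries of `sums`, so the output array is traversed once per block. That repeated traversal is one plausible reading of the overhead the abstract attributes to conventional cache blocking on large graphs.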
Citations
An analysis of the graph processing landscape
TL;DR: For the use case of performing global computations over a graph, the graph is first ingested into a graph processing system from one of many digital representations; the computation can then be done with single-machine systems (with varying approaches to hardware usage), distributed systems (either homogeneous or heterogeneous groups of machines), or systems dedicated to high-performance computing (HPC).
An analysis of the graph processing landscape
Miguel E. Coimbra, Alexandre P. Francisco, Luís Veiga
TL;DR: This survey provides an overview of different aspects of the graph processing landscape and describes classes of systems based on a set of dimensions, including paradigms for expressing graph processing, the different types of systems to use, coordination and communication models in distributed graph processing, and partitioning techniques.
GX-Plug: a Middleware for Plugging Accelerators to Distributed Graph Processing
01 May 2022
TL;DR: GX-Plug is a middleware for large-scale graph processing that integrates the merits of distributed graph processing and high-performance accelerators by plugging accelerators into distributed graph systems.
GX-Plug: a Middleware for Plugging Accelerators to Distributed Graph Processing
TL;DR: To improve the middleware's performance, a series of techniques, including pipeline shuffle, synchronization caching and skipping, and workload balancing, are studied for intra-, inter-, and beyond-iteration optimizations, respectively.
NodeFetch: High Performance Graph Processing using Processing in Memory
M. Mosayebi, Masoud Dehyadegari
- 01 Jan 2021
TL;DR: NodeFetch, a new method for accessing nodes and their neighbors during graph processing by adding a new command to the HMC system, is proposed as a way of dealing with large-scale graph processing, considering recent advances in the field.
References
•Proceedings Article
The PageRank Citation Ranking : Bringing Order to the Web
Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd
- 11 Nov 1999
TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
The University of Florida sparse matrix collection
Timothy A. Davis, Yifan Hu
TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.
Pregel: a system for large-scale graph processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski
- 06 Jun 2010
TL;DR: A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.
KONECT: the Koblenz network collection
Jérôme Kunegis
- 13 May 2013
TL;DR: KONECT's taxonomy of network datasets is described, along with an overview of the included datasets, a review of the supported statistics and plots, and a discussion of the project's role in web science and network science.
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Yucheng Low, Danny Bickson, Joseph E. Gonzalez, Carlos Guestrin, Aapo Kyrola, Joseph M. Hellerstein
- 01 Apr 2012
TL;DR: This work extends the shared-memory GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees, with techniques to reduce network congestion and mitigate the effect of network latency.