Journal Article
ByteGNN: Efficient Graph Neural Network Training at Large Scale
Che Zheng,Hongzhi Chen,Yuxuan Cheng,Zhezheng Song,Yifan Wu,Changji Li,James Cheng,Huanming Yang,Shuai Zhang
- Vol. 15, pp 1228-1242
46
Abstract: Graph neural networks (GNNs) have shown excellent performance in a wide range of applications such as recommendation, risk control, and drug discovery. With the increase in the volume of graph data, distributed GNN systems become essential to support efficient GNN training. However, existing distributed GNN training systems suffer from various performance issues including high network communication cost, low CPU utilization, and poor end-to-end performance. In this paper, we propose ByteGNN, which addresses the limitations in existing distributed GNN systems with three key designs: (1) an abstraction of mini-batch graph sampling to support high parallelism, (2) a two-level scheduling strategy to improve resource utilization and to reduce the end-to-end GNN training time, and (3) a graph partitioning algorithm tailored for GNN workloads. Our experiments show that ByteGNN outperforms the state-of-the-art distributed GNN systems with up to 3.5-23.8 times faster end-to-end execution, 2-6 times higher CPU utilization, and around half of the network communication cost.
Citations
DSP: Efficient GNN Training with Multiple GPUs
Zhenkun Cai,Qihui Zhou,Xiao Yan,Da Zheng,Xiang Song,Chenguang Zheng,James Cheng,George Karypis
- 25 Feb 2023
TL;DR: Distributed sampling and pipelining (DSP) adopts a tailored data layout that utilizes the fast NVLink connections among the GPUs and stores the graph topology and popular node features in GPU memory.
23
Distributed Graph Neural Network Training: A Survey
TL;DR: This paper surveys the optimization techniques for distributed GNN training, focusing on three major challenges: massive feature communication, loss of model accuracy, and workload imbalance.
Datasets, tasks, and training methods for large-scale hypergraph learning
Sunwoo Kim,Dongjin Lee,Yul Kim,Jungho Park,Taeho Hwang,Kijung Shin
TL;DR: This work introduces two pair-level hypergraph-learning tasks to formulate a wide range of real-world problems and proposes PCL, a scalable learning method for hypergraph neural networks, to tackle scalability issues.
10
DGI: An Easy and Efficient Framework for GNN Model Evaluation
Peiqi Yin,Xiao Yan,Jinjing Zhou,Qiang Fu,Zhenkun Cai,James Cheng,Bo Tang,Minjie Wang
- 04 Aug 2023
TL;DR: DGI is presented, which automatically translates the training code of a GNN model for layer-wise evaluation to minimize user effort. It consistently outperforms node-wise evaluation across different datasets and hardware settings, and the speedup can be over 1,000x.
9
HGL: Accelerating Heterogeneous GNN Training with Holistic Representation and Optimization
TL;DR: The authors propose HGL, a heterogeneity-aware system for GNN training that provides a holistic representation of GNNs and enables cross-relation optimization in HetGNN training.
7