Book Chapter, DOI: 10.1007/978-3-319-43659-3_45
A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves
Weifeng Liu, Ang Li, JD Hogg, Iain S. Duff, Brian Vinter +4 more
- 24 Aug 2016
- pp 617-630
96
TL;DR: This paper proposes a novel approach for SpTRSV in which the ordering between components is naturally enforced within the solution stage; its preprocessing stage is an order of magnitude faster than that of existing methods.
Abstract: The sparse triangular solve kernel, SpTRSV, is an important building block for a number of numerical linear algebra routines. Parallelizing SpTRSV on today's manycore platforms, such as GPUs, is not an easy task since computing a component of the solution may depend on previously computed components, enforcing a degree of sequential processing. As a consequence, most existing work introduces a preprocessing stage to partition the components into a group of level-sets or colour-sets so that components within a set are independent and can be processed simultaneously during the subsequent solution stage. However, this class of methods requires a long preprocessing time as well as significant runtime synchronization overhead between the sets. To address this, we propose in this paper a novel approach for SpTRSV in which the ordering between components is naturally enforced within the solution stage. In this way, the cost for preprocessing can be greatly reduced, and the synchronizations between sets are completely eliminated. A comparison with the state-of-the-art library supplied by the GPU vendor, using 11 sparse matrices on the latest GPU device, shows that our approach obtains an average speedup of 2.3 times in single precision and 2.14 times in double precision. The maximum speedups are 5.95 and 3.65, respectively. In addition, our method is an order of magnitude faster for the preprocessing stage than existing methods.
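The contrast drawn in the abstract can be illustrated with a small sequential sketch. It is not the authors' GPU kernel: the function names, the CSC input layout, and the work queue are assumptions chosen for illustration. `level_sets` shows the kind of grouping that level-set methods compute in preprocessing. `sptrsv_dependency_driven` instead counts the unresolved dependencies of each component and solves a component as soon as its count reaches zero, so the ordering is enforced during the solve itself. A parallel implementation would presumably replace the queue with per-component counters updated atomically by concurrent workers.

```python
from collections import deque


def level_sets(n, col_ptr, row_idx):
    """Group the components of a lower-triangular CSC matrix into level-sets.

    Components in the same set do not depend on each other.
    """
    level = [0] * n
    for j in range(n):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            r = row_idx[k]
            if r > j:  # x[r] depends on x[j]
                level[r] = max(level[r], level[j] + 1)
    sets = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def sptrsv_dependency_driven(n, col_ptr, row_idx, vals, b):
    """Solve L x = b, with L lower triangular in CSC form, without level-sets.

    Sequential emulation (an assumption of this sketch): a queue of ready
    components stands in for concurrent workers that wait on counters.
    """
    pending = [0] * n   # unresolved off-diagonal dependencies per row
    diag = [0.0] * n
    for j in range(n):  # lightweight preprocessing: one pass over the entries
        for k in range(col_ptr[j], col_ptr[j + 1]):
            r = row_idx[k]
            if r == j:
                diag[j] = vals[k]
            else:
                pending[r] += 1

    left_sum = [0.0] * n  # accumulated contributions of solved components
    x = [0.0] * n
    ready = deque(i for i in range(n) if pending[i] == 0)
    while ready:
        j = ready.popleft()
        x[j] = (b[j] - left_sum[j]) / diag[j]
        # Push x[j] to every component that depends on it.
        for k in range(col_ptr[j], col_ptr[j + 1]):
            r = row_idx[k]
            if r != j:
                left_sum[r] += vals[k] * x[j]
                pending[r] -= 1
                if pending[r] == 0:
                    ready.append(r)
    return x


# Hypothetical 4x4 example:
# [2 0 0 0]
# [1 1 0 0]
# [0 0 4 0]
# [3 0 1 2]
col_ptr = [0, 3, 4, 6, 7]
row_idx = [0, 1, 3, 1, 2, 3, 3]
vals = [2.0, 1.0, 3.0, 1.0, 4.0, 1.0, 2.0]
b = [2.0, 3.0, 12.0, 14.0]

print(level_sets(4, col_ptr, row_idx))
print(sptrsv_dependency_driven(4, col_ptr, row_idx, vals, b))
```

In this example components 0 and 2 have no dependencies and could be solved at once, while 1 and 3 become ready only after the components they depend on are done. The dependency-driven version needs only the per-row counts up front, whereas the level-set version must build the full grouping before any solve can start.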
Citations
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
TL;DR: In this article, the authors conduct a thorough evaluation of the five latest types of modern GPU interconnects: PCIe, NVLink-V1, NVLink-V2, NVSwitch, and NVLink-SLI.
184
swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures
Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu +3 more
- 10 Feb 2018
TL;DR: A novel data layout called Sparse Level Tile is proposed to make all data reuse under control, and a Producer-Consumer pairing method is designed to make any inter-level synchronization only happen in very fast register communication.
Fast segmented sort on GPUs
Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng +3 more
- 14 Jun 2017
TL;DR: This paper presents an adaptive segmented sort mechanism on GPUs that shows great improvements over the methods from CUB, CUSP and ModernGPU on NVIDIA K80-Kepler and TitanX-Pascal GPUs. Applied to two applications, suffix array construction and sparse matrix-matrix multiplication, it obtains obvious gains over state-of-the-art implementations.
68
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
Ang Li, Shuaiwen Leon Song, Jieyang Chen, Xu Liu, Nathan R. Tallent, Kevin J. Barker +5 more
- 01 Sep 2018
TL;DR: Evaluation results show that, unless the current CPU-GPU master-slave programming model can be replaced, it is difficult for scale-up multi-GPU applications to really benefit from faster intra-node interconnects such as NVLinks; while for inter-node scale-out applications, although interconnect is more crucial to the overall performance, GPUDirect-RDMA appears to be not always the optimal choice.
67
Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides
TL;DR: Novel approaches for SpTRSV and SpTRSM in which the ordering between components is naturally enforced within the solution stage are proposed, so the cost for preprocessing can be greatly reduced, and the synchronizations between sets are completely eliminated.
54
References
•Book
Iterative Methods for Sparse Linear Systems
Yousef Saad
- 01 Apr 2003
TL;DR: This chapter discusses methods related to the normal equations of linear algebra, and some of the techniques used in this chapter were derived from previous chapters of this book.
The University of Florida Sparse Matrix Collection
Timothy A. Davis, Yifan Hu +1 more
TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.
4.3K
•Book
Numerical Methods for Least Squares Problems
Åke Björck
- 01 Apr 1996
TL;DR: Theorems and statistical properties of least squares solutions are explained and basic numerical methods for solving least squares problems are described.
3.6K
Direct methods for sparse matrices
TL;DR: This book aims to be suitable also for a student course, probably at MSc level. The subject is intensely practical, and the book is written with practicalities ever in mind.
2K
Direct Methods for Sparse Linear Systems
Timothy A. Davis
- 01 Jan 2006
TL;DR: Direct methods for sparse linear systems cover various algorithms and techniques for solving sparse systems efficiently. These methods include basic algorithms, solving triangular systems, Cholesky factorization, orthogonal methods, LU factorization, fill-reducing orderings, and CSparse library usage.
1.3K