A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves

Book Chapter, doi:10.1007/978-3-319-43659-3_45

Weifeng Liu, +4 more

- 24 Aug 2016

- pp 617-630

96

TL;DR: This paper proposes a novel approach for SpTRSV in which the ordering between components is naturally enforced within the solution stage, and whose preprocessing stage is an order of magnitude faster than that of existing methods.

Abstract: The sparse triangular solve kernel, SpTRSV, is an important building block for a number of numerical linear algebra routines. Parallelizing SpTRSV on today's manycore platforms, such as GPUs, is not an easy task since computing a component of the solution may depend on previously computed components, enforcing a degree of sequential processing. As a consequence, most existing work introduces a preprocessing stage to partition the components into a group of level-sets or colour-sets so that components within a set are independent and can be processed simultaneously during the subsequent solution stage. However, this class of methods requires a long preprocessing time as well as significant runtime synchronization overhead between the sets. To address this, we propose in this paper a novel approach for SpTRSV in which the ordering between components is naturally enforced within the solution stage. In this way, the cost for preprocessing can be greatly reduced, and the synchronizations between sets are completely eliminated. A comparison with the state-of-the-art library supplied by the GPU vendor, using 11 sparse matrices on the latest GPU device, shows that our approach obtains an average speedup of 2.3 times in single precision and 2.14 times in double precision. The maximum speedups are 5.95 and 3.65, respectively. In addition, our method is an order of magnitude faster for the preprocessing stage than existing methods.
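
The two strategies contrasted in the abstract can be illustrated with a small sequential Python sketch. It is not the authors' GPU kernel: the function names, the CSR input layout, and the use of per-component dependency counters as a stand-in for "ordering enforced within the solution stage" are all assumptions made for illustration. The first solver groups components into level-sets before solving; the second skips that grouping and releases each component as soon as the components it depends on are known.

```python
from collections import deque

def level_sets(n, row_ptr, col_idx):
    """Group the rows of a lower-triangular CSR matrix into level-sets."""
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # off-diagonal entry: x[i] depends on x[j]
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i in range(n):
        sets[level[i]].append(i)
    return sets

def solve_by_levels(n, row_ptr, col_idx, vals, b):
    """Level-set solve: rows inside one set are mutually independent."""
    x = [0.0] * n
    for rows in level_sets(n, row_ptr, col_idx):
        # A parallel version would process `rows` concurrently and
        # place a barrier here before moving to the next set.
        for i in rows:
            acc, diag = b[i], 1.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag
    return x

def solve_with_counters(n, row_ptr, col_idx, vals, b):
    """Counter-driven solve (hypothetical, sequential simulation)."""
    pending = [0] * n                    # unresolved dependencies per row
    dependents = [[] for _ in range(n)]  # rows waiting on each component
    diag = [1.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag[i] = vals[k]
            else:
                pending[i] += 1
                dependents[j].append((i, vals[k]))
    partial = list(b)
    x = [0.0] * n
    ready = deque(i for i in range(n) if pending[i] == 0)
    while ready:
        j = ready.popleft()
        x[j] = partial[j] / diag[j]
        for i, a in dependents[j]:       # push x[j] to every dependent row
            partial[i] -= a * x[j]
            pending[i] -= 1
            if pending[i] == 0:
                ready.append(i)
    return x

# Example system L x = b with
# L = [[2,0,0,0],[1,1,0,0],[0,0,4,0],[1,0,2,1]] stored in CSR form.
row_ptr = [0, 1, 3, 4, 7]
col_idx = [0, 0, 1, 2, 0, 2, 3]
vals = [2.0, 1.0, 1.0, 4.0, 1.0, 2.0, 1.0]
b = [2.0, 3.0, 8.0, 6.0]
x_levels = solve_by_levels(4, row_ptr, col_idx, vals, b)
x_counters = solve_with_counters(4, row_ptr, col_idx, vals, b)
```

In this example, components 0 and 2 have no dependencies and form the first level-set, while components 1 and 3 form the second. The counter-driven variant reaches the same solution without ever materializing those sets, which is the intuition behind removing both the set-building preprocessing and the barriers between sets.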

Citations

•Journal Article•10.1109/TPDS.2019.2928289

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

Ang Li, +6 more

- 11 Mar 2019

- arXiv: Hardware Architecture

TL;DR: In this article, the authors conduct a thorough evaluation of five of the latest types of modern GPU interconnect: PCIe, NVLink-V1, NVLink-V2, NVSwitch, and NVLink-SLI.

184

•Proceedings Article•10.1145/3178487.3178513

swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures

Xinliang Wang, +3 more

- 10 Feb 2018

TL;DR: A novel data layout called Sparse Level Tile is proposed to bring all data reuse under control, and a Producer-Consumer pairing method is designed so that inter-level synchronization happens only through very fast register communication.

84

•Proceedings Article•10.1145/3079079.3079105

Fast segmented sort on GPUs

Kaixi Hou, +3 more

- 14 Jun 2017

TL;DR: This paper presents an adaptive segmented sort mechanism on GPUs that shows great improvements over the methods from CUB, CUSP and ModernGPU on NVIDIA K80-Kepler and TitanX-Pascal GPUs. The authors apply it to two applications, suffix array construction and sparse matrix-matrix multiplication, and obtain clear gains over state-of-the-art implementations.

68

•Proceedings Article•10.1109/IISWC.2018.8573483

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite

Ang Li, +5 more

- 01 Sep 2018

TL;DR: Evaluation results show that, unless the current CPU-GPU master-slave programming model can be replaced, it is difficult for scale-up multi-GPU applications to really benefit from faster intra-node interconnects such as NVLink. For inter-node scale-out applications, although the interconnect is more crucial to overall performance, GPUDirect-RDMA is not always the optimal choice.

67

•Journal Article•10.1002/CPE.4244

Fast synchronization‐free algorithms for parallel sparse triangular solves with multiple right‐hand sides

Weifeng Liu, +6 more

- 10 Nov 2017

- Concurrency and Computation: Practice an...

TL;DR: Novel approaches for SpTRSV and SpTRSM in which the ordering between components is naturally enforced within the solution stage are proposed, so the cost for preprocessing can be greatly reduced, and the synchronizations between sets are completely eliminated.

54

References

•Book

Iterative Methods for Sparse Linear Systems

Yousef Saad

- 01 Apr 2003

TL;DR: This chapter discusses methods related to the normal equations of linear algebra, and some of the techniques used in this chapter were derived from previous chapters of this book.

16.1K

•Journal Article•10.1145/2049662.2049663

The University of Florida sparse matrix collection

Timothy A. Davis, +1 more

- 07 Dec 2011

- ACM Transactions on Mathematical Softwar...

TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.

4.3K

•Book

Numerical Methods for Least Squares Problems

Åke Björck

- 01 Apr 1996

TL;DR: Theorems and statistical properties of least squares solutions are explained and basic numerical methods for solving least squares problems are described.

3.6K

•Monograph•10.1093/ACPROF:OSO/9780198508380.001.0001

Direct methods for sparse matrices

Iain S. Duff, +2 more

- 01 Nov 1986

- Mathematics of Computation

TL;DR: This book aims to be suitable for a student course, probably at MSc level. The subject is intensely practical, and the book is written with practicalities ever in mind.

2K

•Book•10.1137/1.9780898718881

Direct Methods for Sparse Linear Systems

Timothy A. Davis

- 01 Jan 2006

TL;DR: Direct methods for sparse linear systems cover various algorithms and techniques for solving sparse systems efficiently. These methods include basic algorithms, solving triangular systems, Cholesky factorization, orthogonal methods, LU factorization, fill-reducing orderings, and CSparse library usage.

1.3K