Proceedings Article · DOI: 10.1145/3404397.3404413
Efficient Block Algorithms for Parallel Sparse Triangular Solve
Zhengyang Lu, Yuyao Niu, Weifeng Liu
- 17 Aug 2020
21
TL;DR: This paper implements three block algorithms for parallel SpTRSV on modern GPUs, and proposes an adaptive approach that can automatically select the best kernels according to input sparsity structures, which is highly efficient for multiple right-hand sides and iterative scenarios.
Abstract: The sparse triangular solve (SpTRSV) kernel is an important building block for a number of linear algebra routines such as sparse direct and iterative solvers. The major challenge of accelerating SpTRSV lies in the difficulties of finding higher parallelism. Existing work mainly focuses on reducing dependencies and synchronizations in the level-set methods. However, the 2D block layout of the input matrix has been largely ignored in designing more efficient SpTRSV algorithms. In this paper, we implement three block algorithms, i.e., column block, row block and recursive block algorithms, for parallel SpTRSV on modern GPUs, and propose an adaptive approach that can automatically select the best kernels according to input sparsity structures. By testing 159 sparse matrices on two high-end NVIDIA GPUs, the experimental results demonstrate that the recursive block algorithm has the best performance among the three block algorithms, and it is on average 4.72x (up to 72.03x) and 9.95x (up to 61.08x) faster than cuSPARSE v2 and Sync-free methods, respectively. Besides, our method merely needs moderate cost for preprocessing the input matrix, thus is highly efficient for multiple right-hand sides and iterative scenarios.
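The recursive block idea named in the abstract can be illustrated with a minimal sequential sketch. It assumes a lower-triangular matrix stored row by row as lists of `(column, value)` pairs with a nonzero diagonal; the function names and the `leaf` threshold are hypothetical and are not taken from the paper. The matrix is split into two triangular diagonal blocks and one rectangular block: the top triangle is solved first, the rectangular block then updates the right-hand side in a step that behaves like a sparse matrix-vector product, and the bottom triangle is solved last. This is plain Python for illustration only, not the authors' GPU kernels.

```python
def forward_substitute(rows, b, lo, hi):
    """Serially solve the diagonal block covering rows/cols [lo, hi).

    b is updated in place; contributions from columns < lo are assumed
    to have been subtracted from b already by earlier block updates.
    """
    for i in range(lo, hi):
        acc = b[i]
        diag = None
        for j, v in rows[i]:
            if lo <= j < i:
                acc -= v * b[j]      # b[j] already holds x[j]
            elif j == i:
                diag = v
        b[i] = acc / diag


def recursive_block_sptrsv(rows, b, lo=0, hi=None, leaf=2):
    """Solve L x = b in place by recursive 2x2 blocking (illustrative sketch)."""
    if hi is None:
        hi = len(rows)
    if hi - lo <= leaf:
        forward_substitute(rows, b, lo, hi)
        return
    mid = (lo + hi) // 2
    # 1) Solve the top-left triangular block.
    recursive_block_sptrsv(rows, b, lo, mid, leaf)
    # 2) Rectangular update: b[mid:hi] -= L[mid:hi, lo:mid] * x[lo:mid].
    for i in range(mid, hi):
        for j, v in rows[i]:
            if lo <= j < mid:
                b[i] -= v * b[j]
    # 3) Solve the bottom-right triangular block.
    recursive_block_sptrsv(rows, b, mid, hi, leaf)


def solve_lower(rows, b, leaf=2):
    """Return x such that L x = b, leaving the caller's b untouched."""
    x = list(b)
    recursive_block_sptrsv(rows, x, 0, len(rows), leaf)
    return x


# Small 4x4 example whose exact solution is [1, 2, 3, 4].
L_rows = [
    [(0, 2.0)],
    [(0, 1.0), (1, 1.0)],
    [(1, 3.0), (2, 4.0)],
    [(0, 1.0), (2, 2.0), (3, 1.0)],
]
rhs = [2.0, 3.0, 18.0, 11.0]
x = solve_lower(L_rows, rhs)
```

In this sketch only the leaf triangles carry the row-to-row dependency chain; the rectangular update in step 2 has no dependencies between its rows, which is the part a parallel implementation could process concurrently.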
Citations
TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs
Yuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu, Guangming Tan
- 17 May 2021
TL;DR: This paper proposes TileSpMV, an efficient tiled algorithm that optimizes SpMV on GPUs by exploiting the 2D spatial sparsity structure of sparse matrices.
52
TileSpGEMM
Yuyao Niu, Zhengyang Lu, Hao Ji, Shuhui Song, Zhou Jin, Weifeng Liu
- 28 Mar 2022
TL;DR: Existing parallel approaches for shared-memory SpGEMM mostly use the row-row style, which can offer good parallelism. However, because of irregular sparsity structures, existing row-row methods often suffer from three problems: (1) load imbalance, (2) high global space complexity and unsatisfactory data locality, and (3) sparse accumulator selection.
33
CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs
Jiya Su, Feng Zhang, Weifeng Liu, Bingsheng He, Ruofan Wu, Xiaoyong Du, Rujia Wang
- 17 Aug 2020
TL;DR: CapelliniSpTRSV is proposed, a thread-level synchronization-free SpTRSV algorithm that can achieve very good performance on the most popular sparse matrix storage, compressed sparse row (CSR) format, and thus users do not need to conduct format conversion.
15
swSuperLU: A highly scalable sparse direct solver on Sunway manycore architecture
TL;DR: This work presents swSuperLU, a highly scalable sparse direct solver on the Sunway manycore architecture based on sparse LU factorization, and introduces a hierarchical scheme that exploits the hierarchy of the Sunway manycore architecture through process-level parallelism between MPEs and thread-level parallelism between the CPE arrays.
8
A Split Execution Model for SpTRSV
TL;DR: A heuristics-based approach is used to automatically determine the suitability of an SpTRSV for split execution, find the appropriate split point, and execute the SpTRSV in a split fashion using two SpTRSV algorithms while managing any required inter-platform communication.
8
References
•Book
Iterative Methods for Sparse Linear Systems
Yousef Saad
- 01 Apr 2003
TL;DR: This chapter discusses methods related to the normal equations of linear algebra, and some of the techniques used in this chapter were derived from previous chapters of this book.
The University of Florida sparse matrix collection
Timothy A. Davis, Yifan Hu
TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.
4.3K
Direct methods for sparse matrices
TL;DR: This book aims to be suitable also for a student course, probably at MSc level; the subject is intensely practical, and the book is written with practicalities ever in mind.
2K
Direct Methods for Sparse Linear Systems
Timothy A. Davis
- 01 Jan 2006
TL;DR: Direct methods for sparse linear systems cover various algorithms and techniques for solving sparse systems efficiently. These methods include basic algorithms, solving triangular systems, Cholesky factorization, orthogonal methods, LU factorization, fill-reducing orderings, and CSparse library usage.
1.3K
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects
Emmanuel Agullo, James Demmel, Jack Dongarra, Bilel Hadri, Jakub Kurzak, Julien Langou, Hatem Ltaief, Piotr Luszczek, Stanimire Tomov
- 01 Jul 2009
TL;DR: A comparative study of PLASMA's performance against established linear algebra packages and some preliminary results of MAGMA on hybrid multi-core and GPU systems is presented.
512