An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs

doi:10.1145/2597652.2597678

Proceedings Article•10.1145/2597652.2597678

Arash Ashari, +3 more

- 10 Jun 2014

- pp 273-282

74

TL;DR: Presents a new blocked row-column (BRC) storage format with a novel two-dimensional blocking mechanism. BRC reduces thread divergence by reordering the rows of the input matrix and grouping rows with nearly equal numbers of non-zero elements onto the same execution units (i.e., warps).
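The row-grouping idea in the summary above can be sketched in a few lines of Python. This is only an illustration of sorting rows by non-zero count and chunking them into warp-sized groups; it is not the authors' BRC layout, and it ignores the column-blocking half of the scheme. The function names `group_rows_by_nnz` and `padded_work`, and the use of a CSR `row_ptr` array as input, are assumptions made for this example.

```python
from typing import List

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def group_rows_by_nnz(row_ptr: List[int], warp_size: int = WARP_SIZE) -> List[List[int]]:
    """Sort rows by non-zero count (longest first) and chunk them into warp-sized groups.

    row_ptr is the CSR row-pointer array: row i holds row_ptr[i+1] - row_ptr[i] non-zeros.
    """
    n_rows = len(row_ptr) - 1
    nnz = [row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)]
    # sorted() is stable, so rows with equal length keep their original relative order.
    order = sorted(range(n_rows), key=lambda i: -nnz[i])
    return [order[s:s + warp_size] for s in range(0, n_rows, warp_size)]


def padded_work(row_ptr: List[int], groups: List[List[int]]) -> int:
    """Count multiply slots if every row in a group runs as long as the group's longest row."""
    total = 0
    for group in groups:
        longest = max(row_ptr[i + 1] - row_ptr[i] for i in group)
        total += longest * len(group)
    return total


# Hypothetical 4-row matrix with row lengths 1, 8, 1, 8 and a toy warp size of 2.
row_ptr = [0, 1, 9, 10, 18]
natural = [[0, 1], [2, 3]]                       # rows grouped in original order
grouped = group_rows_by_nnz(row_ptr, warp_size=2)
print(padded_work(row_ptr, natural), padded_work(row_ptr, grouped))
```

In this toy case the original ordering pairs each short row with a long one, so both groups are padded to length 8 (32 slots in total), whereas grouping by length needs 18 slots. The gap between the two numbers is a rough proxy for the idle lanes that divergence within a warp would cause.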

Citations

Proceedings Article•10.1145/2751205.2751209

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

Weifeng Liu, +1 more

- 08 Jun 2015

TL;DR: Proposes CSR5 (Compressed Sparse Row 5), a new storage format that offers high-throughput SpMV on various platforms including CPUs, GPUs and Xeon Phi. Its low format-conversion overhead makes it suitable for real-world applications such as a solver with only tens of iterations.

305

Proceedings Article•10.1109/SC.2014.69

Fast sparse matrix-vector multiplication on GPUs for graph applications

Arash Ashari, +4 more

- 16 Nov 2014

TL;DR: Presents ACSR, an adaptive SpMV algorithm that keeps the standard CSR format, and thus avoids significant preprocessing overheads, while reducing thread divergence by combining rows with a similar number of non-zero elements into groups.

174

Posted Content

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

Weifeng Liu, +1 more

- 17 Mar 2015

- arXiv: Mathematical Software

TL;DR: In this article, the authors propose CSR5 (Compressed Sparse Row 5), a new storage format that offers high-throughput SpMV on various platforms including CPUs, GPUs and Xeon Phi.

154

Proceedings Article•10.1145/2751205.2751244

Automatic Selection of Sparse Matrix Representation on GPUs

Naser Sedaghati, +4 more

- 08 Jun 2015

TL;DR: This paper performs extensive characterization of pertinent sparsity features of around 700 sparse matrices and their SpMV performance with a number of sparse representations implemented in the NVIDIA CUSP and cuSPARSE libraries, and builds a decision model using machine learning to automatically select the best representation to use for a given sparse matrix on a given target platform.

143

Journal Article•10.1109/TPDS.2015.2401575

Evaluation Criteria for Sparse Matrix Storage Formats

Daniel Langr, +1 more

- 01 Feb 2016

- IEEE Transactions on Parallel and Distri...

TL;DR: Ten evaluation criteria for sparse matrix storage formats are established, their advantages and disadvantages are discussed, and general suggestions for format authors/evaluators are provided to make their work more valuable for the HPC community.

141

References

Proceedings Article•10.1145/1401132.1401152

Scalable parallel programming with CUDA

John R. Nickolls, +3 more

- 11 Aug 2008

TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.

2.3K

Journal Article•10.1145/1365490.1365500

Scalable Parallel Programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for?

John R. Nickolls, +3 more

- 01 Mar 2008

- ACM Queue

TL;DR: In this article, the authors present a framework to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism on manycore GPUs with widely varying numbers of cores.

1.4K

Proceedings Article•10.1145/1654059.1654078

Implementing sparse matrix-vector multiplication on throughput-oriented processors

Nathan Bell, +1 more

- 14 Nov 2009

TL;DR: This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.

1K

SPARSKIT: A basic tool kit for sparse matrix computations

Youcef Saad

- 21 May 1990

TL;DR: The main features of a tool package for manipulating and working with sparse matrices, to provide basic tools to facilitate the exchange of software and data between researchers in sparse matrix computations, are presented.

805

Proceedings Article•10.5555/1280094.1280110

Scan primitives for GPU computing

Shubhabrata Sengupta, +3 more

- 04 Aug 2007

TL;DR: Using the scan primitives, this work shows novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyzes the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.

655

Related Papers (5)

yaSpMV: yet another SpMV framework on GPUs

The University of Florida sparse matrix collection

Implementing sparse matrix-vector multiplication on throughput-oriented processors

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

Fast sparse matrix-vector multiplication on GPUs for graph applications