Sparse matrix-vector multiplication

Papers published on a yearly basis

Papers

Journal Article · DOI: 10.1145/2049662.2049663

The university of Florida sparse matrix collection

Timothy A. Davis¹, Yifan Hu²

University of Florida¹, AT&T Labs²

07 Dec 2011 · ACM Transactions on Mathematical Software

TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described, and a new multilevel coarsening scheme is proposed to facilitate graph visualization of the matrices.

Abstract: We describe the University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications. The Collection is widely used by the numerical linear algebra community for the development and performance evaluation of sparse matrix algorithms. It allows for robust and repeatable experiments: robust because performance results with artificially generated matrices can be misleading, and repeatable because matrices are curated and made publicly available in many formats. Its matrices cover a wide spectrum of domains, including those arising from problems with underlying 2D or 3D geometry (such as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that typically do not have such geometry (optimization, circuit simulation, economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, power networks, and other networks and graphs). We provide software for accessing and managing the Collection from MATLAB™, Mathematica™, Fortran, and C, as well as an online search capability. Graph visualization of the matrices is provided, and a new multilevel coarsening scheme is proposed to facilitate this task.

4,397 citations
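The abstract mentions that the Collection's matrices are made available in many formats; one of them is the Matrix Market exchange format (`.mtx`). As a minimal sketch, the Python code below parses a small `coordinate real general` Matrix Market file held in a string into 0-based (row, column, value) triplets and multiplies the result by a vector in coordinate (COO) form. It assumes only that one header variant (files marked `pattern`, `complex`, `integer`, `symmetric`, or `array` are rejected), and the function names `read_matrix_market` and `coo_spmv` are hypothetical; in practice an existing reader such as SciPy's `scipy.io.mmread` would usually be preferable.

```python
from typing import List, Tuple

Triplet = Tuple[int, int, float]


def read_matrix_market(text: str) -> Tuple[int, int, List[Triplet]]:
    """Parse a 'coordinate real general' Matrix Market file (sketch only).

    Returns (nrows, ncols, triplets) with 0-based indices.
    Any other header variant is rejected rather than handled.
    """
    lines = text.strip().splitlines()
    header = lines[0].lower().split()
    if header[:2] != ["%%matrixmarket", "matrix"] or header[2:5] != ["coordinate", "real", "general"]:
        raise ValueError("only 'coordinate real general' is supported in this sketch")
    # Drop blank lines and comment lines (starting with '%') after the banner.
    body = [ln for ln in lines[1:] if ln.strip() and not ln.lstrip().startswith("%")]
    nrows, ncols, nnz = (int(tok) for tok in body[0].split())
    triplets: List[Triplet] = []
    for ln in body[1 : 1 + nnz]:
        i, j, v = ln.split()
        # Matrix Market indices are 1-based; convert to 0-based.
        triplets.append((int(i) - 1, int(j) - 1, float(v)))
    if len(triplets) != nnz:
        raise ValueError("file declares more entries than it contains")
    return nrows, ncols, triplets


def coo_spmv(nrows: int, triplets: List[Triplet], x: List[float]) -> List[float]:
    """y = A x with A given as (row, col, value) triplets."""
    y = [0.0] * nrows
    for i, j, v in triplets:
        y[i] += v * x[j]
    return y


EXAMPLE = """%%MatrixMarket matrix coordinate real general
% 3x3 example with 4 nonzeros
3 3 4
1 1 2.0
2 2 3.0
3 1 1.0
3 3 4.0
"""

if __name__ == "__main__":
    n, m, t = read_matrix_market(EXAMPLE)
    print(coo_spmv(n, t, [1.0, 1.0, 1.0]))  # [2.0, 3.0, 5.0]
```

Symmetric matrices stored in this format list only one triangle, so a fuller reader would mirror the off-diagonal entries when the header says `symmetric`.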

Proceedings Article · DOI: 10.1145/1654059.1654078

Implementing sparse matrix-vector multiplication on throughput-oriented processors

Nathan Bell¹, Michael Garland¹

Nvidia¹

14 Nov 2009

TL;DR: This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.

Abstract: Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns. We explore SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes. The techniques we propose are efficient, successfully utilizing large percentages of peak bandwidth. Furthermore, they deliver excellent total throughput, averaging 16 GFLOP/s and 10 GFLOP/s in double precision for structured grid and unstructured mesh matrices, respectively, on a GeForce GTX 285. This is roughly 2.8 times the throughput previously achieved on Cell BE and more than 10 times that of a quad-core Intel Clovertown system.

1,030 citations
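The abstract's point about imposing regularity on execution paths and memory access patterns can be illustrated with the ELLPACK (ELL) format, a padded layout commonly used on GPUs: every row is padded to the length K of the longest row, and the k-th entries of all rows are stored contiguously, so that threads assigned to consecutive rows read consecutive addresses. The Python sketch below only simulates that layout serially; it is an illustration under these assumptions rather than the paper's CUDA kernels, padding is marked with column index -1, and the names `csr_to_ell` and `ell_spmv` are hypothetical.

```python
from typing import List, Tuple


def csr_to_ell(n: int, row_ptr: List[int], col_idx: List[int],
               vals: List[float]) -> Tuple[int, List[int], List[float]]:
    """Convert CSR to ELL stored slot-major: index k*n + i holds the k-th
    stored entry of row i, so the k-th entries of all rows are contiguous."""
    K = max((row_ptr[i + 1] - row_ptr[i] for i in range(n)), default=0)
    ell_cols = [-1] * (K * n)   # -1 marks a padding slot
    ell_vals = [0.0] * (K * n)
    for i in range(n):
        for k, p in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_cols[k * n + i] = col_idx[p]
            ell_vals[k * n + i] = vals[p]
    return K, ell_cols, ell_vals


def ell_spmv(n: int, K: int, ell_cols: List[int], ell_vals: List[float],
             x: List[float]) -> List[float]:
    y = [0.0] * n
    # On a GPU the loop over i would map to one thread per row; for a fixed k,
    # rows i and i+1 touch adjacent addresses k*n+i and k*n+i+1.
    for i in range(n):
        acc = 0.0
        for k in range(K):
            j = ell_cols[k * n + i]
            if j >= 0:          # skip padding
                acc += ell_vals[k * n + i] * x[j]
        y[i] = acc
    return y


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 3, 0], [4, 5, 6]] in CSR form
    row_ptr, col_idx = [0, 2, 3, 6], [0, 2, 1, 0, 1, 2]
    vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    K, c, v = csr_to_ell(3, row_ptr, col_idx, vals)
    print(K, ell_spmv(3, K, c, v, [1.0, 2.0, 3.0]))  # 3 [7.0, 6.0, 32.0]
```

In this example 3 of the 9 stored slots are padding. That overhead grows when row lengths vary widely, so a uniform layout like ELL fits regular matrices better than highly irregular ones.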

Journal Article · DOI: 10.1109/71.780863

Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication

Ümit V. Çatalyürek¹, Cevdet Aykanat¹

Bilkent University¹

01 Jul 1999 · IEEE Transactions on Parallel and Distributed Systems

TL;DR: It is shown that the standard graph-partitioning-based decomposition of sparse matrices does not reflect the actual communication volume requirement for parallel matrix-vector multiplication, and two computational hypergraph models are proposed which avoid this crucial deficiency of the graph model.

Abstract: In this work, we show that the standard graph-partitioning-based decomposition of sparse matrices does not reflect the actual communication volume requirement for parallel matrix-vector multiplication. We propose two computational hypergraph models which avoid this crucial deficiency of the graph model. The proposed models reduce the decomposition problem to the well-known hypergraph partitioning problem. The recently proposed successful multilevel framework is exploited to develop a multilevel hypergraph partitioning tool PaToH for the experimental verification of our proposed hypergraph models. Experimental results on a wide range of realistic sparse test matrices confirm the validity of the proposed hypergraph models. In the decomposition of the test matrices, the hypergraph models using PaToH and hMeTiS result in up to 63 percent less communication volume (30 to 38 percent less on the average) than the graph model using MeTiS, while PaToH is only 1.3-2.3 times slower than MeTiS on the average.

648 citations
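The claim that the graph model misstates communication volume can be checked on a toy example. For a rowwise decomposition, the column-net hypergraph model treats each row as a vertex and each column j as a net joining the rows that have a nonzero in column j. If those rows fall into λ_j parts and x_j is owned by one of them, x_j must be sent to the other λ_j − 1 parts, so the volume equals the connectivity-minus-one cutsize Σ_j (λ_j − 1). The Python sketch below counts that volume directly, evaluates the cutsize formula, and compares both with a graph-model edge-cut count. It assumes a structurally symmetric pattern, a hand-chosen two-way row partition, and x_j owned by the part that owns row j; all function names are hypothetical, and none of this is PaToH or hMeTiS.

```python
from typing import Dict, List, Set

# rows[i] lists the column indices of the nonzeros in row i (0-based).
Pattern = List[List[int]]


def parts_per_column(rows: Pattern, row_part: List[int]) -> Dict[int, Set[int]]:
    """Net of column j -> set of parts owning a row with a nonzero in column j."""
    nets: Dict[int, Set[int]] = {}
    for i, cols in enumerate(rows):
        for j in cols:
            nets.setdefault(j, set()).add(row_part[i])
    return nets


def direct_volume(rows: Pattern, row_part: List[int], x_owner: List[int]) -> int:
    """Words of x actually sent: one per (column j, part needing x_j) pair,
    excluding the part that already owns x_j."""
    nets = parts_per_column(rows, row_part)
    return sum(len(parts - {x_owner[j]}) for j, parts in nets.items())


def connectivity_minus_one(rows: Pattern, row_part: List[int]) -> int:
    """Column-net cutsize: sum over columns j of (lambda_j - 1)."""
    return sum(len(parts) - 1 for parts in parts_per_column(rows, row_part).values())


def graph_cut(rows: Pattern, row_part: List[int]) -> int:
    """Graph-model estimate: off-diagonal nonzeros a_ij whose row i and column j
    lie in different parts (for a symmetric pattern, each cut edge weighted 2)."""
    return sum(1 for i, cols in enumerate(rows) for j in cols
               if row_part[i] != row_part[j])


if __name__ == "__main__":
    rows = [[0, 1], [0, 1, 2, 3], [1, 2, 3], [1, 2, 3]]  # symmetric 4x4 pattern
    part = [0, 0, 1, 1]  # rows 0,1 -> part 0; rows 2,3 -> part 1; x_j follows row j
    print(direct_volume(rows, part, part),      # 3
          connectivity_minus_one(rows, part),   # 3
          graph_cut(rows, part))                # 4
```

On this pattern the graph model counts 4 units while only 3 words actually move: part 1 needs x_1 for both row 2 and row 3, but receives it only once.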

Proceedings Article · DOI: 10.1145/1362622.1362674

Optimization of sparse matrix-vector multiplication on emerging multicore platforms

Samuel Williams¹, Leonid Oliker², Richard Vuduc³, John Shalf², Katherine Yelick¹, James Demmel¹

University of California, Berkeley¹, Lawrence Berkeley National Laboratory², Lawrence Livermore National Laboratory³

10 Nov 2007

TL;DR: In this article, the authors examine sparse matrix-vector multiply (SpMV) kernels across a broad spectrum of multicore designs and present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations.

Abstract: We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.

463 citations
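The abstract does not enumerate its optimization strategies, but one technique widely used for memory-bound SpMV is register blocking: storing the matrix in block CSR (BCSR) with small dense r×c blocks, so that one column index serves r·c values and the inner loops can be unrolled for fixed r and c. The Python sketch below shows only that data-structure idea, not the paper's implementation; it builds BCSR from a dense matrix, stores explicit zeros inside partly filled blocks, assumes the dimensions are multiples of r and c, and uses the hypothetical names `dense_to_bcsr` and `bcsr_spmv`.

```python
from typing import List, Tuple


def dense_to_bcsr(A: List[List[float]], r: int, c: int) -> Tuple[List[int], List[int], List[float]]:
    """Convert a dense matrix to BCSR with r x c blocks.
    Any block containing a nonzero is stored whole, zeros included (fill)."""
    m, n = len(A), len(A[0])
    assert m % r == 0 and n % c == 0, "sketch assumes block-aligned dimensions"
    brow_ptr, bcol_idx, bvals = [0], [], []
    for bi in range(m // r):
        for bj in range(n // c):
            block = [A[bi * r + a][bj * c + b] for a in range(r) for b in range(c)]
            if any(v != 0.0 for v in block):
                bcol_idx.append(bj)   # one index per block, not per nonzero
                bvals.extend(block)   # r*c values in row-major order
        brow_ptr.append(len(bcol_idx))
    return brow_ptr, bcol_idx, bvals


def bcsr_spmv(m: int, r: int, c: int, brow_ptr: List[int], bcol_idx: List[int],
              bvals: List[float], x: List[float]) -> List[float]:
    y = [0.0] * m
    for bi in range(m // r):
        for p in range(brow_ptr[bi], brow_ptr[bi + 1]):
            base, x0 = p * r * c, bcol_idx[p] * c
            # A tuned kernel would fully unroll these two loops for fixed r, c.
            for a in range(r):
                acc = 0.0
                for b in range(c):
                    acc += bvals[base + a * c + b] * x[x0 + b]
                y[bi * r + a] += acc
    return y


if __name__ == "__main__":
    A = [[1.0, 2.0, 0.0, 0.0],
         [3.0, 4.0, 0.0, 0.0],
         [0.0, 0.0, 5.0, 0.0],
         [0.0, 0.0, 0.0, 6.0]]
    bp, bc, bv = dense_to_bcsr(A, 2, 2)
    print(bcsr_spmv(4, 2, 2, bp, bc, bv, [1.0, 1.0, 1.0, 1.0]))  # [3.0, 7.0, 5.0, 6.0]
```

For this example BCSR stores 2 block column indices instead of the 6 column indices CSR would need, at the cost of 2 explicitly stored zeros; choosing r and c trades index traffic against such fill and is usually tuned per matrix and machine.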

Proceedings Article · DOI: 10.1145/1583991.1584053

Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

Aydin Buluc¹, Jeremy T. Fineman², Matteo Frigo, John R. Gilbert¹, Charles E. Leiserson²

University of California, Santa Barbara¹, Massachusetts Institute of Technology²

11 Aug 2009

TL;DR: In this article, a storage format for sparse matrices, called compressed sparse blocks (CSB), is introduced, which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n×n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector.

Abstract: This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n×n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector. Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz/(√n lg n)), which is amply high for virtually any large matrix. The storage requirement for CSB is the same as that for the more-standard compressed-sparse-rows (CSR) format, for which computing Ax in parallel is easy but Aᵀx is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and Aᵀx run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth.

458 citations
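The key property claimed for CSB is that Ax and Aᵀx are equally easy. The serial Python sketch below captures only the storage idea, as a simplified illustration: the n×n matrix is cut into β×β blocks, a block-pointer array indexes every block (bi, bj), and each nonzero keeps only its row and column offsets inside its block. Because any block can be located directly, Ax can walk the blocks by block row and Aᵀx by block column, and in both cases each outer iteration writes a disjoint slice of y. It assumes β divides n, ignores the ordering of nonzeros within a block, and does not reproduce the paper's parallel algorithm or its stated span; all names are hypothetical.

```python
from typing import List, Tuple

CSB = Tuple[int, List[int], List[int], List[int], List[float]]


def build_csb(n: int, beta: int, triplets: List[Tuple[int, int, float]]) -> CSB:
    """Simplified CSB layout. blk_ptr[bi*nb + bj] .. blk_ptr[bi*nb + bj + 1]
    delimits the nonzeros of block (bi, bj)."""
    assert n % beta == 0, "sketch assumes beta divides n"
    nb = n // beta
    # Group nonzeros by (block row, block column); order inside a block is arbitrary.
    ordered = sorted(triplets, key=lambda t: (t[0] // beta, t[1] // beta))
    counts = [0] * (nb * nb)
    for i, j, _ in ordered:
        counts[(i // beta) * nb + j // beta] += 1
    blk_ptr = [0]
    for cnt in counts:
        blk_ptr.append(blk_ptr[-1] + cnt)
    lo_row = [i % beta for i, _, _ in ordered]  # offsets need only lg(beta) bits
    lo_col = [j % beta for _, j, _ in ordered]
    vals = [v for _, _, v in ordered]
    return nb, blk_ptr, lo_row, lo_col, vals


def csb_ax(n: int, beta: int, csb: CSB, x: List[float]) -> List[float]:
    nb, blk_ptr, lo_row, lo_col, vals = csb
    y = [0.0] * n
    for bi in range(nb):          # block row bi writes only y[bi*beta:(bi+1)*beta]
        for bj in range(nb):
            k = bi * nb + bj
            for p in range(blk_ptr[k], blk_ptr[k + 1]):
                y[bi * beta + lo_row[p]] += vals[p] * x[bj * beta + lo_col[p]]
    return y


def csb_atx(n: int, beta: int, csb: CSB, x: List[float]) -> List[float]:
    nb, blk_ptr, lo_row, lo_col, vals = csb
    y = [0.0] * n
    for bj in range(nb):          # block column bj writes only y[bj*beta:(bj+1)*beta]
        for bi in range(nb):
            k = bi * nb + bj      # same block pointers, traversed column-wise
            for p in range(blk_ptr[k], blk_ptr[k + 1]):
                y[bj * beta + lo_col[p]] += vals[p] * x[bi * beta + lo_row[p]]
    return y


if __name__ == "__main__":
    # A = [[1,0,0,2],[0,3,0,0],[4,0,5,0],[0,0,0,6]]
    trip = [(0, 0, 1.0), (0, 3, 2.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0), (3, 3, 6.0)]
    csb = build_csb(4, 2, trip)
    x = [1.0, 2.0, 3.0, 4.0]
    print(csb_ax(4, 2, csb, x))   # [9.0, 6.0, 19.0, 24.0]
    print(csb_atx(4, 2, csb, x))  # [13.0, 6.0, 15.0, 26.0]
```

If β is chosen near √n, the block-pointer array has about (n/β)² ≈ n entries, comparable to CSR's row pointers, which is consistent with the abstract's statement that CSB needs the same storage as CSR.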

Year	Papers
2021	17
2020	30
2019	22
2018	28
2017	24
2016	42