What is the current standard storage format for sparse matrices in scientific computing?

The current standard storage format for sparse matrices in scientific computing, compressed sparse rows (CSR) [32], is more efficient, because it stores only n + nnz indices or pointers.

What is the format for storing the nonzeros of each matrix row?

The compressed sparse row (CSR) format stores the nonzeros (and ideally only the nonzeros) of each matrix row in consecutive memory locations, and it stores an index to the first stored element of each row.
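As a rough illustration of that layout, the sketch below multiplies a CSR matrix by a dense vector. The names `row_ptr`, `col_ind`, and `val` and the trailing sentinel entry in `row_ptr` are common conventions assumed here, not details taken from the paper.

```python
def csr_spmv(n, row_ptr, col_ind, val, x):
    """Compute y = A x for an n-row matrix A stored in CSR (illustrative sketch)."""
    y = [0.0] * n
    for i in range(n):
        # The nonzeros of row i sit contiguously in val[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col_ind[k]]
    return y


# 3 x 3 example matrix:
# [[10,  0, 20],
#  [ 0,  0,  0],
#  [30, 40,  0]]
row_ptr = [0, 2, 2, 4]          # assumed convention: one extra sentinel entry
col_ind = [0, 2, 0, 1]
val = [10.0, 20.0, 30.0, 40.0]
y = csr_spmv(3, row_ptr, col_ind, val, [1.0, 2.0, 3.0])
```

Each row can be processed independently because it writes only `y[i]`, which is why Ax is easy to parallelize in CSR; the transpose product would scatter writes across `y` instead.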

How do you get the mean/max values for CSB?

For CSB, the reported mean/max values are obtained by setting the block dimension β to be approximately √n, so that they are comparable with statistics from CSC.

When does the CSB constructor consider a matrix to have balanced blockrows?

In other words, if max(nnz(Ai)) < 2 · mean(nnz(Ai)), then the matrix is considered to have balanced blockrows and the optimization is applied.
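The balance test stated above can be sketched in a few lines. The function name and the input (a list holding the nonzero count of each blockrow) are hypothetical; only the inequality comes from the text.

```python
def has_balanced_blockrows(blockrow_nnz):
    """Return True when max(nnz(A_i)) < 2 * mean(nnz(A_i)) over all blockrows."""
    mean_nnz = sum(blockrow_nnz) / len(blockrow_nnz)
    return max(blockrow_nnz) < 2 * mean_nnz


balanced = has_balanced_blockrows([90, 110, 100, 100])  # max 110, mean 100
skewed = has_balanced_blockrows([10, 10, 10, 370])      # max 370, mean 100
```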

What is the CSB constructor's order of the bitmasks?

The bitmasks are determined dynamically by the CSB constructor depending on the input matrix and the data type used for storing matrix indices.

What is the cost of converting to and from bit-interleaved integers?

Converting to and from bit-interleaved integers, however, is expensive with current hardware support, which would be necessary for the serial base case in lines 29–32.

What is the level of parallelization required to avoid races?

This level of parallelization requires care to avoid races, however, because two blocks in the same blockrow write to the same region within the output vector.

What are the row and column indices stored for each element relative to?

These indices are relative to the block containing the particular element, not the entire matrix, and hence they range from 0 to β−1.
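To make the block-relative indexing concrete, here is a small sketch that splits a global (row, column) position into a block coordinate and a local offset, and back. The helper names are hypothetical and the example assumes β divides the indices in the usual integer-division sense.

```python
def to_block_coords(i, j, beta):
    """Split global indices into (block row, block col, local row, local col)."""
    return i // beta, j // beta, i % beta, j % beta


def to_global_coords(block_i, block_j, local_i, local_j, beta):
    """Recombine a block coordinate and local offsets into global indices."""
    return block_i * beta + local_i, block_j * beta + local_j


beta = 4
coords = to_block_coords(9, 6, beta)          # element at global row 9, column 6
restored = to_global_coords(*coords, beta)
```

The local offsets always fall in the range 0 to β−1, which is what allows them to be stored in fewer bits than a full matrix index.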

Are work-stealing schedulers space efficient?

Although not all work-stealing schedulers are space efficient, those maintaining the busy-leaves property [5] (e.g., as used in the Cilk work-stealing scheduler [4]) are space efficient.

Open Access•Proceedings Article•10.1145/1583991.1584053

Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

Q: What have the authors contributed in "Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks" ?

This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n × n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector.

Q: How many bits are required for each element in val?

For each element in val, the authors use lg β bits to represent the row index and lg β bits to represent the column index, requiring a total of nnz lg β bits for each of row_ind and col_ind.
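A quick worked calculation shows what this bit count means in practice. The sketch assumes β = √n with both values powers of two, and it compares against a CSR column-index array that spends lg n bits per nonzero; that comparison is illustrative arithmetic, not a figure from the paper.

```python
def index_bits(nnz, n, beta):
    """Return (bits per CSB index array, bits for both arrays, bits for CSR col_ind)."""
    lg_beta = beta.bit_length() - 1   # exact lg for a power of two
    lg_n = n.bit_length() - 1
    csb_per_array = nnz * lg_beta     # nnz * lg(beta) for row_ind, same for col_ind
    csb_total = 2 * csb_per_array
    csr_col_ind = nnz * lg_n          # assumed: lg(n) bits per CSR column index
    return csb_per_array, csb_total, csr_col_ind


per_array, csb_total, csr_col_ind = index_bits(nnz=5_000_000, n=2**20, beta=2**10)
```

With β = √n, lg β is half of lg n, so the two block-relative index arrays together take the same number of bits as one full-width column-index array.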

Q: How much parallelism does simple blockrow parallelism provide?

If the nonzeros were guaranteed to be distributed evenly among block rows, then the simple blockrow parallelism would yield an efficient algorithm with n/β-way parallelism by simply performing a serial multiplication for each blockrow.
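The simple blockrow scheme can be sketched as one independent serial multiplication per blockrow. The data layout below (a list of `(i, j, value)` triples per blockrow, in global coordinates) and the use of Python's `ThreadPoolExecutor` are assumptions chosen for brevity; the sketch shows the structure of the computation, not the paper's implementation or its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def blockrow_spmv(blockrows, x, beta, n):
    """Compute y = A x with one serial task per blockrow (illustrative sketch)."""

    def multiply_one(b):
        # Blockrow b owns output rows b*beta .. min((b+1)*beta, n) - 1 exclusively,
        # so tasks never write to the same part of the output.
        first_row = b * beta
        y_part = [0.0] * min(beta, n - first_row)
        for i, j, v in blockrows[b]:
            y_part[i - first_row] += v * x[j]
        return y_part

    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(multiply_one, range(len(blockrows))))
    return [value for part in parts for value in part]


# 4 x 4 example with beta = 2, giving n / beta = 2 independent tasks.
blockrows = [
    [(0, 0, 1.0), (1, 3, 2.0)],
    [(2, 1, 3.0), (3, 3, 4.0), (2, 2, 5.0)],
]
y_blocked = blockrow_spmv(blockrows, [1.0, 2.0, 3.0, 4.0], beta=2, n=4)
```

If one blockrow held most of the nonzeros, its task would dominate the running time, which is why this scheme is efficient only under the even-distribution assumption stated above.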

Q: What is the Z-Morton ordering on nonzeros in each block?

The Z-Morton ordering on nonzeros in each block is equivalent to first interleaving the bits of row_ind and col_ind, and then sorting the nonzeros using these bit-interleaved values as the keys.
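The equivalence described above can be sketched directly: interleave the bits of each block-relative index pair, then sort by the resulting key. Placing the row bit above the column bit at each level is an assumption made here for illustration, and the function name is hypothetical.

```python
def interleave_bits(row, col, bits):
    """Build a Morton key from `bits`-bit row and column indices."""
    key = 0
    for b in range(bits):
        key |= ((col >> b) & 1) << (2 * b)       # column bit in the even position
        key |= ((row >> b) & 1) << (2 * b + 1)   # row bit in the odd position (assumed)
    return key


# Block-relative (row_ind, col_ind) pairs inside one 4 x 4 block (lg beta = 2 bits).
nonzeros = [(3, 0), (0, 3), (1, 1), (2, 2), (0, 0)]
z_order = sorted(nonzeros, key=lambda rc: interleave_bits(rc[0], rc[1], bits=2))
```

Sorting by the interleaved key visits the top-left quadrant first, then top-right, bottom-left, and bottom-right, recursively within each quadrant.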


Aydin Buluc, +4 more

- 11 Aug 2009

- pp 233-244

454

TL;DR: In this article, a storage format for sparse matrices, called compressed sparse blocks (CSB), is introduced, which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n × n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector.

Abstract: This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n × n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector. Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz / (√n lg n)), which is amply high for virtually any large matrix. The storage requirement for CSB is the same as that for the more-standard compressed-sparse-rows (CSR) format, for which computing Ax in parallel is easy but Aᵀx is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and Aᵀx run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth.


Figure 16: CSB_SPMV performance on Nehalem.

Figure 17: Serial performance comparison of SpMV for CSB and CSR.

Figure 18: Serial performance comparison of SpMV_T for CSB and CSR.

Figure 15: CSB_SPMV performance on Harpertown.

Figure 1: Average performance of Ax and Aᵀx operations on 13 different matrices from our benchmark test suite. CSB_SpMV and CSB_SpMV_T use compressed sparse blocks to perform Ax and Aᵀx, respectively. CSR_SpMV (Serial) and CSR_SpMV_T (Serial) use OSKI [39] and compressed sparse rows without any matrix-specific optimizations. Star-P (y=Ax) and Star-P (y’=x’A) use Star-P [34], a parallel code based on CSR. The experiments were run on a ccNUMA architecture powered by AMD Opteron 8214 (Santa Rosa) processors.

Figure 14: Parallelism test for CSB_SPMV on Asic_320k obtained by artificially increasing the flops per byte. The test shows that the algorithm exhibits substantial parallelism and scales almost perfectly given sufficient memory bandwidth.

Citations

Journal Article•10.1177/1094342011403516

The Combinatorial BLAS: design, implementation, and applications

Aydin Buluc, +1 more

- 01 Nov 2011

TL;DR: The parallel Combinatorial BLAS is described, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications, and an extensible library interface and some guiding principles for future development are provided.


481

•Journal Article•10.1145/3133901

The tensor algebra compiler

Fredrik Kjolstad, +4 more

- 12 Oct 2017

TL;DR: TACO as mentioned in this paper is a C++ library that automatically generates compound tensor algebra operations on dense and sparse tensors, which can be used in machine learning, data analytics, engineering and the physical sciences.


387

Proceedings Article•10.1145/2751205.2751209

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

Weifeng Liu, +1 more

- 08 Jun 2015

TL;DR: CSR5 (Compressed Sparse Row 5), a new storage format, which offers high-throughput SpMV on various platforms including CPUs, GPUs and Xeon Phi, is proposed for real-world applications such as a solver with only tens of iterations because of its low-overhead for format conversion.


305

•Proceedings Article•10.1145/2749469.2750397

Data reorganization in memory using 3D-stacked DRAM

Berkin Akin, +2 more

- 13 Jun 2015

TL;DR: A two pronged approach for efficient data reorganization is presented, which combines a proposed DRAM-aware reshape accelerator integrated within 3D-stacked DRAM, and a mathematical framework that is used to represent and optimize the reorganization operations.


220

•Journal Article

ThunderSVM: A Fast SVM Library on GPUs and CPUs

Zeyi Wen, +4 more

- 01 Jan 2018

- Journal of Machine Learning Research

TL;DR: An efficient and open source SVM software toolkit called ThunderSVM which exploits the high-performance of Graphics Processing Units (GPUs) and multi-core CPUs and designs a convex optimization solver in a general way such that SVC, SVR, and one-class SVMs share the same solver for the ease of maintenance.


215


References

•Book

Introduction to Algorithms

Thomas H. Cormen, +2 more

- 01 Jan 1990

TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.


24.8K

•Book

Iterative Methods for Sparse Linear Systems

Yousef Saad

- 01 Apr 2003

TL;DR: This chapter discusses methods related to the normal equations of linear algebra, and some of the techniques used in this chapter were derived from previous chapters of this book.


16.1K

Journal Article•10.1007/BF02837777

Introduction to algorithms: 4. Turtle graphics

R. K. Shyamasundar

- 01 Sep 1996

- Resonance

TL;DR: In this article, a language similar to logo is used to draw geometric pictures using this language and programs are developed to draw geometrical pictures using it, which is similar to the one we use in this paper.


15.4K

•Book

The C++ Programming Language

Bjarne Stroustrup

- 01 Jan 1985

TL;DR: Bjarne Stroustrup makes C even more accessible to those new to the language, while adding advanced information and techniques that even expert C programmers will find invaluable.


8.1K

Journal Article•10.1145/2049662.2049663

The university of Florida sparse matrix collection

Timothy A. Davis, +1 more

- 07 Dec 2011

- ACM Transactions on Mathematical Softwar...

TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.


4.3K



Related Papers (5)

The university of Florida sparse matrix collection

Implementing sparse matrix-vector multiplication on throughput-oriented processors

Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication

Iterative Methods for Sparse Linear Systems

Roofline: an insightful visual performance model for multicore architectures