1. What have the authors contributed in "Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks"?
This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n × n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector.
2. How many bits are required for each element in val?
For each element in val, the authors use lg β bits to represent the row index and lg β bits to represent the column index, requiring a total of nnz lg β bits for each of row_ind and col_ind.
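
The arithmetic in this answer can be checked with a minimal sketch. The function name `csb_index_bits` and the requirement that β be a power of two are assumptions made for illustration; they are not taken from the paper's code.

```python
def csb_index_bits(nnz: int, beta: int) -> dict:
    """Bit cost of the within-block indices for a beta x beta block size.

    Assumes beta is a power of two, so lg(beta) is an integer.
    """
    if beta < 1 or beta & (beta - 1):
        raise ValueError("beta must be a power of two")
    bits_per_index = beta.bit_length() - 1  # lg(beta)
    return {
        "bits_per_index": bits_per_index,
        "row_ind_bits": nnz * bits_per_index,  # nnz * lg(beta)
        "col_ind_bits": nnz * bits_per_index,  # nnz * lg(beta)
    }
```

For example, with β = 256 each within-block index fits in one byte, so a matrix with 1000 nonzeros would need 8000 bits for each of row_ind and col_ind under this accounting.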
3. How many nonzeros can be distributed in parallel?
If the nonzeros were guaranteed to be distributed evenly among blockrows, then simple blockrow parallelism would yield an efficient algorithm with n/β-way parallelism by performing a serial multiplication for each blockrow.
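
The simple blockrow scheme described in this answer might look like the following sketch. It is only an illustration of the task structure: the input form (a list of `(row, col, value)` triples) and the function name `blockrow_spmv` are hypothetical, and Python threads will not deliver real speedup on CPU-bound work. It also ignores the uneven-distribution case that the answer sets aside.

```python
from concurrent.futures import ThreadPoolExecutor


def blockrow_spmv(triples, x, n, beta):
    """Compute y = A x with one independent serial task per blockrow.

    triples: iterable of (row, col, value) nonzeros (hypothetical input form).
    x: dense vector of length n. beta: number of rows per blockrow.
    """
    num_blockrows = (n + beta - 1) // beta
    # Group nonzeros by the blockrow that owns their row index.
    buckets = [[] for _ in range(num_blockrows)]
    for i, j, v in triples:
        buckets[i // beta].append((i, j, v))

    def multiply_blockrow(b):
        # Each task writes only its own slice of y, so tasks never conflict.
        start = b * beta
        y_part = [0.0] * (min(start + beta, n) - start)
        for i, j, v in buckets[b]:
            y_part[i - start] += v * x[j]
        return y_part

    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(multiply_blockrow, range(num_blockrows)))
    return [value for part in parts for value in part]
```

Because each blockrow owns a disjoint slice of the output, there are up to n/β independent tasks; the work is balanced only if the nonzeros are spread evenly across blockrows.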
4. What is the Z-Morton ordering on nonzeros in each block?
The Z-Morton ordering on nonzeros in each block is equivalent to first interleaving the bits of row_ind and col_ind, and then sorting the nonzeros using these bit-interleaved values as the keys.
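
The interleave-then-sort description can be sketched as follows. The helper names and the choice to place each row bit above the matching column bit (so the four quadrants are visited top-left, top-right, bottom-left, bottom-right) are assumptions for illustration.

```python
def interleave_bits(row: int, col: int, bits: int) -> int:
    """Build a Morton key by interleaving the low `bits` bits of row and col.

    Assumed convention: each row bit sits just above the matching column bit.
    """
    key = 0
    for k in range(bits):
        key |= ((col >> k) & 1) << (2 * k)
        key |= ((row >> k) & 1) << (2 * k + 1)
    return key


def z_morton_sort(nonzeros, bits):
    """Sort (row_ind, col_ind, value) triples of one block by Morton key."""
    return sorted(nonzeros, key=lambda t: interleave_bits(t[0], t[1], bits))
```

Here `bits` would be lg β for a β × β block, matching the width of the within-block indices.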




![Figure 1: Average performance of Ax and Aᵀx operations on 13 different matrices from our benchmark test suite. CSB_SpMV and CSB_SpMV_T use compressed sparse blocks to perform Ax and Aᵀx, respectively. CSR_SpMV (Serial) and CSR_SpMV_T (Serial) use OSKI [39] and compressed sparse rows without any matrix-specific optimizations. Star-P (y=Ax) and Star-P (y’=x’A) use Star-P [34], a parallel code based on CSR. The experiments were run on a ccNUMA architecture powered by AMD Opteron 8214 (Santa Rosa) processors.](/figures/figure-1-average-performance-ofax-andatx-operations-on-13-fprqos5d.png)
