1. What are the contributions of "Scalable SIMD-Efficient Graph Processing on GPUs"?
In this paper, the authors develop techniques that greatly enhance the performance and scalability of vertex-centric graph processing on GPUs. First, they present Warp Segmentation, a novel method that improves GPU device utilization by dynamically assigning an appropriate number of SIMD threads to each vertex, whose neighbor count is irregular, while employing the compact CSR representation to maximize the graph size that can be kept inside GPU global memory. Second, they scale graph processing to multiple GPUs and propose Vertex Refinement to address the challenge of judiciously using the limited bandwidth available for transferring data between GPUs via the PCIe bus.
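To make the idea of giving each vertex a degree-dependent number of SIMD lanes more concrete, here is a minimal CPU sketch in Python. It is not the authors' kernel: it assumes a simplified scheme in which one warp covers 32 consecutive CSR edges, each lane finds the vertex that owns its edge by binary search over the CSR row offsets, and lanes that share a vertex form a segment whose values are summed. The function name `warp_segmented_sum` and the sum reduction are hypothetical choices made for illustration.

```python
from bisect import bisect_right

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def warp_segmented_sum(row_offsets, edge_values, warp_start):
    """Simulate one warp covering WARP_SIZE consecutive CSR edges.

    row_offsets: CSR offsets, length = num_vertices + 1 (hypothetical input).
    edge_values: one value per edge, in CSR order.
    Returns {vertex: (lanes_used, partial_sum)} for the edges in this warp.
    """
    num_edges = row_offsets[-1]
    segments = {}
    for lane in range(WARP_SIZE):
        edge = warp_start + lane
        if edge >= num_edges:
            break  # lanes past the last edge stay idle
        # Binary search: the last vertex whose offset is <= edge owns it.
        # bisect_right also skips over vertices that have no edges.
        vertex = bisect_right(row_offsets, edge) - 1
        lanes, total = segments.get(vertex, (0, 0))
        segments[vertex] = (lanes + 1, total + edge_values[edge])
    return segments


# Vertex 0 has 3 neighbors, vertex 1 has none, vertex 2 has 2.
row_offsets = [0, 3, 3, 5]
edge_values = [1, 2, 3, 10, 20]
result = warp_segmented_sum(row_offsets, edge_values, warp_start=0)
```

In this sketch, a vertex with three neighbors occupies three lanes and a vertex with two occupies two, so no lane idles waiting for a high-degree neighbor list to finish. On a real GPU the per-segment reduction would run in parallel across the lanes rather than in a sequential loop.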
2. How many vertex values will be held by each GPU?
In addition to CSR representation buffers, each GPU will hold one Outbox buffer that is filled with updated vertex indices and vertex values of the GPU-specific division.
3. How do the authors avoid contention over the atomic variable?
The authors avoid contention over the atomic variable mainly by relying on a binary prefix sum for vertex refinement and by involving only one warp lane in the outbox region reservation process.
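The mechanism can be sketched as a CPU simulation in Python. This is an illustrative assumption about how such a scheme typically works, not code from the paper: each lane holds a flag saying whether its vertex was updated, the flags are packed into a bitmask (on a GPU this would typically be a warp vote such as `__ballot_sync`), each lane's write offset is the number of set bits below it (a population count such as `__popc`), and a single reservation advances the shared counter once per warp. The names `warp_refine`, `outbox`, and `counter` are hypothetical.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def warp_refine(updated, entries, outbox, counter):
    """Compact the updated entries of one warp into a shared outbox.

    updated: WARP_SIZE booleans, True if the lane's vertex changed.
    entries: WARP_SIZE (vertex_index, vertex_value) pairs, one per lane.
    outbox:  preallocated list standing in for the GPU outbox buffer.
    counter: one-element list standing in for the atomic counter.
    Returns the number of entries written.
    """
    # Warp vote: bit i is set when lane i has an update.
    mask = 0
    for lane in range(WARP_SIZE):
        if updated[lane]:
            mask |= 1 << lane
    total = bin(mask).count("1")

    # Only one lane touches the counter: one reservation per warp
    # instead of one atomic operation per updated vertex.
    base = counter[0]
    counter[0] += total

    # Binary prefix sum: a lane's slot is the count of set bits below it.
    for lane in range(WARP_SIZE):
        if updated[lane]:
            offset = bin(mask & ((1 << lane) - 1)).count("1")
            outbox[base + offset] = entries[lane]
    return total


updated = [lane in (1, 4, 5) for lane in range(WARP_SIZE)]
entries = [(100 + lane, lane * 0.5) for lane in range(WARP_SIZE)]
outbox = [None] * 8
counter = [2]  # two slots already reserved by an earlier warp
written = warp_refine(updated, entries, outbox, counter)
```

The design point is that the number of operations on the shared counter scales with the number of warps rather than with the number of updated vertices, while the prefix sum keeps each warp's writes contiguous in the outbox.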
4. What is the effect of adding more GPUs on the processing time of graphs?
In addition, higher density in larger graphs makes the reduction in processing time more pronounced when scaling to multiple GPUs, because it reduces the volume of vertices transferred between devices.





