GPU multisplit

doi:10.1145/2851141.2851169

Proceedings Article•10.1145/2851141.2851169


Saman Ashkiani and three co-authors

- 27 Feb 2016

pp. 12

36 citations

TL;DR: This work provides a parallel model and multiple implementations for the multisplit problem. It uses warp-synchronous programming to avoid branch divergence and reduce memory usage, and it hierarchically reorders input elements to achieve better coalescing of global memory accesses.
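To make the multisplit problem concrete, here is a minimal Python sketch. `multisplit` is a sequential reference, not the paper's GPU implementation. It assumes a user-supplied bucket function (`bucket_of`, a hypothetical name) and produces a stable result using a histogram, exclusive-scan, and scatter structure. `warp_histogram_and_ranks` is a hypothetical simulation of one warp-synchronous step: CUDA's real `__ballot_sync` and `__popc` intrinsics give a per-warp bucket histogram and each lane's rank within its bucket without shared-memory atomics. Issuing one ballot per bucket is an assumption made for clarity and may not match the paper's scheme.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def multisplit(keys: Sequence[T], num_buckets: int,
               bucket_of: Callable[[T], int]) -> List[T]:
    """Sequential reference multisplit (hypothetical helper, not the paper's code).

    Places all keys of bucket 0 first, then bucket 1, and so on, keeping the
    input order within each bucket (a stable result).
    """
    ids = [bucket_of(k) for k in keys]

    # Stage 1: histogram of how many keys fall into each bucket.
    counts = [0] * num_buckets
    for b in ids:
        counts[b] += 1

    # Stage 2: exclusive prefix sum of the counts gives each bucket's start offset.
    offsets = list(accumulate(counts, initial=0))[:-1]

    # Stage 3: stable scatter; each key takes the next free slot in its bucket.
    out: List[T] = [None] * len(keys)  # type: ignore[list-item]
    for key, b in zip(keys, ids):
        out[offsets[b]] = key
        offsets[b] += 1
    return out


def warp_ballot(lane_predicates: Sequence[bool]) -> int:
    """Simulates CUDA's __ballot_sync(0xffffffff, pred) for one warp:
    bit i of the result is set iff lane i's predicate is true."""
    mask = 0
    for lane, pred in enumerate(lane_predicates):
        if pred:
            mask |= 1 << lane
    return mask


def popc(mask: int) -> int:
    """Simulates CUDA's __popc: the number of set bits in mask."""
    return bin(mask).count("1")


def warp_histogram_and_ranks(lane_buckets: Sequence[int],
                             num_buckets: int) -> Tuple[List[int], List[int]]:
    """Hypothetical warp-level step: one ballot per bucket yields the warp's
    bucket histogram and each lane's rank among lower lanes in the same bucket."""
    assert len(lane_buckets) <= WARP_SIZE
    ballots = [warp_ballot([b == j for b in lane_buckets])
               for j in range(num_buckets)]
    counts = [popc(m) for m in ballots]
    ranks = []
    for lane, b in enumerate(lane_buckets):
        lanes_below = (1 << lane) - 1  # like the PTX %lanemask_lt register
        ranks.append(popc(ballots[b] & lanes_below))
    return counts, ranks
```

For example, `multisplit([5, 2, 8, 1, 9, 4, 7], 3, lambda k: k % 3)` returns `[9, 1, 4, 7, 5, 2, 8]`: bucket 0, then bucket 1, then bucket 2, with each bucket in input order. Adding a lane's rank to the exclusive-scan offset of its bucket in the warp's counts gives that element's slot in a warp-local multisplit.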


Citations

Journal Article•10.1145/3108140

Gunrock: GPU Graph Analytics

Yangzihao Wang and 10 co-authors

- 23 Aug 2017

TL;DR: The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries, and better performance than any other GPU high-level graph library.

157 citations

Proceedings Article•10.1109/IPDPS.2018.00052

A Dynamic Hash Table for the GPU

Saman Ashkiani and two co-authors

- 21 May 2018

TL;DR: The authors propose a warp-cooperative work-sharing strategy that reduces branch divergence and offers an efficient alternative to traditional per-thread (or per-warp) work assignment and processing. With it they build the slab list, a dynamic, non-blocking, concurrent linked list that supports asynchronous, concurrent updates as well as search queries.

76 citations

Posted Content

Gunrock: GPU Graph Analytics

Yangzihao Wang and 10 co-authors

- 04 Jan 2017

- arXiv: Distributed, Parallel, and Cluste...

TL;DR: Gunrock is a high-level, bulk-synchronous, data-centric abstraction for large-scale graph analytics, focused on operations on a vertex or edge frontier.

61 citations

Journal Article•10.1145/3570638

Optimization Techniques for GPU Programming

Pieter Hijma and four co-authors

- 14 Nov 2022

- ACM Computing Surveys

TL;DR: This survey discusses optimization techniques found in 450 articles published over the preceding 14 years and analyzes them from different perspectives. The analysis shows that the optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.

54 citations

Proceedings Article•10.1145/3035918.3064043

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Elias Stehle and one co-author

- 03 Nov 2016

- arXiv: Databases

TL;DR: This work proposes a novel approach that almost halves the volume of memory transfers and therefore considerably eases the memory-bandwidth limitation. It then builds on this efficient GPU sorting approach with a pipelined heterogeneous sorting algorithm that mitigates the overhead of PCIe data transfers.

39 citations

...


References

Journal Article•10.1007/BF01386390

A note on two problems in connexion with graphs

Edsger W. Dijkstra

- 01 Dec 1959

- Numerische Mathematik

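TL;DR: Given n nodes joined by branches of known length, with at least one path between any two nodes, the paper addresses two problems: constructing the tree of minimum total length between the nodes (a tree being a graph with one and only one path between every two nodes), and finding the path of minimum total length between two given nodes.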

25K citations

Journal Article•10.1145/2049662.2049663

The University of Florida sparse matrix collection

Timothy A. Davis and one co-author

- 07 Dec 2011

- ACM Transactions on Mathematical Softwar...

TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.

4.3K citations

Book

Digraphs: Theory, Algorithms and Applications

Jørgen Bang-Jensen and one co-author

- 05 Aug 2002

TL;DR: Digraphs is an essential, comprehensive reference for undergraduate and graduate students, and researchers in mathematics, operations research and computer science, and it will also prove invaluable to specialists in related areas, such as meteorology, physics and computational biology.

2.4K citations

Proceedings Article•10.1145/1401132.1401152

Scalable parallel programming with CUDA

John R. Nickolls and three co-authors

- 11 Aug 2008

TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.

2.3K citations

Journal Article•10.1109/MM.2008.31

NVIDIA Tesla: A Unified Graphics and Computing Architecture

Erik Lindholm and three co-authors

- 01 Mar 2008

- IEEE Micro

TL;DR: To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture, which is massively multithreaded and programmable in C or via graphics APIs.

1.6K citations

...


Related Papers (5)

Real-time parallel hashing on the GPU

A Dynamic Hash Table for the GPU

Gunrock: a high-performance graph processing library on the GPU

High performance dynamic lock-free hash tables and list-based sets

Performance Evaluation of Concurrent Lock-free Data Structures on GPUs