Open Access
A Cache-Efficient Sorting Algorithm for Database and Data Mining Computations using Graphics Processors
Naga K. Govindaraju, Nikunj Raghuvanshi, Michael Henson, David Tuft, Dinesh Manocha
- 22 Jun 2005
47
TL;DR: A fast sorting algorithm using graphics processors (GPUs) that adapts well to database and data mining applications, with a memory-efficient data access pattern and an efficient instruction dispatch mechanism that improve overall sorting performance.
Abstract: We present a fast sorting algorithm using graphics processors (GPUs) that adapts well to database and data mining applications. Our algorithm uses the texture mapping and blending functionalities of GPUs to implement an efficient bitonic sorting network. We take into account the communication bandwidth overhead to the video memory on the GPUs and reduce the memory bandwidth requirements. We also present strategies to exploit the tile-based computational model of GPUs. Our new algorithm has a memory-efficient data access pattern, and we describe an efficient instruction dispatch mechanism to improve the overall sorting performance. We have used our sorting algorithm to accelerate join-based queries and stream mining algorithms. Our results indicate up to an order of magnitude improvement over prior CPU-based and GPU-based sorting algorithms.
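For readers unfamiliar with the bitonic sorting network named in the abstract, the following is a minimal CPU-side sketch in Python. It is not the authors' GPU implementation, which maps the compare-exchange steps onto texture mapping and blending; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for illustration only.

```python
def bitonic_sort(values):
    """Sort a list with a bitonic sorting network (illustrative sketch only).

    Assumption: the input length is a power of two, the usual simplification
    for a textbook bitonic network.
    """
    n = len(values)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    a = list(values)  # work on a copy; the input is left unchanged

    k = 2
    while k <= n:          # k: size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # j: distance between compared elements
            # Every compare-exchange in this pass touches a disjoint pair and
            # the pairing does not depend on the data, so the whole pass
            # could run in parallel.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Calling `bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])` returns the values in ascending order. The fixed, data-independent comparison pattern of each pass is the property that makes sorting networks a natural fit for data-parallel hardware.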
Citations
Glift: Generic, efficient, random-access GPU data structures
TL;DR: Glift, an abstraction and generic template library for defining complex, random-access graphics processor (GPU) data structures, is presented and several new GPU data structures are characterized and implemented using reusable Glift components.
GPU merge path: a GPU merging algorithm
Oded Green, Robert McColl, David A. Bader
- 25 Jun 2012
TL;DR: An algorithm that partitions the workload equally amongst the GPU Streaming Multi-processors (SM) and shows how each SM performs a parallel merge and how to divide the work so that all the GPU's Streaming Processors (SP) are utilized.
146
•Proceedings Article
Efficient stream reduction on the GPU
David Roger, Ulf Assarsson, Nicolas Holzschuch
- 04 Oct 2007
TL;DR: This paper presents a new efficient algorithm for stream reduction on the GPU that works by splitting the input stream into smaller components of a fixed size, on which it runs a standard stream reduction pass with line drawing.
A Novel Sorting Algorithm for Many-core Architectures Based on Adaptive Bitonic Sort
Hagen Peters, Ole Schulz-Hildebrandt, Norbert Luttenberger
- 21 May 2012
TL;DR: This article presents a novel optimal sorting algorithm that is based on an approach similar to adaptive bitonic sort that does not use bitonic trees but uses the input array together with some additional information and turns out to be the fastest comparison-based sorting algorithm for GPUs found in literature.
41
Fast in-place, comparison-based sorting with CUDA: a study with bitonic sort
TL;DR: This work assigned compare/exchange operations to threads in a way that decreases low‐performance global‐memory access and makes efficient use of high‐performance shared memory, which greatly increases the performance of this in‐place, comparison‐based sorting algorithm.
37
References
Introduction to Algorithms
Xin-She Yang
- 01 Jan 2014
TL;DR: This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation.
8.3K
Sorting networks and their applications
Kenneth E. Batcher
- 30 Apr 1968
TL;DR: To achieve high throughput rates today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.
•Book
Sorting and Searching
Donald E. Knuth
- 01 Jan 1973
TL;DR: The first revision of this third volume is a survey of classical computer techniques for sorting and searching that extends the treatment of data structures to consider both large and small databases and internal and external memories.
1.7K
•Proceedings Article
Weaving Relations for Cache Performance
Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, Marios Skounakis
- 11 Sep 2001
TL;DR: This paper proposes a new data organization model called PAX (Partition Attributes Across), that significantly improves cache performance by grouping together all values of each attribute within each page, and demonstrates that in-page data placement is the key to high cache performance.
Approximate counts and quantiles over sliding windows
Arvind Arasu, Gurmeet Singh Manku
- 14 Jun 2004
TL;DR: This work considers the problem of maintaining ε-approximate counts and quantiles over a stream sliding window using limited space and presents various deterministic and randomized algorithms for approximate counts and quantiles that require O(1/ε polylog(1/ε, N)) space.