Open Access Proceedings Article
Sorting using BItonic netwoRk wIth CUDA
Ranieri Baraglia,Gabriele Capannini,Franco Maria Nardini,Fabrizio Silvestri +3 more
- 01 Jan 2009
TL;DR: This paper shows how to use graphics processors as coprocessors to speed up sorting while leaving the CPU free to perform other tasks, and introduces an efficient instruction dispatch mechanism to improve overall sorting performance.
Abstract: Novel “manycore” architectures, such as graphics processors, are highly parallel, high-performance shared-memory architectures [7] originally designed for specific problems such as graphics. They can be exploited for a wider range of problems by designing algorithms specifically for them. We present a fast sorting algorithm that implements an efficient bitonic sorting network. The algorithm is highly suitable for information retrieval applications. Sorting is a fundamental and universal problem in computer science. Although sorting has been addressed extensively in prior research, making it faster by exploiting novel technologies remains an interesting challenge. In this light, this paper shows how to use graphics processors as coprocessors to speed up sorting while leaving the CPU free to perform other tasks. Our new algorithm exploits a memory-efficient data access pattern that keeps the number of accesses to off-chip memory to a minimum. We introduce an efficient instruction dispatch mechanism to improve overall sorting performance. We also present a cache-based computational model for graphics processors. Experimental results show remarkable improvements over prior CPU-based sorting methods and a significant improvement over previous GPU-based sorting algorithms.
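For readers unfamiliar with the bitonic sorting network the abstract refers to, the following is a minimal sequential sketch in Python. It is not the paper's CUDA implementation and does not model its memory access pattern or instruction dispatch mechanism; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for illustration.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic sorting network.

    Illustrative sketch only: the input length must be a power of two
    (or zero), which is an assumption of this example.
    """
    data = list(values)
    n = len(data)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")

    k = 2  # size of the bitonic sequences being merged in this phase
    while k <= n:
        j = k // 2  # distance between the two elements of each comparator
        while j >= 1:
            # Every comparator in this (k, j) step touches a disjoint pair,
            # so on a parallel machine each i could be handled by its own
            # thread. Here the loop simply runs them one after another.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks of size k alternate between ascending and
                    # descending order; the last phase (k == n) is ascending.
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

The sequence of comparisons depends only on the input length, not on the data, which is what makes sorting networks a natural fit for data-parallel hardware.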
Citations
Implementation in FPGA of Address-Based Data Sorting
Valery Sklyarov,Iouliia Skliarova,Dmitri Mihhailov,Alexander Sudnitson +3 more
- 05 Sep 2011
TL;DR: The proposed technique enables this type of address-based sorting to be applied either directly or through tree-walk tables, permitting the number of bits in sorted data items to be increased by constructing and traversing N-ary trees composed of no-match and working nodes.
Sorting on GPUs for large scale datasets: A thorough comparison
TL;DR: This paper designs an optimized version of a sorting network in the K-model, a novel computational model designed to consider all the important features of many-core architectures, and achieves a space complexity of Θ(1).
Bitonic sort on a chained-cubic tree interconnection network
TL;DR: This paper maps bitonic sort to a chained-cubic tree interconnection network, calling the result BSCCT, which achieves an almost 12-fold speedup relative to bitonic sort on a single processor when 1024 processors are used to sort 32M keys.
Workshop on large-scale distributed systems for information retrieval
Flavio Junqueira,Vassilis Plachouras,Fabrizio Silvestri,Ivana Podnar +3 more
- 01 Dec 2007
TL;DR: Given the attendance and the good level of discussion, it is concluded that systems for information retrieval is a growing and promising area of research.
Analyzing Power and Energy Efficiency of Bitonic Mergesort Based on Performance Evaluation
TL;DR: The results showed that BM outperformed AQ on all three metrics in most cases, and that fundamental software building blocks can offer a reasonable amount of power and energy savings, suggesting new ways to tackle the power obstacle of prospective exascale systems.
References
•Book
Introduction to Information Retrieval
Christopher D. Manning,Prabhakar Raghavan,Hinrich Schütze +2 more
- 01 Jan 2008
TL;DR: In this article, the authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
Sorting networks and their applications
Kenneth E. Batcher
- 30 Apr 1968
TL;DR: To achieve high throughput rates today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.
•Book
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Luiz Andre Barroso,Urs Hoelzle +1 more
- 01 Jan 2008
TL;DR: Describes the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base.
Web search for a planet: The Google cluster architecture
TL;DR: Google's architecture features clusters of more than 15,000 commodity-class PCs with fault-tolerant software that achieves superior performance at a fraction of the cost of a system built from fewer, but more expensive, high-end servers.