Topic

Burstsort

About: Burstsort is a research topic. Over the lifetime, 11 publications have been published within this topic receiving 230 citations.

Papers

Journal Article•10.1145/1187436.1187439•

Cache-efficient string sorting using copying

Ranjan Sinha¹, Justin Zobel¹, David Ring•Institutions (1)

RMIT University¹

09 Feb 2007-ACM Journal of Experimental Algorithms

TL;DR: This paper introduces C-burstsort, which copies the unexamined tail of each key to the bucket and discards the original key to improve data locality. Sorting is typically twice as fast as the original burstsort and four to five times faster than multikey quicksort and previous radixsorts.

Abstract: Burstsort is a cache-oriented sorting technique that uses a dynamic trie to efficiently divide large sets of string keys into related subsets small enough to sort in cache. In our original burstsort, string keys sharing a common prefix were managed via a bucket of pointers represented as a list or array; this approach was found to be up to twice as fast as the previous best string sorts, mostly because of a sharp reduction in out-of-cache references. In this paper, we introduce C-burstsort, which copies the unexamined tail of each key to the bucket and discards the original key to improve data locality. On both Intel and PowerPC architectures, and on a wide range of string types, we show that sorting is typically twice as fast as our original burstsort and four to five times faster than multikey quicksort and previous radixsorts. A variant that copies both suffixes and record pointers to buckets, CP-burstsort, uses more memory, but provides stable sorting. In current computers, where performance is limited by memory access latencies, these new algorithms can dramatically reduce the time needed for internal sorting of large numbers of strings.

56 citations
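
The copying idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the names `CNode`, `c_insert`, `c_collect`, and `c_burstsort`, the byte-size threshold, and the use of a zero byte as a tail terminator are all assumptions made for illustration, and keys are assumed to contain no zero bytes. Each bucket is one contiguous `bytearray` holding only the unexamined tails; the original key is never stored and is rebuilt from the trie path on output.

```python
class CNode:
    """Trie node (hypothetical layout): byte -> CNode or bytearray bucket."""
    def __init__(self):
        self.children = {}
        self.ended = 0  # number of keys that end exactly at this node


def c_insert(node, key, threshold):
    """Insert the remaining bytes of a key; copy only its tail into a bucket."""
    while True:
        if not key:
            node.ended += 1
            return
        c, tail = key[0], key[1:]
        child = node.children.get(c)
        if isinstance(child, CNode):
            node, key = child, tail  # descend one character
            continue
        if child is None:
            child = node.children[c] = bytearray()
        child += tail + b"\x00"  # contiguous copy of the unexamined tail
        if len(child) > threshold:
            # Burst: replace the bucket with a node and redistribute tails.
            fresh = CNode()
            node.children[c] = fresh
            for t in bytes(child).split(b"\x00")[:-1]:
                c_insert(fresh, t, threshold)
        return


def c_collect(node, prefix, out):
    """In-order traversal; keys are rebuilt as trie path + sorted tail."""
    out.extend([prefix] * node.ended)
    for c in sorted(node.children):
        child = node.children[c]
        p = prefix + bytes([c])
        if isinstance(child, CNode):
            c_collect(child, p, out)
        else:
            tails = sorted(bytes(child).split(b"\x00")[:-1])
            out.extend(p + t for t in tails)
    return out


def c_burstsort(keys, threshold=16):
    """Sort a list of bytes keys; threshold is a bucket size in bytes."""
    root = CNode()
    for k in keys:
        c_insert(root, k, threshold)
    return c_collect(root, b"", [])


keys = [b"banana", b"band", b"ban", b"apple", b"bandana", b"ban", b"a", b""]
assert c_burstsort(keys) == sorted(keys)
```

Python objects do not expose cache behaviour, so the sketch only shows the data layout (tails stored back to back, original keys discarded), not the speedups reported in the paper.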

Proceedings Article•

Cache-Conscious Sorting of Large Sets of Strings with Dynamic Tries.

Ranjan Sinha¹, Justin Zobel¹•Institutions (1)

RMIT University¹

1 Jan 2003

TL;DR: This work proposes a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets, which is simple, fast, and efficient.

Abstract: Ongoing changes in computer architecture are affecting the efficiency of string-sorting algorithms. The size of main memory in typical computers continues to grow but memory accesses require increasing numbers of instruction cycles, which is a problem for the most efficient of the existing string-sorting algorithms as they do not utilize cache well for large data sets. We propose a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets. It is simple, fast, and efficient. We experimentally explore key implementation options and compare burstsort to existing string-sorting algorithms on large and small sets of strings with a range of characteristics. These experiments show that, for large sets of strings, burstsort is almost twice as fast as any previous algorithm, primarily due to a lower rate of cache miss.

43 citations
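
The abstract above describes a trie built dynamically, with strings held in buckets at the leaves. A minimal sketch of that idea follows, under stated assumptions: the names `Node`, `insert`, `collect`, and `burstsort`, the tiny `BURST_THRESHOLD`, and the use of Python's built-in `sorted` for the buckets are illustrative choices, not details taken from the paper.

```python
BURST_THRESHOLD = 4  # hypothetical; real buckets are sized to fit in cache


class Node:
    """Trie node: maps a character to a child Node or to a bucket (list)."""
    def __init__(self):
        self.children = {}
        self.exhausted = []  # strings that end exactly at this node


def insert(node, s, depth=0, threshold=BURST_THRESHOLD):
    """Route s down the trie; burst a bucket once it grows past threshold."""
    while True:
        if depth == len(s):
            node.exhausted.append(s)
            return
        c = s[depth]
        child = node.children.get(c)
        if isinstance(child, Node):
            node, depth = child, depth + 1
            continue
        if child is None:
            child = node.children[c] = []
        child.append(s)
        if len(child) > threshold:
            # Burst: the bucket becomes a trie node one level deeper.
            fresh = Node()
            node.children[c] = fresh
            for t in child:
                insert(fresh, t, depth + 1, threshold)
        return


def collect(node, out):
    """In-order traversal; each small bucket is sorted on its own."""
    out.extend(node.exhausted)
    for c in sorted(node.children):
        child = node.children[c]
        if isinstance(child, Node):
            collect(child, out)
        else:
            out.extend(sorted(child))
    return out


def burstsort(strings, threshold=BURST_THRESHOLD):
    root = Node()
    for s in strings:
        insert(root, s, 0, threshold)
    return collect(root, [])


words = ["trie", "tree", "try", "bucket", "burst", "bus", "t", "tr", "burst"]
assert burstsort(words) == sorted(words)
```

The point of the structure is that every bucket holds strings sharing a prefix and stays small, so the final per-bucket sort touches a limited working set.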

Journal Article•10.1145/1064546.1180622•

Using random sampling to build approximate tries for efficient string sorting

Ranjan Sinha¹, Justin Zobel¹•Institutions (1)

RMIT University¹

31 Dec 2005-ACM Journal of Experimental Algorithms

TL;DR: This paper introduces three variants of burstsort, a string-sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cache efficient: SR-burstsort, DR-burstsort, and DRL-burstsort.

Abstract: Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string-sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cache efficient. Burstsort dynamically builds a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SR-burstsort, DR-burstsort, and DRL-burstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce, by up to 37%, cache misses further than did the original burstsort, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained.

33 citations
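
The sampling idea in the abstract above (build an approximate trie from a random sample, then distribute all strings into it) can be sketched as follows. This is an illustration under assumptions, not a reproduction of SR-, DR-, or DRL-burstsort: the function names, the scaled sample threshold, and the choice to keep the trie fixed during the main pass are all hypothetical.

```python
import random


class Node:
    def __init__(self):
        self.children = {}   # char -> Node or bucket (list)
        self.exhausted = []  # strings that end exactly at this node


def insert(node, s, depth, threshold):
    """Insert s; threshold=None means the trie is fixed (no bursting)."""
    while True:
        if depth == len(s):
            node.exhausted.append(s)
            return
        c = s[depth]
        child = node.children.get(c)
        if isinstance(child, Node):
            node, depth = child, depth + 1
            continue
        if child is None:
            child = node.children[c] = []
        child.append(s)
        if threshold is not None and len(child) > threshold:
            fresh = Node()
            node.children[c] = fresh
            for t in child:
                insert(fresh, t, depth + 1, threshold)
        return


def clear_buckets(node):
    """Keep the trie shape learned from the sample, drop the sample itself."""
    node.exhausted.clear()
    for child in node.children.values():
        if isinstance(child, Node):
            clear_buckets(child)
        else:
            child.clear()


def collect(node, out):
    out.extend(node.exhausted)
    for c in sorted(node.children):
        child = node.children[c]
        if isinstance(child, Node):
            collect(child, out)
        else:
            out.extend(sorted(child))
    return out


def sampled_burstsort(strings, sample_size=8, threshold=4, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(strings, min(sample_size, len(strings)))
    # Assumption: scale the threshold so the sample bursts roughly
    # where the full input would.
    scaled = max(1, threshold * len(sample) // max(1, len(strings)))
    root = Node()
    for s in sample:
        insert(root, s, 0, scaled)
    clear_buckets(root)
    for s in strings:
        insert(root, s, 0, None)  # main pass: trie shape is fixed
    return collect(root, [])


words = ["sort", "sorted", "sample", "string", "strings", "trie", "s", "so"]
assert sampled_burstsort(words) == sorted(words)
```

Because the main pass never bursts, its output is correct for any trie shape; the sample only affects how evenly the strings spread across buckets.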

Efficient Trie-Based Sorting of Large Sets of Strings

Ranjan Sinha¹, Justin Zobel¹•Institutions (1)

RMIT University¹

1 Jan 2003

TL;DR: A better choice of data structures for managing the buckets held at leaves further improves the efficiency of burstsort, at a small additional cost in memory.

Abstract: Sorting is a fundamental algorithmic task. Many general-purpose sorting algorithms have been developed, but efficiency gains can be achieved by designing algorithms for specific kinds of data, such as strings. In previous work we have shown that our burstsort, a trie-based algorithm for sorting strings, is for large data sets more efficient than all previous algorithms for this task. In this paper we re-evaluate some of the implementation details of burstsort, in particular the method for managing buckets held at leaves. We show that better choice of data structures further improves the efficiency, at a small additional cost in memory. For sets of around 30,000,000 strings, our improved burstsort is nearly twice as fast as the previous best sorting algorithm.

30 citations

Book Chapter•10.1007/978-3-540-24838-5_39•

Using Random Sampling to Build Approximate Tries for Efficient String Sorting

Ranjan Sinha¹, Justin Zobel¹•Institutions (1)

RMIT University¹

25 May 2004-Lecture Notes in Computer Science

TL;DR: This chapter introduces three variants of burstsort, a string-sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms: SR-burstsort, DR-burstsort, and DRL-burstsort.

Abstract: Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cache-efficient. The approach in burstsort is to dynamically build a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SR-burstsort, DR-burstsort, and DRL-burstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce cache misses further than did the original burstsort, by up to 37%, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained.

18 citations

Performance Metrics

Papers

156

Citations

No. of papers in the topic in previous years
Year	Papers
2013	1
2010	1
2008	1
2007	1
2005	1
2004	3
