Linear probing

Papers published on a yearly basis

Papers

Proceedings Article•

KenLM: Faster and Smaller Language Model Queries

Kenneth Heafield¹•Institutions (1)

Carnegie Mellon University¹

30 Jul 2011

TL;DR: KenLM is a library that implements two data structures for efficient language model queries, reducing both time and memory costs; it is integrated into the Moses, cdec, and Joshua translation systems.

Abstract: We present KenLM, a library that implements two data structures for efficient language model queries, reducing both time and memory costs. The Probing data structure uses linear probing hash tables and is designed for speed. Compared with the widely-used SRILM, our Probing model is 2.4 times as fast while using 57% of the memory. The Trie data structure is a trie with bit-level packing, sorted records, interpolation search, and optional quantization aimed at lower memory consumption. Trie simultaneously uses less memory than the smallest lossless baseline and less CPU than the fastest baseline. Our code is open-source, thread-safe, and integrated into the Moses, cdec, and Joshua translation systems. This paper describes the several performance techniques used and presents benchmarks against alternative implementations.

1,521 citations
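
The linear probing hash table that the Probing data structure builds on can be illustrated with a short sketch. This is not KenLM's implementation (which is C++ and tuned for n-gram records); the class name `LinearProbingTable`, the starting capacity of 8, and the "grow when more than half full" rule are assumptions made for illustration only.

```python
class LinearProbingTable:
    """Open-addressing hash table that resolves collisions by linear probing."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8):
        # Assumes capacity >= 1; the table doubles as it fills.
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        self._size = 0

    def _slot(self, key):
        """Return the slot holding `key`, or the first empty slot in its probe run."""
        capacity = len(self._keys)
        i = hash(key) % capacity
        while self._keys[i] is not self._EMPTY and self._keys[i] != key:
            i = (i + 1) % capacity  # step to the next slot, wrapping around
        return i

    def put(self, key, value):
        # Illustrative policy: stay at most half full so an empty slot always exists.
        if 2 * (self._size + 1) > len(self._keys):
            self._resize(2 * len(self._keys))
        i = self._slot(key)
        if self._keys[i] is self._EMPTY:
            self._size += 1
        self._keys[i] = key
        self._values[i] = value

    def get(self, key, default=None):
        i = self._slot(key)
        return self._values[i] if self._keys[i] is not self._EMPTY else default

    def _resize(self, capacity):
        old = [(k, v) for k, v in zip(self._keys, self._values)
               if k is not self._EMPTY]
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        for k, v in old:  # re-insert every entry at its new home slot
            i = self._slot(k)
            self._keys[i] = k
            self._values[i] = v


table = LinearProbingTable()
table.put(("the", "cat"), -1.25)   # hypothetical n-gram and log-probability
print(table.get(("the", "cat")))
```

Because a probe run occupies consecutive slots, a lookup touches adjacent memory, which is one commonly cited reason linear probing performs well in practice.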

Journal Article•10.1016/J.JALGOR.2003.12.002•

Cuckoo hashing

Rasmus Pagh¹, Flemming Friche Rodler²•Institutions (2)

IT University of Copenhagen¹, Aalborg University²

1 May 2004

TL;DR: This paper presents a simple dictionary with worst-case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al.

Abstract: We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. [SIAM J. Comput. 23 (4) (1994) 738-761]. The space usage is similar to that of binary search trees. Besides being conceptually much simpler than previous dynamic dictionaries with worst case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average case (but no nontrivial worst case) guarantee on lookup time.

1,392 citations
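
The idea in the abstract above, open addressing in which keys can be moved back in their probe sequences, can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm as published: salting Python's built-in `hash()` stands in for the paper's hash functions, and the kick limit of 32 and the growth rule are assumptions. Keys whose `hash()` values coincide exactly would defeat this stand-in.

```python
import random


class CuckooHashTable:
    """Two tables; every key lives in one of exactly two candidate slots."""

    def __init__(self, capacity=8, max_kicks=32):
        self._capacity = capacity      # slots per table
        self._max_kicks = max_kicks    # assumed limit before rehashing
        self._tables = [[None] * capacity, [None] * capacity]
        self._size = 0
        self._new_seeds()

    def _new_seeds(self):
        # Assumption: a salted built-in hash replaces the paper's hash family.
        self._seeds = (random.getrandbits(64), random.getrandbits(64))

    def _index(self, which, key):
        return hash((self._seeds[which], key)) % self._capacity

    def get(self, key, default=None):
        # Worst-case constant lookup: at most two slots are inspected.
        for which in (0, 1):
            entry = self._tables[which][self._index(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def _entries(self):
        return [e for table in self._tables for e in table if e is not None]

    def _try_place(self, entry):
        """Insert `entry`, evicting occupants; return a homeless entry or None."""
        which = 0
        for _ in range(self._max_kicks):
            i = self._index(which, entry[0])
            # Swap: the new entry takes the slot, the old occupant is evicted.
            entry, self._tables[which][i] = self._tables[which][i], entry
            if entry is None:
                return None
            which = 1 - which          # the evicted key tries its other table
        return entry

    def _rebuild(self, entries, capacity):
        """Re-insert everything with fresh hash functions until all entries fit."""
        while True:
            self._capacity = capacity
            self._tables = [[None] * capacity, [None] * capacity]
            self._new_seeds()
            if all(self._try_place(e) is None for e in entries):
                return

    def put(self, key, value):
        for which in (0, 1):           # overwrite in place if already present
            i = self._index(which, key)
            entry = self._tables[which][i]
            if entry is not None and entry[0] == key:
                self._tables[which][i] = (key, value)
                return
        self._size += 1
        if 2 * self._size > self._capacity:   # conservative, assumed growth rule
            self._rebuild(self._entries() + [(key, value)], 2 * self._capacity)
            return
        homeless = self._try_place((key, value))
        if homeless is not None:       # eviction chain too long: rehash
            self._rebuild(self._entries() + [homeless], self._capacity)
```

Lookups never probe more than two slots, which is where the worst-case constant lookup time comes from; the cost of collisions is shifted onto insertion.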

Journal Article•10.1109/12.641938•

Efficient hardware hashing functions for high performance computers

M.V. Ramakrishna¹, E. Fu², E. Bahcekapili²•Institutions (2)

Melbourne Institute of Technology¹, Michigan State University²

01 Dec 1997-IEEE Transactions on Computers

TL;DR: It is shown that, by choosing hashing functions at random from a particular class of hashing functions, called $H_3$, the analytical performance of hashing can be achieved in practice on real-life data.

Abstract: Hashing is critical for high performance computer architecture. Hashing is used extensively in hardware applications, such as page tables, for address translation. Bit extraction and exclusive ORing hashing "methods" are two commonly used hashing functions for hardware applications. There is no study of the performance of these functions and no mention anywhere of the practical performance of the hashing functions in comparison with the theoretical performance prediction of hashing schemes. In this paper, we show that, by choosing hashing functions at random from a particular class, called $H_3$, of hashing functions, the analytical performance of hashing can be achieved in practice on real-life data. Our results about the expected worst case performance of hashing are of special significance, as they provide evidence for earlier theoretical predictions.

262 citations
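
The abstract above does not define the class itself. As it is commonly defined in the literature (following Carter and Wegman), a function in this class is fixed by one random bit pattern per input bit, and the hash of a key is the XOR of the patterns selected by the key's 1 bits. The sketch below follows that definition; the function name `make_h3` and its parameters are illustrative assumptions, not taken from the paper.

```python
import random


def make_h3(key_bits, hash_bits, rng=random):
    """Draw one hash function at random for `key_bits`-bit integer keys."""
    # One random `hash_bits`-wide row per input bit position.
    rows = [rng.getrandbits(hash_bits) for _ in range(key_bits)]

    def h(x):
        result = 0
        for k in range(key_bits):
            if (x >> k) & 1:       # bit k of the key selects row k
                result ^= rows[k]
        return result

    return h


h = make_h3(key_bits=32, hash_bits=10)
print(h(0xDEADBEEF))  # a bucket index in range(1024); varies with the random draw
```

Each output bit is an XOR of a subset of input bits, so the function maps directly onto a small network of XOR gates, which fits the hardware setting the abstract describes.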

Journal Article•10.1145/1618452.1618500•

Real-time parallel hashing on the GPU

Dan A. Alcantara¹, Andrei Sharf¹, Fatemeh Abbasinejad¹, Shubhabrata Sengupta¹, Michael Mitzenmacher², John D. Owens¹, Nina Amenta¹•Institutions (2)

University of California, Davis¹, Harvard University²

1 Dec 2009

TL;DR: This paper demonstrates an efficient data-parallel algorithm for building large hash tables of millions of elements in real time, using a hybrid of a classical sparse perfect hashing approach and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations.

Abstract: We demonstrate an efficient data-parallel algorithm for building large hash tables of millions of elements in real-time. We consider two parallel algorithms for the construction: a classical sparse perfect hashing approach, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations. Our construction is a hybrid approach that uses both algorithms. We measure the construction time, access time, and memory usage of our implementations and demonstrate real-time performance on large datasets: for 5 million key-value pairs, we construct a hash table in 35.7 ms using 1.42 times as much memory as the input data itself, and we can access all the elements in that hash table in 15.3 ms. For comparison, sorting the same data requires 36.6 ms, but accessing all the elements via binary search requires 79.5 ms. Furthermore, we show how our hashing methods can be applied to two graphics applications: 3D surface intersection for moving data and geometric hashing for image matching.

212 citations

Journal Article•10.1007/PL00009236•

On the Analysis of Linear Probing Hashing

Philippe Flajolet¹, Patricio V. Poblete², Alfredo Viola•Institutions (2)

French Institute for Research in Computer Science and Automation¹, University of Chile²

01 Dec 1998-Algorithmica

TL;DR: In this article, moment analyses and characterizations of limit distributions for the construction cost of hash tables under the linear probing strategy are presented for full tables and sparse tables with a fixed filling ratio strictly smaller than one.

Abstract: This paper presents moment analyses and characterizations of limit distributions for the construction cost of hash tables under the linear probing strategy. Two models are considered, that of full tables and that of sparse tables with a fixed filling ratio strictly smaller than one. For full tables, the construction cost has expectation $O(n^{3/2})$, the standard deviation is of the same order, and a limit law of the Airy type holds. (The Airy distribution is a semiclassical distribution that is defined in terms of the usual Airy functions or equivalently in terms of Bessel functions of indices $-\frac{1}{3}, \frac{2}{3}$.) For sparse tables, the construction cost has expectation $O(n)$, standard deviation $O(\sqrt{n})$, and a limit law of the Gaussian type. Combinatorial relations with other problems leading to Airy phenomena (like graph connectivity, tree inversions, tree path length, or area under excursions) are also briefly discussed.

165 citations
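
The gap between the full-table and sparse-table regimes in the abstract above can be observed with a small simulation. The sketch assumes that "construction cost" means the total displacement, that is, the number of extra probes summed over all insertions, and that each key's home slot is uniformly random; the function name and parameters are illustrative.

```python
import random


def construction_cost(n, m, rng):
    """Total displacement when n keys are inserted into m slots (requires n <= m)."""
    occupied = [False] * m
    cost = 0
    for _ in range(n):
        i = rng.randrange(m)       # uniformly random home slot
        while occupied[i]:
            i = (i + 1) % m        # linear probing: try the next slot
            cost += 1              # one unit of displacement per extra probe
        occupied[i] = True
    return cost


rng = random.Random(2024)
for m in (1_000, 4_000, 16_000):
    sparse = construction_cost(m // 2, m, rng)   # filling ratio 1/2
    full = construction_cost(m, m, rng)          # full table
    print(m, sparse, full)
```

Under these assumptions, quadrupling the table size should roughly quadruple the sparse cost and multiply the full-table cost by about eight, in line with the $O(n)$ and $O(n^{3/2})$ expectations quoted in the abstract; individual runs fluctuate.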

Year	Papers
2021	2
2020	1
2019	7
2018	5
2017	6
2016	8

Related Topics (5)

Performance Metrics