TL;DR: The paper gives the cost of sampling in terms of the cost of a successful search of a hash file, and shows how features of dynamic hashing methods can be exploited to improve sampling efficiency.
Abstract: In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files and hash files with separate overflow chains. The dynamic hashing methods considered are Linear Hash files [Lit80] and Extendible Hash files [FNPS79]. We give the cost of sampling in terms of the cost of successfully searching a hash file and show how to exploit features of the dynamic hashing methods to improve sampling efficiency.
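The abstract does not spell out a sampling algorithm. As an illustration only, here is a minimal Python sketch of acceptance/rejection sampling from a hash file with separate overflow chains. This is a standard technique and may differ from the paper's iterative and batch algorithms. It assumes the file is modeled in memory as a list of chains (home bucket plus overflow records) and that an upper bound `max_len` on chain length is known. The names `sample_record` and `sample_with_replacement` are hypothetical.

```python
import random


def sample_record(buckets, max_len, rng=random):
    """Draw one record uniformly at random from a chained hash file (sketch).

    buckets: list of lists; each inner list models a bucket plus its overflow chain.
    max_len: upper bound on every chain's length (a smaller bound would bias the sample).
    """
    while True:
        b = rng.randrange(len(buckets))  # pick a bucket uniformly
        j = rng.randrange(max_len)       # pick a position within the bound uniformly
        chain = buckets[b]
        if j < len(chain):               # accept only if that position holds a record
            return chain[j]
        # otherwise reject and try again


def sample_with_replacement(buckets, k, rng=random):
    """Draw k independent uniform samples (with replacement)."""
    max_len = max((len(chain) for chain in buckets), default=0)
    if max_len == 0:
        raise ValueError("hash file contains no records")
    return [sample_record(buckets, max_len, rng) for _ in range(k)]


# Usage: 10 records hashed into 4 buckets by key mod 4.
file_buckets = [[] for _ in range(4)]
for record in range(10):
    file_buckets[record % 4].append(record)
print(sample_with_replacement(file_buckets, 5, random.Random(1)))
```

Every record is accepted with the same probability 1/(B·M) per trial, where B is the number of buckets and M is the bound `max_len`, so accepted records are uniform. A trial succeeds with probability N/(B·M) for N records, which means about B·M/N bucket reads are expected per sampled record. A tighter bound M reduces rejections.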
TL;DR: The results of investigations into the performance of some widely used hashing algorithms are presented and it is shown that some of these algorithms are far from optimal.
Abstract: Hashing is so commonly used in computing that one might expect hash functions to be well understood, and that choosing a suitable function should not be difficult. The results of investigations into the performance of some widely used hashing algorithms are presented and it is shown that some of these algorithms are far from optimal. Recommendations are made for choosing a hashing algorithm and measuring its performance.
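The abstract does not say which performance measure the paper recommends. One common way to measure how evenly a hash function spreads keys is a chi-square statistic over bucket counts. For a uniform hash, this statistic is close to the number of buckets minus one. The Python sketch below compares a deliberately weak additive hash with 32-bit FNV-1a. Both hash functions, the key set, and the prime bucket count 1009 are assumptions for illustration, not the paper's test setup.

```python
def additive_hash(data: bytes) -> int:
    # Weak on purpose: similar keys produce sums in a narrow range.
    return sum(data)


def fnv1a_32(data: bytes) -> int:
    # 32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime.
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def chi_square(keys, hash_fn, n_buckets):
    """Chi-square statistic of bucket counts against a uniform distribution."""
    counts = [0] * n_buckets
    for key in keys:
        counts[hash_fn(key) % n_buckets] += 1
    expected = len(keys) / n_buckets
    return sum((c - expected) ** 2 for c in counts) / expected


keys = [f"key{i}".encode() for i in range(10000)]
print("additive:", round(chi_square(keys, additive_hash, 1009)))
print("FNV-1a:  ", round(chi_square(keys, fnv1a_32, 1009)))
```

The additive hash maps these 10,000 keys to fewer than 200 distinct sums, so its statistic is far above the bucket count. A statistic near 1008 suggests the bucket counts are consistent with uniform placement.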
TL;DR: The authors develop dynamically reconfigurable hash tables, a novel technique that modifies concepts from both bucketing and open addressing so that they are suitable for VLSI/WSI implementation.
Abstract: The authors develop a novel technique in which concepts of both bucketing and open addressing schemes are modified so that they are suitable for VLSI/WSI implementation, namely, dynamically reconfigurable hash tables. In this method, finite storage is allocated for each bucket. Instead of searching the entire table or a part of the table for an empty storage place, the overflowing synonyms are inserted into the successor's bucket (the bucket next to the home bucket). If the successor's bucket overflows, the same technique is repeated until the table is stable. The host bucket handles all operations on its guest items. As soon as an empty place arises in the original bucket, the host bucket returns the guest element to the original bucket: in essence, dynamically variable-capacity buckets have been created. These buckets are designed using systolic arrays.
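The following Python sketch models in software the bucket behaviour described in this abstract. It does not attempt to model the systolic-array hardware. It assumes a fixed number of buckets of equal capacity, Python's built-in `hash` as the home-bucket function, and distinct keys. The class `ReconfigurableHashTable` and its methods are hypothetical names. As a simplification, a search scans successor buckets until the key is found or every bucket has been visited.

```python
class ReconfigurableHashTable:
    """Fixed-capacity buckets; overflowing synonyms spill into successor buckets."""

    def __init__(self, n_buckets, capacity):
        self.capacity = capacity
        # Each entry is (home_bucket, key), so a host knows which items are guests.
        self.buckets = [[] for _ in range(n_buckets)]

    def _home(self, key):
        return hash(key) % len(self.buckets)

    def _locate(self, key):
        # Simplification: walk from the home bucket through all successors.
        n, home = len(self.buckets), self._home(key)
        for step in range(n):
            b = (home + step) % n
            if (home, key) in self.buckets[b]:
                return b
        return None

    def contains(self, key):
        return self._locate(key) is not None

    def insert(self, key):
        """Store key in its home bucket, or in the first successor with room."""
        existing = self._locate(key)
        if existing is not None:
            return existing
        n, home = len(self.buckets), self._home(key)
        for step in range(n):
            b = (home + step) % n
            if len(self.buckets[b]) < self.capacity:
                self.buckets[b].append((home, key))
                return b  # index of the bucket that now holds the key
        raise OverflowError("every bucket is full")

    def delete(self, key):
        b = self._locate(key)
        if b is None:
            raise KeyError(key)
        self.buckets[b].remove((self._home(key), key))
        self._recall_guests(b)

    def _recall_guests(self, b):
        # A slot just opened in bucket b: a host returns one of b's guests.
        # That frees a slot in the host, so repeat for the host's own guests.
        n = len(self.buckets)
        while True:
            for step in range(1, n):
                host = (b + step) % n
                guest = next((e for e in self.buckets[host] if e[0] == b), None)
                if guest is not None:
                    self.buckets[host].remove(guest)
                    self.buckets[b].append(guest)
                    b = host
                    break
            else:
                return  # no guests of b remain anywhere
```

For example, with 4 buckets of capacity 2 and integer keys (whose `hash` is the key itself), inserting 0, 4, 8, 1, 5 places 8 as a guest in bucket 1 and 5 as a guest in bucket 2. Deleting 4 then pulls 8 back to bucket 0, which in turn lets 5 return to bucket 1.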