TL;DR: In this article, a table of records is organized into a group of arrays, and a hashing algorithm is used to locate a record in the table, such that the record can be located relatively quickly in one of the arrays in the group.
Abstract: A method for searching for a record in a table in a memory of a computer system. A table of records is organized into a group of arrays. A hashing algorithm locates a record in the table. Multiple hashing functions are executed concurrently, according to the number of arrays in the group, such that the record can be located relatively quickly in one of the arrays in the group. The table is analyzed to determine the information content of each bit in a string of bits comprising an index value associated with the table, according to Shannon's formula for information-theoretic entropy. The entropy associated with each bit in the string of bits provides a basis for selecting a subset of bits in the string of bits from which to obtain the seed values utilized in the hashing functions. A rotating mask, based on Neumann's code, is applied to the subset of bits to obtain different seed values for each of the hashing functions, thereby minimizing the correlation of the keys provided by the hashing functions.
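The per-bit entropy analysis described in this abstract can be illustrated with a minimal sketch. It assumes the index values are plain non-negative integers of a known bit width; the function names `bit_entropies` and `most_informative_bits` are hypothetical and do not come from the source. The rotating mask step is not shown, since the abstract does not specify it.

```python
import math

def bit_entropies(keys, width):
    """Shannon entropy (in bits) of each bit position across a set of keys."""
    n = len(keys)
    entropies = []
    for pos in range(width):
        ones = sum((k >> pos) & 1 for k in keys)
        p = ones / n  # empirical probability that this bit is 1
        if p == 0.0 or p == 1.0:
            entropies.append(0.0)  # a constant bit carries no information
        else:
            entropies.append(-p * math.log2(p) - (1 - p) * math.log2(1 - p))
    return entropies

def most_informative_bits(keys, width, count):
    """Positions of the `count` highest-entropy bits (illustrative selection rule)."""
    ent = bit_entropies(keys, width)
    return sorted(range(width), key=lambda pos: ent[pos], reverse=True)[:count]
```

Bits whose value barely varies across the table score near zero and would be poor sources of seed values, which is the motivation for selecting a high-entropy subset.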
TL;DR: This paper describes a method and apparatus for using a hashing function to store data in a cache memory, where the hashing function used is changed periodically; if the data at the index generated by the current hashing function does not match the incoming data, previous hashing functions are used to repeat the search.
Abstract: A method and apparatus for using a hashing function to store data in a cache memory. Briefly, a method and apparatus is provided for using a hashing function to store data in a cache memory where the hashing function used is changed periodically. In one embodiment, the cache memory stores the data, an indicator of the hashing function used and the index value generated by the hashing function used. To retrieve data from the cache memory, the current hashing function is used to generate an index for the incoming data. The data at the index is checked to determine whether the stored data matches the incoming data. If the data at the index generated by the current hashing function does not match the incoming data, previous hashing functions are used to repeat the search.
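The lookup procedure in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the family of hashing functions is simulated by salting Python's built-in `hash` with a generation counter, and the class name `RotatingHashCache` and its methods are hypothetical, not taken from the source.

```python
class RotatingHashCache:
    """Cache whose hashing function changes over time (illustrative sketch)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # each slot holds (generation, key, value)
        self.generation = 0         # indicator of the current hashing function

    def _index(self, key, generation):
        # Assumption: a new hashing function is obtained by salting with the generation.
        return hash((generation, key)) % self.size

    def rotate(self):
        """Switch to a new hashing function; older entries stay where they are."""
        self.generation += 1

    def put(self, key, value):
        idx = self._index(key, self.generation)
        self.slots[idx] = (self.generation, key, value)

    def get(self, key):
        # Try the current hashing function first, then each previous one.
        for gen in range(self.generation, -1, -1):
            entry = self.slots[self._index(key, gen)]
            if entry is not None and entry[0] == gen and entry[1] == key:
                return entry[2]
        return None
```

In a real cache the number of previous functions searched would presumably be bounded; this sketch searches all of them for simplicity.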
TL;DR: This paper characterizes several expansion techniques used for linear hashing and presents how to analyze any linear hashing technique that expands based on local events or that mixes local events and global conditions.
Abstract: In this paper we characterize several expansion techniques used for linear hashing and we present how to analyze any linear hashing technique that expands based on local events or that mixes local events and global conditions. As an example we give a very simple randomized expansion technique, which is easy to analyze and implement. Furthermore, we obtain the analysis of the original hashing technique devised by Litwin, which was unsolved until now, comparing it to the later and more widely used version of Larson's. We also analyze one hybrid technique. Among other results, it is shown that the control function used by Litwin does not produce a good storage utilization, matching known experimental data.
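For readers unfamiliar with linear hashing, a minimal textbook-style sketch follows. It expands on a global condition (the average load per bucket exceeding a threshold) and is not the specific control function of Litwin, Larson, or the randomized technique from the paper; the class name and the `max_load` parameter are assumptions made for illustration.

```python
class LinearHashTable:
    """Linear hashing with a load-factor expansion rule (illustrative sketch)."""

    def __init__(self, initial_buckets=2, max_load=2.0):
        self.n0 = initial_buckets
        self.level = 0   # number of completed doubling rounds
        self.split = 0   # next bucket to be split in the current round
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0
        self.max_load = max_load

    def _address(self, key):
        h = hash(key)
        size = self.n0 << self.level
        addr = h % size
        if addr < self.split:        # bucket already split in this round
            addr = h % (size * 2)
        return addr

    def insert(self, key):
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._expand()

    def _expand(self):
        size = self.n0 << self.level
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])      # new bucket lands at index split + size
        for key in old:
            self.buckets[hash(key) % (size * 2)].append(key)
        self.split += 1
        if self.split == size:       # round finished: table has doubled
            self.level += 1
            self.split = 0

    def contains(self, key):
        return key in self.buckets[self._address(key)]
```

An expansion rule driven by local events would instead trigger `_expand` when a particular bucket overflows, which is the kind of variation the paper analyzes.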
TL;DR: This paper presents a new method of indexing image databases, called location hashing, that uses a special data structure, called the location hash tree, for organizing feature information from images of a database, based on the principle of geometric hashing.
Abstract: Queries referring to content embedded within images are an essential component of content-based search, browse, or summarize operations in image databases. Localization of such queries under changes in appearance, occlusions, and background clutter is a difficult problem for which current spatial access structures in databases are not suitable. In this paper, we present a new method of indexing image databases, called location hashing, that uses a special data structure, called the location hash tree, for organizing feature information from images of a database. Location hashing is based on the principle of geometric hashing. It simultaneously determines the relevant images in the database and the regions within them that are most likely to contain a 2D pattern query, without incurring a detailed search of either. The location hash tree, being a red-black tree, allows efficient search for candidate locations using pose-invariant feature information derived from the query.
TL;DR: This paper proposes a two-stage methodology that uses knowledge of the hashing function to reorganize the group assignments so that the resulting groups have similar expected cardinalities; the method is generally applicable and independent of the hashing function used.
Abstract: Increasingly larger data sets are being stored in networked architectures. Many of the available data structures are not easily amenable to parallel realizations. Hashing schemes show promise in that respect for the simple reason that the underlying data structure can be decomposed and spread among the set of cooperating nodes with minimal communication and maintenance requirements. In all cases, storage utilization and load balancing are issues that need to be addressed. One can identify two basic approaches to tackle the problem. One way is to address it as part of the design of the data structure that is used to store and retrieve the data. The other is to maintain the data structure intact but address the problem separately. The method that we present here falls in the latter category and is applicable whenever a hash table is the preferred data structure. Intrinsically attached to the used hash table is a hashing function that allows one to partition a possibly unbounded set of data items into a finite set of groups; the hashing function provides the partitioning by assigning each data item to one of the groups. In general, the hashing function cannot guarantee that the various groups will have the same cardinality on average, for all possible data item distributions. In this paper, we propose a two-stage methodology that uses the knowledge of the hashing function to reorganize the group assignments so that the resulting groups have similar expected cardinalities. The method is generally applicable and independent of the used hashing function. We show the power of the methodology using both synthetic and real-world databases. The derived quasi-uniform storage occupancy and associated load-balancing gains are significant.
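The abstract does not spell out the two stages, so the following is only one plausible illustration of the general idea, not the authors' method. It assumes that the hashing function first maps items to many fine-grained buckets whose expected sizes are estimated from a sample, and that buckets are then greedily reassigned to groups so the group totals are similar. The function name `rebalance` and all parameters are hypothetical.

```python
import heapq
from collections import Counter

def rebalance(sample, hash_fn, num_buckets, num_groups):
    """Map hash buckets to groups so that estimated group sizes are similar."""
    # Stage 1 (assumed): estimate each bucket's cardinality from a data sample.
    counts = Counter(hash_fn(item) % num_buckets for item in sample)

    # Stage 2 (assumed): greedy assignment, largest bucket to least-loaded group.
    heap = [(0, g) for g in range(num_groups)]  # (current load, group id)
    heapq.heapify(heap)
    assignment = {}
    for bucket in sorted(range(num_buckets), key=lambda b: counts[b], reverse=True):
        load, group = heapq.heappop(heap)
        assignment[bucket] = group
        heapq.heappush(heap, (load + counts[bucket], group))
    return assignment
```

The underlying hash table and hashing function are left untouched; only the bucket-to-group mapping changes, which matches the abstract's description of addressing balance separately from the data structure.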