TL;DR: A family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network, based on a special kind of hashing that is called consistent hashing.

Abstract: We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems.

2,310 citations
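
The abstract's central idea, a hash function that "changes minimally as the range of the function changes", can be illustrated with a small sketch. This is a common ring-style construction and not necessarily the paper's exact protocol; the class name `HashRing`, the `replicas` parameter, the cache names, and the use of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring (hypothetical helper, not from the paper)."""

    def __init__(self, caches, replicas=100):
        self._replicas = replicas  # virtual points per cache, smooths the load
        self._ring = []            # sorted list of (point, cache) pairs
        for cache in caches:
            self.add(cache)

    def add(self, cache):
        # Place several virtual points for this cache on the ring.
        for i in range(self._replicas):
            bisect.insort(self._ring, (_point(f"{cache}#{i}"), cache))

    def remove(self, cache):
        self._ring = [(p, c) for p, c in self._ring if c != cache]

    def lookup(self, key):
        # Owner is the first ring point at or after the key's point,
        # wrapping around to the start of the ring if necessary.
        i = bisect.bisect_left(self._ring, (_point(key), ""))
        return self._ring[i % len(self._ring)][1]
```

In this sketch, adding a cache to a ring only reassigns keys to the new cache; every other key keeps its previous owner, which is the "minimal change" property the abstract describes. Removing that cache again restores the earlier assignment.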

Journal Article•10.1145/320083.320092•

Extendible hashing—a fast access method for dynamic files

[...]

David K. Hsiao¹•Institutions (1)

Ohio State University¹

01 Sep 1979-ACM Transactions on Database Systems

TL;DR: This work studies, by analysis and simulation, the performance of extendible hashing and indicates that it provides an attractive alternative to other access methods, such as balanced trees.

Abstract: Extendible hashing is a new access technique, in which the user is guaranteed no more than two page faults to locate the data associated with a given unique identifier, or key. Unlike conventional hashing, extendible hashing has a dynamic structure that grows and shrinks gracefully as the database grows and shrinks. This approach simultaneously solves the problem of making hash tables that are extendible and of making radix search trees that are balanced. We study, by analysis and simulation, the performance of extendible hashing. The results indicate that extendible hashing provides an attractive alternative to other access methods, such as balanced trees.

756 citations
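
The dynamic structure described in the extendible hashing abstract can be sketched as an in-memory directory of buckets. This is a simplified illustration under stated assumptions, not the authors' implementation: the names `Bucket`, `ExtendibleHash`, and `bucket_size` are hypothetical, Python's built-in `hash` stands in for the hash function, shrinking is omitted, and there is no guard against many keys sharing an identical full hash value.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # local depth: number of hash bits this bucket uses
        self.items = {}


class ExtendibleHash:
    """Toy extendible hash table (illustrative sketch only)."""

    def __init__(self, bucket_size=4):
        self.bucket_size = bucket_size
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        # The low `global_depth` bits of the hash index the directory.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        # One directory lookup plus one bucket lookup.
        return self.directory[self._slot(key)].items.get(key, default)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < self.bucket_size:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; the new half mirrors the old half.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        # Slots whose newly significant bit is 1 now point at the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the old entries between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v
```

A lookup touches the directory once and a bucket once, which mirrors the abstract's bound of at most two page faults if the directory and each bucket are assumed to occupy one page each.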

Journal Article•10.1002/SPE.587•

UbiCrawler: a scalable fully distributed web crawler

[...]

Paolo Boldi¹, Bruno Codenotti², Massimo Santini, Sebastiano Vigna¹•Institutions (2)

University of Milan¹, University of Iowa²

10 Jul 2004-Software - Practice and Experience

TL;DR: UbiCrawler is a scalable distributed Web crawler implemented in Java; its main features include a very effective assignment function (based on consistent hashing) for partitioning the domain to crawl and, more generally, the complete decentralization of every task.

Abstract: We report our experience in implementing UbiCrawler, a scalable distributed Web crawler, using the Java programming language. The main features of UbiCrawler are platform independence, linear scalability, graceful degradation in the presence of faults, a very effective assignment function (based on consistent hashing) for partitioning the domain to crawl, and more in general the complete decentralization of every task. The necessity of handling very large sets of data has highlighted some limitations of the Java APIs, which prompted the authors to partially reimplement them.

648 citations

Proceedings Article•

Linear hashing: a new tool for file and table addressing

[...]

Witold Litwin

1 Oct 1980

TL;DR: Linear hashing lets the address space grow or shrink dynamically: a record in a file is, in general, found in one access, while the load may stay practically constant up to 90%. The author states that no other algorithms attaining such performance are known.

Abstract: Linear hashing is a hashing in which the address space may grow or shrink dynamically. A file or a table may then support any number of insertions or deletions without access or memory load performance deterioration. A record in the file is, in general, found in one access, while the load may stay practically constant up to 90 %. A record in a table is found in a mean of 1.7 accesses, while the load is constantly 80 %. No other algorithms attaining such a performance are known.

512 citations
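
The linear hashing abstract describes an address space that grows one bucket at a time. The sketch below shows one common way to realise that idea; it is an assumption-laden illustration and not the paper's algorithm verbatim. The names `LinearHash`, `max_load`, and `bucket_capacity` are hypothetical, overflow is modelled simply by letting a bucket's list grow, and shrinking is omitted.

```python
class LinearHash:
    """Toy linear hash table that splits one bucket at a time."""

    def __init__(self, initial_buckets=2, max_load=0.8, bucket_capacity=4):
        self.n0 = initial_buckets
        self.level = 0   # number of completed doubling rounds
        self.split = 0   # next bucket to split in the current round
        self.max_load = max_load
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _address(self, key):
        h = hash(key)
        size = self.n0 << self.level
        i = h % size
        if i < self.split:
            # This bucket was already split in the current round,
            # so use the next, finer hash function.
            i = h % (size * 2)
        return i

    def get(self, key, default=None):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return default

    def put(self, key, value):
        bucket = self.buckets[self._address(key)]
        for j, (k, _) in enumerate(bucket):
            if k == key:
                bucket[j] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        # Grow by one bucket whenever the load factor exceeds the target.
        if self.count > self.max_load * self.capacity * len(self.buckets):
            self._split_next()

    def _split_next(self):
        size = self.n0 << self.level
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])  # the new bucket has index split + size
        self.split += 1
        if self.split == size:   # round finished: the table has doubled
            self.level += 1
            self.split = 0
        for k, v in old:
            self.buckets[self._address(k)].append((k, v))
```

The bucket that gets split is chosen by the `split` pointer, not by which bucket overflowed, so the table grows in a fixed linear order and needs no directory.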

Book Chapter•10.1007/11841036_42•

Less hashing, same performance: building a better bloom filter

[...]

Adam Kirsch¹, Michael Mitzenmacher¹•Institutions (1)

Harvard University¹

11 Sep 2006

TL;DR: Only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability, leading to less computation and potentially less need for randomness in practice.

Abstract: A standard technique from the hashing literature is to use two hash functions h_1(x) and h_2(x) to simulate additional hash functions of the form g_i(x) = h_1(x) + i·h_2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.

375 citations
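
The two-hash-function technique from the Bloom filter abstract can be sketched as follows. The class name `BloomFilter`, the default sizes, and the choice to derive both base hashes from a single SHA-256 digest are assumptions for illustration; the abstract only specifies the form h_1(x) + i·h_2(x), which is reduced modulo the filter size here.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter whose k indexes come from two base hashes."""

    def __init__(self, num_bits=1024, num_hashes=7):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _indexes(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Forcing h2 to be odd (an assumption of this sketch) keeps it nonzero.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        # g_i(x) = h1(x) + i * h2(x), reduced modulo the number of bits.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str):
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def __contains__(self, item: str):
        return all(self.bits[idx] for idx in self._indexes(item))
```

Every item that was added is always reported as present; items that were never added may occasionally be reported as present too, which is the false positive behaviour the abstract analyses.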