TL;DR: Quantization-based Hashing (QBH) is a generic framework which incorporates the advantages of quantization error reduction methods into conventional property preserving hashing methods and can be applied to both unsupervised and supervised hashing methods.
TL;DR: A bagging–boosting-based semi-supervised multi-hashing with query-adaptive re-ranking (BBSHR) is proposed, which yields better precision and recall rates for given numbers of hash tables and bits.
TL;DR: Empirical results show that the proposed unified and concise unsupervised hashing framework, called binary multidimensional scaling, outperforms state-of-the-art methods by a large margin in terms of distance preservation, making it practical for real-world applications.
Abstract: Hashing is a useful technique for fast nearest neighbor search due to its low storage cost and fast query speed. Unsupervised hashing aims at learning binary hash codes for the original features so that the pairwise distances are preserved as well as possible. While several works have targeted this task, the results are not satisfactory, mainly due to over-simplified models. In this paper, we propose a unified and concise unsupervised hashing framework, called binary multidimensional scaling, which is able to learn hash codes for distance preservation in both batch and online modes. In the batch mode, unlike most existing hashing methods, we do not need to simplify the model by predefining the form of the hash map. Instead, we learn the binary codes directly from the pairwise distances among the normalized original features by alternating minimization. This gives the hash map stronger expressive power. In the online mode, we consider the holistic distance relationship between the current query example and those we have already learned, rather than focusing only on the current data chunk. This is useful when the data arrive in a streaming fashion. Empirical results show that, while being efficient to train, our algorithm outperforms state-of-the-art methods by a large margin in terms of distance preservation, which makes it practical for real-world applications.
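The idea of learning binary codes directly from pairwise distances can be illustrated with a minimal sketch. It is not the authors' algorithm: a simple greedy bit-flip descent is swapped in for the alternating minimization, whose details the abstract does not give. The function names, the squared-distance target, and the scaling `4 * hamming / r` (which matches the squared distance between ±1 codes of length `r` to that between unit vectors) are all assumptions made for illustration.

```python
import math
import random

def normalize(v):
    # Scale a feature vector to unit length (zero vectors are left as-is).
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def stress(codes, target, r):
    # Sum of squared gaps between target distances and scaled Hamming distances.
    total = 0.0
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            total += (target[i][j] - 4.0 * hamming(codes[i], codes[j]) / r) ** 2
    return total

def learn_codes(points, r=8, sweeps=5, seed=0):
    # Hypothetical batch learner: start from random codes, then keep any
    # single-bit flip that lowers the stress (greedy descent, assumed here).
    rng = random.Random(seed)
    xs = [normalize(p) for p in points]
    n = len(xs)
    target = [[sq_dist(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    codes = [[rng.randint(0, 1) for _ in range(r)] for _ in range(n)]
    best = stress(codes, target, r)
    history = [best]
    for _ in range(sweeps):
        for i in range(n):
            for k in range(r):
                codes[i][k] ^= 1          # try flipping one bit
                trial = stress(codes, target, r)
                if trial < best:
                    best = trial          # keep the improvement
                else:
                    codes[i][k] ^= 1      # undo the flip
        history.append(best)
    return codes, history
```

Because no functional form is imposed on the hash map, each code bit is a free variable here; the cost is that the full stress is recomputed per flip, which is only reasonable for toy inputs.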
TL;DR: Here, these collisions are fully exploited, which improves the spatial data compression rate, and the performance of exclusive grouped spatial hashing is demonstrated on 2D and 3D graphics examples.
TL;DR: Overall, this work provides a basic structure of a dedicated SIMD accelerated grouped aggregation framework that can be adapted with different hashing techniques and observes different impacts of vectorization on these techniques.
Abstract: Grouped aggregation is a commonly used analytical function. The common implementation of this function using hashing techniques suffers from lower throughput due to collisions among the inserted keys. On a collision, the underlying technique searches for an alternative location in which to insert the key. Searching for an alternative location increases the processing time for an individual key, thereby degrading the overall throughput. In this work, we use Single Instruction Multiple Data (SIMD) vectorization to search multiple slots at once, followed by direct aggregation of the results. We present experimental results for our vectorized grouped aggregation with various open-addressing hashing techniques across several dataset distributions, together with our inferences from them. Among our findings, we observe different impacts of vectorization on these techniques. Namely, linear probing and two-choice hashing improve their performance with vectorization, whereas cuckoo and hopscotch hashing show a negative impact. Overall, we provide in this work a basic structure of a dedicated SIMD-accelerated grouped aggregation framework that can be adapted to different hashing techniques.
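The probing pattern described in this abstract can be sketched in scalar Python for the linear-probing case. This is only an illustration under stated assumptions: the function name `grouped_sum`, the `lanes` parameter, and the choice of SUM as the aggregate are hypothetical, and the inner window scan stands in for what a real SIMD implementation would do with a single vector comparison.

```python
def grouped_sum(pairs, capacity=16, lanes=4):
    # Open-addressing table: parallel arrays for group keys and running sums.
    # None marks an empty slot, so None itself cannot be used as a key.
    keys = [None] * capacity
    sums = [0] * capacity
    for key, value in pairs:
        start = hash(key) % capacity
        for offset in range(0, capacity, lanes):
            # One probe step inspects `lanes` consecutive slots; SIMD hardware
            # would compare all of them against the key at once.
            window = [(start + offset + lane) % capacity for lane in range(lanes)]
            hit = next(
                (s for s in window if keys[s] is None or keys[s] == key),
                None,
            )
            if hit is not None:
                keys[hit] = key          # claim an empty slot or reuse the match
                sums[hit] += value       # aggregate directly in the table
                break
        else:
            raise RuntimeError("hash table is full")
    return {k: s for k, s in zip(keys, sums) if k is not None}
```

For example, `grouped_sum([("a", 1), ("b", 2), ("a", 3)])` groups the two `"a"` rows into a single slot and sums their values.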
TL;DR: This paper presents a new technique for collision resolution based on a two-dimensional array, following a unique way of evaluating and implementing algorithms to resolve collisions in hash tables.
Abstract: Hashing is a well-known heuristic for indexing and retrieving items from a database: it uses a shorter hashed key to find the element, which is more efficient. In data structures, a hash table is used for looking up data rapidly. Hash functions speed up lookups in tables or databases and help detect duplicated records in a large file. A hash function should be properly designed to avoid collisions. However, collisions are inevitable [1]. This paper presents a new and innovative technique for collision resolution based on a two-dimensional array. The proposed strategy follows a unique way of evaluating and implementing algorithms to resolve collisions in hash tables. Analytical modelling and software simulations provide quantifiable measures of the effectiveness of our algorithm. Efficient implementations that are easily realizable and productive in modern technologies are discussed. The performance benefits are significant, and only machines with moderate memory and speed specifications are required.
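The abstract does not describe how the two-dimensional array is used, so the following is only one plausible layout, not the paper's algorithm: the hash selects a row, and colliding keys occupy successive columns of that row. The class name `Table2D` and its parameters are hypothetical.

```python
class Table2D:
    # Hypothetical layout: `rows` buckets, each with `cols` fixed cells.
    def __init__(self, rows=8, cols=4):
        self.rows, self.cols = rows, cols
        self.cells = [[None] * cols for _ in range(rows)]

    def insert(self, key, value):
        row = self.cells[hash(key) % self.rows]
        for c, cell in enumerate(row):
            # Take the first free column, or overwrite an existing entry.
            if cell is None or cell[0] == key:
                row[c] = (key, value)
                return True
        return False  # the row is full; a real table would resize or spill

    def get(self, key, default=None):
        for cell in self.cells[hash(key) % self.rows]:
            if cell is None:
                break  # cells fill left to right, so nothing lies beyond
            if cell[0] == key:
                return cell[1]
        return default
```

A lookup in this sketch touches at most `cols` cells of a single row, which bounds the cost of a collision at the price of a fixed per-row capacity.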
TL;DR: Experimental results show that the proposed hashing optimizations can find optimal solutions with limited steps, and the hashing method is superior to other state-of-the-art methods in terms of authentication and robustness.
Abstract: Robust image hashing is a promising technique for representing an image’s perceptual content. However, when it comes to image authentication, the tradeoff between robustness and discrimination is a non-negligible issue. Which content-preserving operations are allowed and which malicious manipulations count as sensitive is quite subjective to human perception, so designing good hashing methods requires care. In this paper we incorporate the novel concept of core alignment into hashing, where the proposed core alignment improves this balance. First, we formulate the hashing as a supervised minimal optimization problem based on Locality Sensitive Hashing, in which a p-stable distribution is exploited to maintain high-dimensional locality features. Then we solve this problem as two sub-optimization problems, i.e., searching for the optimal shift and searching for the optimal quantization intervals. Using particle swarm optimization and simulated annealing programming approaches, we develop two stochastic solutions to those two problems, respectively. Experimental results show that our proposed hashing optimizations can find optimal solutions within a limited number of steps, and that the hashing method is superior to other state-of-the-art methods in terms of authentication and robustness.
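For reference, the standard p-stable LSH function has the form h(x) = floor((a·x + b) / w), where a is drawn from a p-stable distribution (Gaussian for p = 2), b is a random shift, and w is the quantization interval width. The sketch below shows that textbook form only; relating b and w to the "shift" and "quantization intervals" optimized in the paper is an assumption, and the helper name `make_pstable_hash` is hypothetical.

```python
import math
import random

def make_pstable_hash(dim, w=4.0, seed=0):
    # Draw the projection vector from a Gaussian (2-stable) distribution
    # and the shift uniformly from [0, w).
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(x):
        # Project onto a, shift by b, and quantize into intervals of width w.
        projection = sum(ai * xi for ai, xi in zip(a, x))
        return math.floor((projection + b) / w)

    return h
```

Nearby vectors have close projections and so tend to land in the same interval; here b and w are drawn or fixed at random rather than tuned, which is the part the paper replaces with optimization.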
TL;DR: This paper developed the locality-sensitive two-step hashing (LS-TSH) that generates the binary codes through LSH rather than any complex optimization technique, and could obtain retrieval accuracy comparable with the state of the art with two to three orders of magnitude faster training speed.
Abstract: Hashing-based semantic similarity search is becoming increasingly important for building large-scale content-based retrieval systems. The state-of-the-art supervised hashing techniques use a flexible two-step strategy to learn hash functions. The first step learns binary codes for the training data by solving binary optimization problems with millions of variables, thus usually requiring intensive computation. Despite its simplicity and efficiency, locality-sensitive hashing (LSH) has never been recognized as a good way to generate such codes due to its poor performance in traditional approximate neighbor search. We claim in this paper that the true merit of LSH lies in transforming the semantic labels to obtain the binary codes, resulting in an effective and efficient two-step hashing framework. Specifically, we developed the locality-sensitive two-step hashing (LS-TSH) that generates the binary codes through LSH rather than any complex optimization technique. Theoretically, under proper assumptions, LS-TSH is actually a useful LSH scheme, so that it preserves the label-based semantic similarity and possesses sublinear query complexity for hash lookup. Experimentally, LS-TSH could obtain retrieval accuracy comparable with the state of the art with two to three orders of magnitude faster training speed.
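The first step of such a scheme, turning semantic labels into binary codes with LSH, might look like the sketch below. The abstract does not say which LSH family LS-TSH uses, so sign random projection over multi-hot label vectors is assumed here, and `label_codes` is a hypothetical name. The second step, learning hash functions that map features to these codes, is omitted.

```python
import random

def label_codes(labels, num_classes, bits=16, seed=0):
    # One random hyperplane per output bit, drawn over the label space.
    rng = random.Random(seed)
    planes = [
        [rng.gauss(0.0, 1.0) for _ in range(num_classes)]
        for _ in range(bits)
    ]
    codes = []
    for y in labels:
        # y is a multi-hot label vector; each bit records on which side
        # of a hyperplane the label vector falls.
        codes.append([
            1 if sum(p * v for p, v in zip(plane, y)) >= 0 else 0
            for plane in planes
        ])
    return codes
```

Items with identical label vectors always receive identical codes, and no binary optimization is solved, which is where the training speedup in this kind of scheme would come from.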
TL;DR: In this paper, an unsupervised domain adaptation model is proposed to learn hash codes from training images belonging to seen classes, which can efficiently encode images of unseen classes to binary codes.
TL;DR: In this article, the authors proposed an algorithm to improve the performance of cuckoo hash tables without altering the properties of the original cuckoo hash table, and they also presented an implementation tailored to run efficiently on Intel Xeon processors to support NFV and softwarization trends.
Abstract: Hash tables are essential data structures for networking applications (e.g., connection tracking, firewalls, network address translators). Among these, cuckoo hash tables provide excellent performance by processing lookups with very few memory accesses (2 to 3 per lookup). Yet, they remain memory bound, and each memory access impacts performance. In this paper, we propose algorithmic improvements to cuckoo hash tables to eliminate unnecessary memory accesses, without altering the properties of the original cuckoo hash table, so that all existing theoretical analyses remain applicable. We also present an implementation tailored to run efficiently on Intel Xeon processors, thus supporting NFV and softwarization trends, and compare it to the optimized implementation of DPDK. On a single core, our implementation achieves 37M positive lookups per second (i.e., when the key looked up is present in the table), and 60M negative lookups per second, a 45% to 70% improvement over DPDK.
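The bounded lookup cost that makes cuckoo hashing attractive can be seen in a minimal textbook sketch with two tables and one slot per position. This is not the paper's optimized design or DPDK's implementation; the class name `CuckooTable`, the `max_kicks` limit, and the way the two hash functions are derived are assumptions for illustration.

```python
class CuckooTable:
    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.slots = [[None] * capacity, [None] * capacity]

    def _index(self, which, key):
        # Two hash functions, derived by salting the key with the table id.
        return hash((which, key)) % self.capacity

    def lookup(self, key):
        # At most two slots are inspected, whether or not the key is present.
        for which in (0, 1):
            entry = self.slots[which][self._index(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        # Update in place if the key is already stored.
        for which in (0, 1):
            i = self._index(which, key)
            entry = self.slots[which][i]
            if entry is not None and entry[0] == key:
                self.slots[which][i] = (key, value)
                return True
        item, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._index(which, item[0])
            # Place the item and pick up whatever was there before.
            item, self.slots[which][i] = self.slots[which][i], item
            if item is None:
                return True
            which = 1 - which  # the evicted entry tries its other table
        # Too many displacements: a real table would rehash or grow here.
        # In this sketch the last evicted entry is dropped.
        return False
```

A negative lookup always reads both candidate slots, while a positive lookup may stop after the first, which is one reason positive and negative lookup rates are reported separately.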