TL;DR: This paper proposes a simple data structure, called a join index, for improving the performance of joins in the context of complex queries, and analysis of the join algorithm using join indices shows its excellent performance.
Abstract: In new application areas of relational database systems, such as artificial intelligence, the join operator is used more extensively than in conventional applications. In this paper, we propose a simple data structure, called a join index, for improving the performance of joins in the context of complex queries. For most of the joins, updates to join indices incur very little overhead. Some properties of a join index are (i) its efficient use of memory and adaptiveness to parallel execution, (ii) its compatibility with other operations (including select and union), (iii) its support for abstract data type join predicates, (iv) its support for multirelation clustering, and (v) its use in representing directed graphs and in evaluating recursive queries. Finally, the analysis of the join algorithm using join indices shows its excellent performance.
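As a rough illustration of the idea, a join index can be pictured as a precomputed binary relation of surrogate pairs, one pair per matching tuple combination. The sketch below is not the paper's implementation: the dictionary-based relations, the function names `build_join_index` and `join_with_index`, and the sample data are all assumptions made for the example, and only an equality predicate is handled.

```python
def build_join_index(r, s, r_key, s_key):
    """Return a sorted list of (r_surrogate, s_surrogate) pairs for matching tuples.

    r and s map a surrogate (tuple identifier) to a row dict. This layout is
    an assumption for the sketch, not the storage scheme of the paper.
    """
    # Group the surrogates of S by join attribute value.
    by_value = {}
    for s_id, s_row in s.items():
        by_value.setdefault(s_row[s_key], []).append(s_id)
    # Pair every R surrogate with the S surrogates that share its value.
    return sorted(
        (r_id, s_id)
        for r_id, r_row in r.items()
        for s_id in by_value.get(r_row[r_key], [])
    )


def join_with_index(r, s, join_index):
    """Materialize the join by following the precomputed surrogate pairs."""
    return [(r[r_id], s[s_id]) for r_id, s_id in join_index]


# Hypothetical sample relations keyed by surrogate.
customers = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
orders = {
    10: {"cust": "Ann", "item": "disk"},
    11: {"cust": "Ann", "item": "tape"},
    12: {"cust": "Cy", "item": "cpu"},
}

ji = build_join_index(customers, orders, "name", "cust")
joined = join_with_index(customers, orders, ji)
```

Once the index exists, the join itself only follows surrogate pairs, which is the part that makes the structure attractive for repeated joins in complex queries.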
TL;DR: A data base machine, GRACE, is proposed which adopts a novel relational algebraic processing algorithm based on hash and sort, which can execute join efficiently in O(N+M/K) time, where N and M are the cardinalities of two relations and K the number of memory banks.
Abstract: In this paper we discuss the application of the dynamic clustering feature of hash to a relational data base machine. By partitioning the relation using hash, large load reductions in join and set operations are realized. Several machine architectures based on hash are presented. We propose a data base machine, GRACE, which adopts a novel relational algebraic processing algorithm based on hash and sort. Whereas conventional logic-per-track machines perform poorly in a join dominant environment, GRACE can execute join efficiently in O(N+M/K) time, where N and M are the cardinalities of two relations and K the number of memory banks.
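The combination of hash partitioning and sorting described above can be sketched in software as follows. This is a minimal single-process illustration, not the GRACE hardware algorithm: the function names, the list-of-dicts relations, and the choice of `k` are assumptions for the example. Both relations are split into `k` buckets by hashing the join attribute, so that only buckets with the same number need to be joined, and each bucket pair is then joined by sorting and merging.

```python
def partition(rows, key, k):
    """Split rows into k buckets by hashing the join attribute."""
    buckets = [[] for _ in range(k)]
    for row in rows:
        buckets[hash(row[key]) % k].append(row)
    return buckets


def sort_merge(r_bucket, s_bucket, r_key, s_key):
    """Join one pair of buckets by sorting both sides and merging."""
    r_sorted = sorted(r_bucket, key=lambda row: row[r_key])
    s_sorted = sorted(s_bucket, key=lambda row: row[s_key])
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        rv, sv = r_sorted[i][r_key], s_sorted[j][s_key]
        if rv < sv:
            i += 1
        elif rv > sv:
            j += 1
        else:
            # Pair the current R row with every S row carrying the same value,
            # then advance R only, so duplicate R values are matched as well.
            j2 = j
            while j2 < len(s_sorted) and s_sorted[j2][s_key] == rv:
                out.append((r_sorted[i], s_sorted[j2]))
                j2 += 1
            i += 1
    return out


def grace_join(r, s, r_key, s_key, k):
    """Partition both relations, then join matching bucket pairs independently."""
    out = []
    for rb, sb in zip(partition(r, r_key, k), partition(s, s_key, k)):
        out.extend(sort_merge(rb, sb, r_key, s_key))
    return out


# Hypothetical sample relations.
r = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 2, "v": "c"}]
s = [{"rid": 2, "w": "x"}, {"rid": 3, "w": "y"}, {"rid": 2, "w": "z"}]
pairs = grace_join(r, s, "id", "rid", k=2)
```

Because the bucket pairs are independent of one another, each could in principle be handed to a separate memory bank or processor, which is where the division by K in the stated bound comes from.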
TL;DR: This paper proposes a new parallel hash join method, the bucket spreading strategy, which is robust for data skew and attains very good scalability.
Abstract: The Super Database Computer (SDC) is a high-performance relational database server for a join-intensive environment under development at the University of Tokyo. SDC is designed to execute a join in a highly parallel way. Compared to other join algorithms, a hash-based algorithm is quite efficient and easily parallelized, and has been employed by many database machines. However, in the presence of data skew, it is hard to distribute load equally among processing modules (PMs) by statically allocating buckets to PMs, as in the conventional parallelizing strategy. Thus, performance is severely degraded. In this paper, we propose a new parallel hash join method, the bucket spreading strategy, which is robust for data skew. While the relations are being partitioned, each bucket is again divided into fragments of the same size and these fragments are temporarily placed on PMs one by one. Then each bucket is dynamically allocated to a PM which actually carries out the join of the bucket, and all fragments of the bucket are collected in the corresponding PM. In this way, the bucket spreading strategy evenly distributes the load among the PMs and parallelism is always fully exploited. The architecture of SDC is designed to support the bucket spreading strategy; a mechanism which distributes the buckets flatly among the PMs is embedded in the hardware of the interconnection network. Simulation results confirm that the bucket spreading strategy is robust for data skew and attains very good scalability.
Proceedings of the 16th VLDB Conference, Brisbane, Australia, 1990. Yasushi Ogawa: Research and Development Center, RICOH Co., Ltd., 16-1 Shinei-cho, Kohoku-ku,
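The two phases of the bucket spreading strategy described in the abstract above can be simulated in a few lines. This is a software sketch under stated assumptions, not the SDC mechanism, which is embedded in the interconnection network hardware. The function names, the staggered round-robin starting PM, and the greedy "largest bucket to least-loaded PM" allocation rule are all illustrative choices; the abstract only says that buckets are allocated dynamically.

```python
import heapq
from collections import defaultdict


def spread_buckets(rows, key, num_buckets, num_pms, fragment_size):
    """Phase 1: cut each bucket into fixed-size fragments placed round-robin on PMs."""
    pending = defaultdict(list)   # bucket -> rows not yet forming a full fragment
    next_pm = {}                  # bucket -> PM that receives its next fragment
    placement = [defaultdict(list) for _ in range(num_pms)]  # PM -> bucket -> rows

    def flush(b):
        pm = next_pm.get(b, b % num_pms)  # staggered start is an assumption
        placement[pm][b].extend(pending[b])
        next_pm[b] = (pm + 1) % num_pms
        pending[b] = []

    for row in rows:
        b = hash(row[key]) % num_buckets
        pending[b].append(row)
        if len(pending[b]) == fragment_size:
            flush(b)
    for b in list(pending):       # place the remaining partial fragments
        if pending[b]:
            flush(b)
    return placement


def allocate_buckets(placement, num_pms):
    """Phase 2: assign each whole bucket to one PM once the bucket sizes are known."""
    sizes = defaultdict(int)
    for pm in placement:
        for b, fragment_rows in pm.items():
            sizes[b] += len(fragment_rows)
    heap = [(0, pm) for pm in range(num_pms)]  # (assigned load, PM id)
    owner = {}
    # Greedy rule (an assumption): largest bucket goes to the least-loaded PM.
    for b in sorted(sizes, key=lambda b: (-sizes[b], b)):
        load, pm = heapq.heappop(heap)
        owner[b] = pm
        heapq.heappush(heap, (load + sizes[b], pm))
    return owner, dict(sizes)


# Hypothetical skewed input: key 0 dominates.
rows = [{"k": 0}] * 8 + [{"k": 1}, {"k": 2}, {"k": 3}, {"k": 5}]
placement = spread_buckets(rows, "k", num_buckets=4, num_pms=2, fragment_size=2)
owner, sizes = allocate_buckets(placement, num_pms=2)
```

The point of the first phase is that a heavily skewed bucket is stored across all PMs rather than piling up on one; the owner of each bucket is decided only afterwards, when its real size is known.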
TL;DR: Three novel join algorithms depending on the ADS availability are presented that outperform two benchmark algorithms, often by several orders of magnitude, on all performance metrics, and effectively shift the workload to the outsourcing service.
Abstract: Database outsourcing requires that a query server constructs a proof of result correctness, which can be verified by the client using the data owner's signature. Previous authentication techniques deal with range queries on a single relation using an authenticated data structure (ADS). On the other hand, authenticated join processing is inherently more complex than ranges since only the base relations (but not their combination) are signed by the owner. In this paper, we present three novel join algorithms depending on the ADS availability: (i) Authenticated Indexed Sort Merge Join (AISM), which utilizes a single ADS on the join attribute, (ii) Authenticated Index Merge Join (AIM) that requires an ADS (on the join attribute) for both relations, and (iii) Authenticated Sort Merge Join (ASM), which does not rely on any ADS. We experimentally demonstrate that the proposed methods outperform two benchmark algorithms, often by several orders of magnitude, on all performance metrics, and effectively shift the workload to the outsourcing service. Finally, we extend our techniques to complex queries that combine multi-way joins with selections and projections.
TL;DR: Experimental results show that HMJ combines the advantages of two state-of-the-art nonblocking join algorithms (XJoin and Progressive Merge Join) while avoiding their shortcomings.
Abstract: We introduce the hash-merge join algorithm (HMJ, for short), a new nonblocking join algorithm that deals with data items from remote sources via unpredictable, slow, or bursty network traffic. The HMJ algorithm is designed with two goals in mind: (1) minimize the time to produce the first few results, and (2) produce join results even if the two sources of the join operator occasionally get blocked. The HMJ algorithm has two phases: the hashing phase and the merging phase. The hashing phase employs an in-memory hash-based join algorithm that produces join results as quickly as data arrives. The merging phase is responsible for producing join results if the two sources are blocked. Both phases of the HMJ algorithm are connected via a flushing policy that flushes in-memory parts into disk storage once the memory is exhausted. Experimental results show that HMJ combines the advantages of two state-of-the-art nonblocking join algorithms (XJoin and Progressive Merge Join) while avoiding their shortcomings.
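The hashing phase can be illustrated with a symmetric in-memory hash join, in which every arriving tuple is stored in its own side's hash table and immediately probed against the other side's table. The sketch below covers only that phase, under the assumption that memory is never exhausted; the flushing policy and the merging phase are omitted, and the class name `SymmetricHashJoin` and the source labels "A" and "B" are hypothetical.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """In-memory, nonblocking equi-join: results are produced as tuples arrive."""

    def __init__(self):
        # One hash table per source, keyed by join attribute value.
        self.tables = {"A": defaultdict(list), "B": defaultdict(list)}

    def insert(self, source, key, payload):
        """Store the arriving tuple and return the join results it completes."""
        other = "B" if source == "A" else "A"
        self.tables[source][key].append(payload)
        matches = self.tables[other].get(key, [])
        # Keep the (A, B) ordering in every result pair.
        if source == "A":
            return [(payload, m) for m in matches]
        return [(m, payload) for m in matches]


# Hypothetical interleaved arrival order from the two remote sources.
arrivals = [("A", 1, "a1"), ("B", 1, "b1"), ("B", 2, "b2"), ("A", 2, "a2"), ("A", 1, "a3")]

join = SymmetricHashJoin()
results = []
for source, key, payload in arrivals:
    results.extend(join.insert(source, key, payload))
```

Each result is emitted as soon as its second tuple arrives, whichever source it comes from, so neither input has to be read completely before output begins.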