DHTJoin: processing continuous join queries using DHT networks
TL;DR: A new method is presented, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) and dissemination of queries by exploiting the embedded trees in the underlying DHT, thereby incurring little overhead.
Abstract: Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, the centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) with dissemination of queries that exploits the embedded trees in the underlying DHT, thereby incurring little overhead. DHTJoin also deals with join attribute value skew, which may hurt load balancing and result completeness. We provide a performance evaluation of DHTJoin, which shows that it can achieve significant performance gains in terms of network traffic.
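The hash-based placement idea in the abstract can be illustrated with a small sketch. This is not the DHTJoin algorithm itself. It is a toy, single-process model that assumes a Chord-style identifier ring, and the names `ToyDHT`, `ring_hash`, `responsible_node`, and `insert` are hypothetical. It shows only why hashing on the join attribute sends matching tuples from both streams to the same node, so the join can be evaluated locally. Window eviction, skew handling, and query dissemination over the DHT's embedded trees are not modeled.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def ring_hash(value: str, bits: int = 32) -> int:
    """Map a string onto a 2**bits identifier ring using SHA-1."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """Hypothetical in-memory stand-in for a DHT; no networking involved."""

    def __init__(self, node_names):
        # Nodes are placed on the ring by hashing their names.
        self.ring = sorted((ring_hash(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        # store[node][relation][join_value] -> list of stored tuples
        self.store = {
            name: defaultdict(lambda: defaultdict(list)) for name in node_names
        }

    def responsible_node(self, join_value):
        """Return the first node clockwise from the hash of the join value."""
        key = ring_hash(str(join_value))
        idx = bisect_left(self.ids, key) % len(self.ring)  # wrap around the ring
        return self.ring[idx][1]

    def insert(self, relation, other_relation, join_value, tup):
        """Store a tuple at its responsible node and probe the other relation there."""
        node = self.responsible_node(join_value)
        matches = [
            (tup, other) for other in self.store[node][other_relation][join_value]
        ]
        self.store[node][relation][join_value].append(tup)
        return node, matches


dht = ToyDHT(["n1", "n2", "n3", "n4"])
node_r, out_r = dht.insert("R", "S", 42, ("R", 42, "a"))  # nothing to join with yet
node_s, out_s = dht.insert("S", "R", 42, ("S", 42, "b"))  # meets the R tuple locally
print(node_r == node_s, out_s)
```

Because both tuples carry the same join value, they hash to the same node, and the second insertion produces the join result without contacting any other node. A real deployment would also bound the stored state, for example with sliding windows, which this sketch omits.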
Citations
HYBRIDJOIN for Near-Real-Time Data Warehousing
TL;DR: A robust stream-based join algorithm called Hybrid Join (HYBRIDJOIN) is introduced, which performs significantly better for typical parameters of the Zipfian distribution, and in general performs in accordance with the theoretical model while the other two algorithms are unacceptably slow under different settings.
Fault-tolerant query processing in structured P2P-systems
TL;DR: This paper proposes an in-query replication scheme which replicates the state of an operator among the neighbors of the processing peer and shows the effectiveness of the routing extension in networks of varying reliability.
29
Time-slide window join over data streams
Hyeon Gyu Kim, Yoo Hyun Park, Yang-Hyun Cho, Myoung Ho Kim
- 01 Oct 2014
TL;DR: The algorithm for the time-slide window join can vary according to (i) how frequently the join is evaluated and (ii) which structure is used for windowing.
26
X-HYBRIDJOIN for near-real-time data warehousing
Muhammad Asif Naeem, Gillian Dobbie, Gerald Weber
- 12 Jul 2011
TL;DR: In this article, an Extended Hybrid Join (X-HYBRIDJOIN) is proposed to adapt to data skew and store parts of the master data in memory permanently, reducing the disk access overhead significantly.
•Proceedings Article
Optimised X-HYBRIDJOIN for near-real-time data warehousing
M. Asif Naeem, Gillian Dobbie, Gerald Weber
- 31 Jan 2012
TL;DR: An algorithm, Extended Hybrid Join (X-HYBRIDJOIN), is designed that complements MESHJOIN: it can adapt to data skew and stores parts of the master data in memory permanently, reducing the disk access overhead significantly.
17
References
Chord: A scalable peer-to-peer lookup service for internet applications
Ion Stoica, Robert Morris, David R. Karger, M. Frans Kaashoek, Hari Balakrishnan
- 27 Aug 2001
TL;DR: Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.
11.2K
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
Antony Rowstron, Peter Druschel
TL;DR: Pastry is a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications; it performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet.
A scalable content-addressable network
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard M. Karp, Scott Shenker
- 27 Aug 2001
TL;DR: The concept of a Content-Addressable Network (CAN) as a distributed infrastructure that provides hash table-like functionality on Internet-like scales is introduced and its scalability, robustness and low-latency properties are demonstrated through simulation.
7.2K
•Book
Principles of Distributed Database Systems
M. Tamer Özsu, Patrick Valduriez
- 01 Aug 1990
TL;DR: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels and concentrates on fundamental theories as well as techniques and algorithms in distributed data management.
2.7K
Tapestry: a resilient global-scale overlay for service deployment
TL;DR: Experimental results show that Tapestry exhibits stable behavior and performance as an overlay, despite the instability of the underlying network layers, illustrating its utility as a deployment infrastructure.