P2P Join Query Processing over Data Streams

Open Access

- 20 Oct 2009

TL;DR: A new method is presented that combines hash-based placement of tuples with dissemination of queries along the trees embedded in the underlying Distributed Hash Table (DHT), thereby incurring little overhead.

Abstract: Recent years have witnessed the growth of a new class of data-intensive applications that do not fit the DBMS data model and querying paradigm. Instead, the data arrive at high speed as an unbounded sequence of values (data streams), and queries run continuously, returning new results as new data arrive. In these applications, data streams from external sources flow into a Data Stream Management System (DSMS), where they are processed by different operators. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing continuous join queries over distributed data streams. We present a new method, called DHTJoin, that exploits the power of a Distributed Hash Table (DHT) by combining hash-based placement of tuples with dissemination of queries along the trees embedded in the underlying DHT, thereby incurring little overhead. Unlike state-of-the-art solutions that index all data, DHTJoin uses query predicates to identify and index only the subset of tuples required by the users' queries, thus reducing network traffic. DHTJoin also handles the dynamic behavior of DHT networks during query execution and query dissemination. We provide a performance evaluation of DHTJoin, which shows that it can achieve significant performance gains in terms of network traffic.
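The two ideas named in the abstract, hash-based placement of tuples and indexing only the tuples that satisfy query predicates, can be illustrated with a small single-process sketch. This is not the authors' implementation: the `ToyDHT` class, the `process_tuple` function, the stream names "R" and "S", and the dictionary-shaped tuples are all hypothetical names chosen for illustration, and the sketch leaves out sliding windows, query dissemination over the DHT's embedded trees, and node churn.

```python
import hashlib
from collections import defaultdict


class ToyDHT:
    """In-memory stand-in for a DHT (assumption: a real DHT would route
    each key to a peer over the network instead of indexing a local list)."""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes
        # Each simulated node stores, per join-key value, the tuples seen
        # so far from each of the two streams.
        self.nodes = [
            defaultdict(lambda: {"R": [], "S": []}) for _ in range(n_nodes)
        ]

    def node_for(self, key):
        # Hash-based placement: equal join-key values always land on the
        # same node, so matching tuples from both streams meet there.
        digest = hashlib.sha1(repr(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.n_nodes


def process_tuple(dht, stream, tup, join_attr, predicate, results, stats):
    """Index one arriving tuple and emit any join results it produces."""
    # Predicate filtering: tuples that no query needs are never indexed,
    # which is where the traffic saving in this sketch comes from.
    if not predicate(tup):
        stats["skipped"] += 1
        return
    key = tup[join_attr]
    bucket = dht.nodes[dht.node_for(key)][key]
    other = "S" if stream == "R" else "R"
    # Symmetric hash join step: probe the opposite stream's tuples first,
    # then store the new tuple so later arrivals can match it.
    for match in bucket[other]:
        results.append((tup, match) if stream == "R" else (match, tup))
    bucket[stream].append(tup)
    stats["indexed"] += 1
```

In this sketch a tuple that fails the predicate generates no placement message at all, while a tuple that passes is sent to exactly one node determined by the hash of its join attribute. Results are always emitted as (R tuple, S tuple) pairs regardless of arrival order.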


