Open Access
P2P Join Query Processing over Data Streams
Wenceslao Palma, Reza Akbarinia, Esther Pacitti, Patrick Valduriez
- 20 Oct 2009
TL;DR: A new method is presented that uses a Distributed Hash Table (DHT) to combine hash-based placement of tuples with dissemination of queries over the trees embedded in the underlying DHT, thereby incurring little overhead.
Abstract: Recent years have witnessed the growth of a new class of data-intensive applications that do not fit the DBMS data model and querying paradigm. Instead, the data arrive at high speed as an unbounded sequence of values (data streams), and queries run continuously, returning new results as new data arrive. In these applications, data streams from external sources flow into a Data Stream Management System (DSMS), where they are processed by different operators. Many applications share the same need to process data streams continuously. For most distributed streaming applications, centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing continuous join queries over distributed data streams. We present a new method, called DHTJoin, that uses a Distributed Hash Table (DHT) to combine hash-based placement of tuples with dissemination of queries over the trees embedded in the underlying DHT, thereby incurring little overhead. Unlike state-of-the-art solutions that index all data, DHTJoin uses query predicates to identify and index only the subset of tuples required by the user's queries, thus reducing network traffic. DHTJoin also handles the dynamic behavior of DHT networks during query dissemination and execution. Our performance evaluation shows that DHTJoin can achieve significant performance gains in terms of network traffic.
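The two ideas named in the abstract, hash-based placement of tuples and indexing only the tuples that query predicates require, can be illustrated with a small single-process sketch. This is not the paper's algorithm: the names `node_for`, `Node`, `run_join`, and `NUM_NODES` are hypothetical, the DHT lookup is replaced by a hash modulo a fixed node count, and sliding windows, tree-based query dissemination, and node churn are all left out.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 8  # hypothetical number of peers; a real DHT would route by key instead


def node_for(value):
    """Map a join-attribute value to a node id (stand-in for a DHT lookup)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class Node:
    """One peer: stores tuples per stream, grouped by join-attribute value."""

    def __init__(self):
        self.store = defaultdict(lambda: defaultdict(list))

    def insert(self, stream, other, key, tup):
        """Store the tuple, then pair it with stored tuples of the other stream."""
        self.store[stream][key].append(tup)
        matches = self.store[other][key]
        # Results are always ordered as (R tuple, S tuple).
        return [(tup, m) if stream == "R" else (m, tup) for m in matches]


def run_join(arrivals, join_attr, predicates):
    """Join streams R and S on join_attr, indexing only tuples that pass a predicate."""
    nodes = [Node() for _ in range(NUM_NODES)]
    results = []
    indexed = 0
    for stream, tup in arrivals:
        if not predicates[stream](tup):
            continue  # not needed by the query, so it is never sent to a node
        indexed += 1
        key = tup[join_attr]
        other = "S" if stream == "R" else "R"
        # Tuples with equal join values hash to the same node and meet there.
        results.extend(nodes[node_for(key)].insert(stream, other, key, tup))
    return results, indexed


# Hypothetical arrivals: (stream name, tuple) in arrival order.
arrivals = [
    ("R", {"id": 1, "price": 50}),
    ("S", {"id": 1, "qty": 3}),
    ("R", {"id": 2, "price": 5}),  # rejected by the predicate on R
    ("S", {"id": 2, "qty": 7}),
    ("R", {"id": 1, "price": 70}),
]
predicates = {"R": lambda t: t["price"] > 10, "S": lambda t: True}
results, indexed = run_join(arrivals, "id", predicates)
```

In this run four of the five tuples are indexed and two join results are produced. The tuple that fails the predicate on `R` is dropped before placement, which is the kind of traffic saving the abstract attributes to predicate-based indexing.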
Citations
•Dissertation
Gestion de flux de données pour l'observation de systèmes
Loïc Petit
- 10 Dec 2012
TL;DR: An algebraic model, Astral, able to process data coming from streams or relations without semantic ambiguity, extended in our system with a top-k preference model.
References
Chord: A scalable peer-to-peer lookup service for internet applications
Ion Stoica, Robert Morris, David R. Karger, M. Frans Kaashoek, Hari Balakrishnan
- 27 Aug 2001
TL;DR: Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.
11.2K
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
Antony Rowstron, Peter Druschel
TL;DR: Pastry is a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications; it performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet.
A scalable content-addressable network
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard M. Karp, Scott Shenker
- 27 Aug 2001
TL;DR: The concept of a Content-Addressable Network (CAN) as a distributed infrastructure that provides hash table-like functionality on Internet-like scales is introduced and its scalability, robustness and low-latency properties are demonstrated through simulation.
7.2K
Tapestry: a resilient global-scale overlay for service deployment
TL;DR: Experimental results show that Tapestry exhibits stable behavior and performance as an overlay, despite the instability of the underlying network layers, illustrating its utility as a deployment infrastructure.
•Proceedings Article
TelegraphCQ: Continuous Dataflow Processing for an Uncertain World.
Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong, Sailesh Krishnamurthy, Samuel Madden, Vijayshankar Raman, Frederick Reiss, Mehul A. Shah
- 01 Jan 2003
TL;DR: The next generation Telegraph system, called TelegraphCQ, is focused on meeting the challenges that arise in handling large streams of continuous queries over high-volume, highly-variable data streams and leverages the PostgreSQL open source code base.
1.2K