Open Access | DOI: 10.6083/M4QV3JG0
Punctuated data streams
Peter A. Tucker, David Maier
- 01 Jan 2005
Abstract: As most current query processing architectures are already pipelined, it seems logical to apply them to data streams. However, two classes of query operators are impractical for processing long or unbounded data streams. Unbounded stateful operators maintain state whose size has no upper bound, and so eventually run out of memory. Blocking operators read the entire input before emitting a single output, and so might never produce a result. We believe that a priori semantic knowledge of a data stream can permit the use of such operators in some cases. We explore a kind of stream semantics called punctuated streams. Punctuations in a stream mark the end of substreams, allowing us to view a non-terminating stream as a mixture of terminating streams. We introduce three kinds of invariants to specify the proper behavior of query operators in the presence of punctuation. Pass invariants unblock blocking operators by defining when such an operator can pass results on. Keep invariants define what must be kept in local state to continue successful operation. Propagation invariants define when an operator can pass punctuation on. We then present a strategy for proving that implementations of these invariants are faithful to their finite-table counterparts.
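To make the three invariants concrete, here is a minimal Python sketch of a punctuation-aware grouped SUM, a classic blocking operator. The punctuation representation (a tuple pattern of constants and a hypothetical `WILDCARD` marker), the names `Punctuation` and `group_by_sum`, and the rule that only punctuations constraining the grouping attribute alone close a group are illustrative assumptions, not the formulation used in this work.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Tuple, Union

WILDCARD = "*"  # hypothetical marker meaning "matches any value"


@dataclass(frozen=True)
class Punctuation:
    """A pattern with one entry per attribute: a constant or WILDCARD.

    It asserts that no later tuple in the stream matches the pattern.
    """
    pattern: Tuple[Any, ...]


Item = Union[Tuple[Any, ...], Punctuation]


def group_by_sum(stream: Iterable[Item], key: int, val: int) -> Iterator[Item]:
    """Punctuation-aware SUM(val) GROUP BY key; outputs (key, sum) tuples."""
    sums: Dict[Any, Any] = {}
    for item in stream:
        if not isinstance(item, Punctuation):
            sums[item[key]] = sums.get(item[key], 0) + item[val]
            continue
        p = item.pattern
        # This sketch only reacts to punctuations that leave every non-key
        # attribute unconstrained; those close whole groups by themselves.
        if any(x != WILDCARD for i, x in enumerate(p) if i != key):
            continue
        closed = [g for g in sums if p[key] in (WILDCARD, g)]
        for g in closed:
            # Pass invariant: no more tuples for group g, so its sum is final.
            # Keep invariant: the group's state is no longer needed.
            yield (g, sums.pop(g))
        # Propagation invariant: no later output will have a matching key.
        yield Punctuation((p[key], WILDCARD))


stream = [("a", 1), ("b", 2), ("a", 3),
          Punctuation(("a", WILDCARD)),       # no more tuples with key "a"
          ("b", 4),
          Punctuation((WILDCARD, WILDCARD))]  # end of stream
out = list(group_by_sum(stream, key=0, val=1))
# out == [("a", 4), Punctuation(("a", "*")), ("b", 6), Punctuation(("*", "*"))]
```

A fuller implementation would also combine several partial punctuations that together cover a group; this sketch simply ignores punctuations it cannot use on their own.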
In practice, it is important to answer the following question: “How much additional overhead is required when using punctuations?” We use the scenario of a monitoring system for an online auction. Streams of bids, new items, and new users are sent to an online auction system. There are many interesting queries that can be posed over these auction streams. We define queries for this scenario, and execute them with different kinds and amounts of punctuations embedded in the input streams. We show that, for a reasonable ratio of punctuations to data items, the overhead is minimal. Additionally, we compare the behavior of a query using punctuations with the behavior of the same query using slack over data streams with disorder.
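In the auction scenario, the same ideas bound the state of an unbounded stateful operator such as a join of items with bids. The sketch below assumes a hypothetical encoding in which the items stream punctuates each item id right after the item appears and the bids stream punctuates an id when its auction closes; `PunctuatedJoin` and the event shapes are illustrative, not the queries or system used in the experiments.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

Event = Tuple[Any, ...]
OTHER = {"item": "bid", "bid": "item"}


class PunctuatedJoin:
    """Symmetric hash join of items and bids on item id (hypothetical sketch).

    Data: ("item", id, seller) and ("bid", id, amount).
    Punctuation: ("end_item", id) or ("end_bid", id), meaning that stream
    will send no more tuples for that id.
    """

    def __init__(self) -> None:
        self.state: Dict[str, Dict[Any, List[Any]]] = {
            "item": defaultdict(list), "bid": defaultdict(list)}
        self.closed: Dict[str, Set[Any]] = {"item": set(), "bid": set()}

    def process(self, ev: Event) -> List[Event]:
        kind, key = ev[0], ev[1]
        out: List[Event] = []
        if kind in OTHER:  # a data tuple
            other = OTHER[kind]
            for partner in self.state[other].get(key, []):
                seller, amount = (ev[2], partner) if kind == "item" else (partner, ev[2])
                out.append(("match", key, seller, amount))
            # Keep invariant: store the tuple only if partners may still arrive.
            if key not in self.closed[other]:
                self.state[kind][key].append(ev[2])
            return out
        side = kind[len("end_"):]  # the stream that just punctuated `key`
        self.closed[side].add(key)
        # Keep invariant: tuples from the other stream waiting for partners
        # from `side` with this id can never match again, so purge them.
        self.state[OTHER[side]].pop(key, None)
        if key in self.closed[OTHER[side]]:
            # Propagation invariant: both inputs are closed for this id.
            out.append(("end_match", key))
            self.closed["item"].discard(key)
            self.closed["bid"].discard(key)
        return out


events = [("item", 1, "alice"), ("bid", 1, 10), ("bid", 1, 15),
          ("end_item", 1),  # item ids are unique, so items can be punctuated at once
          ("bid", 2, 5),
          ("end_bid", 1)]   # the auction for item 1 closes
join = PunctuatedJoin()
out = [o for ev in events for o in join.process(ev)]
```

The sketch trusts that each stream honors its punctuations. Ids closed on only one side still occupy the `closed` sets, so a production operator would need to bound that bookkeeping as well.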
Clearly, not all punctuations are relevant to a particular query, and it would be useful to determine which ones are. That is, we would like to answer the question “Can stream query Q benefit from a particular set of punctuations?” To that end, we first define punctuation schemes to specify the collection of punctuations that will be presented to a query on a particular data stream. We show how both punctuations and query operators induce groupings over the items in the domain of the input(s). We show that a query benefits from an input punctuation scheme (in terms of being able to produce a given output scheme) if each set in the groupings induced by the operators of the query is covered by a finite number of punctuations in the scheme, a kind of compactness.
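For a restricted case the coverage condition can be checked mechanically. The sketch below assumes that every punctuation form fixes a set of attributes to constants and leaves the rest as wildcards, that the scheme eventually supplies a punctuation for every constant combination of each form, that attribute domains are infinite, and that each operator induces its grouping through a set of grouping attributes (duplicate elimination groups on all of them). Under those assumptions a group is covered by finitely many punctuations exactly when some form fixes only grouping attributes. The function names and the bid schema are hypothetical.

```python
from typing import FrozenSet, Iterable

Attrs = FrozenSet[str]


def group_covered(scheme: Iterable[Attrs], group_attrs: Attrs) -> bool:
    """Can finitely many punctuations from `scheme` cover one group?

    Each element of `scheme` is a punctuation form: the set of attributes it
    fixes to constants, all others being wildcards. A group is the set of
    all tuples agreeing on `group_attrs`. Assuming infinite attribute
    domains, a form covers a group with a single punctuation when it fixes
    only grouping attributes; a form that also fixes some other attribute
    pins that attribute to one value per punctuation, so finitely many such
    punctuations always miss some tuple of the group.
    """
    return any(form <= group_attrs for form in scheme)


def query_benefits(scheme: Iterable[Attrs], groupings: Iterable[Attrs]) -> bool:
    """True if every grouping induced by the query's operators is covered."""
    forms = list(scheme)  # the scheme may be a one-shot iterator
    return all(group_covered(forms, g) for g in groupings)


# Hypothetical bid stream bid(item_id, bidder, amount) whose source
# punctuates each item id when its auction closes.
bid_scheme = [frozenset({"item_id"})]
by_item = frozenset({"item_id"})                       # e.g. MAX(amount) per item
by_bidder = frozenset({"bidder"})                      # e.g. COUNT(*) per bidder
distinct = frozenset({"item_id", "bidder", "amount"})  # duplicate elimination

print(query_benefits(bid_scheme, [by_item, distinct]))   # True
print(query_benefits(bid_scheme, [by_item, by_bidder]))  # False
```

An all-wildcard form (the empty attribute set) acts as an end-of-stream punctuation and therefore covers every group.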
We conclude with a discussion of possible future research directions related to punctuations and data streams. These directions address a variety of questions, ranging from issues in query optimization to other semantics that punctuations could express.
Citations
Out-of-order processing: a new architecture for high-performance stream systems
Jin Li, Kristin Tufte, Vladislav Shkapenyuk, Vassilis Papadimos, Theodore Johnson, David Maier
- 01 Aug 2008
TL;DR: This work introduces a new architecture for stream systems, out-of-order processing (OOP), that avoids ordering constraints and shows that the OOP approach can significantly outperform IOP in a number of aspects, including memory, throughput and latency.
Stream warehousing with DataDepot
Lukasz Golab, Theodore Johnson, J. Spencer Seidel, Vladislav Shkapenyuk
- 29 Jun 2009
TL;DR: The DataDepot architecture is discussed, with an emphasis on several of its novel and critical features, which are currently being used for five very large warehousing projects within AT&T.
Annotations in Data Streams
Amit Chakrabarti, Graham Cormode, Andrew McGregor
- 06 Jul 2009
TL;DR: In this paper, the authors consider the problem of annotating a data stream as it is read and show upper bounds that achieve a non-trivial tradeoff between the amount of annotation used and the space required to verify it.
Scalable Scheduling of Updates in Streaming Data Warehouses
TL;DR: A scheduling framework is proposed that handles the complications encountered by a stream warehouse: view hierarchies and priorities, data consistency, inability to preempt updates, heterogeneity of update jobs caused by different interarrival times and data volumes among different sources, and transient overload.
•Journal Article
Joining punctuated streams
TL;DR: The experimental results of comparing the performance of PJoin with XJoin, a stream join operator without a constraint-exploiting mechanism, show that PJoin significantly outperforms XJoin with regard to both memory overhead and throughput.