Punctuated data streams

doi:10.6083/M4QV3JG0

- 01 Jan 2005

23

TL;DR: It is shown that a query benefits from an input punctuation scheme (in terms of being able to produce a given output scheme), if each set in the groupings induced by the operators of the query is covered by a finite number of punctuations in the scheme—a kind of compactness.

Abstract: As most current query processing architectures are already pipelined, it seems logical to apply them to data streams. However, two classes of query operators are impractical for processing long or unbounded data streams. Unbounded stateful operators maintain state with no upper bound on its size, and so eventually run out of memory. Blocking operators read the entire input before emitting a single output, and so might never produce a result. We believe that a priori semantic knowledge of a data stream can permit the use of such operators in some cases.

We explore a kind of stream semantics called punctuated streams. Punctuations in a stream mark the end of substreams, allowing us to view a non-terminating stream as a mixture of terminating streams. We introduce three kinds of invariants to specify the proper behavior of query operators in the presence of punctuation. Pass invariants unblock blocking operators by defining when such an operator can pass results on. Keep invariants define what must be kept in local state to continue successful operation. Propagation invariants define when an operator can pass punctuation on. We then present a strategy for proving that implementations of these invariants are faithful to their finite table counterparts.

In practice, it is important to answer the following question: “How much additional overhead is required when using punctuations?” We use the scenario of a monitoring system for an online auction. Streams of bids, new items, and new users are sent to an online auction system. There are many interesting queries that can be posed over these auction streams. We define queries for this scenario, and execute them with different kinds and amounts of punctuations embedded in the input streams. We show that, for a reasonable ratio of punctuations to data items, the overhead is minimal. Additionally, we compare the behavior of a query using punctuations with the behavior of the same query using slack over data streams with disorder.

Clearly, not all punctuations are useful to a particular query, and it would be useful to determine when they are. That is, we would like to answer the question “Can stream query Q benefit from a particular set of punctuations?” To that end, we first define punctuation schemes to specify the collection of punctuations that will be presented to a query on a particular data stream. We show how both punctuations and query operators induce groupings over the items in the domain of the input(s). We show that a query benefits from an input punctuation scheme (in terms of being able to produce a given output scheme) if each set in the groupings induced by the operators of the query is covered by a finite number of punctuations in the scheme, a kind of compactness.

We conclude with a discussion of possible future directions of research related to punctuations and data streams. These directions focus on a variety of questions, ranging from issues in query optimization to other possible semantics that can be expressed using punctuations.
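The three invariants described in the abstract can be illustrated with a small sketch. It is not the dissertation's implementation: the names `Bid`, `Punctuation`, `MaxBid`, and `max_bid_per_item` are hypothetical, and a punctuation is simplified here to a single `item_id` value, standing for the assertion that no later bid on that item will arrive. Under those assumptions, a normally blocking "maximum bid per item" aggregate might look like this:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Union


@dataclass(frozen=True)
class Bid:
    item_id: int
    amount: float


@dataclass(frozen=True)
class Punctuation:
    """Hypothetical simplification: no later Bid will carry this item_id."""
    item_id: int


@dataclass(frozen=True)
class MaxBid:
    item_id: int
    amount: float


def max_bid_per_item(
    stream: Iterable[Union[Bid, Punctuation]],
) -> Iterator[Union[MaxBid, Punctuation]]:
    """Group-by/max over a possibly unbounded stream of bids."""
    state: Dict[int, float] = {}  # running maximum per still-open item
    for element in stream:
        if isinstance(element, Bid):
            best = state.get(element.item_id)
            if best is None or element.amount > best:
                state[element.item_id] = element.amount
        else:
            # Pass: the group for this item is complete, so its result is final.
            # Keep: popping the entry frees state that is no longer needed.
            if element.item_id in state:
                yield MaxBid(element.item_id, state.pop(element.item_id))
            # Propagate: no further output for this item will follow.
            yield element


if __name__ == "__main__":
    bids = [Bid(1, 10.0), Bid(2, 5.0), Bid(1, 12.5), Punctuation(1), Bid(2, 7.0)]
    for result in max_bid_per_item(bids):
        print(result)
```

In this sketch item 1 produces a result as soon as its punctuation arrives, while item 2 stays in state and yields nothing, because no punctuation covers its group. That is the behavior the abstract attributes to a blocking, unbounded stateful operator when no useful punctuation is present.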

Citations

Journal Article•10.14778/1453856.1453890

Out-of-order processing: a new architecture for high-performance stream systems

Jin Li, +5 more

- 01 Aug 2008

TL;DR: This work introduces a new architecture for stream systems, out-of-order processing (OOP), that avoids ordering constraints and shows that the OOP approach can significantly outperform IOP in a number of aspects, including memory, throughput and latency.

206

Proceedings Article•10.1145/1559845.1559934

Stream warehousing with DataDepot

Lukasz Golab, +3 more

- 29 Jun 2009

TL;DR: The DataDepot architecture is discussed, with an emphasis on several of its novel and critical features, which are currently being used for five very large warehousing projects within AT&T.

113

Book Chapter•10.1007/978-3-642-02927-1_20

Annotations in Data Streams

Amit Chakrabarti, +2 more

- 06 Jul 2009

TL;DR: In this paper, the authors consider the problem of annotating a data stream as it is read and show upper bounds that achieve a non-trivial tradeoff between the amount of annotation used and the space required to verify it.

60

Journal Article•10.1109/TKDE.2011.45

Scalable Scheduling of Updates in Streaming Data Warehouses

Lukasz Golab, +2 more

- 01 Jun 2012

- IEEE Transactions on Knowledge and Data ...

TL;DR: A scheduling framework is proposed that handles the complications encountered by a stream warehouse: view hierarchies and priorities, data consistency, inability to preempt updates, heterogeneity of update jobs caused by different interarrival times and data volumes among different sources, and transient overload.

49

Journal Article

Joining punctuated streams

Luping Ding, +3 more

- 01 Jan 2004

- Lecture Notes in Computer Science

TL;DR: The experimental results of comparing the performance of PJoin with XJoin, a stream join operator without a constraint-exploiting mechanism, show that PJoin significantly outperforms XJoin with regard to both memory overhead and throughput.

43
