Adaptive query processing in data stream management systems

Open Access • Book

- 01 Jan 2005

19

TL;DR: Presents StreaMon, a generic framework for adaptive query processing in a Data Stream Management System (DSMS), and instantiates it for three distinct combinations of continuous query type and adaptivity need.

Abstract: Many modern applications need to process data streams that consist of data elements generated in a continuous unbounded fashion. Examples include network monitoring, financial monitoring over stock tickers, sensor processing for environmental monitoring or inventory tracking, telecommunications fraud detection, and others. These applications have spurred interest in a new class of systems, called Data Stream Management Systems (DSMSs), that enable applications to pose long-running continuous queries over data streams. A fundamental challenge faced by DSMSs is that stream conditions (e.g., data distribution, arrival rate) and system conditions (e.g., query load, memory availability) may vary significantly over the lifetime of a continuous query. When stream or system conditions change, a query execution strategy that was efficient before the change may become very inefficient. Consequently, it is important for a DSMS to support adaptive query processing: the DSMS must be prepared to change the execution plan for a continuous query while the query is running, based on how stream and system conditions change. Without adaptivity, plan performance may drop drastically over time. This thesis presents a generic framework, called StreaMon, for adaptive query processing in a DSMS. StreaMon has three core components: (i) an Executor, which runs the current plan for each query, (ii) a Profiler, which collects and maintains statistics about current stream and system conditions, and (iii) a Re-optimizer, which ensures that the current plans are the most efficient for current conditions. We instantiate the generic StreaMon framework for three distinct combinations of continuous query type and adaptivity need: (1) Adaptive processing of commutative filters over a stream to maximize throughput at all points in time. (2) Adaptive placement of subresult caches in pipelined plans for windowed stream joins to maximize throughput at all points in time. (3) Detecting relaxed constraints automatically in input streams and exploiting these constraints to reduce memory requirements in plans for windowed stream joins. For each problem, we provide the definition and motivating examples, develop and analyze adaptive algorithms, and present implementation techniques and experimental results from the STREAM general-purpose DSMS prototype developed at Stanford.
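To make the Executor / Profiler / Re-optimizer split concrete for the first instantiation (commutative filters), here is a minimal Python sketch. It is not the thesis's algorithm: all class and function names, the sampling interval, the window size, and the simple "highest observed drop rate first" ordering rule are assumptions chosen for illustration. The sketch also ignores per-filter cost and correlations between filters, which a real re-optimizer would likely need to account for.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

# A filter is a (name, predicate) pair; names are hypothetical labels.
Filter = Tuple[str, Callable[[int], bool]]


class Profiler:
    """Keeps a sliding window of drop outcomes per filter (assumed design)."""

    def __init__(self, filters: List[Filter], window: int = 50) -> None:
        self.filters = list(filters)
        self.drops: Dict[str, Deque[int]] = {
            name: deque(maxlen=window) for name, _ in self.filters
        }

    def observe(self, item: int) -> None:
        # Evaluate every filter independently on a sampled element.
        for name, pred in self.filters:
            self.drops[name].append(0 if pred(item) else 1)

    def drop_rate(self, name: str) -> float:
        window = self.drops[name]
        return sum(window) / len(window) if window else 0.0


class ReOptimizer:
    """Orders filters so the one that drops the most elements runs first."""

    def best_order(self, profiler: Profiler) -> List[Filter]:
        return sorted(
            profiler.filters,
            key=lambda f: profiler.drop_rate(f[0]),
            reverse=True,
        )


class Executor:
    """Runs the current plan: a conjunction of filters with short-circuiting."""

    def __init__(self, plan: List[Filter]) -> None:
        self.plan = list(plan)
        self.evaluations = 0  # total predicate calls, a rough cost measure

    def process(self, item: int) -> bool:
        for _, pred in self.plan:
            self.evaluations += 1
            if not pred(item):
                return False
        return True


def run(stream: Iterable[int], filters: List[Filter],
        sample_every: int = 10, reopt_every: int = 100):
    """Process a stream, profiling a sample and re-ordering periodically."""
    executor = Executor(filters)
    profiler = Profiler(filters)
    reoptimizer = ReOptimizer()
    output = []
    for i, item in enumerate(stream, start=1):
        if i % sample_every == 0:
            profiler.observe(item)
        if i % reopt_every == 0:
            # Swap the plan while the query keeps running.
            executor.plan = reoptimizer.best_order(profiler)
        if executor.process(item):
            output.append(item)
    return output, executor
```

Because the filters are commutative, reordering them never changes which elements pass; it only changes how many predicate evaluations are spent per element, which is what the throughput goal in the abstract is about.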

Citations

Proceedings Article•10.1109/IDEAS.2006.11

ARGUS: Efficient Scalable Continuous Query Optimization for Large-Volume Data Streams

Chun Jin, +1 more

- 11 Dec 2006

TL;DR: Presents the architecture of ARGUS, a stream processing system implemented atop commercial DBMSs to support large-scale complex continuous queries over data streams; the system achieves well over a 100-fold improvement in performance.

61

Journal Article•10.1007/S10619-010-7066-3

Parallel processing of continuous queries over data streams

Ali Asghar Safaei, +1 more

- 01 Dec 2010

- Distributed and Parallel Databases

TL;DR: The presented system is shown to outperform them with respect to tuple latency (response time), memory usage, throughput, and tuple loss, all critical parameters in any data stream management system.

25

Proceedings Article•10.1109/ICNDC.2011.20

TIFA: Enabling Real-Time Querying and Storage of Massive Stream Data

Jun Li, +5 more

- 21 Sep 2011

TL;DR: This paper integrates Time Machine and FastBit into TIFA, a system for real-time querying and storage of massive stream data, designed for operation in Gbps networks and accompanied by UTM as a module or an independent device.

14

Journal Article•10.1093/COMJNL/BXT021

Optimization of Monotonic Linear Progressive Queries Based on Dynamic Materialized Views

Chao Zhu, +2 more

- 01 May 2014

- The Computer Journal

TL;DR: This paper presents a novel technique to efficiently process a special type of PQ, called monotonic linear PQs, based on dynamically materialized views, and proposes a superior relationship graph for SQs from historical PQs that can be used to estimate the benefit of keeping the current SQ result as a materialized view.

9

Journal Article•10.1007/S11227-011-0621-5

Dispatching stream operators in parallel execution of continuous queries

Ali Asghar Safaei, +1 more

- 01 Sep 2012

- The Journal of Supercomputing

TL;DR: Results show that the dispatching method significantly improves system performance in terms of tuple latency, throughput, and tuple loss; fluctuation in system performance parameters also diminishes considerably, leading to high adaptivity to the underlying system.

9
