Open Access • Book
Adaptive query processing in data stream management systems
Jennifer Widom, Shivnath Babu +1 more
- 01 Jan 2005
19
TL;DR: A generic framework, called StreaMon, for adaptive query processing in a Data Stream Management System (DSMS) is presented and instantiated for three distinct combinations of continuous query type and adaptivity need.
Abstract: Many modern applications need to process data streams that consist of data elements generated in a continuous unbounded fashion. Examples include network monitoring, financial monitoring over stock tickers, sensor processing for environmental monitoring or inventory tracking, telecommunications fraud detection, and others. These applications have spurred interest in a new class of systems, called Data Stream Management Systems (DSMSs), that enable applications to pose long-running continuous queries over data streams.
A fundamental challenge faced by DSMSs is that stream conditions (e.g., data distribution, arrival rate) and system conditions (e.g., query load, memory availability) may vary significantly over the lifetime of a continuous query. When stream or system conditions change, a query execution strategy that was efficient before the change may become very inefficient. Consequently, it is important for a DSMS to support adaptive query processing: The DSMS must be prepared to change the execution plan for a continuous query while the query is running, based on how stream and system conditions change. Without adaptivity, plan performance may drop drastically over time.
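The degradation described above can be made concrete with a back-of-the-envelope cost model. Assuming independent filters, a pipeline's expected work per tuple is the sum of each filter's cost weighted by the probability that a tuple survives long enough to reach it. The filter names `A` and `B`, their unit costs, and the selectivities below are made-up values for illustration, not measurements from the thesis.

```python
from typing import Dict, Sequence


def expected_cost(order: Sequence[str], cost: Dict[str, float],
                  sel: Dict[str, float]) -> float:
    """Expected work per tuple for a pipeline of independent filters.

    A filter only sees tuples that passed every earlier filter, so its cost
    is weighted by the product of their selectivities (pass probabilities).
    """
    total, pass_prob = 0.0, 1.0
    for f in order:
        total += pass_prob * cost[f]
        pass_prob *= sel[f]
    return total


cost = {"A": 1.0, "B": 1.0}
before = {"A": 0.1, "B": 0.9}  # hypothetical selectivities when the query starts
after = {"A": 0.9, "B": 0.1}   # hypothetical selectivities after the data drifts

for label, sel in (("before", before), ("after", after)):
    for order in (("A", "B"), ("B", "A")):
        print(label, order, round(expected_cost(order, cost, sel), 2))
```

With these numbers, the order A then B costs 1.1 filter evaluations per tuple before the drift but 1.9 after it, while B then A shows the reverse, so the plan chosen at query start becomes the worse one once conditions change.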
This thesis presents a generic framework, called StreaMon, for adaptive query processing in a DSMS. StreaMon has three core components: (i) an Executor, which runs the current plan for each query, (ii) a Profiler, which collects and maintains statistics about current stream and system conditions, and (iii) a Re-optimizer, which ensures that the current plans are the most efficient for current conditions. We instantiate the generic StreaMon framework for three distinct combinations of continuous query type and adaptivity need: (1) Adaptive processing of commutative filters over a stream to maximize throughput at all points in time. (2) Adaptive placement of subresult caches in pipelined plans for windowed stream joins to maximize throughput at all points in time. (3) Detecting relaxed constraints automatically in input streams and exploiting these constraints to reduce memory requirements in plans for windowed stream joins. For each problem, we provide the definition and motivating examples, develop and analyze adaptive algorithms, and present implementation techniques and experimental results from the STREAM general-purpose DSMS prototype developed at Stanford.
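To make the Executor/Profiler/Re-optimizer split concrete, here is a minimal sketch for the first instantiation, adaptive ordering of commutative filters. It is not the thesis's algorithm: the class names, the random sampling of tuples for profiling, the fixed re-optimization interval, the window size, and the greedy rule (next, pick the filter that drops the largest fraction of the still-surviving profiled tuples per unit cost) are all assumptions chosen for illustration. The filters `mod10` and `lt5000` and the drifting stream are hypothetical.

```python
import random
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Iterator, List, Tuple

Filter = Callable[[int], bool]


class Executor:
    """Runs the current plan: filters applied in `order`, dropping early."""

    def __init__(self, filters: Dict[str, Filter], order: List[str]) -> None:
        self.filters = filters
        self.order = order
        self.evaluations = 0  # filter invocations, a proxy for per-tuple work

    def process(self, t: int) -> bool:
        for name in self.order:
            self.evaluations += 1
            if not self.filters[name](t):
                return False  # tuple dropped; remaining filters are skipped
        return True


class Profiler:
    """Evaluates every filter on a random sample of tuples and keeps the
    outcomes in a bounded window, so old statistics age out."""

    def __init__(self, filters: Dict[str, Filter], window: int = 500,
                 sample_rate: float = 0.1, seed: int = 0) -> None:
        self.filters = filters
        self.rows: Deque[Dict[str, bool]] = deque(maxlen=window)
        self.sample_rate = sample_rate
        self.rng = random.Random(seed)

    def maybe_observe(self, t: int) -> None:
        if self.rng.random() < self.sample_rate:
            self.rows.append({n: f(t) for n, f in self.filters.items()})


class ReOptimizer:
    """Greedy rule: next, pick the filter that drops the largest fraction of
    the profiled tuples surviving the filters chosen so far, per unit cost."""

    def __init__(self, costs: Dict[str, float]) -> None:
        self.costs = costs

    def reorder(self, rows: List[Dict[str, bool]]) -> List[str]:
        remaining = sorted(self.costs)  # sorted so ties break deterministically
        surviving = rows
        order: List[str] = []
        while remaining:
            scores = {
                name: (sum(not r[name] for r in surviving) / len(surviving)
                       if surviving else 0.0) / self.costs[name]
                for name in remaining
            }
            best = max(remaining, key=scores.__getitem__)
            order.append(best)
            remaining.remove(best)
            surviving = [r for r in surviving if r[best]]
        return order


def run(stream: Iterable[int], filters: Dict[str, Filter],
        costs: Dict[str, float], reoptimize: bool = True,
        reopt_every: int = 1000) -> Tuple[List[int], List[str], int]:
    executor = Executor(filters, list(filters))
    profiler = Profiler(filters)
    reoptimizer = ReOptimizer(costs)
    out: List[int] = []
    for i, t in enumerate(stream, start=1):
        profiler.maybe_observe(t)
        if executor.process(t):
            out.append(t)
        if reoptimize and i % reopt_every == 0:
            executor.order = reoptimizer.reorder(list(profiler.rows))
    return out, executor.order, executor.evaluations


# Hypothetical filters and a stream whose distribution drifts halfway through.
filters: Dict[str, Filter] = {"mod10": lambda t: t % 10 == 0,
                              "lt5000": lambda t: t < 5000}
costs = {"mod10": 1.0, "lt5000": 1.0}


def drifting_stream() -> Iterator[int]:
    yield from range(0, 5000)          # phase 1: mod10 drops about 90%
    yield from range(5000, 55000, 10)  # phase 2: lt5000 drops everything


out_a, order_a, work_a = run(drifting_stream(), filters, costs)
out_s, order_s, work_s = run(drifting_stream(), filters, costs, reoptimize=False)
print("adaptive:", order_a, work_a, "| static:", order_s, work_s)
```

Because the filters commute, reordering never changes the query result, only the work per tuple: after the drift the adaptive run moves `lt5000` to the front and performs fewer filter evaluations than the static plan. A real system would also charge for the profiling work itself, which this sketch's evaluation counter ignores.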
Citations
ARGUS: Efficient Scalable Continuous Query Optimization for Large-Volume Data Streams
Chun Jin, Jaime G. Carbonell +1 more
- 11 Dec 2006
TL;DR: This paper presents the architecture of ARGUS, a stream processing system implemented atop commercial DBMSs to support large-scale complex continuous queries over data streams; ARGUS achieves well over a 100-fold improvement in performance.
Parallel processing of continuous queries over data streams
TL;DR: The presented system is shown to outperform the compared approaches with respect to tuple latency (response time), memory usage, throughput, and tuple loss, which are critical parameters in any data stream management system.
25
TIFA: Enabling Real-Time Querying and Storage of Massive Stream Data
TL;DR: This paper integrates Time Machine and FastBit into TIFA, a system for real-time querying and storage of massive stream data that is designed to operate in Gbps networks and to accompany a UTM, either as a module or as an independent device.
14
Optimization of Monotonic Linear Progressive Queries Based on Dynamic Materialized Views
TL;DR: This paper presents a novel technique, based on dynamically materialized views, to efficiently process a special type of progressive query (PQ) called monotonic linear PQs, and proposes a superior relationship graph for SQs, built from historical PQs, that can be used to estimate the benefit of keeping the current SQ result as a materialized view.
9
Dispatching stream operators in parallel execution of continuous queries
TL;DR: Results show that the dispatching method significantly improves system performance in terms of tuple latency, throughput, and tuple loss; in addition, fluctuation in system performance parameters diminishes considerably, leading to high adaptivity to the underlying system.
9
References
Space/time trade-offs in hash coding with allowable errors
TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
•Book
Approximation Algorithms for NP-Hard Problems
Dorit S. Hochbaum
- 01 Jan 1996
TL;DR: This book reviews design techniques for approximation algorithms, the developments in the area since its inception about three decades ago, and the "closeness" to the optimum that is achievable in polynomial time.
3.5K
Models and issues in data stream systems
Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom +4 more
- 03 Jun 2002
TL;DR: This paper motivates the need for, and the research issues arising from, a new model of data processing in which data does not take the form of persistent relations but instead arrives in multiple, continuous, rapid, time-varying data streams.
Mining high-speed data streams
Pedro Domingos, Geoff Hulten +1 more
- 01 Aug 2000
TL;DR: This paper describes and evaluates VFDT, an anytime system that builds decision trees using constant memory and constant time per example, and applies it to mining the continuous stream of Web access data from the whole University of Washington main campus.
Access path selection in a relational database management system
P. Griffiths Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, T. G. Price +4 more
- 30 May 1979
TL;DR: System R is an experimental database management system developed to carry out research on the relational model of data; this paper describes how it chooses access paths for both simple (single-relation) and complex queries (such as joins), given a user specification of desired data as a Boolean expression of predicates.