Open Access
Real-time query systems for complex data sources
Matt Welsh, Ian Rose +1 more
- 01 Jan 2011
3
TL;DR: By including the data acquisition process in the overall system design, and by allowing queries to provide feedback to the collection process, it is possible to build scalable, real-time stream processing systems for complex data sources.
Abstract: This dissertation presents techniques for building scalable systems that allow real-time querying of complex data sources. In recent years, networking and sensing advances have dramatically increased the volume of information available to data consumers. However, coping with large scales and high data rates often requires processing data in real time, as it arrives, rather than storing it for later analysis. Our thesis is that by including the data acquisition process in the overall system design, it is possible to build scalable, real-time stream processing systems for complex data sources.
We have built two systems to demonstrate a number of unique design features required for scalable operation in our chosen domains. Cobra is a system that taps online RSS feeds (such as blogs, news articles and websites' user comments) as its data source. Cobra repeatedly crawls a set of RSS feeds, matching the contents to keyword-based user queries, similar to those used in Web search engines. As RSS-based content can change frequently, the design ensures that the latency between crawls is low, while still scaling to a large number of RSS feeds and many concurrent user queries.
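The Python sketch below shows one common way to match each crawled feed item against many standing keyword queries at once. It builds an inverted index from keyword to query IDs and keeps a per-query hit count. This is an illustration under stated assumptions, not Cobra's implementation. The names `KeywordMatcher` and `tokenize` are hypothetical. A query is assumed to match only when all of its keywords appear in the item (conjunctive semantics), and tokenization is a simple lower-cased alphanumeric split.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordMatcher:
    """Match incoming feed items against many standing keyword queries.

    Hypothetical sketch: a query matches an item when every one of its
    keywords appears in the item (conjunctive semantics is an assumption).
    """

    def __init__(self) -> None:
        self.index: dict[str, set[int]] = defaultdict(set)  # keyword -> query ids
        self.sizes: dict[int, int] = {}  # query id -> number of distinct keywords

    def add_query(self, qid: int, query: str) -> None:
        words = tokenize(query)
        self.sizes[qid] = len(words)
        for w in words:
            self.index[w].add(qid)

    def match(self, item_text: str) -> set[int]:
        # Count, per query, how many of its keywords occur in the item.
        hits: dict[int, int] = defaultdict(int)
        for w in tokenize(item_text):
            for qid in self.index.get(w, ()):
                hits[qid] += 1
        # A query matches only if all of its keywords were found.
        return {qid for qid, n in hits.items() if n == self.sizes[qid]}


m = KeywordMatcher()
m.add_query(1, "earthquake Japan")
m.add_query(2, "election results")
print(m.match("Strong earthquake reported off the coast of Japan"))  # {1}
```

Indexing the queries rather than the items means the work per item depends on the item's words and on the queries that share those words. It does not depend on the total number of subscribed queries. That property matters when every crawl result must be checked against many concurrent queries.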
Second, Argos is a system for widely distributed, outdoor wireless network monitoring. Capturing 802.11 Wi-Fi traffic across a large urban area, Argos enables a wide range of user queries, such as mobile node tracking, malware detection, and traffic characterization. Using a wireless mesh network to connect the deployed sniffer nodes introduces additional challenges because of the mesh's limited bandwidth. To address this restriction, we designed a novel in-network packet merging process and demonstrate its bandwidth savings. Additionally, Argos provides a variety of channel management schemes: 802.11 defines up to 14 radio channels, but each sniffer can capture from only one channel at a time, so policies are needed to decide when to capture from which channel.
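The Python sketch below shows one plausible way to realize in-network merging. It rests on an assumption not taken from the dissertation: nearby sniffers often overhear the same frame. Under that assumption, a mesh node that aggregates captures can forward a single copy of each frame, plus a list of the sniffers that saw it and their signal strengths. This replaces sending one full copy per sniffer. Several details are hypothetical choices: the `Capture` and `MergedPacket` record formats, the SHA-1 digest used to detect duplicates, and the 50 ms `window`.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Capture:
    """One 802.11 frame as reported by a single sniffer (hypothetical format)."""
    sniffer_id: str
    timestamp: float  # seconds
    rssi: int         # received signal strength, dBm
    frame: bytes


@dataclass
class MergedPacket:
    """A single copy of a frame plus every sniffer that observed it."""
    frame: bytes
    first_seen: float
    observers: list[tuple[str, int]] = field(default_factory=list)


def merge_captures(captures: list[Capture], window: float = 0.05) -> list[MergedPacket]:
    """Collapse identical frames seen by several sniffers within `window` seconds.

    Detecting duplicates by hashing the raw frame bytes is an assumption.
    """
    merged: list[MergedPacket] = []
    latest: dict[bytes, MergedPacket] = {}  # frame digest -> most recent merged record
    for cap in sorted(captures, key=lambda c: c.timestamp):
        digest = hashlib.sha1(cap.frame).digest()
        rec = latest.get(digest)
        # Start a new record if the frame is unseen or the old copy is too old.
        if rec is None or cap.timestamp - rec.first_seen > window:
            rec = MergedPacket(cap.frame, cap.timestamp)
            latest[digest] = rec
            merged.append(rec)
        rec.observers.append((cap.sniffer_id, cap.rssi))
    return merged


frame = b"\x08\x01" + b"\x00" * 98  # 100-byte placeholder frame
caps = [Capture("s1", 0.000, -60, frame),
        Capture("s2", 0.002, -72, frame),
        Capture("s3", 0.004, -80, frame)]
out = merge_captures(caps)
print(len(out), out[0].observers)  # 1 [('s1', -60), ('s2', -72), ('s3', -80)]
# Frame bytes before and after merging (observer lists not counted).
print(sum(len(c.frame) for c in caps), sum(len(p.frame) for p in out))  # 300 100
```

In this scheme, the bandwidth saving comes from forwarding each frame once with a short (sniffer ID, RSSI) list per observer. Keeping the per-sniffer RSSI also preserves information that a node-tracking query might need.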
These systems are built around three design principles that aid in the real-time querying of complex data sources: query interfaces tailored to the application's specific data types, optimized data collection processes, and allowing queries to provide feedback to the collection process.
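The third principle, letting queries feed back into collection, can be illustrated with the Argos channel-management problem: a single-radio sniffer must split its time across channels. The Python sketch below is a hypothetical policy, not one of Argos's actual schemes. Each channel keeps a baseline weight so that it is still sampled. Queries raise a channel's weight through `feedback()`, for example when a tracked device is heard there. Old interest fades through `decay()`. Dwell time per scheduling period is proportional to each channel's weight. The class name, its parameters, and the weighting rule are all assumptions.

```python
class ChannelScheduler:
    """Decide how a single-radio sniffer splits its time across channels.

    Hypothetical sketch: every channel keeps a baseline weight, and running
    queries add weight to channels they find interesting (feedback from
    queries to the collection process).
    """

    def __init__(self, channels: list[int], baseline: float = 1.0) -> None:
        self.baseline = baseline
        self.interest: dict[int, float] = {ch: 0.0 for ch in channels}

    def feedback(self, channel: int, amount: float = 1.0) -> None:
        """Called by a query, e.g. when a tracked device is seen on `channel`."""
        self.interest[channel] = self.interest.get(channel, 0.0) + amount

    def decay(self, factor: float = 0.5) -> None:
        """Fade old feedback so stale interest does not pin the radio forever."""
        for ch in self.interest:
            self.interest[ch] *= factor

    def dwell_times(self, period: float) -> dict[int, float]:
        """Split `period` seconds across channels in proportion to their weights."""
        weights = {ch: self.baseline + v for ch, v in self.interest.items()}
        total = sum(weights.values())
        return {ch: period * w / total for ch, w in weights.items()}


sched = ChannelScheduler([1, 6, 11])
sched.feedback(6, 2.0)  # e.g. a tracking query saw its target on channel 6
print(sched.dwell_times(10.0))  # {1: 2.0, 6: 6.0, 11: 2.0}
```

The baseline weight is the design choice that keeps the sniffer from abandoning channels no query is currently interested in. As a result, new activity on those channels can still be discovered.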
Citations
A low-delay protocol for multihop wireless body area networks
Benoît Latré, Bart Braem, Ingrid Moerman, Chris Blondia, Piet Demeester +4 more
- 01 Jan 2008
TL;DR: A new cross-layer communication protocol for WBANs: CICADA or Cascading Information retrieval by Controlling Access with Distributed slot Assignment, which offers low delay and good resilience to mobility.
237
•Book
Real-Time & Stream Data Management: Push-Based Data in Research & Practice
Felix Gessert, Wolfram Wingerath, Norbert Ritter +2 more
- 19 Jan 2019
TL;DR: Data stream management systems are similar to real-time databases in several ways: they support continuous queries, i.e. they proactively deliver information as soon as new data of relevance becomes available, and they are also capable of ad hoc queries over currently buffered data.
11
"One Size Fits All": An Idea Whose Time Has Come and Gone?
Jens Dittrich, Franz Färber, Goetz Graefe, Henrik Loeser, Wilfried Reimann, Harald Schöning +5 more
- 01 Jan 2011
TL;DR: In a later publication [St07], Michael Stonebraker even advanced the thesis that there are no applications for which traditional database systems are the best alternative.
Related Papers (1)
J Colquhoun, Paul Watson +1 more
- 01 Jan 2010