Real-time query systems for complex data sources

Open Access

- 01 Jan 2011

TL;DR: By including the data acquisition process in the overall system design and allowing queries to provide feedback to the collection process, it is possible to build scalable, real-time stream processing systems for complex data sources.

Abstract: This dissertation presents techniques for building scalable systems that allow real-time querying of complex data sources. In recent years, advances in networking and sensing have dramatically increased the volume of information available to data consumers. Coping with such large scales and high data rates, however, often requires processing data in real time, as it arrives, rather than storing it for later analysis. Our thesis is that by including the data acquisition process in the overall system design, it is possible to build scalable, real-time stream processing systems for complex data sources. We have built two systems that demonstrate a number of unique design features required for scalable operation in our chosen domains.

The first, Cobra, taps online RSS feeds (such as blogs, news articles, and user comments on websites) as its data source. Cobra repeatedly crawls a set of RSS feeds and matches their contents against keyword-based user queries similar to those used in Web search engines. Because RSS content can change frequently, the design keeps the latency between crawls low while still scaling to a large number of RSS feeds and many concurrent user queries.

The second, Argos, is a system for widely distributed, outdoor wireless network monitoring. Argos captures 802.11 WiFi traffic across a large urban area and supports a wide range of user queries, such as mobile node tracking, malware detection, and traffic characterization. Using a wireless mesh network to connect the deployed sniffer nodes introduces additional challenges because of the mesh's limited bandwidth. To address this restriction, we designed a novel in-network packet merging process and demonstrate its bandwidth savings. Argos also provides a variety of channel management schemes: 802.11 defines up to 14 radio channels, but each sniffer can capture from only one channel at a time, so policies are needed to decide when to capture from which channel.
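Cobra's matching step, in which each crawled item is compared against many standing keyword queries, can be illustrated with the minimal Python sketch below. It is not Cobra's implementation. It assumes conjunctive queries (every keyword must appear), lowercase alphanumeric tokenization, and hypothetical names (`QueryIndex`, `add_query`, `match`). Each query is filed under a single one of its keywords, so an incoming item is checked only against queries that share at least one of its words rather than against every registered query.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class QueryIndex:
    """Hypothetical index of standing conjunctive keyword queries."""

    def __init__(self) -> None:
        # access keyword -> list of (query_id, all keywords of that query)
        self._by_keyword: dict[str, list[tuple[str, frozenset[str]]]] = defaultdict(list)

    def add_query(self, query_id: str, query_text: str) -> None:
        """Register a query; an item matches only if it contains every keyword."""
        terms = frozenset(tokenize(query_text))
        if not terms:
            raise ValueError("query must contain at least one keyword")
        # A matching item must contain every keyword, so filing the query under
        # any single one of them is enough; min() just makes the choice stable.
        self._by_keyword[min(terms)].append((query_id, terms))

    def match(self, item_text: str) -> list[str]:
        """Return the sorted IDs of all queries whose keywords all appear in the item."""
        words = tokenize(item_text)
        matched = []
        for word in words:
            # Only queries filed under a word that occurs in the item are candidates.
            for query_id, terms in self._by_keyword.get(word, ()):
                if terms <= words:
                    matched.append(query_id)
        return sorted(matched)


index = QueryIndex()
index.add_query("q1", "wireless mesh")
index.add_query("q2", "stream processing")
print(index.match("A new wireless mesh network for the city"))  # ['q1']
```

Because each query is filed under exactly one keyword, it is examined at most once per item; choosing a query's rarest keyword as its access key, instead of `min()`, would shrink the candidate set further.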
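The abstract does not describe how Argos merges packets, so the Python sketch below only illustrates the general idea under stated assumptions: when several sniffers overhear the same 802.11 frame, a mesh node forwarding their captures can collapse the copies into one record that carries the frame once, plus each sniffer's signal strength. Identifying duplicates by identical frame bytes within a fixed time window (`window_s`), and the `Capture` and `MergedPacket` record formats, are hypothetical choices made for this example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Capture:
    """One sniffer's copy of an 802.11 frame (hypothetical record format)."""
    sniffer_id: str
    timestamp: float  # capture time in seconds
    rssi_dbm: int     # received signal strength at this sniffer
    frame: bytes      # raw frame contents


@dataclass
class MergedPacket:
    """A single frame plus per-sniffer metadata for every copy merged into it."""
    frame: bytes
    first_seen: float
    observations: dict[str, int] = field(default_factory=dict)  # sniffer -> RSSI


def merge_captures(captures: list[Capture], window_s: float = 0.05) -> list[MergedPacket]:
    """Collapse copies of the same frame seen within `window_s` seconds into one record.

    Assumes identical frame bytes identify duplicates; a real system would
    also have to tolerate clock skew between sniffers.
    """
    merged: list[MergedPacket] = []
    # frame digest -> index in `merged` of the latest packet with that content
    latest: dict[bytes, int] = {}
    for cap in sorted(captures, key=lambda c: c.timestamp):
        digest = hashlib.sha1(cap.frame).digest()
        idx = latest.get(digest)
        if idx is not None and cap.timestamp - merged[idx].first_seen <= window_s:
            # Duplicate within the window: keep only this sniffer's metadata.
            merged[idx].observations[cap.sniffer_id] = cap.rssi_dbm
        else:
            latest[digest] = len(merged)
            merged.append(MergedPacket(cap.frame, cap.timestamp, {cap.sniffer_id: cap.rssi_dbm}))
    return merged


def bytes_sent(captures: list[Capture], merged: list[MergedPacket]) -> tuple[int, int]:
    """Frame-payload bytes forwarded without and with merging (metadata ignored)."""
    return sum(len(c.frame) for c in captures), sum(len(m.frame) for m in merged)


caps = [
    Capture("s1", 0.000, -60, b"\x08\x00beacon-A"),
    Capture("s2", 0.004, -72, b"\x08\x00beacon-A"),
    Capture("s3", 0.010, -80, b"\x08\x00beacon-B"),
]
out = merge_captures(caps)
print(len(out), bytes_sent(caps, out))  # 2 (30, 20)
```

In this toy example, merging forwards 20 frame-payload bytes instead of 30. The per-sniffer metadata still has to travel, so any real savings would depend on frame sizes and on how many sniffers overhear each frame.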
These systems are built around three design principles that support real-time querying of complex data sources: query interfaces tailored to each application's specific data types, optimized data collection processes, and queries that can provide feedback to the collection process.
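One way to connect Argos's channel-selection problem with the third design principle (queries providing feedback to the collection process) is a per-sniffer scheduler in which every channel keeps a small baseline share of capture time and active queries can request more time on the channels they need. The Python sketch below turns those weights into a slot-by-slot schedule using smooth weighted round-robin. The class, its parameters, and the policy itself are hypothetical and are not taken from Argos.

```python
from collections.abc import Iterable


class ChannelScheduler:
    """Hypothetical per-sniffer channel scheduler (not Argos's actual policy)."""

    def __init__(self, channels: Iterable[int] = range(1, 15), base_weight: int = 1) -> None:
        channels = list(channels)
        # Every channel keeps a baseline weight so that none is starved.
        self.baseline = {ch: base_weight for ch in channels}
        # Extra weight requested by queries (the feedback path).
        self.boost: dict[int, int] = {}
        # Running "current weight" used by smooth weighted round-robin.
        self.current = {ch: 0 for ch in channels}

    def request(self, channel: int, weight: int) -> None:
        """Feedback from a query: ask for `weight` more shares of capture time on `channel`."""
        if channel not in self.baseline:
            raise ValueError(f"unknown channel {channel}")
        self.boost[channel] = self.boost.get(channel, 0) + weight

    def next_channel(self) -> int:
        """Choose the channel to capture on during the next dwell slot."""
        total = 0
        for ch in self.current:
            weight = self.baseline[ch] + self.boost.get(ch, 0)
            self.current[ch] += weight
            total += weight
        # Highest current weight wins; ties go to the channel listed first.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total
        return chosen


sched = ChannelScheduler(channels=[1, 6, 11])
sched.request(6, 2)  # e.g. a query tracking a node believed to be on channel 6
print([sched.next_channel() for _ in range(5)])  # [6, 1, 6, 11, 6]
```

Keeping a nonzero baseline weight on every channel is a deliberate choice in this sketch: query-driven boosts shift capture time toward requested channels without ever starving the others, which matters if some queries need coverage of every channel.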

Citations

A low-delay protocol for multihop wireless body area networks

Benoît Latré, +4 more

- 01 Jan 2008

TL;DR: A new cross-layer communication protocol for WBANs, CICADA (Cascading Information retrieval by Controlling Access with Distributed slot Assignment), which offers low delay and good resilience to mobility.

Book

Real-Time & Stream Data Management: Push-Based Data in Research & Practice

Felix Gessert, +2 more

- 19 Jan 2019

TL;DR: Data stream management systems are similar to real-time databases in several ways: they support continuous queries (that is, they proactively deliver information as soon as relevant new data becomes available), and they can also answer ad hoc queries over currently buffered data.

"One Size Fits All": An Idea Whose Time Has Come and Gone?

Jens Dittrich, +5 more

- 01 Jan 2011

TL;DR: In a later publication [St07], Michael Stonebraker even put forward the thesis that there are no applications for which traditional database systems are the best alternative.

Related Papers (1)

Query Matching in a BitTorrent-Based P2P Database System

[...]

J Colquhoun, +1 more

- 01 Jan 2010