Open Access • Proceedings Article
Distributing frequency-dependent data stream computations
Sumit Ganguly
- 01 Jan 2009
- pp 163-170
TL;DR: It is shown that the class of data stream computations that approximate functions of the frequency vector of the stream can be computed efficiently in a distributed manner.
Abstract: Data stream computations in domains such as internet applications are often performed in a highly distributed fashion in order to save time. An example is the class of applications that use Google's MapReduce framework for scalable distributed processing, as presented by Dean & Ghemawat (2004). A basic question here is: what kind of data stream computations admit scalable and efficient distributed algorithms? We show that the class of data stream computations that approximate functions of the frequency vector of the stream can be computed efficiently in a distributed manner.
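One way to see why frequency-dependent computations lend themselves to distribution is that the frequency vector of a stream is the sum of the frequency vectors of its partitions, so any summary that is linear in the frequency vector can be built per partition and added up. The Python sketch below illustrates this with an AMS-style estimate of the second frequency moment. It is a minimal illustration under stated assumptions, not the construction from the paper: the `sign`, `sketch`, `merge`, and `estimate_f2` helpers, the SHA-256-based sign function, the row count of 64, and the three-way split are all hypothetical choices made for the example.

```python
import hashlib
from collections import Counter


def sign(item: str, row: int) -> int:
    """Deterministic pseudo-random +1/-1 for (item, row), shared by all workers.

    Assumption: SHA-256 stands in for the limited-independence hash
    families used in the streaming literature.
    """
    digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
    return 1 if digest[0] & 1 else -1


def sketch(stream, rows: int = 64):
    """Linear sketch: counter r holds the sum of sign(item, r) over all arrivals."""
    counters = [0] * rows
    for item in stream:
        for r in range(rows):
            counters[r] += sign(item, r)
    return counters


def merge(a, b):
    """Sketches are linear in the frequency vector, so merging is addition."""
    return [x + y for x, y in zip(a, b)]


def estimate_f2(counters):
    """Average of squared counters: an estimate of the sum of squared frequencies."""
    return sum(c * c for c in counters) / len(counters)


stream = ["a", "b", "a", "c", "a", "b", "d", "a"]
# Hypothetical split of the stream across three workers.
partitions = [stream[:3], stream[3:6], stream[6:]]

merged = [0] * 64
for part in partitions:
    merged = merge(merged, sketch(part))

# Exact second frequency moment, computed centrally for comparison.
exact_f2 = sum(f * f for f in Counter(stream).values())

print("merged sketch equals single-pass sketch:", merged == sketch(stream))
print("exact F2:", exact_f2, "estimated F2:", estimate_f2(merged))
```

Because every worker uses the same sign function, the merged counters are identical to those produced by one pass over the whole stream, regardless of how the items are partitioned or ordered. The estimate itself is only approximate and depends on the number of rows.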
Citations
Turnstile streaming algorithms might as well be linear sketches
Yi Li, Huy Nguyen, David P. Woodruff
- 31 May 2014
TL;DR: This work shows that to prove space lower bounds for 1-pass streaming algorithms, it suffices to prove lower bounds in the simultaneous model of communication complexity, rather than the stronger 1-way model.
148
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
The Space Complexity of Approximating the Frequency Moments
TL;DR: In this paper, the authors considered the space complexity of randomized algorithms that approximate the frequency moments of a sequence, where the elements of the sequence are given one by one and cannot be stored.
1.8K
•Proceedings Article
On distributing symmetric streaming computations
Jon Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Cliff Stein, Zoya Svitkina
- 20 Jan 2008
TL;DR: In this article, the authors introduce a simple algorithmic model for massive, unordered, distributed (mud) computation, as implemented by MapReduce and Hadoop, and show that mud algorithms are equivalent in power to symmetric streaming algorithms.
Lower Bounds on Frequency Estimation of Data Streams (Extended Abstract)
Sumit Ganguly
- 07 Jun 2008
TL;DR: A lower bound is shown on the space, in bits, required by any deterministic algorithm for this problem, which is a basic problem in the general data streaming model.
27
Lower bounds on frequency estimation of data streams
Sumit Ganguly
- 07 Jun 2008
TL;DR: In this article, it was shown that any deterministic algorithm for this problem requires space $\Omega\left(\epsilon^{-2} (\log \|f\|_1)(\log n) \log^{-1}(\epsilon^{-1})\right)$ bits.