Comet: batched stream processing for data intensive distributed computing

doi:10.1145/1807128.1807139

Proceedings Article•10.1145/1807128.1807139

Bingsheng He, +6 more

- 10 Jun 2010

- pp 63-74

156

TL;DR: Comet is a query processing system that embraces batched stream processing and integrates with DryadLINQ; applied to a real production trace covering over 19 million machine-hours, it shows an estimated I/O saving of over 50%.

Citations

•Proceedings Article

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

Matei Zaharia, +8 more

- 25 Apr 2012

TL;DR: Resilient Distributed Datasets is presented, a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner and is implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.

4.6K

•Proceedings Article•10.5555/1972457.1972488

Mesos: a platform for fine-grained resource sharing in the data center

Benjamin Hindman, +7 more

- 30 Mar 2011

TL;DR: The results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.

1.9K

•Proceedings Article•10.1145/2517349.2522737

Discretized streams: fault-tolerant streaming computation at scale

Matei Zaharia, +5 more

- 03 Nov 2013

TL;DR: D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes and tolerates stragglers. They can also be composed easily with batch and interactive query models like MapReduce, enabling rich applications that combine these modes.

1.1K

•Proceedings Article

Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters

Matei Zaharia, +4 more

- 12 Jun 2012

TL;DR: D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup solutions in streaming databases: parallel recovery of lost state across the cluster.

551

•Journal Article•10.1016/J.JNCA.2017.12.001

Distributed data stream processing and edge computing

Marcos Dias de Assunção, +2 more

- 01 Feb 2018

- Journal of Network and Computer Applicat...

TL;DR: This work describes how existing solutions exploit resource elasticity features of cloud computing in stream processing and presents a gap analysis and future directions on stream processing on heterogeneous environments.

323

References

•Journal Article•10.21276/IJRE.2018.5.5.4

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 06 Dec 2004

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.

22.7K

•Journal Article•10.1145/1327452.1327492

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 01 Jan 2008

- Communications of The ACM

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

18.6K

•Journal Article•10.1145/1165389.945450

The Google file system

Sanjay Ghemawat, +2 more

- 19 Oct 2003

TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.

6.3K

•Proceedings Article•10.1145/543613.543615

Models and issues in data stream systems

Brian Babcock, +4 more

- 03 Jun 2002

TL;DR: This paper motivates the need for, and the research issues arising from, a new model of data processing in which data does not take the form of persistent relations but instead arrives in multiple, continuous, rapid, time-varying data streams.

3K

•Proceedings Article•10.1145/1272996.1273005

Dryad: distributed data-parallel programs from sequential building blocks

Michael Isard, +4 more

- 21 Mar 2007

TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.

3K

Related Papers (5)

MapReduce: simplified data processing on large clusters

Dryad: distributed data-parallel programs from sequential building blocks

MapReduce online

Discretized streams: fault-tolerant streaming computation at scale

The Google file system