Benchmarking MapReduce Implementations for Application Usage Scenarios

doi:10.1109/GRID.2011.21

Open Access•Proceedings Article•10.1109/GRID.2011.21

Zacharia Fadika, +3 more

- 21 Sep 2011

- pp 90-97

34

TL;DR: A standard benchmark suite for quantifying, comparing, and contrasting the performance of MapReduce platforms under a wide range of representative use cases is presented, allowing Grid researchers to choose the MapReduce implementation that best suits their application's needs.

Abstract: The MapReduce paradigm provides a scalable model for large-scale data-intensive computing and associated fault tolerance. With data production increasing daily due to ever-growing application needs, scientific endeavors, and consumption, the MapReduce model and its implementations need to be further evaluated, improved, and strengthened. Several MapReduce frameworks with various degrees of conformance to the key tenets of the model are available today, each optimized for specific features. HPC application and middleware developers must thus understand the complex dependencies between MapReduce features and their application. We present a standard benchmark suite for quantifying, comparing, and contrasting the performance of MapReduce platforms under a wide range of representative use cases. We report the performance of three different MapReduce implementations on the benchmarks, and draw conclusions about their current performance characteristics. The three platforms we chose for evaluation are the widely used Apache Hadoop implementation; Twister, which has been discussed in the literature; and LEMO-MR, our own implementation. The performance analysis also sheds light on the available design decisions for future implementations, and allows Grid researchers to choose the MapReduce implementation that best suits their application's needs.

Citations

Journal Article•10.1016/J.JNCA.2017.04.007

Load-balancing algorithms in cloud computing

Einollah Jafarnejad Ghomi, +2 more

- 15 Jun 2017

- Journal of Network and Computer Applicat...

TL;DR: This paper studies the literature on task scheduling and load-balancing algorithms and presents a new classification of such algorithms, for example, a Hadoop MapReduce load balancing category, a natural phenomena-based load balancing category, an agent-based load balancing category, a general load balancing category, an application-oriented category, a network-aware category, and a workflow-specific category.

407

Proceedings Article

Proceedings of the nineteenth ACM symposium on Operating systems principles

Michael L. Scott, +1 more

- 19 Oct 2003

TL;DR: This year's SOSP features a strong and varied technical program, showcasing the most significant and influential work in operating systems today, and encouraged student attendance through reduced registration fees and a strong program of financial scholarships.

130

Proceedings Article•10.1109/BIGDATA.2013.6691583

Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures

Austin R. Benson, +2 more

- 23 Dec 2013

TL;DR: This paper describes how to compute a stable tall-and-skinny QR factorization on a MapReduce architecture in only slightly more than 2 passes over the data, and finds that the new stable method is competitive with unstable methods for matrices with a modest number of columns.

71

Proceedings Article•10.1109/CLOUD.2013.31

An Evaluation of Cassandra for Hadoop

Elif Dede, +4 more

- 28 Jun 2013

TL;DR: This paper presents a thorough evaluation of the Cassandra NoSQL database when used in conjunction with the Hadoop MapReduce engine, characterizes the performance for a wide range of representative use cases, and then compares, contrasts, and evaluates.

64

Proceedings Article•10.1109/CCGRID.2012.135

MARLA: MapReduce for Heterogeneous Clusters

Zacharia Fadika, +3 more

- 13 May 2012

TL;DR: This paper addresses the problems that existing MapReduce implementations face on heterogeneous clusters, and subsequently presents MARLA, a MapReduce framework capable of performing well not only in homogeneous settings, but also when the cluster exhibits heterogeneous properties.

53

References

Journal Article•10.21276/IJRE.2018.5.5.4

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 06 Dec 2004

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.

22.7K

Journal Article•10.1145/1327452.1327492

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 01 Jan 2008

- Communications of The ACM

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

18.6K

Journal Article•10.1145/1165389.945450

The Google file system

Sanjay Ghemawat, +2 more

- 19 Oct 2003

TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.

6.3K

Proceedings Article•10.1109/MSST.2010.5496972

The Hadoop Distributed File System

Konstantin Shvachko, +3 more

- 03 May 2010

TL;DR: The architecture of HDFS is described, and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported.

5.8K

Proceedings Article•10.5555/1855741.1855744

Improving MapReduce performance in heterogeneous environments

Matei Zaharia, +4 more

- 08 Dec 2008

TL;DR: A new scheduling algorithm, Longest Approximate Time to End (LATE), is presented that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.

1.9K

Related Papers (5)

MapReduce: simplified data processing on large clusters

Benchmarking MapReduce implementations under different application scenarios

The HiBench benchmark suite: Characterization of the MapReduce-based data analysis

The survey on mapreduce

The Hadoop Distributed File System