Open Access | Posted Content
A Frequency Scaling based Performance Indicator Framework for Big Data Systems
TL;DR: A novel indicator framework that allows the impact of different indicators to be compared directly is proposed to identify and analyze performance bottlenecks efficiently, together with a methodology that constructs each indicator from the change in performance under CPU frequency scaling.
Abstract: Identifying the performance bottleneck is important for big data systems. However, popular indicators such as resource utilization are often misleading and cannot be compared with each other. This paper proposes a novel indicator framework in which the impact of different indicators can be compared directly, so that the performance bottleneck can be identified and analyzed efficiently. It also describes a methodology for constructing each indicator from the change in performance under CPU frequency scaling. Spark is used as an example big data system, and two typical SQL benchmarks serve as workloads to evaluate the proposed method. Experimental results show that the method is more accurate than the resource utilization method and easier to implement than white-box methods. The analysis based on these indicators also leads to some interesting findings and valuable performance optimization suggestions for big data systems.
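The abstract does not give the indicator's formula, so the following is only a minimal sketch of the general idea of deriving a bottleneck indicator from frequency scaling. It assumes a simple two-part runtime model, T(f) = T_cpu * (f_base / f) + T_other, which is an illustrative assumption and not the paper's definition. The function name `cpu_bound_fraction` and its parameters are hypothetical.

```python
def cpu_bound_fraction(t_base, t_scaled, f_base, f_scaled):
    """Estimate the share of baseline runtime that scales with CPU frequency.

    ASSUMED model (not taken from the paper):
        T(f) = T_cpu * (f_base / f) + T_other
    so that T(f_base) = T_cpu + T_other, and
        T_cpu / T(f_base) = (T(f_scaled) / T(f_base) - 1) / (f_base / f_scaled - 1)

    t_base, t_scaled: measured runtimes (same unit) at f_base and f_scaled.
    f_base, f_scaled: CPU frequencies (same unit), must differ and be positive.
    """
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("runtimes must be positive")
    if f_base <= 0 or f_scaled <= 0:
        raise ValueError("frequencies must be positive")
    if f_base == f_scaled:
        raise ValueError("frequencies must differ to measure a change")

    relative_slowdown = t_scaled / t_base - 1.0   # observed change in runtime
    relative_freq_change = f_base / f_scaled - 1.0  # change predicted if fully CPU-bound
    return relative_slowdown / relative_freq_change


if __name__ == "__main__":
    # Hypothetical measurements: 100 s at 2.4 GHz, 150 s at 1.2 GHz.
    print(cpu_bound_fraction(100.0, 150.0, 2.4, 1.2))
```

Under this assumed model, a value near 1 would suggest the workload is almost entirely CPU-bound, and a value near 0 would suggest the time is dominated by resources that do not scale with CPU frequency, such as disk or network.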