Benchmarking cloud-based data management systems
Yingjie Shi, Xiaofeng Meng, Jing Zhao, Xiangmei Hu, Bingbing Liu, Haiping Wang
- 30 Oct 2010
- pp 47-54
TL;DR: This work conducts comprehensive experiments on several representative cloud-based data management systems to explore the relative performance of different implementation approaches; the results are valuable for further research and development of cloud-based data management systems.
Abstract: Cloud-based data management systems are emerging as a scalable, fault-tolerant and efficient solution to large-scale data management. More and more companies are moving their data management applications from expensive, high-end servers to the cloud, which is composed of cheaper commodity machines. Existing cloud-based data management systems span a wide range of implementation approaches, differing in storage architecture, data model, and the trade-offs they make between consistency and availability. Several benchmarks have been proposed to evaluate their performance. However, no reported study has analyzed these benchmark results to give users insight into how different implementation approaches affect performance. We conducted comprehensive experiments on several representative cloud-based data management systems to explore the relative performance of different implementation approaches; the results are valuable for further research and development of cloud-based data management systems.
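To make "relative performance" concrete, the following is a minimal sketch of the kind of measurement such benchmarks perform: load records into a store, drive a read/update operation mix against it, and report throughput and latency percentiles. Everything here is an assumption for illustration, not the paper's harness: `InMemoryStore` is a hypothetical stand-in for a real system's client, `run_workload` is a hypothetical helper, and the default 95/5 read/update ratio only mirrors the spirit of YCSB's read-heavy workload B.

```python
import random
import statistics
import time


class InMemoryStore:
    """Hypothetical stand-in for a cloud data store client (not a real system)."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)

    def update(self, key, value):
        self._data[key] = value


def run_workload(store, record_count=1000, op_count=10000, read_ratio=0.95, seed=42):
    """Load records, then run a read/update mix and collect per-operation latencies.

    Hypothetical helper: parameters and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    keys = [f"user{i}" for i in range(record_count)]

    # Load phase: populate the store before any measurement starts.
    for k in keys:
        store.insert(k, "x" * 100)

    # Transaction phase: timed read/update mix.
    latencies = {"read": [], "update": []}
    start = time.perf_counter()
    for _ in range(op_count):
        # Uniform key choice for simplicity; YCSB's core workloads
        # typically use a skewed (zipfian) request distribution instead.
        key = rng.choice(keys)
        op = "read" if rng.random() < read_ratio else "update"
        t0 = time.perf_counter()
        if op == "read":
            store.read(key)
        else:
            store.update(key, "y" * 100)
        latencies[op].append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    report = {"throughput_ops_per_s": op_count / elapsed}
    for op, samples in latencies.items():
        if samples:
            report[f"{op}_count"] = len(samples)
            report[f"{op}_p50_us"] = statistics.median(samples) * 1e6
            # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
            if len(samples) >= 2:
                report[f"{op}_p99_us"] = statistics.quantiles(samples, n=100)[98] * 1e6
    return report


if __name__ == "__main__":
    for name, value in run_workload(InMemoryStore()).items():
        print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

A real comparison would replace `InMemoryStore` with a client for each system under test and keep the load and transaction phases separate, as YCSB does, so that bulk-loading cost does not distort the latency figures of the measured operation mix.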
Citations
Performance evaluation of NoSQL big-data applications using multi-formalism models
TL;DR: A dedicated modeling language and an application are presented, showing first how it is possible to ease the modeling process and second how the semantic gap between modeling logic and the domain can be reduced, by means of vertical multiformalism modeling.
A Novel Triple Encryption Scheme for Hadoop-Based Cloud Data Security
Chao Yang, Weiwei Lin, Mingqi Liu
- 09 Sep 2013
TL;DR: A novel triple encryption scheme is proposed that encrypts HDFS files with DEA, encrypts the data key with RSA, and then encrypts the user's RSA private key with IDEA.
Design Patterns to Enable Data Portability between Clouds' Databases
Mahdi Negahi Shirazi, Ho Chin Kuan, Hossein Dolatabadi
- 18 Jun 2012
TL;DR: Design patterns are proposed to enable data portability between column-family databases and graph databases used as cloud databases.
Real-Time Processing of Big Data Streams: Lifecycle, Tools, Tasks, and Challenges
Fatih Gurcan, Muhammet Berigel
- 01 Oct 2018
TL;DR: A lifecycle for the real-time big data processing is defined by associating existing tools, tasks, and frameworks with the phases of the lifecycle, which include data ingestion, data storage, stream processing, analytical data store, and analysis and reporting.
Performance Evaluation of Range Queries in Key Value Stores
Pouria Pirzadeh, Junichi Tatemura, Oliver Po, Hakan Hacigumus
- 01 Mar 2012
TL;DR: This paper compares Cassandra, HBase and Voldemort in terms of their support for different types of query workloads, mainly focused on the range queries, and shows that there are trade-offs in the performance of the selected system and scheme, and the types of the queries that can be processed efficiently.
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
- 06 Dec 2004
TL;DR: MapReduce is a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.
The Google file system
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Albert Leung
- 19 Oct 2003
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Benchmarking cloud serving systems with YCSB
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears
- 10 Jun 2010
TL;DR: This work presents the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems, and defines a core set of benchmarks and reports results for four widely used systems.
Bigtable: a distributed storage system for structured data
Fay W. Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Deepak Chandra, Andrew Fikes, Robert E. Gruber
- 06 Nov 2006
TL;DR: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many Google projects store data in Bigtable, including web indexing, Google Earth and Google Finance.