Open Access Journal Article
Implementation Issues of A Cloud Computing Platform.
Bo Peng,Bin Cui,Xiaoming Li +2 more
TL;DR: This paper designs a GFS-compatible file system with variable chunk size to facilitate massive data processing, and introduces implementation enhancements to MapReduce to improve system throughput.
Abstract: Cloud computing is Internet-based system development in which large, scalable computing resources are provided “as a service” over the Internet to users. The concept of cloud computing incorporates web infrastructure, software as a service (SaaS), Web 2.0 and other emerging technologies, and has attracted growing attention from industry and the research community. In this paper, we describe our experience and the lessons learnt in constructing a cloud computing platform. Specifically, we design a GFS-compatible file system with variable chunk size to facilitate massive data processing, and introduce some implementation enhancements to MapReduce to improve system throughput. We also discuss some practical issues in system implementation. In association with the China web archive (Web InfoMall), which we have been accumulating since 2001 and which now contains over three billion Chinese web pages, this paper presents our attempt to implement a platform for a domain-specific cloud computing service, with large-scale web text mining as the targeted application. We hope that researchers besides ourselves will benefit from the cloud when it is ready.
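For readers unfamiliar with the MapReduce model that the abstract refers to, the following is a minimal single-process sketch of a word count, the kind of job that web text mining builds on. It is not the authors' implementation: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and a real platform would distribute the map and reduce phases across many machines and read its input from the distributed file system.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map phase: emit a (word, 1) pair for every word on a page."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(documents, map_fn, reduce_fn):
    """Run map, group intermediate values by key, then reduce.

    Everything runs in one process here (an assumption for illustration);
    the grouping step stands in for the shuffle between the two phases.
    """
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    pages = {"p1": "cloud web cloud", "p2": "web mining"}
    print(run_mapreduce(pages, map_fn, reduce_fn))
```

Because the map and reduce functions are independent of how the data is partitioned, the same two functions could in principle run unchanged over a much larger page collection once a distributed runtime replaces `run_mapreduce`.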
Citations
A Survey on Open-source Cloud Computing Solutions
Patricia Takako Endo,Glauco Estácio Gonçalves,Judith Kelner,Djamel Sadok +3 more
- 01 Jan 2010
TL;DR: The state of the art of open-source solutions for cloud computing is presented, and the authors hope that the observation and classification of such solutions can benefit the cloud computing research area by providing a good starting point to cope with some of the problems present in cloud computing environments.
Cloud Databases: A Paradigm Shift in Databases
Indu Arora,Anu Gupta +1 more
- 01 Jan 2012
TL;DR: The state of the art in the cloud databases and various architectures is reviewed, the challenges to develop cloud databases that meet the user requirements are assessed and popularly used Cloud databases such as Big Table, Sherpa and SimpleDB are discussed.
Journal Article
Elliptic Curve Cryptography for Securing Cloud Computing Applications
TL;DR: An Elliptic Curve Cryptography scheme is proposed as a secure tool to model a secured platform for the cloud application.
Patent
Debugging a map reduce application on a cluster
Mikhail Berlyant,Daniel Stephen Rule,Christopher Edward Miller,Cynthia Lok +3 more
- 14 Sep 2010
TL;DR: In this paper, a map-reduce framework is installed on a cluster of two or more computers by installing an integrated development environment [IDE] onto each computer and data is placed into the cluster.
Different Job Scheduling Methodologies for Web Application and Web Server in a Cloud Computing Environment
Praveen Gupta,Nitin Rakesh +1 more
- 19 Nov 2010
TL;DR: This paper covers the methodologies used to handle processes and jobs that are concurrently executing or waiting in a web application and web server hosted on the same system or on different systems.
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
The Google file system
Sanjay Ghemawat,Howard Gobioff,Shun-Tak Albert Leung +2 more
- 19 Oct 2003
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Bigtable: A Distributed Storage System for Structured Data
Fay W. Chang,Jeffrey Dean,Sanjay Ghemawat,Wilson C. Hsieh,Deborah A. Wallach,Michael Burrows,Tushar Deepak Chandra,Andrew Fikes,Robert E. Gruber +8 more
TL;DR: The simple data model provided by Bigtable is described, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable are described.
Improving MapReduce performance in heterogeneous environments
Matei Zaharia,Andy Konwinski,Anthony D. Joseph,Randy H. Katz,Ion Stoica +4 more
- 08 Dec 2008
TL;DR: A new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
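The core idea behind LATE, named in the last reference above, can be sketched briefly: estimate each running task's time to completion from its progress so far, and speculatively re-execute the task expected to finish last. The sketch below is a simplification under stated assumptions: the class `RunningTask` and both function names are hypothetical, and the caps and thresholds that the published algorithm applies before launching a speculative copy are omitted.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    task_id: str
    progress_score: float   # fraction of work completed, between 0 and 1
    elapsed_seconds: float  # how long the task has been running


def estimated_time_left(task):
    """Estimate remaining time as (1 - progress) / progress rate.

    The progress rate is progress_score / elapsed_seconds. A task with no
    measurable progress gets an infinite estimate (a choice made for this
    sketch).
    """
    if task.progress_score <= 0 or task.elapsed_seconds <= 0:
        return float("inf")
    progress_rate = task.progress_score / task.elapsed_seconds
    return (1.0 - task.progress_score) / progress_rate


def pick_speculation_candidate(tasks):
    """Return the unfinished task with the longest estimated time to end."""
    unfinished = [t for t in tasks if t.progress_score < 1.0]
    if not unfinished:
        return None
    return max(unfinished, key=estimated_time_left)


if __name__ == "__main__":
    tasks = [
        RunningTask("a", 0.9, 90.0),
        RunningTask("b", 0.2, 100.0),
        RunningTask("c", 0.5, 50.0),
    ]
    print(pick_speculation_candidate(tasks).task_id)
```

Ranking by estimated time left rather than by raw progress matters on heterogeneous clusters: a task that started late may show low progress and still be on track, while a task on a slow node shows a low progress rate and therefore a long estimated finish.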