Parallel Data Processing in the Cloud using Nephele
TL;DR: Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution.
Abstract: In recent years, Infrastructure-as-a-Service (IaaS) clouds have become increasingly popular as a flexible and inexpensive platform for ad-hoc parallel data processing. Major players in cloud computing have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and deploy their programs. However, the processing frameworks currently in use were designed for static, homogeneous cluster systems and do not support the new features that distinguish the cloud platform. This paper discusses the research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. First performance results of Nephele are presented, and its efficiency is compared with that of the well-known MapReduce framework. MapReduce is chosen for comparison because it is open-source software and currently enjoys high popularity in the data processing community.
Citations
Intrusion detection techniques for mobile cloud computing in heterogeneous 5G
TL;DR: It is concluded that the implementation of mobile cloud computing can be secured by the proposed framework because it will provide well-protected Web services and adaptable IDSs in the complicated heterogeneous 5G environment.
182
Approaches for optimizing virtual machine placement and migration in cloud environments: A survey
TL;DR: This work presents a cloud computing background, a review of several proposals, a discussion of problem formulations, advantages and shortcomings of reviewed works, and provides several open issues, showing the relevancy of the topic in an increasing and demanding market.
123
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Dryad: distributed data-parallel programs from sequential building blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly
- 21 Mar 2007
TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
Pegasus: A framework for mapping complex scientific workflows onto distributed systems
Ewa Deelman, Gurmeet Singh, Mei-Hui Su, Jim Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, G. Bruce Berriman, John C. Good, Anastasia C. Laity, Joseph C. Jacob, Daniel S. Katz
TL;DR: Results are presented on improving application performance through workflow restructuring, which clusters multiple tasks in a workflow into single entities.
Condor-G: A Computation Management Agent for Multi-Institutional Grids
TL;DR: Condor-G leverages software from Globus and Condor to enable users to harness multi-domain resources as if they all belonged to one personal domain; it handles job management, resource selection, security, and fault tolerance.
848
A Dynamic Resource Allocation Method for Parallel Data Processing in Cloud Computing
TL;DR: A novel turnaround-time utility scheduling approach is proposed that considers both the high-priority and the low-priority tasks that arrive for scheduling.
32