Proceedings Article, DOI: 10.1109/I-SPAN.2012.9
Big Data Processing in Cloud Computing Environments
Changqing Ji,Yu Li,Wenming Qiu,Uchechukwu Awada,Keqiu Li +4 more
- 13 Dec 2012
- pp 17-23
306
TL;DR: This paper presents the key issues of big data processing, including the cloud computing platform, cloud architecture, cloud databases, and data storage schemes, and introduces MapReduce optimization strategies and applications reported in the literature.
Abstract: With the rapid growth of emerging applications such as social network analysis, semantic Web analysis, and bioinformatics network analysis, the variety of data to be processed continues to grow quickly. Effective management and analysis of large-scale data pose an interesting but critical challenge. Recently, big data has attracted considerable attention from academia, industry, and government. This paper introduces several big data processing techniques from the system and application perspectives. First, from the viewpoint of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the cloud computing platform, cloud architecture, cloud databases, and data storage schemes. Following the MapReduce parallel processing framework, we then introduce MapReduce optimization strategies and applications reported in the literature. Finally, we discuss open issues and challenges and explore future research directions for big data processing in cloud computing environments.
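For readers unfamiliar with the MapReduce framework named in the abstract, the following is a minimal single-process sketch of its map, shuffle, and reduce phases, using word counting as the task. It is an illustration only and is not taken from the paper: the names run_mapreduce, map_words, and reduce_counts are hypothetical, and a real deployment would distribute both phases across a cluster and handle failures.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce phase: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(
    inputs: dict[str, str],
    mapper: Callable[[str, str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    """Run mapper over every input, group values by key, then reduce each group."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            # Shuffle step: collect all values that share an intermediate key.
            groups[out_key].append(out_value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


# Hypothetical input documents, used only to exercise the sketch.
docs = {"d1": "big data in the cloud", "d2": "cloud data storage"}
word_counts = run_mapreduce(docs, map_words, reduce_counts)
```

Because the mapper and reducer see only one record or one key group at a time, a runtime is free to run many of them in parallel, which is the property the framework relies on.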
Citations
The rise of big data on cloud computing
Ibrahim Abaker Targio Hashem,Ibrar Yaqoob,Nor Badrul Anuar,Salimah Binti Mokhtar,Abdullah Gani,Samee U. Khan +5 more
TL;DR: The definition, characteristics, and classification of big data along with some discussions on cloud computing are introduced, and research challenges are investigated, with focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.
2.6K
Applications of big data to smart cities
TL;DR: The review reveals that several opportunities are available for utilizing big data in smart cities; however, there are still many issues and challenges to be addressed to achieve better utilization of this technology.
Big Data and cloud computing: innovation opportunities and challenges
TL;DR: This review introduces future innovations and a research agenda for cloud computing supporting the transformation of the volume, velocity, variety and veracity into values of Big Data for local to global digital earth science and applications.
774
The IoT for smart sustainable cities of the future: An analytical framework for sensor-based big data applications for environmental sustainability
TL;DR: This paper proposes a framework which brings together a large number of previous studies on smart cities and sustainable cities, including research directed at a more conceptual, analytical, and overarching level, as well as research on specific technologies and their novel applications to add additional depth to studies in the field of smart sustainable cities.
671
Internet of Things (IoT) and the Energy Sector
TL;DR: The existing literature on the application of IoT in energy systems in general, and in smart grids in particular, is reviewed, along with the challenges of deploying IoT in the energy sector, including privacy and security.
632
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
The Google file system
Sanjay Ghemawat,Howard Gobioff,Shun-Tak Albert Leung +2 more
- 19 Oct 2003
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Dynamo: Amazon's highly available key-value store
Giuseppe deCandia,Deniz Hastorun,Madan Mohan Rao Jampani,Gunavardhan Kakulapati,Avinash Lakshman,Alex Pilchin,Swaminathan Sivasubramanian,Peter Sven Vosshall,Werner Vogels +8 more
- 14 Oct 2007
TL;DR: Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
Pregel: a system for large-scale graph processing
Grzegorz Malewicz,Matthew H. Austern,Aart J. C. Bik,James C. Dehnert,Ilan Horn,Naty Leiser,Grzegorz Czajkowski +6 more
- 06 Jun 2010
TL;DR: This paper presents a model for processing large graphs that has been designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers. Its implied synchronicity makes reasoning about programs easier.
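The synchronous, vertex-centric style described in the Pregel entry can be illustrated with a small single-process sketch that spreads the maximum value through a graph in lockstep rounds (supersteps). This is an assumption-laden toy and not the Pregel API: the function pregel_max and its data layout are hypothetical, every neighbor is assumed to appear as a key in the graph, and a real system would partition vertices across machines.

```python
def pregel_max(graph: dict[str, list[str]], values: dict[str, int]) -> dict[str, int]:
    """Propagate the maximum vertex value through the graph in synchronous supersteps."""
    values = dict(values)  # work on a copy so the caller's data is untouched

    # Superstep 0: every vertex sends its current value along its outgoing edges.
    inbox: dict[str, list[int]] = {v: [] for v in graph}
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(values[vertex])

    # Later supersteps: a vertex that learns a larger value updates itself and
    # notifies its neighbors; otherwise it stays silent (votes to halt).
    while any(inbox.values()):
        next_inbox: dict[str, list[int]] = {v: [] for v in graph}
        for vertex, messages in inbox.items():
            if messages and max(messages) > values[vertex]:
                values[vertex] = max(messages)
                for neighbor in graph[vertex]:
                    next_inbox[neighbor].append(values[vertex])
        # Messages sent in this superstep become visible only in the next one.
        inbox = next_inbox
    return values


# Hypothetical four-vertex chain a-b-c-d, with each edge stored in both directions.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = pregel_max(chain, {"a": 3, "b": 6, "c": 2, "d": 1})
```

Since messages are delivered only at superstep boundaries, each vertex can be reasoned about in isolation within a round, which is the sense in which synchronicity simplifies reasoning about programs.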