Journal Article, DOI: 10.1007/S10766-017-0513-2
Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem
83
TL;DR: Results show that the proposed system, which runs Spark on top and GPUs underneath within the parallel and distributed environment of the Hadoop ecosystem, is more efficient and closer to real time than the existing standalone CPU-based MapReduce implementation.
Abstract: In this technological era, people, authorities, entrepreneurs, businesses, and many of the things around us are connected to the internet, forming the Internet of Things (IoT). This generates a massive amount of diverse data at very high speed, termed Big Data. This data is a valuable asset that businesses, organizations, and authorities can use to predict the future in various respects. However, efficiently processing Big Data while making real-time decisions is a challenging task. Tools such as Hadoop are used for processing big datasets, but they do not perform well for real-time, high-speed stream processing. Therefore, in this paper, we propose an efficient, real-time Big Data stream processing approach that maps a Hadoop MapReduce-equivalent mechanism onto graphics processing units (GPUs). We integrate the parallel and distributed environment of the Hadoop ecosystem and a real-time stream processing tool, i.e., Spark, with GPUs so that the system can handle an overwhelming amount of high-speed streaming data. We design a MapReduce-equivalent algorithm for GPUs that calculates statistical parameters by dividing the overall Big Data files into fixed-size blocks. Finally, the system is evaluated for efficiency (processing time and throughput) using (1) large city traffic video data captured by the cameras of static as well as moving vehicles, with the task of identifying vehicles, and (2) large text-based files, such as Twitter data files, structural data, etc. Results show that the proposed system, which runs Spark on top and GPUs underneath within the parallel and distributed environment of the Hadoop ecosystem, is more efficient and closer to real time than the existing standalone CPU-based MapReduce implementation.
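The abstract describes a MapReduce-equivalent calculation of statistical parameters over fixed-size blocks. The sketch below illustrates that general pattern only. It is not the authors' algorithm and it runs on the CPU, not on a GPU. The names `PartialStats`, `split_into_blocks`, `map_block`, `combine`, and `blockwise_mean_variance` are hypothetical, as is the choice of mean and variance as the statistics and the block size used in the example.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class PartialStats:
    """Per-block summary that can be merged with other blocks (hypothetical type)."""
    count: int
    total: float
    total_sq: float


def split_into_blocks(values: List[float], block_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(values), block_size):
        yield values[start:start + block_size]


def map_block(block: List[float]) -> PartialStats:
    """'Map' step: summarize one block independently of all others."""
    return PartialStats(len(block), sum(block), sum(x * x for x in block))


def combine(a: PartialStats, b: PartialStats) -> PartialStats:
    """'Reduce' step: merging is associative, so blocks can be combined in any order."""
    return PartialStats(a.count + b.count, a.total + b.total, a.total_sq + b.total_sq)


def blockwise_mean_variance(values: List[float], block_size: int) -> Tuple[float, float]:
    """Compute the mean and population variance from per-block partial sums."""
    partials = map(map_block, split_into_blocks(values, block_size))
    merged = reduce(combine, partials, PartialStats(0, 0.0, 0.0))
    if merged.count == 0:
        raise ValueError("no data")
    mean = merged.total / merged.count
    variance = merged.total_sq / merged.count - mean * mean
    return mean, variance


if __name__ == "__main__":
    # Block size 3 is an arbitrary illustrative choice.
    print(blockwise_mean_variance([1, 2, 3, 4, 5, 6, 7, 8], block_size=3))
```

Because each block is summarized independently and the merge step is associative, the per-block work could in principle be distributed across many workers or GPU threads. How the paper actually does this is not specified in the abstract.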
Citations
Exploiting IoT and big data analytics: Defining Smart Digital City using real-time urban data
TL;DR: This paper has established an IoT-based Smart City by using Big Data analytics while harvesting real-time data from the city by using existing smart systems and IoT devices as city data sources to develop the Smart Digital City.
339
Smart campus-A sketch.
Nasro Min-Allah, Saleh Alrashed
TL;DR: A list of smart campus initiatives that can be prioritized as per a university needs and geographical location is created and the generic model established in this work for a smart campus remains valid.
126
A survey on graphic processing unit computing for large‐scale data mining
TL;DR: This survey analyzes current trends in the use of GPU computing for large‐scale data mining, discusses GPU architecture advantages for handling volume and velocity of data, identifies limitation factors hampering the scalability of the problems, and discusses open issues and future directions.
98
A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench
TL;DR: Spark has better performance as compared to Hadoop when data sets are small, achieving up to two times speedup in WordCount workloads and up to 14 times in TeraSort workloads when default parameter values are reconfigured.
Key performance indicators for Smart Campus and Microgrid
TL;DR: The aim of this work is to establish a mechanism that allows campus management to monitor the smartness of their university campus in general, and microgrid in particular.
60
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
•Book
Lapack Users' Guide
Ed Anderson
- 01 Feb 1995
TL;DR: The third edition of LAPACK provided a guide to troubleshooting and installation of Routines, as well as providing examples of how to convert from LINPACK or EISPACK to BLAS.
3.2K
A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems
Tracy D. Braun, Howard Jay Siegel, N.B. Beck, Ladislau Bölöni, Muthucumaru Maheswaran, Albert Reuther, James Patrick Robertson, Mitchell D. Theys, Bin Yao, Debra Hensgen, Richard F. Freund
TL;DR: It is shown that, for the cases studied here, the relatively simple Min-min heuristic performs well in comparison to the other techniques; the study also provides a basis for comparison and insights into circumstances where one technique will outperform another.
1.9K
GPU Computing
John D. Owens, Mike Houston, David Luebke, Simon Green, John E. Stone, James C. Phillips
- 01 May 2008
TL;DR: The background, hardware, and programming model for GPU computing is described, the state of the art in tools and techniques are summarized, and four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications are presented.
1.7K