Proceedings Article, DOI: 10.1109/ICDE.2012.120
M3: Stream Processing on Main-Memory MapReduce
Ahmed M. Aly, Asmaa Sallam, Bala M. Gnanasekaran, Long-Van Nguyen-Dinh, Walid G. Aref, Mourad Ouzzani, Arif Ghafoor
- 01 Apr 2012
- pp 1253-1256
TL;DR: M3 extends Hadoop, the open-source implementation of MapReduce, bypassing the Hadoop Distributed File System (HDFS) to support main-memory-only processing, and supports continuous execution of the Map and Reduce phases, in which individual Mappers and Reducers never terminate.
Abstract: The continuous growth of social web applications, along with the development of sensor capabilities in electronic devices, is creating countless opportunities to analyze the enormous amounts of data that are continuously streaming from these applications and devices. To process large-scale data on large-scale computing clusters, MapReduce has been introduced as a framework for parallel computing. However, most current implementations of the MapReduce framework support only the execution of fixed-input jobs. Such a restriction makes these implementations inapplicable to most streaming applications, in which queries are continuous in nature and input data streams are continuously received at high arrival rates. In this demonstration, we showcase M$^3$, a prototype implementation of the MapReduce framework in which continuous queries over streams of data can be efficiently answered. M$^3$ extends Hadoop, the open-source implementation of MapReduce, bypassing the Hadoop Distributed File System (HDFS) to support main-memory-only processing. Moreover, M$^3$ supports continuous execution of the Map and Reduce phases, in which individual Mappers and Reducers never terminate.
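The idea of long-running Mappers and Reducers that exchange data only through main memory can be illustrated with a small sketch. This is not M$^3$'s implementation and uses none of Hadoop's APIs: the names `mapper`, `reducer`, `run_demo`, and `STOP` are hypothetical, the word-count query is an assumed example, and Python threads with in-memory queues stand in for cluster tasks. The `STOP` sentinel exists only so the demonstration can finish; in the system described above the operators would keep running.

```python
import queue
import threading
from collections import Counter

STOP = object()  # demo-only sentinel (assumption); M3 operators never terminate


def mapper(source: queue.Queue, shuffle: queue.Queue) -> None:
    """Long-running map task: consume records as they arrive, emit (word, 1)."""
    while True:
        record = source.get()        # blocks until the stream delivers a record
        if record is STOP:
            shuffle.put(STOP)        # forward shutdown so the demo can end
            return
        for word in record.split():
            shuffle.put((word, 1))   # handed over in memory, no file system


def reducer(shuffle: queue.Queue, state: Counter) -> None:
    """Long-running reduce task: fold each pair into in-memory running state."""
    while True:
        item = shuffle.get()
        if item is STOP:
            return
        key, value = item
        state[key] += value          # the continuously updated query answer


def run_demo(records):
    """Feed a finite list through the pipeline as a stand-in for a live stream."""
    source, shuffle, state = queue.Queue(), queue.Queue(), Counter()
    workers = [
        threading.Thread(target=mapper, args=(source, shuffle)),
        threading.Thread(target=reducer, args=(shuffle, state)),
    ]
    for worker in workers:
        worker.start()
    for record in records:
        source.put(record)
    source.put(STOP)
    for worker in workers:
        worker.join()
    return dict(state)
```

Calling `run_demo(["a b a", "b c"])` pushes two records through the pipeline and returns the running word counts held by the reducer. The point of the sketch is the control flow: neither stage waits for the input to be complete, and intermediate pairs never touch disk.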
Citations
Lambda architecture for cost-effective batch and speed big data processing
Mariam Kiran, Peter Murphy, Inder Monga, Jon Dugan, Sartaj Singh Baveja
- 29 Oct 2015
TL;DR: An implementation of the lambda architecture design pattern is presented to construct a data-handling backend on Amazon EC2, delivering high-throughput, data-intensive processing as services while minimizing the cost of network maintenance.
T-Storm: Traffic-Aware Online Scheduling in Storm
Jielong Xu, Zhenhua Chen, Jian Tang, Sen Su
- 30 Jun 2014
TL;DR: T-Storm is a new stream data processing system based on Storm that accelerates data processing through traffic-aware scheduling, assigning and re-assigning tasks dynamically to minimize inter-node and inter-process traffic.
FELIX: fast and energy-efficient logic in memory
Saransh Gupta, Mohsen Imani, Tajana Rosing
- 05 Nov 2018
TL;DR: This paper proposes an in-memory implementation of fast and energy-efficient logic (FELIX) which combines the functionality of PIM with memories and is the first PIM logic to enable the single cycle NOR, NOT, NAND, minority, and OR directly in crossbar memory.
CEPSim: Modelling and Simulation of Complex Event Processing Systems in Cloud Environments
TL;DR: CEPSim is highly customizable and can be used to analyse the performance and scalability of user-defined queries and to evaluate the effects of various query processing strategies, as well as simulate existing systems in large Big Data scenarios with accuracy and precision.
Model-free control for distributed stream data processing using deep reinforcement learning
Teng Li, Zhiyuan Xu, Jian Tang, Yanzhi Wang
- 01 Feb 2018
TL;DR: In this paper, the authors focus on general-purpose Distributed Stream Data Processing Systems (DSDPSs), which process unbounded streams of continuous data at scale, in a distributed fashion, in real or near-real time.
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Hive: a warehousing solution over a map-reduce framework
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, Raghotham Murthy
- 01 Aug 2009
TL;DR: Hadoop is a popular open-source map-reduce implementation which is being used as an alternative to store and process extremely large data sets on commodity hardware.
MapReduce online
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy, Russell Sears
- 28 Apr 2010
TL;DR: A modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed, and can reduce completion times and improve system utilization for batch jobs as well.
A comparison of join algorithms for log processing in MapReduce
Spyros Blanas, Jignesh M. Patel, Vuk Ercegovac, Jun Rao, Eugene J. Shekita, Yuanyuan Tian
- 06 Jun 2010
TL;DR: Key implementation details of a number of well-known join strategies in MapReduce are described and a comprehensive experimental comparison of these join techniques on a 100-node Hadoop cluster is presented.