Proceedings Article, DOI: 10.1145/3210284.3219768
Iterative Scheduling for Distributed Stream Processing Systems
Leila Eskandari, Jason Mair, Zhiyi Huang, David Eyers
- 25 Jun 2018
- pp 234-237
15
TL;DR: This research proposes a heuristic scheduling algorithm that reliably and efficiently finds highly communicating tasks by exploiting graph partitioning algorithms and a mathematical optimisation software package. By finding a more efficient schedule, the scheduler outperforms R-Storm, increasing throughput by 3% to 30%, and the Online scheduler by 20% to 86%.
Abstract: Data stream processing systems need to handle large volumes of data efficiently in near real-time. To achieve this, the schedulers within such systems minimise data movement between highly communicating tasks, which improves system throughput. However, finding an optimal schedule for these systems is NP-hard. In this research, we propose a heuristic scheduling algorithm that reliably and efficiently finds the highly communicating tasks by exploiting graph partitioning algorithms and a mathematical optimisation software package. We evaluate our scheduler against two popular existing schedulers, R-Storm and Aniello et al.'s 'Online scheduler', using two real-world applications. By finding a more efficient schedule, our proposed scheduler outperforms R-Storm, increasing throughput by 3% to 30%, and the Online scheduler by 20% to 86%.
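To make the scheduling objective concrete, the following minimal Python sketch shows what "minimising data movement between highly communicating tasks" means for a small topology. It is not the paper's algorithm, which relies on graph partitioning and an optimisation package; it swaps in a simple greedy co-location heuristic for illustration only. The function names, the task names, and the traffic rates are all hypothetical.

```python
def inter_node_traffic(traffic, placement):
    """Sum the rates of all edges whose two tasks sit on different nodes."""
    return sum(rate for (a, b), rate in traffic.items()
               if placement[a] != placement[b])


def greedy_colocate(tasks, traffic, num_nodes, slots_per_node):
    """Illustrative heuristic (an assumption, not the paper's method):
    visit edges heaviest first and merge the two task groups whenever
    the merged group still fits on a single node."""
    group_of = {t: t for t in tasks}   # task -> id of its group
    members = {t: [t] for t in tasks}  # group id -> tasks in the group
    for (a, b), _rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
        ga, gb = group_of[a], group_of[b]
        if ga != gb and len(members[ga]) + len(members[gb]) <= slots_per_node:
            for t in members[gb]:
                group_of[t] = ga
            members[ga].extend(members.pop(gb))

    # Pack the groups onto nodes, largest group first.
    free = [slots_per_node] * num_nodes
    placement = {}
    for group in sorted(members.values(), key=len, reverse=True):
        fits = [n for n in range(num_nodes) if free[n] >= len(group)]
        for task in group:
            # Keep the group together when possible; otherwise spill
            # its tasks one by one onto the emptiest node.
            node = fits[0] if fits else max(range(num_nodes), key=free.__getitem__)
            if free[node] == 0:
                raise ValueError("not enough slots for all tasks")
            placement[task] = node
            free[node] -= 1
    return placement


# Hypothetical four-task pipeline; rates are made-up tuples per second.
tasks = ["spout", "parse", "count", "store"]
traffic = {("spout", "parse"): 100, ("parse", "count"): 80, ("count", "store"): 5}

round_robin = {t: i % 2 for i, t in enumerate(tasks)}
greedy = greedy_colocate(tasks, traffic, num_nodes=2, slots_per_node=2)

print(inter_node_traffic(traffic, round_robin))  # 185
print(inter_node_traffic(traffic, greedy))       # 80
```

In this toy case the round-robin placement sends every edge across the network, while co-locating the heaviest edge leaves only the 80-tuple edge crossing nodes. A greedy pass like this can get stuck in poor local choices on larger graphs, which is one reason a scheduler might turn to graph partitioning instead.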
Citations
Pipelined Dynamic Scheduling of Big Data Streams
TL;DR: This paper presents a pipeline-based dynamic modular arithmetic-based scheduler (PMOD scheduler), which can be used to re-schedule the streams distributed among a set of nodes and their tasks, when the system parameters change.
34
A3-Storm: topology-, traffic-, and resource-aware storm scheduler for heterogeneous clusters
TL;DR: A3-Storm, a scheduler based on topology and traffic, is proposed. It optimizes resource usage for heterogeneous clusters and improves efficiency through resource-aware task assignments, resulting in enhanced throughput and resource utilization.
21
EQUALITY: Quality-aware intensive analytics on the edge
TL;DR: In this article, the authors develop a solution that trades latency for an increased fraction of incoming data on which data quality-related measurements and operations are performed, in jobs running over geo-distributed, heterogeneous, and constrained resources.
8
BAN-Storm: a Bandwidth-Aware Scheduling Mechanism for Stream Jobs
Asif Muhammad, Muhammad Aleem
TL;DR: In this article, the authors propose BAN-Storm, a stream scheduler that considers inter-task communication alongside other important scheduling aspects, such as heterogeneity, when scheduling stream jobs.
6
An energy efficient and runtime-aware framework for distributed stream computing systems
TL;DR: The authors propose Er-Stream, an energy-efficient and runtime-aware framework for distributed stream computing systems, in which task pairs with high communication cost are processed on the same compute node through a lightweight task partitioning strategy, minimizing the communication cost between nodes and avoiding frequent triggering of runtime scheduling.
5
References
Multilevel k-way Partitioning Scheme for Irregular Graphs
George Karypis, Vipin Kumar
TL;DR: This paper presents and studies a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, find a k-way partitioning of the smaller graph, and then uncoarsen and refine it to construct a k-way partitioning for the original graph.
2K
Adaptive online scheduling in storm
Leonardo Aniello, Roberto Baldoni, Leonardo Querzoni
- 29 Jun 2013
TL;DR: Two advanced generic schedulers for Storm are proposed that provide improved performance for a wide range of application topologies and can produce schedules that achieve significantly better performances compared to those produced by Storm's default scheduler.
274
T-Storm: Traffic-Aware Online Scheduling in Storm
Jielong Xu, Zhenhua Chen, Jian Tang, Sen Su
- 30 Jun 2014
TL;DR: T-Storm is a new stream data processing system based on Storm that accelerates data processing through traffic-aware scheduling, assigning and re-assigning tasks dynamically to minimize inter-node and inter-process traffic.
243
R-Storm: Resource-Aware Scheduling in Storm
Boyang Peng, Mohammad Hosseini, Zhihao Hong, Reza Farivar, Roy H. Campbell
- 24 Nov 2015
TL;DR: R-Storm implements resource-aware scheduling within Storm. It can satisfy both soft and hard resource constraints while minimizing the network distance between components that communicate with each other, achieving 30-47% higher throughput and 69-350% better CPU utilization than default Storm.
Optimal operator placement for distributed stream processing applications
Valeria Cardellini, Vincenzo Grassi, Francesco Lo Presti, Matteo Nardelli
- 13 Jun 2016
TL;DR: A general formulation of the optimal DSP placement (ODP, for short) as an Integer Linear Programming problem that explicitly takes into account the heterogeneity of computing and networking resources and encompasses, as special cases, the different solutions proposed in the literature.
168