ML-Based Dynamic Operator-Level Query Mapping for Stream Processing Systems in Heterogeneous Computing Environments

doi:10.1109/cluster59578.2024.00027

Seung-Hwan Oh, +2 more

- 24 Sep 2024

pp 226-237

TL;DR: DynO, a stream processing system, uses a tree-based machine learning algorithm to dynamically map queries to devices at the operator level. It predicts execution times to choose a mapping plan and hides the planning overhead by using prefetching and GPU idle periods.

Abstract: Mapping queries to optimal computing devices at the operator level presents a significant challenge in stream processing systems (SPS) with heterogeneous computing resources. Inefficient query mapping can degrade the performance of the SPS. To address this issue, existing approaches employ static methods, such as mapping all queries to either CPUs or GPUs, or maintaining static mapping tables for queries or operators based on their predetermined device preferences. However, a static mapping scheme fails to provide an optimal solution, because the device preference of each query operator changes dynamically at runtime. In this paper, we propose DynO, a high-performance SPS that dynamically maps queries to devices at the operator level using a tree-based machine learning algorithm. To determine an optimized device mapping plan for query operators, DynO employs a tree-based gradient boosting model to predict the execution time of all potential mapping plan combinations. DynO also introduces a novel turn-based updating scheme to maximize stream processing performance while the tree-based gradient boosting model is being trained. Additionally, we devise an efficient device mapping scheme that expedites the search for the optimal device mapping plan by leveraging a directed acyclic graph (DAG) shortest path algorithm. DynO completely hides the overhead of the extra computation needed to find the optimal plan by utilizing prefetching and GPU idle periods. Experimental results using a variety of queries and traffic patterns show that DynO outperforms existing state-of-the-art approaches by ensuring high throughput, low latency, and high efficiency.
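The abstract does not give the details of the DAG shortest path formulation, so the following is only a minimal sketch of the general idea under stated assumptions: the query is a linear chain of operators, each operator runs on either a CPU or a GPU, and switching devices between consecutive operators adds a fixed transfer cost. The function name `best_mapping`, the `transfer_cost` parameter, and the hard-coded times are all hypothetical; in DynO the per-operator times would come from the gradient boosting predictor rather than from constants.

```python
DEVICES = ("cpu", "gpu")


def best_mapping(exec_cost, transfer_cost):
    """Return (total_time, plan) for a linear chain of operators.

    exec_cost: one dict per operator, {device: predicted execution time}.
    transfer_cost: assumed fixed penalty when consecutive operators
                   run on different devices (hypothetical parameter).

    Each (operator, device) pair is a node in a layered DAG; edges connect
    consecutive operators. The cheapest path is found in one forward pass.
    """
    # best[d] = (cheapest cost of any path ending on device d, its plan)
    best = {d: (exec_cost[0][d], [d]) for d in DEVICES}
    for costs in exec_cost[1:]:
        nxt = {}
        for d in DEVICES:
            candidates = []
            for prev, (cost_so_far, plan) in best.items():
                move = 0.0 if prev == d else transfer_cost
                candidates.append((cost_so_far + move + costs[d], plan + [d]))
            nxt[d] = min(candidates, key=lambda t: t[0])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Illustrative numbers only; they are not taken from the paper.
predicted = [
    {"cpu": 1.0, "gpu": 5.0},
    {"cpu": 8.0, "gpu": 2.0},
    {"cpu": 3.0, "gpu": 2.5},
]
print(best_mapping(predicted, transfer_cost=1.0))
# (6.5, ['cpu', 'gpu', 'gpu'])
```

With n operators and two devices, this pass evaluates a constant number of edges per operator instead of enumerating all 2^n plans, which is the kind of saving a shortest path formulation offers over exhaustive search.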
