Performance optimization for CPU-GPU heterogeneous parallel system

doi:10.1109/FSKD.2016.7603359

Proceedings Article•10.1109/FSKD.2016.7603359

Yanhua Wang, +3 more

- 01 Aug 2016

pp 1259-1266

4

TL;DR: The proposed task allocation model, evaluated on a real heterogeneous system with several benchmarks, achieves an average performance improvement of up to 23.43% compared with several state-of-the-art allocation strategies.

Abstract: With GPUs (Graphics Processing Units) taking part in general-purpose computing, a heterogeneous system usually achieves higher performance and efficiency. Many studies examine how to improve the performance of heterogeneous systems, and a number of them do so by allocating workload to processors using different strategies. In this paper, we implement a task allocation model based on the principle of making the execution time of the partition on the CPU as close as possible to that of the partition on the GPU. The task allocation process has two stages. First, in a pre-treatment stage, we use an SVM (Support Vector Machine) to classify tasks into two sets, CPU-kind and GPU-kind. Second, we adjust the two task sets according to the characteristics and current running status of the processors, then map the adjusted task sets to the processors. We evaluate the proposed model by implementing it on a real heterogeneous system with several benchmarks. Experimental results show that our model achieves an average performance improvement of up to 23.43% compared with several state-of-the-art allocation strategies.
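The second stage described in the abstract (adjusting the CPU-kind and GPU-kind sets so that both partitions finish at about the same time) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give it. The sketch assumes the first-stage classification has already produced two task lists, and that each task carries an estimated run time on each processor. The names `Task`, `cpu_time`, `gpu_time`, `makespan`, and `rebalance` are hypothetical, and the greedy move rule is a stand-in heuristic chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cpu_time: float  # hypothetical estimated run time on the CPU (seconds)
    gpu_time: float  # hypothetical estimated run time on the GPU (seconds)


def makespan(cpu_set, gpu_set):
    """Finish time when the two partitions run concurrently."""
    cpu_total = sum(t.cpu_time for t in cpu_set)
    gpu_total = sum(t.gpu_time for t in gpu_set)
    return max(cpu_total, gpu_total)


def rebalance(cpu_set, gpu_set):
    """Greedy stand-in for the adjustment stage: repeatedly move one task
    from the slower partition to the faster one, as long as the move
    strictly shortens the overall makespan."""
    cpu_set, gpu_set = list(cpu_set), list(gpu_set)
    while True:
        cpu_total = sum(t.cpu_time for t in cpu_set)
        gpu_total = sum(t.gpu_time for t in gpu_set)
        from_cpu = cpu_total > gpu_total
        src, dst = (cpu_set, gpu_set) if from_cpu else (gpu_set, cpu_set)

        best_task, best_span = None, max(cpu_total, gpu_total)
        for task in src:
            trial_src = [t for t in src if t is not task]
            trial_dst = dst + [task]
            # makespan() always takes the CPU partition first.
            if from_cpu:
                span = makespan(trial_src, trial_dst)
            else:
                span = makespan(trial_dst, trial_src)
            if span < best_span:
                best_task, best_span = task, span

        if best_task is None:  # no single move helps any more
            return cpu_set, gpu_set
        src.remove(best_task)
        dst.append(best_task)


# Hypothetical first-stage output (in the paper this comes from an SVM).
a = Task("a", cpu_time=4.0, gpu_time=1.0)
b = Task("b", cpu_time=3.0, gpu_time=1.0)
c = Task("c", cpu_time=2.0, gpu_time=5.0)
d = Task("d", cpu_time=1.0, gpu_time=4.0)
e = Task("e", cpu_time=3.0, gpu_time=2.0)

cpu_kind, gpu_kind = [c, d, e], [a, b]
balanced_cpu, balanced_gpu = rebalance(cpu_kind, gpu_kind)
```

With these made-up estimates the initial partitions take 6 s on the CPU and 2 s on the GPU; moving task `e` to the GPU gives 3 s and 4 s, after which no single move shortens the finish time. A real system would also need to account for data-transfer cost and the processors' current load, which the abstract mentions but this sketch ignores.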

Citations

Journal Article•10.1145/3570638

Optimization Techniques for GPU Programming

Pieter Hijma, +4 more

- 14 Nov 2022

- ACM Computing Surveys

TL;DR: This survey discusses optimization techniques found in 450 articles published in the last 14 years and analyzes them from different perspectives, showing that the optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.

54

Proceedings Article•10.1109/ICSESS.2017.8342896

A task allocation method for heterogeneous multi-core system based on genetic algorithm

Juan Fang, +3 more

- 01 Nov 2017

TL;DR: Experimental results demonstrate that the algorithm can effectively improve the system performance, compared with the built-in task scheduling mechanism of Linux 2.6 kernel.

2

Journal Article•10.1016/J.SUSCOM.2018.07.010

A survey on techniques for cooperative CPU-GPU computing

Raju K, +1 more

- 01 Sep 2018

- Sustainable Computing: Informatics and S...

TL;DR: This survey paper reviews the techniques available for CPU-GPU cooperative computing, which improve resource utilization and reduce the energy consumption of heterogeneous systems by performing the computation cooperatively on both multicore CPUs and GPUs.

Journal Article•10.1109/MCSE.2018.110145709

An Approximate Optimal Solution to GPU Workload Scheduling

Yanhua Wang, +3 more

- 11 Jan 2018

- Computing in Science and Engineering

TL;DR: The authors carry out scheduling of data transfer before workload execution scheduling, and propose an optimal scheduling algorithm for GPU workload based on the Dyer-Zemel algorithm, and deduce the fully polynomial-time scheme for PPTA.

References

Proceedings Article•10.1145/1401132.1401152

Scalable parallel programming with CUDA

John R. Nickolls, +3 more

- 11 Aug 2008

TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.

2.3K

Journal Article•10.1109/MM.2008.31

NVIDIA Tesla: A Unified Graphics and Computing Architecture

Erik Lindholm, +3 more

- 01 Mar 2008

- IEEE Micro

TL;DR: To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture, which is massively multithreaded and programmable in C or via graphics APIs.

1.6K

Journal Article•10.1145/1365490.1365500

Scalable Parallel Programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for?

John R. Nickolls, +3 more

- 01 Mar 2008

- ACM Queue

TL;DR: In this article, the authors present a framework to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism on manycore GPUs with widely varying numbers of cores.

1.4K

Journal Article•10.1109/MM.2010.41

The GPU Computing Era

John R. Nickolls, +1 more

- 01 Mar 2010

- IEEE Micro

TL;DR: Describes the rapid evolution of GPU architectures from graphics processors to massively parallel many-core multiprocessors, recent developments in GPU computing architectures, and how the enthusiastic adoption of CPU+GPU coprocessing is accelerating parallel applications.

1K

Proceedings Article•10.1145/1555754.1555775

An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness

Sunpyo Hong, +1 more

- 20 Jun 2009

TL;DR: A simple analytical model is proposed that estimates the execution time of massively parallel programs by considering the number of running threads and the memory bandwidth. It estimates the cost of memory requests and, from that, the overall execution time of a program.

749

Related Papers (5)

Efficient Scheduling and High-Performance Graph Partitioning on Heterogeneous CPU-GPU Systems

A Deep Q-Learning Approach for GPU Task Scheduling

A Bi-objective Optimization Framework for Heterogeneous CPU/GPU Query Plans.

Graph Support and Scheduling for OpenCL on Heterogeneous Multi-core Systems

A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures