Proceedings Article, DOI: 10.1109/FSKD.2016.7603359
Performance optimization for CPU-GPU heterogeneous parallel system
Yanhua Wang, Jianzhong Qiao, Shukuan Lin, Tinglei Zhao +3 more
- 01 Aug 2016
pp 1259-1266
TL;DR: The proposed task allocation model, evaluated on a real heterogeneous system with several benchmarks, achieves an average performance improvement of up to 23.43% over several state-of-the-art allocation strategies.
Abstract: With GPUs (Graphics Processing Units) taking part in general-purpose computing, a heterogeneous system usually achieves higher performance and efficiency. Many studies address how to improve the performance of a heterogeneous system, and a number of them do so by allocating the workload to processors under different strategies. In this paper, we implement a task allocation model based on the principle of making the execution time of the CPU partition as close as possible to that of the GPU partition. The task allocation process has two stages. First, in the pre-treatment stage, we use an SVM (Support Vector Machine) to classify the tasks into two sets, CPU-kind and GPU-kind. Second, we adjust the two task sets according to the characteristics and current running status of the processors, and then map the two adjusted task sets to the processors. We evaluate the proposed model by implementing it on a real heterogeneous system and running several benchmarks. Experimental results show that our model achieves an average performance improvement of up to 23.43% over several state-of-the-art allocation strategies.
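The second stage described in the abstract, adjusting the two task sets so that the CPU and GPU partitions finish at about the same time, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes the classification stage has already produced the two sets, that a per-task time estimate is available for each processor, and that a simple greedy move rule is adequate. The names `Task`, `makespan`, and `balance`, and all timing values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cpu_time: float  # assumed estimate of run time on the CPU
    gpu_time: float  # assumed estimate of run time on the GPU


def makespan(cpu_set, gpu_set):
    """Finish time when both partitions run concurrently."""
    cpu_total = sum(t.cpu_time for t in cpu_set)
    gpu_total = sum(t.gpu_time for t in gpu_set)
    return max(cpu_total, gpu_total)


def balance(cpu_set, gpu_set):
    """Greedy adjustment (an illustrative assumption, not the paper's method):
    repeatedly apply the single task move that lowers the makespan the most,
    and stop when no single move helps."""
    cpu_set, gpu_set = list(cpu_set), list(gpu_set)
    while True:
        current = makespan(cpu_set, gpu_set)
        best = None  # (new_makespan, task, moved_from_cpu)

        # Try moving each CPU-side task to the GPU.
        for i, t in enumerate(cpu_set):
            new = makespan(cpu_set[:i] + cpu_set[i + 1:], gpu_set + [t])
            if new < current and (best is None or new < best[0]):
                best = (new, t, True)

        # Try moving each GPU-side task to the CPU.
        for i, t in enumerate(gpu_set):
            new = makespan(cpu_set + [t], gpu_set[:i] + gpu_set[i + 1:])
            if new < current and (best is None or new < best[0]):
                best = (new, t, False)

        if best is None:
            return cpu_set, gpu_set

        _, task, moved_from_cpu = best
        if moved_from_cpu:
            cpu_set.remove(task)
            gpu_set.append(task)
        else:
            gpu_set.remove(task)
            cpu_set.append(task)


# Hypothetical output of the classification stage.
cpu_kind = [Task("parse", 2, 5)]
gpu_kind = [Task("matmul", 20, 3), Task("fft", 15, 4), Task("reduce", 4, 3)]

cpu_final, gpu_final = balance(cpu_kind, gpu_kind)
```

With these made-up estimates the GPU partition starts at 10 time units against 2 on the CPU. Moving `reduce` to the CPU gives 6 against 7, and no further single move improves on that. Because every accepted move strictly lowers the makespan, the loop always terminates.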
Citations
Optimization Techniques for GPU Programming
TL;DR: This survey discusses optimization techniques found in 450 articles published over the last 14 years and analyzes them from different perspectives, showing that the various optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.
A task allocation method for heterogeneous multi-core system based on genetic algorithm
Juan Fang, Mengxuan Wang, Mingxia Gao, Jianhua Wei +3 more
- 01 Nov 2017
TL;DR: Experimental results demonstrate that the algorithm can effectively improve system performance compared with the built-in task scheduling mechanism of the Linux 2.6 kernel.
A survey on techniques for cooperative CPU-GPU computing
Raju K, Niranjan N. Chiplunkar +1 more
TL;DR: This survey paper reviews the techniques available for cooperative CPU-GPU computing, which improve resource utilization and reduce the energy consumption of heterogeneous systems by performing computation on both multicore CPUs and GPUs.
An Approximate Optimal Solution to GPU Workload Scheduling
TL;DR: The authors carry out scheduling of data transfer before workload execution scheduling, and propose an optimal scheduling algorithm for GPU workload based on the Dyer-Zemel algorithm, and deduce the fully polynomial-time scheme for PPTA.
References
Scalable parallel programming with CUDA
John R. Nickolls, Ian Buck, Michael Garland, Kevin Skadron +3 more
- 11 Aug 2008
TL;DR: Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.
NVIDIA Tesla: A Unified Graphics and Computing Architecture
TL;DR: To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture, which is massively multithreaded and programmable in C or via graphics APIs.
Scalable Parallel Programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for?
TL;DR: In this article, the authors present a framework to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism on manycore GPUs with widely varying numbers of cores.
The GPU Computing Era
TL;DR: Describes the rapid evolution of GPU architectures from graphics processors to massively parallel many-core multiprocessors, recent developments in GPU computing architectures, and how the enthusiastic adoption of CPU+GPU coprocessing is accelerating parallel applications.
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
Sunpyo Hong,Hyesoon Kim +1 more
- 20 Jun 2009
TL;DR: A simple analytical model is proposed that estimates the execution time of massively parallel programs by considering the number of running threads and the memory bandwidth; it estimates the cost of memory requests and, from that, the overall execution time of a program.