Proceedings Article, DOI: 10.1109/SYNASC.2013.53
Algorithm for Cooperative CPU-GPU Computing
Razvan-Mihai Aciu, Horia Ciocarlie
- 23 Sep 2013
- pp 352-358
TL;DR: This paper proposes a framework that allows a programmer to split the code flow of a thread into parts, each of which runs on the most suitable computing resource (CPU or GPU), and evaluates its practical results.
Abstract: Many applications have modules that could benefit greatly from the massively parallel numeric computing power of GPUs; renderers, signal processors, and simulators are only a few examples. Because of GPU weaknesses such as the stackless execution model or poor capabilities for exchanging pointers with the host, it is sometimes not feasible to port an entire algorithm to the GPU, even if the algorithm is highly parallel and some of its parts could be greatly accelerated there. In such situations a programmer needs a framework that allows the code flow of a thread to be split into parts, each of which runs on the most suitable computing resource, CPU or GPU. For GPU execution, data from multiple host threads are collected and processed on the GPU, and the results are returned to the original threads so that they can resume execution on the host. In this paper we propose such an algorithm, analyze it, and evaluate its practical results.
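The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general pattern it describes: many host threads reach a "GPU part" of their code flow, a dispatcher collects their data into one batch, runs it, and hands each result back so the thread resumes on the host. It assumes a thread-per-task host program, and the GPU is simulated by a plain Python function that processes the whole batch (`square_kernel`). All names (`BatchDispatcher`, `offload`, `max_batch`, `max_wait`) are hypothetical and are not the authors' implementation or API.

```python
import threading
import time
from concurrent.futures import Future
from typing import Callable, List, Tuple

# Hypothetical names: BatchDispatcher, offload, square_kernel are illustrative
# only; the "device" is simulated by an ordinary Python function.

class BatchDispatcher:
    """Collects inputs from many host threads, runs them as one batch on a
    simulated device, and returns each result to the thread that sent it."""

    def __init__(self, kernel: Callable[[List[int]], List[int]],
                 max_batch: int, max_wait: float = 0.01) -> None:
        self._kernel = kernel          # stand-in for a single GPU launch
        self._max_batch = max_batch
        self._max_wait = max_wait      # seconds to wait for a fuller batch
        self._cond = threading.Condition()
        self._pending: List[Tuple[int, Future]] = []
        self._stopped = False
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def offload(self, x: int) -> int:
        """Called by a host thread at its CPU->GPU switch point; blocks until
        the batched result is ready, then the thread resumes on the host."""
        fut: Future = Future()
        with self._cond:
            self._pending.append((x, fut))
            self._cond.notify()
        return fut.result()

    def close(self) -> None:
        with self._cond:
            self._stopped = True
            self._cond.notify()
        self._worker.join()

    def _loop(self) -> None:
        while True:
            with self._cond:
                while not self._pending and not self._stopped:
                    self._cond.wait()
                if not self._pending:      # stopped and nothing left to do
                    return
                # Give other host threads a short window to join this batch.
                deadline = time.monotonic() + self._max_wait
                while len(self._pending) < self._max_batch and not self._stopped:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        break
                    self._cond.wait(remaining)
                batch = self._pending[:self._max_batch]
                del self._pending[:self._max_batch]
            # Run the batch outside the lock so threads can keep enqueuing.
            try:
                outputs = self._kernel([x for x, _ in batch])
            except Exception as exc:
                for _, fut in batch:
                    fut.set_exception(exc)
                continue
            for (_, fut), y in zip(batch, outputs):
                fut.set_result(y)


batch_sizes: List[int] = []

def square_kernel(xs: List[int]) -> List[int]:
    # Simulated data-parallel kernel: one "launch" processes the whole batch.
    batch_sizes.append(len(xs))
    return [x * x for x in xs]

def host_thread(d: BatchDispatcher, i: int, out: List[int]) -> None:
    pre = i + 1              # host-side part before the switch point
    mid = d.offload(pre)     # GPU-suited part, batched with other threads
    out[i] = mid - 1         # host-side part after resuming on the host

dispatcher = BatchDispatcher(square_kernel, max_batch=8)
results = [0] * 20
threads = [threading.Thread(target=host_thread, args=(dispatcher, i, results))
           for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
dispatcher.close()
print(results[:4])  # [0, 3, 8, 15]
```

In this sketch `max_wait` trades latency for batch size: a longer window lets more host threads join each launch, which matters on a real GPU where per-launch and transfer overheads dominate for small batches, but it also delays every thread waiting in the batch.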
Citations
Journal Article
Large Data Visualization on Distributed Memory Multi-GPU Clusters
TL;DR: This work studies a common visualization technique in a GPU-accelerated, distributed memory setting, and presents performance characteristics when scaling to extremely large data sets.
Dual buffer rotation four-stage pipeline for CPU–GPU cooperative computing
Tao Li, Qiankun Dong, Yifeng Wang, Xiaoli Gong, Yulu Yang
- 01 Feb 2019
TL;DR: This paper proposes the dual buffer rotation four-stage pipeline (DBFP) mechanism for CPU–GPU cooperative computation. Built on a data-block-partition-based pipeline computing strategy, it efficiently handles data-intensive problems that need more memory than a single GPU provides.
Patent
System, apparatus and method for low overhead control transfer to alternate address space in a processor
Brent R. Boswell, Nagasundaram Banu Meenakshi, Michael D. Abbott, Dakshinamoorthy Srikanth, Jason Howard, Joshua B. Fryman
- 09 Dec 2016
TL;DR: The patent describes a processor with an accelerator associated with a first address space and a core associated with a second address space; the core includes an alternate address space configuration register that stores configuration information enabling it to execute instructions from the first address space.
A survey on techniques for cooperative CPU-GPU computing
Raju K, Niranjan N. Chiplunkar
TL;DR: This survey reviews the techniques available for cooperative CPU-GPU computing, which improve resource utilization and reduce the energy consumption of heterogeneous systems by performing computation cooperatively on both multicore CPUs and GPUs.
Related Papers
Feng Ji, Heshan Lin, Xiaosong Ma
- 07 Oct 2013
Mark McKenney, Gabriel De Luna, Schiller Hill, Logan Lowell
- 01 Nov 2011