Parallelizing compiler framework and API for power reduction and software productivity of real-time heterogeneous multicores

doi:10.1007/978-3-642-19595-2_13

Book Chapter•10.1007/978-3-642-19595-2_13

Akihiro Hayashi, +7 more

- 07 Oct 2010

- Vol. 6548, pp 184-198

17

TL;DR: This paper describes a compilation framework, based on the OSCAR compiler, that bridges the gap between programmers and heterogeneous multicores. It attains speedups of up to 32x for an optical flow program on eight general-purpose processor cores and four DRP accelerator cores, relative to sequential execution on a single processor core.

Citations

Journal Article•10.1109/TC.2014.2315628

Architecture Support for Task Out-of-Order Execution in MPSoCs

Chao Wang, +6 more

- 01 May 2015

- IEEE Transactions on Computers

TL;DR: Presents novel high-level architecture support for automatic out-of-order (OoO) task execution on FPGA-based heterogeneous MPSoCs, built on a hierarchical middleware with an automatic task-level OoO parallel execution engine.

32

Proceedings Article•10.1145/2897937.2897991

Automatic parallelization and accelerator offloading for embedded applications on heterogeneous MPSoCs

Miguel Angel Aguilar, +3 more

- 05 Jun 2016

TL;DR: Results show that the proposed approach significantly speeds up sequential embedded applications on a commercial heterogeneous MPSoC that incorporates a quad-core ARM cluster and an octa-core DSP cluster.

23

Journal Article•10.1109/TPDS.2015.2487346

Hardware Implementation on FPGA for Task-Level Parallel Dataflow Execution Engine

Chao Wang, +4 more

- 01 Aug 2016

- IEEE Transactions on Parallel and Distri...

TL;DR: An FPGA implementation of a hardware out-of-order scheduler on a heterogeneous multicore platform is proposed. The scheduler can explore potential inter-task dependencies, leading to a significant acceleration of dependence-aware applications.

22

Journal Article•10.1109/TSC.2017.2777478

SOLAR: Services-Oriented Deep Learning Architectures-Deep Learning as a Service

Chao Wang, +6 more

- 01 Jan 2021

- IEEE Transactions on Services Computing

TL;DR: Experimental results demonstrate that SOLAR provides a ubiquitous framework for diverse applications without increasing the burden on programmers, and that its GPU and FPGA hardware accelerators achieve significant speedups over conventional Intel i5 processors with great scalability.

18

Proceedings Article•10.1109/RTAS.2017.4

Parcus: Energy-Aware and Robust Parallelization of AUTOSAR Legacy Applications

Sebastian Kehr, +4 more

- 18 Apr 2017

TL;DR: Parcus explicitly models the traversal of data from sensor to actuator through task instances, which makes it possible to account for the latency imposed by parallelization techniques, and it can fully utilize the processor's energy-saving potential.

15

References

Journal Article•10.1109/MM.2008.57

Parallel Computing Experiences with CUDA

Michael Garland, +8 more

- 01 Jul 2008

- IEEE Micro

TL;DR: Surveys experiences gained in applying CUDA to a diverse set of problems, along with the parallel speedups over sequential codes running on traditional CPU architectures that were attained by executing key computations on the GPU.

626

Proceedings Article•10.1145/1669112.1669121

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

Chi-Keung Luk, +2 more

- 12 Dec 2009

TL;DR: Adaptive mapping is proposed, a fully automatic technique for mapping computations to processing elements on a CPU+GPU machine. By judiciously distributing work over the CPU and GPU, automatic adaptive mapping achieves, on average, a 25% reduction in execution time and a 20% reduction in energy consumption compared with static mappings for a set of important computation benchmarks.

585

Proceedings Article•10.1145/1188455.1188672

GPGPU: general-purpose computation on graphics hardware

David Luebke, +8 more

- 11 Nov 2006

TL;DR: General-purpose computation on GPUs (GPGPU) is the use of graphics hardware, which supports vector operations and IEEE floating-point precision, for computation beyond graphics.

264

Proceedings Article

Proceedings of the 2006 ACM/IEEE conference on Supercomputing

Barbara Horner-Miller

- 11 Nov 2006

TL;DR: SC06 will explore the ways in which high performance computing, networking, storage and analysis lead to advances in research, education and commerce, and introduce an initiative focusing on those emerging concepts and technologies that have the potential to reshape the HPC landscape.

226

Proceedings Article•10.1109/SC.2006.17