Book Chapter, DOI: 10.1007/978-3-642-19595-2_13
Parallelizing compiler framework and API for power reduction and software productivity of real-time heterogeneous multicores
Akihiro Hayashi,Yasutaka Wada,Takeshi Watanabe,Takeshi Sekiguchi,Masayoshi Mase,Jun Shirako,Keiji Kimura,Hironori Kasahara +7 more
- 07 Oct 2010
- Vol. 6548, pp 184-198
TL;DR: This paper describes a compilation framework, based on the OSCAR compiler, that bridges the gap between programmers and heterogeneous multicores. It attains speedups of up to 32x for an optical flow program on eight general-purpose processor cores and four DRP accelerator cores, relative to sequential execution on a single processor core.
Abstract: Heterogeneous multicores have been attracting much attention as a way to attain high performance while keeping power consumption low in a wide range of areas. However, heterogeneous multicores make programming very difficult, and the resulting long application development period lowers product competitiveness. To overcome this situation, this paper proposes a compilation framework that bridges the gap between programmers and heterogeneous multicores. In particular, it describes a compilation framework based on the OSCAR compiler. The framework realizes coarse-grain task parallel processing, data transfer using a DMA controller, and power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. The paper also evaluates the processing performance and power reduction achieved by the proposed framework on RP-X, a newly developed 15-core heterogeneous multicore chip that integrates 8 general-purpose processor cores and 3 types of accelerator cores and was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups of up to 32x for an optical flow program with eight general-purpose processor cores and four DRP (Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core, and an 80% power reduction for real-time AAC encoding.
Citations
Architecture Support for Task Out-of-Order Execution in MPSoCs
TL;DR: A novel high level architecture support for automatic out-of-order (OoO) task execution on FPGA based heterogeneous MPSoCs on the basis of a hierarchical middleware with an automatic task level OoO parallel execution engine.
32
Automatic parallelization and accelerator offloading for embedded applications on heterogeneous MPSoCs
Miguel Angel Aguilar,Rainer Leupers,Gerd Ascheid,Luis Gabriel Murillo +3 more
- 05 Jun 2016
TL;DR: Results show that the proposed approach is able to speed up sequential embedded applications significantly on a commercial heterogeneous MPSoC, which incorporates a quad-core ARM cluster and an octa-core DSP cluster.
23
Hardware Implementation on FPGA for Task-Level Parallel Dataflow Execution Engine
TL;DR: An FPGA implementation of a hardware out-of-order scheduler on a heterogeneous multicore platform is proposed. It is capable of exploring potential inter-task dependencies, leading to a significant acceleration of dependence-aware applications.
22
SOLAR: Services-Oriented Deep Learning Architectures-Deep Learning as a Service
TL;DR: Experimental results demonstrate that SOLAR provides a ubiquitous framework for diverse applications without increasing the burden on programmers, and that the GPU and FPGA hardware accelerators in SOLAR achieve significant speedup compared to conventional Intel i5 processors, with great scalability.
18
Parcus: Energy-Aware and Robust Parallelization of AUTOSAR Legacy Applications
Sebastian Kehr,Eduardo Quinones,Dominik Langen,Bert Böddeker,Günter Schäfer +4 more
- 18 Apr 2017
TL;DR: Parcus explicitly models the traversal of data from sensor to actuator through task instances, which makes it possible to consider the latency imposed by parallelization techniques, and it can fully utilize the processor's energy-saving potential.
References
Parallel Computing Experiences with CUDA
Michael Garland,S. Le Grand,John R. Nickolls,Joshua A. Anderson,J. Hardwick,S. Morton,E. Phillips,Yao Zhang,Vasily Volkov +8 more
TL;DR: This paper surveys experiences gained in applying CUDA to a diverse set of problems, along with the parallel speedups over sequential codes running on traditional CPU architectures that were attained by executing key computations on the GPU.
626
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Chi-Keung Luk,Sunpyo Hong,Hyesoon Kim +2 more
- 12 Dec 2009
TL;DR: Adaptive mapping is proposed, a fully automatic technique to map computations to processing elements on a CPU+GPU machine. It is shown that, by judiciously distributing work over the CPU and GPU, automatic adaptive mapping achieves on average a 25% reduction in execution time and a 20% reduction in energy consumption compared with static mappings for a set of important computation benchmarks.
GPGPU: general-purpose computation on graphics hardware
David Luebke,Mark J. Harris,Naga K. Govindaraju,Aaron Lefohn,Mike Houston,John D. Owens,Mark Segal,Matthew Papakipos,Ian Buck +8 more
- 11 Nov 2006
TL;DR: General-purpose computation on GPUs (GPGPU) applies graphics hardware, which supports vector operations and IEEE floating-point precision, to general-purpose computing.
264
Proceedings Article
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Barbara Horner-Miller
- 11 Nov 2006
TL;DR: SC06 will explore the ways in which high performance computing, networking, storage and analysis lead to advances in research, education and commerce, and introduce an initiative focusing on those emerging concepts and technologies that have the potential to reshape the HPC landscape.
226