A static-placement, dynamic-issue framework for CGRA loop accelerator

doi:10.23919/DATE.2017.7927202

Proceedings Article•10.23919/DATE.2017.7927202

Zhongyuan Zhao, +4 more

- 27 Mar 2017

- pp 1348-1353

7

TL;DR: A static-placement, dynamic-issue (SPDI) framework for coarse-grained reconfigurable architectures (CGRAs), comprising a compiler that statically places the operations and a hardware design, the SPDI CGRA, that automatically schedules them.

Citations

Journal Article•10.1109/TPDS.2020.2982137

A Black-Box Fork-Join Latency Prediction Model for Data-Intensive Applications

Minh Quoc Nguyen, +4 more

- 01 Sep 2020

- IEEE Transactions on Parallel and Distri...

TL;DR: A black-box Fork-Join model is proposed that covers a wide range of Fork-Join structures for predicting tail and mean latency, called ForkTail and ForkMean, respectively. It can serve as a powerful tool to aid the design of job scheduling and resource provisioning with tail- and mean-latency guarantees, especially at high load, for datacenter applications.

25

Proceedings Article•10.1109/HPCC/SMARTCITY/DSS.2019.00057

C-MAP: Improving the Effectiveness of Mapping Method for CGRA by Reducing NoC Congestion

An Shuqian, +6 more

- 01 Aug 2019

TL;DR: C-MAP improves the effectiveness of CGRA mapping by reducing network congestion and enhancing the continuity of the data flow, and analyzes the impact of several key considerations in CGRA instruction mapping, such as NoC workload reduction and workload balance.

4

Journal Article•10.1049/CJE.2020.05.002

The Principle and Progress of Dynamically Reconfigurable Computing Technologies

Shaojun Wei, +1 more

- 01 Jul 2020

- Chinese Journal of Electronics

TL;DR: This paper summarizes the latest progress in key technologies of dynamically reconfigurable computing and provides an introduction of the application achievements.

3

Proceedings Article•10.1109/SIPS52927.2021.00055

A Multi-Domain Architectural Efficiency Metric

Sumeet Singh Nagi, +1 more

- 01 Oct 2021

TL;DR: In this article, an architectural efficiency metric that quantifies the number of instructions or the size of reconfiguration bits required to perform a computation over a range of program sizes in the architecture is proposed.

Journal Article•10.1145/3773768

DFGAS: Exploring the Balance of HW-SW Scheduling through the DFG-Aware Scheme

Tianyu Liu, +9 more

- 28 Oct 2025

- ACM Transactions on Architecture and Cod...

Abstract: Coarse-Grained Reconfigurable Architectures (CGRAs) have been regarded as a promising spatial computing fabric for the ever-evolving algorithms in multiple domains. However, pure software scheduling cannot compensate for the over-serialization and load imbalance of purely static CGRA designs. To address the issues caused by limited hardware flexibility, an in-depth study of the balance between software and hardware scheduling in CGRA design is needed to achieve more precise, accurate, and adaptive scheduling of dataflow. In this paper, we propose DFGAS (DFG-Aware Scheduling), a dataflow-driven CGRA that provides a comprehensive scheduling approach encompassing software prediction, runtime adaptive execution, and post-execution refinement. Prior to execution, the TimeStamp prediction algorithm, coupled with the inherent dataflow execution model, enables coarse-grained (block-level) prediction for prioritized transfer and computation on the NoC and PEs. During execution, key dataflow graph (DFG) blocks and edges are accelerated by a dynamic and adaptive dataflow mechanism, which leverages hardware-software co-design to obtain a holistic view of the entire DFG and continuously and adaptively optimizes the scheduling process. Furthermore, a complete workflow is implemented that supports refinement of the software DFG mapping results. DFGAS represents a CGRA scheduling scheme that is worth exploring, achieving a hardware-software co-design that balances energy efficiency and flexibility. Experiments show that DFGAS achieves a 1.35× energy efficiency improvement over a dataflow-driven CGRA and a 1.9× energy efficiency improvement over a state-of-the-art pure static CGRA.

References

Proceedings Article•10.5555/977395.977673

LLVM: a compilation framework for lifelong program analysis & transformation

Chris Lattner, +1 more

- 20 Mar 2004

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.

5.4K

Proceedings Article•10.1109/WWC.2001.15

MiBench: A free, commercially representative embedded benchmark suite

Matthew R. Guthaus, +5 more

- 02 Dec 2001

TL;DR: A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors.

3.7K

The Landscape of Parallel Computing Research: A View from Berkeley

Krste Asanovic, +10 more

- 18 Dec 2006

TL;DR: The parallel landscape is framed with seven questions, and the following are recommended to explore the design space rapidly: (1) the overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems; (2) the target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.

2.4K

Proceedings Article•10.1145/859618.859667

Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Karthikeyan Sankaralingam, +7 more

- 01 May 2003

TL;DR: Results show that high performance can be obtained in each of the three modes (ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.

528

Journal Article•10.1109/MM.2003.1261386

Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Karthikeyan Sankaralingam, +7 more

- 01 Nov 2003

- IEEE Micro

TL;DR: The Tera-op reliable intelligently adaptive processing system (TRIPS) architecture seeks to deliver system-level configurability to applications and runtime systems by employing the concept of polymorphism.

318