Journal Article, DOI: 10.1002/SPE.642
Compiler transformations for effectively exploiting a zero overhead loop buffer
Gang-Ryung Uh,Yuhong Wang,David Whalley,Sanjay Jinturkar,Yunheung Paek,Vincent Cao,Chris Burns +6 more
5
TL;DR: This paper describes strategies for generating code to effectively use a Zero Overhead Loop Buffer and finds that many common code improving transformations used by optimizing compilers on conventional architectures can be easily used to allow more loops to be placed in a ZOLB.
Abstract: A Zero Overhead Loop Buffer (ZOLB) is an architectural feature that is commonly found in DSP processors. This buffer can be viewed as a compiler-managed cache that contains a sequence of instructions that will be executed a specified number of times without incurring any loop overhead. Unlike loop unrolling, a loop buffer can be used to minimize loop overhead without the penalty of increasing code size. In addition, a ZOLB requires relatively little space and power, which are both important considerations for most DSP applications. This paper describes strategies for generating code to effectively use a ZOLB. We have found that many common code improving transformations used by optimizing compilers on conventional architectures can be easily used to (1) allow more loops to be placed in a ZOLB, (2) further reduce loop overhead of the loops placed in a ZOLB, and (3) avoid redundant loading of ZOLB loops. The results given in this paper demonstrate that this architectural feature can often be exploited with substantial improvements in execution time and slight reductions in code size for various signal processing applications.
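The trade-off the abstract describes can be illustrated with a small instruction-count model. This is only a sketch under stated assumptions, not the paper's cost model: the `Loop` type, the function names, the one-instruction setup cost, and the example numbers are all hypothetical, and every instruction is assumed to cost the same.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    """Hypothetical description of a counted loop (not taken from the paper)."""
    body_insts: int      # instructions doing useful work in each iteration
    overhead_insts: int  # increment, compare, and branch in each iteration
    trip_count: int      # number of iterations, known before the loop starts


def conventional_executed(loop: Loop) -> int:
    """Instructions executed when every iteration pays the loop overhead."""
    return loop.trip_count * (loop.body_insts + loop.overhead_insts)


def zolb_executed(loop: Loop, setup_insts: int = 1) -> int:
    """Instructions executed when the body runs from a zero overhead loop buffer.

    Assumption: a one-time setup of `setup_insts` instructions loads the body
    and the iteration count, after which no per-iteration overhead is paid.
    """
    return setup_insts + loop.trip_count * loop.body_insts


def unrolled_code_size(loop: Loop, factor: int) -> int:
    """Static code size after unrolling: the body is replicated `factor` times."""
    return factor * loop.body_insts + loop.overhead_insts


def zolb_code_size(loop: Loop, setup_insts: int = 1) -> int:
    """Static code size with a ZOLB: the body appears once, plus the setup."""
    return setup_insts + loop.body_insts


if __name__ == "__main__":
    loop = Loop(body_insts=4, overhead_insts=3, trip_count=100)
    print("executed, conventional:", conventional_executed(loop))
    print("executed, ZOLB:        ", zolb_executed(loop))
    print("code size, unrolled x4:", unrolled_code_size(loop, 4))
    print("code size, ZOLB:       ", zolb_code_size(loop))
```

In this toy model the ZOLB version removes the per-iteration overhead entirely while keeping a single copy of the body, whereas unrolling only amortizes the overhead and grows the code. Real savings depend on the target DSP's instruction timing and buffer capacity, which the model ignores.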
Citations
Elimination of Overhead Operations in Complex Loop Structures for Embedded Microprocessors
TL;DR: A novel zero-overhead loop controller (ZOLC) supporting arbitrary loop structures with multiple-entry and multiple-exit nodes is described and utilized to enhance embedded RISC processors.
21
Derivation of efficient FSM from loop nests
Tomofumi Yuki,Antoine Morvan,Steven Derrien +2 more
- 01 Dec 2013
TL;DR: This paper presents an automatic transformation targeting HLS that improves the effectiveness of nested loop pipelining, by efficient implementations of the control-path, and presents an analytical model that captures the trade-off between gain in cycles and loss in frequency.
6
Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP
Libo Huang,Nong Xiao,Zhiying Wang,Yongwen Wang,Mingche Lai +4 more
- 01 Oct 2013
TL;DR: This work proposes optimized SIMD engines capable of combining VLIW or TTA processing with unified scalar and long vector computations, as well as efficient SIMD hardware for real computation.
4
Preprocessing strategy for effective modulo scheduling on multi-issue digital signal processors
Doosan Cho,Ravi Ayyagari,Gang-Ryung Uh,Yunheung Paek +3 more
- 26 Mar 2007
TL;DR: A compiler preprocessing strategy for effective modulo scheduling that capitalizes on two techniques, referred to as cloning1 and cloning2, whose core lies in the direct relaxation of cyclic data dependences by exploiting functional units that are otherwise left unused.
A compilation method for zero overhead loop in DSPs with VLIW
Chang Rui,Jun Wu,Haoqi Ren +2 more
- 01 Oct 2017
TL;DR: A compiler transformation method for zero overhead loops (ZOL) that supports very long instruction word (VLIW) architectures, internal branches, and loops whose iteration counts are known at runtime and before execution.
1
References
Book
Computer Architecture: A Quantitative Approach
John L. Hennessy,David A. Patterson +1 more
- 01 Dec 1989
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
12.6K
Compiler transformations for high-performance computing
TL;DR: This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran, and describes the purpose of each transformation, how to determine if it is legal, and an example of its application.
1K
Software pipelining: an effective scheduling technique for VLIW machines
Monica S. Lam
- 01 Jun 1988
TL;DR: This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors, and proposes a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block.
Iterative modulo scheduling: an algorithm for software pipelining loops
B. Ramakrishna Rau
- 30 Nov 1994
TL;DR: This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models and characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.
749
Lifetime-sensitive modulo scheduling
Richard A. Huff
- 01 Jun 1993
TL;DR: This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time, and empirical results indicate near-optimal performance.
259