Processor architecture and data buffering
H. Mulder, Michael J. Flynn +1 more
TL;DR: It is shown that scalable architectures require at least 32 words of local memory and therefore are not applicable for low-density technologies, and it is also shown that software support can bridge the performance gap between scalable and nonscalable architectures.
Abstract: The tradeoff between visualizing or hiding the highest levels of the memory hierarchy, which impacts both performance and scalability, is examined by comparing a set of architectures from three major architecture families: stack, register, and memory-to-memory. The stack architecture is used as the reference. It is shown that scalable architectures require at least 32 words of local memory and therefore are not applicable for low-density technologies. It is also shown that software support can bridge the performance gap between scalable and nonscalable architectures. A register architecture with 32 words of local storage allocated interprocedurally outperforms scalable architectures with equal-sized local memories and even some with larger local memories. When a small cache is added to a nonscalable architecture, its performance advantage becomes significant.
Citations
The effects of processor architecture on instruction memory traffic
TL;DR: This study has clearly indicated that cache factors should be taken into consideration when making architectural tradeoffs.
7
Quantitative assessment of machine-stack behaviour for better computer performance.
C. Bailey, R. Sotudeh +1 more
- 01 Jan 1994
TL;DR: Quantitative assessment of stack behaviour will clearly demonstrate statistical and probabilistic examples of stack actions, helping to guide future designs, not only improving stack machine efficiency, but perhaps influencing future designs in RISC and CISC technologies.
4
• Dissertation
Loop optimization techniques on multi-issue architectures
Dan R. Kaiser
- 01 Sep 1995
TL;DR: The results show that the scheduling technique chosen for the compiler has a significant impact on the overall system performance and can even change the rank ordering when comparing the performance of VLIW, DAE and superscalar architectures.
3
A dedicated multiprocessor system for pixel treatment in rastering and medical imaging applications
V. Di Lecce, E. Di Sciascio +1 more
- 13 May 1996
TL;DR: The paper presents the design and evaluation of an efficient routing structure for implementation within a dedicated multiprocessor architecture whose target is 3D medical imaging applications, and computer graphics rastering.
1
Application of the AFT technique for low-cost and accurate measurements
Gregorio Andria, V. Di Lecce, M. Savino +2 more
- 13 May 1996
TL;DR: This paper deals with an evaluation of the use of the arithmetic Fourier transform (AFT) in fast and accurate measurement of the analyzed signal amplitude, and finds that the approach in the frequency domain is preferable to that in the time domain.