Open Access
Vector microprocessors
Krste Asanovic,John Wawrzynek +1 more
- 01 Jan 1998
TL;DR: This thesis presents the design, implementation, and evaluation of T0 (Torrent-0): the first single-chip vector microprocessor, a compact but highly parallel processor that can sustain over 24 operations per cycle while issuing only a single 32-bit instruction per cycle.
Abstract: Most previous research into vector architectures has concentrated on supercomputing applications and small enhancements to existing vector supercomputer implementations. This thesis expands the body of vector research by examining designs appropriate for single-chip full-custom vector microprocessor implementations targeting a much broader range of applications.
I present the design, implementation, and evaluation of T0 (Torrent-0): the first single-chip vector microprocessor. T0 is a compact but highly parallel processor that can sustain over 24 operations per cycle while issuing only a single 32-bit instruction per cycle. T0 demonstrates that vector architectures are well suited to full-custom VLSI implementation and that they perform well on many multimedia and human-machine interface tasks.
The remainder of the thesis contains proposals for future vector microprocessor designs. I show that the most area-efficient vector register file designs have several banks with several ports, rather than many banks with few ports as used by traditional vector supercomputers, or one bank with many ports as used by superscalar microprocessors. To extend the range of vector processing, I propose a vector flag processing model which enables speculative vectorization of "while" loops. To improve the performance of inexpensive vector memory systems, I introduce virtual processor caches, a new form of primary vector cache which can convert some forms of strided and indexed vector accesses into unit-stride bursts.
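To make the idea of speculatively vectorizing a "while" loop concrete, here is a minimal Python sketch. It is not the thesis's flag processing model. It only illustrates the general pattern of processing a whole strip of elements before the exit point is known, recording the exit test in a flag vector, and then using the first set flag to find where the scalar loop would have stopped. The function names, the `vlen` parameter, and the use of Python lists as stand-ins for vector and flag registers are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def scalar_find_first(data: Sequence[int], is_exit: Callable[[int], bool]) -> int:
    """Reference scalar 'while' loop: index of the first element that meets the exit test."""
    i = 0
    while i < len(data) and not is_exit(data[i]):
        i += 1
    return i


def vector_find_first(
    data: Sequence[int], is_exit: Callable[[int], bool], vlen: int = 8
) -> int:
    """Strip-mined version (hypothetical sketch).

    Each strip of up to `vlen` elements is loaded and tested before we know
    whether the loop exits inside it, i.e. speculatively.
    """
    n = len(data)
    base = 0
    while base < n:
        # Speculative "vector load"; slicing clamps at the end of the array,
        # so this sketch never touches out-of-range elements.
        strip = data[base:base + vlen]
        # "Vector compare" producing one flag per element.
        flags = [is_exit(x) for x in strip]
        if any(flags):
            # The first set flag marks where the scalar loop would exit;
            # results for later elements in the strip are discarded.
            return base + flags.index(True)
        base += len(strip)
    return n
```

For example, `vector_find_first(values, lambda x: x == 0, vlen=4)` returns the same index as the scalar loop for any list `values`. Real hardware also has to cope with speculative elements past the exit point that touch invalid addresses. The sketch sidesteps that by clamping each strip to the array bounds, which is a simplification.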
Citations
•Book
Computer Architecture, Fifth Edition: A Quantitative Approach
John L. Hennessy,David A. Patterson +1 more
- 29 Sep 2011
TL;DR: The Fifth Edition of Computer Architecture focuses on the dramatic shift in how software and technology in the "cloud" are accessed by cell phones, tablets, laptops, and other mobile computing devices.
F1: A Fast and Programmable Accelerator for Fully Homomorphic Encryption
Nikola Samardzic,Axel Feldmann,Aleksandar Krastev,Srinivas Devadas,Ronald G. Dreslinski,Chris Peikert,Daniel Sanchez +6 more
- 18 Oct 2021
TL;DR: F1 is the first programmable FHE accelerator, i.e., one capable of executing full FHE programs. Its design is based on an in-depth architectural analysis of the characteristics of FHE computations that reveals acceleration opportunities.
CraterLake: a hardware accelerator for efficient unbounded computation on encrypted data
Nikola Samardžić,Axel Feldmann,Aleksandar Krastev,Nathan Manohar,Nicholas Genise,Srinivas Devadas,Karim El Defrawy,Chris Peikert,Daniel S. Sanchez +8 more
- 18 Jun 2022
TL;DR: This work presents CraterLake, the first FHE accelerator that enables FHE programs of unbounded size (i.e., unbounded multiplicative depth), and introduces a new hardware architecture that efficiently scales to very large ciphertexts, novel functional units to accelerate key kernels, and new algorithms and compiler techniques to reduce data movement.
Application-specific memory management for embedded systems using software-controlled caches
Derek Chiou,Prabhat Jain,Larry Rudolph,Srinivas Devadas +3 more
- 01 Jun 2000
TL;DR: A way to improve the performance of embedded processors running data-intensive applications by allowing software to allocate on-chip memory on an application-specific basis via a novel hardware mechanism, called column caching.
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators
Yunsup Lee,Rimas Avizienis,Alex Bishara,Richard Xia,Derek Lockhart,Christopher Batten,Krste Asanovic +6 more
- 04 Jun 2011
TL;DR: The results suggest that the Maven VT microarchitecture is superior to the traditional vector-SIMD architecture, providing both greater efficiency and easier programmability.