Proceedings Article, DOI: 10.1109/ISSCC.2005.1493905
A streaming processing unit for a CELL processor
Brian Flachs,Shigehiro Asano,Sang Hoo Dhong,P. Hofstee,Gilles Gervais,Roy Moonseuk Kim,T. Le,Peichun Liu,Jentje Leenstra,John S. Liberty,Brad W. Michael,Hwa-Joon Oh,Silvia Melitta Mueller,O. Takahashi,A. Hatakeyama,Yukio Watanabe,N. Yano +16 more
- 29 Aug 2005
- pp 134-135
TL;DR: The design of a 4-way SIMD streaming data processor emphasizes achievable performance in area and power, minimizes instruction latency, and provides fine-grain clock control to reduce power.
Abstract: The design of a 4-way SIMD streaming data processor emphasizes achievable performance in area and power. Software controls data movement and instruction flow, and improves data bandwidth and pipeline utilization. The micro-architecture minimizes instruction latency and provides fine-grain clock control to reduce power.
Citations
The potential of the cell processor for scientific computing
Samuel Williams,John Shalf,Leonid Oliker,Shoaib Kamil,Parry Husbands,Katherine Yelick +5 more
- 03 May 2006
TL;DR: This work introduces a performance model for Cell and applies it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs, and proposes modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations.
The Potential of the Cell Processor for Scientific Computing
Samuel Williams,John Shalf,Leonid Oliker,Parry Husbands,Shoaib Kamil,Katherine Yelick +5 more
- 14 Oct 2005
TL;DR: In this article, the authors examined the potential of using the STI Cell processor as a building block for future high-end computing systems and proposed modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations.
Ray Tracing on the Cell Processor
Carsten Benthin,Ingo Wald,Michael Scherbaum,Heiko Friedrich +3 more
- 01 Sep 2006
TL;DR: Combining low-level optimized kernel routines, a streaming software architecture, explicit caching, and a virtual software-hyperthreading approach to hide DMA latencies, a single Cell achieves pure ray tracing performance nearly one order of magnitude higher than that of a commodity CPU.
Hyperfast parallel-beam and cone-beam backprojection using the cell general purpose hardware.
TL;DR: The dual Cell-based blade (Mercury Computer Systems) can 2D backproject 330 images/s and complete the 3D cone-beam backprojection in 6.8 s, greatly outperforming today's top-notch backprojections based on graphics processing units.
Scientific computing Kernels on the cell processor
TL;DR: A performance model for Cell is introduced and applied to several key numerical kernels (dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs) and validated by comparing results against published hardware data as well as the authors' own Cell blade implementations.
References
A 4.8GHz fully pipelined embedded SRAM in the streaming processor of a CELL processor
S.H. Dhong,Osamu Takahashi,M. White,Toru Asano,T. Nakazato,Joel Abraham Silberman,A. Kawasumi,H. Yoshihara +7 more
- 29 Aug 2005
TL;DR: A 6-stage fully pipelined embedded SRAM is implemented in a 90nm SOI technology that uses a conventional 6-transistor memory cell and sense amplifier to achieve the cycle time while minimizing the impact of device variation.
A fully-pipelined single-precision floating point unit in the synergistic processor element of a CELL processor
Hwa-Joon Oh,Silvia Melitta Mueller,C. Jacobi,Kevin D. Tran,S.R. Cottier,Brad W. Michael,Hiroo Nishikawa,Y. Totsuka,T. Namatame,N. Yano,T. Machida,Sang Hoo Dhong +11 more
- 16 Jun 2005
TL;DR: The floating point unit in the synergistic processor element of a CELL processor is a fully-pipelined 4-way SIMD unit designed to accelerate media and data streaming.
The vector fixed point unit of the synergistic processor element of the cell architecture processor
N. Mading,Jens Leenstra,Juergen Pille,Rolf Sautter,S. Buttner,Sebastian Ehrenreich,Wilhelm Haller +6 more
- 05 Dec 2005
TL;DR: A vector fixed point unit (FXU) is designed to speed up multi-media processing and implements SIMD style integer arithmetic and permute operations.
The design and implementation of a first-generation CELL processor
D. Pham,Shigehiro Asano,M. Bolliger,M. N. Day,Harm Peter Hofstee,Charles Ray Johns,J. Kahle,Atsushi Kameyama,J. Keaty,Y. Masubuchi,Mack W. Riley,David Shippy,Daniel Lawrence Stasiak,Masakazu Suzuoki,Michael Fan Wang,James D. Warnock,S. Weitzel,D. Wendel,Takeshi Yamazaki,Kazuaki Yazawa +19 more
- 29 Aug 2005
TL;DR: A CELL processor is a multi-core chip consisting of a 64b power architecture processor, multiple streaming processors, a flexible IO interface, and a memory interface controller that is implemented in 90nm SOI technology.