Proceedings Article, DOI: 10.1145/951710.951733
A new look at exploiting data parallelism in embedded systems
Hillery C. Hunter,Jaime H. Moreno +1 more
- 30 Oct 2003
- pp 159-169
TL;DR: While some algorithms exhibit data-level parallelism suited to packed vector computation, it is shown that other kernels are most efficiently scheduled with more flexible vector models, which motivates exploration of non-traditional processor architectures for the embedded domain.
Abstract: This paper describes and evaluates three architectural methods for accomplishing data parallel computation in a programmable embedded system. Comparisons are made between the well-studied Very Long Instruction Word (VLIW) and Single Instruction Multiple Packed Data (SIMpD) paradigms; the less-common Single Instruction Multiple Disjoint Data (SIMdD) architecture is described and evaluated. A taxonomy is defined for data-level parallel architectures, and patterns of data access for parallel computation are studied, with measurements presented for over 40 essential telecommunication and media kernels. While some algorithms exhibit data-level parallelism suited to packed vector computation, it is shown that other kernels are most efficiently scheduled with more flexible vector models. This motivates exploration of non-traditional processor architectures for the embedded domain.
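The contrast between packed and disjoint data access can be illustrated with a rough sketch. It assumes that "packed" means each vector operation reads contiguous elements, and that "disjoint" means each lane's operand is selected by its own index. The function names `packed_add` and `disjoint_add`, the lane width of 4, and the sample data are hypothetical illustrations, not taken from the paper.

```python
def packed_add(a, b, start, width=4):
    """Model one packed vector add: all lanes read contiguous elements."""
    return [a[start + i] + b[start + i] for i in range(width)]


def disjoint_add(a, b, idx_a, idx_b):
    """Model one disjoint vector add: each lane reads its own indexed element."""
    return [a[i] + b[j] for i, j in zip(idx_a, idx_b)]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]

# Packed: lanes consume a[0..3] and b[0..3].
packed = packed_add(a, b, 0)

# Disjoint: lanes consume strided, non-adjacent elements without
# first rearranging them into a contiguous vector.
strided = disjoint_add(a, b, [0, 2, 4, 6], [1, 3, 5, 7])
```

In this toy model, a kernel with strided or irregular access would need extra data-rearrangement steps before `packed_add` could be used, whereas `disjoint_add` reads the scattered elements directly. That is one plausible reading of why some kernels favor the more flexible vector model.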
Citations
Parallel computing works
TL;DR: Parallel Computing Works! by G. C. Fox, R. D. Williams, and P. C. Messina is a guide to parallel computing in the 21st Century.
•Book
Real-Time Image and Video Processing: From Research to Reality
Nasser Kehtarnavaz,Mark N. Gamadia +1 more
- 05 Jul 2006
TL;DR: This book presents an overview of the guidelines and strategies for transitioning an image or video processing algorithm from a research environment into a real-time constrained environment, consisting of algorithm simplifications, hardware architectures, and software methods.
94
Real-Time Image and Video Processing: From Research to Reality
Nasser Kehtarnavaz,M. Gamadia +1 more
- 05 Jul 2006
TL;DR: This book presents an overview of the guidelines and strategies for transitioning an image or video processing algorithm from a research environment into a real-time constrained environment, consisting of algorithm simplifications, hardware architectures, and software methods.
76
Design and Implementation of Turbo Decoders for Software Defined Radio
Yuan Lin,Scott Mahlke,Trevor Mudge,Chaitali Chakrabarti,Alastair Reid,Krisztian Flautner +5 more
- 01 Oct 2006
TL;DR: A programmable DSP architecture for SDR is presented that includes a set of architectural features to accelerate turbo decoder computations, along with a parallel window scheduling scheme for the MAX-Log-MAP component decoder that matches well with the DSP architecture.
A Low-Power DSP for Wireless Communications
TL;DR: A low-power, high-throughput digital signal processor (DSP) is proposed for baseband processing in wireless terminals; it deploys operation chaining, pipelined execution of SIMD units, staggered memory access, and multicycling of computation units.
37
References
The ILLIAC IV Computer
TL;DR: The structure of ILLIAC IV, a parallel-array computer containing 256 processing elements, is described. Special features include multiarray processing, multiprecision arithmetic, and fast data-routing interconnections.
614
•Book
Parallel Computing Works
Roy Williams,Guiseppe C. Messina,Geoffrey C. Fox,Paul Messina +3 more
- 15 May 1994
TL;DR: This chapter discusses synchronous applications, the Zipcode Message-Passing System, and the DIME Programming Environment, which simplifies the development of asynchronous applications.
292
Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks
Christoforos Kozyrakis,David A. Patterson +1 more
- 18 Nov 2002
TL;DR: This paper uses EEMBC, an industrial benchmark suite, to compare the VIRAM vector architecture to superscalar and VLIW processors for embedded multimedia applications and demonstrates that executable code for VIRAM is up to 10 times smaller than VLIW code and comparable to x86 CISC code.
A high-performance embedded DSP core with novel SIMD features
J.H. Derby,J.H. Moreno +1 more
- 06 Apr 2003
TL;DR: An overview of the architecture of this low-power, high-performance, compiler-friendly DSP core, with a focus on its SIMD features, is provided.
141