Register pointer architecture for efficient embedded processors

doi:10.5555/1266366.1266493

Open Access•Proceedings Article•10.5555/1266366.1266493

Jongsoo Park, +5 more

- 16 Apr 2007

- pp 600-605

19

TL;DR: This paper introduces the register pointer architecture (RPA), in which registers are accessed indirectly through register pointers. The extra flexibility in naming registers reduces the need for loop unrolling to maximize reuse of register-allocated variables.
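
The idea of naming registers through a pointer can be sketched with a toy model. The following is an illustrative simulation only, not the paper's instruction set: the `RegisterPointer` class, its wrap-around `advance` method, and the `fir` function are hypothetical names chosen for this example. It keeps a sliding window of samples in a small "register file" and walks it with a pointer, so a single loop body serves every iteration instead of an unrolled copy per register name.

```python
class RegisterPointer:
    """Hypothetical register pointer: an index into a register file (toy model)."""

    def __init__(self, regs):
        self.regs = regs      # the register file, modeled as a plain list
        self.index = 0        # which register the pointer currently names

    def read(self):
        return self.regs[self.index]

    def write(self, value):
        self.regs[self.index] = value

    def advance(self, step=1):
        # Wrap-around is an assumption made for this sketch.
        self.index = (self.index + step) % len(self.regs)


def fir(samples, coeffs):
    """FIR filter whose sample window lives in 'registers' reached via pointers."""
    regs = [0] * len(coeffs)
    head = RegisterPointer(regs)          # names the slot for the newest sample
    out = []
    for x in samples:
        head.write(x)                     # overwrite the oldest sample in place
        tap = RegisterPointer(regs)
        tap.index = head.index            # start at the newest sample
        acc = 0
        for c in coeffs:                  # same loop body for every tap
            acc += c * tap.read()
            tap.advance(-1)               # step back to the next older sample
        out.append(acc)
        head.advance()
    return out


print(fir([1, 2, 3, 4], [1, 1]))  # prints [1, 3, 5, 7]
```

With fixed register names, the inner loop would either have to shift the window through the registers on every sample or be unrolled so that each copy names its registers explicitly; the pointer removes that constraint in this model.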

Citations

Proceedings Article•10.1145/1950413.1950420

VEGAS: soft vector processor with scratchpad memory

Christopher Han-Yu Chou, +5 more

- 27 Feb 2011

TL;DR: VEGAS is a new soft vector architecture in which the vector processor reads and writes directly to a scratchpad memory instead of a vector register file, allowing up to 9x more data elements to fit into on-chip memory. Fracturable ALUs allow efficient processing of bytes, halfwords, and words in the same processor instance, providing up to 4x the operations of existing fixed-width soft vector ALUs.

103

•Proceedings Article•10.1109/ICSAMOS.2010.5642059

A Polymorphic Register File for matrix operations

Catalin Ciobanu, +3 more

- 19 Jul 2010

TL;DR: This paper proposes the Polymorphic Register File, a register file organization that allows dynamic creation of a variable number of multidimensional registers of arbitrary sizes, and evaluates the performance benefits of the proposed organization.

14

•Book Chapter•10.1007/978-3-642-19137-4_2

Scalability evaluation of a polymorphic register file: A CG case study

Catalin Bogdan Ciobanu, +4 more

- 24 Feb 2011

TL;DR: This work focuses on a heterogeneous multi-processor architecture, taking into consideration critical parameters such as cache bandwidth and memory latency; results suggest that absolute speedups of up to 200 times can be obtained for the Sparse Matrix Vector Multiplication kernel.

12

Patent

Computation table for block computation

Ravi Kumar Arimilli, +1 more

- 16 Dec 2008

TL;DR: A compiler identifies a code section that is not a candidate for acceleration and a code block specifying an iterated operation that is a candidate for acceleration. In response to identifying the code block, the compiler creates and outputs an operation data structure, separate from the post-processed code, that identifies the iterated operation.

7

Patent

Block driven computation using a caching policy specified in an operand data structure

Ravi Kumar Arimilli, +1 more

- 16 Dec 2008

TL;DR: A processor has an associated memory hierarchy including a cache memory, and a computation engine computes and stores operands in the memory hierarchy in accordance with the cache policies indicated within the operand data structure.

7

References

•Proceedings Article•10.1109/WWC.2001.15

MiBench: A free, commercially representative embedded benchmark suite

Matthew R. Guthaus, +5 more

- 02 Dec 2001

TL;DR: A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors.

3.7K

Journal Article•10.1145/989393.989420

Software pipelining: an effective scheduling technique for VLIW machines

Monica S. Lam

- 01 Jun 1988

TL;DR: This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors, and proposes a hierarchical reduction scheme whereby entire control constructs are reduced to an object similar to an operation in a basic block.

940

•Journal Article•10.1109/JSSC.1996.542315

A 160 MHz 32 b 0.5 W CMOS RISC microprocessor

J. Montanaro, +19 more

- 08 Feb 1996

TL;DR: This custom VLSI implementation of a microprocessor architecture delivers 184 Dhrystone MIPS at 162 MHz while dissipating 0.5 W using a 1.5 V internal supply. Clock generation uses an on-chip PLL with a 3.68 MHz input clock to minimize high-frequency clock signals on the board.

733

Proceedings Article•10.1145/951710.951714

Vectorizing for a SIMdD DSP architecture

Dorit Naishlos, +3 more

- 30 Oct 2003

TL;DR: This paper demonstrates how SIMdD (SIMD on disjoint data) supports effective vectorization of digital signal processing (DSP) benchmarks by facilitating data reorganization and reuse.

79

Journal Article•10.1109/12.946998

Evaluating the use of register queues in software pipelined loops

Gary Tyson, +2 more

- 01 Aug 2001

- IEEE Transactions on Computers

TL;DR: It is shown that decoupling the architected register space from the physical register space can greatly increase the applicability of software pipelining, even as memory latencies increase.

29