Compiling for the Impulse Memory Controller

doi:10.5555/645988.674173

Open Access • Proceedings Article

Xianglong Huang, +2 more

- 08 Sep 2001

- pp 141-150

TL;DR: Compiler cost models based on dependence and locality analysis are presented that determine when to use Impulse to improve performance, weighing the reduction in misses against the additional cost of misses in Impulse and the fixed cost of setting up a remapping.
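The decision described above can be sketched as a simple linear comparison of cycles saved against cycles added. This is a minimal sketch, not the paper's model: the field names, the assumption that every miss to remapped data pays a fixed extra cost, and the numbers in the example are all hypothetical. In the paper, the miss counts would come from dependence and locality analysis rather than being supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class RemapEstimate:
    """Hypothetical inputs to a remapping cost model (all values are estimates)."""
    misses_original: float       # predicted misses with the original data layout
    misses_remapped: float       # predicted misses after an Impulse remapping
    miss_cycles: float           # cycles for an ordinary cache miss
    impulse_extra_cycles: float  # assumed extra cycles per miss served via a remapped address
    setup_cycles: float          # fixed cost of setting up the remapping


def remap_benefit(e: RemapEstimate) -> float:
    """Net cycles saved by remapping; a positive value means it should pay off."""
    saved = (e.misses_original - e.misses_remapped) * e.miss_cycles
    extra = e.misses_remapped * e.impulse_extra_cycles
    return saved - extra - e.setup_cycles


def should_remap(e: RemapEstimate) -> bool:
    return remap_benefit(e) > 0


# Made-up numbers: a loop whose misses drop 4x after remapping,
# and a short loop where the fixed setup cost dominates.
big = RemapEstimate(10_000, 2_500, 100, 50, 20_000)
small = RemapEstimate(100, 50, 100, 50, 20_000)
print(remap_benefit(big), should_remap(big))      # 605000 True
print(remap_benefit(small), should_remap(small))  # -17500 False
```

The fixed setup term is what makes short or rarely executed loops poor candidates even when remapping removes most of their misses.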

Citations

Journal Article•10.1109/TVLSI.2008.2000821

Dynamic Memory Access Management for High-Performance DSP Applications Using High-Level Synthesis

Bertrand Le Gal, +2 more

- 01 Nov 2008

- IEEE Transactions on Very Large Scale In...

TL;DR: This paper focuses on implementing memory interfacing modules that can be generated automatically by a high-level synthesis tool and that efficiently handle predictable as well as random address patterns, reducing power consumption and latency.

Book Chapter•10.1007/3-540-45937-5_22

Value-Profile Guided Stride Prefetching for Irregular Code

Youfeng Wu, +4 more

- 08 Apr 2002

TL;DR: A novel compiler technique to profile and prefetch for those loads with near-constant strides, which captures not only the dominant stride values for each profiled load, but also the differences between the successive strides of the load.
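A value profile of this kind can be illustrated with a small sketch that, for one load, counts its strides and the differences between successive strides. The function name, the dictionary it returns, and the 0.75 threshold for deciding to prefetch are assumptions for illustration only. The paper's actual profiling and prefetch-insertion rules are not reproduced here.

```python
from collections import Counter


def stride_profile(addresses, min_fraction=0.75):
    """Summarize the strides of one load from a trace of its addresses.

    Returns the dominant stride, how often it occurs, the fraction of
    successive strides that repeat (difference of zero), and whether the
    load looks near-constant-stride under the hypothetical threshold.
    """
    strides = [b - a for a, b in zip(addresses, addresses[1:])]
    if not strides:
        return {"dominant": None, "count": 0, "zero_diff_fraction": 0.0, "prefetch": False}
    dominant, count = Counter(strides).most_common(1)[0]
    # Differences between successive strides: 0 means the stride repeated.
    diffs = [s2 - s1 for s1, s2 in zip(strides, strides[1:])]
    zero_diff_fraction = diffs.count(0) / len(diffs) if diffs else 0.0
    return {
        "dominant": dominant,
        "count": count,
        "zero_diff_fraction": zero_diff_fraction,
        "prefetch": count / len(strides) >= min_fraction,
    }


trace = [0, 64, 128, 192, 200, 264, 328]  # mostly 64-byte steps, one step of 8
print(stride_profile(trace))
# {'dominant': 64, 'count': 5, 'zero_diff_fraction': 0.6, 'prefetch': True}
```

Tracking stride differences as well as the dominant stride distinguishes a load whose stride is stable over long runs from one that merely visits the same stride value often.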

Journal Article•10.1145/1138035.1138036

A lifetime optimal algorithm for speculative PRE

Jingling Xue, +1 more

- 01 Jun 2006

- ACM Transactions on Architecture and Cod...

TL;DR: A lifetime optimal algorithm, called MC-PRE, is presented that, for the first time, performs speculative PRE based on edge profiles. MC-PRE eliminates more partial redundancies than both LCM and CMP-PRE (especially in functions with complex control flow), and it inserts temporaries with shorter lifetimes than MC-PREcopt.

Dissertation

Contribution à la prise en compte des contraintes des applications TDSI dans la synthèse de haut niveau [Contribution to handling the constraints of TDSI applications in high-level synthesis]

Bertrand Le Gal

- 01 Jan 2005

TL;DR: The concept of a behavioral-level virtual component, proposed by LESTER, allows great flexibility and a good match between algorithm and architecture.

Proceedings Article•10.1109/PACT.2009.43

Region Based Structure Layout Optimization by Selective Data Copying

Sandya Mannarswamy, +2 more

- 12 Sep 2009

TL;DR: The RBSL framework is described and implemented in the production C/C++ compiler on HP-UX IA-64. Acting as a complement to the compiler's existing, mature WPSL transformation framework, RBSL improves the performance of pointer-intensive SPEC benchmarks by 3% to 28% over WPSL.

References

Journal Article•10.1145/135226.135233

A practical algorithm for exact array dependence analysis

William Pugh

- 01 Aug 1992

- Communications of the ACM

TL;DR: A fundamental analysis step in an advanced optimizing compiler (as well as many other software tools) is data dependence analysis for arrays, which determines whether two references to an array can refer to the same element and under what conditions.
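The question of whether two array references can touch the same element can be illustrated with the classic GCD test. This is deliberately a much simpler, inexact test than the exact algorithm (the Omega test) this reference presents: it handles only single-subscript affine references and ignores loop bounds. The function name and the examples are hypothetical.

```python
from math import gcd


def gcd_test(a: int, b: int, c: int, d: int) -> bool:
    """May A[a*i + b] and A[c*j + d] refer to the same element?

    A dependence needs integers i, j with a*i - c*j = d - b, which has a
    solution only if gcd(a, c) divides d - b. Loop bounds are ignored, so
    True means "possibly dependent" and False means "provably independent".
    """
    g = gcd(a, c)
    if g == 0:  # both subscripts are constants
        return b == d
    return (d - b) % g == 0


print(gcd_test(2, 0, 2, 1))  # A[2i] vs A[2j+1]: False (even vs odd elements)
print(gcd_test(2, 0, 4, 2))  # A[2i] vs A[4j+2]: True (may overlap)
```

Because it ignores bounds, a True answer can be a false positive; exact methods such as the Omega test exist precisely to rule out those cases.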

Journal Article•10.1145/233561.233564

Improving data locality with loop transformations

Kathryn S. McKinley, +2 more

- 01 Jul 1996

- ACM Transactions on Programming Language...

TL;DR: This article presents compiler optimizations that improve data locality based on a simple yet accurate cost model; although performance improvements were difficult to achieve, the optimizations improved several programs.
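A locality cost model of this flavor can be sketched as counting the cache lines each reference touches when a given loop is placed innermost, then preferring the loop with the smallest total. This is a simplified sketch, assuming the strides of each reference are already known and ignoring the grouping of references that share cache lines. The function names and the example loop nest are illustrative, not taken from the article.

```python
from math import prod


def ref_cost(stride, trip, line_elems):
    """Cache lines touched by one reference when this loop is innermost."""
    if stride == 0:               # loop-invariant reference: one line
        return 1.0
    if abs(stride) < line_elems:  # consecutive: several iterations share a line
        return trip / (line_elems / abs(stride))
    return float(trip)            # each iteration touches a new line


def loop_cost(loop, refs, trips, line_elems):
    """Estimated lines accessed by the whole nest if `loop` is innermost."""
    outer = prod(t for name, t in trips.items() if name != loop)
    inner = sum(ref_cost(r.get(loop, 0), trips[loop], line_elems) for r in refs)
    return inner * outer


# a[i][j] += b[i][j] in row-major order, N = 100, 8 doubles per 64-byte line.
# Each reference maps a loop variable to its stride in elements.
refs = [{"i": 100, "j": 1}, {"i": 100, "j": 1}]
trips = {"i": 100, "j": 100}
print(loop_cost("j", refs, trips, 8))  # 2500.0
print(loop_cost("i", refs, trips, 8))  # 20000.0
```

Here making `j` the innermost loop is predicted to touch eight times fewer cache lines, which is the usual row-major loop order.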

Proceedings Article•10.1109/HPCA.1999.744334

Impulse: building a smarter memory controller

John B. Carter, +11 more

- 09 Jan 1999

TL;DR: The design of the Impulse architecture is described, and it is shown how an Impulse memory system can improve the performance of memory-bound programs; for example, Impulse improves performance on the NAS conjugate gradient benchmark by 67%.

Proceedings Article•10.1145/277650.277661

Data transformations for eliminating conflict misses

Gabriel Rivera, +1 more

- 01 May 1998

TL;DR: Experiments on a range of programs indicate that PADLITE can eliminate conflicts for benchmarks, but PAD is more effective over a range of cache and problem sizes, with some SPEC95 programs improving by up to 15%.
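The kind of conflict that padding targets can be illustrated with a simple inter-variable padding check for a direct-mapped cache. This is a hypothetical sketch in the spirit of the technique, not the PAD or PADLITE algorithms: it assumes two arrays with the same element size are accessed with the same index in the same loop, and the cache and line sizes in the example are made up.

```python
def conflicts(base_a, base_b, cache_bytes, line_bytes):
    """In a direct-mapped cache, can A[k] and B[k] map to the same set?

    With equal element sizes their address difference is constant, so they
    can collide for some k exactly when the bases are less than one line
    apart modulo the cache size.
    """
    d = (base_b - base_a) % cache_bytes
    return d < line_bytes or d > cache_bytes - line_bytes


def inter_pad(base_a, base_b, cache_bytes, line_bytes):
    """Smallest padding (a multiple of the line size) inserted before B that
    removes the conflict, or None if no such padding exists."""
    for step in range(cache_bytes // line_bytes):
        pad = step * line_bytes
        if not conflicts(base_a, base_b + pad, cache_bytes, line_bytes):
            return pad
    return None


# Two 8 KB arrays laid out back to back, an 8 KB direct-mapped cache, 32-byte lines.
print(conflicts(0, 8192, 8192, 32))  # True: A[k] and B[k] always share a set
print(inter_pad(0, 8192, 8192, 32))  # 32
```

Padding by a single cache line is enough here because it shifts every B[k] into the set after the one used by A[k].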

Proceedings Article•10.1145/263580.263657

Cache miss equations: an analytical representation of cache misses

Soumyadip Ghosh, +2 more

- 11 Jul 1997

TL;DR: In this article, the authors describe methods for generating and solving cache miss equations that give a detailed representation of the cache misses in loop-oriented scientific code, which can be used to guide code optimizations for improving cache performance.

...

Related Papers (5)

Memory data organization for improved cache performance in embedded processor applications

Compiler Support for Optimizing Memory Bank-Level Parallelism

Let's study whole-program cache behaviour analytically

Cache miss heuristics and preloading techniques for general-purpose programs

Measurement-based modeling of the cache replacement policy