Journal Article, DOI: 10.1109/71.706049
A compiler optimization algorithm for shared-memory multiprocessors
Citations
Estimating cache misses and locality using stack distances
Calin Cascaval,David Padua +1 more
- 23 Jun 2003
TL;DR: This paper presents a method to estimate the number of cache misses, at compile time, using a machine independent model based on stack algorithms, which provides a very good approximation for set-associative caches and programs with non-constant dependence distances.
Patent
Loop optimization with mapping code on an architecture
Koen Danckaert,Francky Catthoor +1 more
- 31 Jan 2000
TL;DR: Presents a loop transformation step that improves the data locality and regularity of the algorithm described by the code; the step works globally and is feasible for realistic code sizes.
57
Performance modeling of communication and computation in hybrid MPI and OpenMP applications
Laksono Adhianto,Barbara Chapman +1 more
TL;DR: The construction of a model that is based upon a small number of parameters, but is able to capture the complexity of the runtime system is proposed, and how this tool can be applied to a sample code is shown.
56
Performance modeling of communication and computation in hybrid MPI and OpenMP applications
Laksono Adhianto,Barbara Chapman +1 more
- 12 Jul 2006
TL;DR: This paper proposes the construction of a model that is based upon a small number of parameters, but is able to capture the complexity of the runtime system, and describes the underlying framework, the performance model, and shows how it can be applied to a sample code.
52
Optimizing locality and scalability of embedded Runge-Kutta solvers using block-based pipelining
Matthias Korch,Thomas Rauber +1 more
TL;DR: This paper considers embedded Runge-Kutta methods for the solution of ordinary differential equations and explores how the potential parallelism in the stage vector computation of such equations can be exploited in a pipelining approach leading to a better locality behavior and a higher scalability.
43
References
Book
LINPACK Users' Guide
Jack Dongarra,Cleve B. Moler,J. R. Bunch,G. W. Stewart +3 more
- 01 Jan 1987
TL;DR: Covers general matrices, band matrices, positive definite matrices, positive definite band matrices, symmetric indefinite matrices, triangular matrices, and tridiagonal matrices, along with the Cholesky decomposition, the QR decomposition, and the singular value decomposition.
1.7K
Dependence graphs and compiler optimizations
David J. Kuck,Robert H. Kuhn,David Padua,Bruce Leasure,Michael Wolfe +4 more
- 26 Jan 1981
TL;DR: This paper defines such graphs and discusses two kinds of transformations, simple rewriting transformations that remove dependence arcs and abstraction transformations that deal more globally with a dependence graph.
752
A loop transformation theory and an algorithm to maximize parallelism
Michael Wolf,Monica S. Lam +1 more
TL;DR: The loop transformation theory is applied to the problem of maximizing the degree of coarse- or fine-grain parallelism in a loop nest, and it is shown that the maximum degree of parallelism can be achieved by transforming the loops into a nest of coarsest fully permutable loop nests and wavefronting the fully permutable nests.
727
Improving data locality with loop transformations
TL;DR: This article presents compiler optimizations that improve data locality based on a simple yet accurate cost model, and finds that performance improvements were difficult to achieve overall, although the optimizations did improve several programs.
Direct Search Methods on Parallel Machines
John E. Dennis,Virginia Torczon +1 more
TL;DR: Direct search methods are designed to solve unconstrained minimization problems of the form $\min_{x \in \mathbb{R}^n} f(x)$. They are distinguished by the fact that they neither use nor require explicit derivative information; the search for a local minimizer is driven solely by function information.