Loop fission

Papers

Proceedings Article • DOI: 10.1145/113445.113449

A data locality optimizing algorithm

Michael Wolf¹, Monica S. Lam¹•Institutions (1)

Stanford University¹

1 May 1991

TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.

Abstract: This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The loop transformation algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations. The algorithm has been implemented in the SUIF (Stanford University Intermediate Format) compiler, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code.
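To illustrate the unimodular view described in the abstract, the following Python sketch represents interchange, reversal, and skewing of a two-deep loop nest as 2x2 integer matrices with determinant +1 or -1. The legality check used here (every transformed dependence vector must stay lexicographically positive) is the usual criterion in this framework, but it is an assumption on our part, since the abstract does not state it. The helper names and the sample dependence vector (1, -1) are hypothetical and are not taken from the paper or from SUIF.

```python
# Sketch: loop transformations on a 2-deep nest as unimodular matrices.
# All names below are illustrative, not from the paper or the SUIF compiler.

def mat_vec(T, d):
    """Apply transformation matrix T to a dependence (distance) vector d."""
    return tuple(sum(T[r][c] * d[c] for c in range(len(d))) for r in range(len(T)))

def mat_mul(A, B):
    """Compose two transformations: (A * B) applies B first, then A."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def det2(T):
    """Determinant of a 2x2 matrix; unimodular means it is +1 or -1."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]

def lex_positive(v):
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False  # the zero vector is not lexicographically positive

def is_legal(T, deps):
    """Assumed legality test: every transformed dependence stays lex-positive."""
    return all(lex_positive(mat_vec(T, d)) for d in deps)

INTERCHANGE = [[0, 1], [1, 0]]     # swap the two loops
REVERSE_INNER = [[1, 0], [0, -1]]  # run the inner loop backwards
SKEW = [[1, 0], [1, 1]]            # skew the inner loop by the outer index

# Hypothetical loop nest with a single dependence of distance (1, -1).
DEPS = [(1, -1)]

if __name__ == "__main__":
    for name, T in [("interchange", INTERCHANGE),
                    ("skew", SKEW),
                    ("skew then interchange", mat_mul(INTERCHANGE, SKEW))]:
        print(name, "det =", det2(T), "legal =", is_legal(T, DEPS))
```

In this sketch, interchange alone is rejected for the sample dependence, while skewing first makes a subsequent interchange acceptable. Composing transformations by matrix multiplication is what lets a single test cover compound transformations.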

1,423 citations

Journal Article • DOI: 10.1016/J.AUTOMATICA.2009.10.035

Brief paper: A state-feedback approach to event-based control

Jan Lunze¹, Daniel Lehmann¹•Institutions (1)

Ruhr University Bochum¹

01 Jan 2010, Automatica

TL;DR: An upper bound on the difference between the event-based control loop and the continuous state-feedback loop is derived, which shows that the approximation can be made arbitrarily tight by appropriately choosing the threshold parameter of the event generator.

1,091 citations

Journal Article • DOI: 10.1145/233561.233564

Improving data locality with loop transformations

Kathryn S. McKinley¹, Steve Carr², Chau-Wen Tseng³•Institutions (3)

University of Massachusetts Amherst¹, Michigan Technological University², University of Maryland, College Park³

01 Jul 1996, ACM Transactions on Programming Languages and Systems

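TL;DR: This article presents compiler optimizations that improve data locality based on a simple yet accurate cost model. Performance improvements were difficult to achieve because benchmark programs typically have high hit rates even for small data caches, but the optimizations significantly improved several programs.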

Abstract: In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In this article, we present compiler optimizations to improve data locality based on a simple yet accurate cost model. The model computes both temporal and spatial reuse of cache lines to find desirable loop organizations. The cost model drives the application of compound transformations consisting of loop permutation, loop fusion, loop distribution, and loop reversal. To validate our optimization strategy, we implemented our algorithms and ran experiments on a large collection of scientific programs and kernels. Experiments illustrate that for kernels our model and algorithm can select and achieve the best loop structure for a nest. For over 30 complete applications, we executed the original and transformed versions and simulated cache hit rates. We collected statistics about the inherent characteristics of these programs and our ability to improve their data locality. To our knowledge, these studies are the first of such breadth and depth. We found performance improvements were difficult to achieve because benchmark programs typically have high hit rates even for small data caches; however, our optimizations significantly improved several programs.
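Loop distribution, listed among the compound transformations in the abstract above, is another name for loop fission, the topic of this page. As a minimal sketch (not the article's algorithm or cost model), the Python functions below show one loop split into two and confirm that both versions compute the same arrays. The function names, the array names a, b, c, and the statement bodies are hypothetical. Python will not show the cache effects the article measures; the example only illustrates that the transformation preserves results when the dependence runs from the first statement to the second.

```python
# Sketch: loop distribution (fission) and its inverse view, loop fusion.
# The loop bodies are made up for illustration.

def fused_loop(b):
    """Single loop containing two statements, S1 and S2."""
    n = len(b)
    a = [0] * n
    c = [0] * n
    for i in range(1, n):
        a[i] = b[i] + 1        # S1: writes a[i]
        c[i] = a[i - 1] * 2    # S2: reads a[i-1], written by S1 one iteration earlier
    return a, c

def distributed_loop(b):
    """Same computation after distributing the loop over S1 and S2."""
    n = len(b)
    a = [0] * n
    c = [0] * n
    for i in range(1, n):      # first loop: only S1
        a[i] = b[i] + 1
    for i in range(1, n):      # second loop: only S2, sees all of a[]
        c[i] = a[i - 1] * 2
    return a, c

if __name__ == "__main__":
    data = list(range(8))
    print(fused_loop(data) == distributed_loop(data))
```

The split is safe here because the only dependence goes from S1 to S2, so running every S1 before any S2 still delivers each value of a before it is read. A dependence cycle between the two statements would prevent distributing them into separate loops.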

590 citations

Journal Article • DOI: 10.1145/212094.212131

Software pipelining

Vicki H. Allan¹, Reese B. Jones², Randall M. Lee, Stephen J. Allan¹•Institutions (2)

Utah State University¹, Evans & Sutherland²

01 Sep 1995, ACM Computing Surveys

TL;DR: A comparison of the alternative methods for software pipelining is presented, and the relationships between the methods are explored and possibilities for improvement highlighted.

Abstract: Utilizing parallelism at the instruction level is an important way to improve performance. Because the time spent in loop execution dominates total execution time, a large body of optimizations focuses on decreasing the time to execute each iteration. Software pipelining is a technique that reforms the loop so that a faster execution rate is realized. Iterations are executed in overlapped fashion to increase parallelism. Let {ABC}^n represent a loop containing operations A, B, C that is executed n times. Although the operations of a single iteration can be parallelized, more parallelism may be achieved if the entire loop is considered rather than a single iteration. The software pipelining transformation utilizes the fact that a loop {ABC}^n is equivalent to A{BCA}^(n−1)BC. Although the operations contained in the loop do not change, the operations are from different iterations of the original loop. Various algorithms for software pipelining exist. A comparison of the alternative methods for software pipelining is presented. The relationships between the methods are explored and possibilities for improvement highlighted.
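The equivalence stated in the abstract, that a loop of n iterations of A, B, C can be rewritten as a prologue A, a kernel BCA repeated n-1 times, and an epilogue BC, can be checked with a small Python sketch. The function names and the (operation, iteration) trace representation are assumptions made for illustration; the sketch does not model any scheduling algorithm from the survey and assumes n is at least 1.

```python
# Sketch: {ABC}^n versus A {BCA}^(n-1) BC as traces of (operation, iteration).
# Function names and the trace format are illustrative only.

def original_trace(n):
    """Each iteration i runs A, B, C in order."""
    return [(op, i) for i in range(n) for op in "ABC"]

def pipelined_trace(n):
    """Prologue A, kernel BCA repeated n-1 times, epilogue BC (requires n >= 1)."""
    trace = [("A", 0)]                                # prologue: A of iteration 0
    for i in range(n - 1):                            # kernel body mixes iterations:
        trace += [("B", i), ("C", i), ("A", i + 1)]   # B, C of i with A of i+1
    trace += [("B", n - 1), ("C", n - 1)]             # epilogue: finish last iteration
    return trace

def kernel_body(i):
    """The operations one kernel iteration contains, tagged by source iteration."""
    return [("B", i), ("C", i), ("A", i + 1)]

if __name__ == "__main__":
    print(original_trace(4) == pipelined_trace(4))
    print(kernel_body(0))
```

The two traces are identical as sequences; what changes is the loop boundary. Each kernel body holds A from the next iteration alongside B and C from the current one, which is the regrouping that lets operations from different iterations be overlapped.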

386 citations

Journal Article • DOI: 10.1145/960116.54021

Optimal loop parallelization

Alex Aiken¹, Alexandru Nicolau¹•Institutions (1)

Cornell University¹

1 Jun 1988

TL;DR: This paper presents a new technique bridging the gap between fine- and coarse-grain loop parallelization, allowing the exploitation of parallelism inside and across loop iterations, and shows that, given a loop and a set of dependencies between its statements, the resulting execution schedule is time optimal.

Abstract: Parallelizing compilers promise to exploit the parallelism available in a given program, particularly parallelism that is too low-level or irregular to be expressed by hand in an algorithm. However, existing parallelization techniques do not handle loops in a satisfactory manner. Fine-grain (instruction level) parallelization, or compaction, captures irregular parallelism inside a loop body but does not exploit parallelism across loop iterations. Coarser methods, such as doacross [9], sacrifice irregular forms of parallelism in favor of pipelining iterations (software pipelining). Both of these approaches often yield suboptimal speedups even under the best conditions, when resources are plentiful and processors are synchronous. In this paper we present a new technique bridging the gap between fine- and coarse-grain loop parallelization, allowing the exploitation of parallelism inside and across loop iterations. Furthermore, we show that, given a loop and a set of dependencies between its statements, the execution schedule obtained by our transformation is time optimal: no transformation of the loop based on the given data dependencies can yield a shorter running time for that loop.

266 citations

Papers published per year

Year	Papers
2025	3
2024	4
2023	7
2022	10
2020	2
2018	3
