Nested loop join

Papers published on a yearly basis

Papers

Proceedings Article • 10.1145/113445.113449

A data locality optimizing algorithm

Michael Wolf¹, Monica S. Lam¹•Institutions (1)

Stanford University¹

1 May 1991

TL;DR: An algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling is proposed, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation, LU decomposition without pivoting, and Givens QR factorization.

Abstract: This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The loop transformation algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations. The algorithm has been implemented in the SUIF (Stanford University Intermediate Format) compiler, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code.

1,423 citations
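
As a rough illustration of two of the transformations named in the abstract above (interchange and tiling), the following sketch applies them by hand to matrix multiplication. It is not the paper's algorithm, which selects transformations automatically inside a compiler. The function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, chosen only for this example.

```python
def matmul_naive(a, b):
    """Reference i-j-k loop nest for C = A * B on lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation after a hand-applied interchange and tiling.

    The j and k loops are split into tiles of size `tile` (an assumed,
    untuned value), and k is moved outside j so that each row segment
    of B is reused across all rows i before the next tile is touched.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for jj in range(0, p, tile):          # tile loop over columns of B
        for kk in range(0, m, tile):      # tile loop over the shared dimension
            for i in range(n):
                for k in range(kk, min(kk + tile, m)):
                    for j in range(jj, min(jj + tile, p)):
                        c[i][j] += a[i][k] * b[k][j]
    return c
```

Both functions perform the same multiply-add operations in a different order, so on integer inputs they return identical results. Pure Python will not show the cache benefit, but the loop structure is the point of the sketch.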

Proceedings Article • 10.1145/1375581.1375595

A practical automatic polyhedral parallelizer and locality optimizer

Uday Bondhugula¹, Albert Hartono¹, J. Ramanujam², P. Sadayappan¹•Institutions (2)

Ohio State University¹, Louisiana State University²

7 Jun 2008

TL;DR: An automatic polyhedral source-to-source transformation framework that optimizes regular programs for parallelism and locality simultaneously, implemented as a tool that automatically generates OpenMP parallel code from C program sections.

Abstract: We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model -- far beyond what is possible by current production compilers. Unlike previous works, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.

1,096 citations

Proceedings Article • 10.1145/956750.956758

Mining distance-based outliers in near linear time with randomization and a simple pruning rule

Stephen D. Bay, Mark Schwabacher¹•Institutions (1)

Ames Research Center¹

24 Aug 2003

TL;DR: This work shows that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used.

Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.

742 citations
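
The nested loop with randomization and pruning described in the abstract above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes the outlier score is the distance to the k-th nearest neighbor, processes one example at a time, and uses hypothetical names (`top_outliers`, `k`, `n_out`, `seed`).

```python
import heapq
import math
import random


def top_outliers(points, k=1, n_out=1, seed=0):
    """Return up to n_out (score, index) pairs, highest score first.

    Assumption: score = distance to the k-th nearest neighbor.
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)    # randomize the scan order
    cutoff = 0.0                          # score of the weakest outlier kept so far
    top = []                              # min-heap of (score, index)
    for i in order:
        neigh = []                        # negated distances: the k nearest seen so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
            elif d < -neigh[0]:
                heapq.heapreplace(neigh, -d)
            # Pruning rule: the running score can only shrink as more
            # neighbors are seen, so once it falls below the cutoff this
            # example cannot be a top outlier and the inner loop stops.
            if len(neigh) == k and -neigh[0] < cutoff:
                pruned = True
                break
        if pruned or len(neigh) < k:
            continue
        heapq.heappush(top, (-neigh[0], i))
        if len(top) > n_out:
            heapq.heappop(top)            # drop the weakest candidate
        if len(top) == n_out:
            cutoff = top[0][0]
    return sorted(top, reverse=True)
```

The worst case is still quadratic, as the abstract notes. The early exit is what lets most non-outliers be dismissed after only a few distance computations when the data is scanned in random order.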

Journal Article • 10.1145/2400682.2400713

Polyhedral parallel code generation for CUDA

Sven Verdoolaege¹, Juan Carlos Juega², Albert Cohen¹, José Ignacio Gómez², Christian Tenllado², Francky Catthoor³•Institutions (3)

École Normale Supérieure¹, Complutense University of Madrid², IMEC³

20 Jan 2013

TL;DR: A novel source-to-source compiler called PPCG is presented, which introduces a multilevel tiling strategy and a code generation scheme for the parallelization and locality optimization of imperfectly nested loops, managing memory and exposing concurrency according to the constraints of modern GPUs.

Abstract: This article addresses the compilation of a sequential program for parallel execution on a modern GPU. To this end, we present a novel source-to-source compiler called PPCG. PPCG stands out for its ability to accelerate computations from any static control loop nest, generating multiple CUDA kernels when necessary. We introduce a multilevel tiling strategy and a code generation scheme for the parallelization and locality optimization of imperfectly nested loops, managing memory and exposing concurrency according to the constraints of modern GPUs. We evaluate our algorithms and tool on the entire PolyBench suite.

431 citations

Journal Article • 10.1145/1133255.1134029

Termination proofs for systems code

Byron Cook¹, Andreas Podelski², Andrey Rybalchenko²•Institutions (2)

Microsoft¹, Max Planck Society²

11 Jun 2006

TL;DR: A new program termination prover is described that performs a path-sensitive and context-sensitive program analysis and provides capacity for large program fragments together with support for programming language features such as arbitrarily nested loops, pointers, function-pointers, side-effects, etc.

Abstract: Program termination is central to the process of ensuring that systems code can always react. We describe a new program termination prover that performs a path-sensitive and context-sensitive program analysis and provides capacity for large program fragments (i.e. more than 20,000 lines of code) together with support for programming language features such as arbitrarily nested loops, pointers, function-pointers, side-effects, etc. We also present experimental results on device driver dispatch routines from the Windows operating system. The most distinguishing aspect of our tool is how it shifts the balance between the two tasks of constructing and respectively checking the termination argument. Checking becomes the hard step. In this paper we show how we solve the corresponding challenge of checking with binary reachability analysis.

426 citations

Year	Papers
2025	2
2024	6
2023	10
2022	28
2021	31
2020	31

Related Topics (5)

Performance Metrics