Topic

Stencil code

About: Stencil code is a research topic. Over its lifetime, 474 publications have been published on this topic, receiving 11,442 citations.

Papers

Proceedings Article•10.1145/2491956.2462176•

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

Jonathan Ragan-Kelley¹, Connelly Barnes², Andrew Adams¹, Sylvain Paris², Frédo Durand¹, Saman Amarasinghe¹•Institutions (2)

Massachusetts Institute of Technology¹, Adobe Systems²

16 Jun 2013

TL;DR: The paper presents a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation that describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high-performance implementations from a Halide algorithm and a schedule.

Abstract: Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values. We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
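
The tension between locality and redundant recomputation described in this abstract can be illustrated with a two-stage 1-D blur. The sketch below is plain Python, not Halide's API; the function names (`blur3`, `breadth_first`, `fused`) are hypothetical and chosen only for this illustration. It runs the same pipeline under two extreme schedules and counts how often the first stage is evaluated.

```python
def blur3(f, i):
    """3-point box stencil applied to a callable f at index i."""
    return (f(i - 1) + f(i) + f(i + 1)) / 3.0


def breadth_first(a):
    """Compute all of stage 1, store it, then compute stage 2.

    No value is recomputed, but the whole intermediate buffer has to be
    kept between the two stages (poor producer-consumer locality).
    """
    n = len(a)
    evals = 0
    b = {}
    for i in range(1, n - 1):
        b[i] = blur3(lambda k: a[k], i)
        evals += 1
    c = [blur3(lambda k: b[k], i) for i in range(2, n - 2)]
    return c, evals


def fused(a):
    """Inline stage 1 into stage 2.

    No intermediate buffer is needed, but every output re-evaluates the
    three stage-1 values it reads, so shared values are recomputed.
    """
    evals = 0

    def stage1(i):
        nonlocal evals
        evals += 1
        return blur3(lambda k: a[k], i)

    c = [blur3(stage1, i) for i in range(2, len(a) - 2)]
    return c, evals


if __name__ == "__main__":
    signal = [float(i * i) for i in range(10)]
    out_bf, evals_bf = breadth_first(signal)
    out_fused, evals_fused = fused(signal)
    print(out_bf == out_fused, evals_bf, evals_fused)
```

Both schedules produce identical output; they differ only in how much intermediate storage and how much repeated work they need. Tiled schedules, which compute the first stage per block of outputs, fall between these two extremes.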

1,262 citations

Proceedings Article•10.1109/ICPPW.2010.38•

LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments

Jan Treibig¹, Georg Hager¹, Gerhard Wellein¹•Institutions (1)

University of Erlangen-Nuremberg¹

13 Sep 2010

TL;DR: LIKWID is a set of command-line utilities that addresses four key problems: probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and toggling hardware prefetchers.

Abstract: Exploiting the performance of today's processors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command-line utilities that addresses four key problems: Probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and toggling hardware prefetchers. An API for using the performance counting features from user code is also included. We clearly state the differences to the widely used PAPI interface. To demonstrate the capabilities of the tool set we show the influence of thread pinning on performance using the well-known OpenMP STREAM triad benchmark, and use the affinity and hardware counter tools to study the performance of a stencil code specifically optimized to utilize shared caches on multicore chips.

497 citations

Proceedings Article•10.1109/IPDPS.2011.70•

PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

Matthias Christen¹, Olaf Schenk¹, Helmar Burkhart¹•Institutions (1)

University of Basel¹

16 May 2011

TL;DR: This work presents a code generation and auto-tuning framework for stencil computations targeted at multi- and many-core processors, such as multicore CPUs and graphics processing units. It generates compute kernels from a specification of the stencil operation and a parallelization and optimization strategy, and uses auto-tuning to optimize strategy-dependent parameters for the given hardware architecture.

Abstract: Stencil calculations comprise an important class of kernels in many scientific computing applications ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such types of solvers, stencil kernels are often the dominant part of the computation, and an efficient parallel implementation of the kernel is therefore crucial in order to reduce the time to solution. However, in the current complex hardware microarchitectures, meticulous architecture-specific tuning is required to elicit the machine's full compute power. We present a code generation and auto-tuning framework PATUS for stencil computations targeted at multi- and many-core processors, such as multicore CPUs and graphics processing units, which makes it possible to generate compute kernels from a specification of the stencil operation and a parallelization and optimization strategy, and leverages the auto-tuning methodology to optimize strategy-dependent parameters for the given hardware architecture.
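
The idea of separating a stencil operation from a strategy parameter that is tuned empirically can be sketched as follows. This is plain Python, not the PATUS specification language or its API; `jacobi_sweep`, `autotune`, and the `tile` parameter are hypothetical names for this illustration. The stencil is a 5-point Jacobi sweep, the strategy is cache blocking, and the tuner simply times each candidate tile size.

```python
import time


def jacobi_sweep(u, tile):
    """One 5-point Jacobi sweep over the interior of u, visited in tile x tile blocks."""
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]  # boundary values are copied unchanged
    for ii in range(1, n - 1, tile):
        for jj in range(1, m - 1, tile):
            for i in range(ii, min(ii + tile, n - 1)):
                for j in range(jj, min(jj + tile, m - 1)):
                    out[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                        + u[i][j - 1] + u[i][j + 1])
    return out


def autotune(u, candidates, repeats=3):
    """Return the tile size with the lowest best-of-`repeats` wall time."""
    best_tile, best_time = None, float("inf")
    for tile in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            jacobi_sweep(u, tile)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile


if __name__ == "__main__":
    grid = [[float((7 * i + 3 * j) % 11) for j in range(64)] for i in range(64)]
    print("selected tile size:", autotune(grid, [4, 8, 16, 32]))
```

The tile size changes only the traversal order, never the result, which is what makes it safe to search over. In interpreted Python the timing differences are mostly noise; the sketch shows the structure of the search, not a realistic performance gain.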

387 citations

Posted Content•

LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments

Jan Treibig¹, Georg Hager¹, Gerhard Wellein¹•Institutions (1)

University of Erlangen-Nuremberg¹

26 Apr 2010-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This work shows the influence of thread pinning on performance using the well-known OpenMP STREAM triad benchmark, and uses the affinity and hardware counter tools to study the performance of a stencil code specifically optimized to utilize shared caches on multicore chips.

349 citations

Journal Article•10.1111/J.1365-246X.2004.02289.X•

Mixed‐grid and staggered‐grid finite‐difference methods for frequency‐domain acoustic wave modelling

Bernhard Hustedt¹, Stéphane Operto¹, Jean Virieux¹•Institutions (1)

Centre national de la recherche scientifique¹

01 Jun 2004-Geophysical Journal International

TL;DR: In this article, the authors compare different finite-difference schemes for two-dimensional (2D) acoustic frequency-domain forward modelling based on staggered-grid stencils.

Abstract: We compare different finite-difference schemes for two-dimensional (2-D) acoustic frequency-domain forward modelling. The schemes are based on staggered-grid stencils of various accuracy and grid rotation strategies to discretize the derivatives of the wave equation. A combination of two staggered-grid stencils on the classical Cartesian coordinate system and the 45° rotated grid is the basis of the so-called mixed-grid stencil. This method is compared with a parsimonious staggered-grid method based on a fourth-order approximation of the first derivative operator. Averaging of the mass acceleration can be incorporated in the two stencils. Sponge-like perfectly matched layer absorbing boundary conditions are also examined for each stencil and shown to be effective. The deduced numerical stencils are examined for both the wavelength content and azimuthal variation. The accuracy of the fourth-order staggered-grid stencil is slightly superior in terms of phase velocity dispersion to that of the mixed-grid stencil when averaging of the mass acceleration term is applied to the staggered-grid stencil. For fourth-order derivative approximations, the classical staggered-grid geometry leads to a stencil that incorporates 13 grid nodes. The mixed-grid approach combines only nine grid nodes. In both cases, wavefield solutions are computed using a direct matrix solver based on an optimized multifrontal method. For this 2-D geometry, the staggered-grid strategy is significantly less efficient in terms of memory and CPU time requirements because of the enlarged bandwidth of the impedance matrix and increased number of coefficients in the discrete stencil. Therefore, the mixed-grid approach should be suggested as the routine scheme for 2-D acoustic wave propagation modelling in the frequency domain.
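
A fourth-order staggered-grid approximation of the first derivative is commonly written with the weights 9/8 and -1/24 applied to samples at half-grid offsets. The sketch below assumes that standard form and checks its convergence order numerically; it does not reproduce the paper's full discretization (mass averaging, absorbing layers, or the mixed-grid stencil), and the names `staggered_d1` and `error` are hypothetical.

```python
import math


def staggered_d1(f, x, h):
    """Fourth-order staggered-grid approximation of f'(x).

    Samples f at the half-offset points x +/- h/2 and x +/- 3h/2.
    """
    return (9.0 / 8.0 * (f(x + h / 2) - f(x - h / 2))
            - 1.0 / 24.0 * (f(x + 3 * h / 2) - f(x - 3 * h / 2))) / h


def error(h, x=0.3):
    """Absolute error of the stencil for f = sin, whose derivative is cos."""
    return abs(staggered_d1(math.sin, x, h) - math.cos(x))


if __name__ == "__main__":
    # Halving h should shrink the error by roughly 2**4 for a fourth-order scheme.
    print(error(0.1) / error(0.05))
```

The leading truncation term is proportional to h^4 times the fifth derivative of f, so the stencil differentiates polynomials up to degree four exactly, up to rounding.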

340 citations

Performance Metrics

Papers: 474

Citations: 2,939

No. of papers in the topic in previous years
Year	Papers
2021	19
2020	33
2019	33
2018	41
2017	51
2016	44
