FFT Compiler Techniques
Stefan Kral, Franz Franchetti, Juergen Lorenz, Christoph W. Ueberhuber, Peter Wurzinger
- 29 Mar 2004
- pp 217-231
TL;DR: Compiler technology is presented that targets general-purpose microprocessors augmented with SIMD execution units to exploit data-level parallelism.
Abstract: This paper presents compiler technology that targets general-purpose microprocessors augmented with SIMD execution units for exploiting data-level parallelism. Numerical applications are accelerated by automatically vectorizing blocks of straight-line code to be run on processors featuring two-way short vector SIMD extensions like Intel’s SSE 2 on Pentium 4, SSE 3 on Intel Prescott, AMD’s 3DNow!, and IBM’s SIMD operations implemented on the new processors of the BlueGene/L supercomputer.
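As a rough illustration of what vectorizing straight-line code for a two-way SIMD unit means, the sketch below pairs up the scalar operations of a radix-2 complex butterfly. It is not the paper's algorithm: `Vec2`, `butterfly_scalar`, and `butterfly_vec2` are hypothetical names introduced here, and `Vec2` is only a toy model of a two-lane register, not a real SIMD type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """Toy model (assumption) of a two-way SIMD register holding two floats."""
    lo: float
    hi: float

    def __add__(self, other):
        # Models one vector instruction that adds both lanes at once.
        return Vec2(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Models one vector instruction that subtracts both lanes at once.
        return Vec2(self.lo - other.lo, self.hi - other.hi)


def butterfly_scalar(a_re, a_im, b_re, b_im):
    """Straight-line scalar code: four independent floating-point operations."""
    s_re = a_re + b_re
    s_im = a_im + b_im
    d_re = a_re - b_re
    d_im = a_im - b_im
    return s_re, s_im, d_re, d_im


def butterfly_vec2(a, b):
    """Same computation with (re, im) packed per register: two vector operations."""
    return a + b, a - b


# Both versions compute the same sums and differences.
a, b = Vec2(1.0, 2.0), Vec2(0.5, -4.0)
s, d = butterfly_vec2(a, b)
assert (s.lo, s.hi, d.lo, d.hi) == butterfly_scalar(a.lo, a.hi, b.lo, b.hi)
```

Packing the real and imaginary parts into one register halves the number of arithmetic instructions in this case. Real code usually also needs data reorganization (shuffles) between operations, which this sketch leaves out.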
Citations
Efficient Utilization of SIMD Extensions
Franz Franchetti, Stefan Kral, Juergen Lorenz, Christoph W. Ueberhuber
- 27 Jun 2005
TL;DR: Special-purpose compiler technology that supports automatic performance tuning on machines with vector instructions is described, which leads to substantial speedups over the best scalar C codes generated by the original systems as well as roughly matching the performance of hand-tuned vendor libraries.
Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: implementation and early performance measurements
TL;DR: The performance of a volumetric FFT is compared with that of the Fastest Fourier Transform in the West (FFTW) library; the volumetric FFT outperforms a port of FFTW Version 2.1.5 on large-node-count partitions.
Register optimizations for stencils on GPUs
Prashant Singh Rawat, Fabrice Rastello, Aravind Sukumaran-Rajam, Louis-Noël Pouchet, Atanas Rountev, P. Sadayappan
- 10 Feb 2018
TL;DR: A statement reordering framework is developed that models stencil computations as a DAG of trees with shared leaves, and adapts an optimal scheduling algorithm for minimizing register usage for expression trees.
Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets
Daniel S. McFarlin, Volodymyr Arbatov, Franz Franchetti, Markus Püschel
- 31 May 2011
TL;DR: A peephole-based vectorization system that takes as input the vector instruction semantics and outputs a library of basic data reorganization blocks such as small transpositions and perfect shuffles that are needed in a variety of high performance computing applications is presented.
Performance measurements of the 3D FFT on the Blue Gene/L supercomputer
Maria Eleftheriou, Blake G. Fitch, Aleksandr Rayshubskiy, T. J. Christopher Ward, Robert S. Germain
- 30 Aug 2005
TL;DR: In this paper, the authors present performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture, and compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives).
References
•Book
Advanced Compiler Design and Implementation
Steven S. Muchnick
- 01 Jan 1997
FFTW: an adaptive software architecture for the FFT
Matteo Frigo, Steven G. Johnson
- 12 May 1998
TL;DR: FFTW is an adaptive FFT program that tunes the computation automatically for any particular hardware; tests show that FFTW's self-optimizing approach usually yields significantly better performance than all other publicly available software.
A study of replacement algorithms for a virtual-storage computer
TL;DR: One of the basic limitations of a digital computer is the size of its available memory; an approach that permits the programmer to use a sufficiently large address range can accomplish this objective, assuming that means are provided for automatic execution of the memory-overlay functions.
A fast Fourier transform compiler
Matteo Frigo
- 01 May 1999
TL;DR: The internals of this special-purpose compiler, called genfft, are described in some detail, and it is argued that a specialized compiler is a valuable tool.
Exploiting superword level parallelism with multimedia instruction sets
Samuel Larsen, Saman Amarasinghe
- 01 May 2000
TL;DR: This paper develops a simple and robust compiler for detecting superword level parallelism (SLP) that targets basic blocks rather than loop nests, and is able to exploit parallelism both across loop iterations and within basic blocks.