SWAR

Papers

Compiling for SIMD Within a Register

Randall J. Fisher, Henry G. Dietz

7 Aug 1998

TL;DR: This paper focuses on how SWAR operations missing from a given microprocessor can be implemented using either the existing SWAR hardware or conventional 32-bit integer instructions, and it briefly introduces a few new challenges that SWAR poses for compiler optimization.

Abstract: Although SIMD (Single Instruction stream Multiple Data stream) parallel computers have existed for decades, it is only in the past few years that a new version of SIMD has evolved: SIMD Within A Register (SWAR). Unlike other styles of SIMD hardware, SWAR models are tuned to be integrated within conventional microprocessors, using their existing memory reference and instruction handling mechanisms, with the primary goal of improving the speed of specific multimedia operations. Because the SWAR implementations for various microprocessors vary widely and each is missing instructions for some SWAR operations that are needed to support a more general, portable, high-level SIMD execution model, this paper focuses on how these missing operations can be implemented using either the existing SWAR hardware or even conventional 32-bit integer instructions. In addition, SWAR offers a few new challenges for compiler optimization, and these are briefly introduced.
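
The abstract's central idea, implementing SWAR operations with conventional 32-bit integer instructions, can be illustrated with a partitioned add. The following is a minimal Python sketch, not the paper's code. It assumes four unsigned 8-bit fields packed into one 32-bit word (the register width is emulated with the mask `MASK32`), and the helper names `swar_add8`, `pack8` and `unpack8` are hypothetical. Masking keeps a carry in one field from spilling into its neighbour, so a single ordinary addition updates all four fields at once.

```python
# Hypothetical sketch: emulate a 32-bit register holding four unsigned 8-bit fields.
MASK32 = 0xFFFFFFFF   # keep results to 32 bits, as a real register would
LOW7 = 0x7F7F7F7F     # low 7 bits of every 8-bit field
HIGH1 = 0x80808080    # top bit of every 8-bit field


def swar_add8(x: int, y: int) -> int:
    """Add four packed unsigned 8-bit fields modulo 256 with one ordinary add."""
    # Adding only the low 7 bits of each field cannot carry into the next
    # field: 0x7F + 0x7F = 0xFE still fits in 8 bits.
    partial = ((x & LOW7) + (y & LOW7)) & MASK32
    # Each field's top sum bit is x7 XOR y7 XOR (carry into bit 7); that carry
    # already sits in bit 7 of `partial`, and any carry out of the field is dropped.
    return partial ^ ((x ^ y) & HIGH1)


def pack8(fields: list[int]) -> int:
    """Pack four 8-bit values into one word, fields[0] in the low byte."""
    word = 0
    for i, f in enumerate(fields):
        word |= (f & 0xFF) << (8 * i)
    return word


def unpack8(word: int) -> list[int]:
    """Inverse of pack8: split a 32-bit word into four 8-bit values."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


print(unpack8(swar_add8(pack8([250, 1, 100, 128]), pack8([10, 2, 100, 128]))))
# → [4, 3, 200, 0]
```

Each field wraps independently (250 + 10 gives 4, and 128 + 128 gives 0) without disturbing its neighbours, which is the behaviour a packed-byte add instruction would provide in hardware.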

107 citations

General-Purpose SIMD Within a Register: Parallel Processing on Consumer Microprocessors

Randall J. Fisher, Henry G. Dietz, Leah H. Jamieson

1 Jan 2003

TL;DR: This thesis will define a general-purpose SWAR (SIMD Within A Register) programming model that will be implemented for multiple target architectures: initially as compatible libraries, then as optimizing compilers accepting a simple high-level parallel language.

Abstract: Recent extensions to microprocessor instruction sets are intended to speed up multimedia algorithms by allowing SIMD parallel processing over multiple data fields within each processor register. These extensions, while effectively supporting hand-coding of some multimedia tasks, do not directly support a high-level parallel programming model. Unfortunately, the extensions vary widely across different processor families, making portability difficult to achieve. Even within one set of extensions, each operation is supported only for certain field widths, and the widths supported are different for different operations. This thesis will define a general-purpose SWAR (SIMD Within A Register) programming model. This model will be implemented for multiple target architectures: initially as compatible libraries, then as optimizing compilers accepting a simple high-level parallel language. The new SWAR libraries and compiler technology should enable a much wider range of applications to achieve speed-up through SIMD execution using COTS microprocessors.
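
The abstract notes that hardware support for an operation often exists only for some field widths. One way to make the width a free parameter, sketched here under the assumptions of a 32-bit word and field widths that divide 32, is to derive the masks from the width at run time. The names `field_masks`, `swar_add` and `swar_sub` are hypothetical illustrations, not part of the libraries the thesis describes.

```python
# Hypothetical sketch: width-parameterized SWAR add/subtract on a 32-bit word.
WORD_BITS = 32
MASK32 = (1 << WORD_BITS) - 1


def field_masks(width: int) -> tuple[int, int]:
    """Return (high, low) masks for `width`-bit fields packed in a 32-bit word.

    `high` has the top bit of every field set; `low` has all other bits set.
    """
    if width < 1 or WORD_BITS % width != 0:
        raise ValueError("field width must divide 32")
    # (2**32 - 1) // (2**width - 1) has a 1 in the lowest bit of every field,
    # e.g. 0x01010101 for width 8.
    ones = MASK32 // ((1 << width) - 1)
    high = ones << (width - 1)
    return high, MASK32 ^ high


def swar_add(x: int, y: int, width: int) -> int:
    """Field-wise add modulo 2**width; no carries cross field boundaries."""
    high, low = field_masks(width)
    return (((x & low) + (y & low)) ^ ((x ^ y) & high)) & MASK32


def swar_sub(x: int, y: int, width: int) -> int:
    """Field-wise subtract modulo 2**width; no borrows cross field boundaries."""
    high, low = field_masks(width)
    # Forcing each field's top bit on in x means subtracting y's low bits can
    # never borrow out of the field; the XOR then repairs each top bit.
    return (((x | high) - (y & low)) ^ ((x ^ ~y) & high)) & MASK32


print(hex(swar_sub(0x12345678, 0x11111111, 4)))  # → 0x1234567
```

Because the masks are computed rather than hard-coded, the same two functions serve 1-, 2-, 4-, 8-, 16- and 32-bit fields; a library or compiler could substitute a native instruction whenever the target happens to support the requested width.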

27 citations

Proceedings Article • DOI: 10.1145/1508244.1508283

Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle

Robert D. Cameron¹, Dan Lin¹

Simon Fraser University¹

7 Mar 2009

TL;DR: A set of simple SWAR instruction set extensions is proposed to support parallel bit stream algorithms; the extensions are shown to significantly reduce instruction count in core parallel bit stream algorithms, often providing a 3X or better improvement.

Abstract: Parallel bit stream algorithms exploit the SWAR (SIMD within a register) capabilities of commodity processors in high-performance text processing applications such as UTF-8 to UTF-16 transcoding, XML parsing, string search and regular expression matching. Direct architectural support for these algorithms in future SWAR instruction sets could further increase performance as well as simplify the programming task. A set of simple SWAR instruction set extensions is proposed for this purpose, based on the principle of systematic support for inductive doubling as an algorithmic technique. These extensions are shown to significantly reduce instruction count in core parallel bit stream algorithms, often providing a 3X or better improvement. The extensions are also shown to be useful for SWAR programming in other application areas, including providing a systematic treatment for horizontal operations. An implementation model for these extensions involves relatively simple circuitry added to the operand fetch components in a pipelined processor.
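
Inductive doubling, as the abstract uses the term, computes a result for fields of width 2n from results already held in fields of width n. A common illustration is a population count, which is also a horizontal operation: it reduces all fields of a word to one value. The sketch below uses only ordinary 32-bit integer operations (emulated in Python with `MASK32`) and does not model the instruction set extensions the paper proposes; `popcount32` and `DOUBLING_STEPS` are hypothetical names.

```python
# Hypothetical sketch: population count by inductive doubling of field width.
MASK32 = 0xFFFFFFFF

# One (shift, mask) pair per step: field widths 1 -> 2 -> 4 -> 8 -> 16 -> 32.
# Each mask selects the low half of every field at the new, doubled width.
DOUBLING_STEPS = (
    (1, 0x55555555),
    (2, 0x33333333),
    (4, 0x0F0F0F0F),
    (8, 0x00FF00FF),
    (16, 0x0000FFFF),
)


def popcount32(x: int) -> int:
    """Count the set bits of a 32-bit word in five doubling steps."""
    x &= MASK32
    for shift, mask in DOUBLING_STEPS:
        # Before this step every `shift`-bit field holds a partial count;
        # adding each adjacent pair yields counts for fields twice as wide.
        x = (x & mask) + ((x >> shift) & mask)
    return x


print(popcount32(0xF0F00001))  # → 9
```

Every step in the loop is the same mask, shift and add pattern applied at a doubled width, which is the kind of regularity that direct instruction support for inductive doubling would target.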

25 citations

Book Chapter • DOI: 10.1007/3-540-44905-1_25

The Scc Compiler: SWARing at MMX 3DNow!

Randall J. Fisher¹, Henry G. Dietz¹

Purdue University¹

4 Aug 1999

TL;DR: A more formal description of the SWARC language is provided, the organization of the current version of the Scc compiler is described, and the implementation of optimizations within this framework are discussed.

Abstract: Last year, we discussed the issues surrounding the development of languages and compilers for a general, portable, high-level SIMD Within A Register (SWAR) execution model. In a first effort to provide such a language and a framework for further research on this form of parallel processing, we proposed the vector-based language SWARC, and an experimental module compiler for this language, called Scc, which targeted IA32+MMX-based architectures. Since that time, we have worked to expand the types of targets that Scc supports and to include optimizations based on both vector processing and enhanced hardware support for SWAR. This paper provides a more formal description of the SWARC language, describes the organization of the current version of the Scc compiler, and discusses the implementation of optimizations within this framework.

19 citations

Journal Article • DOI: 10.1016/J.CAG.2003.12.008

High performance SIMD marching cubes isosurface extraction on commodity computers

Timothy S. Newman¹, J. Brad Byrd¹, Pavan Emani¹, Amit Narayanan¹, Abouzar Dastmalchi¹

University of Alabama in Huntsville¹

1 Apr 2004 • Computers & Graphics

TL;DR: An average overall speedup of nearly four times is achieved compared with an unoptimized standard implementation that uses conventional serial processing, and the approach maintains the rendering quality of a standard serial implementation of Marching Cubes.

18 citations

Performance Metrics

Number of papers in the topic per year
Year	Papers
2019	1
2010	1
2009	3
2004	2
2003	1
1999	1