TL;DR: MMX technology extends the Intel architecture to improve the performance of multimedia, communications, and other numeric-intensive applications by introducing data types and instructions to the IA that exploit the parallelism in these applications.
Abstract: Designed to accelerate multimedia and communications software, MMX technology improves performance by introducing data types and instructions to the IA that exploit the parallelism in these applications. MMX technology extends the Intel architecture (IA) to improve the performance of multimedia, communications, and other numeric-intensive applications. It uses a SIMD (single-instruction, multiple-data) technique to exploit the parallelism inherent in many algorithms, producing full application performance of 1.5 to 2 times faster than the same applications run on the same processor without MMX. The extension also maintains full compatibility with existing IA microprocessors, operating systems, and applications while providing new instructions and data types that applications can use to achieve a higher level of performance on the host CPU.
TL;DR: PowerPC's AltiVec speeds not only media processing but also nearly any application in which data parallelism exists, as demonstrated by a cycle-accurate simulation of Motorola's MPC 7400, the heart of Apple G4 systems.
Abstract: There is a clear trend in personal computing toward multimedia-rich applications. These applications will incorporate a wide variety of multimedia technologies, including audio and video compression, 2D image processing, 3D graphics, speech and handwriting recognition, media mining, and narrow/broadband signal processing for communication. In response to this demand, major microprocessor vendors have announced architectural extensions to their general-purpose processors in an effort to improve their multimedia performance. Intel extended IA-32 with MMX and SSE (alias KNI), Sun enhanced Sparc with VIS, Hewlett-Packard added MAX to its PA-RISC architecture, Silicon Graphics extended the MIPS architecture with MDMX, and Digital (now Compaq) added MVI to Alpha. This article describes the most recent, and what we believe to be the most comprehensive, addition to this list: PowerPC's AltiVec, AltiVec speeds not only media processing but also nearly any application in which data parallelism exists, as demonstrated by a cycle-accurate simulation of Motorola's MPC 7400, the heart of Apple G4 systems.
TL;DR: The first hardware implementation of Intel TSX is evaluated using a set of high-performance computing (HPC) workloads, and it is demonstrated that applying IntelTSX to these workloads can provide significant performance improvements.
Abstract: Intel has recently introduced Intel® Transactional Synchronization Extensions (Intel® TSX) in the Intel 4th Generation Core™ Processors. With Intel TSX, a processor can dynamically determine whether threads need to serialize through lock-protected critical sections. In this paper, we evaluate the first hardware implementation of Intel TSX using a set of high-performance computing (HPC) workloads, and demonstrate that applying Intel TSX to these workloads can provide significant performance improvements. On a set of real-world HPC workloads, applying Intel TSX provides an average speedup of 1.41x. When applied to a parallel user-level TCP/IP stack, Intel TSX provides 1.31x average bandwidth improvement on network intensive applications. We also demonstrate the ease with which we were able to apply Intel TSX to the various workloads.
TL;DR: IA-32 EL as mentioned in this paper is a software that will ship with Itanium-based operating systems and will convert IA-32 instructions into Itanium instructions via dynamic translation via two-phase translation.
Abstract: IA-32 execution layer (IA-32 EL) is a new technology that executes IA-32 applications on Intel Itanium processor family systems. Currently, support for IA-32 applications on Itanium-based platforms is achieved using hardware circuitry on the Itanium processors. This capability will be enhanced with IA-32 EL - software that will ship with Itanium-based operating systems and will convert IA-32 instructions into Itanium instructions via dynamic translation. In this paper, we describe aspects of the IA-32 execution layer technology, including the general two-phase translation architecture and the usage of a single translator for multiple operating systems. The paper provides details of some of the technical challenges such as precise exception, emulation of FP, MMX, and Intel streaming SIMD extension instructions, and misalignment handling. Finally, the paper presents some performance results.
TL;DR: In this article, an overview of the multimedia instructions that have been added to the instruction set architectures of general-purpose microprocessors to accelerate media processing is given, including MAX, MMX and VIS, the multimedia extensions for PA-RISC, ix86 and SPARC processor architectures.
Abstract: This paper gives an overview of the multimedia instructions that have been added to the instruction set architectures of general-purpose microprocessors to accelerate media processing. Examples are MAX, MMX and VIS, the multimedia extensions for PA-RISC, ix86, and SPARC processor architectures. We describe subword parallelism, a low overhead form of SIMD parallelism, and the classes of instructions needed to support subword parallel computations efficiently. Features described include arithmetic operations with saturation, averaging, multiply alternatives, data rearrangement primitives like Permute and Mix, formatting instructions, conditional execution, and complex instructions.