TL;DR: The IBM RISC System/6000 processor is a second-generation RISC processor which reduces the execution pipeline penalties caused by branch instructions and also provides high floating-point performance.
Abstract: The IBM RISC System/6000 processor is a second-generation RISC processor which reduces the execution pipeline penalties caused by branch instructions and also provides high floating-point performance. It employs multiple functional units which operate concurrently to maximize the instruction execution rate. By employing these advanced machine-organization techniques, it can execute up to four instructions simultaneously. Approximately 11 MFLOPS are achieved on the LINPACK benchmarks.
TL;DR: This paper surveys the most common scientific benchmarks and compares them based on their instruction mixes as measured by the CRAY X-MP hardware performance monitor (hpm).
Abstract: A number of scientific and engineering benchmarks have emerged during the 1980s. Each of these benchmarks has a different origin, methodology and interpretation. This paper is a survey of the most common scientific benchmarks and includes a comparison of them based on their instruction mixes as measured by the CRAY X-MP hardware performance monitor (hpm). We discuss the relevance of these benchmarks to parallel computing and compare their instruction mixes against those of an academic supercomputer workload.
TL;DR: The i960MM microprocessor implements performance improvements beyond those of the i960CA microprocessor and includes a full-function floating-point unit.
Abstract: Continued research into Intel's i960 architecture has resulted in the development of performance improvements beyond those implemented in the i960CA microprocessor. These improvements allow additional superscalar dispatch opportunities, reduce memory access delays, and enhance the performance of specific instructions. The i960MM microprocessor is an implementation of these performance enhancements. Together, these enhancements can increase the performance of certain applications by 25% to 100%. Additionally, the i960MM includes an implementation of a full-function floating-point unit. Performance of 27 MFLOPS (single precision) and 16 MFLOPS (double precision) was achieved on the Linpack benchmarks at 40 MHz.
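To compare the Linpack figures quoted for the i960MM independently of clock rate, the MFLOPS values can be divided by the clock frequency in MHz, which gives floating-point operations per cycle. The following is a minimal sketch of that arithmetic; the function name `flops_per_cycle` is hypothetical and used only for illustration, and the inputs are the figures stated in the abstract.

```python
def flops_per_cycle(mflops: float, clock_mhz: float) -> float:
    """Return floating-point operations per clock cycle.

    MFLOPS is 1e6 operations per second and MHz is 1e6 cycles per second,
    so the two scale factors cancel and the ratio is operations per cycle.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return mflops / clock_mhz


# Figures taken from the abstract: 27 and 16 MFLOPS at 40 MHz.
single_precision = flops_per_cycle(27, 40)  # roughly 0.68 operations per cycle
double_precision = flops_per_cycle(16, 40)  # 0.4 operations per cycle

print(f"single precision: {single_precision:.3f} flops/cycle")
print(f"double precision: {double_precision:.3f} flops/cycle")
```

This normalization only describes the reported benchmark runs; it says nothing about peak throughput, which the abstract does not state.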
TL;DR: In this article, the authors investigate how settings of the intel_pstate scaling driver (governor and turbo boost) affect LINPACK benchmark results on Intel CPUs and propose a methodology for system calibration to ensure fair and repeatable benchmarks. The results are limited to Intel CPUs.
Abstract: Benchmarking experiments often draw strong conclusions but lack information about environmental influences such as the hardware used to deploy the investigated system. The fairness and repeatability of these benchmarks are questionable at best. Developing applications for, or migrating them to, the cloud or DevOps environments often requires performance testing, either to ensure quality of service or to choose the correct service parameters when deciding on a cloud offering. When building a benchmarking pipeline for cloud functions, the typical assumption is that a CPU scales its resources linearly with utilization. Because of heat generation, noise, and other constraints, this is not the case: there is a trade-off between efficiency and performance. To investigate this trade-off and its implications, we set up experiments to evaluate the influence of these factors on benchmark results. We focus solely on Intel CPUs. Beginning with the second generation (Sandy Bridge), Intel uses its own scaling driver, intel_pstate. Our results show that different settings for this scaling driver have a significant impact on the measured performance and therefore on the linear regression models we computed using LINPACK benchmarks. These benchmarks are executed at different CPU utilization points. An active intel_pstate scaling driver with turbo boost enabled and the powersave governor reached an R² of 0.7349, whereas the performance governor shows a significantly better, near-ideal coefficient of determination of 0.9999 on a machine used in the benchmarks. Therefore, we propose a methodology for system calibration to ensure fair and repeatable benchmarks.
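The coefficient of determination quoted in this abstract measures how well a straight line explains benchmark throughput as a function of CPU utilization. The sketch below shows one way to compute it with ordinary least squares using only the Python standard library. It is an illustration under stated assumptions, not the authors' pipeline: the function name `linear_fit_r2` and both data series are hypothetical and were made up to contrast a perfectly linear machine with one whose throughput saturates.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Fit y = a*x + b by ordinary least squares; return (a, b, r2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r2


# Hypothetical data: CPU utilization in percent and measured throughput.
utilization = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
throughput_linear = [2.0 * u for u in utilization]                      # ideal scaling
throughput_saturating = [40, 75, 105, 130, 150, 165, 178, 188, 195, 200]  # flattens out

for label, series in [("linear", throughput_linear),
                      ("saturating", throughput_saturating)]:
    slope, intercept, r2 = linear_fit_r2(utilization, series)
    print(f"{label}: slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

An R² close to 1 indicates that throughput tracks utilization almost linearly, which is the property the calibration methodology relies on. A lower value signals that frequency scaling or a similar effect is distorting the relationship.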