About: Pentium is a research topic. Over the lifetime, 1213 publications have been published within this topic receiving 35560 citations. The topic is also known as: Intel Pentium & 80586.
TL;DR: DMDX is a Windows-based program designed primarily for language-processing experiments that uses the features of Pentium class CPUs and the library routines provided in DirectX to provide accurate timing and synchronization of visual and audio output.
Abstract: DMDX is a Windows-based program designed primarily for language-processing experiments. It uses the features of Pentium class CPUs and the library routines provided in DirectX to provide accurate timing and synchronization of visual and audio output. A brief overview of the design of the program is provided, together with the results of tests of the accuracy of timing. The Web site for downloading the software is given, but the source code is not available.
TL;DR: The main features and functions of the NetBurst microarchitecture of Intel’s new flagship Pentium 4 processor are described, including its new form of instruction cache called the Execution Trace Cache.
Abstract: This paper describes the Intel NetBurstTM microarchitecture of Intel’s new flagship Pentium 4 processor. This microarchitecture is the basis of a new family of processors from Intel starting with the Pentium 4 processor. The Pentium 4 processor provides a substantial performance gain for many key application areas where the end user can truly appreciate the difference. In this paper we describe the main features and functions of the NetBurst microarchitecture. We present the frontend of the machine, including its new form of instruction cache called the Execution Trace Cache. We also describe the out-of-order execution engine, including the extremely low latency double-pumped Arithmetic Logic Unit (ALU) that runs at 3GHz. We also discuss the memory subsystem, including the very low latency Level 1 data cache that is accessed in just two clock cycles. We then touch on some of the key features that allow the Pentium 4 processor to have outstanding floating-point and multi-media performance. We provide some key performance numbers for this processor, comparing it to the Pentium III processor.
TL;DR: In this article, a voltage regulator module (VRM) is proposed for future generation microprocessors with high power densities, high efficiencies, and good transient performance, and the design, simulation and experimental results are presented.
Abstract: By reducing the power supply voltage, faster, lower power consumption, and high integration density data processing systems can be achieved. The current generation high-speed complementary metal-oxide-semiconductor (CMOS) processors (e.g., Alpha, Pentium, Power PC) are operating at above 300 MHz with 2.5 to 3.3 V output range. Future processors will be designed in the 1.1-1.8 V range, to further enhance their speed-power performance. These new generation microprocessors will present very dynamic loads with high current slew rates during transient. As a result, they will require a special power supply, voltage regulator module (VRM), to provide well-regulated voltage. The VRMs should have high power densities, high efficiencies, and good transient performance. In this paper, the critical technical issues to achieve this target for future generation microprocessors are addressed. A VRM candidate topology, interleaved quasisquare-wave (QSW), is proposed. The design, simulation and experimental results are presented.
TL;DR: This paper describes a technique for a coordinated measurement approach that combines real total power measurement with performance-counter-based, per-unit power estimation and provides power breakdowns for 22 of the major CPUsubunits over minutes of SPEC2000 and desktop workloadexecution.
Abstract: With power dissipation becoming an increasingly vexing problem across many classes of computer systems, measuring power dissipation of real, running systems has become crucial for hardware and software system research and design. Live power measurements are imperative for studies requiring execution times too long for simulation, such as thermal analysis. Furthermore, as processors become more complex and include a host of aggressive dynamic power management techniques, per-component estimates of power dissipation have become both more challenging as well as more important. In this paper we describe our technique for a coordinated measurement approach that combines real total power measurement with performance-counter-based, per-unit power estimation. The resulting tool offers live total power measurements for Intel Pentium 4 processors, and also provides power breakdowns for 22 of the major CPU subunits over minutes of SPEC2000 and desktop workload execution. As an example application, we use the generated component power breakdowns to identify program power phase behaviour. Overall, this paper demonstrates a processor power measurement and estimation methodology and also gives experiences and empirical application results that can provide a basis for future power-aware research.
TL;DR: This paper proposes runahead execution as an effective way to increase memory latency tolerance in an out-of-order processor without requiring an unreasonably large instruction window.
Abstract: Today's high performance processors tolerate long latency operations by means of out-of-order execution. However, as latencies increase, the size of the instruction window must increase even faster if we are to continue to tolerate these latencies. We have already reached the point where the size of an instruction window that can handle these latencies is prohibitively large in terms of both design complexity and power consumption. And, the problem is getting worse. This paper proposes runahead execution as an effective way to increase memory latency tolerance in an out-of-order processor without requiring an unreasonably large instruction window. Runahead execution unblocks the instruction window blocked by long latency operations allowing the processor to execute far ahead in the program path. This results in data being prefetched into caches long before it is needed. On a machine model based on the Intel/spl reg/ Pentium/spl reg/ processor, having a 128-entry instruction window, adding runahead execution improves the IPC (instructions per cycle) by 22% across a wide range of memory intensive applications. Also, for the same machine model, runahead execution combined with a 128-entry window performs within 1% of a machine with no runahead execution and a 384-entry instruction window.