About: Underclocking is a research topic. Over the lifetime, 732 publications have been published within this topic receiving 14185 citations. The topic is also known as: underclock.
TL;DR: An alternative approach is described, which is called a multiple clock domain (MCD) processor, in which the chip is divided into several clock domains, within which independent voltage and frequency scaling can be performed.
Abstract: As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly-clocked, globally synchronous systems. We describe an alternative approach, which we call a multiple clock domain (MCD) processor, in which the chip is divided into several clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains, corresponding to the front end , integer units, floating point units, and load-store units. We evaluate this design using a simulation infrastructure based on SimpleScalar and Wattch. In an attempt to quantify potential energy savings independent of any particular on-line control strategy, we use off-line analysis of traces from a single-speed run of each of our benchmark applications to identify profitable reconfiguration points for a subsequent dynamic scaling run. Using applications from the MediaBench, Olden, and SPEC2000 benchmark suites, we obtain an average energy-delay product improvement of 20% with MCD compared to a modest 3% savings from voltage scaling a single clock and voltage system.
TL;DR: The clock generation and distribution for the first IA-64 microprocessor achieves a low skew by using distributed programmable deskew units, which compensates for load mismatches and within-die process variations, as well as temperature and voltage gradients.
Abstract: The clock design for the first implementation of the IA-64 microprocessor is presented. A clock distribution with an active distributed deskewing technique is used to achieve a low skew of 28 ps. This technique is capable of compensating skews caused by within-die process variations that are becoming a significant factor of the clock design. The global, regional and local clock distributions are described. A multilevel skew budget and local clock timing methodology are used to enable a high-performance design by providing support for intentional clock skew injection and time borrowing. By providing a test access port interface to the deskew architecture and the incorporation of the on-die-clock-shrink, this design is equipped with two very powerful post-silicon timing debug tools that are critical to high-performance microprocessor design and enabled quick time-to-market.
TL;DR: In this paper, the authors describe an improved high bandwidth chip-to-chip interface for memory devices, which is capable of operating at higher speeds, while maintaining error free data transmission, consuming lower power, and supporting more load.
Abstract: This invention describes an improved high bandwidth chip-to-chip interface for memory devices, which is capable of operating at higher speeds, while maintaining error free data transmission, consuming lower power, and supporting more load. Accordingly, the invention provides a memory subsystem comprising at least two semiconductor devices; a main bus containing a plurality of bus lines for carrying substantially all data and command information needed by the devices, the semiconductor devices including at least one memory device connected in parallel to the bus; the bus lines including respective row command lines and column command lines; a clock generator for coupling to a clock line, the devices including clock inputs for coupling to the clock line; and the devices including programmable delay elements coupled to the clock inputs to delay the clock edges for setting an input data sampling time of the memory device.
TL;DR: In this paper, it is shown that if the instruction is to increase the frequency of the clock signal and the operating voltage in its absolute value, the clock signals having the increased frequency is outputted prior to the increase of the operating voltages in the absolute value.
Abstract: A microcomputer has a clock generator capable of changing the frequency of an output clock signal: and a power circuit capable of changing the level of an operating voltage to be outputted. The frequencies of clock signals and the levels of operating voltages to be individually fed to a plurality of circuit modules can be dynamically changed according to the content of a packaged register. If the content of the register instructs the reduction of the clock signal frequency and the operating voltage in its absolute value, the operating voltage is lowered in its absolute value prior to the change in the clock signal frequency. On the contrary, if the instruction is to increase the frequency of the clock signal and the operating voltage in its absolute value, the clock signal having the increased frequency is outputted prior to the increase of the operating voltage in the absolute value. As a result, it is possible to prevent in advance the malfunctions of the circuit at the time of switching the operation frequency and the operating voltage of the circuit module.
TL;DR: Core and I/O clock design for the Pentium(R) 4 microprocessor is described and Silicon speed path tools and clock debug features are designed to enable a short debug cycle.
Abstract: Core and I/O clock design for the Pentium(R) 4 microprocessor is described. Two phase-locked loops generate core and I/O clocks supporting concurrent multiple frequencies. A clock distribution network with skew optimization and jitter reduction is designed to achieve low clock inaccuracies for processors at frequencies /spl ges/2 GHz for the core and /spl ges/4 GHz for the rapid execution engine. A global medium clock frequency is distributed. Local clock drivers generate pulsed or regular (nonpulsed) clocks at fast, medium, and slow frequencies. A 3.2-GB/s system bus is achieved using a dedicated I/O phase-locked loop with glitch protection and detection. Silicon speed path tools and clock debug features are designed to enable a short debug cycle.