TL;DR: A pipelined method for executing instructions in a computer system is presented: instructions are supplied in program order as a continuous stream of operations, executed in an out-of-order pipeline, and retired so that their results are committed to architectural state in sequential program order.
Abstract: A pipelined method for executing instructions in a computer system. The present invention includes providing multiple instructions as a continuous stream of operations. This stream of operations is provided in program order. In one embodiment, the stream of operations is provided by performing an instruction cache memory lookup to fetch the multiple instructions, performing instruction length decoding on the instructions, rotating the instructions, and decoding the instructions. The present invention also performs register renaming, allocates resources and sends a portion of each of the operations to a buffering mechanism (e.g., a reservation station). The instruction cache memory lookup, instruction length decoding, rotation and decoding of the instructions, as well as the register renaming, are performed in consecutive pipestages. The present invention provides for executing the instructions in an out-of-order pipeline. The execution produces results. In one embodiment, the instructions are executed by determining the data readiness of each of the operations and scheduling data ready operations. These scheduled data ready operations are dispatched to an execution unit and executed. The results are written back for use by other operations or as data output or indication. The determination of execution readiness, the dispatching and the execution, and writeback, are performed in consecutive pipestages. The present invention also provides for retiring each of the continuous stream of operations in such a manner as to commit their results to architectural state and to reestablish sequential program order.
TL;DR: A superscalar microprocessor is presented in which an integer functional unit and a floating-point functional unit share a high-performance main data processing bus.
Abstract: A superscalar microprocessor is provided which includes an integer functional unit and a floating-point functional unit that share a high-performance main data processing bus. The integer unit and the floating-point unit also share a common reorder buffer, register file, branch prediction unit and load/store unit which all reside on the same main data processing bus. Instruction and data caches are coupled to a main memory via an internal address data bus which handles communications therebetween. An instruction decoder is coupled to the instruction cache and is capable of decoding multiple instructions per microprocessor cycle. Instructions are dispatched from the decoder in speculative order, issued out-of-order and completed out-of-order. Instructions are retired from the reorder buffer to the register file in-order. The functional units of the microprocessor desirably accommodate operands exhibiting multiple data widths. High performance and efficient use of the microprocessor die size are achieved by the sharing architecture of the disclosed superscalar microprocessor.
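The combination of out-of-order completion and in-order retirement described in this abstract can be illustrated with a small software model. The following is a minimal sketch, not the patented design: the class names `RobEntry` and `ReorderBuffer`, the register names, and the result values are all hypothetical, and the shared bus, branch prediction, and multiple data widths are not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    dest: str                    # destination register (hypothetical name)
    value: Optional[int] = None  # result, filled in when execution completes
    done: bool = False


class ReorderBuffer:
    """Toy model: results complete in any order, retire strictly in order."""

    def __init__(self):
        self.entries = deque()   # oldest dispatched instruction on the left
        self.regfile = {}        # architectural register file

    def dispatch(self, dest):
        # Allocate an entry in program order.
        entry = RobEntry(dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        # Completion may happen out of order.
        entry.value = value
        entry.done = True

    def retire(self):
        # Only the oldest entries retire, and only once they are done.
        retired = []
        while self.entries and self.entries[0].done:
            e = self.entries.popleft()
            self.regfile[e.dest] = e.value
            retired.append(e.dest)
        return retired


rob = ReorderBuffer()
a = rob.dispatch("r1")
b = rob.dispatch("r2")
c = rob.dispatch("r3")

rob.complete(c, 30)      # youngest instruction finishes first
first = rob.retire()     # nothing retires: r1 is still pending
rob.complete(a, 10)
second = rob.retire()    # r1 retires; r2 still blocks r3
rob.complete(b, 20)
third = rob.retire()     # r2 and r3 retire together, in program order
```

In this model the register file only ever changes in program order, even though the results arrived in the order r3, r1, r2.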
TL;DR: A superscalar processor (200) includes a scheduler (280) that selects operations for out-of-order execution and also serves as a reorder buffer, keeping the results of operations until the results are committed.
Abstract: A superscalar processor (200) includes a scheduler (280) which selects operations for out-of-order execution. The scheduler (280) contains storage and control logic which is partitioned into entries (540) corresponding to operations. The scheduler (280) uses the entries to issue operations to execution units (251 to 257) for parallel pipelined execution, to provide operands as required for execution, and as a reorder buffer keeping the results of operations until the results are committed. The scheduler (280) is tightly coupled to execution units (251 to 257) and provides a wide parallel path which minimizes pipeline bottlenecks and hold ups into and out of the execution units (251 to 257). The scheduler (280) monitors entries to determine when all operands required for execution of an operation are available and provides required operands to the execution units (251 to 257). The operands can be from a register file (290), a scheduler entry, or an execution unit (251 to 257). Scan chains (530, 532, 534, 536 and 538) link the entries together and identify operations and operands for execution.
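The operand-readiness check described in this abstract can be sketched in software. This is an illustrative model under stated assumptions, not the scheduler (280) itself: the names `Entry`, `find_operand`, and `select_ready` are hypothetical, the hardware scan chains are replaced by a simple linear search over older entries, and the direct bypass path from an execution unit is omitted, so operands come only from the register file or from a result held in an older scheduler entry.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Entry:
    op: str
    dest: str
    srcs: Tuple[str, ...]
    result: Optional[int] = None  # held here until committed (reorder-buffer role)
    issued: bool = False


def find_operand(reg, older_entries, regfile):
    """Return (ready, value) for one source register.

    Searches older entries youngest-first for the most recent writer of
    `reg`; if none exists, the register file holds the current value.
    """
    for e in reversed(older_entries):
        if e.dest == reg:
            if e.result is None:
                return False, None   # producer has not executed yet
            return True, e.result    # operand supplied by a scheduler entry
    return True, regfile[reg]        # operand supplied by the register file


def select_ready(entries, regfile):
    """List (index, operand values) for every unissued, data-ready entry."""
    ready = []
    for i, e in enumerate(entries):
        if e.issued:
            continue
        operands = [find_operand(r, entries[:i], regfile) for r in e.srcs]
        if all(ok for ok, _ in operands):
            ready.append((i, [v for _, v in operands]))
    return ready


regfile = {"r1": 5, "r2": 7}
entries = [
    Entry("add", "r3", ("r1", "r2")),
    Entry("add", "r4", ("r3", "r1")),  # depends on entry 0
    Entry("add", "r5", ("r1", "r2")),  # independent of entries 0 and 1
]

ready0 = select_ready(entries, regfile)

# Entry 0 executes; its result stays in the entry until commit.
entries[0].issued = True
entries[0].result = 12
ready1 = select_ready(entries, regfile)
```

Before entry 0 executes, entries 0 and 2 are data ready while entry 1 waits; afterwards entry 1 becomes ready and takes r3 from the older entry rather than from the register file.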
TL;DR: The Distributed Instruction Queue (DIQ) and the Modified Reorder Buffer (MRB) are microarchitecture techniques for superscalar microprocessors that support multi-instruction issue, decoupled dataflow scheduling, out-of-order execution, register renaming, multi-level speculative execution, and precise interrupts.
Abstract: The invention involves new microarchitecture apparatus and methods for superscalar microprocessors that support multi-instruction issue, decoupled dataflow scheduling, out-of-order execution, register renaming, multi-level speculative execution, and precise interrupts. These are the Distributed Instruction Queue (DIQ) and the Modified Reorder Buffer (MRB). The DIQ is a new distributed instruction shelving technique that is an alternative to the reservation station (RS) technique and offers a more efficient (improved performance/cost) implementation. The Modified Reorder Buffer (MRB) is an improved reorder buffer (RB) result shelving technique that eliminates the slow and expensive prioritized associative lookup, shared global buses, and dummy branch entries (to reduce entry usage). The MRB has an associative key unit which uses a unique associative key.
TL;DR: A dependency table stores a reorder buffer tag for each register; the tag corresponds to the last of the instructions within the reorder buffer (in program order) to update the register.
Abstract: A dependency table stores a reorder buffer tag for each register. The stored reorder buffer tag corresponds to the last of the instructions within the reorder buffer (in program order) to update the register. If no instruction within the reorder buffer updates the register, the dependency table instead indicates that the value stored in the register is valid. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed, including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. Information from the comparators and the information stored in the dependency table is sufficient to select which value is forwarded. Additionally, the dependency table stores the width of the register being updated. Prior to forwarding the reorder buffer tag stored within the dependency table, the width stored therein is compared to the width of the source operand being requested. If a narrow-to-wide dependency is detected, the instruction is stalled until the instruction indicated in the dependency table retires. Still further, the dependency table recovers from branch mispredictions and exceptions by redispatching the instructions into the dependency table.
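The operand selection and width check described in this abstract can be sketched as a small lookup routine. This is a simplified illustration under stated assumptions, not the patented circuit: `DepEntry`, `DependencyTable`, and the method names are hypothetical, widths are given in bits, `rob_results` stands in for results already available in the reorder buffer, and dependencies among concurrently decoded instructions and misprediction recovery are not modeled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DepEntry:
    tag: int    # reorder buffer tag of the last in-order writer
    width: int  # width in bits of the pending register update


class DependencyTable:
    def __init__(self):
        # A register with no entry is "valid": the register file value is current.
        self.table: Dict[str, DepEntry] = {}

    def record_write(self, reg, tag, width):
        # A newly dispatched instruction becomes the last writer of `reg`.
        self.table[reg] = DepEntry(tag, width)

    def retire(self, reg, tag):
        # Clear the entry only if the retiring instruction is still the last writer.
        e = self.table.get(reg)
        if e is not None and e.tag == tag:
            del self.table[reg]

    def lookup(self, reg, width, regfile, rob_results):
        """Select what is forwarded for a source operand of `width` bits."""
        e = self.table.get(reg)
        if e is None:
            return ("value", regfile[reg])         # register value itself
        if e.width < width:
            return ("stall", e.tag)                # narrow-to-wide dependency
        if e.tag in rob_results:
            return ("value", rob_results[e.tag])   # result already produced
        return ("tag", e.tag)                      # wait on the producer's tag


regfile = {"r1": 1}
dt = DependencyTable()

r0 = dt.lookup("r1", 32, regfile, {})        # no pending writer
dt.record_write("r1", tag=7, width=8)        # pending 8-bit update
r1 = dt.lookup("r1", 32, regfile, {})        # 32-bit read of an 8-bit write
r2 = dt.lookup("r1", 8, regfile, {})         # same width, result not ready
r3 = dt.lookup("r1", 8, regfile, {7: 99})    # same width, result ready

regfile["r1"] = 99                           # writer retires to the register file
dt.retire("r1", 7)
r4 = dt.lookup("r1", 32, regfile, {})        # stall condition has cleared
```

The stalled 32-bit read succeeds only after the narrow writer has retired, at which point the full register value comes from the register file.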