About: Instruction cycle is a research topic. Over the lifetime, 3179 publications have been published within this topic receiving 46167 citations. The topic is also known as: fetch–decode–execute cycle & fetch-execute cycle.
TL;DR: In this paper, a superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput is presented, where the data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
TL;DR: In this article, an instruction issuing mechanism for boosting throughput of processors with multiple functional units is proposed. But, it does not support non-sequential instruction issuing and instructions do not have to be issued according to their order in the instruction stream.
Abstract: An instruction issuing mechanism for boosting throughput of processors with multiple functional units. A Dispatch Stack (DS) and a Precedence Count Memory (PCM) are employed which allow multiple instructions to be issued per machine cycle. Additionally, instructions do no have to be issued according to their order in the instruction stream, so that non-sequential instruction issuance occurs. In this system, multiple instruction issuance and non-sequential instruction issuance policies enhance the throughput of processors with multiple functional units.
TL;DR: Tiny Garble achieves an unprecedented level of compactness and scalability by using a sequential circuit description for GC, and is able to implement functions that have never been reported before, such as SHA-3.
Abstract: We introduce Tiny Garble, a novel automated methodology based on powerful logic synthesis techniques for generating and optimizing compressed Boolean circuits used in secure computation, such as Yao's Garbled Circuit (GC) protocol. Tiny Garble achieves an unprecedented level of compactness and scalability by using a sequential circuit description for GC. We introduce new libraries and transformations, such that our sequential circuits can be optimized and securely evaluated by interfacing with available garbling frameworks. The circuit compactness makes the memory footprint of the garbling operation fit in the processor cache, resulting in fewer cache misses and thereby less CPU cycles. Our proof-of-concept implementation of benchmark functions using Tiny Garble demonstrates a high degree of compactness and scalability. We improve the results of existing automated tools for GC generation by orders of magnitude, for example, Tiny Garble can compress the memory footprint required for 1024-bit multiplication by a factor of 4,172, while decreasing the number of non-XOR gates by 67%. Moreover, with Tiny Garble we are able to implement functions that have never been reported before, such as SHA-3. Finally, our sequential description enables us to design and realize a garbled processor, using the MIPS I instruction set, for private function evaluation. To the best of our knowledge, this is the first scalable emulation of a general purpose processor.
TL;DR: In this article, a super-scaler processor with branch-prediction information is described, where each instruction cache block stored in the instruction cache memory includes branch prediction information fields in addition to instruction fields, which indicate the address of the instruction block's successor and information indicating the location of a branch instruction within an instruction block.
Abstract: A super-scaler processor is disclosed wherein branch-prediction information is provided within an instruction cache memory. Each instruction cache block stored in the instruction cache memory includes branch-prediction information fields in addition to instruction fields, which indicate the address of the instruction block's successor and information indicating the location of a branch instruction within the instruction block. Thus, the next cache block can be easily fetched without waiting on a decoder or execution unit to indicate the proper fetch action to be taken for correctly predicted branching.
TL;DR: This work proposes a method for compressing programs in embedded processors where instruction memory size dominates cost and achieves an average size reduction of 39%, 34%, and 26%, respectively, for SPEC CINT95 programs.
Abstract: Proposes a method for compressing programs in embedded processors where the instruction memory size dominates the cost. A post-compilation analyzer examines a program and replaces common sequences of instructions with a single instruction codeword. A microprocessor executes the compressed instruction sequences by fetching codewords from the instruction memory, expanding them back to the original sequence of instructions in the decode stage, and issuing them to the execution stages. We apply our technique to the PowerPC, ARM and i386 instruction sets and achieve an average size reduction of 39%, 34% and 26%, respectively, for SPEC CINT95 programs.