Patent
Parallel processing unit which processes branch instructions without decreased performance when a branch is taken
Hideki Ando
- 30 Jan 1996
36
TL;DR: Instructions are prefetched from an instruction memory into a queue memory in which each entry holds an instruction and a flag indicating that the instruction is executed according to a branch prediction; the flag controls whether the instruction is executed.
Abstract: A parallel processing unit operable in a delayed branch method has a branch-delay slot filled with instructions to be executed when a branch by a branch instruction is taken. The instructions in the branch-delay slot are those fetched in the period from the fetching of the branch instruction until its execution. Instructions are prefetched from an instruction memory into a queue memory. The queue memory includes a plurality of blocks of storage units. Instructions in the same block as a branch instruction and subsequent to the branch instruction, and instructions in the block adjacent to the block including the branch instruction, provide the branch-delay slot for the branch instruction. A parallel processing unit operable in a predicted branch method includes a queue memory including a plurality of entries, each of which includes an instruction and a flag indicating that the associated instruction is executed according to a prediction of a branch. This flag is utilized to control execution and non-execution of the associated instruction.
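The two queue mechanisms in the abstract can be illustrated with a minimal Python sketch. It is not the patented hardware. The block size of 4, the reading of "adjacent block" as the next block in the queue, and the names `delay_slot_indices`, `QueueEntry`, and `resolve_prediction` are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 4  # assumed number of instructions per queue block


def delay_slot_indices(branch_index: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Queue positions that form the delay slot of the branch at branch_index.

    The slot covers the remainder of the branch's own block plus the whole
    following block (treating "adjacent" as "next" is an assumption).
    """
    block_start = (branch_index // block_size) * block_size
    next_block_end = block_start + 2 * block_size
    return list(range(branch_index + 1, next_block_end))


@dataclass
class QueueEntry:
    instruction: str
    predicted: bool = False  # set when the entry was fetched under a branch prediction


def resolve_prediction(queue: List[QueueEntry], prediction_correct: bool) -> List[str]:
    """Return the instructions allowed to execute once the branch outcome is known.

    Unflagged entries always execute; flagged entries execute only when the
    prediction turned out to be correct (an assumed policy).
    """
    return [e.instruction for e in queue if prediction_correct or not e.predicted]
```

For example, with a block size of 4, a branch at queue position 5 gets positions 6 through 11 as its delay slot: the two remaining entries of its own block and the four entries of the next block.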
Citations
Patent
Alignment and ordering of vector elements for single instruction multiple data processing
Timothy J. Van Hook,Peter Yan-Tek Hsu,William A. Huffman,Henry Packard Moreton,Earl A. Killian +4 more
- 06 Feb 2007
TL;DR: In this paper, the alignment and ordering of vector elements for SIMD processing is described: a starting byte specifying the first byte of an aligned vector is determined, a vector is extracted from the first register and the second register, and the vector is replicated into the elements of the third register in a particular order suitable for subsequent SIMD vector processing.
251
Patent
Method for providing extended precision in SIMD vector arithmetic operations
Timothy J. Van Hook,Peter Yan-Tek Hsu,William A. Huffman,Henry Packard Moreton,Earl A. Killian +4 more
- 30 Dec 1998
TL;DR: In this article, extended precision is provided for SIMD arithmetic operations in a processor having a register file and an accumulator. But the present invention is limited to a single-core processor.
105
Patent
Processor configured to selectively cancel instructions from its pipeline responsive to a predicted-taken short forward branch instruction
David B. Witt,William M. Johnson +1 more
- 06 Jul 1998
TL;DR: In this paper, when a short forward branch instruction is predicted to be taken, the processor allows sequential fetching to continue and selectively cancels the sequential instructions that are not part of the predicted instruction sequence (i.e., the instructions between the predicted-taken branch instruction and the target instruction identified by the forward branch target address).
74
Patent
Method and apparatus for performing predicate hazard detection
Judge K. Arora
- 30 Dec 1998
TL;DR: In this paper, predicate hazard detection is performed using a predicate status vector and a predicate mask vector; the mask vector is used to determine whether a predicate is needed.
66
Patent
Extended-precision accumulation of multiplier output
Morten Stribaek,Pascal Paillier +1 more
- 21 Feb 2001
TL;DR: In this article, microprocessor instructions are provided for manipulating portions of the extended-precision accumulator, including an instruction to move the contents of a portion of the accumulator to a general-purpose register (MFLHXU) and an instruction to move the contents of a general-purpose register to a portion of the accumulator (MTLHX).
55
References
Book
Computer Architecture and Parallel Processing
Kai Hwang,Faye A. Briggs +1 more
- 01 Jan 1984
TL;DR: The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.
1.4K
Instruction issue logic for high-performance, interruptible, multiple functional unit, pipelined computers
TL;DR: In this paper, the problems of data dependency resolution and precise interrupt implementation in pipelined processors are combined and a design for a hardware mechanism that resolves dependencies dynamically and, at the same time, guarantees precise interrupts is presented.
Patent
Pipelined data processor capable of decoding and executing plural instructions in parallel
Kazunori Kuriyama,Y. Shintani,Akira Yamaoka,Tohru Shonai,Eiki Kamada,Kiyoshi Inoue +5 more
- 24 Mar 1987
TL;DR: A pipelined data processor comprises a circuit for extracting two instructions into a pair of instruction registers (1, 2) and a circuit (6) for detecting whether those instructions are a combination of an instruction requesting the use of an operation unit and an instruction requesting the use of another resource.
289
Patent
System for reducing delay for execution subsequent to correctly predicted branch instruction using fetch information stored with each block of instructions in cache
William M. Johnson
- 06 Jun 1989
TL;DR: In this article, a superscalar processor is described in which each instruction block stored in the instruction cache memory includes, in addition to instruction fields, branch-prediction information fields that indicate the address of the instruction block's successor and the location of a branch instruction within the block.
254
Patent
Hierarchical priority branch handling for parallel execution in a parallel processor
Robert P. Colwell,John O'Donnell,David B. Papworth,Paul Rodman +3 more
- 20 Apr 1987
TL;DR: In this paper, a hierarchical priority system is used to determine whether a branch test condition associated with a branch instruction is true, and independently, the target address for each branch instruction and a fall-through instruction address are determined.
187