TL;DR: In this article, an intelligent DSP function decoder or preprocessor examines X86 opcode sequences, determines whether a DSP function is being executed, and if so converts the opcodes to a DSP macro instruction that is provided to the DSP.
Abstract: A CPU or microprocessor which includes a general purpose CPU, such as an X86 core, and a DSP. The CPU also includes an intelligent DSP function decoder or preprocessor which examines X86 opcode sequences and determines if a DSP function is being executed. The function preprocessor includes a look-up table which stores instruction sequences which implement DSP functions. Each pattern in the look-up table is compared with an instruction sequence to determine if one of the patterns substantially matches the instruction sequence. If the DSP function preprocessor determines that a DSP function is being executed, the DSP function preprocessor converts the opcodes to a DSP macro instruction that is provided to the DSP. The DSP executes one or more DSP instructions to implement the desired DSP function in response to the macro instruction. If the X86 opcodes in the instruction cache or instruction memory do not indicate or are not intended to perform a DSP-type function, the opcodes are provided to the X86 core. Thus, the DSP offloads these mathematical functions from the X86 core, thereby increasing system performance. The DSP operates in parallel with the X86 core, providing further performance benefits. The CPU of the present invention thus implements DSP functions more efficiently than X86 logic while requiring no additional X86 opcodes. The present invention also generates code that operates transparently on an X86-only CPU or a CPU according to the present invention which includes X86 and DSPs. Thus the present invention is backwards compatible with existing software.
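The routing decision described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mnemonic sequences, the macro names (`DSP_MAC`, `DSP_SCALE`), and the `route` function are hypothetical, and the sketch uses an exact table hit where the abstract speaks of a "substantial" match.

```python
from typing import Dict, List, Tuple

# Hypothetical look-up table: stored opcode sequences -> DSP macro instruction.
# Neither the sequences nor the macro names come from the patent.
DSP_PATTERNS: Dict[Tuple[str, ...], str] = {
    ("MOV", "IMUL", "ADD", "LOOP"): "DSP_MAC",
    ("MOV", "SUB", "SAR"): "DSP_SCALE",
}


def route(window: List[str]) -> Tuple[str, str]:
    """Decide where a window of opcodes should execute.

    Returns ("dsp", macro) when the window is found in the table,
    otherwise ("x86", opcodes) so the sequence goes to the X86 core unchanged.
    """
    macro = DSP_PATTERNS.get(tuple(window))
    if macro is not None:
        return ("dsp", macro)
    return ("x86", " ".join(window))
```

Because unmatched sequences are passed through untouched, the same opcode stream still runs on a core without a DSP, which is the backwards-compatibility property the abstract claims.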
TL;DR: In this article, a method and apparatus are disclosed for staggering execution of an instruction: a single macro instruction is received, and the operation it specifies is performed independently on a first and a second plurality of corresponding data elements from the first and second packed data operands, at different times and using the same circuit, to generate a first and a second plurality of resulting data elements.
Abstract: A method and apparatus are disclosed for staggering execution of an instruction. According to one embodiment of the invention, a single macro instruction is received wherein the single macro instruction specifies at least two logical registers and wherein the two logical registers respectively store a first and second packed data operands having corresponding data elements. An operation specified by the single macro instruction is then performed independently on a first and second plurality of the corresponding data elements from said first and second packed data operands at different times using the same circuit to independently generate a first and second plurality of resulting data elements. The first and second plurality of resulting data elements are stored in a single logical register as a third packed data operand.
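A minimal sketch of the staggering idea follows, assuming a packed add on 16-bit elements; the element width, the choice of addition, and both function names are illustrative assumptions, not details from the abstract. The point is that one half-width "circuit" is reused in two successive steps and the two partial results land in one destination operand.

```python
from typing import List


def half_width_add(a: List[int], b: List[int]) -> List[int]:
    """Stand-in for the shared circuit: element-wise add, wrapping at 16 bits."""
    return [(x + y) & 0xFFFF for x, y in zip(a, b)]


def staggered_packed_add(src1: List[int], src2: List[int]) -> List[int]:
    """Execute one packed add as two passes through the same half-width circuit."""
    half = len(src1) // 2
    # First time step: lower half of the corresponding data elements.
    low = half_width_add(src1[:half], src2[:half])
    # Second time step: upper half, reusing the same circuit.
    high = half_width_add(src1[half:], src2[half:])
    # Both partial results are stored as a single packed result operand.
    return low + high
```

For example, `staggered_packed_add([1, 2, 3, 0xFFFF], [10, 20, 30, 1])` yields `[11, 22, 33, 0]`, the same value a full-width unit would produce in one step.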
TL;DR: In this article, an intelligent DSP function decoder or preprocessor examines X86 opcode sequences and determines if DSP functions are being executed, and then converts or maps the opcodes to a DSP macro instruction that is provided to the DSP.
Abstract: A CPU or microprocessor which includes a general purpose CPU, such as an X86 core, and a DSP. The CPU also includes an intelligent DSP function decoder or preprocessor which examines X86 opcode sequences and determines if a DSP function is being executed. The function preprocessor includes a pattern recognition detector which stores instruction sequences which implement DSP functions. The pattern recognition detector compares each pattern with an instruction sequence and determines if one of the patterns substantially matches the instruction sequence. If the DSP function preprocessor determines that a DSP function is being executed, the preprocessor converts or maps the opcodes to a DSP macro instruction that is provided to the DSP. The DSP executes one or more DSP instructions to implement the desired DSP function in response to the macro instruction. If the X86 opcodes in the instruction cache or instruction memory do not indicate or are not intended to perform a DSP-type function, the opcodes are provided to the X86 core, as occurs in prior art computer systems. Thus, the DSP offloads these mathematical functions from the X86 core, thereby increasing system performance. The CPU of the present invention thus implements DSP functions more efficiently than X86 logic while requiring no additional X86 opcodes. The present invention also generates code that operates transparently on an X86-only CPU or a CPU according to the present invention which includes X86 and DSPs. Thus the present invention is backwards compatible with existing software.
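The abstract does not say how a "substantial" match is measured. One possible reading, sketched here purely as an assumption, scores each stored pattern against the incoming sequence with a similarity ratio and accepts the best score above a threshold. The pattern names, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` are all illustrative choices.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical stored patterns; names and opcode sequences are illustrative.
PATTERNS: Dict[str, List[str]] = {
    "DSP_FIR": ["MOV", "IMUL", "ADD", "INC", "LOOP"],
    "DSP_SCALE": ["MOV", "SUB", "SAR"],
}


def best_match(seq: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the name of the most similar stored pattern, or None.

    Similarity is SequenceMatcher.ratio(), which lies in [0, 1]; a pattern
    counts as a substantial match only if it reaches the threshold.
    """
    best_name: Optional[str] = None
    best_score = 0.0
    for name, pattern in PATTERNS.items():
        score = SequenceMatcher(None, pattern, seq).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

With this scoring, a sequence that differs from a stored pattern by one extra instruction can still be recognized, while an unrelated sequence returns `None` and would fall through to the X86 core.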
TL;DR: In this paper, an apparatus and method for executing a combined compare-and-branch operation in a single integer pipeline microprocessor is presented. Translation logic combines the compare macro instruction and the conditional jump macro instruction into a single compare-and-branch micro instruction.
Abstract: An apparatus and method are provided for executing a combined compare-and-branch operation in a single integer pipeline microprocessor. Typically, the compare-and-branch operation is specified by two macro instructions. The first macro instruction, a compare macro instruction, directs the microprocessor to compare two operands, resulting in the update of a flags register to describe various attributes of the comparison result. The second macro instruction, a conditional jump macro instruction, directs the microprocessor to examine the flags register and to branch program control to a target address if a prescribed condition is met. The apparatus has translation logic that combines the compare macro instruction and the conditional jump macro instruction into a single compare-and-branch micro instruction. The single compare-and-branch micro instruction directs the microprocessor to make the comparison and to perform a conditional branch based upon a result of the comparison. The apparatus also has execute logic that is coupled to the translation logic. The execute logic makes the comparison and generates the result. Jump resolution logic in a stage following the execute logic accesses the flags register to resolve the conditional jump operation.
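The translation step described above can be sketched as a small pass over a list of macro instructions. The tuple encoding, the `CMP_BR` micro instruction name, and the set of recognized jumps are assumptions made for illustration; the abstract does not define an encoding.

```python
from typing import List, Tuple

# Hypothetical mapping from conditional jump mnemonics to branch conditions.
COND_JUMPS = {"JE": "eq", "JNE": "ne", "JL": "lt", "JG": "gt"}

Instr = Tuple[str, ...]


def translate(macros: List[Instr]) -> List[Instr]:
    """Fuse each CMP that is immediately followed by a conditional jump.

    ("CMP", a, b) followed by ("JE", target) becomes the single micro
    instruction ("CMP_BR", a, b, "eq", target); everything else passes through.
    """
    micro: List[Instr] = []
    i = 0
    while i < len(macros):
        cur = macros[i]
        nxt = macros[i + 1] if i + 1 < len(macros) else None
        if cur[0] == "CMP" and nxt is not None and nxt[0] in COND_JUMPS:
            micro.append(("CMP_BR", cur[1], cur[2], COND_JUMPS[nxt[0]], nxt[1]))
            i += 2  # both macro instructions are consumed by one micro instruction
        else:
            micro.append(cur)
            i += 1
    return micro
```

A compare that is not directly followed by a conditional jump is left as a plain `CMP`, so the flags register is still updated for any later consumer.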
TL;DR: In this paper, a two-operand instruction carries fields that specify the bit length of the source operand and the bit length of the data to be operated upon. The operand bit length is modified based on this size information, which avoids the need for a separate macro instruction to modify the operand's bit length when the operation on the two operands is executed.
Abstract: An instruction having two operands includes a field specifying the bit length of a source operand and a field specifying the bit length of data to be operated upon by the execution unit. Based on size information stored in these fields, the operand bit length is modified, which avoids the need to modify the bit length of an operand by use of a macro instruction at the time of execution of an operation based on the two operands. Consequently, the program execution speed can be improved.
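As a rough sketch of the two size fields at work, the functions below resize a source operand from its declared width to the operation width before an add. The abstract does not say whether widening uses sign or zero extension; sign extension is assumed here, and the function names and the choice of addition are hypothetical.

```python
def resize_operand(value: int, src_bits: int, op_bits: int) -> int:
    """Convert a src_bits-wide value to op_bits wide.

    Widening sign-extends (an assumption); narrowing truncates.
    """
    value &= (1 << src_bits) - 1
    if op_bits > src_bits and value >> (src_bits - 1):
        value -= 1 << src_bits  # top bit set: treat the source as negative
    return value & ((1 << op_bits) - 1)


def execute_add(dest: int, src: int, src_bits: int, op_bits: int) -> int:
    """Add src to dest at op_bits width, resizing src as part of the instruction."""
    src_ext = resize_operand(src, src_bits, op_bits)
    return (dest + src_ext) & ((1 << op_bits) - 1)
```

Here `execute_add(5, 0xFF, 8, 32)` treats the 8-bit source `0xFF` as -1 and returns `4`, with no separate extension instruction issued beforehand.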