TL;DR: In this article, a method for a parallel shift right merge of data is proposed, where a first operand having a first set of L data elements is shifted left by L - M data elements.
Abstract: A method for a parallel shift right merge of data. The method of one embodiment comprises receiving a shift count of M. A first operand having a first set of L data elements is shifted left by L - M data elements. A second operand having a second set of L data elements is shifted right by M data elements. The shifted first set is merged with the shifted second set to generate a resultant having L data elements.
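A minimal Python sketch of the shift right merge, under assumptions not stated in the abstract: element index 0 is the least significant element, vacated slots are zero-filled, and "merge" is a bitwise OR of the two shifted operands (their nonzero slots never overlap). The function name `shift_right_merge` is hypothetical. Under these assumptions the result is the low L elements of the concatenation first:second shifted right by M elements, similar in effect to a byte-align instruction such as x86 PALIGNR.

```python
def shift_right_merge(first: list[int], second: list[int], m: int) -> list[int]:
    """Hypothetical model: shift first left by L - m, second right by m, then OR."""
    l = len(first)
    assert len(second) == l and 0 <= m <= l
    # Shift the second operand right by m elements: its upper m slots become zero.
    shifted_second = second[m:] + [0] * m
    # Shift the first operand left by L - m elements: only its low m elements
    # survive, landing in the upper m slots.
    shifted_first = [0] * (l - m) + first[:m]
    # Merge: the occupied slots are disjoint, so an element-wise OR combines them.
    return [x | y for x, y in zip(shifted_first, shifted_second)]
```

For example, `shift_right_merge([10, 11, 12, 13], [0, 1, 2, 3], 1)` returns `[1, 2, 3, 10]`; a shift count of 0 returns the second operand unchanged and a shift count of L returns the first.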
TL;DR: In this article, a floating point normalization circuit and method generate a coded multibit output in the form of a mantissa equal to the minimum mantissa that can be normalized for the input exponent part of the floating point number.
Abstract: A floating point normalization circuit and method decode the exponent, generating a coded multibit output corresponding to the maximum decrease in the exponent that does not go below the minimum expressible exponent. This coded multibit output is bit-wise ORed with the mantissa. A leftmost-one circuit detects the bit position of the most significant bit of the logical OR output having a "1". The mantissa and exponent are then normalized according to this number: the mantissa is left shifted, and the exponent is decremented, by an amount equal to this detected bit position. The exponent decoder generates the coded multibit output in the form of a mantissa equal to the minimum mantissa that can be normalized for the input exponent part of the floating point number. This minimum mantissa is equal to 2^(M+N), where M is the minimum expressible exponent and N is the exponent. In the preferred embodiment, the exponent decoder includes a two-to-four line decoder for each pair of bits of the exponent part of the floating point number, and an AND gate connected to selected outputs of the two-to-four line decoders for each bit of the mantissa.
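The normalization steps can be sketched in Python under assumptions not taken from the abstract: the mantissa is an unsigned `width`-bit integer that counts as normalized when bit `width - 1` is set, and `e_min` is the minimum expressible exponent. Rather than reproducing the decoder's 2^(M+N) encoding, the hypothetical `normalize` function computes the limit word directly as the smallest mantissa that the largest permitted left shift would normalize. ORing that word into the mantissa caps the leftmost-one position, so a single detection yields a shift that never pushes the exponent below `e_min`.

```python
def normalize(mantissa: int, exponent: int, width: int, e_min: int) -> tuple[int, int]:
    """Hypothetical model: normalize a width-bit mantissa without exponent underflow."""
    assert 0 <= mantissa < (1 << width) and exponent >= e_min
    max_shift = exponent - e_min  # largest decrement the exponent can absorb
    # Limit word: shifting it left by max_shift sets the MSB. If max_shift >= width,
    # no shift of at most width - 1 can underflow, so no limit bit is needed.
    limit = 1 << (width - 1 - max_shift) if max_shift < width else 0
    probe = mantissa | limit  # bit-wise OR of mantissa and limit word
    if probe == 0:
        return 0, exponent  # zero mantissa and no limit bit: leave the value as is
    # Leftmost-one detection: leading zeros of probe within width bits.
    shift = (width - 1) - (probe.bit_length() - 1)
    # Left shift the mantissa and decrement the exponent by the same amount.
    return (mantissa << shift) & ((1 << width) - 1), exponent - shift
```

For example, `normalize(0b00000011, -8, 8, -10)` returns `(12, -10)`: a full normalization would need a shift of 5, but the limit bit clamps it to 2 so the exponent stops at `e_min`.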
TL;DR: Design alternatives for barrel shifters, including data-reversal designs, that perform the following functions are examined: shift right logical, shift right arithmetic, rotate right, shift left logical, shift left arithmetic, and rotate left.
Abstract: Barrel shifters are often utilized by embedded digital signal processors and general-purpose processors to manipulate data. This paper examines design alternatives for barrel shifters that perform the following functions: shift right logical, shift right arithmetic, rotate right, shift left logical, shift left arithmetic, and rotate left. Four different barrel shifter designs are presented and compared in terms of area and delay for a variety of operand sizes. This paper also examines techniques for detecting results that overflow and results of zero in parallel with the shift or rotate operation. Several Java programs are developed to generate structural VHDL models for each of the barrel shifters. Synthesis results show that data-reversal barrel shifters have less area and mask-based data-reversal barrel shifters have less delay than other designs. Mask-based data-reversal barrel shifters are especially attractive when overflow and zero detection is also required, since the detection is performed in parallel with the shift or rotate operation.
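A behavioural Python sketch of a data-reversal barrel shifter, assuming a power-of-two width `n` and a shift amount in `0..n-1`. A single logarithmic right shifter (one stage per bit of the amount) handles the logical, arithmetic, and rotate modes; left operations reverse the bit order, shift or rotate right, and reverse again. The function names are hypothetical, and neither the mask-based variant nor the overflow and zero detection logic is modelled. Shift left arithmetic returns the same bits as shift left logical here, since the two typically differ only in how overflow is flagged.

```python
def reverse_bits(x: int, n: int) -> int:
    """Mirror the n-bit word x (bit i moves to bit n - 1 - i)."""
    r = 0
    for i in range(n):
        if (x >> i) & 1:
            r |= 1 << (n - 1 - i)
    return r


def right_core(x: int, amount: int, n: int, mode: str) -> int:
    """Logarithmic right shifter: stage s shifts by 2**s when bit s of amount is set."""
    sign = (x >> (n - 1)) & 1
    stage = 0
    while (1 << stage) < n:
        k = 1 << stage
        if (amount >> stage) & 1:
            low = x & ((1 << k) - 1)  # bits leaving the bottom of the word
            x >>= k
            if mode == "rotate":
                x |= low << (n - k)  # wrap them around to the top
            elif mode == "arith" and sign:
                x |= ((1 << k) - 1) << (n - k)  # replicate the sign bit
        stage += 1
    return x


RIGHT_MODES = {"srl": "logical", "sra": "arith", "ror": "rotate"}


def barrel_shift(x: int, amount: int, n: int, op: str) -> int:
    """op is one of 'srl', 'sra', 'ror', 'sll', 'sla', 'rol'."""
    assert n & (n - 1) == 0 and 0 <= amount < n and 0 <= x < (1 << n)
    if op in RIGHT_MODES:
        return right_core(x, amount, n, RIGHT_MODES[op])
    # Left operations: reverse the data, shift or rotate right, reverse back.
    mode = {"sll": "logical", "sla": "logical", "rol": "rotate"}[op]
    return reverse_bits(right_core(reverse_bits(x, n), amount, n, mode), n)
```

For example, `barrel_shift(0b10110000, 3, 8, "sra")` returns `0b11110110`, and `barrel_shift(0b00000011, 2, 8, "sll")` returns `0b00001100` by way of the reversal path.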
TL;DR: In this paper, a VLIW microprocessor capable of executing two or more data-dependent instructions in a single cycle is presented, where at least two of the execution units are connected so that the output of the first is fed to the input of the second, allowing both execution units to execute in that single cycle.
Abstract: A VLIW microprocessor capable of executing two or more instructions having data dependency in a single cycle. The microprocessor includes an instruction fetch and decode unit, a register file, and a plurality of execution units communicating with the instruction fetch and decode unit and with the register file. At least two of the execution units are connected such that the output of a first one of the two execution units is connected to the input of a second one of the two execution units, such that the output of the first execution unit is available as an input to the second execution unit during said single cycle, and such that both execution units can execute in said single cycle. In an exemplary embodiment, the first execution unit is a shift left unit, and the second execution unit is a shift right unit. With this embodiment, a complete extract operation can be performed in a single cycle.
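The single-cycle extract can be modelled as a shift left unit feeding a shift right unit. This Python sketch assumes a 32-bit datapath and a field described by its starting bit `pos` and its `width`; these parameters and all function names are hypothetical, and the sequential calls model only the data flow between the chained units, not their single-cycle timing.

```python
WORD = 32
MASK = (1 << WORD) - 1


def shift_left_unit(x: int, s: int) -> int:
    """First execution unit: logical shift left within a 32-bit word."""
    return (x << s) & MASK


def shift_right_unit(x: int, s: int, signed: bool = False) -> int:
    """Second execution unit: logical or arithmetic shift right of a 32-bit word."""
    if signed and (x >> (WORD - 1)) & 1:
        x -= 1 << WORD  # view the word as a negative two's-complement value
    return (x >> s) & MASK  # Python's >> on negative ints is arithmetic


def extract(x: int, pos: int, width: int, signed: bool = False) -> int:
    """Extract the width-bit field starting at bit pos, zero- or sign-extended."""
    assert 1 <= width and 0 <= pos and pos + width <= WORD
    t = shift_left_unit(x, WORD - pos - width)  # field's top bit lands in bit 31
    return shift_right_unit(t, WORD - width, signed)  # t feeds the second unit directly
```

For example, `extract(0x12345678, 8, 8)` returns `0x56`, and `extract(0x0000F000, 12, 4, signed=True)` returns `0xFFFFFFFF`, the 32-bit pattern of -1.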
TL;DR: In this article, a multiplier circuit within a CPU has its selections of partial products reordered in a unique manner so that shift left capabilities are eliminated and the hardware is required to only perform shift right operations.
Abstract: A multiplier circuit within a CPU has its selections of partial products reordered in a unique manner so that shift left capabilities are eliminated and the hardware is required to only perform shift right operations. This allows several components within the multiplier circuit to be made smaller, saving area, reducing computation time, and reducing power consumption on the chip.
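The abstract does not describe how the partial-product selections are reordered, so this Python sketch instead shows the textbook right-shifting sequential multiplier, which likewise avoids per-step left shifts: multiplier bits are consumed from the least significant end, the multiplicand is always added at the same fixed position in the upper half of an accumulator, and the accumulator shifts right once per step. The function name is hypothetical, and this is not the patented circuit.

```python
def multiply_right_shift_only(a: int, b: int, n: int) -> int:
    """Unsigned n-bit by n-bit multiply whose per-step shifts are all to the right."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    acc = b  # low n bits start as the multiplier; product bits enter from the top
    for _ in range(n):
        if acc & 1:
            # Add the multiplicand into the upper half. The "<< n" is a fixed
            # alignment (wiring), not a shift that changes from step to step.
            acc += a << n
        acc >>= 1  # the only per-step shift: one position to the right
    return acc
```

For example, `multiply_right_shift_only(3, 5, 4)` returns `15`; after n steps the multiplier bits have all been shifted out and the accumulator holds the full 2n-bit product.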