FIELD OF THE INVENTION
The present invention relates to computer systems. More specifically, the present invention relates to performing integer operations based on results of floating point operations.[0001]
BACKGROUND OF THE INVENTION
Prior art processors typically perform comparisons of data, including integer data, floating point data and packed data. Such comparison operations are often used when determining whether branching should occur. For example, in a branch-if-greater-than operation, two numbers are compared and a branch is taken if the first number is greater than the second number. Otherwise, the branch is not taken. The most basic comparisons are of two integer numbers.[0002]
In some applications, such as three-dimensional graphics, many numbers are compared to determine the “location” of various objects with respect to each other. In such applications, comparisons are performed more efficiently by operating on packed data. Packed data generally refers to the representation of multiple values by a single number. For example, four eight-bit integer numbers may be represented by a single 32-bit number having four eight-bit segments equivalent to the four eight-bit numbers. Thus, the significance given to various bit placements is altered from standard 32-bit values in order to accurately represent a greater number of smaller values. By performing a compare on the 32-bit packed data, four eight-bit integer compares are accomplished with a single compare operation. Similarly, packed data comparisons may be performed on floating point data.[0003]
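By way of illustration only, the packing of four eight-bit integers into one 32-bit value may be sketched as follows. The function names, the placement of the first value in the low byte, and the use of 0xFF/0x00 as the per-segment compare result are assumptions made for this sketch and are not taken from any particular processor.

```python
def pack_u8x4(values):
    """Pack four 8-bit unsigned integers into one 32-bit word.

    Assumption: values[0] occupies the low byte, values[3] the high byte.
    """
    assert len(values) == 4 and all(0 <= v <= 0xFF for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (8 * i)
    return word


def unpack_u8x4(word):
    """Recover the four 8-bit segments of a 32-bit packed word."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def packed_gt_u8x4(a, b):
    """Compare each 8-bit segment of a with the matching segment of b.

    Assumption: a segment of the result is 0xFF where a > b, else 0x00.
    """
    result = 0
    for i in range(4):
        seg_a = (a >> (8 * i)) & 0xFF
        seg_b = (b >> (8 * i)) & 0xFF
        if seg_a > seg_b:
            result |= 0xFF << (8 * i)
    return result


a = pack_u8x4([10, 200, 3, 77])
b = pack_u8x4([9, 201, 3, 5])
mask = packed_gt_u8x4(a, b)  # one call stands in for four 8-bit compares
```

The loop in packed_gt_u8x4 only models the behavior; hardware would evaluate the four segments in parallel.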
Because many prior art processors branch on integer operations and many applications operate on floating point data, what is needed is an improved method and apparatus for performing branch instructions based on integer instructions in response to results of floating point operations.[0004]
SUMMARY OF THE INVENTION
A method and apparatus for performing a move mask operation is described. An operation is performed on floating point data and data is extracted from a result of the operation. The data includes a set of one or more bits where each bit represents multiple redundant bits in the result of the floating point operation. The set of one or more bits is transferred to an integer register and an operation is performed in response to the set of one or more bits.[0005]
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.[0006]
FIG. 1 is one embodiment of a computer system.[0007]
FIG. 2 is one embodiment of an architectural block diagram of a register set and arithmetic circuitry.[0008]
FIG. 3 is one embodiment of a packed data format.[0009]
FIG. 4 is one embodiment of the result of a compare operation performed on two packed data values.[0010]
FIG. 5 is one example of compare, move and branch operations.[0011]
FIG. 6 is one embodiment of a flow diagram for a move mask operation.[0012]
DETAILED DESCRIPTION
A method and apparatus for performing a move mask operation is described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the present invention.[0013]
The present invention provides a method and apparatus for performing operations on packed data values of a first size and conversion of the results stored in the first size to data of a second size by eliminating redundant data. The present invention is useful, for example, when operations are performed on floating point data that is typically larger (e.g., 64 bits) than integer data (e.g., 32 bits) and integer operations are performed based on the floating point result. Because many processors branch based on integer data, the comparison results stored as floating point data must be transferred to an integer register prior to branching. The present invention takes advantage of redundancy of the floating point comparison results to transfer enough data to convey the comparison result to integer registers with a single instruction.[0014]
FIG. 1 is one embodiment of a computer system. Computer system 100 comprises bus 101 or other device for communicating information, and processor 102 coupled with bus 101 for processing information. Processor 102 may be a complex instruction set computer (CISC) processor, a reduced instruction set computer (RISC) processor, a very long instruction word (VLIW) processor, or any other type of processor. In one embodiment, processor 102 is a processor in the Pentium® family of processors available from Intel Corporation of Santa Clara, Calif. Of course, other processors may also be used. In one embodiment, processor 102 includes one or more register sets for storing integer and/or floating point values.[0015]
Computer system 100 further comprises random access memory (RAM) or other dynamic storage device 104 (referred to as main memory), coupled to bus 101 for storing information and instructions to be executed by processor 102. Main memory 104 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 102. Computer system 100 also comprises read only memory (ROM) and/or other static storage device 106 coupled to bus 101 for storing static information and instructions for processor 102. Data storage device 107 is coupled to bus 101 for storing information and instructions.[0016]
Data storage device 107, such as a magnetic disk or optical disc and corresponding drive, can be coupled to computer system 100. Computer system 100 can also be coupled via bus 101 to display device 121, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user. Alphanumeric input device 122, including alphanumeric and other keys, is typically coupled to bus 101 for communicating information and command selections to processor 102. Another type of user input device is cursor control 123, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 102 and for controlling cursor movement on display 121.[0017]
In one embodiment, computer system 100 provides graphics functionality. Main memory 104 stores sequences of instructions to generate and display graphical or visual displays on display device 121. Processor 102 executes the sequences of instructions to cause display device 121 to display the resulting graphical or video image. The sequences of instructions may respond to user input provided via alphanumeric input device 122, cursor control device 123, or some other input device (not shown in FIG. 1). Of course, other systems may also provide graphics functionality or may use the present invention for purposes other than graphics, such as numerical analysis or other mathematical applications.[0018]
FIG. 2 is one embodiment of an architectural block diagram of a register set and arithmetic circuitry. The components of FIG. 2 may be part of processor 102 of FIG. 1, or may be included in other circuitry of computer system 100, either shown or not shown in FIG. 1.[0019]
The present invention is described in terms of floating point registers and integer registers. It is important to note that any register architecture may be used with the present invention. Some architectures, for example, provide a predetermined number of integer registers and a predetermined number of floating point registers. Alternatively, an architecture may provide a pool of registers from which registers may be used for either integer or floating point use, such as in a processor that uses a register renaming scheme.[0020]
It is also important to note that what is called a register may be multiple registers treated as a single register. For example, a processor may provide multiple 64-bit registers that may be used as integer registers. Within the same architecture, two 64-bit registers may store the upper 64 bits and the lower 64 bits of a floating point number and be treated as a single 128-bit floating point register. Alternative architectures may also be used.[0021]
In general, the components of FIG. 2 provide floating point computation and integer computation functionality. Floating point registers 200 store floating point data to be used in operations performed by floating point arithmetic circuitry 205.[0022]
Integer registers 210 store integer data for use in operations performed by integer arithmetic circuitry 215. Integer registers 210 are coupled to floating point registers 200 by transfer circuitry 230. Transfer circuitry 230 may be any circuitry that transfers data held in floating point format in floating point registers to integer registers, where the data is held in integer format.[0023]
FIG. 3 is one embodiment of a packed data format. The packed data format of FIG. 3 stores four 32-bit numbers (X0, X1, X2, and X3) as a 128-bit packed data value 300. In such an embodiment, bits 0-31 represent X0, bits 32-63 represent X1, bits 64-95 represent X2, and bits 96-127 represent X3. In one embodiment, the packed data are stored in floating point registers.[0024]
Packed data operations are performed on two 128-bit packed data values in the format of FIG. 3, with each 32-bit value being operated on with the corresponding 32-bit value of the other 128-bit packed data value. For example, to AND two packed data values, bits 0-31 of the two packed data values are ANDed together to produce a 32-bit result value. The other three 32-bit values may be ANDed in parallel to perform four 32-bit AND operations in a single 128-bit operation. Of course, other operations may be performed on packed data, such as additions, subtractions, etc.[0025]
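As a non-limiting sketch, the packed AND described above may be modeled with a 128-bit integer whose bit ranges follow the layout given for the packed data format (X0 in bits 0-31 through X3 in bits 96-127). The helper names and the sample values are hypothetical and chosen only for illustration.

```python
LANE_BITS = 32
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def pack_lanes(lanes):
    """Build a 128-bit value: lanes[0] in bits 0-31, lanes[3] in bits 96-127."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & LANE_MASK) << (LANE_BITS * i)
    return value


def unpack_lanes(value):
    """Split a 128-bit value back into its four 32-bit lanes."""
    return [(value >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


x = pack_lanes([0x0000FFFF, 0xF0F0F0F0, 0x12345678, 0xFFFFFFFF])
y = pack_lanes([0x00FF00FF, 0x0F0F0F0F, 0xFFFF0000, 0x00000001])

# A single 128-bit AND ...
packed_and = x & y

# ... gives the same answer as four independent 32-bit ANDs.
lanewise_and = [a & b for a, b in zip(unpack_lanes(x), unpack_lanes(y))]
```

Because AND never carries between bit positions, the single wide operation and the four narrow ones agree; operations such as addition would additionally require that carries not cross lane boundaries.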
FIG. 4 is one embodiment of the result of a compare operation performed on two packed data values in the format described above with respect to FIG. 3. In the example of FIG. 4, 128-bit packed data value 400 is compared to 128-bit packed data value 410. The result is 128-bit packed data value 420.[0026]
To perform a comparison operation on two 128-bit packed data values, each of the four components of one packed data value is compared to the corresponding component of the other. Packed data value 400 comprises four values labeled X0, X1, X2, and X3 and packed data value 410 comprises four values labeled Y0, Y1, Y2, and Y3. Each value in the respective packed data values is compared to a corresponding value in the other packed data value (e.g., X3 and Y3).[0027]
Packed data value 420 (Z0, Z1, Z2, and Z3) stores the result of the compare operation. Each value in packed data value 420 stores the result of the compare operation of the corresponding X and Y values. In one embodiment, each value (e.g., Z0, Z1, Z2, and Z3) of packed data value 420 stores either 32 set bits, if the corresponding X value is greater than the Y value, or 32 cleared bits, if the corresponding Y value is greater than or equal to the X value. Thus, the result data represented by packed data value 420 stores redundant information. The result information could be stored in four bits, one bit for each of the four 32-bit values stored in the 128-bit result packed data value 420.[0028]
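The compare result described above may be sketched as follows, purely for illustration. The function name and the sample X and Y values are hypothetical; only the rule (32 set bits where X is greater than Y, 32 cleared bits otherwise) comes from the description.

```python
ALL_SET = 0xFFFFFFFF   # 32 set bits
ALL_CLEAR = 0x00000000  # 32 cleared bits


def packed_compare_gt(xs, ys):
    """Return Z0..Z3: all ones where X > Y, all zeros where Y >= X."""
    return [ALL_SET if x > y else ALL_CLEAR for x, y in zip(xs, ys)]


x = [1.0, 2.5, -3.0, 8.0]   # X0, X1, X2, X3 (illustrative values)
y = [4.0, 2.5, -7.0, 6.5]   # Y0, Y1, Y2, Y3 (illustrative values)
z = packed_compare_gt(x, y)  # Z0, Z1, Z2, Z3

# Every bit of a lane carries the same information, so one bit per lane
# is enough to reconstruct the whole 128-bit result.
one_bit_per_lane = [lane & 1 for lane in z]
```

Note that the equal pair (X1 and Y1) yields cleared bits, since the set pattern is reserved for strictly greater.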
In one embodiment, the present invention extracts the most significant bit (MSB), or sign bit, from each result value (e.g., Z0, Z1, Z2, and Z3) stored in packed data value 420 when the result of a comparison is transferred to integer registers. Of course, a bit other than the most significant bit could be extracted to convey similar information. In one embodiment, the low four bits of an integer register represent the result of the packed data compare operation.[0029]
The examples of FIGS. 4 and 5 are described in terms of a compare operation. It is important to note, however, that the floating point operation that provides a result may be any other floating point operation, whether packed or not.[0030]
FIG. 5 is one example of compare, move and branch operations. In the example of FIG. 5, two floating point numbers stored in floating point registers are compared. The result is stored in a third floating point register. Selected bits from the result register are transferred to an integer register. The data in the integer register is then used to evaluate a branch condition or perform an integer operation.[0031]
The example of FIG. 5 may be useful, for example, when evaluating three-dimensional graphics. Many values may be compared to determine whether two objects overlap, touch, etc. In the following example, four values are compared to four other values as part of a packed data compare operation. Of course, other formats of packed data as well as other floating point operations may also be used. The values stored in floating point registers 200 are described in hexadecimal format, while the values stored in integer registers 210 are described in binary format.[0032]
In the following example, the packed data value in floating point register 500 is compared to the packed data value in floating point register 510. The result is stored as a packed data value in floating point register 520. For example, X3=FF00 and Y3=F300. Thus, X3 is greater than Y3. The result (Z3=FFFF) is stored in packed data value 520. Other values are compared in a similar manner such that the result from each of the four comparisons is stored in register 520. In one embodiment, floating point comparisons are performed by floating point arithmetic circuitry 205.[0033]
In one embodiment, the most significant bits from each of the result values (e.g., Z0, Z1, Z2, and Z3) are extracted and transferred, via transfer circuitry 230, to integer register 530. Thus, the binary value 1100 represents the result of the floating point comparison operation and can be used for integer operations such as branching. In the example of FIG. 5, the binary result value 1100 is compared to a conditional binary value 1011 stored in integer register 540. If the condition is true a branch is taken. Otherwise, the branch is not taken. In one embodiment, integer operations are performed by integer arithmetic circuitry 215.[0034]
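A minimal sketch of the extraction step is given below. It assumes 32-bit result values in which Z3 and Z2 are all ones and Z1 and Z0 are all zeros, which is consistent with the binary value 1100 stated above, and it assumes that Z0 maps to the least significant bit of the integer. The function name move_mask is hypothetical.

```python
def move_mask(lanes, lane_bits=32):
    """Collect the most significant bit of each lane into a small integer.

    Assumption: lane i contributes bit i of the result (Z0 -> bit 0).
    """
    mask = 0
    for i, lane in enumerate(lanes):
        msb = (lane >> (lane_bits - 1)) & 1
        mask |= msb << i
    return mask


# Z0, Z1, Z2, Z3 as they might appear in the floating point result register.
z = [0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF]

integer_register = move_mask(z)  # four bits instead of 128
```

The resulting four-bit value can then be tested against a conditional value by ordinary integer compare and branch instructions.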
Performing floating point comparisons in the manner described above is advantageous because the result of the floating point compare is maintained in floating point format and may be used subsequently as a mask for later operations. For example, a logical AND operation may be performed on the result packed data value stored in floating point register 520 and the packed data value stored in floating point register 500 to generate a packed data value with the values that are greater than the corresponding values of the packed data value stored in floating point register 510 (e.g., X3, X2, 0, 0).[0035]
The value stored in floating point register 520 may be logically complemented and then logically ANDed with the value stored in floating point register 510 to generate a packed data value with the values that are greater than or equal to the corresponding values stored in floating point register 500 (e.g., 0, 0, Y1, Y0). The two result values may be logically ORed to generate a packed data value having the greater of the respective values stored in floating point registers 500 and 510 (e.g., X3, X2, Y1, Y0).[0036]
Another advantage of the present invention is that branches based on floating point comparisons in processors that support integer branching may be performed more efficiently than would otherwise be possible. For example, assuming that the comparison of floating point values, extraction of bits, and transfer of bits to an integer register is performed by a single instruction (e.g., MOVEMASK), the following instruction sequence may be used to perform a branch based on a floating point comparison:
    Z = MOVEMASK (X, Y)     // compare fp values X and Y, result is int value Z
    COMPARE (Z, V)          // compare int values Z and V
    JUMP GREATER THAN       // jump if Z > V
Thus, the present invention provides a more compact instruction stream, and therefore more efficient code, when multiple comparisons of floating point values are used to determine a branching condition.[0037]
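The three-instruction sequence above may be modeled in software as shown below. This is a sketch under stated assumptions: movemask is a hypothetical stand-in for the single MOVEMASK instruction, lane 0 is assumed to map to bit 0 of Z, and the sample values and the conditional value V are chosen only for illustration.

```python
def movemask(xs, ys):
    """Hypothetical fused operation: compare each pair of floating point
    values and return one bit per pair (1 where X > Y), lane 0 in bit 0."""
    z = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if x > y:
            z |= 1 << i
    return z


def jump_greater_than(z, v):
    """Model of COMPARE followed by JUMP GREATER THAN: True means taken."""
    return z > v


x = [1.0, 2.5, 9.0, 8.0]   # X0..X3 (illustrative values)
y = [4.0, 2.5, 7.0, 6.5]   # Y0..Y3 (illustrative values)
v = 0b1011                 # conditional value held in an integer register

z = movemask(x, y)         # one instruction replaces four compares and a transfer
branch_taken = jump_greater_than(z, v)
```

Here the comparison Z > V treats the extracted bits as an ordinary unsigned integer, so the branch decision depends on which lanes compared greater, with higher-numbered lanes carrying more weight.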
The present invention has been described with respect to compare and branch instructions. However, extraction of bits and transfer to integer registers may be performed with any floating point number. For example, the present invention may be used to extract sign bits from each value of a packed floating point number. The results may be used for integer operations such as branching or comparisons. Thus, the present invention has a broader application than to only floating point comparisons and integer branches.[0038]
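For example, sign bit extraction from ordinary floating point values (rather than from a compare result) may be sketched as follows. The sketch assumes 32-bit single precision values in the common IEEE 754 encoding, where bit 31 is the sign bit, and assumes that value i maps to bit i of the integer result. The helper names are hypothetical.

```python
import struct


def float32_bits(value):
    """Return the 32-bit pattern of a single precision float as an integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def sign_mask(values):
    """Gather the sign bit (bit 31) of each value; value i goes to bit i."""
    mask = 0
    for i, v in enumerate(values):
        mask |= (float32_bits(v) >> 31) << i
    return mask


packed = [-1.5, 2.0, -0.0, 3.25]
signs = sign_mask(packed)

# Integer code can now ask, for instance, whether any value is negative.
any_negative = signs != 0
```

Negative zero sets its bit as well, because the extraction looks only at the stored sign bit and not at the numeric value.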
FIG. 6 is one embodiment of a flow diagram for performing a move mask instruction. The process of FIG. 6 is performed on floating point values. In one embodiment, the floating point values are packed floating point values. Alternatively, the floating point values are not packed data values.[0039]
In step 610, a floating point operation is performed on the floating point values. The floating point operation may be, for example, a packed floating point compare, a packed floating point add, a floating point multiply, etc.[0040]
In step 620, one or more bits are extracted from a floating point result register. In one embodiment, the most significant bit of each value of a packed floating point value is extracted. Alternatively, a different bit, such as the least significant bit, may be extracted. Extracting the most significant bit provides the advantage that the most significant bit provides the sign of the floating point number. Of course, bits from non-packed data may also be extracted.[0041]
The extracted bits are placed in a predetermined format in step 630. In one embodiment, the extracted bits are stored in the least significant bits of the integer format. For example, the bit representing Z0 (shown in FIG. 4) is stored in the least significant bit of the integer format. The bit representing Z1 (shown in FIG. 4) is stored in the next to least significant bit of the integer format, and so on. Of course, alternative integer formats may be used. For example, the extracted bits may be stored in the most significant bits of the integer format.[0042]
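The two placements mentioned above may be sketched as follows, assuming a 32-bit integer register and four extracted bits. The function name and its parameters are hypothetical.

```python
def place_bits(bits, count, register_bits=32, high=False):
    """Place `count` extracted bits in an integer register image.

    high=False: bits occupy the least significant positions.
    high=True:  bits occupy the most significant positions.
    """
    assert 0 <= bits < (1 << count)
    if high:
        return bits << (register_bits - count)
    return bits


extracted = 0b1100  # Z3=1, Z2=1, Z1=0, Z0=0

low_format = place_bits(extracted, 4)              # bits 3..0 of the register
high_format = place_bits(extracted, 4, high=True)  # bits 31..28 of the register
```

Either placement carries the same information; the choice only affects which constants subsequent integer instructions compare against.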
In step 640, an integer operation is performed based on the extracted bits stored in an integer register. For example, a branch on equal may be performed in response to bits extracted from a floating point operation. Of course, other operations, such as integer compare, integer add, etc. may also be performed on the extracted bits.[0043]
Thus, the present invention provides a method and apparatus for performing integer operations based on floating point values without losing the floating point value. This leaves the floating point value for later floating point operations, should subsequent operations be performed. The present invention thereby provides more compact code by transferring information to integer registers for integer operations and by maintaining floating point values for possible subsequent floating point operations.[0044]
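To illustrate why retaining the floating point result is useful, the sketch below replays the AND, complement-and-AND, and OR sequence described earlier for the compare result. It assumes 32-bit single precision values and operates on their bit patterns; the helper names and sample values are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def to_bits(value):
    """Bit pattern of a single precision float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def to_float(bits):
    """Single precision float for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


x = [1.0, 2.5, 9.0, 8.0]   # X0..X3 (illustrative values)
y = [4.0, 2.5, 7.0, 6.5]   # Y0..Y3 (illustrative values)

# Compare result kept in "floating point" form: all ones where X > Y.
z = [MASK32 if a > b else 0 for a, b in zip(x, y)]

# Z AND X keeps the X values that won the comparison.
x_kept = [to_bits(a) & m for a, m in zip(x, z)]

# (NOT Z) AND Y keeps the Y values for the remaining lanes.
y_kept = [to_bits(b) & (~m & MASK32) for b, m in zip(y, z)]

# OR of the two gives the greater value of each pair.
greater = [to_float(a | b) for a, b in zip(x_kept, y_kept)]
```

The original X and Y values and the mask Z all remain available after the sequence, so further floating point operations can reuse them while the extracted bits drive integer branching.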
In the foregoing specification, the present invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.[0045]