This application is a continuation of prior application Ser. No. 09/475,927 filed on Dec. 30, 1999 now U.S. Pat. No. 6,691,308.
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to processors and, more specifically, to a method and apparatus which provides run-time correction of microcode, code enhancement, and/or interrupt vector reassignment.
2. Description of the Prior Art
For integrated circuits which are driven by microcode embedded in internal memory, it is often necessary to correct or debug either the instruction content of the embedded memory device or the behavior of the Central Processing Unit (CPU) pipeline itself in the field. This may require on-the-fly modifications driven by customer requests or updates due to the evolution of industry protocol standards. However, this creates problems, since it is difficult to correct and/or debug these types of circuits. Debugging and/or changing the embedded microcode is a time-consuming effort which generally requires messy CRC changes or related checksum modifications.
Therefore, a need existed to provide a circuit by which either the instruction content of the internal memory and/or the behavior of the CPU pipeline itself could be corrected and/or debugged in the field. The debug circuit should consume only a small amount of silicon real estate, be inexpensive to implement, and allow changes at a faster rate than current techniques. The debug circuit must also provide a means by which data could be downloaded to debug the device. Data could be downloaded by the host system or managed via a simplistic communication scheme as described in the ST52T3 data book written by STMicroelectronics, Inc.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a circuit by which either the instruction content of an internal memory and/or the behavior of the CPU pipeline itself could be corrected and/or debugged in the field.
It is another object of the present invention to provide a debug circuit which consumes only a small amount of silicon real estate, is inexpensive to implement, and allows changes at a faster rate than current techniques.
It is still a further object of the present invention to provide a debug circuit which provides a means by which data could be downloaded to debug the device.
BRIEF DESCRIPTION OF THE PREFERRED EMBODIMENTS
In accordance with one embodiment of the present invention, a hot patch system for changing code in a processor is disclosed. The hot patch system has a memory, such as a Read Only Memory (ROM), for storing a plurality of instructions. A program counter is coupled to the memory for indexing the memory to access an instruction. A cache system is coupled to the memory and to the program counter. The cache system is used for comparing information associated with the instruction from memory with information stored in the cache system. If there is a comparison match, the cache system alters the instruction stream as designated by information stored in the cache system. If no match occurs, the cache system sends the instruction from memory into the instruction stream.
In accordance with another embodiment of the present invention, a method of altering the code of a pipeline processor is disclosed. The method requires that a plurality of instructions be stored in memory. A cache is provided and information is stored in the cache. The memory is indexed to access one of the instructions stored in memory. Information associated with the instruction from memory is compared with information stored in the cache. If a comparison match is made, the instruction stream is altered as designated by the information stored in the cache. If no comparison match is made, the instruction from memory is inserted into the instruction stream.
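As a purely illustrative aid, and not as part of the disclosed hardware, the compare-and-substitute step of the method above may be modeled in software roughly as follows; all type and function names are hypothetical:

/* Behavioral model only; names are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;    /* entry holds a usable patch           */
    uint32_t tag;      /* address or op-code to match against  */
    uint32_t new_op;   /* replacement op-code for the pipeline */
} patch_entry_t;

/* Return the op-code to feed into the instruction stream: the cached
 * replacement on a comparison match, otherwise the op-code fetched
 * from the embedded memory.                                          */
uint32_t next_opcode(uint32_t fetched_op, uint32_t compare_value,
                     const patch_entry_t *cache, int n_entries)
{
    for (int i = 0; i < n_entries; i++) {
        if (cache[i].valid && cache[i].tag == compare_value)
            return cache[i].new_op;  /* match: alter the instruction stream   */
    }
    return fetched_op;               /* no match: memory instruction flows on */
}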
The foregoing and other objects, features, and advantages of the invention will be apparent from the following, more particular, description of the preferred embodiments of the invention, as illustrated in the accompanying drawing.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a simplified block diagram of one embodiment of the hot patch circuit of the present invention.
FIG. 2 is a simplified block diagram of a second embodiment of the hot patch circuit of the present invention.
FIG. 3 shows one example of the different fields associated with the cache used in the present invention.
FIG. 4 shows one example of the control codes used in the control flag field of FIG. 3.
FIG. 5 shows one example of the bit configuration of the cache control register used in the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring to FIG. 1, one embodiment of a hot patch circuit 10 (hereinafter circuit 10) is shown. The circuit 10 provides a means whereby the instruction content of an embedded memory device or the behavior of the Central Processing Unit (CPU) may be corrected, modified, and/or debugged. The circuit 10 is preferably used in a pipeline CPU.
The circuit 10 has a program counter 12. The program counter 12 is coupled to a memory device 14 and to a register 18. The program counter 12 is used to generate an address of an instruction to be executed. When the address is generated, the program counter 12 will index the memory unit 14. The memory unit 14 stores instructions which are to be executed by the CPU. The memory unit 14 is a nonvolatile memory device such as a Read Only Memory (ROM) device. Once the program counter 12 accesses the instruction which is stored in the memory unit 14, the instruction is sent to a multiplexer 16.
The register 18 is coupled to the memory unit 14, the program counter 12, and to a cache unit 20. The register 18 is used to store data which will be compared to data which is stored in the cache unit 20. The register 18 may store either the address sent from the program counter 12 or the instruction which the program counter 12 accesses from the memory unit 14.
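A minimal software sketch of this fetch-and-latch step is given below for illustration only; the ROM contents, size, and identifiers are assumptions and do not form part of the disclosure:

#include <stdint.h>

#define ROM_WORDS 128u                        /* assumed memory unit 14 size        */

static const uint32_t rom[ROM_WORDS] = { 0 }; /* placeholder instruction contents   */
static uint32_t program_counter;              /* models program counter 12          */
static uint32_t compare_register;             /* models register 18                 */

/* One fetch cycle: index the ROM with the program counter and latch
 * either the address or the fetched instruction into the compare
 * register, depending on the selected compare mode.                  */
uint32_t fetch_and_latch(int compare_on_address)
{
    uint32_t instr = rom[program_counter % ROM_WORDS];
    compare_register = compare_on_address ? program_counter : instr;
    program_counter++;
    return instr;                             /* forwarded toward multiplexer 16    */
}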
As may be seen in FIG. 3, the cache unit 20 is comprised of a plurality of cache lines 30. Each cache line 30 is comprised of at least three different fields: a control flag field 30A, an op-code field 30B which stores the new instruction to be inserted into the instruction stream, and an address/op-code field 30C which stores the data which is to be compared to the data stored in the register 18. The size of the fields will vary based on the implementation of the circuit 10. In accordance with one embodiment of the present invention, the width of the cache line 30 would be able to accommodate at least a 32-bit op-code (field 30B) along with a 10- to 32-bit address/op-code (field 30C) and a 2- to 8-bit control flag field (field 30A). This would yield a cache line width between 44 and 72 bits. However, it should be noted that these field lengths are only given as one example and should not be seen as limiting the scope of the present invention. As stated above, bit field dimensions will vary depending on the size of the memory unit 14.
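For illustration, one possible C rendering of such a cache line is sketched below; the field widths follow the example dimensions given above and the names are hypothetical:

#include <stdint.h>

typedef struct {
    uint8_t  ctrl_flags;   /* field 30A: 2- to 8-bit control flags         */
    uint32_t new_opcode;   /* field 30B: up to 32-bit replacement op-code  */
    uint32_t match_value;  /* field 30C: 10- to 32-bit address or op-code  */
} cache_line_t;            /* roughly 44 to 72 bits of payload per line    */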
The control flag field 30A is used to dictate both the semantic content and the execution behavior of individual or multiple cache lines 30. The number of control flags is dependent upon the allocated field size. In some cases, combinations of control flags may be useful. Control flags may be used to either delete or enable cache unit entries, or provide alternate semantic information regarding the register content. Referring to FIG. 4, some examples of the control flag codes are shown. The valid flag “V” indicates whether the entry in the cache unit 20 is valid. The “A” and “O” flags indicate whether the information to be compared is an address or an op-code. The global flag “G” allows for greater than a 1:1 mapping. For example, if the address flag “A” is set, one would only be comparing the one particular address in the memory unit 14. Thus, there is only a 1:1 mapping. However, if the op-code “O” and global “G” flags are set, one would be able to replace every occurrence of a particular instruction that is accessed from the memory unit 14. Thus, the global flag “G” allows one to make better use of the space in the cache unit 20. The insert “I”, match “M”, block assignment “B”, and delete “X” flags are used by the cache control logic 22 to control access to the instruction stream. The “I” flag implies that the associated op-code in the cache is to be inserted into the instruction stream. The “M” flag indicates that, when the contents of the register 18 match those in the cache unit 20, the cache unit instruction is to replace the instruction from the memory unit 14 in the instruction stream. The “B” flag allows more than one instruction (i.e., a block of instructions) stored in the cache unit 20 to be clocked into the instruction stream. The “X” flag indicates that the relevant instruction is to be ignored or deleted (i.e., no operation (NOP)). The “E”, “H”, “L”, and “Q” flags are pipeline control flags. The “E” flag indicates that, if there is a match, the pipeline is to jump to external memory using the address in the op-code field and execute the instructions in external memory starting at that location. The “H” flag allows one to stop the clock for purposes of debugging the pipeline. The “L” flag allows one to lock the cache unit 20, and the “Q” flag is a generate-trap flag. The control codes shown in FIG. 4 are just examples and should not be seen to limit the scope of the present invention. Different sets or combinations of flags could be used depending on the particular implementation.
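Merely as an illustrative encoding, and not the actual layout of the control flag field 30A, the flags discussed above could be represented in C as single-bit masks; the bit positions below are assumptions:

enum {
    FLAG_V = 1u << 0,  /* valid entry                                    */
    FLAG_A = 1u << 1,  /* compare field holds an address                 */
    FLAG_O = 1u << 2,  /* compare field holds an op-code                 */
    FLAG_G = 1u << 3,  /* global: match every occurrence (op-code mode)  */
    FLAG_I = 1u << 4,  /* insert the cached op-code into the stream      */
    FLAG_M = 1u << 5,  /* on match, replace the memory instruction       */
    FLAG_B = 1u << 6,  /* clock a block of cached instructions in        */
    FLAG_X = 1u << 7,  /* ignore/delete the instruction (NOP)            */
};
/* The "E", "H", "L", and "Q" pipeline control flags would occupy further
 * bit positions in a wider control flag field.                           */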
In the embodiment depicted in FIG. 1, the cache unit 20 is a fully associative or direct-mapped cache which would contain memory unit addresses with associated control flags, executable instructions, and tag information. The cache unit 20 may be a content addressable memory whereby the data in the register 18 is compared to all the contents in the cache unit 20.
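The content-addressable compare may be pictured with the following loop, which merely stands in for the parallel comparison performed in hardware; it reuses the hypothetical cache_line_t and FLAG_V names sketched above:

#include <stddef.h>
#include <stdint.h>

/* Return the first valid line whose match_value equals the value held
 * in register 18, or NULL when no line matches.                        */
const cache_line_t *cam_lookup(const cache_line_t *lines, size_t n_lines,
                               uint32_t compare_register)
{
    for (size_t i = 0; i < n_lines; i++) {
        if ((lines[i].ctrl_flags & FLAG_V) &&
            lines[i].match_value == compare_register)
            return &lines[i];
    }
    return NULL;   /* no hit: the memory instruction will flow through */
}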
The cache 20 is also coupled to a bus 21. The bus 21 could be coupled to a host bus or to external memory. The bus 21 allows data to be downloaded into the cache 20 or instructions to be executed from the external memory. Contents of the cache 20 could be downloaded by the host system or managed via a simple communication scheme as described in the ST52T3 data book written by STMicroelectronics, Inc.
Cache control logic 22 is coupled to the cache unit 20 and to the multiplexer 16. The cache control logic 22 controls the operation of the cache unit 20 and determines when a particular instruction will be inserted into the instruction stream of the pipeline 24. If there is no comparison match, the cache control logic 22 will let the instruction from the memory unit 14 flow through the multiplexer 16 to the pipeline 24. When there is a comparison match, the instruction from the memory unit 14 is replaced by a new instruction from the cache unit 20 in the pipeline 24. The cache control logic 22 will have a cache control register 23. The cache control register 23 allows one to control the cache unit 20 and to control insertion of an instruction into the pipeline 24. By setting various bits in the cache control register 23, one would be able to enable/disable the cache unit 20, modify the contents of the cache unit 20, and control the general operation of the cache unit 20. The cache control register 23 will be described in further detail in relation to the dual cache system of FIG. 2.
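The selection made by the cache control logic 22 and the multiplexer 16 can be summarized, again only as a software sketch built on the hypothetical definitions above, as:

#include <stddef.h>
#include <stdint.h>

/* Return the op-code that enters pipeline 24 for one fetched instruction,
 * given the matching cache line (or NULL on a miss).                      */
uint32_t mux_select(uint32_t mem_instr, const cache_line_t *hit)
{
    if (hit == NULL)
        return mem_instr;          /* no match: memory instruction flows through  */
    if (hit->ctrl_flags & FLAG_X)
        return 0x00000000u;        /* assumed NOP encoding: instruction deleted   */
    if (hit->ctrl_flags & FLAG_M)
        return hit->new_opcode;    /* match flag: replace the memory instruction  */
    return mem_instr;              /* other flags (e.g. insert) handled elsewhere */
}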
A mask register 26 may be coupled to the cache unit 20. The mask register 26 may be a global mask register which would affect the entire cache unit 20, or a local mask register 32 (FIG. 3) whereby a single cache line 30 would have an associated local mask register 32. The mask register 26 provides flexibility to the circuit 10 by allowing one to control how the data from the memory unit 14 is matched with data in the cache unit 20. For example, if all of the bits in the global mask register 26 were set to 1, then whatever data came through the register 18 would be matched one to one against that of the cache unit 20. One could also set the global mask register 26 to invalidate the cache unit 20 and let the memory unit instructions be executed as accessed by the program counter 12. The mask register 26 may also be used to modify the contents of the cache unit 20 by using simple write instructions.
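The effect of the mask register 26 on the comparison can be illustrated with a masked equality test; the convention that a 1 bit participates in the compare is an assumption of this sketch:

#include <stdbool.h>
#include <stdint.h>

/* Bits set to 1 in the mask take part in the comparison; 0 bits are ignored. */
bool masked_match(uint32_t reg18_value, uint32_t cache_value, uint32_t mask)
{
    return ((reg18_value ^ cache_value) & mask) == 0;
}
/* With mask == 0xFFFFFFFFu every bit is compared one-to-one, as in the
 * example above; clearing bits relaxes the match condition.             */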
Referring to FIG. 2, a second embodiment of the present invention is shown wherein like numerals represent like elements, with the exception of a “′” to indicate another embodiment. The circuit 10′ looks and operates in a similar fashion as the circuit 10 depicted in FIG. 1. One difference in the circuit 10′ is that the cache 20′ is divided into two separate caches: an address cache 20A′ and an instruction cache 20B′. Thus, for the address cache 20A′, the third field of the cache line will contain the memory unit address location to be matched, and for the instruction cache 20B′, the third field of the cache line will contain the memory unit instruction to be matched.
The cache control logic 22′ operates in a similar fashion as disclosed above. For the dual cache system, one implementation of the cache control register 23′ is shown in FIG. 5. As can be seen in FIG. 5, by setting different bits in the cache control register 23′, one is able to control the operation of the cache unit 20′. The cache control register 23′ depicted in FIG. 5 would be used in the dual cache system of FIG. 2. In this particular embodiment, the cache control register 23′ has locking, enabling, indexing, and match status bits for both the address cache 20A′ and the instruction cache 20B′. Bits like the enable operation bit and the debug mode bit could be used in either the single cache system of FIG. 1 or the dual cache system of FIG. 2. The cache control register bit definition shown in FIG. 5 is just one example and should not be seen to limit the scope of the present invention. Different configurations of bits could be used depending on the particular implementation.
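Since FIG. 5 is not reproduced here, the bit assignments below are purely hypothetical placeholders, offered only to illustrate the kinds of bits described for the cache control register 23′:

enum {
    CCR_ADDR_CACHE_ENABLE   = 1u << 0,  /* enable address cache 20A'            */
    CCR_INSTR_CACHE_ENABLE  = 1u << 1,  /* enable instruction cache 20B'        */
    CCR_ADDR_CACHE_LOCK     = 1u << 2,  /* lock address cache contents          */
    CCR_INSTR_CACHE_LOCK    = 1u << 3,  /* lock instruction cache contents      */
    CCR_ADDR_MATCH_STATUS   = 1u << 4,  /* address cache reported a match       */
    CCR_INSTR_MATCH_STATUS  = 1u << 5,  /* instruction cache reported a match   */
    CCR_ENABLE_OPERATION    = 1u << 6,  /* global enable of the hot patch logic */
    CCR_DEBUG_MODE          = 1u << 7,  /* debug mode                           */
};
/* Index bits for addressing individual cache lines would occupy a further
 * group of bit positions in the register.                                  */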
The dual cache system also uses two multiplexers 16A′ and 16B′. The first multiplexer 16A′ has a first input coupled to the output of the address cache 20A′, a second input coupled to the output of the instruction cache 20B′, a third input coupled to the cache control logic 22′, and an output coupled to the second multiplexer 16B′. The second multiplexer 16B′ has a first input coupled to the output of the first multiplexer 16A′, a second input coupled to the output of the memory device 14′, a third input coupled to the cache control logic 22′, and outputs coupled to the pipeline 24′ and the status buffer 34′. In operation, the cache control logic 22′ will control which cache 20A′ or 20B′ is enabled and, if both caches 20A′ and 20B′ are enabled and there is a dual match, which cache has priority. If there is a comparison match, the cache control logic 22′ will cause the multiplexer 16A′ to send an output from the cache unit 20′ to the second multiplexer 16B′. The cache control logic 22′ will then cause the multiplexer 16B′ to insert the output from the cache unit 20′ into the instruction stream to be executed. If there is no comparison match, the cache control logic 22′ will cause the multiplexer 16B′ to insert the instruction from the memory unit 14′ into the pipeline 24′.
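A behavioral sketch of this dual-cache selection, reusing the hypothetical cam_lookup() and mux_select() helpers above and assuming the address cache wins on a dual match, is:

#include <stddef.h>
#include <stdint.h>

/* Output of multiplexer 16B' into pipeline 24' for one fetch cycle. */
uint32_t dual_cache_select(uint32_t pc, uint32_t mem_instr,
                           const cache_line_t *addr_cache, size_t n_addr,
                           const cache_line_t *instr_cache, size_t n_instr)
{
    const cache_line_t *hit = cam_lookup(addr_cache, n_addr, pc);
    if (hit == NULL)                               /* assumed priority order */
        hit = cam_lookup(instr_cache, n_instr, mem_instr);
    return mux_select(mem_instr, hit);
}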
In the embodiment depicted in FIG. 2, the circuit 10′ has a status buffer 34′. The status buffer 34′ has an input coupled to the cache control logic 22′, an input coupled to the second multiplexer 16B′, and an input coupled to the bus 36′. The status buffer 34′ is used to store information related to the operation of the circuit 10′. For example, the status buffer 34′ could be used to gather debug information such as what line of code was matched. Although not shown in FIG. 1, it should be noted that the status buffer 34′ could also be used in the embodiment depicted in FIG. 1.
OPERATION
Referring now to Table 1 below, the operation of the circuit 10 will be described. It should be noted that the operation of the circuit 10′ is similar to that of the circuit 10 and will not be described in detail.
TABLE 1

      Flags   Address   Op-code     Program Counter   Code Stream
1     MA      0111111   CP32 A,C    0111111           CP32 A,C
2     IR      1000000   MOV A,B     1000000           1000000
3     RA      1000010   SAV B       1000001           MOV A,B
4     RA      1000011   ADD B,C     1000010           1000001
5     XA      1000101               1000011           SAV B
                                    1000101           ADD B,C
                                    1000110           NOP
                                    1000111           1000110
                                                      1000111
When the program counter 12 generates the address 0111111, the program counter 12 will index the memory unit 14. The instruction associated with address 0111111 from the memory unit 14 will be stored in the multiplexer 16. The address from the program counter 12 is also sent to the register 18, where it is compared to the data stored in the cache unit 20. As can be seen above, for address 0111111 there is a comparison match with cache line 1. Since the “M” flag is set for cache line 1, the op-code in cache line 1 will replace the instruction from memory. Thus the cache control logic 22 will send the CP32 A,C instruction associated with cache line 1 through the multiplexer 16 into the pipeline 24 to be executed.
The next address generated by the program counter 12 is 1000000. The memory unit instruction associated with address 1000000 is sent from the memory unit 14 and stored in the multiplexer 16. The address generated by the program counter 12 is sent to the register 18, where it is compared to the data stored in the cache unit 20. For the address 1000000 there is a comparison match with cache line 2. Since the “I” flag is set for cache line 2, the op-code in cache line 2 (i.e., MOV A,B) will be inserted into the instruction stream after the instruction associated with the memory unit address location 1000000.
The next address generated by the program counter 12 is 1000001. For this address there is no comparison match. Thus, the cache control logic 22 will send the instruction associated with memory unit address location 1000001 through the multiplexer 16 into the pipeline 24 to be executed.
For the next address, 1000010, there is a comparison match with cache line 3. Since the “R” flag is set in cache line 3, the op-code in cache line 3 (i.e., SAV B) replaces the memory unit instruction associated with the address 1000010 in the instruction stream.
The next address generated by the program counter is 1000011. For this address, there is a comparison match with cache line 4. Since the “R” flag is set in cache line 4, the op-code ADD B,C in cache line 4 replaces the memory unit instruction associated with the address 1000011 in the instruction stream.
The next address in the program counter is 1000101. Again there is a comparison match. This time the match is with cache line 5. Cache line 5 has the “X” flag set, so the instruction is ignored or deleted (i.e., no operation (NOP)).
For the last two addresses in the program counter, 1000110 and 1000111, there is no comparison match. Thus, the cache control logic 22 will send the instructions associated with these memory unit address locations through the multiplexer 16 into the pipeline 24 to be executed.
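The per-cycle behavior walked through above can be condensed into the following illustrative dispatch routine, which again builds on the hypothetical names introduced earlier; the “R” flag is assumed here to behave as a replace flag like “M”, and the NOP encoding is an assumption:

#include <stddef.h>
#include <stdint.h>

#define FLAG_R FLAG_M   /* assumption: "R" acts as a replace flag */

/* Append the op-codes produced for one program counter step to the code
 * stream; returns how many op-codes were written (one or two).           */
int dispatch(uint32_t mem_instr, const cache_line_t *hit,
             uint32_t *stream_out /* room for two op-codes */)
{
    if (hit == NULL) {                 /* e.g. addresses 1000001, 1000110, 1000111 */
        stream_out[0] = mem_instr;
        return 1;
    }
    if (hit->ctrl_flags & FLAG_X) {    /* e.g. address 1000101: NOP                */
        stream_out[0] = 0x00000000u;   /* assumed NOP encoding                     */
        return 1;
    }
    if (hit->ctrl_flags & FLAG_I) {    /* e.g. address 1000000: insert after       */
        stream_out[0] = mem_instr;
        stream_out[1] = hit->new_opcode;
        return 2;
    }
    stream_out[0] = (hit->ctrl_flags & FLAG_R)   /* replace on M/R flag            */
                        ? hit->new_opcode
                        : mem_instr;
    return 1;
}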
While the invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and details may be made therein without departing from the spirit and scope of the invention.