METHOD AND APPARATUS TO IMPROVE CONTEXT SWITCH TIMES
IN A COMPUTING SYSTEM
BACKGROUND OF THE INVENTION
This is a continuation-in-part of United States Patent Application Serial No. 09/291,811, filed April 14, 1999.
1. FIELD OF THE INVENΗON
This invention relates to the field of processors in computing systems, and, more specifically, to context switching.
2. BACKGROUND ART
In current computing systems, a processor (also referred to in various implementations as a microprocessor, controller, microcontroller, central processing unit (CPU), etc.) is used to decode program instructions associated with a process, and to execute the operations specified for each given instruction by manipulating data values and moving data within the computing system's address space. Many computing systems implement a scheme for handling multiple processes concurrently in what is called "multiprocessing." Multiprocessing includes sharing, among all runnable processes, instruction execution cycles of one or more processors. To handle multiple processes in a single processor, a processor interrupts the execution of a first process to execute a second process. The second process is then interrupted to execute a third process, or to resume execution of the first process, and so on. The determination of when each process executes within a processor (referred to as "process scheduling") may be, for example, time-based and/or priority-based, non-preemptive or preemptive.
The state of the processor during the execution of a given process is referred to as the "context" of that process. A process context includes, for example, the contents of the processor's registers, and may be further extended to include data stored in cache memory. In general, the process context includes the processor state necessary for a process to be stopped and restarted at a later time without incurring errors in execution, or requiring excessive re-execution of instructions to recapture the prior processor state.
When a processor stops executing a first process and begins (or resumes) execution of a second process, the context of the first process is saved to main memory (e.g., RAM), and the context of the second process, if any already exists, is retrieved from main memory and loaded into the processor. This process of writing (or "storing") a first context to memory and loading a second context into the processor is called "context switching," and occurs each time the processor changes processes. Unfortunately, context switching is a source of undesired overhead in the system, as the processor must sacrifice valuable execution cycles to individually store each register value for the outgoing context into main memory, and to individually load each register value from memory into the respective processor register for the incoming context. This undesired overhead increases with the number of registers supported by the processor architecture.
To reduce the overhead of context switching, processor architectures of the prior art use a "dirty" register scheme. This scheme limits the registers included in a current process context to those registers that have been used by the current process since the last context switch. The dirty register scheme is typically implemented by associating a "dirty bit" with each register. Whenever a register is used during process execution, that register's associated dirty bit is set. Those registers that have not been used by the current process, i.e., those registers whose associated dirty bits are not set, are ignored during a context switch. The decrease in context size results in a reduction in the number of store operations required to transfer an outgoing context to memory. The number of load operations required to transfer an incoming context back into the processor is also reduced.
Figure 1 is a flow diagram illustrating a context switch using the dirty register scheme. Steps 100-105 refer to the outgoing context of the process being interrupted, and steps 106-110 refer to the incoming context of the process that is resuming execution. The context switch begins in step 100, where certain special function registers are stored in memory as part of the outgoing context. These special function registers are those registers fundamental to processor operation, such as the program counter register, that are always included in the process context. The rest of the processor registers, commonly configured as part of a register file, are processed beginning in step 101 where the first register of the register file is identified as the current register under consideration.
In step 102, the processor determines whether the current register is dirty, i.e., whether the associated dirty bit is set. If the current register is dirty, the contents of the register are stored in memory in step 103 as part of the outgoing context. If the current register is not dirty, step 103 is bypassed. In step 104, the processor determines whether the current register is the last register in the register file. If the current register is not the last register, the next register is identified for consideration in step 105, and the context switch returns to step 102. If, in step 104, the current register is the last register in the register file, the outgoing context is completed with respect to the processor registers, and the context switch continues with the mcoming context in step 106.
In step 106, the special function registers of the processor are loaded from the incoming context in memory. In step 107, the first register value in memory is identified as the current register value being processed in the context switch, and in step 108, the current register value is loaded from the mcoming context in memory into the associated register in the processor's register file. In step 109, the processor determines whether the current register value is the last value of the incoming context. If the current register value is not the last register value, the next register value in the incoming context is identified as the current register value being processed, and the context switch returns to step 108. If, in step 109, the current register value is the last register value stored in the incoming context, the context switch is completed and the process associated with the mcoming context resumes execution.
Whereas early processor architectures had very few registers, current processor architectures attempt to improve processor execution efficiency by increasing the number of registers in the register file to minimize the need to perform relatively slow memory access operations. However, the increased number of registers can adversely affect context switch times by increasing the size of each process context. During a context switch, all processes must wait while the processor performs the required load and store operations for the respective incoming and outgoing contexts, as described above. Context switching performance is thus an important consideration in multiprocessing systems, and becomes more so as processor architectures adopt larger register files.  SUMMARY OF THE INVENTION
A method and apparatus to improve context switch times in a computing system are described. The invention provides a mechanism for registers containing unneeded data to be removed from a process context, thus reducing the amount of time required to perform a context switch. Program instructions are used to explicitly specify registers that are to be abandoned. For processor's implementing a dirty bit scheme, the processor resets the dirty bits for those registers that are to be abandoned. A subsequent context switch excludes the abandoned registers.
In one embodiment, a microprocessor instruction (such as a NOP instruction or other instruction with available fields) is provided that has one or more operands for specifying registers to be abandoned. During compiling, the operands of the microprocessor instruction are set to specify registers that become unnecessary, i.e., that no longer contain useful information. During execution, the processor determines the registers to be abandoned from the operands of the microprocessor instruction, and removes those registers from the process context (e.g., by resetting corresponding dirty bits).
In another embodiment, the identification of registers for abandonment is encoded within existing instructions that specify registers as source operands. An "abandon register" bit-field is provided in an instruction for each source operand that specifies a processor register. During compiling, a compiler is configured to determine whether a source register will be needed after a given instruction is executed, and to set an associated "abandon register" bit of the instruction based on the results of that determination. During execution, if the "abandon register" bit is not set for a given source operand, the "dirty bit" of the associated processor register is unchanged. If the "abandon register" bit is set for the given source operand, the processor removes the associated register from the process context, for example, by resetting the "dirty bit" of the associated processor register to indicate that the value stored in the register is not needed.
BRIEF DESCRIPTION OF THE DRAWTNGS
Figure 1 is a flow diagram of a context switch using a dirty register scheme.
Figure 2 is a block diagram of an example processor architecture.
Figure 3 is a block diagram of a register file.
Figure 4 is a bit diagram of sample instruction formats.
Figure 5 is a bit diagram of instruction formats in accordance with embodiments of the invention.
Figure 6 is a flow diagram of an instruction execution process in accordance with an embodiment of the invention.
Figure 7 is an example of assembly code illustrating application of a NOP instruction in accordance with an embodiment of the invention.
Figure 8 is a flow diagram of a compiler process in accordance with an embodiment of the invention.
Figure 9 is a flow diagram of an instruction execution process in accordance with an embodiment of the invention.
Figure 10 is an example of assembly code illustrating application of instruction encoding in accordance with an embodiment of the invention.  Figure 11 is a flow diagram of a compiler process in accordance with an embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
The invention is a method and apparatus to improve context switch times in a computing system. In the following description, numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It will be apparent, however, to one skilled in the art, that the invention may be practiced without these specific details. In other instances, well known features have not been described in detail so as not to obscure the invention.
In the prior art, all registers that have been used by a process are included in the process context. This includes those registers for which the stored data has already been utilized in a desired operation, and for which no further purpose remains. Thus, registers containing unneeded data are unnecessarily transferred to main memory, and loaded from main memory, as part of a context switch. Sub-optimal context switch times are therefore achieved.
In embodiments of the invention, a processor is configured to remove unneeded registers from a process context (i.e., to abandon the register) in response to program instructions. A compiler is configured to provide such program instructions for registers that no longer contain data needed by the executing process. As a result, the number of registers that are required to be loaded or stored during a context switch is minimized, reducing the time needed to perform the context switch. Embodiment of General-Purpose Processor Architecture
An embodiment of the invention can be implemented in the instruction set of any processor architecture. For purposes of example, one possible embodiment of a processor architecture is shown in Figure 2.
Figure 2 is a block diagram of a microprocessor (i.e., a processor implemented on an integrated circuit (IC) chip) containing a processor 200 coupled to data bus 210 and address bus 211. More complex microprocessors may also include other on-chip components, such as floating-point units, memory management units, an "on-chip" cache (sometimes referred to as "level 1" or "LI" cache), and possibly further processors. Figure 2 also includes "off- chip" main memory (e.g., random access memory (RAM)) 208 coupled to the microprocessor chip via data bus 210 and address bus 211. Processor 200 can access program instructions and program data in main memory 208 via data bus 210 by asserting an appropriate memory address on address bus 211. During execution, processor 200 stores one or more process contexts (1-M) 209 in main memory 208. Each process context 209 represents the preserved state of a process not currently executing in processor 200.
Other off-chip computer components, such as "level 2" or "L2" cache, mass storage devices (e.g., disk drives) and input/output (I/O) devices (e.g., keyboard, mouse, display, network devices, etc.), may also be coupled to data bus 210 and address bus 211 to interface with the microprocessor. Each component occupies a portion of the computing system's address space.
Memory-mapped I/O devices may have data written to them, and read from them, in similar fashion to memory devices, by addressing the portion of the address space specified for each respective I/O device. For some microprocessors, data bus 210 and address bus 211 are implemented as a single, time-multiplexed bus.
Processor 200 provides the main mechanism for executing instructions associated with application programs, operating systems, and, in some embodiments, firmware routines. Execution is carried out by a process of fetching individual instructions from memory, decoding those instructions, and exerting control over the components of the processor to implement the desired function(s) of each instruction. The manner in which a processor carries out this execution process is dependent on the individual components of the given processor and the component interaction provided by the defined instruction set for the given processor architecture. For the example architecture of processor 200, the functions of the individual components are generally described below.
Processor 200 comprises arithmetic logic unit (ALU) 201, register file 202, program counter register (PC REG) 203, memory address register (MEM ADDR REG) 204, instruction register (INSTR REG) 205, control unit 206 and ALU input multiplexer (MUX) 212, each coupled to an internal data bus 213. Multiplexer (MUX) 207 is provided to drive address bus 211 from either program counter register 203 or memory address register 204. Memory buffer register 229 is coupled between data bus 210 and internal data bus 213 to buffer data transfers between the two data buses.
Program counter register 203, memory address register 204, memory buffer register 229 and instruction register 205 are special function registers used by processor 200. If the processor architecture implements a stack structure in main memory 208, processor 200 may also include a stack pointer register (not shown) to store the address for the top element of the stack. Program counter register 203, instruction register 205 and the stack pointer register (if implemented) are each stored as part of the process context during a context switch.
Program counter register 203 is used to store the address of the next instruction for the current process, and is updated during the execution of each instruction. This updating typically consists of incrementing the current value of program counter register 203 to point to the next instruction. However, for branching instructions, program counter register 203 may be directly loaded with a new address as a jump destination, or an offset may be added to the current value. In the execution of a conditional branching instruction, condition codes generated by arithmetic logic unit 201 may be used in the determination of which update scheme is applied to program counter register 203. When it is time to fetch the next instruction, program counter register 203 drives its output signal 218 onto address bus 211 via MUX 207. Control signal 224 from control unit 206 controls the loading of program counter register 203, whereas control signal 227 controls the selection mechanism of MUX 207 to drive output signal 218 (or output signal 219 of memory address register 204) onto address bus 211.
Memory address register 204 is used to hold the target memory address during execution of memory access instructions, such as "load" and "store." In variations of the load and store instructions, the memory address may be loaded into memory address register 204 from one of the registers in register file 202, from the results of an ALU operation, or from an address bit-field extracted from an instruction (e.g., via masking and shifting). When the memory access is initiated, memory address register 204 drives its output signal 219 onto address bus 211 via MUX 207.  Memory buffer register 229 provides a data buffer between the internal data bus 213 of processor 200 and the typically much slower data bus 210. Incoming data (including instructions) loaded from main memory 208 or other components external to processor 200 is first stored into memory buffer register 229 before being placed into one of the registers within processor 200. The address asserted on address bus 211 by program counter register 203 or memory address register 204 determines the origination of the incoming data. Outgoing data is loaded from one of the registers of processor 200 into memory buffer register 229 to be driven onto data bus 210. The address asserted on address bus 211 determines the outgoing data's destination address, in main memory 208, for example. Control unit 206 enables loading of memory buffer register 229 via control signal 228.
When an instruction is fetched from main memory (or from an instruction cache), the instruction is loaded into instruction register 205 (enabled by control unit 206 via control signal 226). The contents of instruction register 205 are made available to control unit 206 via port 220. An instruction cache (not shown) may also be implemented which stores a subset of the previously accessed instructions in a fast cache memory that is typically on-chip with the processor. When an instruction is fetched, the instruction cache is checked first to see if the instruction is resident in the cache (i.e., a cache "hit"). If the instruction is resident, then the instruction is loaded from the cache. If the instruction is not resident in the instruction cache (i.e., a cache "miss"), the instruction is fetched from the slower main memory.
Arithmetic logic unit 201 provides the data processing or calculating capabilities of the processor. Arithmetic logic unit 201 typically comprises double-input/single-output hardware for performing functions such as integer arithmetic (add, subtract), Boolean operations (bitwise AND, OR, NOT (complement)) and bit shifts (left, right, rotate). These functions are sufficient for processor 200 to implement most basic operations, with more complex operations being implemented in software using combinations of those basic operations (e.g., integer multiplication may be implemented by a series of integer add and shift operations). However, to improve the performance of the processor, arithmetic logic unit 201 may further comprise hardware for implementing more complex functions, such as integer multiplication and division, floating-point operations, specific mathematical functions (e.g., square root, sine, cosine, log, etc.) and vector processing or graphical functions, for example. Control signal 222 from control unit 206 controls a multiplexer or other selection mechanism within arithmetic logic unit 201 for selecting the desired function.
The inputs to arithmetic logic unit 201 are signal 214 from register file 202, and signal 217 from input multiplexer 212. Based on control signal 221 from control unit 206, MUX 212 selects signal 217 from internal data bus 213 or signal 215 from register file 202. Output signal 216 of arithmetic logic unit 201 is provided to internal data bus 213 and/or register file 202. Additional output signals (not shown) may include condition codes such as "zero," "negative," "carry" and "overflow." These condition codes may be used in implementing conditional branching operations during updating of program counter register 203.
Register file 202 comprises a set of fast, multi-port registers for holding data to be processed by arithmetic logic unit 201, or otherwise frequently used data. By mamtaining data in register file 202, the data is kept in close proximity to the processor, and lengthy memory access delays are avoided. Register access speeds typically permit register values to be operated on by the arithmetic logic unit and stored back into the register file within a single clock cycle. In contrast, access to values in main memory may take several cycles or more to transfer between the memory buffer register and main memory. The registers within register file 202 are directly accessible to software through instructions that contain source and destination register address bit-fields. Most current processors have at least thirty-two integer registers capable of holding 32-bit or 64-bit data words. A processor having floating point processing capabilities may also contain thirty-two or more floating point registers, each capable of storing a double-precision floating point data word. However, the concepts described herein may be applied to register files and registers (i.e., data words) of any size. An example implementation of register file 202 is illustrated in Figure 3.
In Figure 3, the register file comprises N registers 300 addressable using log2N bits for addresses 0 through N-l. Registers 300 are used to store data words. The register file also contains N "dirty" bits 301 (e.g., one-bit memory cells), with each dirty bit corresponding to a respective register 300. To indicate that a register has been used by the currently executing process, the respective dirty bit is set by the processor (e.g., by control unit 206 asserting set inputs of appropriate dirty bits via control signal 223) when a data word is loaded into the register. The set of used or dirty registers is included in the process context when a context switch occurs. In Figure 3, registers 0, 1, 4 and N-2 are shown with respective dirty bits set and are thus part of the process context. The dirty bits may be reset for a newly loaded process context after a context switch occurs.
Referring back to Figure 2, control unit 206 decodes the instruction contained in instruction register 205 to determine the operation or operations called for by the given instruction. Control unit 206 manipulates the control signals of the components of processor 200 to carry out those operations. For example, in a processor with thirty-two registers in register file 202 (addresses 0- 31), the assembly code "ADD $3 , $4 , $7" is implemented in an instruction comprising an "add" operation code, or "opcode," and three five-bit register address bit-fields. The contents of register 3 (OOlOlbin) and register 4 (OOlOObin) are provided to arithmetic logic unit 201 as signal 214 and 215 (control signal 221 selects signal 215 to be passed through MUX 212 as ALU input signal 217), and control signal 222 selects an add operation for arithmetic logic unit 201. The result of the addition is output from arithmetic logic unit 201 as signal 216, and loaded into register 7 (OOlllbin)- Control signal 223 enables the input and output ports of the specified registers in register file 202 to carry out the transmission of data between register file 202 and arithmetic logic unit 201.
In a simple processor, control unit 206 may be implemented with purely combinational logic to directly decode the opcode of the instruction into appropriately asserted control signals. However, in most current processors, a single instruction requires the processor to perform multiple operations, some of which may be performed in parallel. For example, the "ADD" instruction previously discussed might include the steps of processing the register values in arithmetic logic unit 201, updating program counter register 203, and fetching the next instruction. For those operations which may not be performed in parallel, control unit 206 typically contains a state machine implementation for asserting control signals during different timed states.
In some processor architectures, the control unit's state machine is implemented as a ROM-based microprogram that contains microinstructions corresponding to each state. Based on decoding of a program instruction, the microprogram jumps to routines or microinstruction sequences for carrying out the operations of specific program instructions. A separate program counter register in the control unit is used to step through the microinstructions, and each microinstruction is decoded to generate the appropriate control signals for the associated state.
The processor architecture described above is for purposes of example only. An embodiment of the invention may be implemented in any type of processor or processing environment, including virtual processors or virtual machines that implement context switching. The following is a general description of sample program instruction formats that may be implemented in a processor such as the embodiment of processor 200 previously described.
Program Instructions
Program instructions are provided in binary form, with different processor architectures having different instruction widths. An instruction typically comprises an operation code (or "opcode") bit-field and zero or more additional bit-fields. Examples of some possible instruction formats are illustrated as bit diagrams in Figure 4. Bit diagram 400 shows an instruction of width Y bits, having an opcode of X bits followed by a data field of (Y-X) bits. The sizes of the opcode bit-field and the data field may vary for different instructions within a processor architecture's instruction set. Also, the number and type of the data fields are in accordance with the defined instruction set of the processor architecture.
The opcode specifies the type of the instruction, such as a load, store, add, subtract, or jump instruction or variation thereof. The data field may be used to specify constant values, function flags and/ or address offsets for stack functions. The data field may also comprise several address fields (addresses 1-M), as shown in bit diagram 401. The address fields may specify source or destination address values for accessing the computing system's address space via buses 210 and 211, and/or the address fields may specify one or more source and/or destination register addresses for accessing the processor's register file 202.
Bit diagram 402 shows how an instruction specifying register operations with two source operands and one destination operand might be implemented in a 32-bit instruction, assuming that the processor has thirty-two addressable registers in its register file. The instruction contains an opcode field of seventeen or fewer bits to provide sufficient room for the source and destination operands. The opcode is followed by a first source operand (SRC 1), a second source operand (SRC 2) and a destination operand (DEST). The destination operand may be the same as one of the source operands if the instruction is operating on the values of two registers and placing the result back into one of those same registers. Each of the source and destination operands are five bits wide to permit any of registers 0 through 31 to be addressed. Many current computing systems are expanding to use 64-bit instructions, which allows for more and /or wider data fields. For register operations, a wider source or destination operand allows for addressing of a larger register file.
Description is provided below for implementing instructions to remove registers from a process context in accordance with one or more embodiments of the invention. The following embodiments may be applied to any instruction format, including the formats discussed above. Instructions With "Abandon Register" Operands
In one embodiment of the invention, a "NOP" instruction (i.e., an instruction that otherwise performs no action) is used to specify registers in the register file that should be released from the current process context. The NOP instruction is described in this embodiment because the NOP instruction typically contains surplus bits that may be used to specify "abandon register" operands. It will be obvious that other instructions that contain sufficient surplus bits to provide "abandon register" operands may be used in a similar manner.
As shown by bit diagram 500 in Figure 5, the instruction comprises a NOP opcode and one or more register addresses as operands. The NOP instruction is Y bits wide, with T bits provided for each operand for a register file of size less than or equal to 2T registers. The remaining bits comprise the NOP opcode. If only one operand may be specified in a NOP instruction and more than one source register should be abandoned, multiple NOP instructions may be used. When the NOP instruction is processed, the control unit asserts control signals to reset the dirty bits of those registers specified by the operands.
In many systems, register zero is hardwired to a specific value (e.g.,
"zero") and thus has no corresponding dirty bit. In such a system, for a NOP instruction that is not used to abandon a register, the "abandon register" oρerand(s) of the NOP instruction may be set to specify register zero so that no dirty bits are reset (i.e., implementing a true "no operation" NOP instruction).
Figure 6 is a flow diagram of the control unit process for handling a NOP instruction in accordance with an embodiment of the invention. As previously indicated, the process of Figure 6 may be similarly applied to instructions other than NOP instructions. In step 600 of Figure 6, a determination is made of whether the current instruction is a NOP instruction ( or other instruction used to implement "abandon register" operands). If the instruction is not a NOP instruction, no NOP handling is performed for that instruction. However, if the instruction is a NOP instruction, in step 601, the control unit determines which registers are specified by the operands of the instruction, e.g., by decoding the respective operands to generate register select signals. In step 602, the dirty bits of the designated register(s) are reset by the control unit asserting reset signals to those dirty bits. The NOP-related steps are completed after step 602.
The NOP instruction occurs after the program instruction that specifies the source register that is no longer needed. The closer the NOP instruction is placed to the program instruction that specifies the unneeded source register, the less likely it is that a context switch will occur before the source register is removed from the process context.
Figure 7 is an example code fragment (in pseudo-assembly code) illustrating the placement of NOP instructions in accordance with an embodiment of the invention. Fragment 700 represents unaltered program code, fragment 701 represents altered code (i.e., with NOP instructions inserted), and fragment 702 represents the altered code after optimization (i.e., NOP instructions combined). In the current example, the individual instructions of fragment 700 are intended to perform the following operations:
1) "Load" register 3 with the contents of memory at the address stored in register 1;
2) "Load" register 4 with the contents of memory at the address stored in register 1 incremented by one;
3) "Add" the contents of registers 3 and 4, and place the result in register 4; and
4) "Store" the contents of register 4 into the memory at the address stored in register 1 incremented by two.  In this example, the contents of register 3 are no longer needed after the add operation of step 3. Also, register 1 is no longer needed after its contents have been used to address memory in step 4. To remove the unneeded registers from the process context, as shown in fragment 701, NOP instructions are inserted after the "add" and "store" instructions. The NOP instructions specify address operands of register 3 and register 1, respectively. In a more optimizing compiler, the NOP instructions may be combined into a single NOP instruction after the "store" instruction. As shown in fragment 702, the combined NOP instruction specifies both register addresses and handles the resetting of the dirty bits for registers 3 and 1 simultaneously.
In a more conservative embodiment, the compiler may utilize only those NOP instructions already present in the code, e.g., to align code, balance instruction issue timing, etc. Registers are specified for abandonment based on the availability of NOP instructions in the code. Because there may be insufficient NOP instructions to account for every register that is a candidate for abandonment, and because NOP instructions may not be immediately available in the instruction stream, it is possible that some candidate registers will be included in a context switch. Thus, the conservative approach is less efficient at reducing context switch times than an embodiment that adds further NOP instructions. However, by foregoing the addition of NOP instructions into the instruction stream, code density is preserved. The number of required instruction fetches is reduced, saving fetch time and processor cycles, as well as space in the instruction cache (if implemented) and main memory.
Figure 8 is a flow diagram of a compiler process for setting fields in NOP instructions to abandon unneeded registers (or other instructions with available fields for such a purpose) in accordance with an embodiment of the invention. The steps described herein may be incorporated into any compiler process.
In step 800 of Figure 8, the compiler determines whether the current instruction under consideration has source registers. If no source registers are specified, the process is complete for the current instruction. If source registers are specified, the compiler determines in step 801 whether the contents of the specified source register(s) will be needed after the instruction is executed. If the contents of the source register(s) will be needed, no registers are to be abandoned and the process is complete. However, if, in step 801, the contents of one or more source registers will not be needed, then, in step 802, the compiler determines whether any NOP instructions are available for specifying abandoned registers. If, in step 802, a NOP instruction is available, the process continues at step 804, in which the unneeded source registers are identified in operands of the NOP instruction, completing the process for the current instruction. If, in step 802, no NOP instruction is available, the compiler may either append a NOP instruction to the code in step 803 and continue to step 804 (option A), or the process can complete for the current instruction (option B).
Options A and B represent different embodiments that may be implemented. Whereas option A insures that all unneeded registers are abandoned for best-case context switching, option B simplifies the compiler (no code insertion) and insures that code density is maintained.
As stated above, the compiler may perform an additional optimizing step
(not shown) of checking for consecutive or near consecutive instructions that require a NOP. If there are sufficient register fields available in the NOP instruction, the unneeded registers of the identified instructions are combined into a single NOP instruction occurring after the latest occurring instruction requiring the NOP. This optimization step provides for the removal of unneeded registers from a process context while minimizing the number of NOP instructions used.
Encoded Instructions With "Abandon Register" Bit-Fields
In another embodiment of the invention, instructions that specify registers as source operands are themselves utilized to identify those registers for abandonment (i.e., for removal from the current process context). Bit-fields for indicating abandoned registers are encoded or embedded within the instruction, so that the addition and /or utilization of NOP instructions for this purpose is unnecessary. Optimized context switching is achieved without sacrificing code density or processor cycles.
In those instructions that specify registers as source operands, a bit-field for one or more source operands is added, for example, to the opcode portion of the instruction. The added bit-field is referred to as the "abandon register" bit, and, when set, indicates that the corresponding source register is to be removed from the current process context after the instruction is executed. The processor is configured to decode the added bit-field or bit-fields when decoding the instruction, and, if indicated by the bit-field value, to remove the corresponding register from the process context as part of instruction execution( e.g., by resetting a respective dirty bit in the register file).
Bit diagram 501 of Figure 5 illustrates an example format for encoding information about unneeded registers into an instruction. This example shows two source operands (SRC 1 and SRC 2) and one destination operand (DEST) specifying registers in a processor's register file. The opcode portion of the instruction contains the standard opcode followed by the single-bit bit-fields Al and A2, representing the "abandon register" bits for SRC 1 and SRC 2, respectively. A similar format can be used for any instruction having one or more register-based source operands.
Figure 9 is a flow diagram illustrating a method for a processor to process encoded instructions in accordance with an embodiment of the invention. It will be obvious to one of ordinary skill in the art that certain steps in the following method may be performed in parallel with other steps, or in a sequence differing from that shown, without departing from the scope of the invention.
In step 900 of Figure 9, an instruction is fetched from main memory (or the instruction cache). The current instruction is decoded by the processor's control unit in step 901, and, in step 902, the operation specified by the decoded opcode is carried out by the processor. In step 903, the processor determines whether the current instruction specifies a register in a destination operand. If a destination register is specified, then, in step 904, the dirty bit of the destination register is set before proceeding to step 905. If a register is not specified in a destination operand, the method bypasses step 904 and continues in step 905.
At step 905, the processor determines whether any registers are specified in source operands of the current instruction. If no source registers are specified, the method continues at step 909. If source registers are specified in step 905, the steps shown in block 906 (i.e., steps 907-908) are executed for each source operand that has a corresponding "abandon register" bit in the instruction format. In step 907, the processor determines the value of the corresponding "abandon register" bit for the given source operand. If the "abandon register" bit is set, the dirty bit of the specified source register (i.e., the register whose address is specified in the source operand) is reset in step 908 and the method continues, returning to step 907 if another source operand remains to be processed, or proceeding in step 909 if no other source operands remain. If the "abandon register" bit is not set, the processor bypasses step 908, and performs no action with respect to the source register's dirty bit. In step 909, the program counter register is loaded with the next instruction address, and the method returns to step 900, where execution of the next instruction begins.
The flow chart of Figure 9 illustrates how the steps for abandoning a register are embedded within the last instruction that uses that register. As the steps for abandoning a register comprise the resetting of dirty bits, and do not affect other operations carried out in connection with the instruction being executed, the "abandon register" steps may be performed in parallel with other operations of the given instruction. As a result, program execution experiences little to no degradation in performance, and context switch times are reduced by the amount of time otherwise needed to store and load the abandoned registers.
Figure 10 illustrates how code fragment 700 is altered by the use of encoded instructions in accordance with an embodiment of the invention. Due to the encoded nature of the instructions, no additional instructions are added in the altered code fragment 1000. Rather, in each instruction that has one or more registers as source operands, a compiler-programmable "abandon register" bit is provided for one or more of the source operands. Optionally, registers that provide addressing data (e.g., pointers) for "load" and "store" instruction variations may also be provided with an "abandon register" bit (as shown in fragment 1000).  In fragment 1000, "abandon register" bits are represented by a "0" or "1" in brackets at the end of the assembly instruction. The order of the values within the brackets indicates which source operand is to be abandoned, though it will be apparent that other embodiments may use other orderings or representations. In the corresponding machine code instruction, the "abandon register" value in the brackets is represented by a bit value in the "abandon register" bit-field(s) of the instruction, as shown by bit-fields Al and A2 in bit diagram 501, for example.
In the first and second "load" instructions of fragment 1000, register 1 ($1) is the optional source operand. However, in this example, register 1 is not to be abandoned until the "store" instruction executes. Therefore, the "load" instructions are appended with "[0]", to indicate that the "abandon register" bit in the machine code instruction is not set (defaults to "0"). In the "add" instruction, registers 3 and 4 ($3 and $4) represent the first and second source operands, respectively. In this example, it is desired to abandon register 3, but maintain register 4. Therefore, the "add" instruction is appended with "[1,0]" to set the "abandon register" bit-field Al while leaving bit-field A2 at its default "0" value. In the "store" instruction of fragment 1000, register 4 ($4) represents a first source operand as the data source of the store operation, whereas register 1 ($1) represents an optional second source operand providing addressing data. In this example, it is desired to maintain register 4, but abandon register 1 (e.g., the code no longer needs to access the memory block pointed to by register 1). The store instruction is thus appended with "[0,1]", to leave bit-field Al at its default value and to set bit-field A2. Other assembly code formats may also be used to designate whether the "abandon register" bit-fields are set for a given instruction.  Figure 11 is a flow diagram of a compiler process for handling "abandon register" bits in accordance with an embodiment of the invention. In step 1100 of Figure 11, the compiler process determines whether the current instruction includes source registers (i.e., registers specified in source operands). If the current instruction does not specify source registers, the process is completed for the current instruction. If the current instruction does specify source registers, the first source register is identified for evaluation in step 1101. In step 1102, for the current source register, the compiler process determines whether the data in the source register is still useful (i.e., needed for future processing). If the data is no longer useful, in step 1104, the compiler process sets the corresponding "abandon register" bit in the instruction before proceeding to step 1105. If, in step 1102, the data is still useful, in step 1103, the compiler process resets the "corresponding "abandon register" bit, or leaves the "abandon register" bit at its default value, and proceeds to step 1105. In step 1105, if the current source register is the last or only source register, the process is completed. Otherwise, the compiler process identifies the next source register in step 1106 and returns to step 1102.
The steps of Figure 11 may be applied as each instruction is generated during compilation of a higher-level program, or the compiler may perform these steps on the program instructions in an optimizing sweep after all instructions have been generated. As an optimizing sweep, the compiler is able to use knowledge of later occurring program instructions in determining the usefulness of register values (e.g., if no later occurring instructions access a register value, that register may be abandoned). Further, as a compiler optimization process, the process of Figure 11 may be carried out as a modular process separate from normal compilation. Thus, even if a program is written directly in assembly or machine code form by a programmer without attention to abandoning registers, the optimization process of Figure 11 may still be applied to the program code to set the "abandon register" bit-field values for improved context switching. This is also true for the process of Figure 8 in connection with the "NOP" embodiment previously described.
Thus, a method and apparatus to improve context switch times in a computing system have been described in conjunction with one or more specific embodiments. The invention is defined by the claims and their full scope of equivalents.