CROSS REFERENCE TO RELATED APPLICATIONSThis application is a continuation of U.S. patent application Ser. No. 11/261,655, filed on Oct. 31, 2005, now allowed, titled “Processor Core and Method for Managing Program Counter Redirection in an Out-of-Order Processor Pipeline”, which is incorporated by reference herein in its entirety.
FIELD OF THE INVENTIONThe present invention relates generally to microprocessors. More particularly, it relates to a microprocessor having an out-of-order processor pipeline.
BACKGROUND OF THE INVENTIONProcessor pipelining is a known technique used to make microprocessors operate more quickly. This technique enables a microprocessor to work on different steps of an instruction at the same time and thereby take advantage of parallelism that exists among the steps needed to execute an instruction. As a result, a microprocessor can execute more instructions in a shorter period of time.
Many microprocessors, especially those used in the embedded market, are relatively simple in-order machines. As a result, they are subject to data hazard stalls. More complex microprocessors have out-of-order pipelines, which allow execution of instructions to be scheduled around hazards that would stall an in-order processor pipeline.
Speculation is used to resolve branch instructions and predict whether a conditional branch is taken or not taken in an out-of-order machine. When a branch resolution results in a misprediction, all younger instructions in a program stream must be cleared from the pipeline. Conventionally, this is accomplished using an age-based comparison technique across the entire processor pipeline. While this conventional technique works for its intended purpose, it requires maintaining and updating a number of register renaming maps, especially in microprocessors that employ a pipeline having a large number of processing stages.
What is needed is a new technique for clearing an out-of-order pipeline of a microprocessor following a branch misprediction, which overcomes the deficiencies noted above.
BRIEF SUMMARY OF THE INVENTIONThe present invention provides a processor core and a method for managing program counter redirection in an out-of-order processor pipeline. In one embodiment, the pipeline of the processor core includes a front-end instruction fetch portion, a back-end instruction execution portion, and pipeline control logic. Operation of the instruction fetch portion is decoupled from operation of the instruction execution portion. Following detection of a control transfer misprediction, at least one signal generated in the instruction execution portion of the processor pipeline causes pipeline control logic to clear/invalidate instructions residing in the instruction fetch portion and halt the flow of instructions from the instruction fetch portion to the instruction execution portion of the pipeline. Operation of the instruction execution portion continues until the control transfer instruction associated with the misprediction reaches a selected stage of the instruction execution portion of the processor pipeline. When the instruction associated with the misprediction reaches the selected stage, pipeline control logic clears/invalidates instructions residing in the instruction execution portion of the pipeline. The flow of instructions from the instruction fetch portion to the instruction execution portion of the processor pipeline is restarted after pipeline control logic clears/invalidates instructions residing in the instruction execution portion of the processor pipeline.
In one embodiment, an instruction fetch portion of a pipeline of a processor core according to the present invention includes a program counter selector, an instruction buffer, and a branch predictor. The program counter selector selects addresses/program counter values used to fetch instructions from memory. If a fetched instruction is a control transfer instruction such as, for example, a branch instruction or a jump instruction, the branch predictor predicts whether a conditional branch associated with the instruction is taken or not taken. The instruction buffer stores fetched instructions until they are selected for execution by an instruction execution portion of the pipeline.
In one embodiment, an instruction execution portion of a pipeline of a processor core according to the present invention includes an instruction decoder, an instruction identification generator, a buffer, an arithmetic logic unit, and a mispredict instruction identification checker. The instruction decoder decodes instructions read from an instruction buffer. The instruction identification generator associates or assigns instruction identification tags to instructions. The instruction identification tags are used by the mispredict instruction identification checker to determine a program order of a first control transfer instruction relative to a second control transfer instruction. The buffer of the instruction execution portion of the pipeline stores decoded instructions until they are executed by the arithmetic logic unit. If a decoded instruction is a control transfer instruction such as, for example, a branch instruction or a jump instruction, the arithmetic logic unit determines whether a branch prediction made by a branch predictor residing in an instruction fetch portion of the pipeline is correct. If the prediction is incorrect, the mispredict instruction identification checker compares the instruction's identification tag to an identification tag and/or a valid bit stored in a register of the mispredict instruction identification checker to determine if the instruction is permitted to redirect the instruction fetch portion of the pipeline.
In one embodiment of the present invention, the processor core is capable of executing instructions from multiple program threads.
In one embodiment of the present invention, the processor core is capable of executing instructions having different bit-widths (e.g., instructions having 16-bits, 32-bits, et cetera).
In one embodiment of the present invention, the processor core includes a pipeline that includes multiple parallel processing paths.
Further embodiments, features, and advantages of the present invention, as well as the structure and operation of the various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS/FIGURESThe accompanying drawings, which are incorporated herein and form a part of the specification illustrate the present invention and together with the description further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention. In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit of the reference number indicates a drawing in which the reference number first appears.
FIG. 1 is a diagram of a processor core according to a first embodiment of the present invention.
FIG. 2 is a diagram of a mispredict instruction identification checker according to an embodiment of the present invention.
FIG. 3 is a diagram of an instruction fetch portion of a processor pipeline according to an embodiment of the present invention.
FIG. 4 is a diagram of an instruction execution portion of a processor pipeline according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the association of instruction identification tags with instructions according to an embodiment of the present invention.
FIG. 6 is a flowchart of a method for clearing/invalidating a processor pipeline according to an embodiment of the present invention.
FIGS. 7A and 7B are a flowchart of a method for controlling program counter redirection in an out-of-order processor pipeline according to an embodiment of the present invention.
FIG. 8 is a diagram of a processor core according to a second embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTIONThe present invention provides a processor core and a method for managing program counter redirection in an out-of-order processor pipeline. In the detailed description of the invention that follows, references to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
FIG. 1 is a diagram of amicroprocessor100 according to an embodiment of the present invention.Microprocessor100 includes aprocessor core102,instruction memory104, and aregister file106.Processor core102 has a pipeline that includes aninstruction fetch portion108 and aninstruction execution portion110.
As shown inFIG. 1,instruction fetch portion108 ofprocessor core102 includes a program counter (PC)selector112, anoptional recoder114, and aninstruction buffer116.Instruction fetch portion108 also includes abranch predictor118 and a portion ofpipeline control logic120.
Program counter selector112 selects an address or program counter value to be used to fetch a program instruction from memory. In one embodiment,PC selector112 can select a sequentially incremented program counter value, a redirect program counter value, or a program counter value for a new program thread.PC selector112 receives one or more signals generated bypipeline control logic120 and/or a mispredictinstruction identification checker134, which causePC selector112 to select a redirect program counter value following a branch misprediction associated with a control transfer instruction.
Processor core102 is preferably capable of executing both a standard width instruction (e.g., a 32-bit instruction) and a compressed-format width instruction (e.g., a 16-bit instruction). Accordingly, in one embodiment,processor core102 includesoptional recoder114. If a compressed-format instruction is fetched frominstruction memory104, it is recoded byrecoder114 to a format width that can be decoded by decoder/renamer122 and executed byarithmetic logic unit126. In one embodiment, both standard width instructions and compressed-format width instructions are recoded byrecoder114 to an instruction width having more bits than a standard width instruction. Instructions are passed fromoptional recoder114 toinstruction buffer116.
Instruction buffer116 is capable of holding multiple instructions. In one embodiment, in whichprocessor core102 implements multithreading, instructions from different program threads are stored, for example, in separate portions ofinstruction buffer116. Multithreading refers to an ability of an operating system to execute different parts of a program, called threads, simultaneously. In another embodiment, in whichprocessor core102 implements multithreading, instructions from different program threads are stored in separate instruction buffers, for example, one instruction buffer for each program thread.
In instances where a control transfer instruction such as, for example, a branch instruction or a jump instruction, is fetched frominstruction memory104,branch predictor118 predicts whether a conditional branch associated with the control transfer instruction is taken or not taken. Any known branch prediction algorithm can be used. In one embodiment,branch predictor118 includes a branch prediction table that is used in predicting whether a conditional branch is taken or not taken.
Instruction fetchportion108 ofprocessor core102 preferably fetches multiple instructions per fetch cycle.
Instruction execution portion110 ofprocessor core102 includes a decoder/renamer122, abuffer124, an arithmetic logic unit (ALU)126, and acompletion buffer128. Ascheduler130 dynamically schedules instructions for execution byinstruction execution portion110 ofprocessor core102. Also included ininstruction execution portion110 are aninstruction identification generator132, a mispredictinstruction identification checker134 that includes aregister136, and a portion ofpipeline control logic120.
Instructions are read frominstruction buffer116 and decoded by decoder/renamer122. Decoder/renamer122 performs the functions of decoding instructions and updating a register renaming map (not shown). During the decoding/renaming process, each instruction is associated with/assigned an instruction identification tag. The instruction identification tags are generated byinstruction identification generator132.
In one embodiment, the instruction identification tags are sequentially generated multi-bit values. The number of bits that are generated is dependent on how many instructions are executed simultaneously withininstruction execution portion110 ofprocessor core102. In one embodiment, in whichprocessor core102 performs multithreading, instruction identification tags are generated and assigned on a per thread basis.
Instructions are read frombuffer124 and executed byarithmetic logic unit126 in accordance with a schedule determined byscheduler130.Scheduler130 schedules instructions for execution once their operands are ready and preferably in accordance with their age. Results inarithmetic logic unit126 are written tocompletion buffer128 and stored until instructions graduate and their results are written to registerfile106. The register renaming map (not shown) is updated when an instruction graduates. As will be understood by persons skilled in the relevant arts given the description herein, the present invention eliminates the need to save multiple renaming map states, which are required by conventional out-of-order processors for every renamed instruction.
During execution of a control transfer instruction such as, for example, a branch instruction or a jump instruction,arithmetic logic unit126 determines whether a branch prediction made bybranch predictor118 is correct. If the prediction is incorrect, mispredictinstruction identification checker134 compares the instruction's identification tag to an identification tag and/or a valid bit stored inregister136 of mispredictinstruction identification checker134 to determine if the instruction is permitted to redirect instruction fetchportion108 ofprocessor core102. As described in more detail below, if the result of the comparison indicates that the control transfer instruction was issued out-of-program-order relative to a second control transfer instruction that mispredicted, and that is awaiting graduation, then mispredictinstruction identification checker134 will enable the control transfer instruction currently being considered to redirect instruction fetching.
Instruction memory104 is any memory accessible toprocessor core102 such as, for example, an instruction cache, a scratch pad, a loop buffer, et cetera. In one embodiment,memory104 includes multiple memories and/or multiple types of memories.
Register file106 includes a plurality of general purpose registers (not shown), which are visible to a programmer.
Operation of instruction fetchportion108 of the pipeline ofprocessor core102 is decoupled from operation ofinstruction execution portion110. Following detection of a conditional branch/control transfer misprediction, at least one signal generated ininstruction execution portion110 causespipeline control logic120 to clear/invalidate instructions residing in instruction fetchportion108. Additionally, the flow of instructions from instruction fetchportion108 toinstruction execution portion110 of the pipeline is halted. Operation ofinstruction execution portion110 continues until a control transfer instruction associated with a misprediction reaches a selected stage ofinstruction execution portion110 of the processor pipeline. When the instruction associated with the misprediction reaches the selected stage,pipeline control logic120 clears/invalidates instructions residing ininstruction execution portion110. The flow of instructions from instruction fetchportion108 toinstruction execution portion110 is enabled/restarted afterpipeline control logic120 clears/invalidates instructions residing ininstruction execution portion110 of the pipeline.
In one embodiment, in whichprocessor core102 performs multithreading,pipeline control logic120 only clears/invalidates instructions belonging to the program thread associated with the misprediction. Instructions are cleared/invalidated, for example, by changing a valid bit or tag associated with each instruction to indicate that the instructions are no longer valid instructions. Invalid instruction are prevented from writing their results to registerfile106.
FIG. 2 is a more detailed diagram of mispredictinstruction identification checker134. As shown inFIG. 2, mispredictinstruction identification checker134 includes comparelogic200 and aregister136 that includes an instructionidentification tag portion202 and avalid portion204. In one embodiment, in whichprocessor core102 supports multithreading, aregister136 is maintained for each program thread.
Mispredictinstruction identification checker134 is used to determine whether a current control transfer instruction being executed byarithmetic logic unit126 is permitted to redirect instruction fetchportion108 of the pipeline ofprocessor core102. In one embodiment, following detection of a conditional branch misprediction associated with the current control transfer instruction, comparelogic200 of mispredictinstruction identification checker134 compares the instruction identification tag of the current control transfer instruction to values residing inregister136 of mispredictinstruction identification checker134. If thevalid portion204 ofregister136 is set (e.g., contains a valid bit equal to one), register136 contains the instruction identification tag of an earlier executed control transfer instruction that was involved in a branch misprediction and that has not yet graduated. In this case, if the current control transfer instruction is determined to precede the earlier control transfer instruction in program order, based on the instruction identification tag comparison, register136 is updated to contain the instruction identification tag of the current control transfer instruction and the current control transfer instruction is permitted to redirect instruction fetching. If however the earlier control transfer instruction is determined to precede the current control transfer instruction in program order, register136 is not updated to contain the instruction identification tag of the current control transfer instruction, and the current control transfer instruction does not redirect instruction fetching. If thevalid portion204 ofregister136 is reset (e.g., contains a valid bit equal to zero), the instruction identification tag of the current control transfer instruction is stored inregister136 and the current control transfer instruction is permitted to redirect instruction fetchportion108.
When a control transfer instruction graduates and its instruction identification tag is stored inregister136, thevalid portion204 ofregister136 is reset so that a control transfer instruction occurring later in program order will be permitted to redirect instruction fetching following detection of a branch misprediction. In one embodiment, if an instruction identification tag of a graduating control transfer instruction matches the identification tag held inregister136, a valid bit inregister136 is reset/cleared.
Generally speaking, when a conditional branch misprediction associated with a control transfer instruction is detected byarithmetic logic unit126, the control transfer instruction should be permitted to redirect instruction fetching. This is not the case, however, if the control transfer instruction (e.g., instruction BNE-2 shown inFIG. 5) follows in program order another control transfer instruction (e.g., instruction BNE-1 shown inFIG. 5) that has redirected instruction fetching but that has not yet graduated. Assuming that both control transfer instructions (BNE-1 and BNE-2) belong to a single program thread, redirection by the second control transfer instruction (BNE-2) is not required and should be suppressed.
The operation ofprocessor core102 will now be described in more detail with reference toFIGS. 3-5. As will become apparent to persons skilled in the relevant art(s) given the description herein, the present invention permits an early restart of instruction fetch operations following a branch misprediction. The present invention also eliminates any need for an explicit age-based comparison of instructions across the entire pipeline of a microprocessor and the need for maintaining multiple renaming map states.
FIG. 3 is a diagram illustrating one embodiment of instruction fetchportion108 ofprocessor core102. In the embodiment shown inFIG. 3, instruction fetchportion108 includes five pipeline stages. These five pipeline stages are illustrative and not intended to limit the present invention. In other embodiments, instruction fetchportion108 can have more or less than 5 pipeline stages. The number of pipeline stages that are implemented in any embodiment of the present invention is a design choice.
As shown inFIG. 3, the five pipeline stages of instruction fetchportion108 are stage302 (select program counter), stage304 (check instruction tags), stage306 (fetch instruction), stage308 (recode instruction if required), and stage310 (write instruction to instruction buffer).
Instage302,PC selector112 selects amongst a variety of program counter values to be used to fetch an instruction frominstruction memory104. In one embodiment, the program counter value selected can be the program counter value of a new program thread, the next sequential program counter value for an existing program thread, or a redirect program counter value associated with a branch instruction or a jump instruction.
Instage304, the instruction tags associated with an instruction to be fetched frominstruction memory104 are checked. In one embodiment, the instruction tags contain precode bits for each instruction indicating instruction type. If these precode bits indicate that an instruction is a control transfer instruction, a branch history table is accessed and used to determine whether the control transfer instruction is likely to branch or likely not to branch.
Instage306, one or more instructions are fetched frominstruction memory104. In one embodiment, if a fetched control transfer instruction is predicted as not likely to branch, computation of a conditional branch target address for the instruction is started duringstage306.
Instage308, any compressed-format instructions are recoded into a format that can be decoded and executed byinstruction execution portion110 ofprocessor core102. For example, in one embodiment in whichprocessor core102 executes both 16-bit instructions and 32-bit instructions, any 16-bit compressed-format instructions are recoded byrecoder114 to form instructions having 32 bits. In another embodiment,recoder114 recodes both 16-bit instructions and 32-bit instructions to a format having more than 32 bits.
Instage310, instructions are written toinstruction buffer116. In one multithreading embodiment,processor core102 includes one instruction buffer for each program thread. In this embodiment, instructions associated with a particular program thread are written into an instruction buffer reserved for the particular program thread. In one embodiment,stage310 can be bypassed and instructions can be dispatched directly to decoder/renamer122.
FIG. 4 is a diagram illustrating one embodiment ofinstruction execution portion110 ofprocessor core102. In the embodiment shown inFIG. 4,instruction execution portion110 includes five pipeline stages. These five pipeline stages are illustrative and not intended to limit the present invention. Other embodiments have more or less than five pipeline stages.
As shown inFIG. 4, the five pipeline stages ofinstruction execution portion110 are stage402 (read from instruction buffer), stage404 (decode instruction), stage406 (execute instruction), stage408 (write to completion buffer), and stage410 (write to register file).
Instage402, instructions are read frominstruction buffer116. In one embodiment that uses delay slots following a branch instruction, a branch instruction and its delay slot are always read together instage402. As described herein, following resolution of a branch misprediction, the ability to read instructions frominstruction buffer116 may be temporarily halted until selected instructions residing withininstruction execution portion110 of the pipeline are cleared/invalidated bypipeline control logic120.
Instage404, instructions are decoded. In parallel with decoding, register renaming map(s) are updated and used to determine whether required source operands are available, for example, inregister file106 and/orcompletion buffer128. A register renaming map is a structure that holds the mapping information between programmer visible architectural registers and internal physical registers. Register renaming map(s) indicate whether data is available and where data is available. In one embodiment, the register renaming map(s) also include an active bit that indicates whether a latest producer, if there is one, of a corresponding general purpose register has been issued into the pipeline or not. In one embodiment, in whichprocessor core102 includes accumulation registers, a separate renaming map is maintained for the accumulation registers. This renaming map is similar to the renaming map maintained for general purpose registers.
Instructions instage404 receive a completion buffer identification tag and an instruction identification tag. The completion buffer identification tag determines the location incompletion buffer128 wherearithmetic logic unit126 can write calculated results for an instruction. In one embodiment, each instruction identification tag is a thread-specific sequentially generated value that uniquely determines the program order of instructions residing ininstruction execution portion110 ofprocessor core102. At the end ofstage404, decoded instructions are placed inbuffer124.Scheduler130 then selects instructions residing inbuffer124 for execution byarithmetic logic unit126.
Instage406, instructions are executed byarithmetic logic unit126 and control transfer instructions such as, for example, branch instructions and jump instructions are resolved. For control transfer instructions, resolved paths are compared against predicted paths. If a misprediction has occurred, and redirection is required, the correct address is provided to instruction fetchportion108 ofprocessor core102, the branch history table is updated, and instruction fetchportion108 is redirected to start fetching instructions from the redirect address (program counter value). In one embodiment, selected instructions such as, for example, floating point instructions are processed by a coprocessor (not shown) coupled toarithmetic logic unit126.
In one embodiment of the present invention, an adder in thearithmetic logic unit126 is used to compute both a target address and a fall through address.Arithmetic logic unit126 compares the branch outcome and determines the direction of the branch, which is compared to the predicted direction. If there is a misprediction, theinstruction execution portion110 signals instruction fetchportion108 of the misprediction and provides instruction fetchportion108 with the correct prediction. In embodiments of the present invention, depending on the type of control transfer instruction involved in a misprediction, instruction fetchportion108 may either immediately restart fetching, stall untilpipeline control logic120 invalidates selected instructions in the pipeline once the control transfer instruction involved in the misprediction reaches a selected stage in the pipeline (e.g., upon instruction graduation), or instruction fetchportion108 may stall and wait for a restart signal frominstruction execution portion110. In each case, the control transfer instruction involved in the misprediction is preferably marked mispredicted, and it continues to flow through the pipeline eventually carrying this status tocompletion buffer128.
Because branch instructions can be issued out-of-order ininstruction execution portion110, mispredictinstruction identification checker134 keeps track of the instruction identification tag of the last redirected branch. If a new resolution results in a mispredict, the identification tag of the new branch instruction is compared against the identification tag of the last redirected branch that has not yet graduated. If the program order of the new branch is prior to the previous branch, the instruction fetchportion108 is redirected according to the new branch instruction. Register indirect jumps, which have been predicted with a return stack are also compared against the real register value instage406. If there is a mispredict, the correct target is sent to instruction fetchportion108.
The results of branch decisions are prioritized and communicated to instruction fetchportion108 of the pipeline duringstage406. In addition to the branch redirects, there can be redirects from graduating instructions. These redirects are prioritized over the branch redirects and a final redirect is forwarded toPC selector112 of instruction fetchportion108 viapipeline control logic120. A valid redirect will cause instruction fetchportion108 to clear/invalidate its current fetch stream of all instructions in instruction fetchportion108 associated with the redirect. A new fetch is started with the newly received target address.
Instage408, results generated byarithmetic logic unit126 and/or a coprocessor are written tocompletion buffer128. As noted above, instructions that are accepted intoinstruction execution portion110 ofprocessor core102 are assigned a completion buffer identification number.
In one embodiment, the assignment of completion buffer identification numbers is done using a free list. The free list contains as many entries as the number of entries in each completion buffer. In one embodiment, up to two entries can be read from and written to the free list, for example, every cycle during normal operation. Following a branch mispredict, however, all the entries may be released and data written into the free list. Additionally, in multithreading embodiments, particular program threads can be released and data written into the free list without effecting entries for other program threads. Because of this requirement, the free list preferably is not implemented as a simple stack.
In one embodiment, the free list is implemented using a bitmap to represent the entries incompletion buffer128. A first bit of the bitmap indicates whether the completion buffer entry is either available (e.g., if the bit has a value of one) or unavailable (e.g., if the bit has a value of zero). In a multithreading embodiment, if completion buffer identification numbers are assigned regardless of a thread content to which an instruction belongs, instruction identification tags are preferably assigned on a per thread basis. In one embodiment, an instruction identification tag is assigned to all incoming instructions regardless of whether the instructions write to a destination register.
Assigned completion buffer identification numbers are written into a graduation first-in, first-out buffer at a location identified by a write pointer and/or a write pointer plus an offset value. Completion buffer completion bits associated with newly renamed instructions are reset/cleared to indicate incomplete results. As instructions complete execution, their corresponding completion buffer completion bits are set, thereby enabling the instructions to graduate and release their associated completion buffer identification numbers.Pipeline control logic120 ensures that one thread content does not consume more than its share of completion buffer entries.
In one embodiment of the present invention, separate structures are used to hold the program counter and other attributes of instructions that need not be piped along with the instruction at every stage of the processor pipeline. One such structure is a program counter completion buffer, which can be implemented as a field incompletion buffer128 and managed using read and write pointers.
As noted herein, register renaming is done for destination registers to remove output dependencies and to ensure there is a single producer of a given register inprocessor core102 at any given time. The source registers are renamed so that data is obtained from a producer at the earliest opportunity instead of waiting for the processor core's architectural state to be updated. This also aids in reducing dependency check complexity in any coprocessor coupled, for example, toarithmetic logic unit126.
Instage410, results fromcompletion buffer128 are written to register file106 as instructions graduate and register renaming map(s) are updated. Each instruction preferably graduates according to program order.
Control transfer instructions such as, for example, branch instructions and jump instructions require special processing in order to allow these instructions to graduate in program order. As noted herein, control transfer instructions are preferably resolved duringstage406 of the processor pipeline. However, some control transfer instructions such as, for example, some branch instructions cannot be resolved untilstage410 because of a need for an architectural condition evaluation. Coprocessor conditional branch instructions, for example, are evaluated at graduation. This is due to the fact thatarithmetic logic unit126 may not have access to floating point generated information. In case of a misprediction, the redirection program counter is read from a completion buffer written to by the floating point coprocessor. This program counter is then sent to the instruction fetchportion108 on a mispredict.
As a part of instruction graduation, information pertaining to each instruction that is stored incompletion buffer128 and/or a graduation first-in/first-out buffer is read. The read information includes both data and status information. The data information corresponds, for example, to results calculated byarithmetic logic unit126 and/or a coprocessor coupled toarithmetic logic unit126. The status information includes, for example, completion status, exception information, branch resolution information, et cetera. Data read fromcompletion buffer128 is committed to an architecturally visible general purpose register (register file106) if no flush and redirect due to an exception or other special case is required. If a redirect of the program counter is required,pipeline control logic120 generates signals needed to accomplish the redirect.
In one embodiment, when a control transfer instruction reaches the head of a first-in first-out buffer, all instructions that belong to that program thread are invalidated ininstruction execution portion110 of the pipeline. The fetch instruction portion does not have to invalidate instructions that belong to the program thread at this point in time as any instruction that needed to be invalidated would have been invalidated when the misprediction was first identified.
In one embodiment of the present invention, branch likely instructions are handled differently due to their particular behavior. Instruction fetchportion108 fetches an issued instruction according to the architectural specification. However, if there is a mispredict, instruction fetchportion108 is required to replay the branch instruction sequence with the correct prediction. The branch is marked mispredicted, and it flows through the pipeline eventually carrying this status tocompletion buffer128. When the branch instruction eventually reaches the head of a first-in first-out buffer, all instructions that belong to that program thread are invalidated in theinstruction execution portion110 of the pipeline, and redirection information (e.g., a replay address from the program counter completion buffer) is sent to instruction fetchportion108. This causes a replay of the branch likely instruction using a corrected prediction.
FIG. 5 illustrates two tables502 and504 that illustrate how instruction identification tags can be assigned to instructions in a way that allows the instruction identification tags to be used to identify a program order of one instruction relative to another instruction. Table502 illustrates seven instructions, in program order, fetched fromaddresses1 through7. Table504 shows a possible execution order for the same seven instructions along with assigned instruction identification tags that identify the instructions' program order.
As can be seen from a review of tables502 and504, there are two branch instructions. Branch-if-not-equal instruction1 (BNE-1) and branch-if-not-equal instruction2 (BNE-2). Due to out-of-order processing byprocessor core102, two issue scenarios are possible. The two branch instructions can be issued either in-order or out-of-order. As described herein, if the branch instructions are issued in-order and both instructions are assigned incorrect branch predictions bybranch predictor118, the second branch instruction (BNE-2) should be suppressed from redirecting instruction fetchportion108 because this branch instruction was issued incorrectly on a speculative basis and the first branch instruction (BNE-1) will have already provided instruction fetchportion108 of the pipeline with the correct redirection.
Two method embodiments of the present invention will now be described with regards toFIGS. 6,7A, and7B.
FIG. 6 depicts a flowchart of amethod600 for clearing a processor pipeline according to an embodiment of the present invention. As shown inFIG. 6,method600 includes fivesteps602,604,606,608, and610.Method600 is capable of being implemented, for example, usingprocessor core102 described above, but it is not limited to being implemented inprocessor core102.
Instep602, a conditional program counter redirect misprediction is detected during execution of a control transfer instruction. The misprediction is detected in an instruction execution portion of a processor pipeline such as, for example,instruction execution portion110 ofprocessor core102 described above.
Instep604, movement of instructions from an instruction fetch portion of the processor pipeline to the instruction execution portion of the processor pipeline is halted in response to the conditional program counter redirect misprediction. Operation of the instruction fetch portion of the pipeline is decoupled from operation of the instruction execution portion of the pipeline. This allows, for example, the instruction execution portion of the pipeline to continue operating while the instruction fetch portion is halted.
Instep606, instructions residing within the instruction fetch portion of the processor pipeline are invalidated. In one embodiment, all instructions residing in the instruction fetch portion are invalidated. In another embodiment, only instructions belonging to the same program thread as the control transfer instruction associated with the misprediction are invalidated. Instructions can be invalidated, for example, by changing a validity bit associated with each instruction to show the instruction is no longer a valid instruction. Instruction fetching is preferably restarted at a redirected address as soon as instructions residing within the instruction fetch portion of the processor pipeline have been invalidated.
Instep608, once the control transfer instruction associated with the misprediction has reached a selected stage within the instruction execution portion of the processor pipeline, instructions residing within the instruction execution portion of the processor pipeline are invalidated. In one embodiment, all instructions residing in the instruction execution portion are invalidated. In another embodiment, only instructions belonging to the same program thread as the control transfer instruction associated with the misprediction are invalidated. In one embodiment, the selected stage is a stage prior to instruction graduation.
Instep610, movement of instructions from the instruction fetch portion of the processor pipeline to the instruction execution portion of the processor pipeline is restarted.
FIGS. 7A and 7B depict a flowchart of amethod700 for controlling program counter redirection in an out-of-order processor pipeline according to an embodiment of the present invention. As shown inFIGS. 7A and 7B,method700 includes sixsteps702,704,706,708,710, and712.Method700 is capable of being implemented, for example, usingprocessor core102 described above, but it is not limited to being implemented inprocessor core102.
Instep702, a first instruction identification tag is associated with a first control transfer instruction and a second instruction identification tag is associated with a second control transfer instruction. These tags are associated at a first stage in an out-of-order processor pipeline. The first instruction identification tag and the second instruction identification tag identify a program order of the first control transfer instruction relative to the second control transfer instruction.
Instep704, a first conditional program counter redirect misprediction is detected during execution of the first control transfer instruction. The detection occurs in a second stage of the processor pipeline.
Instep706, the first instruction identification tag is stored in a register of a mispredict instruction identification checker.
Instep708, a second conditional program counter redirect misprediction is detected during execution of the second control transfer instruction. This second misdirection is detected in the second stage of the processor pipeline.
Instep710, the second instruction identification tag is compared to the first instruction identification tag stored in the register of the mispredict instruction identification checker.
Instep712, if the comparison of the second instruction identification tag to the first instruction identification tag indicates that the second control transfer instruction executed out of program order relative to the first control transfer instruction, a program counter redirect enable signal is generated. The program counter redirect enable signal causes an instruction fetch portion of the processor pipeline to fetch instructions according to the second control transfer instruction.
FIG. 8 is a diagram of amicroprocessor800 according to another embodiment of the present invention.Microprocessor800 includes aprocessor core802,instruction memory104, and aregister file106.Processor core802 has a pipeline that includes an instruction fetchportion808 and aninstruction execution portion810.
As shown inFIG. 8, instruction fetchportion808 ofprocessor core802 includes a program counter (PC)selector112, anoptional recoder114, and aninstruction buffer116. Instruction fetchportion808 also includes abranch predictor118 and a portion ofpipeline control logic820.
Instruction execution portion810 ofprocessor core802 includes a decoder/renamer122, twobuffers124aand124b, two arithmetic logic units (ALUs)126aand126b, and twocompletion buffers128aand128b. Ascheduler830 dynamically schedules instructions for execution by one of the two parallel pipelines ofinstruction execution portion810 ofprocessor core802. Also included ininstruction execution portion810 are at least oneinstruction identification generator132, at least one mispredictinstruction identification checker134 that includes at least oneregister136, and a portion ofpipeline control logic820.
Microprocessor800 operates similarly tomicroprocessor100 except thatmicroprocessor800 includes two parallel instruction execution pipelines. These two instruction execution pipelines can be similar, or they can be specialized to execute selected instructions. In one embodiment, the pipeline represented bybuffer124a,arithmetic logic unit126a, and completion buffer128ais used to execute control transfer instructions. This eliminates the need for having more than one mispredictinstruction identification checker134.
While the foregoing is a complete description of exemplary embodiments of the invention, it should be evident that various modifications, alternatives, and equivalents may be made and used. It is also to be appreciated that the detailed description of the present invention provided herein, and not the summary and abstract sections, is intended to be used to interpret the claims. The summary and abstract sections may set forth one or more but not all exemplary embodiments of the present invention as contemplated by the inventors.
For example, in addition to implementations using hardware (e.g., within or coupled to a Central Processing Unit (“CPU”), microprocessor, microcontroller, digital signal processor, processor core, System on Chip (“SOC”), or any other programmable or electronic device), implementations may also be embodied in software (e.g., computer readable code, program code, instructions and/or data disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software. Such software can enable, for example, the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs, databases, and/or circuit (i.e., schematic) capture tools. Such software can be disposed in any known computer usable medium including semiconductor, magnetic disk, optical disk (e.g., CD-ROM, DVD-ROM, etc.) and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium). As such, the software can be transmitted over communication networks including the Internet and intranets.
It is understood that the apparatus and method embodiments described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalence.