16.20.9 Specifying processor pipeline description

To achieve better performance, most modern processors (super-pipelined, superscalar RISC, and VLIW processors) have many functional units on which several instructions can be executed simultaneously. An instruction starts execution if its issue conditions are satisfied. If not, the instruction is stalled until its conditions are satisfied. Such an interlock (pipeline) delay interrupts the fetching of successor instructions (or demands nop instructions, e.g. for some MIPS processors).

There are two major kinds of interlock delays in modern processors. The first one is a data dependence delay determining instruction latency time. The instruction execution is not started until all source data have been evaluated by prior instructions (there are more complex cases when the instruction execution starts even when the data are not available but will be ready in a given time after the instruction execution start). Taking the data dependence delays into account is simple. The data dependence (true, output, and anti-dependence) delay between two instructions is given by a constant. In most cases this approach is adequate. The second kind of interlock delay is a reservation delay. The reservation delay means that two instructions under execution will be in need of shared processor resources, i.e. buses, internal registers, and/or functional units, which are reserved for some time. Taking this kind of delay into account is complex, especially for modern RISC processors.

The task of exploiting more processor parallelism is solved by an instruction scheduler. For a better solution to this problem, the instruction scheduler has to have an adequate description of the processor parallelism (or pipeline description). GCC machine descriptions describe processor parallelism and functional unit reservations for groups of instructions with the aid of regular expressions.

The GCC instruction scheduler uses a pipeline hazard recognizer to figure out whether the processor can issue an instruction on a given simulated processor cycle. The pipeline hazard recognizer is automatically generated from the processor pipeline description. The pipeline hazard recognizer generated from the machine description is based on a deterministic finite state automaton (DFA): the instruction issue is possible if there is a transition from one automaton state to another one. This algorithm is very fast, and furthermore, its speed is not dependent on processor complexity ⁷.

The rest of this section describes the directives that constitute an automaton-based processor pipeline description. The order of these constructions within the machine description file is not important.

The following optional construction describes names of automata generated and used for the pipeline hazards recognition. Sometimes the generated finite state automaton used by the pipeline hazard recognizer is large. If we use more than one automaton and bind functional units to the automata, the total size of the automata is usually less than the size of the single automaton. If there is no such construction, only one finite state automaton is generated.

(define_automaton automata-names)

automata-names is a string giving names of the automata. The names are separated by commas. All the automata should have unique names. The automaton name is used in the constructions define_cpu_unit and define_query_cpu_unit.
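As a minimal sketch, a description that splits its units across two automata might declare them as follows. The names hyp_int and hyp_fp are hypothetical and are not taken from any real GCC port.

```lisp
;; Hypothetical automaton names.  Functional units are bound to them
;; later through define_cpu_unit or define_query_cpu_unit.
(define_automaton "hyp_int, hyp_fp")
```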

Each processor functional unit used in the description of instruction reservations should be described by the following construction.

(define_cpu_unit unit-names [automaton-name])

unit-names is a string giving the names of the functional units separated by commas. Don’t use the name ‘nothing’; it is reserved for other goals.

automaton-name is a string giving the name of the automaton with which the unit is bound. The automaton should be described in the construction define_automaton. You should give automaton-name if there is a defined automaton.

The assignment of units to automata is constrained by the uses of the units in insn reservations. The most important constraint is: if a unit reservation is present on a particular cycle of an alternative for an insn reservation, then some unit from the same automaton must be present on the same cycle for the other alternatives of the insn reservation. The rest of the constraints are mentioned in the description of the subsequent constructions.
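The binding of units to automata and the constraint on alternatives can be illustrated with a small sketch. All automaton, unit, and reservation names here, as well as the values of the "type" attribute, are hypothetical.

```lisp
(define_automaton "hyp_int, hyp_fp")

;; Two integer units bound to hyp_int, one floating point unit bound
;; to hyp_fp.
(define_cpu_unit "hyp_alu0, hyp_alu1" "hyp_int")
(define_cpu_unit "hyp_fpu" "hyp_fp")

;; Satisfies the constraint: on the first cycle both alternatives
;; reserve a unit from hyp_int.
(define_insn_reservation "hyp_add" 1 (eq_attr "type" "add")
                         "hyp_alu0 | hyp_alu1")

;; Would violate the constraint: on the first cycle one alternative
;; reserves a hyp_int unit while the other reserves only a hyp_fp unit.
;; (define_insn_reservation "hyp_move" 1 (eq_attr "type" "move")
;;                          "hyp_alu0 | hyp_fpu")
```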

The following construction describes CPU functional units analogously to define_cpu_unit. The reservation of such units can be queried for an automaton state. The instruction scheduler never queries reservation of functional units for a given automaton state. So as a rule, you don’t need this construction. This construction could be used for future code generation goals (e.g. to generate VLIW insn templates).

(define_query_cpu_unit unit-names [automaton-name])

unit-names is a string giving names of the functional units separated by commas.

automaton-name is a string giving the name of the automaton with which the unit is bound.
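A short sketch of a queryable unit declaration, assuming a hypothetical automaton hyp_vliw and hypothetical slot units:

```lisp
(define_automaton "hyp_vliw")

;; Declared like ordinary units, but their reservation can be queried
;; for an automaton state.
(define_query_cpu_unit "hyp_slot0, hyp_slot1" "hyp_vliw")
```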

The following construction is the major one to describe pipeline characteristics of an instruction.

(define_insn_reservation insn-name default_latency condition regexp)

default_latency is a number giving latency time of the instruction. There is an important difference between the old description and the automaton based pipeline description. The latency time is used for all dependencies when we use the old description. In the automaton based pipeline description, the given latency time is only used for true dependencies. The cost of anti-dependencies is always zero and the cost of output dependencies is the difference between latency times of the producing and consuming insns (if the difference is negative, the cost is considered to be zero). You can always change the default costs for any description by using the target hook TARGET_SCHED_ADJUST_COST (see Adjusting the Instruction Scheduler).
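The default costs described above can be worked through on a small sketch. The units, reservation names, attribute values, and latencies are all hypothetical; the comments only apply the rules stated in the preceding paragraph.

```lisp
(define_cpu_unit "hyp_mem, hyp_alu")

;; Hypothetical insn classes: a load with latency 3, an add with latency 1.
(define_insn_reservation "hyp_load" 3 (eq_attr "type" "load") "hyp_mem")
(define_insn_reservation "hyp_add"  1 (eq_attr "type" "add")  "hyp_alu")

;; Default costs in the automaton based description:
;;   true dependence,   hyp_load -> hyp_add:  3 (latency of hyp_load)
;;   anti-dependence,   any pair:             0
;;   output dependence, hyp_load -> hyp_add:  3 - 1 = 2
;;   output dependence, hyp_add -> hyp_load:  1 - 3 is negative, so 0
```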

insn-name is a string giving the internal name of the insn. The internal names are used in constructions define_bypass and in the automaton description file generated for debugging. The internal name has nothing in common with the names in define_insn. It is a good practice to use insn classes described in the processor manual.

condition defines what RTL insns are described by this construction. You should remember that you will be in trouble if condition for two or more different define_insn_reservation constructions is TRUE for an insn. In this case what reservation will be used for the insn is not defined. Such cases are not checked during generation of the pipeline hazards recognizer because in general recognizing that two conditions may have the same value is quite difficult (especially if the conditions contain symbol_ref). It is also not checked during the pipeline hazard recognizer work because it would slow down the recognizer considerably.

regexp is a string describing the reservation of the cpu’s functional units by the instruction. The reservations are described by a regular expression according to the following syntax:

       regexp = regexp "," oneof
              | oneof

       oneof = oneof "|" allof
             | allof

       allof = allof "+" repeat
             | repeat

       repeat = element "*" number
              | element

       element = cpu_function_unit_name
               | reservation_name
               | result_name
               | "nothing"
               | "(" regexp ")"

‘,’ is used for describing the start of the next cycle in the reservation.
‘|’ is used for describing a reservation described by the first regular expression or a reservation described by the second regular expression or etc.
‘+’ is used for describing a reservation described by the first regular expression and a reservation described by the second regular expression and etc.
‘*’ is used for convenience and simply means a sequence in which the regular expression is repeated number times with cycle advancing (see ‘,’).
‘cpu_function_unit_name’ denotes reservation of the named functional unit.
‘reservation_name’: see the description of the construction ‘define_reservation’.
‘nothing’ denotes no unit reservations.
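The operators can be combined as in the following sketch, in which the units, the reservation name, and the attribute value are hypothetical. According to the grammar above, ‘*’ binds most tightly, followed by ‘+’, ‘|’, and ‘,’, so the alternative has to be parenthesized to combine it with ‘+’.

```lisp
(define_cpu_unit "hyp_alu0, hyp_alu1, hyp_bus, hyp_wb")

;; First cycle:   hyp_bus together with either hyp_alu0 or hyp_alu1.
;; Next 2 cycles: no units ("nothing*2" stands for "nothing, nothing").
;; Last cycle:    hyp_wb.
(define_insn_reservation "hyp_load" 3 (eq_attr "type" "load")
                         "(hyp_alu0 | hyp_alu1) + hyp_bus, nothing*2, hyp_wb")
```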

Sometimes unit reservations for different insns contain common parts. In such a case, you can simplify the pipeline description by describing the common part by the following construction

(define_reservation reservation-name regexp)

reservation-name is a string giving name of regexp. Functional unit names and reservation names are in the same name space. So the reservation names should be different from the functional unit names and cannot be the reserved name ‘nothing’.

The following construction is used to describe exceptions in the latency time for a given instruction pair. These are so-called bypasses.

(define_bypass number out_insn_names in_insn_names [guard])

number defines when the result generated by the instructions given in string out_insn_names will be ready for the instructions given in string in_insn_names. Each of these strings is a comma-separated list of filename-style globs and they refer to the names of define_insn_reservations. For example:

(define_bypass 1 "cpu1_load_*, cpu1_store_*" "cpu1_load_*")

defines a bypass between instructions that start with ‘cpu1_load_’ or ‘cpu1_store_’ and those that start with ‘cpu1_load_’.

guard is an optional string giving the name of a C function which defines an additional guard for the bypass. The function will get the two insns as parameters. If the function returns zero, the bypass will be ignored for this case. The additional guard is necessary to recognize complicated bypasses, e.g. when the consumer uses the result only as the address of a ‘store’ insn (not as the stored value).

If there is more than one bypass with the same output and input insns, the chosen bypass is the first bypass with a guard in the description whose guard function returns nonzero. If there is no such bypass, then the bypass without the guard function is chosen.
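The selection rule for several bypasses between the same pair of insns can be sketched as follows. The reservation names hyp_alu and hyp_store are assumed to be defined elsewhere by define_insn_reservation, and hyp_store_addr_bypass_p is a hypothetical C function that the port would have to provide.

```lisp
;; Unguarded bypass: chosen when no guarded bypass for this pair applies.
(define_bypass 2 "hyp_alu" "hyp_store")

;; Guarded bypass for the same pair: chosen when the hypothetical guard
;; function hyp_store_addr_bypass_p returns nonzero for the two insns.
(define_bypass 1 "hyp_alu" "hyp_store" "hyp_store_addr_bypass_p")
```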

The following five constructions are usually used to describe VLIW processors, or more precisely, to describe a placement of small instructions into VLIW instruction slots. They can be used for RISC processors, too.

(exclusion_set unit-names unit-names)
(presence_set unit-names patterns)
(final_presence_set unit-names patterns)
(absence_set unit-names patterns)
(final_absence_set unit-names patterns)

unit-names is a string giving names of functional units separated by commas.

patterns is a string giving patterns of functional units separated by commas. Currently a pattern is one unit or several units separated by white space.

The first construction (‘exclusion_set’) means that each functional unit in the first string cannot be reserved simultaneously with a unit whose name is in the second string and vice versa. For example, the construction is useful for describing processors (e.g. some SPARC processors) with a fully pipelined floating point functional unit which can execute simultaneously only single floating point insns or only double floating point insns.
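A sketch of such an exclusion, assuming two hypothetical units that model single and double precision use of one floating point pipeline:

```lisp
(define_cpu_unit "hyp_fp_single, hyp_fp_double")

;; Neither unit can be reserved at the same time as the other.
(exclusion_set "hyp_fp_single" "hyp_fp_double")
```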

The second construction (‘presence_set’) means that each functional unit in the first string cannot be reserved unless at least one of the patterns of units whose names are in the second string is reserved. This is an asymmetric relation. For example, it is useful for describing that VLIW ‘slot1’ is reserved after ‘slot0’ reservation. We could describe it by the following construction

(presence_set "slot1" "slot0")

Or ‘slot1’ is reserved only after ‘slot0’ and unit ‘b0’ reservation. In this case we could write

(presence_set "slot1" "slot0 b0")

The third construction (‘final_presence_set’) is analogous to ‘presence_set’. The difference between them is when checking is done. When an instruction is issued in a given automaton state reflecting all current and planned unit reservations, the automaton state is changed. The first state is a source state, the second one is a result state. Checking for ‘presence_set’ is done on the source state reservation, checking for ‘final_presence_set’ is done on the result reservation. This construction is useful to describe a reservation which is actually two subsequent reservations. For example, if we use

(presence_set "slot1" "slot0")

the following insn will never be issued (because ‘slot1’ requires ‘slot0’ which is absent in the source state).

(define_reservation "insn_and_nop" "slot0 + slot1")

but it can be issued if we use the analogous ‘final_presence_set’.
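Written out, the variant that allows the reservation to be issued might look like this. The define_cpu_unit line is an assumption added to make the sketch self-contained.

```lisp
(define_cpu_unit "slot0, slot1")

;; Checked on the result state, so the slot0 reservation made by the
;; same insn satisfies the requirement of slot1.
(final_presence_set "slot1" "slot0")

(define_reservation "insn_and_nop" "slot0 + slot1")
```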

The fourth construction (‘absence_set’) means that each functional unit in the first string can be reserved only if each pattern of units whose names are in the second string is not reserved. This is an asymmetric relation (actually ‘exclusion_set’ is analogous to this one but it is symmetric). For example, it might be useful in a VLIW description to say that ‘slot0’ cannot be reserved after either ‘slot1’ or ‘slot2’ has been reserved. This can be described as:

(absence_set "slot0" "slot1, slot2")

Or ‘slot2’ cannot be reserved if ‘slot0’ and unit ‘b0’ are reserved or ‘slot1’ and unit ‘b1’ are reserved. In this case we could write

(absence_set "slot2" "slot0 b0, slot1 b1")

All functional units mentioned in a set should belong to the same automaton.

The last construction (‘final_absence_set’) is analogous to ‘absence_set’ but checking is done on the result (state) reservation. See comments for ‘final_presence_set’.

You can control the generator of the pipeline hazard recognizer with the following construction.

(automata_option options)

options is a string giving options which affect the generated code. Currently there are the following options:

no-minimization makes no minimization of the automaton. This is only worth doing when we are debugging the description and need to look more accurately at reservations of states.
time means printing time statistics about the generation of automata.
stats means printing statistics about the generated automata such as the number of DFA states, NDFA states and arcs.
v means a generation of the file describing the result automata. The file has suffix ‘.dfa’ and can be used for the description verification and debugging.
w means a generation of a warning instead of an error for non-critical errors.
no-comb-vect prevents the automaton generator from generating two data structures and comparing them for space efficiency. Using a comb vector to represent transitions may be better, but it can be very expensive to construct. This option is useful if the build process spends an unacceptably long time in genautomata.
ndfa makes nondeterministic finite state automata. This affects the treatment of operator ‘|’ in the regular expressions. The usual treatment of the operator is to try the first alternative and, if the reservation is not possible, the second alternative. The nondeterministic treatment means trying all alternatives, some of which may be rejected by reservations in the subsequent insns.
collapse-ndfa modifies the behavior of the generator when producing an automaton. An additional state transition to collapse a nondeterministic NDFA state to a deterministic DFA state is generated. It can be triggered by passing const0_rtx to state_transition. In such an automaton, cycle advance transitions are available only for these collapsed states. This option is useful for ports that want to use the ndfa option, but also want to use define_query_cpu_unit to assign units to insns issued in a cycle.
progress means output of a progress bar showing how many states were generated so far for the automaton being processed. This is useful when debugging a DFA description. If you see too many generated states, you could interrupt the generator of the pipeline hazard recognizer and try to figure out the reason for the generation of the huge automaton.
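As an illustration, a description that is being debugged might enable a few of the options listed above. This sketch gives one option per construction; the choice of options is only an example.

```lisp
;; Treat "|" nondeterministically.
(automata_option "ndfa")

;; Write the ‘.dfa’ file describing the result automata.
(automata_option "v")

;; Show how many states have been generated so far.
(automata_option "progress")
```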

As an example, consider a superscalar RISC machine which can issue three insns (two integer insns and one floating point insn) in a cycle but can finish only two insns. To describe this, we define the following functional units.

(define_cpu_unit "i0_pipeline, i1_pipeline, f_pipeline")
(define_cpu_unit "port0, port1")

All simple integer insns can be executed in any integer pipeline and their result is ready in two cycles. The simple integer insns are issued into the first pipeline unless it is reserved, otherwise they are issued into the second pipeline. Integer division and multiplication insns can be executed only in the second integer pipeline and their results are ready in 9 and 4 cycles, respectively. The integer division is not pipelined, i.e. the subsequent integer division insn cannot be issued until the current division insn has finished. Floating point insns are fully pipelined and their results are ready in 3 cycles. Where the result of a floating point insn is used by an integer insn, an additional delay of one cycle is incurred. To describe all of this we could specify

(define_cpu_unit "div")

(define_insn_reservation "simple" 2 (eq_attr "type" "int")
                         "(i0_pipeline | i1_pipeline), (port0 | port1)")

(define_insn_reservation "mult" 4 (eq_attr "type" "mult")
                         "i1_pipeline, nothing*2, (port0 | port1)")

(define_insn_reservation "div" 9 (eq_attr "type" "div")
                         "i1_pipeline, div*7, div + (port0 | port1)")

(define_insn_reservation "float" 3 (eq_attr "type" "float")
                         "f_pipeline, nothing, (port0 | port1)")

(define_bypass 4 "float" "simple,mult,div")

To simplify the description we could define the following reservation

(define_reservation "finish" "port0|port1")

and use it in all define_insn_reservation as in the following construction

(define_insn_reservation "simple" 2 (eq_attr "type" "int")
                         "(i0_pipeline | i1_pipeline), finish")
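Applying the same substitution to the remaining reservations of the example would give something like the following sketch, which only replaces ‘(port0 | port1)’ by ‘finish’ and leaves everything else as above.

```lisp
(define_insn_reservation "mult" 4 (eq_attr "type" "mult")
                         "i1_pipeline, nothing*2, finish")

(define_insn_reservation "div" 9 (eq_attr "type" "div")
                         "i1_pipeline, div*7, div + finish")

(define_insn_reservation "float" 3 (eq_attr "type" "float")
                         "f_pipeline, nothing, finish")
```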

Footnotes

(7)