Code Transformation Metadata
Overview
LLVM transformation passes can be controlled by attaching metadata to the code to transform. By default, transformation passes use heuristics to determine whether or not to perform transformations and, when doing so, other details of how the transformations are applied (e.g., which vectorization factor to select). Unless the optimizer is otherwise directed, transformations are applied conservatively. This conservatism generally allows the optimizer to avoid unprofitable transformations, but in practice it also causes the optimizer to skip transformations that would be highly profitable.
Frontends can give additional hints to LLVM passes on which transformations they should apply. This can be additional knowledge that cannot be derived from the emitted IR, or directives passed from the user/programmer. OpenMP pragmas are an example of the latter.
If any such metadata is dropped from the program, the code’s semantics must not change.
Metadata on Loops
Attributes can be attached to loops as described in ‘llvm.loop’. Attributes can describe properties of the loop, disable transformations, force specific transformations and set transformation options.
Because metadata nodes are immutable (with the exception of MDNode::replaceOperandWith, which is dangerous to use on uniqued metadata), adding or removing a loop attribute requires creating a new MDNode and assigning it as the new llvm.loop metadata. Any connection between the old MDNode and the loop is lost. The llvm.loop node is also used as the LoopID (Loop::getLoopID()), i.e. the loop effectively gets a new identifier. For instance, llvm.mem.parallel_loop_access references the LoopID. Therefore, if the parallel access property is to be preserved after adding/removing loop attributes, any llvm.mem.parallel_loop_access reference must be updated to the new LoopID.
Transformation Metadata Structure
Some attributes describe code transformations (unrolling, vectorizing, loop distribution, etc.). They can be a hint to the optimizer that a transformation might be beneficial, an instruction to use a specific option, or a specific request from the user (such as #pragma clang loop or #pragma omp simd).
If a transformation is forced but cannot be carried out for any reason, an optimization-missed warning must be emitted. Semantic information such as a transformation being safe (e.g. llvm.mem.parallel_loop_access) can be left unused by the optimizer without generating a warning.
Unless explicitly disabled, any optimization pass may heuristically determine whether a transformation is beneficial and apply it. If metadata for another transformation was specified, applying a different transformation first can have unintended effects: the requested transformation may end up being applied to a different loop, or the loop may no longer exist. To avoid having to explicitly disable an unknown number of passes, the attribute llvm.loop.disable_nonforced disables all optional, high-level, restructuring transformations.
The following example prevents the loop from being altered (for instance, unrolled) before it is vectorized.
    br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
    ...
    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    !2 = !{!"llvm.loop.disable_nonforced"}
After a transformation is applied, follow-up attributes are set on the transformed and/or new loop(s). This allows additional attributes including followup-transformations to be specified. Specifying multiple transformations in the same metadata node is possible for compatibility reasons, but their execution order is undefined. For instance, when llvm.loop.vectorize.enable and llvm.loop.unroll.enable are specified at the same time, unrolling may occur either before or after vectorization.
As an example, the following instructs a loop to be vectorized and only then unrolled.
    !0 = distinct !{!0, !1, !2, !3}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    !2 = !{!"llvm.loop.disable_nonforced"}
    !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{!"llvm.loop.unroll.enable"}}
If, and only if, no followup is specified, the pass may add attributes itself. For instance, the vectorizer adds an llvm.loop.isvectorized attribute and all attributes from the original loop excluding its loop vectorizer attributes. To avoid this, an empty followup attribute can be used, e.g.
    !3 = !{!"llvm.loop.vectorize.followup_vectorized"}
The followup attributes of a transformation that cannot be applied will never be added to a loop and are therefore effectively ignored. This means that any followup-transformation in such attributes requires that its prior transformations are applied before the followup-transformation. The user should receive a warning about the first transformation in the transformation chain that could not be applied if it is a forced transformation. All following transformations are skipped.
Pass-Specific Transformation Metadata
Transformation options are specific to each transformation. In the following, we present the model for each LLVM loop optimization pass and the metadata to influence them.
Loop Vectorization and Interleaving
Loop vectorization and interleaving are interpreted as a single transformation. It is interpreted as forced if !{!"llvm.loop.vectorize.enable", i1 true} is set.
Assuming the pre-vectorization loop is
    for (int i = 0; i < n; i+=1) // original loop
      Stmt(i);
then the code after vectorization will be approximately (assuming a SIMD width of 4):
    int i = 0;
    if (rtc) {
      for (; i + 3 < n; i+=4) // vectorized/interleaved loop
        Stmt(i:i+3);
    }
    for (; i < n; i+=1) // epilogue loop
      Stmt(i);
where rtc is a generated runtime check.
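The model above can be imitated by hand in plain C. The following is a minimal sketch, not what the vectorizer actually emits: the function names, the `A[i] = B[i] + 1` loop body, and the pointer-range overlap test standing in for rtc are all assumptions chosen for illustration. The 4-wide step loads all four lanes before storing any of them, which is why the overlap check is needed before taking the fast path.

```c
#include <stdint.h>
#include <string.h>

// Scalar reference: plays the role of the "original loop".
void add_one_original(int *A, const int *B, int n) {
    for (int i = 0; i < n; i += 1)
        A[i] = B[i] + 1;
}

// Hand-written stand-in for the vectorized form with a width of 4.
void add_one_vectorized(int *A, const int *B, int n) {
    // Hypothetical runtime check: the n-element ranges of A and B are disjoint.
    uintptr_t a = (uintptr_t)A, b = (uintptr_t)B;
    uintptr_t len = (uintptr_t)(n > 0 ? n : 0) * sizeof(int);
    int rtc = a + len <= b || b + len <= a;

    int i = 0;
    if (rtc) {
        for (; i + 3 < n; i += 4) {          // vectorized/interleaved loop
            int lane[4];
            for (int k = 0; k < 4; k += 1)   // models one 4-wide load
                lane[k] = B[i + k];
            for (int k = 0; k < 4; k += 1)   // models one 4-wide add and store
                A[i + k] = lane[k] + 1;
        }
    }
    for (; i < n; i += 1)                    // epilogue loop
        A[i] = B[i] + 1;
}

// Returns 1 if both versions agree on a disjoint and on an overlapping input.
int vectorized_matches(void) {
    int src[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    int out1[10], out2[10];
    add_one_original(out1, src, 10);
    add_one_vectorized(out2, src, 10);
    int disjoint_ok = memcmp(out1, out2, sizeof out1) == 0;

    int buf1[9] = {0}, buf2[9] = {0};
    add_one_original(buf1 + 1, buf1, 8);     // A overlaps B, so rtc is false
    add_one_vectorized(buf2 + 1, buf2, 8);
    int overlap_ok = memcmp(buf1, buf2, sizeof buf1) == 0;

    return disjoint_ok && overlap_ok;
}
```

When the ranges overlap, the sketch skips the 4-wide loop entirely and the epilogue loop handles every iteration, which matches the control flow of the model.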
llvm.loop.vectorize.followup_vectorized will set the attributes for the vectorized loop. If not specified, llvm.loop.isvectorized is combined with the original loop’s attributes to avoid it being vectorized multiple times.
llvm.loop.vectorize.followup_epilogue will set the attributes for the remainder loop. If not specified, it will have the original loop’s attributes combined with llvm.loop.isvectorized and llvm.loop.unroll.runtime.disable (unless the original loop already has unroll metadata).
The attributes specified by llvm.loop.vectorize.followup_all are added to both loops.
When a follow-up attribute is used, it replaces any automatically deduced attributes for the generated loop in question. Therefore it is recommended to add llvm.loop.isvectorized to llvm.loop.vectorize.followup_all, which prevents the loop vectorizer from trying to optimize the loops again.
Loop Unrolling
Unrolling is interpreted as forced if any !{!"llvm.loop.unroll.enable"} metadata or option (llvm.loop.unroll.count, llvm.loop.unroll.full) is present. Unrolling can be full unrolling, partial unrolling of a loop with constant trip count or runtime unrolling of a loop with a trip count unknown at compile-time.
If the loop has been unrolled fully, there is no followup-loop. For partial/runtime unrolling, the original loop of
    for (int i = 0; i < n; i+=1) // original loop
      Stmt(i);
is transformed into (using an unroll factor of 4):
    int i = 0;
    for (; i + 3 < n; i+=4) { // unrolled loop
      Stmt(i);
      Stmt(i+1);
      Stmt(i+2);
      Stmt(i+3);
    }
    for (; i < n; i+=1) // remainder loop
      Stmt(i);
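As a concrete instance of this model, the sketch below unrolls a simple reduction by hand. The function names and the sum-of-squares body are assumptions made for illustration; the loop structure follows the unrolled/remainder split shown above, and both functions are expected to return the same value for every n.

```c
// Reference: sums i * i for i in [0, n).
long sum_squares_original(int n) {
    long acc = 0;
    for (int i = 0; i < n; i += 1)       // original loop
        acc += (long)i * i;
    return acc;
}

// Hand-unrolled by a factor of 4, followed by a remainder loop.
long sum_squares_unrolled(int n) {
    long acc = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {          // unrolled loop
        acc += (long)i * i;
        acc += (long)(i + 1) * (i + 1);
        acc += (long)(i + 2) * (i + 2);
        acc += (long)(i + 3) * (i + 3);
    }
    for (; i < n; i += 1)                // remainder loop
        acc += (long)i * i;
    return acc;
}
```

For a trip count that is a multiple of 4 the remainder loop runs zero times; for a trip count below 4 the unrolled loop runs zero times and the remainder loop does all the work.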
llvm.loop.unroll.followup_unrolled will set the loop attributes of the unrolled loop. If not specified, the attributes of the original loop without the llvm.loop.unroll.* attributes are copied and llvm.loop.unroll.disable is added to it.
llvm.loop.unroll.followup_remainder defines the attributes of the remainder loop. If not specified, the remainder loop will have no attributes. The remainder loop might not be present due to being fully unrolled, in which case this attribute has no effect.
Attributes defined in llvm.loop.unroll.followup_all are added to the unrolled and remainder loops.
To prevent the partially unrolled loop from being unrolled again, it is recommended to add llvm.loop.unroll.disable to llvm.loop.unroll.followup_all. If no follow-up attribute is specified for a generated loop, it is added automatically.
Unroll-And-Jam
Unroll-and-jam uses the following transformation model (here with an unroll factor of 2). Currently, it does not support a fallback version when the transformation is unsafe. The original loop nest
    for (int i = 0; i < n; i+=1) { // original outer loop
      Fore(i);
      for (int j = 0; j < m; j+=1) // original inner loop
        SubLoop(i, j);
      Aft(i);
    }
is transformed into:
    int i = 0;
    for (; i + 1 < n; i+=2) { // unrolled outer loop
      Fore(i);
      Fore(i+1);
      for (int j = 0; j < m; j+=1) { // unrolled inner loop
        SubLoop(i, j);
        SubLoop(i+1, j);
      }
      Aft(i);
      Aft(i+1);
    }
    for (; i < n; i+=1) { // remainder outer loop
      Fore(i);
      for (int j = 0; j < m; j+=1) // remainder inner loop
        SubLoop(i, j);
      Aft(i);
    }
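To make the Fore/SubLoop/Aft roles concrete, here is a hand-written sketch of the same model in C. The row-sum computation and all function names are assumptions for illustration only, and the sketch assumes the input matrix and the output array do not overlap, since no fallback version exists to cover an unsafe case.

```c
#include <string.h>

// Original nest: out[i] = 2 * (sum of row i of the n-by-m matrix a).
void row_sums_original(const int *a, int *out, int n, int m) {
    for (int i = 0; i < n; i += 1) {         // original outer loop
        out[i] = 0;                          // Fore(i)
        for (int j = 0; j < m; j += 1)       // original inner loop
            out[i] += a[i * m + j];          // SubLoop(i, j)
        out[i] *= 2;                         // Aft(i)
    }
}

// Outer loop unrolled by 2, with the two inner loops jammed into one.
void row_sums_unroll_and_jam(const int *a, int *out, int n, int m) {
    int i = 0;
    for (; i + 1 < n; i += 2) {              // unrolled outer loop
        out[i] = 0;                          // Fore(i)
        out[i + 1] = 0;                      // Fore(i+1)
        for (int j = 0; j < m; j += 1) {     // unrolled (jammed) inner loop
            out[i] += a[i * m + j];          // SubLoop(i, j)
            out[i + 1] += a[(i + 1) * m + j];// SubLoop(i+1, j)
        }
        out[i] *= 2;                         // Aft(i)
        out[i + 1] *= 2;                     // Aft(i+1)
    }
    for (; i < n; i += 1) {                  // remainder outer loop
        out[i] = 0;
        for (int j = 0; j < m; j += 1)       // remainder inner loop
            out[i] += a[i * m + j];
        out[i] *= 2;
    }
}

// Returns 1 if both versions agree on a 3-by-2 matrix (odd row count,
// so the remainder outer loop runs once).
int unroll_and_jam_matches(void) {
    int a[6] = {1, 2, 3, 4, 5, 6};
    int o1[3], o2[3];
    row_sums_original(a, o1, 3, 2);
    row_sums_unroll_and_jam(a, o2, 3, 2);
    return memcmp(o1, o2, sizeof o1) == 0 && o2[0] == 6 && o2[2] == 22;
}
```

The jammed form is only equivalent here because the two rows being processed together never touch each other's data; that independence is the safety condition the real pass has to establish.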
llvm.loop.unroll_and_jam.followup_outer will set the loop attributes of the unrolled outer loop. If not specified, the attributes of the original outer loop without the llvm.loop.unroll.* attributes are copied and llvm.loop.unroll.disable is added to it.
llvm.loop.unroll_and_jam.followup_inner will set the loop attributes of the unrolled inner loop. If not specified, the attributes of the original inner loop are used unchanged.
llvm.loop.unroll_and_jam.followup_remainder_outer sets the loop attributes of the outer remainder loop. If not specified, it will not have any attributes. The remainder loop might not be present due to being fully unrolled.
llvm.loop.unroll_and_jam.followup_remainder_inner sets the loop attributes of the inner remainder loop. If not specified, it will have the attributes of the original inner loop. If the outer remainder loop is unrolled, the inner remainder loop might be present multiple times.
Attributes defined in llvm.loop.unroll_and_jam.followup_all are added to all of the aforementioned output loops.
To prevent the unrolled loop from being unrolled again, it is recommended to add llvm.loop.unroll.disable to llvm.loop.unroll_and_jam.followup_all. It suppresses unroll-and-jam as well as an additional inner loop unrolling. If no follow-up attribute is specified for a generated loop, it is added automatically.
Loop Distribution
The LoopDistribution pass tries to separate vectorizable parts of a loop from the non-vectorizable part (which otherwise would make the entire loop non-vectorizable). Conceptually, it transforms a loop such as
    for (int i = 1; i < n; i+=1) { // original loop
      A[i] = i;
      B[i] = 2 + B[i];
      C[i] = 3 + C[i - 1];
    }
into the following code:
    if (rtc) {
      for (int i = 1; i < n; i+=1) // coincident loop
        A[i] = i;
      for (int i = 1; i < n; i+=1) // coincident loop
        B[i] = 2 + B[i];
      for (int i = 1; i < n; i+=1) // sequential loop
        C[i] = 3 + C[i - 1];
    } else {
      for (int i = 1; i < n; i+=1) { // fallback loop
        A[i] = i;
        B[i] = 2 + B[i];
        C[i] = 3 + C[i - 1];
      }
    }
where rtc is a generated runtime check.
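The distributed form can be written out as compilable C to see how the runtime check selects between the two versions. This is only a sketch: the function names are made up, and the pairwise pointer-range test is an assumed stand-in for rtc, not the check LLVM generates.

```c
#include <stdint.h>
#include <string.h>

// Returns 1 if the n-element int ranges starting at p and q do not overlap.
static int ranges_disjoint(const int *p, const int *q, int n) {
    uintptr_t pb = (uintptr_t)p, qb = (uintptr_t)q;
    uintptr_t len = (uintptr_t)(n > 0 ? n : 0) * sizeof(int);
    return pb + len <= qb || qb + len <= pb;
}

// The original loop, also used as the fallback version.
void kernel_fallback(int *A, int *B, int *C, int n) {
    for (int i = 1; i < n; i += 1) {     // fallback loop
        A[i] = i;
        B[i] = 2 + B[i];
        C[i] = 3 + C[i - 1];
    }
}

void kernel_distributed(int *A, int *B, int *C, int n) {
    // Hypothetical rtc: no two of the three arrays overlap.
    int rtc = ranges_disjoint(A, B, n) && ranges_disjoint(A, C, n) &&
              ranges_disjoint(B, C, n);
    if (rtc) {
        for (int i = 1; i < n; i += 1)   // coincident loop
            A[i] = i;
        for (int i = 1; i < n; i += 1)   // coincident loop
            B[i] = 2 + B[i];
        for (int i = 1; i < n; i += 1)   // sequential loop
            C[i] = 3 + C[i - 1];
    } else {
        kernel_fallback(A, B, C, n);
    }
}

// Returns 1 if both versions produce identical arrays on a small input.
int distribution_matches(void) {
    int A1[5] = {0}, B1[5] = {1, 2, 3, 4, 5}, C1[5] = {7, 0, 0, 0, 0};
    int A2[5] = {0}, B2[5] = {1, 2, 3, 4, 5}, C2[5] = {7, 0, 0, 0, 0};
    kernel_fallback(A1, B1, C1, 5);
    kernel_distributed(A2, B2, C2, 5);
    return memcmp(A1, A2, sizeof A1) == 0 &&
           memcmp(B1, B2, sizeof B1) == 0 &&
           memcmp(C1, C2, sizeof C1) == 0 && C2[4] == 19;
}
```

Only the third loop carries a dependence from one iteration to the next (through `C[i - 1]`), which is why it is the sequential loop while the other two are candidates for vectorization.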
llvm.loop.distribute.followup_coincident sets the loop attributes of all loops without loop-carried dependencies (i.e. vectorizable loops). There might be more than one such loop. If not defined, the loops will inherit the original loop’s attributes.
llvm.loop.distribute.followup_sequential sets the loop attributes of the loop with potentially unsafe dependencies. There should be at most one such loop. If not defined, the loop will inherit the original loop’s attributes.
llvm.loop.distribute.followup_fallback defines the loop attributes for the fallback loop, which is a copy of the original loop for when loop versioning is required. If undefined, the fallback loop inherits all attributes from the original loop.
Attributes defined in llvm.loop.distribute.followup_all are added to all of the aforementioned output loops.
It is recommended to add llvm.loop.disable_nonforced to llvm.loop.distribute.followup_fallback. This prevents the fallback version (which is likely never executed) from being optimized further, which would increase the code size.
Versioning LICM
The pass hoists code out of loops that are only loop-invariant when dynamic conditions apply. For instance, it transforms the loop
    for (int i = 0; i < n; i+=1) // original loop
      A[i] = B[0];
into:
    if (rtc) {
      auto b = B[0];
      for (int i = 0; i < n; i+=1) // versioned loop
        A[i] = b;
    } else {
      for (int i = 0; i < n; i+=1) // unversioned loop
        A[i] = B[0];
    }
The runtime condition (rtc) checks that the array A and the element B[0] do not alias.
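A hand-written C version of this model might look as follows. It is a sketch under stated assumptions: the function names are invented, and the address comparison is an assumed stand-in for rtc. The sketch additionally requires `n > 0` before taking the versioned path so that the hoisted load of `B[0]` never happens when the original loop would not have executed it.

```c
#include <stdint.h>
#include <string.h>

// The original loop: B[0] is reloaded on every iteration.
void fill_unversioned(int *A, const int *B, int n) {
    for (int i = 0; i < n; i += 1)      // unversioned loop
        A[i] = B[0];
}

void fill_versioned(int *A, const int *B, int n) {
    // Hypothetical rtc: the loop runs at least once and B[0] lies
    // outside the range A[0..n).
    uintptr_t a = (uintptr_t)A, b = (uintptr_t)B;
    int rtc = n > 0 &&
              (b + sizeof(int) <= a || a + (uintptr_t)n * sizeof(int) <= b);
    if (rtc) {
        int b0 = B[0];                  // load hoisted out of the loop
        for (int i = 0; i < n; i += 1)  // versioned loop
            A[i] = b0;
    } else {
        for (int i = 0; i < n; i += 1)  // unversioned loop
            A[i] = B[0];
    }
}

// Returns 1 if both versions agree when A and B are separate arrays.
int licm_matches(void) {
    int B[1] = {42};
    int A1[4] = {0}, A2[4] = {0};
    fill_unversioned(A1, B, 4);
    fill_versioned(A2, B, 4);
    return memcmp(A1, A2, sizeof A1) == 0 && A2[3] == 42;
}
```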
Currently, this transformation does not support followup-attributes.
Loop Interchange
Currently, the LoopInterchange pass does not use any metadata.
Ambiguous Transformation Order
If there are multiple transformations defined, the order in which they are executed depends on the order in LLVM’s pass pipeline, which is subject to change. The default optimization pipeline (anything higher than -O0) has the following order.
When using the legacy pass manager:
LoopInterchange (if enabled)
SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
VersioningLICM (if enabled)
LoopDistribute
LoopVectorizer
LoopUnrollAndJam (if enabled)
LoopUnroll (partial and runtime unrolling)
When using the legacy pass manager with LTO:
LoopInterchange (if enabled)
SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
LoopVectorizer
LoopUnroll (partial and runtime unrolling)
When using the new pass manager:
SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
LoopDistribute
LoopVectorizer
LoopUnrollAndJam (if enabled)
LoopUnroll (partial and runtime unrolling)
Leftover Transformations
Forced transformations that have not been applied after the last transformation pass should be reported to the user. The transformation passes themselves cannot be responsible for this reporting because they might not be in the pipeline, there might be multiple passes able to apply a transformation (e.g. LoopInterchange and Polly) or a transformation attribute may be ‘hidden’ inside another pass’s followup attribute.
The pass -transform-warning (WarnMissedTransformationsPass) emits such warnings. It should be placed after the last transformation pass.
The current pass pipeline has a fixed order in which transformation passes are executed. A transformation can be in the followup of a pass that is executed later and is thus left over. For instance, a loop nest cannot be distributed and then interchanged with the current pass pipeline. The loop distribution will execute, but no loop interchange pass follows it, so any loop interchange metadata will be ignored. The -transform-warning pass should emit a warning in this case.
Future versions of LLVM may fix this by executing transformations using a dynamic ordering.