Code Transformation Metadata

Overview

LLVM transformation passes can be controlled by attaching metadata to the code to transform. By default, transformation passes use heuristics to determine whether or not to perform transformations and, when doing so, other details of how the transformations are applied (e.g., which vectorization factor to select). Unless the optimizer is otherwise directed, transformations are applied conservatively. This conservatism generally allows the optimizer to avoid unprofitable transformations, but in practice, this results in the optimizer not applying transformations that would be highly profitable.

Frontends can give additional hints to LLVM passes on which transformations they should apply. This can be additional knowledge that cannot be derived from the emitted IR, or directives passed from the user/programmer. OpenMP pragmas are an example of the latter.

If any such metadata is dropped from the program, the code’s semantics must not change.

Metadata on Loops

Attributes can be attached to loops as described in ‘llvm.loop’. Attributes can describe properties of the loop, disable transformations, force specific transformations and set transformation options.

Because metadata nodes are immutable (with the exception of MDNode::replaceOperandWith, which is dangerous to use on uniqued metadata), in order to add or remove loop attributes, a new MDNode must be created and assigned as the new llvm.loop metadata. Any connection between the old MDNode and the loop is lost. The llvm.loop node is also used as LoopID (Loop::getLoopID()), i.e. the loop effectively gets a new identifier. For instance, llvm.mem.parallel_loop_access references the LoopID. Therefore, if the parallel access property is to be preserved after adding/removing loop attributes, any llvm.mem.parallel_loop_access reference must be updated to the new LoopID.
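The consequence of immutability can be illustrated with a small C++ model. This is a sketch under stated assumptions, not LLVM's API: `LoopIDNode`, `Loop`, `MemAccess` and `setLoopAttributes` are hypothetical stand-ins for an identity-compared `MDNode`, a loop, a memory access carrying an `llvm.mem.parallel_loop_access` reference, and an attribute-update routine.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an immutable MDNode used as a LoopID:
// it is identified by its address, not by its contents.
struct LoopIDNode {
  std::vector<std::string> attrs;
};
using LoopID = std::shared_ptr<const LoopIDNode>;

// Hypothetical stand-ins for a loop and for a memory access carrying an
// llvm.mem.parallel_loop_access reference to a LoopID.
struct Loop {
  LoopID id;
};
struct MemAccess {
  LoopID parallelLoopAccess;
};

// Nodes cannot be modified, so changing the attributes creates a new node,
// i.e. a new LoopID. Accesses still pointing at the old LoopID are
// redirected; otherwise the loop would lose its parallel-access property.
void setLoopAttributes(Loop &L, std::vector<std::string> newAttrs,
                       const std::vector<MemAccess *> &accesses) {
  assert(L.id && "loop must already have a LoopID");
  LoopID oldID = L.id;
  L.id = std::make_shared<LoopIDNode>(LoopIDNode{std::move(newAttrs)});
  for (MemAccess *A : accesses)
    if (A->parallelLoopAccess == oldID)
      A->parallelLoopAccess = L.id;
}
```

In this model the loop's identity changes on every attribute update, which is exactly why the redirect loop is needed.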

Transformation Metadata Structure

Some attributes describe code transformations (unrolling, vectorizing, loop distribution, etc.). They can either be a hint to the optimizer that a transformation might be beneficial, an instruction to use a specific option, or convey a specific request from the user (such as #pragma clang loop or #pragma omp simd).

If a transformation is forced but cannot be carried out for any reason, an optimization-missed warning must be emitted. Semantic information such as a transformation being safe (e.g. llvm.mem.parallel_loop_access) can be unused by the optimizer without generating a warning.

Unless explicitly disabled, any optimization pass may heuristically determine whether a transformation is beneficial and apply it. If metadata for another transformation was specified, applying a different transformation before it can defeat the request: the requested transformation might then be applied to a different loop, or the loop might not exist anymore. To avoid having to explicitly disable an unknown number of passes, the attribute llvm.loop.disable_nonforced disables all optional, high-level, restructuring transformations.

The following example prevents the loop from being altered (for instance, unrolled) before it is vectorized.

      br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
    ...
    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    !2 = !{!"llvm.loop.disable_nonforced"}
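The policy from the preceding paragraphs (forced transformations are applied or warned about, and llvm.loop.disable_nonforced switches off heuristic ones) can be sketched as a per-pass decision function. This is a simplified model, not LLVM code: `Decision` and `decide` are hypothetical, attributes are plain strings, and an attribute such as `llvm.loop.vectorize.enable` is treated as forcing merely by being present (its `i1 true` operand is ignored).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical outcome of one pass looking at one loop.
enum class Decision { Apply, Skip, WarnMissed };

// `forceAttr` is the attribute that forces this pass's transformation,
// e.g. "llvm.loop.vectorize.enable" for the vectorizer.
Decision decide(const std::set<std::string> &loopAttrs,
                const std::string &forceAttr, bool legal, bool profitable) {
  assert(!forceAttr.empty() && "every pass needs an enabling attribute");
  if (loopAttrs.count(forceAttr))
    // Forced: apply if possible, otherwise the user must be warned.
    return legal ? Decision::Apply : Decision::WarnMissed;
  if (loopAttrs.count("llvm.loop.disable_nonforced"))
    // Optional transformations are switched off for this loop.
    return Decision::Skip;
  // Default: the usual heuristic decision.
  return legal && profitable ? Decision::Apply : Decision::Skip;
}
```

With the attributes of the example above, an unrolling pass returns `Skip` even when it considers unrolling profitable, while the vectorizer either applies its transformation or reports that it could not.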

After a transformation is applied, follow-up attributes are set on the transformed and/or new loop(s). This allows additional attributes, including follow-up transformations, to be specified. Specifying multiple transformations in the same metadata node is possible for compatibility reasons, but their execution order is undefined. For instance, when llvm.loop.vectorize.enable and llvm.loop.unroll.enable are specified at the same time, unrolling may occur either before or after vectorization.

As an example, the following instructs a loop to be vectorized and only then unrolled.

    !0 = distinct !{!0, !1, !2, !3}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    !2 = !{!"llvm.loop.disable_nonforced"}
    !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{!"llvm.loop.unroll.enable"}}

If, and only if, no followup is specified, the pass may add attributes itself. For instance, the vectorizer adds an llvm.loop.isvectorized attribute and all attributes from the original loop excluding its loop vectorizer attributes. To avoid this, an empty followup attribute can be used, e.g.

    !3 = !{!"llvm.loop.vectorize.followup_vectorized"}

The followup attributes of a transformation that cannot be applied will never be added to a loop and are therefore effectively ignored. This means that any followup transformation in such attributes requires that its prior transformations are applied before the followup transformation. The user should receive a warning about the first transformation in the transformation chain that could not be applied if it is a forced transformation. All following transformations are skipped.
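The chain semantics can be sketched in a few lines of C++. This is an illustrative model, not LLVM code: `ChainStep`, `ChainResult` and `runChain` are hypothetical names, and whether a pass can apply its transformation is given as a flag instead of being computed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one link in a followup chain, e.g.
// vectorize, whose followup_vectorized requests unrolling.
struct ChainStep {
  std::string transform;
  bool forced;
  bool canApply; // whether the responsible pass would succeed on the loop
};

struct ChainResult {
  std::vector<std::string> applied;
  std::vector<std::string> warnings; // optimization-missed warnings
};

// A step's followup attributes are attached only after the step succeeds,
// so the first failing step ends the chain. It is reported only if it was
// forced; every later step is skipped without a further warning.
ChainResult runChain(const std::vector<ChainStep> &chain) {
  ChainResult R;
  for (const ChainStep &S : chain) {
    if (!S.canApply) {
      if (S.forced)
        R.warnings.push_back("could not apply forced " + S.transform);
      break;
    }
    R.applied.push_back(S.transform);
  }
  return R;
}

// Usage: forced vectorization fails, so the unrolling requested in its
// followup never reaches a loop and produces no warning of its own.
void checkFailedChain() {
  ChainResult R = runChain({{"vectorize", true, false}, {"unroll", true, true}});
  assert(R.applied.empty());
  assert(R.warnings.size() == 1);
}
```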

Pass-Specific Transformation Metadata

Transformation options are specific to each transformation. In the following, we present the model for each LLVM loop optimization pass and the metadata that influences it.

Loop Vectorization and Interleaving

Loop vectorization and interleaving is interpreted as a single transformation. It is interpreted as forced if !{!"llvm.loop.vectorize.enable", i1 true} is set.

Assuming the pre-vectorization loop is

    for (int i = 0; i < n; i+=1) // original loop
      Stmt(i);

then the code after vectorization will be approximately (assuming a SIMD width of 4):

    int i = 0;
    if (rtc) {
      for (; i + 3 < n; i+=4) // vectorized/interleaved loop
        Stmt(i:i+3);
    }
    for (; i < n; i+=1) // epilogue loop
      Stmt(i);

where rtc is a generated runtime check.
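To make the iteration split concrete, the model can be simulated by recording which iterations each generated loop executes. A minimal sketch: `VectorizedTrace` and `vectorizeModel` are hypothetical names, the width `W` is a parameter defaulting to the 4 used above, and `rtc` is passed in as a plain flag instead of being computed.

```cpp
#include <cassert>
#include <vector>

// Which original iterations each generated loop executes.
struct VectorizedTrace {
  std::vector<int> vectorGroups;  // first i of each Stmt(i:i+W-1)
  std::vector<int> epilogueIters; // i of each scalar Stmt(i)
};

// Model of the vectorized code above with SIMD width W.
// When `rtc` fails, the epilogue loop executes every iteration.
VectorizedTrace vectorizeModel(int n, bool rtc, int W = 4) {
  assert(W > 0);
  VectorizedTrace T;
  int i = 0;
  if (rtc) {
    for (; i + (W - 1) < n; i += W) // vectorized/interleaved loop
      T.vectorGroups.push_back(i);
  }
  for (; i < n; i += 1) // epilogue loop
    T.epilogueIters.push_back(i);
  return T;
}
```

For n = 10 and a passing check, the vectorized loop covers iterations 0 to 7 in two groups and the epilogue runs iterations 8 and 9; if the check fails, the epilogue runs all ten.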

llvm.loop.vectorize.followup_vectorized will set the attributes for the vectorized loop. If not specified, llvm.loop.isvectorized is combined with the original loop’s attributes to avoid it being vectorized multiple times.

llvm.loop.vectorize.followup_epilogue will set the attributes for the remainder loop. If not specified, it will have the original loop’s attributes combined with llvm.loop.isvectorized and llvm.loop.unroll.runtime.disable (unless the original loop already has unroll metadata).

The attributes specified by llvm.loop.vectorize.followup_all are added to both loops.

When using a follow-up attribute, it replaces any automatically deduced attributes for the generated loop in question. Therefore, it is recommended to add llvm.loop.isvectorized to llvm.loop.vectorize.followup_all, which prevents the loop vectorizer from trying to optimize the loops again.
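The rules for the vectorized loop's attributes can be summarized as a small function over string lists. This is a sketch under assumptions, not the vectorizer's implementation: `Attrs`, `isVectorizerAttr` and `vectorizedLoopAttrs` are hypothetical, attributes are modeled as their names only, and treating the `llvm.loop.vectorize.` and `llvm.loop.interleave.` prefixes as "loop vectorizer attributes" is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

using Attrs = std::vector<std::string>;

// Assumption: attributes with these prefixes count as loop vectorizer
// attributes, which the vectorized loop does not inherit.
static bool isVectorizerAttr(const std::string &a) {
  for (const char *prefix : {"llvm.loop.vectorize.", "llvm.loop.interleave."})
    if (a.rfind(prefix, 0) == 0)
      return true;
  return false;
}

// Attributes of the vectorized loop. `followupVectorized` and `followupAll`
// model the contents of llvm.loop.vectorize.followup_vectorized and
// llvm.loop.vectorize.followup_all; std::nullopt means "not specified".
Attrs vectorizedLoopAttrs(const Attrs &original,
                          const std::optional<Attrs> &followupVectorized,
                          const std::optional<Attrs> &followupAll) {
  Attrs result;
  if (!followupVectorized && !followupAll) {
    // No followup: inherit everything except vectorizer attributes and
    // mark the loop so that it is not vectorized again.
    for (const std::string &a : original)
      if (!isVectorizerAttr(a))
        result.push_back(a);
    result.push_back("llvm.loop.isvectorized");
    return result;
  }
  // Any followup replaces the automatically deduced attributes.
  if (followupAll)
    result = *followupAll;
  if (followupVectorized)
    result.insert(result.end(), followupVectorized->begin(),
                  followupVectorized->end());
  return result;
}

// Usage: an empty followup_vectorized attribute leaves no attributes.
void checkEmptyFollowup() {
  Attrs orig = {"llvm.loop.vectorize.enable", "llvm.loop.disable_nonforced"};
  assert(vectorizedLoopAttrs(orig, Attrs{}, std::nullopt).empty());
}
```

The second branch shows why llvm.loop.isvectorized has to be listed explicitly once any followup is used: nothing is added automatically any more.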

Loop Unrolling

Unrolling is interpreted as forced if any !{!"llvm.loop.unroll.enable"} metadata or option (llvm.loop.unroll.count, llvm.loop.unroll.full) is present. Unrolling can be full unrolling, partial unrolling of a loop with a constant trip count, or runtime unrolling of a loop with a trip count unknown at compile-time.

If the loop has been unrolled fully, there is no followup loop. For partial/runtime unrolling, the original loop of

    for (int i = 0; i < n; i+=1) // original loop
      Stmt(i);

is transformed into (using an unroll factor of 4):

    int i = 0;
    for (; i + 3 < n; i+=4) { // unrolled loop
      Stmt(i);
      Stmt(i+1);
      Stmt(i+2);
      Stmt(i+3);
    }
    for (; i < n; i+=1) // remainder loop
      Stmt(i);
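One way to convince oneself that the unrolled form is equivalent is to record the iterations each version executes. A minimal sketch: `Stmt` here is a hypothetical stand-in that logs its argument, and `originalLoop`, `unrolledLoop` and `checkUnroll` are illustrative names.

```cpp
#include <cassert>
#include <vector>

// Stand-in loop body: records which iteration it ran for.
static void Stmt(std::vector<int> &trace, int i) { trace.push_back(i); }

std::vector<int> originalLoop(int n) {
  std::vector<int> trace;
  for (int i = 0; i < n; i += 1) // original loop
    Stmt(trace, i);
  return trace;
}

// Unroll factor 4, as above; also counts remainder-loop iterations.
std::vector<int> unrolledLoop(int n, int &remainderIters) {
  std::vector<int> trace;
  remainderIters = 0;
  int i = 0;
  for (; i + 3 < n; i += 4) { // unrolled loop
    Stmt(trace, i);
    Stmt(trace, i + 1);
    Stmt(trace, i + 2);
    Stmt(trace, i + 3);
  }
  for (; i < n; i += 1) { // remainder loop
    Stmt(trace, i);
    ++remainderIters;
  }
  return trace;
}

// Usage: same iterations in the same order; n % 4 of them in the remainder.
void checkUnroll(int n) {
  assert(n >= 0);
  int rem = 0;
  std::vector<int> unrolled = unrolledLoop(n, rem);
  assert(unrolled == originalLoop(n));
  assert(rem == n % 4);
}
```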

llvm.loop.unroll.followup_unrolled will set the loop attributes of the unrolled loop. If not specified, the attributes of the original loop without the llvm.loop.unroll.* attributes are copied and llvm.loop.unroll.disable added to it.

llvm.loop.unroll.followup_remainder defines the attributes of the remainder loop. If not specified, the remainder loop will have no attributes. The remainder loop might not be present due to being fully unrolled, in which case this attribute has no effect.

Attributes defined in llvm.loop.unroll.followup_all are added to the unrolled and remainder loops.

To prevent the partially unrolled loop from being unrolled again, it is recommended to add llvm.loop.unroll.disable to llvm.loop.unroll.followup_all. If no follow-up attribute is specified for a generated loop, it is added automatically.

Unroll-And-Jam

Unroll-and-jam uses the following transformation model (here with an unroll factor of 2). Currently, it does not support a fallback version when the transformation is unsafe.

    for (int i = 0; i < n; i+=1) { // original outer loop
      Fore(i);
      for (int j = 0; j < m; j+=1) // original inner loop
        SubLoop(i, j);
      Aft(i);
    }

    int i = 0;
    for (; i + 1 < n; i+=2) { // unrolled outer loop
      Fore(i);
      Fore(i+1);
      for (int j = 0; j < m; j+=1) { // unrolled inner loop
        SubLoop(i, j);
        SubLoop(i+1, j);
      }
      Aft(i);
      Aft(i+1);
    }
    for (; i < n; i+=1) { // remainder outer loop
      Fore(i);
      for (int j = 0; j < m; j+=1) // remainder inner loop
        SubLoop(i, j);
      Aft(i);
    }
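Unlike plain unrolling, unroll-and-jam reorders the inner-loop statements of different outer iterations. The following sketch records every executed statement as a tuple, showing that both versions execute the same statements in a different order. The names `Call`, `Trace`, `originalNest`, `unrollAndJam`, `sameStatements` and `checkUnrollAndJam` are hypothetical, and the sketch does not check whether the reordering is legal, which depends on the dependencies.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// One executed statement: kind ('F' = Fore, 'S' = SubLoop, 'A' = Aft), i, j.
using Call = std::tuple<char, int, int>;
using Trace = std::vector<Call>;

Trace originalNest(int n, int m) {
  Trace t;
  for (int i = 0; i < n; i += 1) { // original outer loop
    t.emplace_back('F', i, 0);
    for (int j = 0; j < m; j += 1) // original inner loop
      t.emplace_back('S', i, j);
    t.emplace_back('A', i, 0);
  }
  return t;
}

// Unroll-and-jam with unroll factor 2, following the model above.
Trace unrollAndJam(int n, int m) {
  Trace t;
  int i = 0;
  for (; i + 1 < n; i += 2) { // unrolled outer loop
    t.emplace_back('F', i, 0);
    t.emplace_back('F', i + 1, 0);
    for (int j = 0; j < m; j += 1) { // unrolled (jammed) inner loop
      t.emplace_back('S', i, j);
      t.emplace_back('S', i + 1, j);
    }
    t.emplace_back('A', i, 0);
    t.emplace_back('A', i + 1, 0);
  }
  for (; i < n; i += 1) { // remainder outer loop
    t.emplace_back('F', i, 0);
    for (int j = 0; j < m; j += 1) // remainder inner loop
      t.emplace_back('S', i, j);
    t.emplace_back('A', i, 0);
  }
  return t;
}

// True if both traces contain the same statements, ignoring order.
bool sameStatements(Trace a, Trace b) {
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}

// Usage: the two versions agree for a given problem size.
void checkUnrollAndJam(int n, int m) {
  assert(sameStatements(originalNest(n, m), unrollAndJam(n, m)));
}
```

For n = m = 2, the jammed version runs SubLoop(0,0), SubLoop(1,0), SubLoop(0,1), SubLoop(1,1), whereas the original runs both SubLoop calls of i = 0 before those of i = 1.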

llvm.loop.unroll_and_jam.followup_outer will set the loop attributes of the unrolled outer loop. If not specified, the attributes of the original outer loop without the llvm.loop.unroll.* attributes are copied and llvm.loop.unroll.disable added to it.

llvm.loop.unroll_and_jam.followup_inner will set the loop attributes of the unrolled inner loop. If not specified, the attributes of the original inner loop are used unchanged.

llvm.loop.unroll_and_jam.followup_remainder_outer sets the loop attributes of the outer remainder loop. If not specified, it will not have any attributes. The remainder loop might not be present due to being fully unrolled.

llvm.loop.unroll_and_jam.followup_remainder_inner sets the loop attributes of the inner remainder loop. If not specified, it will have the attributes of the original inner loop. If the outer remainder loop is unrolled, the inner remainder loop might be present multiple times.

Attributes defined in llvm.loop.unroll_and_jam.followup_all are added to all of the aforementioned output loops.

To prevent the unrolled loop from being unrolled again, it is recommended to add llvm.loop.unroll.disable to llvm.loop.unroll_and_jam.followup_all. It suppresses unroll-and-jam as well as additional inner loop unrolling. If no follow-up attribute is specified for a generated loop, it is added automatically.

Loop Distribution

The LoopDistribution pass tries to separate vectorizable parts of a loop from the non-vectorizable part (which otherwise would make the entire loop non-vectorizable). Conceptually, it transforms a loop such as

    for (int i = 1; i < n; i+=1) { // original loop
      A[i] = i;
      B[i] = 2 + B[i];
      C[i] = 3 + C[i - 1];
    }

into the following code:

    if (rtc) {
      for (int i = 1; i < n; i+=1) // coincident loop
        A[i] = i;
      for (int i = 1; i < n; i+=1) // coincident loop
        B[i] = 2 + B[i];
      for (int i = 1; i < n; i+=1) // sequential loop
        C[i] = 3 + C[i - 1];
    } else {
      for (int i = 1; i < n; i+=1) { // fallback loop
        A[i] = i;
        B[i] = 2 + B[i];
        C[i] = 3 + C[i - 1];
      }
    }

where rtc is a generated runtime check.
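The distributed form can be compared against the original on concrete data. In this sketch, `Array`, `originalLoop`, `distributedLoop` and `checkDistribution` are hypothetical names; `rtc` is a plain parameter standing in for the generated check (presumably a test that the accessed arrays do not overlap), and because separate `std::vector`s never overlap, both branches must reproduce the original result.

```cpp
#include <cassert>
#include <vector>

using Array = std::vector<int>;

void originalLoop(int n, Array &A, Array &B, Array &C) {
  for (int i = 1; i < n; i += 1) { // original loop
    A[i] = i;
    B[i] = 2 + B[i];
    C[i] = 3 + C[i - 1];
  }
}

// Distributed version from the text; `rtc` selects the branch.
void distributedLoop(int n, Array &A, Array &B, Array &C, bool rtc) {
  if (rtc) {
    for (int i = 1; i < n; i += 1) // coincident loop
      A[i] = i;
    for (int i = 1; i < n; i += 1) // coincident loop
      B[i] = 2 + B[i];
    for (int i = 1; i < n; i += 1) // sequential loop
      C[i] = 3 + C[i - 1];
  } else {
    for (int i = 1; i < n; i += 1) { // fallback loop
      A[i] = i;
      B[i] = 2 + B[i];
      C[i] = 3 + C[i - 1];
    }
  }
}

// Usage: with distinct arrays, both branches match the original loop.
void checkDistribution(int n) {
  assert(n >= 0);
  Array A0(n, 0), B0(n, 1), C0(n, 5);
  Array A1 = A0, B1 = B0, C1 = C0, A2 = A0, B2 = B0, C2 = C0;
  originalLoop(n, A0, B0, C0);
  distributedLoop(n, A1, B1, C1, /*rtc=*/true);
  distributedLoop(n, A2, B2, C2, /*rtc=*/false);
  assert(A0 == A1 && B0 == B1 && C0 == C1);
  assert(A0 == A2 && B0 == B2 && C0 == C2);
}
```

Only the third loop carries a dependency (C[i] reads C[i - 1]), which is why it is the sequential one.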

llvm.loop.distribute.followup_coincident sets the loop attributes of all loops without loop-carried dependencies (i.e. vectorizable loops). There might be more than one such loop. If not defined, the loops will inherit the original loop’s attributes.

llvm.loop.distribute.followup_sequential sets the loop attributes of the loop with potentially unsafe dependencies. There should be at most one such loop. If not defined, the loop will inherit the original loop’s attributes.

llvm.loop.distribute.followup_fallback defines the loop attributes for the fallback loop, which is a copy of the original loop for when loop versioning is required. If undefined, the fallback loop inherits all attributes from the original loop.

Attributes defined in llvm.loop.distribute.followup_all are added to all of the aforementioned output loops.

It is recommended to add llvm.loop.disable_nonforced to llvm.loop.distribute.followup_fallback. This prevents the fallback version (which is likely never executed) from being further optimized, which would increase the code size.

Versioning LICM

The pass hoists code out of loops that are only loop-invariant when dynamic conditions apply. For instance, it transforms the loop

    for (int i = 0; i < n; i+=1) // original loop
      A[i] = B[0];

into:

    if (rtc) {
      auto b = B[0];
      for (int i = 0; i < n; i+=1) // versioned loop
        A[i] = b;
    } else {
      for (int i = 0; i < n; i+=1) // unversioned loop
        A[i] = B[0];
    }

The runtime condition (rtc) checks that the array A and the element B[0] do not alias.
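A possible shape for such a check, together with both loop versions, is sketched below. `noAlias`, `originalLoop` and `versionedLoop` are hypothetical names, and expressing the check as a pointer-range test is an assumption about what the generated condition amounts to. `std::less` is used because it provides a total order on pointers, so comparing pointers into unrelated arrays is well-defined.

```cpp
#include <cassert>
#include <functional>

// Hypothetical runtime check `rtc`: true if &B[0] lies outside A[0..n).
bool noAlias(const int *A, int n, const int *B) {
  assert(n >= 0);
  std::less<const int *> lt;
  return lt(B, A) || !lt(B, A + n);
}

void originalLoop(int *A, int n, const int *B) {
  for (int i = 0; i < n; i += 1) // original loop
    A[i] = B[0];
}

// Versioned form from the text, with rtc computed by noAlias().
void versionedLoop(int *A, int n, const int *B) {
  if (noAlias(A, n, B)) {
    auto b = B[0]; // hoisted: no store to A can modify B[0]
    for (int i = 0; i < n; i += 1) // versioned loop
      A[i] = b;
  } else {
    for (int i = 0; i < n; i += 1) // unversioned loop
      A[i] = B[0];
  }
}
```

When B points into A, the check fails and the unversioned loop re-reads B[0] in every iteration, exactly as the original loop does.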

Currently, this transformation does not support followup-attributes.

Loop Interchange

Currently, the LoopInterchange pass does not use any metadata.

Ambiguous Transformation Order

If there are multiple transformations defined, the order in which they are executed depends on the order in LLVM’s pass pipeline, which is subject to change. The default optimization pipeline (anything higher than -O0) has the following order.

When using the legacy pass manager:

  • LoopInterchange (if enabled)

  • SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)

  • VersioningLICM (if enabled)

  • LoopDistribute

  • LoopVectorizer

  • LoopUnrollAndJam (if enabled)

  • LoopUnroll (partial and runtime unrolling)

When using the legacy pass manager with LTO:

  • LoopInterchange (if enabled)

  • SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)

  • LoopVectorizer

  • LoopUnroll (partial and runtime unrolling)

When using the new pass manager:

  • SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)

  • LoopDistribute

  • LoopVectorizer

  • LoopUnrollAndJam (if enabled)

  • LoopUnroll (partial and runtime unrolling)

Leftover Transformations

Forced transformations that have not been applied after the last transformation pass should be reported to the user. The transformation passes themselves cannot be responsible for this reporting because they might not be in the pipeline, there might be multiple passes able to apply a transformation (e.g. LoopInterchange and Polly), or a transformation attribute may be ‘hidden’ inside another pass’s followup attribute.

The pass -transform-warning (WarnMissedTransformationsPass) emits such warnings. It should be placed after the last transformation pass.

The current pass pipeline has a fixed order in which transformation passes are executed. A transformation can be requested in the followup of a pass that is executed after the pass implementing it, and is thus left over. For instance, a loop nest cannot be distributed and then interchanged with the current pass pipeline. The loop distribution will execute, but no loop interchange pass follows it, so any loop interchange metadata will be ignored. The -transform-warning pass should emit a warning in this case.
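Why a fixed order leaves such requests behind can be modeled by walking the pipeline once and advancing through the requested chain. This is a simplified sketch: `kLegacyPipeline`, `leftoverSteps` and `checkDistributeThenInterchange` are hypothetical, pass names are plain strings, and every step is assumed to succeed when its pass runs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Legacy pass manager order from the list earlier in this section.
const std::vector<std::string> kLegacyPipeline = {
    "LoopInterchange", "LoopFullUnroll",   "VersioningLICM", "LoopDistribute",
    "LoopVectorizer",  "LoopUnrollAndJam", "LoopUnroll"};

// Each pass runs once, in order. The next step of a followup chain can only
// be applied by a pass that runs after the previous step; whatever is left
// when the pipeline ends is what -transform-warning would report.
std::vector<std::string> leftoverSteps(const std::vector<std::string> &pipeline,
                                       const std::vector<std::string> &chain) {
  std::size_t next = 0; // index of the next chain step to apply
  for (const std::string &pass : pipeline)
    if (next < chain.size() && pass == chain[next])
      ++next;
  return std::vector<std::string>(
      chain.begin() + static_cast<std::ptrdiff_t>(next), chain.end());
}

// Usage: distribution followed by interchange (in distribution's followup).
void checkDistributeThenInterchange() {
  std::vector<std::string> left =
      leftoverSteps(kLegacyPipeline, {"LoopDistribute", "LoopInterchange"});
  assert(left.size() == 1 && left[0] == "LoopInterchange");
}
```

The interchange step is left over because LoopInterchange runs before LoopDistribute and is never visited again.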

Future versions of LLVM may fix this by executing transformations using a dynamic ordering.