How to Update Debug Info: A Guide for LLVM Pass Authors

Introduction

Certain kinds of code transformations can inadvertently result in a loss of debug info, or worse, make debug info misrepresent the state of a program. Debug info availability is also essential for SamplePGO.

This document specifies how to correctly update debug info in various kinds of code transformations, and offers suggestions for how to create targeted debug info tests for arbitrary transformations.

For more on the philosophy behind LLVM debugging information, see Source Level Debugging with LLVM.

Rules for updating debug locations

When to preserve an instruction location

A transformation should preserve the debug location of an instruction if the instruction either remains in its basic block, or if its basic block is folded into a predecessor that branches unconditionally. The APIs to use are IRBuilder, or Instruction::setDebugLoc.

The purpose of this rule is to ensure that common block-local optimizations preserve the ability to set breakpoints on source locations corresponding to the instructions they touch. Debugging, crash logs, and SamplePGO accuracy would be severely impacted if that ability were lost.

Examples of transformations that should follow this rule include:

Instruction scheduling. Block-local instruction reordering should not drop source locations, even though this may lead to jumpy single-stepping behavior.
Simple jump threading. For example, if block B1 unconditionally jumps to B2, and is its unique predecessor, instructions from B2 can be hoisted into B1. Source locations from B2 should be preserved.
Peephole optimizations that replace or expand an instruction, like (add X X) => (shl X 1). The location of the shl instruction should be the same as the location of the add instruction.
Tail duplication. For example, if blocks B1 and B2 both unconditionally branch to B3 and B3 can be folded into its predecessors, source locations from B3 should be preserved.

Examples of transformations for which this rule does not apply include:

LICM. E.g., if an instruction is moved from the loop body to the preheader, the rule for dropping locations applies.

In addition to the rule above, a transformation should also preserve the debug location of an instruction that is moved between basic blocks, if the destination block already contains an instruction with an identical debug location.

Examples of transformations that should follow this rule include:

Moving instructions between basic blocks. For example, if instruction I1 in BB1 is moved before I2 in BB2, the source location of I1 can be preserved if it has the same source location as I2.
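The two preservation conditions in this section can be read as a single predicate. The following is a minimal sketch in plain C++ that does not use any LLVM API; the MoveFacts struct and the shouldPreserveLocation function are hypothetical names introduced only for illustration.

```cpp
#include <cassert>

// Hypothetical summary of what a transformation did to one instruction.
struct MoveFacts {
  bool StaysInBlock;              // the instruction remains in its basic block
  bool BlockFoldedIntoUncondPred; // its block was folded into a predecessor
                                  // that branches unconditionally
  bool DestHasIdenticalLocation;  // the destination block already contains an
                                  // instruction with an identical location
};

// Returns true when one of the preservation rules from this section applies.
bool shouldPreserveLocation(const MoveFacts &F) {
  if (F.StaysInBlock || F.BlockFoldedIntoUncondPred)
    return true;
  return F.DestHasIdenticalLocation;
}

// Examples taken from the text above.
void checkExamples() {
  // Block-local instruction scheduling: the instruction stays in its block.
  assert(shouldPreserveLocation({true, false, false}));
  // I1 moved from BB1 to just before I2 in BB2, with the same location as I2.
  assert(shouldPreserveLocation({false, false, true}));
  // LICM hoist into the preheader with no matching location there.
  assert(!shouldPreserveLocation({false, false, false}));
}
```

When the predicate returns false, the merging and dropping rules described in the following sections decide what happens to the location.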

When to merge instruction locations

A transformation should merge instruction locations if it replaces multiple instructions with one or more new instructions, and the new instruction(s) produce the output of more than one of the original instructions. The API to use is Instruction::applyMergedLocation. For each new instruction I, its new location should be a merge of the locations of all instructions whose output is produced by I. Typically, this includes any instruction being RAUWed by a new instruction, and excludes any instruction that only produces an intermediate value used by the RAUWed instruction.

The purpose of this rule is a) to ensure that the single merged instruction has a location with an accurate scope attached, and b) to prevent misleading single-stepping (or breakpoint) behavior. Often, merged instructions are memory accesses which can trap: having an accurate scope attached greatly assists in crash triage by identifying the (possibly inlined) function where the bad memory access occurred.

To maintain distinct source locations for SamplePGO, it is often beneficial to retain an arbitrary but deterministic location instead of discarding line and column information as part of merging. In particular, loss of location information for calls inhibits optimizations such as indirect call promotion. This behavior can be optionally enabled until support for accurately representing merged instructions in the line table is implemented.

Examples of transformations that should follow this rule include:

Hoisting identical instructions from all successors of a conditional branch or sinking those from all paths to a postdominating block. For example, merging identical loads/stores which occur on both sides of a CFG diamond (see the MergedLoadStoreMotion pass). For each group of identical instructions being hoisted/sunk, the merge of all their locations should be applied to the merged instruction.
Merging identical loop-invariant stores (see the LICM utility llvm::promoteLoopAccessesToScalars).
Scalar instructions being combined into a vector instruction, like (add A1, B1), (add A2, B2) => (add (A1, A2), (B1, B2)). As the new vector add computes the result of both original add instructions simultaneously, it should use a merge of the two locations. Similarly, if prior optimizations have already produced vectors (A1, A2) and (B2, B1), then we might create a (shufflevector (1, 0), (B2, B1)) instruction to produce (B1, B2) for the vector add; in this case we’ve created two instructions to replace the original adds, so both new instructions should use the merged location.

Examples of transformations for which this rule does not apply include:

Block-local peepholes which delete redundant instructions, like (sext (zext i8 %x to i16) to i32) => (zext i8 %x to i32). The inner zext is modified but remains in its block, so the rule for preserving locations should apply.
Peephole optimizations which combine multiple instructions together, like (add (mul A B) C) => llvm.fma.f32(A, B, C). Note that the result of the mul no longer appears in the program, while the result of the add is now produced by the fma, so the add’s location should be used.
Converting an if-then-else CFG diamond into a select. Preserving the debug locations of speculated instructions can make it seem like a condition is true when it’s not (or vice versa), which leads to a confusing single-stepping experience. The rule for dropping locations should apply here.
Hoisting/sinking that would make a location reachable when it previously wasn’t. Consider hoisting two identical instructions with the same location from first two cases of a switch that has three cases. Merging their locations would make the location from the first two cases reachable when the third case is taken. The rule for dropping locations applies.
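To make the effect of a merge concrete, here is a toy model in plain C++. It is not LLVM's implementation: ToyLoc and mergeLocations are hypothetical names, and the policy shown (keep identical locations, otherwise discard line and column and keep the nearest common enclosing scope) is a simplifying assumption based on the description above of a merged location that loses line and column information but retains an accurate scope.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a debug location; this is not LLVM's DILocation.
struct ToyLoc {
  unsigned Line;
  unsigned Col;
  std::vector<std::string> ScopeChain; // outermost scope first
};

// Assumed merge policy: identical locations are kept as-is. Otherwise the
// line and column are discarded (set to 0) and the scope becomes the longest
// common prefix of the two scope chains, i.e. the nearest common scope.
ToyLoc mergeLocations(const ToyLoc &A, const ToyLoc &B) {
  assert(!A.ScopeChain.empty() && !B.ScopeChain.empty() &&
         "every location is expected to have a scope");
  if (A.Line == B.Line && A.Col == B.Col && A.ScopeChain == B.ScopeChain)
    return A;

  ToyLoc Merged{0, 0, {}};
  std::size_t N = std::min(A.ScopeChain.size(), B.ScopeChain.size());
  for (std::size_t I = 0; I < N && A.ScopeChain[I] == B.ScopeChain[I]; ++I)
    Merged.ScopeChain.push_back(A.ScopeChain[I]);
  return Merged;
}
```

For two identical loads on either side of a CFG diamond in function f, with scope chains {"f", "then"} and {"f", "else"}, this model yields a line 0 location whose scope is f, which is the kind of result that keeps crash triage pointing at the right function.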

When to drop an instruction location

A transformation should drop debug locations if the rules for preserving and merging debug locations do not apply. The API to use is Instruction::dropLocation().

The purpose of this rule is to prevent erratic or misleading single-stepping behavior in situations in which an instruction has no clear, unambiguous relationship to a source location.

To handle an instruction without a location, the DWARF generator defaults to allowing the last-set location after a label to cascade forward, or to setting a line 0 location with viable scope information if no previous location is available.

See the discussion in the section about merging locations for examples of when the rule for dropping locations applies.
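The default handling of location-less instructions described above can be illustrated with a small model. This is a sketch in plain C++ under simplifying assumptions, not the DWARF generator itself: MaybeLine and effectiveLines are hypothetical names, and the input is taken to be the run of instructions that follows a single label.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record for one instruction: HasLine is false when the
// instruction's location was dropped.
struct MaybeLine {
  bool HasLine;
  unsigned Line;
};

// Computes the line each instruction would be attributed to: the last-set
// line cascades forward, and line 0 is used if nothing has been set yet.
std::vector<unsigned> effectiveLines(const std::vector<MaybeLine> &Insts) {
  std::vector<unsigned> Result;
  bool SeenLine = false; // has any location been set since the label?
  unsigned LastLine = 0;
  for (const MaybeLine &I : Insts) {
    if (I.HasLine) {
      assert(I.Line != 0 && "line 0 is reserved for 'no location' here");
      SeenLine = true;
      LastLine = I.Line;
    }
    Result.push_back(SeenLine ? LastLine : 0u);
  }
  return Result;
}
```

In this model, an instruction whose location was dropped at the start of a block is attributed to line 0, while one dropped after an instruction on line 5 is attributed to line 5.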

Setting locations for new instructions

Whenever a new instruction is created and there is no suitable location for that instruction, that instruction should be annotated accordingly. There are a set of special DebugLoc values that can be set on an instruction to annotate the reason that it does not have a valid location. These are as follows:

DebugLoc::getCompilerGenerated(): This indicates that the instruction is a compiler-generated instruction, i.e. it is not associated with any user source code.
DebugLoc::getDropped(): This indicates that the instruction has intentionally had its source location removed, according to the rules for dropping locations; this is set automatically by Instruction::dropLocation().
DebugLoc::getUnknown(): This indicates that the instruction does not have a known or currently knowable source location, e.g. that it is infeasible to determine the correct source location, or that the source location is ambiguous in a way that LLVM cannot currently represent.
DebugLoc::getTemporary(): This is used for instructions that we don’t expect to be emitted (e.g. UnreachableInst), and so should not need a valid location; if we ever try to emit a temporary location into an object/asm file, this indicates that something has gone wrong.

Where applicable, these should be used instead of leaving an instruction without an assigned location or explicitly setting the location as DebugLoc(). Ordinarily these special locations are identical to an absent location, but LLVM built with coverage-tracking (-DLLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING="COVERAGE") will keep track of these special locations in order to detect unintentionally-missing locations; for this reason, the most important rule is to not apply any of these if it isn’t clear which, if any, is appropriate - an absent location can be detected and fixed, while an incorrectly annotated instruction is much harder to detect. On the other hand, if any of these clearly apply, then they should be used to prevent false positives from being flagged up.
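The distinction between an absent location and an annotated one can be sketched as follows. This is a toy model in plain C++, not LLVM code: the LocKind enum and the helper functions are hypothetical, and only the four annotated kinds correspond to the DebugLoc values listed above.

```cpp
#include <cassert>
#include <vector>

// Toy classification of an instruction's location. Real and Absent are
// hypothetical additions; the other four mirror the annotations listed above.
enum class LocKind {
  Real,
  Absent,
  CompilerGenerated,
  Dropped,
  Unknown,
  Temporary
};

// In a coverage-tracking build, only a location that is absent without a
// recorded reason is treated as a potential bug.
bool isReportedAsMissing(LocKind K) { return K == LocKind::Absent; }

// A temporary location that reaches the object/asm file signals a bug.
bool isEmissionBug(LocKind K) { return K == LocKind::Temporary; }

unsigned countReported(const std::vector<LocKind> &Insts) {
  unsigned N = 0;
  for (LocKind K : Insts)
    if (isReportedAsMissing(K))
      ++N;
  return N;
}

// Example: of three instructions, only the unannotated one is reported.
void checkExample() {
  assert(countReported({LocKind::Real, LocKind::Dropped, LocKind::Absent}) == 1);
}
```

The model also shows why a wrong annotation is costly: an instruction marked Dropped in error is never reported, so the mistake stays hidden.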

Rules for updating debug values

Deleting an IR-level Instruction

When an Instruction is deleted, its debug uses change to undef. This is a loss of debug info: the value of one or more source variables becomes unavailable, starting with the #dbg_value(undef, ...). When there is no way to reconstitute the value of the lost instruction, this is the best possible outcome. However, it’s often possible to do better:

If the dying instruction can be RAUW’d, do so. The Value::replaceAllUsesWith API transparently updates debug uses of the dying instruction to point to the replacement value.
If the dying instruction cannot be RAUW’d, call llvm::salvageDebugInfo on it. This makes a best-effort attempt to rewrite debug uses of the dying instruction by describing its effect as a DIExpression.
If one of the operands of a dying instruction would become trivially dead, use llvm::replaceAllDbgUsesWith to rewrite the debug uses of that operand. Consider the following example function:

    define i16 @foo(i16 %a) {
      %b = sext i16 %a to i32
      %c = and i32 %b, 15
        #dbg_value(i32 %c, ...)
      %d = trunc i32 %c to i16
      ret i16 %d
    }

Now, here’s what happens after the unnecessary truncation instruction %d is replaced with a simplified instruction:

    define i16 @foo(i16 %a) {
        #dbg_value(i32 undef, ...)
      %simplified = and i16 %a, 15
      ret i16 %simplified
    }

Note that after deleting %d, all uses of its operand %c become trivially dead. The debug use which used to point to %c is now undef, and debug info is needlessly lost.

To solve this problem, do:

    llvm::replaceAllDbgUsesWith(%c, theSimplifiedAndInstruction, ...)

This results in better debug info because the debug use of %c is preserved:

    define i16 @foo(i16 %a) {
      %simplified = and i16 %a, 15
        #dbg_value(i16 %simplified, ...)
      ret i16 %simplified
    }

You may have noticed that %simplified is narrower than %c: this is not a problem, because llvm::replaceAllDbgUsesWith takes care of inserting the necessary conversion operations into the DIExpressions of updated debug uses.
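The claim that the narrower value still describes the variable can be checked with ordinary integer arithmetic. The sketch below models the two IR sequences from the example in plain C++ and verifies, for every 16-bit input, that widening %simplified back to 32 bits reproduces %c. It does not use LLVM and says nothing about which conversion operations LLVM actually emits; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models: %b = sext i16 %a to i32 ; %c = and i32 %b, 15
std::uint32_t originalC(std::int16_t A) {
  std::int32_t B = A; // sign extension to 32 bits
  return static_cast<std::uint32_t>(B) & 15u;
}

// Models: %simplified = and i16 %a, 15
std::uint16_t simplified(std::int16_t A) {
  return static_cast<std::uint16_t>(static_cast<std::uint16_t>(A) & 15u);
}

// Widens the 16-bit result back to the 32-bit width of %c.
std::uint32_t widen(std::uint16_t V) { return V; }

// Exhaustively checks that the widened %simplified equals %c.
void verifyAllInputs() {
  for (std::int32_t V = INT16_MIN; V <= INT16_MAX; ++V) {
    std::int16_t A = static_cast<std::int16_t>(V);
    assert(widen(simplified(A)) == originalC(A));
  }
}
```

Because the mask leaves only the low four bits set, the top bit of the 16-bit result is always zero, so sign extension and zero extension agree in this particular example.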

Deleting a MIR-level MachineInstr

TODO

Rules for updating `DIAssignID` Attachments

DIAssignID metadata attachments are used by Assignment Tracking, which is currently an experimental debug mode.

See Debug Info Assignment Tracking for how to update them and for more info on Assignment Tracking.

How to automatically convert tests into debug info tests

Mutation testing for IR-level transformations

An IR test case for a transformation can, in many cases, be automatically mutated to test debug info handling within that transformation. This is a simple way to test for proper debug info handling.

The `debugify` utility pass

The debugify testing utility is just a pair of passes: debugify and check-debugify.

The first applies synthetic debug information to every instruction of the module, and the second checks that this DI is still available after an optimization has occurred, reporting any errors/warnings while doing so.

The instructions are assigned sequentially increasing line locations, and are immediately used by debug value records everywhere possible.

For example, here is a module before:

    define void @f(i32* %x) {
    entry:
      %x.addr = alloca i32*, align 8
      store i32* %x, i32** %x.addr, align 8
      %0 = load i32*, i32** %x.addr, align 8
      store i32 10, i32* %0, align 4
      ret void
    }

and after running opt -debugify:

    define void @f(i32* %x) !dbg !6 {
    entry:
      %x.addr = alloca i32*, align 8, !dbg !12
        #dbg_value(i32** %x.addr, !9, !DIExpression(), !12)
      store i32* %x, i32** %x.addr, align 8, !dbg !13
      %0 = load i32*, i32** %x.addr, align 8, !dbg !14
        #dbg_value(i32* %0, !11, !DIExpression(), !14)
      store i32 10, i32* %0, align 4, !dbg !15
      ret void, !dbg !16
    }

    !llvm.dbg.cu = !{!0}
    !llvm.debugify = !{!3, !4}
    !llvm.module.flags = !{!5}

    !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
    !1 = !DIFile(filename: "debugify-sample.ll", directory: "/")
    !2 = !{}
    !3 = !{i32 5}
    !4 = !{i32 2}
    !5 = !{i32 2, !"Debug Info Version", i32 3}
    !6 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !1, line: 1, type: !7, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !0, retainedNodes: !8)
    !7 = !DISubroutineType(types: !2)
    !8 = !{!9, !11}
    !9 = !DILocalVariable(name: "1", scope: !6, file: !1, line: 1, type: !10)
    !10 = !DIBasicType(name: "ty64", size: 64, encoding: DW_ATE_unsigned)
    !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10)
    !12 = !DILocation(line: 1, column: 1, scope: !6)
    !13 = !DILocation(line: 2, column: 1, scope: !6)
    !14 = !DILocation(line: 3, column: 1, scope: !6)
    !15 = !DILocation(line: 4, column: 1, scope: !6)
    !16 = !DILocation(line: 5, column: 1, scope: !6)
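The numbering scheme visible in this output can be mimicked with a small model. The sketch below is plain C++ and is not the debugify implementation: ToyInst, toyDebugify, and missingLines are hypothetical names. It assigns line 1, 2, 3, ... to each instruction in order, attaches a debug value to each instruction that produces a value, and then reports which lines have disappeared after a transformation, which is roughly the kind of check the second pass performs. For the function above (five instructions, of which the alloca and the load produce values) the model counts 5 lines and 2 variables, which appears to match the i32 5 and i32 2 entries under !llvm.debugify.

```cpp
#include <cassert>
#include <vector>

// Toy instruction; Line 0 means no location has been assigned yet.
struct ToyInst {
  bool ProducesValue; // false for stores, ret void, and similar
  unsigned Line;
  bool HasDbgValue;
};

struct DebugifyStats {
  unsigned NumLines;
  unsigned NumVars;
};

// Assigns sequentially increasing lines and one debug value per
// value-producing instruction.
DebugifyStats toyDebugify(std::vector<ToyInst> &Insts) {
  DebugifyStats Stats{0, 0};
  for (ToyInst &I : Insts) {
    I.Line = ++Stats.NumLines;
    if (I.ProducesValue) {
      I.HasDbgValue = true;
      ++Stats.NumVars;
    }
  }
  return Stats;
}

// Returns the lines in [1, NumLines] that no remaining instruction carries.
std::vector<unsigned> missingLines(const std::vector<ToyInst> &Insts,
                                   unsigned NumLines) {
  std::vector<bool> Seen(NumLines + 1, false);
  for (const ToyInst &I : Insts) {
    assert(I.Line <= NumLines && "line was not assigned by toyDebugify");
    Seen[I.Line] = true;
  }
  std::vector<unsigned> Missing;
  for (unsigned L = 1; L <= NumLines; ++L)
    if (!Seen[L])
      Missing.push_back(L);
  return Missing;
}
```

If a hypothetical pass deleted the first store, missingLines would report line 2 for this function.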

Using `debugify`

A simple way to use debugify is as follows:

    $ opt -debugify -pass-to-test -check-debugify sample.ll

This will inject synthetic DI into sample.ll, run the pass-to-test, and then check for missing DI. The -check-debugify step can of course be omitted in favor of more customizable FileCheck directives.

Some other ways to run debugify are available:

    # Same as the above example.
    $ opt -enable-debugify -pass-to-test sample.ll

    # Suppresses verbose debugify output.
    $ opt -enable-debugify -debugify-quiet -pass-to-test sample.ll

    # Prepend -debugify before and append -check-debugify -strip after
    # each pass on the pipeline (similar to -verify-each).
    $ opt -debugify-each -O2 sample.ll

In order for check-debugify to work, the DI must be coming from debugify. Thus, modules with existing DI will be skipped.

debugify can be used to test a backend, e.g.:

    $ opt -debugify < sample.ll | llc -o -

There is also a MIR-level debugify pass that can be run before each backend pass; see Mutation testing for MIR-level transformations.

`debugify` in regression tests

The output of the debugify pass must be stable enough to use in regression tests. Changes to this pass are not allowed to break existing tests.

Note

Regression tests must be robust. Avoid hardcoding line/variable numbers in check lines. In cases where this can’t be avoided (say, if a test wouldn’t be precise enough), moving the test to its own file is preferred.

Using Coverage Tracking to remove false positives

As described above, there are valid reasons for instructions to not have source locations. Therefore, when detecting dropped or not-generated source locations, it may be preferable to avoid detecting cases where the missing source location is intentional. For this, you can use the “coverage tracking” feature in LLVM to prevent these from appearing in the debugify output. This is enabled in a build of LLVM by setting the CMake flag -DLLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING=COVERAGE. When this has been set, LLVM will enable runtime tracking of DebugLoc annotations, allowing debugify to ignore instructions that have an explicitly recorded reason given for not having a source location.

For triaging source location bugs detected with debugify, you may find it helpful to instead set the CMake flag to enable “origin tracking”, -DLLVM_ENABLE_DEBUGLOC_COVERAGE_TRACKING=COVERAGE_AND_ORIGIN. This flag adds more detail to debugify’s output, by including one or more stacktraces with every missing source location, capturing the point at which the empty source location was created, and every point at which it was copied to an instruction, making it trivial in most cases to find the origin of the underlying bug. If using origin tracking, it is recommended to also build LLVM with debug info enabled, so that the stacktrace can be accurately symbolized.

Note

The coverage tracking feature has been designed primarily for use with the original debug info preservation mode of debugify, and so may not be reliable in other settings. When using this mode, the stacktraces produced by the COVERAGE_AND_ORIGIN setting will be printed in an easy-to-read format as part of the reports generated by the llvm-original-di-preservation.py script.

Test original debug info preservation in optimizations

In addition to automatically generating debug info, the checks provided by the debugify utility pass can also be used to test the preservation of pre-existing debug info metadata. It could be run as follows:

    # Run the pass by checking original Debug Info preservation.
    $ opt -verify-debuginfo-preserve -pass-to-test sample.ll

    # Check the preservation of original Debug Info after each pass.
    $ opt -verify-each-debuginfo-preserve -O2 sample.ll

Limit the number of observed functions to speed up the analysis:

    # Test up to 100 functions (per compile unit) per pass.
    $ opt -verify-each-debuginfo-preserve -O2 -debugify-func-limit=100 sample.ll

Note that running -verify-each-debuginfo-preserve on large projects can be very time-consuming. We therefore suggest using -debugify-func-limit with a suitable limit to prevent extremely long builds.

Furthermore, there is a way to export the issues that have been found into a JSON file as follows:

    $ opt -verify-debuginfo-preserve -verify-di-preserve-export=sample.json -pass-to-test sample.ll

and then use the llvm/utils/llvm-original-di-preservation.py script to generate an HTML page with the issues reported in a more human-readable form as follows:

    $ llvm-original-di-preservation.py sample.json sample.html

Testing of original debug info preservation can be invoked from the front-end level as follows:

    # Test each pass.
    $ clang -Xclang -fverify-debuginfo-preserve -g -O2 sample.c

    # Test each pass and export the issues report into the JSON file.
    $ clang -Xclang -fverify-debuginfo-preserve -Xclang -fverify-debuginfo-preserve-export=sample.json -g -O2 sample.c

Note that there are some known false positives in source location and debug record checking; these will be addressed in future work.

Mutation testing for MIR-level transformations

A variant of the debugify utility described in Mutation testing for IR-level transformations can be used for MIR-level transformations as well: much like the IR-level pass, mir-debugify inserts sequentially increasing line locations to each MachineInstr in a Module. The MIR-level mir-check-debugify pass is similar to the IR-level check-debugify pass.

For example, here is a snippet before:

    name:            test
    body:             |
      bb.1 (%ir-block.0):
        %0:_(s32) = IMPLICIT_DEF
        %1:_(s32) = IMPLICIT_DEF
        %2:_(s32) = G_CONSTANT i32 2
        %3:_(s32) = G_ADD %0, %2
        %4:_(s32) = G_SUB %3, %1

and after running llc -run-pass=mir-debugify:

    name:            test
    body:             |
      bb.0 (%ir-block.0):
        %0:_(s32) = IMPLICIT_DEF debug-location !12
        DBG_VALUE %0(s32), $noreg, !9, !DIExpression(), debug-location !12
        %1:_(s32) = IMPLICIT_DEF debug-location !13
        DBG_VALUE %1(s32), $noreg, !11, !DIExpression(), debug-location !13
        %2:_(s32) = G_CONSTANT i32 2, debug-location !14
        DBG_VALUE %2(s32), $noreg, !9, !DIExpression(), debug-location !14
        %3:_(s32) = G_ADD %0, %2, debug-location !DILocation(line: 4, column: 1, scope: !6)
        DBG_VALUE %3(s32), $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !6)
        %4:_(s32) = G_SUB %3, %1, debug-location !DILocation(line: 5, column: 1, scope: !6)
        DBG_VALUE %4(s32), $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !6)

By default, mir-debugify inserts DBG_VALUE instructions everywhere it is legal to do so. In particular, every (non-PHI) machine instruction that defines a register must be followed by a DBG_VALUE use of that def. If an instruction does not define a register, but can be followed by a debug inst, MIRDebugify inserts a DBG_VALUE that references a constant. Insertion of DBG_VALUE’s can be disabled by setting -debugify-level=locations.

To run MIRDebugify once, simply insert mir-debugify into your llc invocation, like:

    # Before some other pass.
    $ llc -run-pass=mir-debugify,other-pass ...

    # After some other pass.
    $ llc -run-pass=other-pass,mir-debugify ...

To run MIRDebugify before each pass in a pipeline, use -debugify-and-strip-all-safe. This can be combined with -start-before and -start-after. For example:

    $ llc -debugify-and-strip-all-safe -run-pass=... <other llc args>
    $ llc -debugify-and-strip-all-safe -O1 <other llc args>

If you want to check it after each pass in a pipeline, use -debugify-check-and-strip-all-safe. This can also be combined with -start-before and -start-after. For example:

    $ llc -debugify-check-and-strip-all-safe -run-pass=... <other llc args>
    $ llc -debugify-check-and-strip-all-safe -O1 <other llc args>

To check all debug info from a test, use mir-check-debugify, like:

    $ llc -run-pass=mir-debugify,other-pass,mir-check-debugify

To strip out all debug info from a test, use mir-strip-debug, like:

    $ llc -run-pass=mir-debugify,other-pass,mir-strip-debug

It can be useful to combine mir-debugify, mir-check-debugify and/or mir-strip-debug to identify backend transformations which break in the presence of debug info. For example, to run the AArch64 backend tests with all normal passes “sandwiched” in between MIRDebugify and MIRStripDebugify mutation passes, run:

    $ llvm-lit test/CodeGen/AArch64 -Dllc="llc -debugify-and-strip-all-safe"

Using LostDebugLocObserver

TODO
