7. ORC unwinder
7.1. Overview
The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is similar in concept to a DWARF unwinder. The difference is that the format of the ORC data is much simpler than DWARF, which in turn allows the ORC unwinder to be much simpler and faster.
The ORC data consists of unwind tables which are generated by objtool. They contain out-of-band data which is used by the in-kernel ORC unwinder. Objtool generates the ORC data by first doing compile-time stack metadata validation (CONFIG_STACK_VALIDATION). After analyzing all the code paths of a .o file, it determines information about the stack state at each instruction address in the file and outputs that information to the .orc_unwind and .orc_unwind_ip sections.
The per-object ORC sections are combined at link time and are sorted and post-processed at boot time. The unwinder uses the resulting data to correlate instruction addresses with their stack states at run time.
7.2. ORC vs frame pointers
With frame pointers enabled, GCC adds instrumentation code to every function in the kernel. The kernel’s .text size increases by about 3.2%, resulting in a broad kernel-wide slowdown. Measurements by Mel Gorman [1] have shown a slowdown of 5-10% for some workloads.
In contrast, the ORC unwinder has no effect on text size or runtime performance, because the debuginfo is out of band. So if you disable frame pointers and enable the ORC unwinder, you get a nice performance improvement across the board, and still have reliable stack traces.
Ingo Molnar says:
“Note that it’s not just a performance improvement, but also an instruction cache locality improvement: 3.2% .text savings almost directly transform into a similarly sized reduction in cache footprint. That can transform to even higher speedups for workloads whose cache locality is borderline.”
Another benefit of ORC compared to frame pointers is that it can reliably unwind across interrupts and exceptions. Frame pointer based unwinds can sometimes skip the caller of the interrupted function, if it was a leaf function or if the interrupt hit before the frame pointer was saved.
The main disadvantage of the ORC unwinder compared to frame pointers is that it needs more memory to store the ORC unwind tables: roughly 2-4MB depending on the kernel config.
7.3. ORC vs DWARF
ORC debuginfo’s advantage over DWARF itself is that it’s much simpler. It gets rid of the complex DWARF CFI state machine and also gets rid of the tracking of unnecessary registers. This allows the unwinder to be much simpler, meaning fewer bugs, which is especially important for mission critical oops code.
The simpler debuginfo format also enables the unwinder to be much faster than DWARF, which is important for perf and lockdep. In a basic performance test by Jiri Slaby [2], the ORC unwinder was about 20x faster than an out-of-tree DWARF unwinder. (Note: That measurement was taken before some performance tweaks were added, which doubled performance, so the speedup over DWARF may be closer to 40x.)
The ORC data format does have a few downsides compared to DWARF. ORC unwind tables take up ~50% more RAM (+1.3MB on an x86 defconfig kernel) than DWARF-based eh_frame tables.
Another potential downside is that, as GCC evolves, it’s conceivable that the ORC data may end up being too simple to describe the state of the stack for certain optimizations. But IMO this is unlikely because GCC saves the frame pointer for any unusual stack adjustments it does, so I suspect we’ll really only ever need to keep track of the stack pointer and the frame pointer between call frames. But even if we do end up having to track all the registers DWARF tracks, at least we will still be able to control the format, e.g. no complex state machines.
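To illustrate why tracking only the stack pointer and the frame pointer can be enough to walk from one call frame to the next, here is a minimal user-space sketch of a single unwind step. The struct layouts, the enum, and the function names are hypothetical simplifications for illustration and are not the kernel's actual struct orc_entry or unwinder code. It assumes an x86-64-style layout in which the return address is stored in the word just below the caller's stack pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ORC-style entry (not the kernel's layout). */
enum demo_base_reg { DEMO_BASE_SP, DEMO_BASE_BP };

struct demo_unwind_entry {
	int16_t sp_offset;           /* caller SP = base register + sp_offset */
	int16_t bp_offset;           /* saved BP is at caller SP + bp_offset */
	enum demo_base_reg sp_reg;   /* register the caller SP is derived from */
	int bp_saved;                /* nonzero if the caller's BP is on the stack */
};

struct demo_regs {
	uint64_t ip, sp, bp;
};

/* A simulated stack: nwords 64-bit words starting at address base. */
struct demo_stack {
	const uint64_t *words;
	uint64_t base;
	size_t nwords;
};

/* Read the 64-bit word stored at addr in the simulated stack. */
static uint64_t demo_read_word(const struct demo_stack *s, uint64_t addr)
{
	assert(addr >= s->base && (addr - s->base) % 8 == 0);
	size_t idx = (size_t)((addr - s->base) / 8);
	assert(idx < s->nwords);
	return s->words[idx];
}

/* Compute the caller's registers from the current ones and one entry. */
static struct demo_regs demo_unwind_step(const struct demo_unwind_entry *e,
					 struct demo_regs cur,
					 const struct demo_stack *s)
{
	struct demo_regs prev;
	uint64_t base = (e->sp_reg == DEMO_BASE_SP) ? cur.sp : cur.bp;
	/* Unsigned wraparound makes negative offsets subtract correctly. */
	uint64_t caller_sp = base + (uint64_t)(int64_t)e->sp_offset;

	prev.sp = caller_sp;
	/* Assumption: return address sits just below the caller's SP. */
	prev.ip = demo_read_word(s, caller_sp - 8);
	prev.bp = e->bp_saved
		? demo_read_word(s, caller_sp + (uint64_t)(int64_t)e->bp_offset)
		: cur.bp;
	return prev;
}
```

For a function that has already pushed and set up its frame pointer, an entry of { sp_offset = 16, bp_offset = -16, sp_reg = DEMO_BASE_BP, bp_saved = 1 } would describe the frame: no other registers are consulted.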
7.4. ORC unwind table generation
The ORC data is generated by objtool. With the existing compile-time stack metadata validation feature, objtool already follows all code paths, and so it already has all the information it needs to be able to generate ORC data from scratch. So it’s an easy step to go from stack validation to ORC data generation.
It should be possible to instead generate the ORC data with a simple tool which converts DWARF to ORC data. However, such a solution would be incomplete due to the kernel’s extensive use of asm, inline asm, and special sections like exception tables.
That could be rectified by manually annotating those special code paths using GNU assembler .cfi annotations in .S files, and homegrown annotations for inline asm in .c files. But asm annotations were tried in the past and were found to be unmaintainable. They were often incorrect/incomplete and made the code harder to read and keep updated. And based on looking at glibc code, annotating inline asm in .c files might be even worse.
Objtool still needs a few annotations, but only in code which does unusual things to the stack like entry code. And even then, far fewer annotations are needed than what DWARF would need, so they’re much more maintainable than DWARF CFI annotations.
So the advantages of using objtool to generate ORC data are that it gives more accurate debuginfo, with very few annotations. It also insulates the kernel from toolchain bugs, which can be very painful to deal with in the kernel since we often have to work around issues in older versions of the toolchain for years.
The downside is that the unwinder now becomes dependent on objtool’s ability to reverse engineer GCC code flow. If GCC optimizations become too complicated for objtool to follow, the ORC data generation might stop working or become incomplete. (It’s worth noting that livepatch already has such a dependency on objtool’s ability to follow GCC code flow.)
If newer versions of GCC come up with some optimizations which break objtool, we may need to revisit the current implementation. Some possible solutions would be asking GCC to make the optimizations more palatable, or having objtool use DWARF as an additional input, or creating a GCC plugin to assist objtool with its analysis. But for now, objtool follows GCC code quite well.
7.5. Unwinder implementation details
Objtool generates the ORC data by integrating with the compile-time stack metadata validation feature, which is described in detail in tools/objtool/Documentation/stack-validation.txt. After analyzing all the code paths of a .o file, it creates an array of orc_entry structs, and a parallel array of instruction addresses associated with those structs, and writes them to the .orc_unwind and .orc_unwind_ip sections respectively.
The ORC data is split into the two arrays for performance reasons, to make the searchable part of the data (.orc_unwind_ip) more compact. The arrays are sorted in parallel at boot time.
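The parallel-array layout and the parallel sort can be sketched in user-space C as follows. This is only an illustration under stated assumptions: the struct, the function names, and the use of plain 64-bit addresses are hypothetical, the real section encodings differ, and an insertion sort stands in for whatever sort routine the kernel uses. The point is that every move applied to the address array is mirrored in the entry array, so index i in both arrays keeps describing the same instruction, and the lookup only has to touch the compact address array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-address stack state. */
struct demo_orc_entry {
	int16_t sp_offset;
	int16_t bp_offset;
};

/*
 * Sort ips[] in ascending order and mirror every move in entries[],
 * so that ips[i] and entries[i] always belong together.
 */
static void demo_sort_parallel(uint64_t *ips, struct demo_orc_entry *entries,
			       size_t n)
{
	assert(ips != NULL && entries != NULL);

	for (size_t i = 1; i < n; i++) {
		uint64_t ip = ips[i];
		struct demo_orc_entry e = entries[i];
		size_t j = i;

		while (j > 0 && ips[j - 1] > ip) {
			ips[j] = ips[j - 1];
			entries[j] = entries[j - 1];
			j--;
		}
		ips[j] = ip;
		entries[j] = e;
	}
}

/*
 * Binary search over the address array only: return the index of the
 * last entry whose address is <= addr, or -1 if addr precedes them all.
 */
static ptrdiff_t demo_find_entry(const uint64_t *ips, size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;	/* search window is [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ips[mid] <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (ptrdiff_t)lo - 1;
}
```

The index returned by demo_find_entry() is then used to read the matching element of the entry array, which is the only time the larger structs are accessed.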
Performance is further improved by the use of a fast lookup table which is created at runtime. The fast lookup table associates a given address with a range of indices for the .orc_unwind table, so that only a small subset of the table needs to be searched.
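A minimal sketch of the fast-lookup idea, assuming the text range is divided into fixed-size blocks and the table records, for each block boundary, the index of the last sorted address at or before it. The block size, the names, and the table layout are illustrative assumptions, not the kernel's actual values or code. Two adjacent table slots then bound the range of indices that a binary search has to cover for any address in that block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BLOCK_SIZE 256u	/* assumed block granularity */

/*
 * Fill table[0..nblocks] (nblocks + 1 slots). Slot b holds the index of
 * the last address in the sorted ips[] that is <= the start of block b,
 * or 0 if no address is that small.
 */
static void demo_build_fast_table(const uint64_t *ips, size_t n,
				  uint64_t text_start,
				  size_t *table, size_t nblocks)
{
	size_t idx = 0;

	assert(n > 0);
	for (size_t b = 0; b <= nblocks; b++) {
		uint64_t block_start = text_start + (uint64_t)b * DEMO_BLOCK_SIZE;

		/* Advance while the next address still starts at or before this block. */
		while (idx + 1 < n && ips[idx + 1] <= block_start)
			idx++;
		table[b] = idx;
	}
}

/*
 * Return the index of the last address <= addr, searching only the
 * index range given by the table; -1 if addr is outside the text range
 * or precedes every address.
 */
static ptrdiff_t demo_fast_lookup(const uint64_t *ips, uint64_t text_start,
				  const size_t *table, size_t nblocks,
				  uint64_t addr)
{
	if (addr < text_start)
		return -1;

	size_t b = (size_t)((addr - text_start) / DEMO_BLOCK_SIZE);
	if (b >= nblocks)
		return -1;

	/* Binary search restricted to [table[b], table[b + 1]]. */
	size_t lo = table[b], hi = table[b + 1] + 1;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ips[mid] <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (ptrdiff_t)lo - 1;
}
```

The table costs one index per block, so a smaller block size trades memory for a narrower search window.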
7.6. Etymology
Orcs, fearsome creatures of medieval folklore, are the Dwarves’ natural enemies. Similarly, the ORC unwinder was created in opposition to the complexity and slowness of DWARF.
“Although Orcs rarely consider multiple solutions to a problem, they do excel at getting things done because they are creatures of action, not thought.” [3] Similarly, unlike the esoteric DWARF unwinder, the veracious ORC unwinder wastes no time or siloconic effort decoding variable-length zero-extended unsigned-integer byte-coded state-machine-based debug information entries.
Similar to how Orcs frequently unravel the well-intentioned plans of their adversaries, the ORC unwinder frequently unravels stacks with brutal, unyielding efficiency.
ORC stands for Oops Rewind Capability.
[1] https://lkml.kernel.org/r/20170602104048.jkkzssljsompjdwy@suse.de
[2] https://lkml.kernel.org/r/d2ca5435-6386-29b8-db87-7f227c2b713a@suse.cz
[3] http://dustin.wikidot.com/half-orcs-and-orcs