AddressSanitizerInHardware
A more advanced scheme ("memory tagging") is described here: https://arxiv.org/pdf/1802.09517.pdf
AddressSanitizer introduces ~2x slowdown on average. This is very good for most kinds of testing, but still often prohibitively slow for production use. We believe that AddressSanitizer can be efficiently implemented in hardware, which will reduce the CPU overhead to ~20% and thus allow wider use of AddressSanitizer in production. On this page we try to explain why a hardware-assisted AddressSanitizer (HWASAN) could be faster and more flexible than a pure-software implementation (SWASAN).
As explained in detail in AddressSanitizerAlgorithm, every memory access in the program is instrumented like this:

    CheckAddressAndCrashIfBad(Addr, kSize);
    *Addr = ...  # Original memory access, reads or writes kSize bytes

The size of the memory access in bytes (kSize) is a compile-time constant and is one of 1, 2, 4, 8, 16, 32, and 64.
The code for CheckAddressAndCrashIfBad(Addr, kSize) is inlined by the compiler module. Currently it looks like this (kSize < 8):

    ShadowAddr = (Addr >> 3) + kOffset;
    Shadow = LoadByte(ShadowAddr);
    if (Shadow && Shadow <= (Addr & 7) + kSize - 1)
      ReportBug(Addr);

and even simpler for kSize >= 8:

    ShadowAddr = (Addr >> 3) + kOffset;
    Shadow = LoadNBytes(ShadowAddr, kSize / 8);
    if (Shadow)
      ReportBug(Addr);

kOffset is a compile-time constant that depends on the particular platform. E.g. on Linux i386 kOffset=0x20000000, and on Linux x86_64 kOffset=0x7fff8000. The offset is different when using AddressSanitizerForKernel.
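To make the two checks above concrete, here is a minimal C sketch of them over a toy 64-byte region. It is an illustration only, not the code the compiler emits: the names `mark_addressable` and `access_is_bad` are made up for this example, the kOffset addition is replaced by indexing into a local array, and the shadow byte is compared as a signed value (which is what the `jge` in the x86_64 listing does), so that a fully poisoned granule always fails the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model (assumption): a 64-byte "application" region with its own shadow
   array. Real code computes (Addr >> 3) + kOffset; here the offset is replaced
   by an index into a local array. */
enum { kAppSize = 64, kShadowSize = kAppSize / 8 };
static int8_t shadow[kShadowSize];

/* Shadow byte encoding assumed by the check:
   0     -> all 8 bytes of the granule are addressable
   1..7  -> only the first k bytes of the granule are addressable
   other -> the whole granule is poisoned (0xff, i.e. -1, is used here) */
void mark_addressable(size_t n_bytes) {
    assert(n_bytes <= kAppSize);
    memset(shadow, 0xff, sizeof shadow);   /* poison everything */
    memset(shadow, 0, n_bytes / 8);        /* fully addressable granules */
    if (n_bytes % 8)
        shadow[n_bytes / 8] = (int8_t)(n_bytes % 8);
}

/* Returns true when the access [addr, addr + size) would be reported. */
bool access_is_bad(size_t addr, size_t size) {
    assert(addr + size <= kAppSize);
    if (size < 8) {
        /* Small access: one shadow byte, compared against the last byte touched. */
        int8_t s = shadow[addr >> 3];
        return s != 0 && s <= (int8_t)((addr & 7) + size - 1);
    }
    /* Large access: size / 8 consecutive shadow bytes must all be zero. */
    for (size_t i = 0; i < size / 8; i++)
        if (shadow[(addr >> 3) + i] != 0)
            return true;
    return false;
}
```

For example, after `mark_addressable(13)` a 4-byte access at offset 8 passes, while a 4-byte access at offset 10 is reported because it touches byte 13.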
This is how the x86_64 assembly looks:

    # long load8(long *a) { return *a; }
    0000000000000030 <load8>:
      30:  48 89 f8                mov    %rdi,%rax
      33:  48 c1 e8 03             shr    $0x3,%rax
      37:  80 b8 00 80 ff 7f 00    cmpb   $0x0,0x7fff8000(%rax)
      3e:  75 04                   jne    44 <load8+0x14>
      40:  48 8b 07                mov    (%rdi),%rax    <<<<<< original load
      43:  c3                      retq
      44:  52                      push   %rdx
      45:  e8 00 00 00 00          callq  __asan_report_load8

    # int load4(int *a) { return *a; }
    0000000000000000 <load4>:
       0:  48 89 f8                mov    %rdi,%rax
       3:  48 89 fa                mov    %rdi,%rdx
       6:  48 c1 e8 03             shr    $0x3,%rax
       a:  83 e2 07                and    $0x7,%edx
       d:  0f b6 80 00 80 ff 7f    movzbl 0x7fff8000(%rax),%eax
      14:  83 c2 03                add    $0x3,%edx
      17:  38 c2                   cmp    %al,%dl
      19:  7d 03                   jge    1e <load4+0x1e>
      1b:  8b 07                   mov    (%rdi),%eax    <<<<<< original load
      1d:  c3                      retq
      1e:  84 c0                   test   %al,%al
      20:  74 f9                   je     1b <load4+0x1b>
      22:  50                      push   %rax
      23:  e8 00 00 00 00          callq  __asan_report_load4

So, for an 8-byte memory access we add two arithmetic instructions, one load and one branch instruction on the main path. One or two instructions are added for failure handling (we could use any trap instruction instead of calling __asan_report_load8). For smaller memory accesses an extra branch instruction and 3-4 arithmetic instructions are added.
We believe that all these computations could be done by a single new machine instruction, ASANCHK, that takes Addr as a single parameter and has 7 variants for the 7 different access sizes. This may sound too complex for one instruction, but Intel has recently introduced two even more complex instructions: BNDLDX/BNDSTX.
The instrumented code would look like this:

    # long load8(long *a) { return *a; }
    0000000000000030 <load8>:
      asanchk.8 (%rdi)
      mov    (%rdi),%rax    <<<<<< original load
      retq

    # int load4(int *a) { return *a; }
    0000000000000000 <load4>:
      asanchk.4 (%rdi)
      mov    (%rdi),%eax    <<<<<< original load
      retq

Just like Intel's MPX has configuration registers BNDCFGU/BNDCFGS (one for user-space, one for kernel), HWASAN could use a pair of configuration registers (ASANCFGU for user space, ASANCFGS for kernel) to achieve greater flexibility.
One bit in ASANCFGS could indicate whether HWASAN is enabled. If this bit is not set, ASANCHK is treated as a NOP. This would be a great advantage over SWASAN since SWASAN has no way to enable/disable checking at run-time.
We could have two separate 'enabled' bits, one for loads and one for stores, so that a user can enable checking loads and stores independently. In this case the ASANCHK instruction opcode will need to contain a load-or-store bit:

    # long load8(long *a) { return *a; }
    0000000000000030 <load8>:
      asanchk.l.8 (%rdi)
      mov    (%rdi),%rax    <<<<<< original load
      retq

    # void store8(long *a) { *a = 0x1234; }
    0000000000000030 <store8>:
      asanchk.s.8 (%rdi)
      movq   $0x1234,(%rdi)
      retq

48 bits in ASANCFGS could store kOffset. This is yet another advantage over SWASAN, providing greater flexibility. Currently, SWASAN has to reserve the shadow memory region at startup at a fixed address, which causes conflicts with other sandbox-like environments. With kOffset in ASANCFGS the tool will be able to choose an arbitrary region for the shadow.
Two bits in ASANCFGS could indicate the ShadowScale (the number of bits by which Addr is right-shifted to compute the shadow address). In SWASAN, ShadowScale is 3, which appears to be the best value performance-wise for the pure software approach because it allows shorter instrumentation for 8-byte accesses. However, values 4, 5, and 6 use less memory for the shadow at the cost of an increased minimal redzone. If the minimal redzone size is 16 bytes (which is the default for SWASAN), then ShadowScale=4 might be better for HWASAN.
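The memory side of this trade-off is simple arithmetic, sketched below in C. The helper names are made up for this example, and the sketch assumes that one shadow byte describes one granule of 2^ShadowScale application bytes, so that the granule size is the lower bound on the redzone.

```c
#include <stdint.h>

/* Bytes of application memory described by one shadow byte. Under the
   assumption above this is also the smallest possible redzone. */
uint64_t granule_bytes(unsigned shadow_scale) {
    return UINT64_C(1) << shadow_scale;
}

/* Shadow bytes needed to cover app_bytes of application memory, rounded up. */
uint64_t shadow_bytes(uint64_t app_bytes, unsigned shadow_scale) {
    uint64_t g = granule_bytes(shadow_scale);
    return (app_bytes + g - 1) / g;
}
```

With these definitions, 1 GiB of application memory needs 128 MiB of shadow at ShadowScale=3 and 64 MiB at ShadowScale=4, where the granule is 16 bytes.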
This configuration option is less important and may be harder to implement than the first two.
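The three configuration options can be put together in a small software model. The C sketch below is purely illustrative: the bit layout of the register (enable bits in bits 0-1, ShadowScale in bits 2-3, kOffset in the upper 48 bits) and all function names are invented for this example, since this page does not define an encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ASANCFG layout, invented for this sketch:
     bit 0       check loads
     bit 1       check stores
     bits 2-3    ShadowScale - 3 (encodes scales 3..6)
     bits 16-63  kOffset (48 bits) */
#define ASANCFG_LOADS  (UINT64_C(1) << 0)
#define ASANCFG_STORES (UINT64_C(1) << 1)

uint64_t asancfg_make(bool loads, bool stores, unsigned scale, uint64_t offset) {
    uint64_t cfg = 0;
    if (loads)  cfg |= ASANCFG_LOADS;
    if (stores) cfg |= ASANCFG_STORES;
    cfg |= (uint64_t)((scale - 3) & 3) << 2;
    cfg |= (offset & UINT64_C(0xffffffffffff)) << 16;
    return cfg;
}

unsigned asancfg_scale(uint64_t cfg)  { return 3 + (unsigned)((cfg >> 2) & 3); }
uint64_t asancfg_offset(uint64_t cfg) { return cfg >> 16; }

/* Models the address computation of ASANCHK. Returns false when checking is
   disabled for this kind of access (the instruction then behaves as a NOP);
   otherwise stores the shadow address to consult in *shadow_addr. */
bool asanchk_shadow_addr(uint64_t cfg, uint64_t addr, bool is_store,
                         uint64_t *shadow_addr) {
    uint64_t enable = is_store ? ASANCFG_STORES : ASANCFG_LOADS;
    if (!(cfg & enable))
        return false;
    *shadow_addr = (addr >> asancfg_scale(cfg)) + asancfg_offset(cfg);
    return true;
}
```

With loads enabled, stores disabled, ShadowScale=3 and kOffset=0x7fff8000, a load from 0x1000 consults shadow address 0x7fff8200, while a store to the same address is not checked at all.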
We expect HWASAN to be faster than SWASAN for a few reasons.
- Fewer instructions mean fewer instruction cache problems. This is important for huge applications, especially in production deployment.
- ASANCHK should not require any extra general purpose registers, thus fewer spills/fills.
- The arithmetic performed by ASANCHK is simple and can be implemented more efficiently than a series of general instructions.
- (Probably the major reason) accesses to the shadow and to the application memory are interrelated, and the memory subsystem could start fetching the application address together with the shadow address.
Since the shadow memory is 1/8 of the application memory or less (i.e. <= 12.5%) and the arithmetic instructions cost next to nothing, our guesstimate is that HWASAN may slow down the application by 20% on average.
Similar to Intel MPX, the ASANCHK instruction could use one of the existing NOP opcodes so that the binaries can run on legacy hardware.
Using a single instruction for bounds checking will simplify manual assembly debugging and inspection (in SWASAN, the instrumentation instructions pollute the code too much).
It will become possible to always build shared libraries with instrumentation enabled and link them to all binaries (instrumented or not). This is a common issue with SWASAN, where users want to use a single shared library both with their instrumented binary and with a pre-built non-instrumented binary of, e.g., the Python interpreter.
The SWASAN implementation has been tuned for the pure software case. However, HWASAN could be different because a hardware implementation may allow us to do more work per memory access.
Several memory error detection tools (DrMemory, LBC) rely on magic bytes (patterns) in the application memory to detect unaddressable accesses and use the shadow memory as a slow-and-rare fall-back. In SWASAN this would introduce much more instrumentation code. But for HWASAN this could be beneficial since the memory system will be stressed less.
Pattern mode is less suitable for detecting stack-buffer-overflow bugs because it requires poisoning 8 times more memory on every function entry and exit.
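The pattern idea can be sketched in a few lines of C. This is a toy model and not how DrMemory or LBC are implemented: the magic value 0xAB, the per-byte boolean shadow and all names are assumptions made for this example, and only 1-byte loads are modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { kPatternAppSize = 64 };
#define MAGIC_BYTE 0xABu                   /* hypothetical poison pattern */

uint8_t  pattern_app[kPatternAppSize];     /* toy application memory */
bool     pattern_shadow[kPatternAppSize];  /* slow-path truth: true = unaddressable */
unsigned slow_path_lookups;                /* how often the shadow was consulted */

/* Poisoning writes the pattern into the application memory itself and
   records the fact in the shadow. */
void pattern_poison(size_t addr) {
    pattern_app[addr] = MAGIC_BYTE;
    pattern_shadow[addr] = true;
}

/* Check for a 1-byte load. The shadow is consulted only when the loaded
   value happens to equal the magic pattern. */
bool pattern_load_is_bad(size_t addr) {
    if (pattern_app[addr] != MAGIC_BYTE)
        return false;                      /* fast path: no shadow access */
    slow_path_lookups++;
    return pattern_shadow[addr];           /* slow path: legitimate data may match too */
}
```

The fast path touches only the application byte that the program was going to load anyway, which is why this mode stresses the memory system less. A legitimate data byte that equals the pattern costs one extra shadow lookup but is not reported.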
An alternative to the AddressSanitizer shadow encoding is a simple byte-to-bit shadow mapping where one bit of shadow represents the addressability of the corresponding byte in memory. A disadvantage of this mapping is that it requires 1/8 of the address space to be used for shadow memory, while ShadowScale>=4 uses less address space. Besides, with byte-to-bit mapping it will be harder to find unaligned partially OOB accesses.
But byte-to-bit mapping may be simpler to implement in hardware.
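A byte-to-bit mapping can be modeled as a plain bitmap. The C sketch below is an illustration with made-up names over a toy 64-byte region. It checks every byte of the access in a loop, which a hardware implementation would presumably replace with a mask over one or two shadow bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { kBitmapAppSize = 64 };

/* One shadow bit per application byte; a set bit means "unaddressable". */
static uint8_t bit_shadow[kBitmapAppSize / 8];

void bitmap_set_poisoned(size_t addr, bool poisoned) {
    uint8_t mask = (uint8_t)(1u << (addr & 7));
    if (poisoned)
        bit_shadow[addr >> 3] |= mask;
    else
        bit_shadow[addr >> 3] &= (uint8_t)~mask;
}

/* Returns true if any byte in [addr, addr + size) is poisoned. An unaligned
   access can need bits from two different shadow bytes. */
bool bitmap_access_is_bad(size_t addr, size_t size) {
    for (size_t i = addr; i < addr + size; i++)
        if (bit_shadow[i >> 3] & (1u << (i & 7)))
            return true;
    return false;
}
```

For instance, with only byte 16 poisoned, a 4-byte access at offset 14 is bad even though its first two bytes live in a granule whose shadow byte is entirely clear.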
An alternative to the ASANCHK instruction is to perform the checks for all load/store instructions if HWASAN is enabled (ASANCFGU). The benefit is that no instrumentation will be required and all legacy code will be automatically checked if linked with the AddressSanitizer run-time library. The downside is that no compiler optimizations (eliminations of ASANCHK) will be possible.
- The paper WatchdogLite: Hardware-Accelerated Compiler-Based Pointer Checking proposes a similar set of extra instructions, although they are much closer in spirit to MPX than to AddressSanitizer.