AddressSanitizerInHardware

Kostya Serebryany edited this page Apr 21, 2018 · 3 revisions

A more advanced scheme ("memory tagging") is described here: https://arxiv.org/pdf/1802.09517.pdf

Introduction

AddressSanitizer introduces a ~2x slowdown on average. This is very good for most kinds of testing, but still often prohibitively slow for production use. We believe that AddressSanitizer can be efficiently implemented in hardware, which would reduce the CPU overhead to ~20% and thus allow wider use of AddressSanitizer in production. On this page we try to explain why a hardware-assisted AddressSanitizer (HWASAN) could be faster and more flexible than a pure-software implementation (SWASAN).

Instrumentation

As explained in detail in AddressSanitizerAlgorithm, every memory access in the program is instrumented like this:

    CheckAddressAndCrashIfBad(Addr, kSize);
    *Addr = ... # Original memory access, reads or writes kSize bytes

The size of the memory access in bytes (kSize) is a compile-time constant and is one of 1, 2, 4, 8, 16, 32, and 64.

The code for CheckAddressAndCrashIfBad(Addr, kSize) is inlined by the compiler module. Currently it looks like this (kSize < 8):

    ShadowAddr = (Addr >> 3) + kOffset;
    Shadow = LoadByte(ShadowAddr);
    if (Shadow && Shadow <= (Addr & 7) + kSize - 1)
      ReportBug(Addr);

and even simpler for kSize >= 8:

    ShadowAddr = (Addr >> 3) + kOffset;
    Shadow = LoadNBytes(ShadowAddr, kSize / 8);
    if (Shadow)
      ReportBug(Addr);
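The two checks above can be tried out on an ordinary machine with a toy model. The sketch below is not the real run-time: it assumes a hypothetical 64-byte arena, indexes a small shadow array by arena offset instead of adding kOffset, and returns a flag instead of calling ReportBug. The names `toy_shadow`, `access_is_bad`, and `make_prefix_addressable` are made up for illustration. It relies on the usual AddressSanitizer shadow encoding: 0 means all 8 bytes are addressable, k in 1..7 means only the first k bytes are, and a negative value means the whole 8-byte granule is poisoned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { kArenaSize = 64, kShadowScale = 3 };

/* Toy shadow: one signed byte per 8 bytes of a hypothetical 64-byte arena.
   0 = all 8 bytes addressable, k in 1..7 = only the first k bytes
   addressable, negative = the whole 8-byte granule is poisoned. */
int8_t toy_shadow[kArenaSize >> kShadowScale];

/* Returns 1 if an access of `size` bytes at arena offset `addr` is bad.
   Mirrors the inlined check, minus kOffset and ReportBug. */
int access_is_bad(size_t addr, size_t size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8 ||
           size == 16 || size == 32 || size == 64);
    assert(addr + size <= (size_t)kArenaSize);
    size_t shadow_index = addr >> kShadowScale;
    if (size < 8) {
        int8_t shadow = toy_shadow[shadow_index];
        /* Bad if the granule is not fully addressable and the last byte
           touched lies at or beyond the addressable prefix. */
        return shadow != 0 && shadow <= (int8_t)((addr & 7) + size - 1);
    }
    /* size >= 8: every covered shadow byte must be zero. */
    for (size_t i = 0; i < size / 8; i++) {
        if (toy_shadow[shadow_index + i] != 0) return 1;
    }
    return 0;
}

/* Marks the first `n` bytes of the arena addressable and poisons the rest. */
void make_prefix_addressable(size_t n) {
    for (size_t g = 0; g < (size_t)(kArenaSize >> kShadowScale); g++) {
        size_t start = g * 8;
        if (n >= start + 8)  toy_shadow[g] = 0;
        else if (n > start)  toy_shadow[g] = (int8_t)(n - start);
        else                 toy_shadow[g] = -1;
    }
}
```

With a 13-byte addressable prefix, the second shadow byte holds 5, so a 4-byte access at offset 8 passes while a 4-byte access at offset 10 (which touches byte 13) is flagged.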

kOffset is a compile-time constant that depends on the particular platform. E.g. on Linux i386 kOffset=0x20000000, and on Linux x86_64 kOffset=0x7fff8000. The offset is different when using AddressSanitizerForKernel.
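As a worked example of the address arithmetic, the sketch below applies the mapping to plain 64-bit integers using the two offsets quoted above. The function and macro names are illustrative and not part of any real API.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets quoted in the text. */
#define K_OFFSET_LINUX_I386   UINT64_C(0x20000000)
#define K_OFFSET_LINUX_X86_64 UINT64_C(0x7fff8000)

/* ShadowAddr = (Addr >> 3) + kOffset, computed on plain integers.
   No memory is touched; this only shows where the shadow byte would live. */
uint64_t shadow_address(uint64_t addr, uint64_t k_offset) {
    return (addr >> 3) + k_offset;
}
```

Eight consecutive application bytes share one shadow byte: addresses 0x1000 through 0x1007 map to the same shadow address, and 0x1008 maps to the next one. On Linux x86_64, application address 0 maps to shadow address 0x7fff8000, which is the displacement visible in the disassembly on this page.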

This is how the x86_64 assembly looks:

    # long load8(long *a) { return *a; }
    0000000000000030 <load8>:
      30: 48 89 f8             mov    %rdi,%rax
      33: 48 c1 e8 03          shr    $0x3,%rax
      37: 80 b8 00 80 ff 7f 00 cmpb   $0x0,0x7fff8000(%rax)
      3e: 75 04                jne    44 <load8+0x14>
      40: 48 8b 07             mov    (%rdi),%rax   <<<<<< original load
      43: c3                   retq
      44: 52                   push   %rdx
      45: e8 00 00 00 00       callq  __asan_report_load8

    # int  load4(int *a)  { return *a; }
    0000000000000000 <load4>:
       0: 48 89 f8             mov    %rdi,%rax
       3: 48 89 fa             mov    %rdi,%rdx
       6: 48 c1 e8 03          shr    $0x3,%rax
       a: 83 e2 07             and    $0x7,%edx
       d: 0f b6 80 00 80 ff 7f movzbl 0x7fff8000(%rax),%eax
      14: 83 c2 03             add    $0x3,%edx
      17: 38 c2                cmp    %al,%dl
      19: 7d 03                jge    1e <load4+0x1e>
      1b: 8b 07                mov    (%rdi),%eax    <<<<<< original load
      1d: c3                   retq
      1e: 84 c0                test   %al,%al
      20: 74 f9                je     1b <load4+0x1b>
      22: 50                   push   %rax
      23: e8 00 00 00 00       callq  __asan_report_load4

So, for an 8-byte memory access we add two arithmetic instructions, one load, and one branch instruction on the main path. One or two instructions are added for failure handling (we could use any trap instruction instead of calling __asan_report_load8). For smaller memory accesses, an extra branch instruction and 3-4 arithmetic instructions are added.

We believe that all these computations could be done by a single new machine instruction, ASANCHK, that takes Addr as its single parameter and has 7 variants for the 7 different access sizes. This may sound too complex for one instruction, but Intel has recently introduced two even more complex instructions: BNDLDX/BNDSTX.

The instrumented code would look like this:

    # long load8(long *a) { return *a; }
    0000000000000030 <load8>:
    asanchk.8 (%rdi)
    mov       (%rdi),%rax   <<<<<< original load
    retq

    # int  load4(int *a)  { return *a; }
    0000000000000000 <load4>:
    asanchk.4 (%rdi)
    mov       (%rdi),%eax    <<<<<< original load
    retq

Configuration

Just like Intel's MPX has configuration registers BNDCFGU/BNDCFGS (one for user space, one for kernel), HWASAN could use a pair of configuration registers (ASANCFGU for user space, ASANCFGS for kernel) to achieve greater flexibility.

Enabled bit

One bit in ASANCFGS could indicate whether HWASAN is enabled. If this bit is not set, ASANCHK is treated as a NOP. This would be a great advantage over SWASAN, since SWASAN has no way to enable or disable checking at run time.

We could have two separate 'enabled' bits, one for loads and one for stores, so that a user can enable checking loads and stores independently. In this case the ASANCHK instruction opcode will need to contain a load-or-store bit:

    # long load8(long *a) { return *a; }
    0000000000000030 <load8>:
    asanchk.l.8 (%rdi)
    mov         (%rdi),%rax   <<<<<< original load
    retq

    # void store8(long *a) { *a = 0x1234; }
    0000000000000030 <store8>:
    asanchk.s.8 (%rdi)
    movq        $0x1234,(%rdi)
    retq
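The effect of the two 'enabled' bits can be modelled in a few lines of C. This is only a behavioural sketch: the struct, the enum, and the function name are hypothetical, the register layout is not specified on this page, and the result of the shadow lookup is passed in as a flag rather than computed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the two 'enabled' bits. The actual bit positions
   in the configuration register are not defined by the text. */
struct asan_cfg {
    bool check_loads;
    bool check_stores;
};

enum access_kind { ACCESS_LOAD, ACCESS_STORE };

/* Models asanchk.l.N / asanchk.s.N: returns true if the access should trap.
   `shadow_says_bad` stands in for the outcome of the shadow lookup. */
bool asanchk_traps(const struct asan_cfg *cfg, enum access_kind kind,
                   bool shadow_says_bad) {
    bool enabled = (kind == ACCESS_LOAD) ? cfg->check_loads
                                         : cfg->check_stores;
    if (!enabled) return false;  /* instruction behaves as a NOP */
    return shadow_says_bad;
}
```

With loads disabled and stores enabled, a bad load goes unreported while a bad store still traps, which is the independent control described above.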

kOffset

48 bits in ASANCFGS could store kOffset. This is yet another advantage over SWASAN, providing greater flexibility. Currently, SWASAN has to reserve the shadow memory region at startup at a fixed address, which causes conflicts with other sandbox-like environments. With kOffset in ASANCFGS, the tool will be able to choose an arbitrary region for the shadow.
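To make the idea of a run-time kOffset concrete, here is a minimal sketch that stores a 48-bit offset in a 64-bit configuration value and uses it for the shadow lookup. The field position (the low 48 bits) and all function names are assumptions made for illustration; the text only says that 48 bits could be used, not where they would sit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 48 bits of the register hold kOffset. */
#define ASANCFG_KOFFSET_MASK ((UINT64_C(1) << 48) - 1)

/* Returns `cfg` with its kOffset field replaced; other bits are kept. */
uint64_t asancfg_set_koffset(uint64_t cfg, uint64_t k_offset) {
    assert((k_offset & ~ASANCFG_KOFFSET_MASK) == 0);
    return (cfg & ~ASANCFG_KOFFSET_MASK) | k_offset;
}

/* Extracts the kOffset field. */
uint64_t asancfg_koffset(uint64_t cfg) {
    return cfg & ASANCFG_KOFFSET_MASK;
}

/* Shadow address with the offset read from the register at run time
   instead of being baked into the instrumentation as a constant. */
uint64_t shadow_address_from_cfg(uint64_t cfg, uint64_t addr) {
    return (addr >> 3) + asancfg_koffset(cfg);
}
```

Because the offset is data rather than an immediate in the instruction stream, the same binary could run with the shadow placed at 0x7fff8000 in one process and at some other free region in another.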

ShadowScale

Two bits in ASANCFGS could indicate the ShadowScale (the number of bits by which Addr is right-shifted to compute the shadow address). In SWASAN, ShadowScale is 3, which appears to be the best value performance-wise for the pure software approach because it allows shorter instrumentation for 8-byte accesses. However, values 4, 5, and 6 use less memory for shadow at the cost of an increased minimal redzone. If the minimal redzone size is 16 bytes (which is the default for SWASAN), then ShadowScale=4 might be better for HWASAN.
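The trade-off can be put into numbers with a small sketch. It assumes a hypothetical encoding in which the 2-bit field values 0..3 select ShadowScale 3..6; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: a 2-bit field value 0..3 selects ShadowScale 3..6. */
unsigned shadow_scale_from_field(unsigned field) {
    assert(field <= 3);
    return 3 + field;
}

/* Bytes of application memory described by one shadow byte. A redzone
   cannot usefully be smaller than this granule. */
uint64_t granule_bytes(unsigned shadow_scale) {
    return UINT64_C(1) << shadow_scale;
}

/* Shadow bytes needed to cover `app_bytes` of application memory,
   rounded up to whole granules. */
uint64_t shadow_bytes(uint64_t app_bytes, unsigned shadow_scale) {
    uint64_t g = granule_bytes(shadow_scale);
    return (app_bytes + g - 1) / g;
}
```

For 1 GiB of application memory, ShadowScale=3 needs 128 MiB of shadow and ShadowScale=4 needs 64 MiB, while the granule grows from 8 to 16 bytes.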

This configuration option is less important and may be harder to implement than the first two.

Sources of speedup

We expect HWASAN to be faster than SWASAN for a few reasons.

- Fewer instructions mean fewer instruction cache problems. This is important for huge applications, i.e. especially for production deployment.
- ASANCHK should not require any extra general-purpose registers, thus fewer spills/fills.
- The arithmetic performed by ASANCHK is simple and can be implemented more efficiently than a series of general instructions.
- (Probably the major reason.) Accesses to the shadow and to the application memory are interrelated, and the memory subsystem could start fetching the application address together with the shadow address.

Since the shadow memory is 1/8 of the application memory or less (i.e. <= 12.5%) and the arithmetic instructions cost next to nothing, our guesstimate is that HWASAN may slow down the application by 20% on average.

Other benefits

Similar to Intel MPX, the ASANCHK instruction could use one of the existing NOP opcodes so that the binaries can run on legacy hardware.

Using a single instruction for bounds checking will simplify manual assembly debugging and inspection (in SWASAN, the instrumentation instructions pollute the code too much).

It will become possible to always build shared libraries with instrumentation enabled and link them to all binaries (instrumented or not). This is a common issue with SWASAN, where users want to use a single shared library with their instrumented binary and with a pre-built non-instrumented binary of, e.g., the Python interpreter.

Possible modifications

The SWASAN implementation has been tuned for the pure software case. However, HWASAN could be different because a hardware implementation may allow us to do more work per memory access.

Pattern mode

Several memory error detection tools (DrMemory, LBC) rely on magic bytes (patterns) in the application memory to detect unaddressable accesses and use the shadow memory as a slow-and-rare fall-back. In SWASAN this would introduce much more instrumentation code. But for HWASAN this could be beneficial since the memory system will be stressed less.

Pattern mode is less suitable for detecting stack-buffer-overflow bugs because it requires poisoning 8 times more memory on every function entry and exit.
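The pattern idea can be illustrated with a toy model for 1-byte loads. This is a sketch under stated assumptions, not how DrMemory or LBC are actually implemented: the magic value 0xFD, the 32-byte arena, the per-byte `poisoned` array standing in for the shadow, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAGIC_BYTE 0xFD              /* hypothetical redzone pattern */
enum { PATTERN_ARENA = 32 };

uint8_t  pattern_arena[PATTERN_ARENA];  /* toy application memory */
bool     poisoned[PATTERN_ARENA];       /* toy shadow, slow path only */
unsigned slow_path_lookups;             /* counts shadow consultations */

/* Poisons [from, to): writes the pattern into the application memory
   itself and records the range in the shadow. */
void poison_range(size_t from, size_t to) {
    for (size_t i = from; i < to; i++) {
        pattern_arena[i] = MAGIC_BYTE;
        poisoned[i] = true;
    }
}

/* Fast path: inspect only the byte being loaded. The shadow is consulted
   only when the byte happens to equal the pattern. */
bool load1_is_bad(size_t addr) {
    assert(addr < (size_t)PATTERN_ARENA);
    if (pattern_arena[addr] != MAGIC_BYTE) return false;
    slow_path_lookups++;
    return poisoned[addr];
}
```

The slow path is needed because legitimate data may equal the pattern by coincidence. The sketch also shows why poisoning is more expensive in this mode: every redzone byte has to be written, not just its shadow.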

Byte-to-bit shadow

An alternative to the AddressSanitizer shadow encoding is a simple byte-to-bit shadow mapping where one bit of shadow represents the addressability of the corresponding byte in memory. A disadvantage of this mapping is that it requires 1/8 of the address space to be used for shadow memory, while ShadowScale>=4 allows using less address space. Besides, with byte-to-bit mapping it will be harder to find unaligned partially OOB accesses.

But byte-to-bit mapping may be simpler to implement in hardware.
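A byte-to-bit shadow can be sketched as a plain bitmap. The model below assumes a hypothetical 64-byte arena and the convention that a set bit means "not addressable"; the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { BIT_ARENA = 64 };

/* One shadow bit per application byte; a set bit marks the byte as
   NOT addressable (an assumed convention). */
uint8_t bit_shadow[BIT_ARENA / 8];

void set_unaddressable(size_t addr) {
    assert(addr < (size_t)BIT_ARENA);
    bit_shadow[addr >> 3] |= (uint8_t)(1u << (addr & 7));
}

/* Returns true if any byte in [addr, addr + size) is unaddressable.
   The per-byte loop hides a cost: an unaligned access may need bits
   from two different shadow bytes. */
bool bit_access_is_bad(size_t addr, size_t size) {
    assert(addr + size <= (size_t)BIT_ARENA);
    for (size_t i = addr; i < addr + size; i++) {
        if (bit_shadow[i >> 3] & (1u << (i & 7))) return true;
    }
    return false;
}
```

A 4-byte access at offset 6 touches bytes 6..9 and therefore reads bits from shadow bytes 0 and 1, which is the unaligned case mentioned above.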

Automatically check loads/stores

An alternative to the ASANCHK instruction is to perform the checks for all load/store instructions if HWASAN is enabled (ASANCFGU). The benefit is that no instrumentation will be required and all legacy code will be automatically checked if linked with the AddressSanitizer run-time library. The downside is that no compiler optimizations (eliminations of ASANCHK) will be possible.
