ShadowCallStack
Introduction
ShadowCallStack is an instrumentation pass, currently only implemented for aarch64, that protects programs against return address overwrites (e.g. stack buffer overflows). It works by saving a function’s return address to a separately allocated ‘shadow call stack’ in the function prolog in non-leaf functions and loading the return address from the shadow call stack in the function epilog. The return address is also stored on the regular stack for compatibility with unwinders, but is otherwise unused.
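The mechanism can be illustrated with a toy model in portable C. This is only a simulation under stated assumptions: the names scs_slots, scs_top, scs_push, scs_pop and simulate_call are hypothetical, the fixed-size array has no overflow checks, and none of this corresponds to how the compiler or any runtime implements the feature. Real instrumentation is emitted as machine instructions that operate on x18.

```c
#include <stdint.h>

/* Toy model: a separate array that holds return addresses only. */
#define SCS_DEPTH 64
static uintptr_t scs_slots[SCS_DEPTH];
static uintptr_t *scs_top = scs_slots;  /* plays the role of x18 */

/* Prolog of a non-leaf function: save the return address. */
static void scs_push(uintptr_t return_address) {
    *scs_top++ = return_address;
}

/* Epilog: reload the return address from the shadow call stack. */
static uintptr_t scs_pop(void) {
    return *--scs_top;
}

/* Simulates one call. If `corrupt` is nonzero, the copy of the return
 * address on the regular stack is overwritten, as a buffer overflow might.
 * Returns the address the function would return to. */
static uintptr_t simulate_call(uintptr_t return_address, int corrupt) {
    volatile uintptr_t regular_stack_slot = return_address; /* for unwinders */
    scs_push(return_address);
    if (corrupt)
        regular_stack_slot = 0x41414141u;
    (void)regular_stack_slot;  /* never consulted for the return */
    return scs_pop();
}
```

In this model, simulate_call(0x1000, 1) still yields 0x1000, because the corrupted copy on the regular stack is never used for the return.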
The aarch64 implementation is considered production ready, and an implementation of the runtime has been added to Android’s libc (bionic). An x86_64 implementation was evaluated using Chromium and was found to have critical performance and security deficiencies; it was removed in LLVM 9.0. Details on the x86_64 implementation can be found in the Clang 7.0.1 documentation.
Comparison
To optimize for memory consumption and cache locality, the shadow call stack stores only an array of return addresses. This is in contrast to other schemes, like SafeStack, that mirror the entire stack and trade off consuming more memory for shorter function prologs and epilogs with fewer memory accesses.
Return Flow Guard is a pure software implementation of shadow call stacks on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is inherently racy due to the architecture’s use of the stack for calls and returns.
Intel Control-flow Enforcement Technology (CET) is a proposed hardware extension that would add native support to use a shadow stack to store/check return addresses at call/return time. Being a hardware implementation, it would not suffer from race conditions and would not incur the overhead of function instrumentation, but it does require operating system support.
Compatibility
A runtime is not provided in compiler-rt, so one must be provided by the compiled application or the operating system. Integrating the runtime into the operating system should be preferred, since otherwise all thread creation and destruction would need to be intercepted by the application.
The instrumentation makes use of the platform register x18. On some platforms, x18 is reserved, and on others, it is designated as a scratch register. This generally means that any code that may run on the same thread as code compiled with ShadowCallStack must either target one of the platforms whose ABI reserves x18 (currently Android, Darwin, Fuchsia and Windows) or be compiled with the flag -ffixed-x18. If absolutely necessary, code compiled without -ffixed-x18 may be run on the same thread as code that uses ShadowCallStack by saving the register value temporarily on the stack (example in Android) but this should be done with care since it risks leaking the shadow call stack address.
Because of the use of register x18, the ShadowCallStack feature is incompatible with any other feature that may use x18. However, there is no inherent reason why ShadowCallStack needs to use register x18 specifically; in principle, a platform could choose to reserve and use another register for ShadowCallStack, but this would be incompatible with the AAPCS64.
Special unwind information is required on functions that are compiled with ShadowCallStack and that may be unwound, i.e. functions compiled with -fexceptions (which is the default in C++). Some unwinders (such as the libgcc 4.9 unwinder) do not understand this unwind info and will segfault when encountering it. LLVM libunwind processes this unwind info correctly, however. This means that if exceptions are used together with ShadowCallStack, the program must use a compatible unwinder.
Security
ShadowCallStack is intended to be a stronger alternative to -fstack-protector. It protects from non-linear overflows and arbitrary memory writes to the return address slot.
The instrumentation makes use of the x18 register to reference the shadow call stack, meaning that references to the shadow call stack do not have to be stored in memory. This makes it possible to implement a runtime that avoids exposing the address of the shadow call stack to attackers that can read arbitrary memory. However, attackers could still try to exploit side channels exposed by the operating system [1] [2] or processor [3] to discover the address of the shadow call stack.
Unless care is taken when allocating the shadow call stack, it may be possible for an attacker to guess its address using the addresses of other allocations. Therefore, the address should be chosen to make this difficult. One way to do this is to allocate a large guard region without read/write permissions, randomly select a small region within it to be used as the address of the shadow call stack and mark only that region as read/write. This also mitigates somewhat against processor side channels. The intent is that the Android runtime will do this, but the platform will first need to be changed to avoid using setrlimit(RLIMIT_AS) to limit memory allocations in certain processes, as this also limits the number of guard regions that can be allocated.
The runtime will need the address of the shadow call stack in order to deallocate it when destroying the thread. If the entire program is compiled with -ffixed-x18, this is trivial: the address can be derived from the value stored in x18 (e.g. by masking out the lower bits). If a guard region is used, the address of the start of the guard region could then be stored at the start of the shadow call stack itself. But if it is possible for code compiled without -ffixed-x18 to run on a thread managed by the runtime, which is the case on Android for example, the address must be stored somewhere else instead. On Android we store the address of the start of the guard region in TLS and deallocate the entire guard region including the shadow call stack at thread exit. This is considered acceptable given that the address of the start of the guard region is already somewhat guessable.
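Deriving the base address by masking out the lower bits might look like the following. This is a sketch under assumptions: SCS_SIZE is a hypothetical power-of-two size, the shadow call stack is assumed to be aligned to that size, and the function name scs_base_from_pointer is made up for illustration. The value of x18 is passed in as an ordinary integer because reading the register itself requires architecture-specific code.

```c
#include <stdint.h>

/* Hypothetical size of the shadow call stack; must be a power of two. */
#define SCS_SIZE ((uintptr_t)8192)

/* Given the current shadow call stack pointer (the value held in x18),
 * returns the start of a shadow call stack that is aligned to SCS_SIZE.
 * The result is only meaningful while the pointer lies inside the stack,
 * i.e. it has not advanced to the one-past-the-end address. */
static uintptr_t scs_base_from_pointer(uintptr_t x18_value) {
    return x18_value & ~(SCS_SIZE - 1);
}
```

For example, with a shadow call stack at 0x40002000, a pointer value of 0x40002010 (two saved return addresses) maps back to 0x40002000.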
One way in which the address of the shadow call stack could leak is in the jmp_buf data structure used by setjmp and longjmp. The Android runtime avoids this by only storing the low bits of x18 in the jmp_buf, which requires the address of the shadow call stack to be aligned to its size.
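The low-bits technique can be modeled in C as follows. This is an illustrative sketch rather than the Android code: SCS_SIZE, SCS_MASK and both function names are hypothetical, and the sketch assumes that the restoring side recovers the high bits from the x18 value that is live at the time of the longjmp, which works only if the shadow call stack is aligned to its power-of-two size.

```c
#include <stdint.h>

/* Hypothetical power-of-two size; the stack is assumed aligned to it. */
#define SCS_SIZE ((uintptr_t)8192)
#define SCS_MASK (SCS_SIZE - 1)

/* setjmp side: keep only the offset within the shadow call stack, so the
 * value written to the jmp_buf reveals nothing about where the stack is. */
static uintptr_t scs_offset_to_save(uintptr_t x18_value) {
    return x18_value & SCS_MASK;
}

/* longjmp side: take the high bits from the live x18 value and the low
 * bits from the jmp_buf to rebuild the saved shadow call stack pointer. */
static uintptr_t scs_pointer_to_restore(uintptr_t current_x18,
                                        uintptr_t saved_offset) {
    return (current_x18 & ~SCS_MASK) | saved_offset;
}
```

If setjmp ran with the pointer at base + 0x18 and longjmp runs with it at base + 0x40, only 0x18 is ever stored in memory, and the restored pointer is base + 0x18 again.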
The architecture’s call and return instructions (bl and ret) operate on a register rather than the stack, which means that leaf functions are generally protected from return address overwrites even without ShadowCallStack.
Usage
To enable ShadowCallStack, just pass the -fsanitize=shadow-call-stack flag to both compile and link command lines. On aarch64, you also need to pass -ffixed-x18 unless your target already reserves x18.
Low-level API
__has_feature(shadow_call_stack)
In some cases one may need to execute different code depending on whether ShadowCallStack is enabled. The macro __has_feature(shadow_call_stack) can be used for this purpose.
    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif
__attribute__((no_sanitize("shadow-call-stack")))
Use __attribute__((no_sanitize("shadow-call-stack"))) on a function declaration to specify that the shadow call stack instrumentation should not be applied to that function, even if enabled globally.
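As an illustration, the attribute can be wrapped in a macro so that the same source also builds when ShadowCallStack is disabled or when the compiler lacks __has_feature. The macro name NO_SHADOW_CALL_STACK and the functions helper and early_init are hypothetical; the scenario in the comment is one plausible reason to opt a function out, not a requirement stated by the documentation.

```c
/* Fallback so the code also builds with compilers that lack __has_feature. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Expand to the attribute only when ShadowCallStack is enabled. */
#if __has_feature(shadow_call_stack)
#define NO_SHADOW_CALL_STACK __attribute__((no_sanitize("shadow-call-stack")))
#else
#define NO_SHADOW_CALL_STACK
#endif

static int helper(int x) {
    return x * 2;
}

/* Hypothetical function that should not be instrumented, for example
 * because it could run before x18 points at a valid shadow call stack. */
NO_SHADOW_CALL_STACK
static int early_init(int x) {
    return helper(x) + 1;
}
```

The attribute affects only the function it is applied to. Here helper is still instrumented when the feature is enabled, although as a leaf function it would not normally need a shadow call stack entry.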
Example
The following example code:
    int foo() {
      return bar() + 1;
    }
Generates the following aarch64 assembly when compiled with -O2:
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret
Adding -fsanitize=shadow-call-stack would output the following assembly:
    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret