Debugging with Sanitizers

Undefined Behaviour Sanitizer

Clang’s undefined behavior sanitizer (UBSan) is available for use with Emscripten. This makes it much easier to catch bugs in your code.

To use UBSan, simply pass -fsanitize=undefined to emcc or em++. Note that you need to pass this at both the compile and link stages, as it affects both codegen and system libraries.

Catching Null Dereference

By default, with Emscripten, dereferencing a null pointer does not immediately cause a segmentation fault, unlike on traditional platforms, because 0 is just a normal address in a WebAssembly memory. 0 is also a normal location in a JavaScript Typed Array, which is an issue for the JavaScript that runs alongside the WebAssembly (runtime support code, JS library methods, EM_ASM/EM_JS, etc.), and also for the compiled code if you build with -sWASM=0.

In builds with ASSERTIONS enabled, a magic cookie stored at address 0 is checked at the end of program execution. That is, it will notify you if anything wrote to that location while the program ran. This only detects writes, not reads, and does not help you find where the bad write actually happened.

Consider the following program, null-assign.c:

    int main(void) {
        int *a = 0;
        *a = 0;
    }

Without UBSan you get an error when the program exits:

    $ emcc null-assign.c
    $ node a.out.js
    Runtime error: The application has corrupted its heap memory area (address zero)!

With UBSan, you get the exact line number where this happened:

    $ emcc -fsanitize=undefined null-assign.c
    $ node a.out.js
    null-assign.c:3:5: runtime error: store to null pointer of type 'int'
    Runtime error: The application has corrupted its heap memory area (address zero)!

Consider the following program, null-read.c:

    int main(void) {
        int *a = 0, b;
        b = *a;
    }

Without UBSan there is no feedback:

    $ emcc null-read.c
    $ node a.out.js
    $

With UBSan, you get the exact line number where this happened:

    $ emcc -fsanitize=undefined null-read.c
    $ node a.out.js
    null-read.c:3:9: runtime error: load of null pointer of type 'int'

Minimal Runtime

UBSan’s runtime is non-trivial, and its use can unnecessarily increase the attack surface. For this reason, there is a minimal UBSan runtime that is designed for production use.

The minimal runtime is supported by Emscripten. To use it, pass the flag -fsanitize-minimal-runtime in addition to your -fsanitize flag.

    $ emcc -fsanitize=null -fsanitize-minimal-runtime null-read.c
    $ node a.out.js
    ubsan: type-mismatch
    $ emcc -fsanitize=null -fsanitize-minimal-runtime null-assign.c
    $ node a.out.js
    ubsan: type-mismatch
    Runtime error: The application has corrupted its heap memory area (address zero)!

Address Sanitizer

Clang’s address sanitizer (ASan) is also available for use with Emscripten. This makes it much easier to catch buffer overflows, memory leaks, and other related bugs in your code.

To use ASan, simply pass -fsanitize=address to emcc or em++. As with UBSan, you need to pass this at both the compile and link stages, as it affects both codegen and system libraries.

You probably need to increase INITIAL_MEMORY to at least 64 MB or set ALLOW_MEMORY_GROWTH so that ASan has enough memory to start. Otherwise, you will receive an error message that looks something like:

    Cannot enlarge memory arrays to size 55152640 bytes (OOM). Either (1) compile
    with -sINITIAL_MEMORY=X with X higher than the current value 50331648, (2)
    compile with -sALLOW_MEMORY_GROWTH which allows increasing the size at
    runtime, or (3) if you want malloc to return NULL (0) instead of this abort,
    compile with -sABORTING_MALLOC=0

ASan fully supports multithreaded environments. ASan also operates on the JS support code: if JS tries to read from a memory address that is not valid, it will be caught, just as if that access had happened from Wasm.

Examples

Here are some examples of how AddressSanitizer can be used to help find bugs.

Buffer Overflow

Consider buffer_overflow.c:

    #include <string.h>

    int main(void) {
      char x[10];
      memset(x, 0, 11);
    }

    $ emcc -gsource-map -fsanitize=address -sALLOW_MEMORY_GROWTH buffer_overflow.c
    $ node a.out.js
    =================================================================
    ==42==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x02965e5a at pc 0x000015f0 bp 0x02965a30 sp 0x02965a30
    WRITE of size 11 at 0x02965e5a thread T0
        #0 0x15f0 in __asan_memset+0x15f0 (a.out.wasm+0x15f0)
        #1 0xc46 in __original_main stack_buffer_overflow.c:5:3
        #2 0xcbc in main+0xcbc (a.out.wasm+0xcbc)
        #3 0x800019bc in Object.Module._main a.out.js:6588:32
        #4 0x80001aeb in Object.callMain a.out.js:6891:30
        #5 0x80001b25 in doRun a.out.js:6949:60
        #6 0x80001b33 in run a.out.js:6963:5
        #7 0x80001ad6 in runCaller a.out.js:6870:29
    Address 0x02965e5a is located in stack of thread T0 at offset 26 in frame
        #0 0x11 (a.out.wasm+0x11)
      This frame has 1 object(s):
        [16, 26) 'x' (line 4) <== Memory access at offset 26 overflows this variable
    HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
          (longjmp and C++ exceptions *are* supported)
    SUMMARY: AddressSanitizer: stack-buffer-overflow (a.out.wasm+0x15ef)
    ...

Use After Free

Consider use_after_free.cpp:

    int main() {
      int *array = new int[100];
      delete [] array;
      return array[0];
    }

    $ em++ -gsource-map -fsanitize=address -sALLOW_MEMORY_GROWTH use_after_free.cpp
    $ node a.out.js
    =================================================================
    ==42==ERROR: AddressSanitizer: heap-use-after-free on address 0x03203e40 at pc 0x00000c1b bp 0x02965e70 sp 0x02965e7c
    READ of size 4 at 0x03203e40 thread T0
        #0 0xc1b in __original_main use_after_free.cpp:4:10
        #1 0xc48 in main+0xc48 (a.out.wasm+0xc48)
    0x03203e40 is located 0 bytes inside of 400-byte region [0x03203e40,0x03203fd0)
    freed by thread T0 here:
        #0 0x5fe8 in operator delete[](void*)+0x5fe8 (a.out.wasm+0x5fe8)
        #1 0xb76 in __original_main use_after_free.cpp:3:3
        #2 0xc48 in main+0xc48 (a.out.wasm+0xc48)
        #3 0x800019b5 in Object.Module._main a.out.js:6581:32
        #4 0x80001ade in Object.callMain a.out.js:6878:30
        #5 0x80001b18 in doRun a.out.js:6936:60
        #6 0x80001b26 in run a.out.js:6950:5
        #7 0x80001ac9 in runCaller a.out.js:6857:29
    previously allocated by thread T0 here:
        #0 0x5db4 in operator new[](unsigned long)+0x5db4 (a.out.wasm+0x5db4)
        #1 0xb41 in __original_main use_after_free.cpp:2:16
        #2 0xc48 in main+0xc48 (a.out.wasm+0xc48)
        #3 0x800019b5 in Object.Module._main a.out.js:6581:32
        #4 0x80001ade in Object.callMain a.out.js:6878:30
        #5 0x80001b18 in doRun a.out.js:6936:60
        #6 0x80001b26 in run a.out.js:6950:5
        #7 0x80001ac9 in runCaller a.out.js:6857:29
    SUMMARY: AddressSanitizer: heap-use-after-free (a.out.wasm+0xc1a)
    ...

Memory Leaks

Consider leak.cpp:

    int main() {
      new int[10];
    }

    $ em++ -gsource-map -fsanitize=address -sALLOW_MEMORY_GROWTH -sEXIT_RUNTIME leak.cpp
    $ node a.out.js
    =================================================================
    ==42==ERROR: LeakSanitizer: detected memory leaks
    Direct leak of 40 byte(s) in 1 object(s) allocated from:
        #0 0x5ce5 in operator new[](unsigned long)+0x5ce5 (a.out.wasm+0x5ce5)
        #1 0xb24 in __original_main leak.cpp:2:3
        #2 0xb3a in main+0xb3a (a.out.wasm+0xb3a)
        #3 0x800019b8 in Object.Module._main a.out.js:6584:32
        #4 0x80001ae1 in Object.callMain a.out.js:6881:30
        #5 0x80001b1b in doRun a.out.js:6939:60
        #6 0x80001b29 in run a.out.js:6953:5
        #7 0x80001acc in runCaller a.out.js:6860:29
    SUMMARY: AddressSanitizer: 40 byte(s) leaked in 1 allocation(s).

Note that since leak checks take place at program exit, you must use -sEXIT_RUNTIME, or invoke __lsan_do_leak_check or __lsan_do_recoverable_leak_check manually.

You can detect that AddressSanitizer is enabled and run __lsan_do_leak_check by doing:

    #include <sanitizer/lsan_interface.h>

    #if defined(__has_feature)
    #if __has_feature(address_sanitizer)
      // code for ASan-enabled builds
      __lsan_do_leak_check();
    #endif
    #endif

This will be fatal if there are memory leaks. To check for memory leaks and allow the process to continue running, use __lsan_do_recoverable_leak_check.

Also, if you only want to check for memory leaks, you may use -fsanitize=leak instead of -fsanitize=address. -fsanitize=leak does not instrument all memory accesses, and as a result is much faster than -fsanitize=address.

Use After Return

Consider use_after_return.c:

    #include <stdio.h>

    const char *__asan_default_options() {
      return "detect_stack_use_after_return=1";
    }

    int *f() {
      int buf[10];
      return buf;
    }

    int main() {
      *f() = 1;
    }

Note that to do this check, you have to use the ASan option detect_stack_use_after_return. You may enable this option by declaring a function called __asan_default_options like the example, or you can define Module['ASAN_OPTIONS'] = 'detect_stack_use_after_return=1' in the generated JavaScript. --pre-js is helpful here.

This option is fairly expensive because it converts stack allocations into heap allocations, and these allocations are not reused so that future accesses can cause traps. Hence, it is not enabled by default.

    $ emcc -gsource-map -fsanitize=address -sALLOW_MEMORY_GROWTH use_after_return.c
    $ node a.out.js
    =================================================================
    ==42==ERROR: AddressSanitizer: stack-use-after-return on address 0x02a95010 at pc 0x00000d90 bp 0x02965f70 sp 0x02965f7c
    WRITE of size 4 at 0x02a95010 thread T0
        #0 0xd90 in __original_main use_after_return.c:13:10
        #1 0xe0a in main+0xe0a (a.out.wasm+0xe0a)
    Address 0x02a95010 is located in stack of thread T0 at offset 16 in frame
        #0 0x11 (a.out.wasm+0x11)
      This frame has 1 object(s):
        [16, 56) 'buf' (line 8) <== Memory access at offset 16 is inside this variable
    HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
          (longjmp and C++ exceptions *are* supported)
    SUMMARY: AddressSanitizer: stack-use-after-return (a.out.wasm+0xd8f)
    ...

Configuration

ASan can be configured via a --pre-js file:

    Module.ASAN_OPTIONS = 'option1=a:option2=b';

For example, put the above snippet with your options into asan_options.js, and compile with --pre-js asan_options.js.

For standalone LSan, use Module.LSAN_OPTIONS instead.

For a detailed understanding of the flags, see the ASan documentation. Please be warned that most flag combinations are not tested and may or may not work.

Disabling malloc/free Stack Traces

In a program that uses malloc/free (or their C++ equivalents, operator new/operator delete) very frequently, taking a stack trace at every invocation of malloc/free can be very expensive. As a result, if you find your program to be very slow when using ASan, you can try using the option malloc_context_size=0, like this:

    Module.ASAN_OPTIONS = 'malloc_context_size=0';

This prevents ASan from reporting the location of memory leaks or offering insight into where the memory for a heap-based memory error originated, but may provide tremendous speedups.

Comparison to SAFE_HEAP

Emscripten provides a SAFE_HEAP mode, which can be activated by running emcc with -sSAFE_HEAP. This does several things, some of which overlap with sanitizers.

In general, SAFE_HEAP focuses on the specific pain points that come up when targeting Wasm. The sanitizers, on the other hand, focus on the specific pain points involved in using languages like C/C++. Those two sets overlap, but are not identical. Which you should use depends on which types of problems you are looking for. You may want to test with all sanitizers and with SAFE_HEAP for maximal coverage, but you may need to build separately for each mode, since not all sanitizers are compatible with each other, and not all of them are compatible with SAFE_HEAP (because the sanitizers do some pretty radical things!). You will get a compiler error if there is an issue with the flags you passed. A reasonable set of separate test builds might be: ASan, UBSan, and SAFE_HEAP.

The specific things SAFE_HEAP errors on include:

  • NULL pointer (address 0) reads or writes. As mentioned earlier, this is annoying in WebAssembly and JavaScript because 0 is just a normal address, so you don’t get an immediate segfault, which can be confusing.

  • Unaligned reads or writes. These work in WebAssembly, but on some platforms an incorrectly-aligned read or write may be much slower, and with wasm2js (WASM=0) it will be incorrect, as JavaScript Typed Arrays do not allow unaligned operations.

  • Reads or writes past the top of valid memory as managed by sbrk(), that is, memory that was not properly allocated by malloc(). This is not specific to Wasm. However, in JavaScript, if the address is big enough to be outside the Typed Array, undefined is returned, which can be very confusing, and that is why this check was added. In Wasm an error is at least thrown, but SAFE_HEAP still helps there by checking the area between the top of sbrk()’s memory and the end of the Wasm Memory.

SAFE_HEAP does these checks by instrumenting every single load and store. That has the cost of slowing things down, but it does give a simple guarantee of finding all such problems. It can also be done after compilation, on an arbitrary Wasm binary, while the sanitizers must be done when compiling from source.

In comparison, UBSan can also find null pointer reads and writes. It does not instrument every single load and store, however, as it is done during compilation of the source code, so the checks are added where clang knows they are needed. This is much more efficient, but there is a risk of codegen and optimizations changing something, or clang missing a specific location.

ASan can find reads or writes of unallocated memory, which includes addresses above the sbrk()-managed memory. It may be more efficient than SAFE_HEAP in some cases: while it also checks every load and store, the LLVM optimizer is run after it adds those checks, which can remove some of them.