Kernel Concurrency Sanitizer (KCSAN)

The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, whichrelies on compile-time instrumentation, and uses a watchpoint-based samplingapproach to detect races. KCSAN’s primary purpose is to detectdata races.

Usage

KCSAN is supported by both GCC and Clang. With GCC we require version 11 orlater, and with Clang also require version 11 or later.

To enable KCSAN configure the kernel with:

CONFIG_KCSAN = y

KCSAN provides several other configuration options to customize behaviour (seethe respective help text inlib/Kconfig.kcsan for more info).

Error reports

A typical data race report looks like this:

==================================================================BUG: KCSAN: data-race in test_kernel_read / test_kernel_writewrite to 0xffffffffc009a628 of 8 bytes by task 487 on cpu 0: test_kernel_write+0x1d/0x30 access_thread+0x89/0xd0 kthread+0x23e/0x260 ret_from_fork+0x22/0x30read to 0xffffffffc009a628 of 8 bytes by task 488 on cpu 6: test_kernel_read+0x10/0x20 access_thread+0x89/0xd0 kthread+0x23e/0x260 ret_from_fork+0x22/0x30value changed: 0x00000000000009a6 -> 0x00000000000009b2Reported by Kernel Concurrency Sanitizer on:CPU: 6 PID: 488 Comm: access_thread Not tainted 5.12.0-rc2+ #1Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014==================================================================

The header of the report provides a short summary of the functions involved inthe race. It is followed by the access types and stack traces of the 2 threadsinvolved in the data race. If KCSAN also observed a value change, the observedold value and new value are shown on the “value changed” line respectively.

The other less common type of data race report looks like this:

==================================================================BUG: KCSAN: data-race in test_kernel_rmw_array+0x71/0xd0race at unknown origin, with read to 0xffffffffc009bdb0 of 8 bytes by task 515 on cpu 2: test_kernel_rmw_array+0x71/0xd0 access_thread+0x89/0xd0 kthread+0x23e/0x260 ret_from_fork+0x22/0x30value changed: 0x0000000000002328 -> 0x0000000000002329Reported by Kernel Concurrency Sanitizer on:CPU: 2 PID: 515 Comm: access_thread Not tainted 5.12.0-rc2+ #1Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014==================================================================

This report is generated where it was not possible to determine the otherracing thread, but a race was inferred due to the data value of the watchedmemory location having changed. These reports always show a “value changed”line. A common reason for reports of this type are missing instrumentation inthe racing thread, but could also occur due to e.g. DMA accesses. Such reportsare shown only ifCONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=y, which isenabled by default.

Selective analysis

It may be desirable to disable data race detection for specific accesses,functions, compilation units, or entire subsystems. For static blacklisting,the below options are available:

  • KCSAN understands thedata_race(expr) annotation, which tells KCSAN thatany data races due to accesses inexpr should be ignored and resultingbehaviour when encountering a data race is deemed safe. Please see“Marking Shared-Memory Accesses” in the LKMM for more information.

  • Similar todata_race(...), the type qualifier__data_racy can be usedto document that all data races due to accesses to a variable are intendedand should be ignored by KCSAN:

    struct foo {    ...    int __data_racy stats_counter;    ...};
  • Disabling data race detection for entire functions can be accomplished byusing the function attribute__no_kcsan:

    __no_kcsanvoid foo(void) {    ...

    To dynamically limit for which functions to generate reports, see theDebugFS interface blacklist/whitelist feature.

  • To disable data race detection for a particular compilation unit, add to theMakefile:

    KCSAN_SANITIZE_file.o := n
  • To disable data race detection for all compilation units listed in aMakefile, add to the respectiveMakefile:

    KCSAN_SANITIZE := n

Furthermore, it is possible to tell KCSAN to show or hide entire classes ofdata races, depending on preferences. These can be changed via the followingKconfig options:

  • CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY: If enabled and a conflicting writeis observed via a watchpoint, but the data value of the memory location wasobserved to remain unchanged, do not report the data race.

  • CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC: Assume that plain aligned writesup to word size are atomic by default. Assumes that such writes are notsubject to unsafe compiler optimizations resulting in data races. The optioncauses KCSAN to not report data races due to conflicts where the only plainaccesses are aligned writes up to word size.

  • CONFIG_KCSAN_PERMISSIVE: Enable additional permissive rules to ignorecertain classes of common data races. Unlike the above, the rules are morecomplex involving value-change patterns, access type, and address. Thisoption depends onCONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=y. For detailsplease see thekernel/kcsan/permissive.h. Testers and maintainers thatonly focus on reports from specific subsystems and not the whole kernel arerecommended to disable this option.

To use the strictest possible rules, selectCONFIG_KCSAN_STRICT=y, whichconfigures KCSAN to follow the Linux-kernel memory consistency model (LKMM) asclosely as possible.

DebugFS interface

The file/sys/kernel/debug/kcsan provides the following interface:

  • Reading/sys/kernel/debug/kcsan returns various runtime statistics.

  • Writingon oroff to/sys/kernel/debug/kcsan allows turning KCSANon or off, respectively.

  • Writing!some_func_name to/sys/kernel/debug/kcsan addssome_func_name to the report filter list, which (by default) blacklistsreporting data races where either one of the top stackframes are a functionin the list.

  • Writing eitherblacklist orwhitelist to/sys/kernel/debug/kcsanchanges the report filtering behaviour. For example, the blacklist featurecan be used to silence frequently occurring data races; the whitelist featurecan help with reproduction and testing of fixes.

Tuning performance

Core parameters that affect KCSAN’s overall performance and bug detectionability are exposed as kernel command-line arguments whose defaults can also bechanged via the corresponding Kconfig options.

  • kcsan.skip_watch (CONFIG_KCSAN_SKIP_WATCH): Number of per-CPU memoryoperations to skip, before another watchpoint is set up. Setting upwatchpoints more frequently will result in the likelihood of races to beobserved to increase. This parameter has the most significant impact onoverall system performance and race detection ability.

  • kcsan.udelay_task (CONFIG_KCSAN_UDELAY_TASK): For tasks, themicrosecond delay to stall execution after a watchpoint has been set up.Larger values result in the window in which we may observe a race toincrease.

  • kcsan.udelay_interrupt (CONFIG_KCSAN_UDELAY_INTERRUPT): Forinterrupts, the microsecond delay to stall execution after a watchpoint hasbeen set up. Interrupts have tighter latency requirements, and their delayshould generally be smaller than the one chosen for tasks.

They may be tweaked at runtime via/sys/module/kcsan/parameters/.

Data Races

In an execution, two memory accesses form adata race if theyconflict,they happen concurrently in different threads, and at least one of them is aplain access; theyconflict if both access the same memory location, and atleast one is a write. For a more thorough discussion and definition, see“PlainAccesses and Data Races” in the LKMM.

Relationship with the Linux-Kernel Memory Consistency Model (LKMM)

The LKMM defines the propagation and ordering rules of various memoryoperations, which gives developers the ability to reason about concurrent code.Ultimately this allows to determine the possible executions of concurrent code,and if that code is free from data races.

KCSAN is aware ofmarked atomic operations (READ_ONCE,WRITE_ONCE,atomic_*, etc.), and a subset of ordering guarantees implied by memorybarriers. WithCONFIG_KCSAN_WEAK_MEMORY=y, KCSAN models load or storebuffering, and can detect missingsmp_mb(),smp_wmb(),smp_rmb(),smp_store_release(), and allatomic_* operations with equivalentimplied barriers.

Note, KCSAN will not report all data races due to missing memory ordering,specifically where a memory barrier would be required to prohibit subsequentmemory operation from reordering before the barrier. Developers shouldtherefore carefully consider the required memory ordering requirements thatremain unchecked.

Race Detection Beyond Data Races

For code with complex concurrency design, race-condition bugs may not alwaysmanifest as data races. Race conditions occur if concurrently executingoperations result in unexpected system behaviour. On the other hand, data racesare defined at the C-language level. The following macros can be used to checkproperties of concurrent code where bugs would not manifest as data races.

ASSERT_EXCLUSIVE_WRITER

ASSERT_EXCLUSIVE_WRITER(var)

assert no concurrent writes tovar

Parameters

var

variable to assert on

Description

Assert that there are no concurrent writes tovar; other readers areallowed. This assertion can be used to specify properties of concurrent code,where violation cannot be detected as a normal data race.

For example, if we only have a single writer, but multiple concurrentreaders, to avoid data races, all these accesses must be marked; evenconcurrent marked writes racing with the single writer are bugs.Unfortunately, due to being marked, they are no longer data races. For caseslike these, we can use the macro as follows:

voidwriter(void){spin_lock(&update_foo_lock);ASSERT_EXCLUSIVE_WRITER(shared_foo);WRITE_ONCE(shared_foo,...);spin_unlock(&update_foo_lock);}voidreader(void){// update_foo_lock does not need to be held!...=READ_ONCE(shared_foo);}

Note

ASSERT_EXCLUSIVE_WRITER_SCOPED(), if applicable, performs more thoroughchecking if a clear scope where no concurrent writes are expected exists.

ASSERT_EXCLUSIVE_WRITER_SCOPED

ASSERT_EXCLUSIVE_WRITER_SCOPED(var)

assert no concurrent writes tovar in scope

Parameters

var

variable to assert on

Description

Scoped variant ofASSERT_EXCLUSIVE_WRITER().

Assert that there are no concurrent writes tovar for the duration of thescope in which it is introduced. This provides a better way to fully coverthe enclosing scope, compared to multipleASSERT_EXCLUSIVE_WRITER(), andincreases the likelihood for KCSAN to detect racing accesses.

For example, it allows finding race-condition bugs that only occur due tostate changes within the scope itself:

voidwriter(void){spin_lock(&update_foo_lock);{ASSERT_EXCLUSIVE_WRITER_SCOPED(shared_foo);WRITE_ONCE(shared_foo,42);...// shared_foo should still be 42 here!}spin_unlock(&update_foo_lock);}voidbuggy(void){if(READ_ONCE(shared_foo)==42)WRITE_ONCE(shared_foo,1);// bug!}
ASSERT_EXCLUSIVE_ACCESS

ASSERT_EXCLUSIVE_ACCESS(var)

assert no concurrent accesses tovar

Parameters

var

variable to assert on

Description

Assert that there are no concurrent accesses tovar (no readers norwriters). This assertion can be used to specify properties of concurrentcode, where violation cannot be detected as a normal data race.

For example, where exclusive access is expected after determining no otherusers of an object are left, but the object is not actually freed. We cancheck that this property actually holds as follows:

if(refcount_dec_and_test(&obj->refcnt)){ASSERT_EXCLUSIVE_ACCESS(*obj);do_some_cleanup(obj);release_for_reuse(obj);}

Note

  1. ASSERT_EXCLUSIVE_ACCESS_SCOPED(), if applicable, performs more thoroughchecking if a clear scope where no concurrent accesses are expected exists.

  2. For cases where the object is freed,KASAN is a betterfit to detect use-after-free bugs.

ASSERT_EXCLUSIVE_ACCESS_SCOPED

ASSERT_EXCLUSIVE_ACCESS_SCOPED(var)

assert no concurrent accesses tovar in scope

Parameters

var

variable to assert on

Description

Scoped variant ofASSERT_EXCLUSIVE_ACCESS().

Assert that there are no concurrent accesses tovar (no readers nor writers)for the entire duration of the scope in which it is introduced. This providesa better way to fully cover the enclosing scope, compared to multipleASSERT_EXCLUSIVE_ACCESS(), and increases the likelihood for KCSAN to detectracing accesses.

ASSERT_EXCLUSIVE_BITS

ASSERT_EXCLUSIVE_BITS(var,mask)

assert no concurrent writes to subset of bits invar

Parameters

var

variable to assert on

mask

only check for modifications to bits set inmask

Description

Bit-granular variant ofASSERT_EXCLUSIVE_WRITER().

Assert that there are no concurrent writes to a subset of bits invar;concurrent readers are permitted. This assertion captures more detailedbit-level properties, compared to the other (word granularity) assertions.Only the bits set inmask are checked for concurrent modifications, whileignoring the remaining bits, i.e. concurrent writes (or reads) to ~mask bitsare ignored.

Use this for variables, where some bits must not be modified concurrently,yet other bits are expected to be modified concurrently.

For example, variables where, after initialization, some bits are read-only,but other bits may still be modified concurrently. A reader may wish toassert that this is true as follows:

ASSERT_EXCLUSIVE_BITS(flags,READ_ONLY_MASK);foo=(READ_ONCE(flags)&READ_ONLY_MASK)>>READ_ONLY_SHIFT;

Note

The access that immediately followsASSERT_EXCLUSIVE_BITS() is assumedto access the masked bits only, and KCSAN optimistically assumes it istherefore safe, even in the presence of data races, and marking it withREAD_ONCE() is optional from KCSAN’s point-of-view. We caution, however, thatit may still be advisable to do so, since we cannot reason about all compileroptimizations when it comes to bit manipulations (on the reader and writerside). If you are sure nothing can go wrong, we can write the above simplyas:

ASSERT_EXCLUSIVE_BITS(flags,READ_ONLY_MASK);foo=(flags&READ_ONLY_MASK)>>READ_ONLY_SHIFT;

Another example, where this may be used, is when certain bits ofvar mayonly be modified when holding the appropriate lock, but other bits may stillbe modified concurrently. Writers, where other bits may change concurrently,could use the assertion as follows:

spin_lock(&foo_lock);ASSERT_EXCLUSIVE_BITS(flags,FOO_MASK);old_flags=flags;new_flags=(old_flags&~FOO_MASK)|(new_foo<<FOO_SHIFT);if(cmpxchg(&flags,old_flags,new_flags)!=old_flags){...}spin_unlock(&foo_lock);

Implementation Details

KCSAN relies on observing that two accesses happen concurrently. Crucially, wewant to (a) increase the chances of observing races (especially for races thatmanifest rarely), and (b) be able to actually observe them. We can accomplish(a) by injecting various delays, and (b) by using address watchpoints (orbreakpoints).

If we deliberately stall a memory access, while we have a watchpoint for itsaddress set up, and then observe the watchpoint to fire, two accesses to thesame address just raced. Using hardware watchpoints, this is the approach takeninDataCollider.Unlike DataCollider, KCSAN does not use hardware watchpoints, but insteadrelies on compiler instrumentation and “soft watchpoints”.

In KCSAN, watchpoints are implemented using an efficient encoding that storesaccess type, size, and address in a long; the benefits of using “softwatchpoints” are portability and greater flexibility. KCSAN then relies on thecompiler instrumenting plain accesses. For each instrumented plain access:

  1. Check if a matching watchpoint exists; if yes, and at least one access is awrite, then we encountered a racing access.

  2. Periodically, if no matching watchpoint exists, set up a watchpoint andstall for a small randomized delay.

  3. Also check the data value before the delay, and re-check the data valueafter delay; if the values mismatch, we infer a race of unknown origin.

To detect data races between plain and marked accesses, KCSAN also annotatesmarked accesses, but only to check if a watchpoint exists; i.e. KCSAN neversets up a watchpoint on marked accesses. By never setting up watchpoints formarked operations, if all accesses to a variable that is accessed concurrentlyare properly marked, KCSAN will never trigger a watchpoint and therefore neverreport the accesses.

Modeling Weak Memory

KCSAN’s approach to detecting data races due to missing memory barriers isbased on modeling access reordering (withCONFIG_KCSAN_WEAK_MEMORY=y).Each plain memory access for which a watchpoint is set up, is also selected forsimulated reordering within the scope of its function (at most 1 in-flightaccess).

Once an access has been selected for reordering, it is checked along everyother access until the end of the function scope. If an appropriate memorybarrier is encountered, the access will no longer be considered for simulatedreordering.

When the result of a memory operation should be ordered by a barrier, KCSAN canthen detect data races where the conflict only occurs as a result of a missingbarrier. Consider the example:

int x, flag;void T1(void){    x = 1;                  // data race!    WRITE_ONCE(flag, 1);    // correct: smp_store_release(&flag, 1)}void T2(void){    while (!READ_ONCE(flag));   // correct: smp_load_acquire(&flag)    ... = x;                    // data race!}

When weak memory modeling is enabled, KCSAN can considerx inT1 forsimulated reordering. After the write offlag,x is again checked forconcurrent accesses: becauseT2 is able to proceed after the write offlag, a data race is detected. With the correct barriers in place,xwould not be considered for reordering after the proper release offlag,and no data race would be detected.

Deliberate trade-offs in complexity but also practical limitations mean only asubset of data races due to missing memory barriers can be detected. Withcurrently available compiler support, the implementation is limited to modelingthe effects of “buffering” (delaying accesses), since the runtime cannot“prefetch” accesses. Also recall that watchpoints are only set up for plainaccesses, and the only access type for which KCSAN simulates reordering. Thismeans reordering of marked accesses is not modeled.

A consequence of the above is that acquire operations do not require barrierinstrumentation (no prefetching). Furthermore, marked accesses introducingaddress or control dependencies do not require special handling (the markedaccess cannot be reordered, later dependent accesses cannot be prefetched).

Key Properties

  1. Memory Overhead: The overall memory overhead is only a few MiBdepending on configuration. The current implementation uses a small array oflongs to encode watchpoint information, which is negligible.

  2. Performance Overhead: KCSAN’s runtime aims to be minimal, using anefficient watchpoint encoding that does not require acquiring any sharedlocks in the fast-path. For kernel boot on a system with 8 CPUs:

    • 5.0x slow-down with the default KCSAN config;

    • 2.8x slow-down from runtime fast-path overhead only (set very largeKCSAN_SKIP_WATCH and unsetKCSAN_SKIP_WATCH_RANDOMIZE).

  3. Annotation Overheads: Minimal annotations are required outside the KCSANruntime. As a result, maintenance overheads are minimal as the kernelevolves.

  4. Detects Racy Writes from Devices: Due to checking data values uponsetting up watchpoints, racy writes from devices can also be detected.

  5. Memory Ordering: KCSAN is aware of only a subset of LKMM ordering rules;this may result in missed data races (false negatives).

  6. Analysis Accuracy: For observed executions, due to using a samplingstrategy, the analysis isunsound (false negatives possible), but aims tobe complete (no false positives).

Alternatives Considered

An alternative data race detection approach for the kernel can be found in theKernel Thread Sanitizer (KTSAN).KTSAN is a happens-before data race detector, which explicitly establishes thehappens-before order between memory operations, which can then be used todetermine data races as defined inData Races.

To build a correct happens-before relation, KTSAN must be aware of all orderingrules of the LKMM and synchronization primitives. Unfortunately, any omissionleads to large numbers of false positives, which is especially detrimental inthe context of the kernel which includes numerous custom synchronizationmechanisms. To track the happens-before relation, KTSAN’s implementationrequires metadata for each memory location (shadow memory), which for each pagecorresponds to 4 pages of shadow memory, and can translate into overhead oftens of GiB on a large system.