Kernel Electric-Fence (KFENCE)
Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety error detector. KFENCE detects heap out-of-bounds access, use-after-free, and invalid-free errors.
KFENCE is designed to be enabled in production kernels, and has near zero performance overhead. Compared to KASAN, KFENCE trades precision for performance. The main motivation behind KFENCE’s design is that with enough total uptime KFENCE will detect bugs in code paths not typically exercised by non-production test workloads. One way to quickly achieve a large enough total uptime is to deploy the tool across a large fleet of machines.
Usage
To enable KFENCE, configure the kernel with:
CONFIG_KFENCE=y
To build a kernel with KFENCE support, but disabled by default (to enable, set kfence.sample_interval to a non-zero value), configure the kernel with:
CONFIG_KFENCE=y
CONFIG_KFENCE_SAMPLE_INTERVAL=0
KFENCE provides several other configuration options to customize behaviour (see the respective help text in lib/Kconfig.kfence for more info).
Tuning performance
The most important parameter is KFENCE’s sample interval, which can be set via the kernel boot parameter kfence.sample_interval in milliseconds. The sample interval determines the frequency with which heap allocations will be guarded by KFENCE. The default is configurable via the Kconfig option CONFIG_KFENCE_SAMPLE_INTERVAL. Setting kfence.sample_interval=0 disables KFENCE.
The sample interval controls a timer that sets up KFENCE allocations. By default, to keep the real sample interval predictable, the normal timer also causes CPU wake-ups when the system is completely idle. This may be undesirable on power-constrained systems. The boot parameter kfence.deferrable=1 instead switches to a “deferrable” timer which does not force CPU wake-ups on idle systems, at the risk of unpredictable sample intervals. The default is configurable via the Kconfig option CONFIG_KFENCE_DEFERRABLE.
Warning
The KUnit test suite is very likely to fail when using a deferrable timer since it currently causes very unpredictable sample intervals.
By default KFENCE will only sample 1 heap allocation within each sample interval. Burst mode allows sampling successive heap allocations: the kernel boot parameter kfence.burst can be set to a non-zero value which denotes the additional successive allocations within a sample interval; setting kfence.burst=N means that 1+N successive allocations are attempted through KFENCE for each sample interval.
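The arithmetic behind the sample interval and burst mode can be sketched in C. The helper names below are hypothetical and are not part of the kernel. The second function gives only an upper bound, since the number of allocations actually sampled also depends on allocation activity and on free space in the pool.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: allocations attempted through KFENCE per sample
 * interval, i.e. 1 + kfence.burst. */
static uint64_t kfence_attempts_per_interval(uint64_t burst)
{
    return 1 + burst;
}

/* Hypothetical helper: upper bound on the allocations attempted through
 * KFENCE over a period. A sample interval of 0 disables KFENCE, so it is
 * rejected here. */
static uint64_t kfence_max_attempts(uint64_t uptime_ms,
                                    uint64_t sample_interval_ms,
                                    uint64_t burst)
{
    assert(sample_interval_ms != 0);
    return (uptime_ms / sample_interval_ms) *
           kfence_attempts_per_interval(burst);
}
```

For example, with an illustrative sample interval of 100 ms and kfence.burst=2, at most 30 allocations would be attempted through KFENCE per second of uptime.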
The KFENCE memory pool is of fixed size, and if the pool is exhausted, no further KFENCE allocations occur. With CONFIG_KFENCE_NUM_OBJECTS (default 255), the number of available guarded objects can be controlled. Each object requires 2 pages, one for the object itself and the other one used as a guard page; object pages are interleaved with guard pages, and every object page is therefore surrounded by two guard pages.
The total memory dedicated to the KFENCE memory pool can be computed as:
( #objects + 1 ) * 2 * PAGE_SIZE
Using the default config, and assuming a page size of 4 KiB, results in dedicating 2 MiB to the KFENCE memory pool.
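As a quick check of the formula, the pool size can be computed with a small C helper. The function name is hypothetical; only the formula itself comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Total bytes dedicated to the KFENCE pool:
 * (#objects + 1) * 2 * PAGE_SIZE. */
static size_t kfence_pool_bytes(size_t num_objects, size_t page_size)
{
    assert(page_size != 0);
    return (num_objects + 1) * 2 * page_size;
}
```

With 255 objects and 4 KiB pages this gives (255 + 1) * 2 * 4096 = 2097152 bytes, i.e. 2 MiB.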
Note: On architectures that support huge pages, KFENCE will ensure that the pool is using pages of size PAGE_SIZE. This will result in additional page tables being allocated.
Error reports
A typical out-of-bounds access looks like this:
==================================================================
BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
 test_out_of_bounds_read+0xa6/0x234
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32

allocated by task 484 on cpu 0 at 32.919330s:
 test_alloc+0xfe/0x738
 test_out_of_bounds_read+0x9b/0x234
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
==================================================================
The header of the report provides a short summary of the function involved in the access. It is followed by more detailed information about the access and its origin. Note that real kernel addresses are only shown when using the kernel command line option no_hash_pointers.
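The “1B left of kfence-#72” annotation follows from the faulting address and the object range printed in the report. The sketch below reproduces that arithmetic; the type and function names are hypothetical, and it only classifies the access relative to the object, without claiming to mirror how the kernel formats distances.

```c
#include <assert.h>
#include <stdint.h>

enum access_side { ACCESS_LEFT, ACCESS_INSIDE, ACCESS_RIGHT };

/* Signed byte offset of the access relative to the object start;
 * negative values lie to the left of the object. */
static int64_t access_offset(uint64_t access_addr, uint64_t obj_start)
{
    return (int64_t)(access_addr - obj_start);
}

/* Classify an access relative to an object of obj_size bytes. */
static enum access_side classify_access(uint64_t access_addr,
                                        uint64_t obj_start,
                                        uint64_t obj_size)
{
    assert(obj_size > 0);
    int64_t off = access_offset(access_addr, obj_start);

    if (off < 0)
        return ACCESS_LEFT;
    if ((uint64_t)off < obj_size)
        return ACCESS_INSIDE;
    return ACCESS_RIGHT;
}
```

For the report above, the access at 0xffff8c3f2e291fff against an object starting at 0xffff8c3f2e292000 has offset -1, i.e. one byte to the left of the object.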
Use-after-free accesses are reported as:
==================================================================
BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79):
 test_use_after_free_read+0xb3/0x143
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32

allocated by task 488 on cpu 2 at 33.871326s:
 test_alloc+0xfe/0x738
 test_use_after_free_read+0x76/0x143
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

freed by task 488 on cpu 2 at 33.871358s:
 test_use_after_free_read+0xa8/0x143
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
==================================================================
KFENCE also reports on invalid frees, such as double-frees:
==================================================================
BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81):
 test_double_free+0xdc/0x171
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32

allocated by task 490 on cpu 1 at 34.175321s:
 test_alloc+0xfe/0x738
 test_double_free+0x76/0x171
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

freed by task 490 on cpu 1 at 34.175348s:
 test_double_free+0xa8/0x171
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
==================================================================
KFENCE also uses pattern-based redzones on the other side of an object’s guard page, to detect out-of-bounds writes on the unprotected side of the object. These are reported on frees:
==================================================================
BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156):
 test_kmalloc_aligned_oob_write+0xef/0x184
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96

allocated by task 502 on cpu 7 at 42.159302s:
 test_alloc+0xfe/0x738
 test_kmalloc_aligned_oob_write+0x57/0x184
 kunit_try_run_case+0x61/0xa0
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x176/0x1b0
 ret_from_fork+0x22/0x30

CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
==================================================================
For such errors, the address where the corruption occurred as well as the invalidly written bytes (offset from the address) are shown; in this representation, ‘.’ denotes untouched bytes. In the example above, 0xac is the value written to the invalid address at offset 0, and the remaining ‘.’ denote that no following bytes have been touched. Note that real values are only shown if the kernel was booted with no_hash_pointers; to avoid information disclosure otherwise, ‘!’ is used instead to denote invalidly written bytes.
And finally, KFENCE may also report on invalid accesses to any protected page where it was not possible to determine an associated object, e.g. if adjacent object pages had not yet been allocated:
==================================================================
BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

Invalid read at 0xffffffffb670b00a:
 test_invalid_access+0x26/0xe0
 kunit_try_run_case+0x51/0x85
 kunit_generic_run_threadfn_adapter+0x16/0x30
 kthread+0x137/0x160
 ret_from_fork+0x22/0x30

CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
==================================================================
DebugFS interface
Some debugging information is exposed via debugfs:
- The file /sys/kernel/debug/kfence/stats provides runtime statistics.
- The file /sys/kernel/debug/kfence/objects provides a list of objects allocated via KFENCE, including those already freed but protected.
Implementation Details
Guarded allocations are set up based on the sample interval. After expiration of the sample interval, the next allocation through the main allocator (SLAB or SLUB) returns a guarded allocation from the KFENCE object pool (allocation sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and the next allocation is set up after the expiration of the interval.
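The interplay between the timer and the allocation path can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the counter, the function names, and the use of C11 atomics are not taken from the kernel, and the burst argument simply models the 1+N behaviour of kfence.burst.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate: number of allocations still to be sampled in the
 * current interval. Zero means the gate is closed. */
static atomic_int samples_remaining;

/* Stand-in for the timer callback that runs when the sample interval
 * expires: allow 1 + burst allocations to be sampled. */
static void sample_timer_expired(int burst)
{
    assert(burst >= 0);
    atomic_store(&samples_remaining, 1 + burst);
}

/* Called on the allocation path. Returns true if this allocation should
 * be served from the guarded pool. */
static bool should_sample_allocation(void)
{
    int left = atomic_load(&samples_remaining);

    while (left > 0) {
        /* Claim one sample; on failure 'left' is reloaded and we retry. */
        if (atomic_compare_exchange_weak(&samples_remaining, &left, left - 1))
            return true;
    }
    return false;
}
```

Between timer expirations every call to should_sample_allocation() returns false, so the common allocation path only pays for one atomic load.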
When using CONFIG_KFENCE_STATIC_KEYS=y, KFENCE allocations are “gated” through the main allocator’s fast-path by relying on static branches via the static keys infrastructure. The static branch is toggled to redirect the allocation to KFENCE. Depending on sample interval, target workloads, and system architecture, this may perform better than the simple dynamic branch. Careful benchmarking is recommended.
KFENCE objects each reside on a dedicated page, at either the left or right page boundaries selected at random. The pages to the left and right of the object page are “guard pages”, whose attributes are changed to a protected state, and cause page faults on any attempted access. Such page faults are then intercepted by KFENCE, which handles the fault gracefully by reporting an out-of-bounds access, and marking the page as accessible so that the faulting code can (wrongly) continue executing (set panic_on_warn to panic instead).
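How an object ends up at the left or right page boundary can be sketched as follows. This is a minimal illustration, not the kernel’s code: the function name, the fixed 4 KiB page size, and the caller-supplied side (instead of a random choice) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical helper: address of an object of 'size' bytes placed in the
 * page starting at 'page_start', honoring a power-of-two alignment. */
static uint64_t place_object(uint64_t page_start, uint64_t size,
                             uint64_t align, bool right_side)
{
    assert(size > 0 && size <= SKETCH_PAGE_SIZE);
    assert(align > 0 && (align & (align - 1)) == 0);
    assert(page_start % SKETCH_PAGE_SIZE == 0);

    if (!right_side)
        return page_start; /* object starts at the left page boundary */

    /* Push the object against the right boundary, then round down so the
     * start address honors the requested alignment. */
    uint64_t addr = page_start + SKETCH_PAGE_SIZE - size;
    return addr & ~(align - 1);
}
```

With a 73-byte object, an assumed 8-byte alignment, and right-side placement in the page at 0xffff8c3f2e33a000, this yields 0xffff8c3f2e33afb0, which agrees with the kfence-#156 object range in the memory corruption report above. The 7 bytes between the end of the object and the page boundary are the gap that alignment leaves unprotected by the guard page.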
To detect out-of-bounds writes to memory within the object’s page itself, KFENCE also uses pattern-based redzones. For each object page, a redzone is set up for all non-object memory. For typical alignments, the redzone is only required on the unguarded side of an object. Because KFENCE must honor the cache’s requested alignment, special alignments may result in unprotected gaps on either side of an object, all of which are redzoned.
The following figure illustrates the page layout:
---+-----------+-----------+-----------+-----------+-----------+---
   | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
   | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
   | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
   | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
   | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
   | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
---+-----------+-----------+-----------+-----------+-----------+---
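The pattern-based redzone check can be sketched in a few lines of C. The pattern byte and the function names are assumptions chosen for illustration; the kernel’s actual pattern may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REDZONE_BYTE 0xaa /* assumed pattern, for illustration only */

/* Fill the non-object part of the page with the known pattern
 * (done at allocation time). */
static void redzone_fill(uint8_t *zone, size_t len)
{
    assert(zone != NULL);
    for (size_t i = 0; i < len; i++)
        zone[i] = SKETCH_REDZONE_BYTE;
}

/* Check the pattern (done at free time). Returns the index of the first
 * corrupted byte, or len if the redzone is intact. */
static size_t redzone_first_corrupt(const uint8_t *zone, size_t len)
{
    assert(zone != NULL);
    for (size_t i = 0; i < len; i++) {
        if (zone[i] != SKETCH_REDZONE_BYTE)
            return i;
    }
    return len;
}
```

Because the check only runs when the object is freed, a write into the redzone is reported on free rather than at the time of the faulty access, which matches the memory corruption report shown earlier.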
Upon deallocation of a KFENCE object, the object’s page is again protected and the object is marked as freed. Any further access to the object causes a fault and KFENCE reports a use-after-free access. Freed objects are inserted at the tail of KFENCE’s freelist, so that the least recently freed objects are reused first, and the chances of detecting use-after-frees of recently freed objects is increased.
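The freelist policy amounts to a FIFO queue: freed objects go to the tail and reuse takes from the head. A minimal sketch using a fixed-size ring buffer is shown below; the structure, its capacity, and the function names are hypothetical simplifications and do not reflect the kernel’s data structures.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_OBJECTS 4 /* tiny pool, for illustration */

struct sketch_freelist {
    int slots[SKETCH_NUM_OBJECTS]; /* ring buffer of object indices */
    size_t head;                   /* position of the next object to reuse */
    size_t count;                  /* number of queued objects */
};

/* A freed object is appended at the tail. */
static void freelist_push_tail(struct sketch_freelist *fl, int obj)
{
    assert(fl->count < SKETCH_NUM_OBJECTS);
    fl->slots[(fl->head + fl->count) % SKETCH_NUM_OBJECTS] = obj;
    fl->count++;
}

/* The least recently freed object is taken from the head for reuse. */
static int freelist_pop_head(struct sketch_freelist *fl)
{
    assert(fl->count > 0);
    int obj = fl->slots[fl->head];

    fl->head = (fl->head + 1) % SKETCH_NUM_OBJECTS;
    fl->count--;
    return obj;
}
```

Since a just-freed object sits behind all older entries, its page stays protected for as long as possible before being handed out again.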
If pool utilization reaches 75% (default) or above, to reduce the risk of the pool eventually being fully occupied by allocated objects yet ensure diverse coverage of allocations, KFENCE limits currently covered allocations of the same source from further filling up the pool. The “source” of an allocation is based on its partial allocation stack trace. A side-effect is that this also limits frequent long-lived allocations (e.g. pagecache) of the same source filling up the pool permanently, which is the most common risk for the pool becoming full and the sampled allocation rate dropping to zero. The threshold at which to start limiting currently covered allocations can be configured via the boot parameter kfence.skip_covered_thresh (pool usage%).
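The threshold decision can be sketched as a pure function. The name and parameters are hypothetical, and how the kernel tracks whether a source is already covered is deliberately left out; the sketch takes that as a boolean input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: should a sampled allocation be skipped?
 * 'used' and 'total' describe pool occupancy in objects, 'thresh_pct' is
 * the threshold in percent, and 'source_covered' says whether an
 * allocation from the same source is currently in the pool. */
static bool should_skip_covered(size_t used, size_t total,
                                unsigned int thresh_pct, bool source_covered)
{
    assert(total > 0 && used <= total);

    /* Integer form of used / total >= thresh_pct / 100. */
    bool above_thresh = used * 100 >= (size_t)thresh_pct * total;

    return above_thresh && source_covered;
}
```

With 255 objects and the default threshold of 75%, limiting starts once 192 objects are in use, and it only affects sources that are already covered.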
Interface
The following describes the functions which are used by allocators as well as page handling code to set up and deal with KFENCE allocations.
- bool is_kfence_address(const void *addr)
check if an address belongs to KFENCE pool
Parameters
const void *addr
    address to check
Return
true or false depending on whether the address is within the KFENCE object range.
Description
KFENCE objects live in a separate page range and are not to be intermixed with regular heap objects (e.g. KFENCE objects must never be added to the allocator freelists). Failing to do so may and will result in heap corruptions, therefore is_kfence_address() must be used to check whether an object requires specific handling.
Note
This function may be used in fast-paths, and is performance critical. Future changes should take this into account; for instance, we want to avoid introducing another load and therefore need to keep KFENCE_POOL_SIZE a constant (until immediate patching support is added to the kernel).
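A range check against a pool of constant size can be done with a single subtraction and comparison, which is one way to keep such a test cheap. The userspace sketch below illustrates the idea; the pool, its size, and the function name are assumptions and not the kernel’s definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_POOL_SIZE (2u * SKETCH_PAGE_SIZE) /* compile-time constant */

static_assert(SKETCH_POOL_SIZE % SKETCH_PAGE_SIZE == 0,
              "pool must be a whole number of pages");

/* Stand-in for the dedicated pool. */
static unsigned char sketch_pool[SKETCH_POOL_SIZE];

/* Hypothetical counterpart of is_kfence_address(). */
static bool sketch_is_pool_address(const void *addr)
{
    /* Unsigned subtraction covers both bounds at once: addresses below
     * the pool wrap around to a value larger than the pool size. */
    return (uintptr_t)addr - (uintptr_t)sketch_pool < SKETCH_POOL_SIZE;
}
```

Because the pool size is a constant, the only memory load the check could need is the pool’s base address.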
- void kfence_shutdown_cache(struct kmem_cache *s)
handle shutdown_cache() for KFENCE objects
Parameters
struct kmem_cache *s
    cache being shut down
Description
Before shutting down a cache, one must ensure there are no remaining objects allocated from it. Because KFENCE objects are not referenced from the cache directly, we need to check them here.
Note that shutdown_cache() is internal to SL*B, and kmem_cache_destroy() does not return if allocated objects still exist: it prints an error message and simply aborts destruction of a cache, leaking memory.
If the only such objects are KFENCE objects, we will not leak the entire cache, but instead try to provide more useful debug info by making allocated objects “zombie allocations”. Objects may then still be used or freed (which is handled gracefully), but usage will result in showing KFENCE error reports which include stack traces to the user of the object, the original allocation site, and caller to shutdown_cache().
- void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
allocate a KFENCE object with a low probability
Parameters
struct kmem_cache *s
    struct kmem_cache with object requirements
size_t size
    exact size of the object to allocate (can be less than s->size e.g. for kmalloc caches)
gfp_t flags
    GFP flags
Return
NULL - must proceed with allocating as usual,
non-NULL - pointer to a KFENCE object.
Description
kfence_alloc() should be inserted into the heap allocation fast path, allowing it to transparently return KFENCE-allocated objects with a low probability using a static branch (the probability is controlled by the kfence.sample_interval boot parameter).
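The NULL / non-NULL contract suggests the following shape for an allocator entry point. This is a userspace sketch with stand-ins: sketch_guarded_alloc() plays the role of kfence_alloc(), a boolean replaces the sampling decision, and malloc() replaces the regular allocation path. None of these names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static bool sketch_sample_next;              /* stand-in for the sampling decision */
static unsigned char sketch_guarded_obj[64]; /* stand-in for a guarded object */

/* Stand-in for kfence_alloc(): NULL means "proceed as usual". */
static void *sketch_guarded_alloc(size_t size)
{
    if (!sketch_sample_next || size > sizeof(sketch_guarded_obj))
        return NULL;
    sketch_sample_next = false;
    return sketch_guarded_obj;
}

/* Hypothetical allocator entry point with the hook at the top. */
static void *sketch_alloc(size_t size)
{
    assert(size > 0);
    void *obj = sketch_guarded_alloc(size);

    if (obj)
        return obj;      /* guarded object: skip the regular path */
    return malloc(size); /* regular allocation */
}
```

Callers of sketch_alloc() cannot tell which path served them, which is the sense in which the hook is transparent.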
- size_t kfence_ksize(const void *addr)
get actual amount of memory allocated for a KFENCE object
Parameters
const void *addr
    pointer to a heap object
Return
0 - not a KFENCE object, must call __ksize() instead,
non-0 - this many bytes can be accessed without causing a memory error.
Description
kfence_ksize() returns the number of bytes requested for a KFENCE object at allocation time. This number may be less than the object size of the corresponding struct kmem_cache.
- void *kfence_object_start(const void *addr)
find the beginning of a KFENCE object
Parameters
const void *addr
    address within a KFENCE-allocated object
Return
address of the beginning of the object.
Description
SL[AU]B-allocated objects are laid out within a page one by one, so it is easy to calculate the beginning of an object given a pointer inside it and the object size. The same is not true for KFENCE, which places a single object at either end of the page. This helper function is used to find the beginning of a KFENCE-allocated object.
- void __kfence_free(void *addr)
release a KFENCE heap object to KFENCE pool
Parameters
void *addr
    object to be freed
Description
Requires: is_kfence_address(addr)
Release a KFENCE object and mark it as freed.
- bool kfence_free(void *addr)
try to release an arbitrary heap object to KFENCE pool
Parameters
void *addr
    object to be freed
Return
false - object doesn’t belong to KFENCE pool and was ignored,
true - object was released to KFENCE pool.
Description
Release a KFENCE object and mark it as freed. May be called on any object, even non-KFENCE objects, to simplify integration of the hooks into the allocator’s free codepath. The allocator must check the return value to determine if it was a KFENCE object or not.
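The return-value contract leads to a simple pattern in a free path: try the hook first and fall back to the regular free only if it reports false. The sketch below uses userspace stand-ins; sketch_guarded_free() plays the role of kfence_free(), and the pool and counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static unsigned char sketch_pool[4096]; /* stand-in for the KFENCE pool */
static unsigned int sketch_pool_frees;  /* objects released to the pool */

/* Stand-in for kfence_free(): may be called on any object. */
static bool sketch_guarded_free(void *addr)
{
    if ((uintptr_t)addr - (uintptr_t)sketch_pool >= sizeof(sketch_pool))
        return false; /* not a pool object: caller frees it normally */
    sketch_pool_frees++; /* the real code would mark the object as freed */
    return true;
}

/* Hypothetical allocator free path with the hook at the top. */
static void sketch_free(void *addr)
{
    assert(addr != NULL);
    if (sketch_guarded_free(addr))
        return;  /* handled by the pool, must not reach the freelists */
    free(addr);  /* regular free */
}
```

Returning early when the hook reports true is what keeps pool objects out of the regular allocator’s freelists.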
- bool kfence_handle_page_fault(unsigned long addr, bool is_write, struct pt_regs *regs)
perform page fault handling for KFENCE pages
Parameters
unsigned long addr
    faulting address
bool is_write
    is access a write
struct pt_regs *regs
    current struct pt_regs (can be NULL, but shows full stack trace)
Return
false - address outside KFENCE pool,
true - page fault handled by KFENCE, no additional handling required.
Description
A page fault inside KFENCE pool indicates a memory error, such as an out-of-bounds access, a use-after-free or an invalid memory access. In these cases KFENCE prints an error message and marks the offending page as present, so that the kernel can proceed.
Related Tools
In userspace, a similar approach is taken by GWP-ASan. GWP-ASan also relies on guard pages and a sampling strategy to detect memory unsafety bugs at scale. KFENCE’s design is directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another similar but non-sampling approach, that also inspired the name “KFENCE”, can be found in the userspace Electric Fence Malloc Debugger.
In the kernel, several tools exist to debug memory access errors, and in particular KASAN can detect all bug classes that KFENCE can detect. While KASAN is more precise, relying on compiler instrumentation, this comes at a performance cost.
It is worth highlighting that KASAN and KFENCE are complementary, with different target environments. For instance, KASAN is the better debugging aid where test cases or reproducers exist: due to the lower chance to detect the error, it would require more effort using KFENCE to debug. Deployments at scale that cannot afford to enable KASAN, however, would benefit from using KFENCE to discover bugs due to code paths not exercised by test cases or fuzzers.