27.Bus lock detection and handling

Copyright:

© 2021 Intel Corporation

Authors:

27.1.Problem

A split lock is any atomic operation whose operand crosses two cache lines. Since the operand spans two cache lines and the operation must be atomic, the system locks the bus while the CPU accesses the two cache lines.

A bus lock is acquired through either split locked access to writeback (WB) memory or any locked access to non-WB memory. This is typically thousands of cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores and brings the whole system to its knees.
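The two definitions above can be expressed as simple predicates. The following is a minimal user-space sketch, not kernel code: the function names are hypothetical, and the 64-byte cache line size is an assumption that holds on common x86 CPUs but is not stated in this document.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (typical for x86, not stated in the text). */
#define CACHE_LINE_SIZE 64u

/* True if the operand [addr, addr + size) touches more than one cache line. */
bool crosses_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0)
        return false;
    return (addr / CACHE_LINE_SIZE) != ((addr + size - 1) / CACHE_LINE_SIZE);
}

/*
 * True if a locked (atomic) access would acquire a bus lock under the rule
 * above: a split lock on WB memory, or any locked access to non-WB memory.
 */
bool takes_bus_lock(uintptr_t addr, size_t size, bool memory_is_wb)
{
    if (!memory_is_wb)
        return true;
    return crosses_cache_line(addr, size);
}
```

For example, an 8-byte atomic operand at offset 60 of a cache line spans bytes 60 to 67 and is therefore a split lock, while the same operand at offset 56 stays within one line.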

27.2.Detection

Intel processors may support either or both of the following hardware mechanisms to detect split locks and bus locks. Some AMD processors also support bus lock detect.

27.2.1.#AC exception for split lock detection

Beginning with the Tremont Atom CPU, an attempted split lock operation may raise an Alignment Check (#AC) exception.

27.2.2.#DB exception for bus lock detection

Some CPUs have the ability to notify the kernel by an #DB trap after a user instruction acquires a bus lock and is executed. This allows the kernel to terminate the application or to enforce throttling.

27.3.Software handling

The kernel #AC and #DB handlers handle bus lock based on the kernel parameter “split_lock_detect”. Here is a summary of different options:

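+------------------+----------------------------+-----------------------+
|split_lock_detect=|#AC for split lock          |#DB for bus lock       |
+------------------+----------------------------+-----------------------+
|off               |Do nothing                  |Do nothing             |
+------------------+----------------------------+-----------------------+
|warn              |Kernel OOPs.                |Warn once per task and |
|(default)         |Warn once per task, add a   |continue to run.       |
|                  |delay, add synchronization  |                       |
|                  |to prevent more than one    |                       |
|                  |core from executing a       |                       |
|                  |split lock in parallel.     |                       |
|                  |sysctl split_lock_mitigate  |                       |
|                  |can be used to avoid the    |                       |
|                  |delay and synchronization.  |                       |
|                  |When both features are      |                       |
|                  |supported, warn in #AC.     |                       |
+------------------+----------------------------+-----------------------+
|fatal             |Kernel OOPs.                |Send SIGBUS to user.   |
|                  |Send SIGBUS to user.        |                       |
|                  |When both features are      |                       |
|                  |supported, fatal in #AC.    |                       |
+------------------+----------------------------+-----------------------+
|ratelimit:N       |Do nothing                  |Limit bus lock rate to |
|(0 < N <= 1000)   |                            |N bus locks per second |
|                  |                            |system wide and warn on|
|                  |                            |bus locks.             |
+------------------+----------------------------+-----------------------+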
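To make the accepted option values concrete, here is a small user-space sketch that parses the value part of split_lock_detect= into a mode and, for ratelimit:N, enforces the 0 < N <= 1000 range. It is illustrative only and is not the kernel's parser; the enum, struct, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names, chosen for this example only. */
enum sld_mode { SLD_OFF, SLD_WARN, SLD_FATAL, SLD_RATELIMIT };

struct sld_config {
    enum sld_mode mode;
    int ratelimit;   /* bus locks per second; meaningful only for SLD_RATELIMIT */
};

/* Parse the value of "split_lock_detect=<value>". Returns false if invalid. */
bool parse_split_lock_detect(const char *value, struct sld_config *cfg)
{
    int n = 0;
    char trailing;

    cfg->ratelimit = 0;
    if (strcmp(value, "off") == 0) {
        cfg->mode = SLD_OFF;
        return true;
    }
    if (strcmp(value, "warn") == 0) {
        cfg->mode = SLD_WARN;
        return true;
    }
    if (strcmp(value, "fatal") == 0) {
        cfg->mode = SLD_FATAL;
        return true;
    }
    /* "ratelimit:N": exactly one integer, no trailing characters, 0 < N <= 1000. */
    if (sscanf(value, "ratelimit:%d%c", &n, &trailing) == 1 && n > 0 && n <= 1000) {
        cfg->mode = SLD_RATELIMIT;
        cfg->ratelimit = n;
        return true;
    }
    return false;
}
```

With this sketch, "ratelimit:1000" is accepted, while "ratelimit:0", "ratelimit:1001", and unknown strings are rejected.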

27.4.Usages

Detecting and handling bus locks is useful in several areas:

It is critical for real time system designers who build consolidated real time systems. These systems run hard real time code on some cores and run “untrusted” user processes on other cores. The hard real time code cannot afford to have any bus lock from the untrusted processes hurt real time performance. To date, the designers have been unable to deploy these solutions because they have no way to prevent the “untrusted” user code from generating split locks and bus locks that block the hard real time code from accessing memory during bus locking.

It’s also useful for general computing to prevent guests or user applications from slowing down the overall system by executing instructions with bus lock.

27.5.Guidance

27.5.1.off

Disable checking for split lock and bus lock. This option can be useful if there are legacy applications that trigger these events at a low rate so that mitigation is not needed.

27.5.2.warn

A warning is emitted when a bus lock is detected, which allows the offending application to be identified. This is the default behavior.

27.5.3.fatal

In this case, the bus lock is not tolerated and the process is killed.

27.5.4.ratelimit

A system wide bus lock rate limit N is specified where 0 < N <= 1000. This allows a bus lock rate up to N bus locks per second. When the bus lock rate is exceeded, any task which is caught via the bus lock #DB exception is throttled by enforced sleeps until the rate goes under the limit again.

This is an effective mitigation in cases where a minimal impact can be tolerated, but an eventual Denial of Service attack has to be prevented. It also allows the offending processes to be identified and analyzed to determine whether they are malicious or just badly written.
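The throttling idea can be sketched as a fixed one-second window with a budget of N bus locks. This is a simplified user-space model under stated assumptions, not the kernel's implementation: the struct and function names are hypothetical, time is passed in by the caller in milliseconds, and the "enforced sleep" is modeled as a returned delay that the caller waits out before retrying.

```c
#include <stdint.h>

/* Hypothetical model of a system wide bus lock budget. */
struct buslock_limiter {
    uint32_t limit_per_sec;    /* N from ratelimit:N */
    uint64_t window_start_ms;  /* start of the current one-second window */
    uint32_t count;            /* bus locks charged to the current window */
};

void limiter_init(struct buslock_limiter *l, uint32_t n)
{
    l->limit_per_sec = n;
    l->window_start_ms = 0;
    l->count = 0;
}

/*
 * Called when a task is caught causing a bus lock at time now_ms.
 * Returns 0 if the bus lock fits in the budget (and charges it).
 * Otherwise returns the number of milliseconds the task should sleep
 * before calling this function again.
 */
uint64_t limiter_try(struct buslock_limiter *l, uint64_t now_ms)
{
    /* Start a fresh window once a full second has passed. */
    if (now_ms - l->window_start_ms >= 1000) {
        l->window_start_ms = now_ms;
        l->count = 0;
    }
    if (l->count < l->limit_per_sec) {
        l->count++;
        return 0;
    }
    /* Over the limit: sleep until the current window ends, then retry. */
    return l->window_start_ms + 1000 - now_ms;
}
```

A caller would loop until limiter_try() returns 0, sleeping for the returned delay each time, so an offending task is slowed down rather than killed.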

Selecting a rate limit of 1000 allows the bus to be locked for up to about seven million cycles each second (assuming 7000 cycles for each bus lock). On a 2 GHz processor that would be about 0.35% system slowdown.
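The estimate above is a simple ratio of locked cycles to available cycles per second. The helper below is a hypothetical illustration of that arithmetic; the 7000-cycle cost and the 2 GHz clock are the assumptions given in the text, not measured values.

```c
/*
 * Percentage of each second during which the bus may be held locked:
 * 100 * (locks per second * cycles per lock) / (CPU cycles per second).
 */
double bus_lock_slowdown_percent(double locks_per_sec,
                                 double cycles_per_lock,
                                 double cpu_hz)
{
    return 100.0 * locks_per_sec * cycles_per_lock / cpu_hz;
}
```

With 1000 bus locks per second, 7000 cycles per bus lock, and a 2 GHz clock, this gives 100 * 7,000,000 / 2,000,000,000 = 0.35 percent.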