Lock types and their rules¶
Introduction¶
The kernel provides a variety of locking primitives which can be divided into three categories:
- Sleeping locks
- CPU local locks
- Spinning locks
This document conceptually describes these lock types and provides rules for their nesting, including the rules for use under PREEMPT_RT.
Lock categories¶
Sleeping locks¶
Sleeping locks can only be acquired in preemptible task context.
Although implementations allow try_lock() from other contexts, it is necessary to carefully evaluate the safety of unlock() as well as of try_lock(). Furthermore, it is also necessary to evaluate the debugging versions of these primitives. In short, don’t acquire sleeping locks from other contexts unless there is no other option.
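As a minimal sketch of the intended pattern, the following shows a mutex acquired and released in preemptible task context (the lock and data names are made up for this example):

  #include <linux/mutex.h>

  static DEFINE_MUTEX(my_data_lock);   /* hypothetical example lock */
  static int my_data;                  /* hypothetical shared data */

  /* Runs in preemptible task context, so sleeping on the mutex is fine. */
  static void update_my_data(int val)
  {
          mutex_lock(&my_data_lock);   /* may sleep */
          my_data = val;
          mutex_unlock(&my_data_lock); /* released by the acquiring task */
  }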
Sleeping lock types:
- mutex
- rt_mutex
- semaphore
- rw_semaphore
- ww_mutex
- percpu_rw_semaphore
On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
- local_lock
- spinlock_t
- rwlock_t
CPU local locks¶
- local_lock
On non-PREEMPT_RT kernels, local_lock functions are wrappers around preemption and interrupt disabling primitives. Contrary to other locking mechanisms, disabling preemption or interrupts is a purely CPU-local concurrency control mechanism and is not suited for inter-CPU concurrency control.
Spinning locks¶
- raw_spinlock_t
- bit spinlocks
On non-PREEMPT_RT kernels, these lock types are also spinning locks:
- spinlock_t
- rwlock_t
Spinning locks implicitly disable preemption and the lock / unlock functions can have suffixes which apply further protections:
  _bh()               Disable / enable bottom halves (soft interrupts)
  _irq()              Disable / enable interrupts
  _irqsave/restore()  Save and disable / restore interrupt disabled state
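As a sketch of how these suffixes are used in practice (the lock and counter names are illustrative only):

  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(stats_lock);   /* hypothetical example lock */
  static unsigned long stats_count;

  /* Data is also touched from hard interrupt context: use _irqsave(). */
  static void stats_inc_task_ctx(void)
  {
          unsigned long flags;

          spin_lock_irqsave(&stats_lock, flags);
          stats_count++;
          spin_unlock_irqrestore(&stats_lock, flags);
  }

  /* Data is only shared with soft interrupt context: _bh() suffices. */
  static void stats_inc_vs_softirq(void)
  {
          spin_lock_bh(&stats_lock);
          stats_count++;
          spin_unlock_bh(&stats_lock);
  }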
Owner semantics¶
The aforementioned lock types except semaphores have strict owner semantics:
The context (task) that acquired the lock must release it.
rw_semaphores have a special interface which allows non-owner release for readers.
rtmutex¶
RT-mutexes are mutexes with support for priority inheritance (PI).
PI has limitations on non-PREEMPT_RT kernels due to preemption and interrupt disabled sections.
PI clearly cannot preempt preemption-disabled or interrupt-disabled regions of code, even on PREEMPT_RT kernels. Instead, PREEMPT_RT kernels execute most such regions of code in preemptible task context, especially interrupt handlers and soft interrupts. This conversion allows spinlock_t and rwlock_t to be implemented via RT-mutexes.
semaphore¶
semaphore is a counting semaphore implementation.
Semaphores are often used for both serialization and waiting, but new use cases should instead use separate serialization and wait mechanisms, such as mutexes and completions.
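For the waiting use case, a completion is the preferred mechanism. A minimal sketch, with illustrative names:

  #include <linux/completion.h>

  static DECLARE_COMPLETION(setup_done);   /* hypothetical completion */

  /* Waiting side: blocks until the event is signalled. */
  static void consumer(void)
  {
          wait_for_completion(&setup_done);
          /* ... setup has finished, proceed ... */
  }

  /* Signalling side: wakes a waiter once setup is complete. */
  static void producer(void)
  {
          /* ... perform setup ... */
          complete(&setup_done);
  }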
semaphores and PREEMPT_RT¶
PREEMPT_RT does not change the semaphore implementation because counting semaphores have no concept of owners, thus preventing PREEMPT_RT from providing priority inheritance for semaphores. After all, an unknown owner cannot be boosted. As a consequence, blocking on semaphores can result in priority inversion.
rw_semaphore¶
rw_semaphore is a multiple readers and single writer lock mechanism.
On non-PREEMPT_RT kernels the implementation is fair, thus preventing writer starvation.
rw_semaphore complies by default with the strict owner semantics, but there exist special-purpose interfaces that allow non-owner release for readers. These interfaces work independently of the kernel configuration.
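A sketch of the reader non-owner interface, with illustrative names; the read side is acquired in one task and released in another:

  #include <linux/rwsem.h>

  static DECLARE_RWSEM(data_rwsem);   /* hypothetical rw_semaphore */

  /* Task A: acquire the read side on behalf of another context. */
  static void start_async_read(void)
  {
          down_read_non_owner(&data_rwsem);
          /* ... hand off to another task ... */
  }

  /* Task B: a different task releases the read side. */
  static void finish_async_read(void)
  {
          up_read_non_owner(&data_rwsem);
  }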
rw_semaphore and PREEMPT_RT¶
PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based implementation, thus changing the fairness:
Because an rw_semaphore writer cannot grant its priority to multiple readers, a preempted low-priority reader will continue holding its lock, thus starving even high-priority writers. In contrast, because readers can grant their priority to a writer, a preempted low-priority writer will have its priority boosted until it releases the lock, thus preventing that writer from starving readers.
local_lock¶
local_lock provides a named scope to critical sections which are protected by disabling preemption or interrupts.
On non-PREEMPT_RT kernels local_lock operations map to the preemption and interrupt disabling and enabling primitives:
  local_lock(&llock)              -> preempt_disable()
  local_unlock(&llock)            -> preempt_enable()
  local_lock_irq(&llock)          -> local_irq_disable()
  local_unlock_irq(&llock)        -> local_irq_enable()
  local_lock_irqsave(&llock)      -> local_irq_save()
  local_unlock_irqrestore(&llock) -> local_irq_restore()
The named scope of local_lock has two advantages over the regular primitives:
- The lock name allows static analysis and is also clear documentation of the protection scope, while the regular primitives are scopeless and opaque.
- If lockdep is enabled, the local_lock gains a lockmap which allows validating the correctness of the protection. This can detect cases where, e.g., a function using preempt_disable() as the protection mechanism is invoked from interrupt or soft-interrupt context. Apart from that, lockdep_assert_held(&llock) works as with any other locking primitive.
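A sketch of a named local_lock protecting per-CPU data (the struct and variable names are illustrative):

  #include <linux/local_lock.h>
  #include <linux/percpu.h>

  struct cpu_stats {
          local_lock_t lock;
          unsigned long count;
  };

  static DEFINE_PER_CPU(struct cpu_stats, cpu_stats) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };

  static void stats_inc(void)
  {
          /* Maps to preempt_disable() on non-PREEMPT_RT kernels. */
          local_lock(&cpu_stats.lock);
          this_cpu_inc(cpu_stats.count);
          local_unlock(&cpu_stats.lock);
  }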
local_lock and PREEMPT_RT¶
PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing semantics:
- All spinlock_t changes also apply to local_lock.
local_lock usage¶
local_lock should be used in situations where disabling preemption or interrupts is the appropriate form of concurrency control to protect per-CPU data structures on a non-PREEMPT_RT kernel.
local_lock is not suitable to protect against preemption or interrupts on a PREEMPT_RT kernel due to the PREEMPT_RT-specific spinlock_t semantics.
raw_spinlock_t and spinlock_t¶
raw_spinlock_t¶
raw_spinlock_t is a strict spinning lock implementation in all kernels, including PREEMPT_RT kernels. Use raw_spinlock_t only in real critical core code, low-level interrupt handling and places where disabling preemption or interrupts is required, for example, to safely access hardware state. raw_spinlock_t can sometimes also be used when the critical section is tiny, thus avoiding RT-mutex overhead.
spinlock_t¶
The semantics of spinlock_t change with the state of PREEMPT_RT.
On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has exactly the same semantics.
spinlock_t and PREEMPT_RT¶
On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation based on rt_mutex which changes the semantics:
Preemption is not disabled.
The hard interrupt related suffixes for spin_lock / spin_unlock operations (_irq, _irqsave / _irqrestore) do not affect the CPU’s interrupt disabled state.
The soft interrupt related suffix (_bh()) still disables softirq handlers.
Non-PREEMPT_RT kernels disable preemption to get this effect.
PREEMPT_RT kernels use a per-CPU lock for serialization which keeps preemption disabled. The lock disables softirq handlers and also prevents reentrancy due to task preemption.
PREEMPT_RT kernels preserve all other spinlock_t semantics:
Tasks holding a spinlock_t do not migrate. Non-PREEMPT_RT kernels avoid migration by disabling preemption. PREEMPT_RT kernels instead disable migration, which ensures that pointers to per-CPU variables remain valid even if the task is preempted.
Task state is preserved across spinlock acquisition, ensuring that the task-state rules apply to all kernel configurations. Non-PREEMPT_RT kernels leave task state untouched. However, PREEMPT_RT must change task state if the task blocks during acquisition. Therefore, it saves the current task state before blocking and the corresponding lock wakeup restores it, as shown below:
  task->state = TASK_INTERRUPTIBLE
  lock()
    block()
      task->saved_state = task->state
      task->state = TASK_UNINTERRUPTIBLE
      schedule()
                                      lock wakeup
      task->state = task->saved_state

Other types of wakeups would normally unconditionally set the task state to RUNNING, but that does not work here because the task must remain blocked until the lock becomes available. Therefore, when a non-lock wakeup attempts to awaken a task blocked waiting for a spinlock, it instead sets the saved state to RUNNING. Then, when the lock acquisition completes, the lock wakeup sets the task state to the saved state, in this case setting it to RUNNING:
  task->state = TASK_INTERRUPTIBLE
  lock()
    block()
      task->saved_state = task->state
      task->state = TASK_UNINTERRUPTIBLE
      schedule()
                                      non lock wakeup
                                      task->saved_state = TASK_RUNNING
                                      lock wakeup
      task->state = task->saved_state

This ensures that the real wakeup cannot be lost.
rwlock_t¶
rwlock_t is a multiple readers and single writer lock mechanism.
Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the suffix rules of spinlock_t apply accordingly. The implementation is fair, thus preventing writer starvation.
rwlock_t and PREEMPT_RT¶
PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based implementation, thus changing semantics:
- All the spinlock_t changes also apply to rwlock_t.
- Because an rwlock_t writer cannot grant its priority to multiple readers, a preempted low-priority reader will continue holding its lock, thus starving even high-priority writers. In contrast, because readers can grant their priority to a writer, a preempted low-priority writer will have its priority boosted until it releases the lock, thus preventing that writer from starving readers.
PREEMPT_RT caveats¶
local_lock on RT¶
The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few implications. For example, on a non-PREEMPT_RT kernel the following code sequence works as expected:
  local_lock_irq(&local_lock);
  raw_spin_lock(&lock);
and is fully equivalent to:
raw_spin_lock_irq(&lock);
On a PREEMPT_RT kernel this code sequence breaks because local_lock_irq() is mapped to a per-CPU spinlock_t which neither disables interrupts nor preemption. The following code sequence works correctly on both PREEMPT_RT and non-PREEMPT_RT kernels:
  local_lock_irq(&local_lock);
  spin_lock(&lock);
Another caveat with local locks is that each local_lock has a specific protection scope. So the following substitution is wrong:
  func1()
  {
          local_irq_save(flags);    -> local_lock_irqsave(&local_lock_1, flags);
          func3();
          local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_1, flags);
  }

  func2()
  {
          local_irq_save(flags);    -> local_lock_irqsave(&local_lock_2, flags);
          func3();
          local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_2, flags);
  }

  func3()
  {
          lockdep_assert_irqs_disabled();
          access_protected_data();
  }

On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel local_lock_1 and local_lock_2 are distinct and cannot serialize the callers of func3(). Also the lockdep assert will trigger on a PREEMPT_RT kernel because local_lock_irqsave() does not disable interrupts due to the PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is:
  func1()
  {
          local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
          func3();
          local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func2()
  {
          local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
          func3();
          local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func3()
  {
          lockdep_assert_held(&local_lock);
          access_protected_data();
  }

spinlock_t and rwlock_t¶
The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels have a few implications. For example, on a non-PREEMPT_RT kernel the following code sequence works as expected:
  local_irq_disable();
  spin_lock(&lock);
and is fully equivalent to:
spin_lock_irq(&lock);
The same applies to rwlock_t and the _irqsave() suffix variants.
On a PREEMPT_RT kernel this code sequence breaks because RT-mutex requires a fully preemptible context. Instead, use spin_lock_irq() or spin_lock_irqsave() and their unlock counterparts. In cases where the interrupt disabling and locking must remain separate, PREEMPT_RT offers a local_lock mechanism. Acquiring the local_lock pins the task to a CPU, allowing things like per-CPU interrupt disabled locks to be acquired. However, this approach should be used only where absolutely necessary.
A typical scenario is protection of per-CPU variables in thread context:
  struct foo *p = get_cpu_ptr(&var1);

  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);
This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does not allow acquiring p->lock because get_cpu_ptr() implicitly disables preemption. The following substitution works on both kernels:
  struct foo *p;

  migrate_disable();
  p = this_cpu_ptr(&var1);
  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);
On a non-PREEMPT_RT kernel migrate_disable() maps to preempt_disable() which makes the above code fully equivalent. On a PREEMPT_RT kernel migrate_disable() ensures that the task is pinned on the current CPU which in turn guarantees that the per-CPU accesses to var1 and var2 stay on the same CPU.
The migrate_disable() substitution is not valid for the following scenario:
  func()
  {
          struct foo *p;

          migrate_disable();
          p = this_cpu_ptr(&var1);
          p->val = func2();
          ...
          migrate_enable();
  }

While correct on a non-PREEMPT_RT kernel, this breaks on PREEMPT_RT because here migrate_disable() does not protect against reentrancy from a preempting task. A correct substitution for this case is:
  func()
  {
          struct foo *p;

          local_lock(&foo_lock);
          p = this_cpu_ptr(&var1);
          p->val = func2();
          ...
          local_unlock(&foo_lock);
  }

On a non-PREEMPT_RT kernel this protects against reentrancy by disabling preemption. On a PREEMPT_RT kernel this is achieved by acquiring the underlying per-CPU spinlock.
raw_spinlock_t on RT¶
Acquiring a raw_spinlock_t disables preemption and possibly also interrupts, so the critical section must avoid acquiring a regular spinlock_t or rwlock_t; for example, the critical section must avoid allocating memory. Thus, on a non-PREEMPT_RT kernel the following code works perfectly:
  raw_spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);
But this code fails on PREEMPT_RT kernels because the memory allocator is fully preemptible and therefore cannot be invoked from truly atomic contexts. However, it is perfectly fine to invoke the memory allocator while holding normal non-raw spinlocks because they do not disable preemption on PREEMPT_RT kernels:
  spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);
bit spinlocks¶
PREEMPT_RT cannot substitute bit spinlocks because a single bit is too small to accommodate an RT-mutex. Therefore, the semantics of bit spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t caveats also apply to bit spinlocks.
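A sketch of bit spinlock usage with illustrative names; the critical section runs with preemption disabled on all kernel configurations, so it must follow the raw_spinlock_t rules:

  #include <linux/bit_spinlock.h>

  #define ENTRY_LOCK_BIT 0   /* hypothetical lock bit */

  struct packed_entry {
          unsigned long flags;   /* bit 0 doubles as the lock */
          void *data;
  };

  static void entry_update(struct packed_entry *e, void *data)
  {
          bit_spin_lock(ENTRY_LOCK_BIT, &e->flags);   /* disables preemption */
          e->data = data;                             /* no sleeping locks here */
          bit_spin_unlock(ENTRY_LOCK_BIT, &e->flags);
  }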
Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT using conditional (#ifdef’ed) code changes at the usage site. In contrast, usage-site changes are not needed for the spinlock_t substitution. Instead, conditionals in header files and the core locking implementation enable the compiler to do the substitution transparently.
Lock type nesting rules¶
The most basic rules are:
- Lock types of the same lock category (sleeping, CPU local, spinning) can nest arbitrarily as long as they respect the general lock ordering rules to prevent deadlocks.
- Sleeping lock types cannot nest inside CPU local and spinning lock types.
- CPU local and spinning lock types can nest inside sleeping lock types.
- Spinning lock types can nest inside all lock types.
These constraints apply both in PREEMPT_RT and otherwise.
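A sketch illustrating these rules with made-up lock names; a spinning lock may nest inside a sleeping lock, but never the reverse:

  #include <linux/mutex.h>
  #include <linux/spinlock.h>

  static DEFINE_MUTEX(cfg_mutex);       /* sleeping lock */
  static DEFINE_SPINLOCK(state_lock);   /* spinning lock on non-PREEMPT_RT */

  static void valid_nesting(void)
  {
          mutex_lock(&cfg_mutex);       /* sleeping lock taken first */
          spin_lock(&state_lock);       /* spinning lock may nest inside */
          /* ... */
          spin_unlock(&state_lock);
          mutex_unlock(&cfg_mutex);
  }

  /*
   * The reverse - mutex_lock() inside spin_lock() - would nest a sleeping
   * lock inside a spinning lock and is invalid; lockdep will complain.
   */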
The fact that PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from spinning to sleeping and substitutes local_lock with a per-CPU spinlock_t means that they cannot be acquired while holding a raw spinlock. This results in the following nesting ordering:
- Sleeping locks
- spinlock_t, rwlock_t, local_lock
- raw_spinlock_t and bit spinlocks
Lockdep will complain if these constraints are violated, both inPREEMPT_RT and otherwise.