Sequence counters and sequential locks¶
Introduction¶
Sequence counters are a reader-writer consistency mechanism withlockless readers (read-only retry loops), and no writer starvation. Theyare used for data that’s rarely written to (e.g. system time), where thereader wants a consistent set of information and is willing to retry ifthat information changes.
A data set is consistent when the sequence count at the beginning of theread side critical section is even and the same sequence count value isread again at the end of the critical section. The data in the set mustbe copied out inside the read side critical section. If the sequencecount has changed between the start and the end of the critical section,the reader must retry.
Writers increment the sequence count at the start and the end of theircritical section. After starting the critical section the sequence countis odd and indicates to the readers that an update is in progress. Atthe end of the write side critical section the sequence count becomeseven again which lets readers make progress.
A sequence counter write side critical section must never be preemptedor interrupted by read side sections. Otherwise the reader will spin forthe entire scheduler tick due to the odd sequence count value and theinterrupted writer. If that reader belongs to a real-time schedulingclass, it can spin forever and the kernel will livelock.
This mechanism cannot be used if the protected data contains pointers,as the writer can invalidate a pointer that the reader is following.
Sequence counters (seqcount_t)¶
This is the the raw counting mechanism, which does not protect againstmultiple writers. Write side critical sections must thus be serializedby an external lock.
If the write serialization primitive is not implicitly disablingpreemption, preemption must be explicitly disabled before entering thewrite side section. If the read section can be invoked from hardirq orsoftirq contexts, interrupts or bottom halves must also be respectivelydisabled before entering the write section.
If it’s desired to automatically handle the sequence counterrequirements of writer serialization and non-preemptibility, useSequential locks (seqlock_t) instead.
Initialization:
/* dynamic */seqcount_t foo_seqcount;seqcount_init(&foo_seqcount);/* static */static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);/* C99 struct init */struct { .seq = SEQCNT_ZERO(foo.seq),} foo;Write path:
/* Serialized context with disabled preemption */write_seqcount_begin(&foo_seqcount);/* ... [[write-side critical section]] ... */write_seqcount_end(&foo_seqcount);
Read path:
do { seq = read_seqcount_begin(&foo_seqcount); /* ... [[read-side critical section]] ... */} while (read_seqcount_retry(&foo_seqcount, seq));Sequence counters with associated locks (seqcount_LOCKTYPE_t)¶
As discussed atSequence counters (seqcount_t), sequence count write side criticalsections must be serialized and non-preemptible. This variant ofsequence counters associate the lock used for writer serialization atinitialization time, which enables lockdep to validate that the writeside critical sections are properly serialized.
This lock association is a NOOP if lockdep is disabled and has neitherstorage nor runtime overhead. If lockdep is enabled, the lock pointer isstored in struct seqcount and lockdep’s “lock is held” assertions areinjected at the beginning of the write side critical section to validatethat it is properly protected.
For lock types which do not implicitly disable preemption, preemptionprotection is enforced in the write side function.
The following sequence counters with associated locks are defined:
seqcount_spinlock_tseqcount_raw_spinlock_tseqcount_rwlock_tseqcount_mutex_tseqcount_ww_mutex_t
The plain seqcount read and write APIs branch out to the specificseqcount_LOCKTYPE_t implementation at compile-time. This avoids kernelAPI explosion per each new seqcount LOCKTYPE.
Initialization (replace “LOCKTYPE” with one of the supported locks):
/* dynamic */seqcount_LOCKTYPE_t foo_seqcount;seqcount_LOCKTYPE_init(&foo_seqcount, &lock);/* static */static seqcount_LOCKTYPE_t foo_seqcount = SEQCNT_LOCKTYPE_ZERO(foo_seqcount, &lock);/* C99 struct init */struct { .seq = SEQCNT_LOCKTYPE_ZERO(foo.seq, &lock),} foo;Write path: same as inSequence counters (seqcount_t), while running from a contextwith the associated LOCKTYPE lock acquired.
Read path: same as inSequence counters (seqcount_t).
Sequential locks (seqlock_t)¶
This contains theSequence counters (seqcount_t) mechanism earlier discussed, plus anembedded spinlock for writer serialization and non-preemptibility.
If the read side section can be invoked from hardirq or softirq context,use the write side function variants which disable interrupts or bottomhalves respectively.
Initialization:
/* dynamic */seqlock_t foo_seqlock;seqlock_init(&foo_seqlock);/* static */static DEFINE_SEQLOCK(foo_seqlock);/* C99 struct init */struct { .seql = __SEQLOCK_UNLOCKED(foo.seql)} foo;Write path:
write_seqlock(&foo_seqlock);/* ... [[write-side critical section]] ... */write_sequnlock(&foo_seqlock);
Read path, three categories:
Normal Sequence readers which never block a writer but they mustretry if a writer is in progress by detecting change in the sequencenumber. Writers do not wait for a sequence reader:
do { seq = read_seqbegin(&foo_seqlock); /* ... [[read-side critical section]] ... */} while (read_seqretry(&foo_seqlock, seq));Locking readers which will wait if a writer or another locking readeris in progress. A locking reader in progress will also block a writerfrom entering its critical section. This read lock isexclusive. Unlike rwlock_t, only one locking reader can acquire it:
read_seqlock_excl(&foo_seqlock);/* ... [[read-side critical section]] ... */read_sequnlock_excl(&foo_seqlock);
Conditional lockless reader (as in 1), or locking reader (as in 2),according to a passed marker. This is used to avoid lockless readersstarvation (too much retry loops) in case of a sharp spike in writeactivity. First, a lockless read is tried (even marker passed). Ifthat trial fails (odd sequence counter is returned, which is used asthe next iteration marker), the lockless read is transformed to afull locking read and no retry loop is necessary:
/* marker; even initialization */int seq = 0;do { read_seqbegin_or_lock(&foo_seqlock, &seq); /* ... [[read-side critical section]] ... */} while (need_seqretry(&foo_seqlock, seq));done_seqretry(&foo_seqlock, seq);
API documentation¶
seqcount_init(s)¶runtime initializer for seqcount_t
Parameters
s- Pointer to the seqcount_t instance
SEQCNT_ZERO(name)¶static initializer for seqcount_t
Parameters
name- Name of the seqcount_t instance
- typedef
seqcount¶ sequence counter with LOCKTYPR associated
Description
A plain sequence counter with external writer synchronization by aspinlock. The spinlock is associated to the sequence count in thestatic initializer or init function. This enables lockdep to validatethat the write side critical section is properly serialized.
SEQCOUNT_LOCKTYPE_ZERO(seq_name,assoc_lock)¶static initializer for seqcount_LOCKNAME_t
Parameters
seq_name- undescribed
assoc_lock- undescribed
__read_seqcount_begin(s)¶begin a seqcount_t read section w/o barrier
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Description
__read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()barrier. Callers should ensure that smp_rmb() or equivalent ordering isprovided before actually loading any of the variables that are to beprotected in this critical section.
Use carefully, only in critical code, and comment how the barrier isprovided.
Return
count to be passed toread_seqcount_retry()
raw_read_seqcount_begin(s)¶begin a seqcount_t read section w/o lockdep
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Return
count to be passed toread_seqcount_retry()
read_seqcount_begin(s)¶begin a seqcount_t read critical section
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Return
count to be passed toread_seqcount_retry()
raw_read_seqcount(s)¶read the raw seqcount_t counter value
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Description
raw_read_seqcount opens a read critical section of the givenseqcount_t, without any lockdep checking, and without checking ormasking the sequence counter LSB. Calling code is responsible forhandling that.
Return
count to be passed toread_seqcount_retry()
raw_seqcount_begin(s)¶begin a seqcount_t read critical section w/o lockdep and w/o counter stabilization
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Description
raw_seqcount_begin opens a read critical section of the givenseqcount_t. Unlikeread_seqcount_begin(), this function will not waitfor the count to stabilize. If a writer is active when it begins, itwill fail theread_seqcount_retry() at the end of the read criticalsection instead of stabilizing at the beginning of it.
Use this only in special kernel hot paths where the read section issmall and has a high probability of success through other externalmeans. It will save a single branching instruction.
Return
count to be passed toread_seqcount_retry()
__read_seqcount_retry(s,start)¶end a seqcount_t read section w/o barrier
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
start- count, from
read_seqcount_begin()
Description
__read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()barrier. Callers should ensure that smp_rmb() or equivalent ordering isprovided before actually loading any of the variables that are to beprotected in this critical section.
Use carefully, only in critical code, and comment how the barrier isprovided.
Return
true if a read section retry is required, else false
read_seqcount_retry(s,start)¶end a seqcount_t read critical section
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
start- count, from
read_seqcount_begin()
Description
read_seqcount_retry closes the read critical section of givenseqcount_t. If the critical section was invalid, it must be ignored(and typically retried).
Return
true if a read section retry is required, else false
raw_write_seqcount_begin(s)¶start a seqcount_t write section w/o lockdep
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
raw_write_seqcount_end(s)¶end a seqcount_t write section w/o lockdep
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
write_seqcount_begin_nested(s,subclass)¶start a seqcount_t write section with custom lockdep nesting level
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
subclass- lockdep nesting level
Description
See Documentation/locking/lockdep-design.rst
write_seqcount_begin(s)¶start a seqcount_t write side critical section
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Description
write_seqcount_begin opens a write side critical section of the givenseqcount_t.
Context
seqcount_t write side critical sections must be serialized andnon-preemptible. If readers can be invoked from hardirq or softirqcontext, interrupts or bottom halves must be respectively disabled.
write_seqcount_end(s)¶end a seqcount_t write side critical section
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Description
The write section must’ve been opened withwrite_seqcount_begin().
raw_write_seqcount_barrier(s)¶do a seqcount_t write barrier
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Description
This can be used to provide an ordering guarantee instead of the usualconsistency guarantee. It is one wmb cheaper, because it can collapsethe two back-to-back wmb()s.
Note that writes surrounding the barrier should be declared atomic (e.g.via WRITE_ONCE): a) to ensure the writes become visible to other threadsatomically, avoiding compiler optimizations; b) to document which writes aremeant to propagate to the reader critical section. This is necessary becauseneither writes before and after the barrier are enclosed in a seq-writercritical section that would ensure readers are aware of ongoing writes:
seqcount_t seq;bool X = true, Y = false;void read(void){ bool x, y; do { int s = read_seqcount_begin(&seq); x = X; y = Y; } while (read_seqcount_retry(&seq, s)); BUG_ON(!x && !y);}void write(void){ WRITE_ONCE(Y, true); raw_write_seqcount_barrier(seq); WRITE_ONCE(X, false);}write_seqcount_invalidate(s)¶invalidate in-progress seqcount_t read side operations
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Description
After write_seqcount_invalidate, no seqcount_t read side operationswill complete successfully and see data older than this.
raw_read_seqcount_latch(s)¶pick even/odd seqcount_t latch data copy
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Description
Use seqcount_t latching to switch between two storage places protectedby a sequence counter. Doing so allows having interruptible, preemptible,seqcount_t write side critical sections.
Checkraw_write_seqcount_latch() for more details and a full reader andwriter usage example.
Return
sequence counter raw value. Use the lowest bit as an index forpicking which data copy to read. The full counter value must then bechecked withread_seqcount_retry().
raw_write_seqcount_latch(s)¶redirect readers to even/odd copy
Parameters
s- Pointer to seqcount_t or any of the seqcount_locktype_t variants
Description
The latch technique is a multiversion concurrency control method that allowsqueries during non-atomic modifications. If you can guarantee queries neverinterrupt the modification – e.g. the concurrency is strictly between CPUs– you most likely do not need this.
Where the traditional RCU/lockless data structures rely on atomicmodifications to ensure queries observe either the old or the new state thelatch allows the same for non-atomic updates. The trade-off is doubling thecost of storage; we have to maintain two copies of the entire datastructure.
Very simply put: we first modify one copy and then the other. This ensuresthere is always one copy in a stable state, ready to give us an answer.
The basic form is a data structure like:
struct latch_struct { seqcount_t seq; struct data_struct data[2];};Where a modification, which is assumed to be externally serialized, does thefollowing:
void latch_modify(struct latch_struct *latch, ...){ smp_wmb(); // Ensure that the last data[1] update is visible latch->seq++; smp_wmb(); // Ensure that the seqcount update is visible modify(latch->data[0], ...); smp_wmb(); // Ensure that the data[0] update is visible latch->seq++; smp_wmb(); // Ensure that the seqcount update is visible modify(latch->data[1], ...);}The query will have a form like:
struct entry *latch_query(struct latch_struct *latch, ...){ struct entry *entry; unsigned seq, idx; do { seq = raw_read_seqcount_latch(&latch->seq); idx = seq & 0x01; entry = data_query(latch->data[idx], ...); // read_seqcount_retry() includes needed smp_rmb() } while (read_seqcount_retry(&latch->seq, seq)); return entry;}So during the modification, queries are first redirected to data[1]. Then wemodify data[0]. When that is complete, we redirect queries back to data[0]and we can modify data[1].
NOTE
The non-requirement for atomic modifications does _NOT_ includethe publishing of new entries in the case where data is a dynamicdata structure.
An iteration might start in data[0] and get suspended long enoughto miss an entire modification sequence, once it resumes it mightobserve the new entry.
When data is a dynamic data structure; one should use regular RCUpatterns to manage the lifetimes of the objects within.
seqlock_init(sl)¶dynamic initializer for seqlock_t
Parameters
sl- Pointer to the seqlock_t instance
DEFINE_SEQLOCK(sl)¶Define a statically allocated seqlock_t
Parameters
sl- Name of the seqlock_t instance
- unsigned
read_seqbegin(const seqlock_t * sl)¶ start a seqlock_t read side critical section
Parameters
constseqlock_t*sl- Pointer to seqlock_t
Return
count, to be passed toread_seqretry()
- unsigned
read_seqretry(const seqlock_t * sl, unsigned start)¶ end a seqlock_t read side section
Parameters
constseqlock_t*sl- Pointer to seqlock_t
unsignedstart- count, from
read_seqbegin()
Description
read_seqretry closes the read side critical section of given seqlock_t.If the critical section was invalid, it must be ignored (and typicallyretried).
Return
true if a read section retry is required, else false
- void
write_seqlock(seqlock_t * sl)¶ start a seqlock_t write side critical section
Parameters
seqlock_t*sl- Pointer to seqlock_t
Description
write_seqlock opens a write side critical section for the givenseqlock_t. It also implicitly acquires the spinlock_t embedded insidethat sequential lock. All seqlock_t write side sections are thusautomatically serialized and non-preemptible.
Context
if the seqlock_t read section, or other write side criticalsections, can be invoked from hardirq or softirq contexts, use the_irqsave or _bh variants of this function instead.
- void
write_sequnlock(seqlock_t * sl)¶ end a seqlock_t write side critical section
Parameters
seqlock_t*sl- Pointer to seqlock_t
Description
write_sequnlock closes the (serialized and non-preemptible) write sidecritical section of given seqlock_t.
- void
write_seqlock_bh(seqlock_t * sl)¶ start a softirqs-disabled seqlock_t write section
Parameters
seqlock_t*sl- Pointer to seqlock_t
Description
_bh variant ofwrite_seqlock(). Use only if the read side section, orother write side sections, can be invoked from softirq contexts.
- void
write_sequnlock_bh(seqlock_t * sl)¶ end a softirqs-disabled seqlock_t write section
Parameters
seqlock_t*sl- Pointer to seqlock_t
Description
write_sequnlock_bh closes the serialized, non-preemptible, andsoftirqs-disabled, seqlock_t write side critical section opened withwrite_seqlock_bh().
- void
write_seqlock_irq(seqlock_t * sl)¶ start a non-interruptible seqlock_t write section
Parameters
seqlock_t*sl- Pointer to seqlock_t
Description
_irq variant ofwrite_seqlock(). Use only if the read side section, orother write sections, can be invoked from hardirq contexts.
- void
write_sequnlock_irq(seqlock_t * sl)¶ end a non-interruptible seqlock_t write section
Parameters
seqlock_t*sl- Pointer to seqlock_t
Description
write_sequnlock_irq closes the serialized and non-interruptibleseqlock_t write side section opened withwrite_seqlock_irq().
write_seqlock_irqsave(lock,flags)¶start a non-interruptible seqlock_t write section
Parameters
lock- Pointer to seqlock_t
flags- Stack-allocated storage for saving caller’s local interruptstate, to be passed to
write_sequnlock_irqrestore().
Description
_irqsave variant ofwrite_seqlock(). Use it only if the read sidesection, or other write sections, can be invoked from hardirq context.
- void
write_sequnlock_irqrestore(seqlock_t * sl, unsigned long flags)¶ end non-interruptible seqlock_t write section
Parameters
seqlock_t*sl- Pointer to seqlock_t
unsignedlongflags- Caller’s saved interrupt state, from
write_seqlock_irqsave()
Description
write_sequnlock_irqrestore closes the serialized and non-interruptibleseqlock_t write section previously opened withwrite_seqlock_irqsave().
- void
read_seqlock_excl(seqlock_t * sl)¶ begin a seqlock_t locking reader section
Parameters
seqlock_t*sl- Pointer to seqlock_t
Description
read_seqlock_excl opens a seqlock_t locking reader critical section. Alocking reader exclusively locks outboth other writersand otherlocking readers, but it does not update the embedded sequence number.
Locking readers act like a normal spin_lock()/spin_unlock().
The opened read section must be closed withread_sequnlock_excl().
Context
if the seqlock_t write section,or other read sections, canbe invoked from hardirq or softirq contexts, use the _irqsave or _bhvariant of this function instead.
- void
read_sequnlock_excl(seqlock_t * sl)¶ end a seqlock_t locking reader critical section
Parameters
seqlock_t*sl- Pointer to seqlock_t
- void
read_seqlock_excl_bh(seqlock_t * sl)¶ start a seqlock_t locking reader section with softirqs disabled
Parameters
seqlock_t*sl- Pointer to seqlock_t
Description
_bh variant ofread_seqlock_excl(). Use this variant only if theseqlock_t write side section,or other read sections, can be invokedfrom softirq contexts.
- void
read_sequnlock_excl_bh(seqlock_t * sl)¶ stop a seqlock_t softirq-disabled locking reader section
Parameters
seqlock_t*sl- Pointer to seqlock_t
- void
read_seqlock_excl_irq(seqlock_t * sl)¶ start a non-interruptible seqlock_t locking reader section
Parameters
seqlock_t*sl- Pointer to seqlock_t
Description
_irq variant ofread_seqlock_excl(). Use this only if the seqlock_twrite side section,or other read sections, can be invoked from ahardirq context.
- void
read_sequnlock_excl_irq(seqlock_t * sl)¶ end an interrupts-disabled seqlock_t locking reader section
Parameters
seqlock_t*sl- Pointer to seqlock_t
read_seqlock_excl_irqsave(lock,flags)¶start a non-interruptible seqlock_t locking reader section
Parameters
lock- Pointer to seqlock_t
flags- Stack-allocated storage for saving caller’s local interruptstate, to be passed to
read_sequnlock_excl_irqrestore().
Description
_irqsave variant ofread_seqlock_excl(). Use this only if the seqlock_twrite side section,or other read sections, can be invoked from ahardirq context.
- void
read_sequnlock_excl_irqrestore(seqlock_t * sl, unsigned long flags)¶ end non-interruptible seqlock_t locking reader section
Parameters
seqlock_t*sl- Pointer to seqlock_t
unsignedlongflags- Caller saved interrupt state, from
read_seqlock_excl_irqsave()
- void
read_seqbegin_or_lock(seqlock_t * lock, int * seq)¶ begin a seqlock_t lockless or locking reader
Parameters
seqlock_t*lock- Pointer to seqlock_t
int*seq- Marker and return parameter. If the passed value is even, thereader will become alockless seqlock_t reader as in
read_seqbegin().If the passed value is odd, the reader will become alocking readeras inread_seqlock_excl(). In the first call to this function, thecallermust initialize and pass an even value toseq; this way, alockless read can be optimistically tried first.
Description
read_seqbegin_or_lock is an API designed to optimistically try a normallockless seqlock_t read section first. If an odd counter is found, thelockless read trial has failed, and the next read iteration transformsitself into a full seqlock_t locking reader.
This is typically used to avoid seqlock_t lockless readers starvation(too much retry loops) in the case of a sharp spike in write sideactivity.
Check Documentation/locking/seqlock.rst for template example code.
Context
if the seqlock_t write section,or other read sections, canbe invoked from hardirq or softirq contexts, use the _irqsave or _bhvariant of this function instead.
Return
the encountered sequence counter value, through theseqparameter, which is overloaded as a return parameter. This returnedvalue must be checked withneed_seqretry(). If the read section need tobe retried, this returned value must also be passed as theseqparameter of the nextread_seqbegin_or_lock() iteration.
- int
need_seqretry(seqlock_t * lock, int seq)¶ validate seqlock_t “locking or lockless” read section
Parameters
seqlock_t*lock- Pointer to seqlock_t
intseq- sequence count, from
read_seqbegin_or_lock()
Return
true if a read section retry is required, false otherwise
- void
done_seqretry(seqlock_t * lock, int seq)¶ end seqlock_t “locking or lockless” reader section
Parameters
seqlock_t*lock- Pointer to seqlock_t
intseq- count, from
read_seqbegin_or_lock()
Description
done_seqretry finishes the seqlock_t read side critical section startedwithread_seqbegin_or_lock() and validated byneed_seqretry().
- unsigned long
read_seqbegin_or_lock_irqsave(seqlock_t * lock, int * seq)¶ begin a seqlock_t lockless reader, or a non-interruptible locking reader
Parameters
seqlock_t*lock- Pointer to seqlock_t
int*seq- Marker and return parameter. Check
read_seqbegin_or_lock().
Description
This is the _irqsave variant ofread_seqbegin_or_lock(). Use it only ifthe seqlock_t write section,or other read sections, can be invokedfrom hardirq context.
Note
Interrupts will be disabled only for “locking reader” mode.
Return
- The saved local interrupts state in case of a locking reader, tobe passed to
done_seqretry_irqrestore().- The encountered sequence counter value, returned throughseqoverloaded as a return parameter. Check
read_seqbegin_or_lock().
- void
done_seqretry_irqrestore(seqlock_t * lock, int seq, unsigned long flags)¶ end a seqlock_t lockless reader, or a non-interruptible locking reader section
Parameters
seqlock_t*lock- Pointer to seqlock_t
intseq- Count, from
read_seqbegin_or_lock_irqsave() unsignedlongflags- Caller’s saved local interrupt state in case of a lockingreader, also from
read_seqbegin_or_lock_irqsave()
Description
This is the _irqrestore variant ofdone_seqretry(). The read sectionmust’ve been opened withread_seqbegin_or_lock_irqsave(), and validatedbyneed_seqretry().