The errseq_t datatype
An errseq_t is a way of recording errors in one place, and allowing any number of “subscribers” to tell whether it has changed since a previous point where it was sampled.
The initial use case for this is tracking errors for file synchronization syscalls (fsync, fdatasync, msync and sync_file_range), but it may be usable in other situations.
It’s implemented as an unsigned 32-bit value. The low order bits are designated to hold an error code (between 1 and MAX_ERRNO). The upper bits are used as a counter. This is done with atomics instead of locking so that these functions can be called from any context.
Note that there is a risk of collisions if new errors are being recorded frequently, since we have so few bits to use as a counter.
To mitigate this, the bit between the error value and counter is used as a flag to tell whether the value has been sampled since a new value was recorded. That allows us to avoid bumping the counter if no one has sampled it since the last time an error was recorded.
Thus we end up with a value that looks something like this:
| 31..13  | 12 | 11..0 |
| counter | SF | errno |
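The layout above can be sketched in plain userspace C. This is only an illustration of the bit positions in the diagram: the `SKETCH_*` macros, `struct sketch_fields` and both helper functions are hypothetical names invented here, and the code assumes MAX_ERRNO is 4095, which is what a 12-bit errno field implies.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the diagram. */
#define SKETCH_MAX_ERRNO 4095u		/* bits 11..0: errno */
#define SKETCH_SEEN	 (1u << 12)	/* bit 12: "seen" flag (SF) */
#define SKETCH_CTR_SHIFT 13		/* bits 31..13: counter */

struct sketch_fields {
	uint32_t counter;	/* how many times the counter was bumped */
	int seen;		/* 1 if someone sampled this value */
	int err;		/* 0, or a negative errno-style code */
};

/* Split a 32-bit value into its three fields. */
static struct sketch_fields sketch_decode(uint32_t v)
{
	struct sketch_fields f;

	f.counter = v >> SKETCH_CTR_SHIFT;
	f.seen = (v & SKETCH_SEEN) != 0;
	f.err = -(int)(v & SKETCH_MAX_ERRNO);
	return f;
}

/* Pack the three fields back into one 32-bit value. */
static uint32_t sketch_encode(uint32_t counter, int seen, int err)
{
	/* err must be 0 or lie between -1 and -MAX_ERRNO */
	assert(err <= 0 && err >= -(int)SKETCH_MAX_ERRNO);
	return (counter << SKETCH_CTR_SHIFT) |
	       (seen ? SKETCH_SEEN : 0u) |
	       (uint32_t)-err;
}
```

With these helpers, `sketch_decode(sketch_encode(3, 1, -EIO))` gives back a counter of 3, the seen flag set, and `-EIO`, while an all-zero value decodes to "no error, never sampled".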
The general idea is for “watchers” to sample an errseq_t value and keep it as a running cursor. That value can later be used to tell whether any new errors have occurred since that sampling was done, and atomically record the state at the time that it was checked. This allows us to record errors in one place, and then have a number of “watchers” that can tell whether the value has changed since they last checked it.
A new errseq_t should always be zeroed out. An errseq_t value of all zeroes is the special (but common) case where there has never been an error. An all zero value thus serves as the “epoch” if one wishes to know whether there has ever been an error set since it was first initialized.
API usage
Let me tell you a story about a worker drone. Now, he’s a good worker overall, but the company is a little...management heavy. He has to report to 77 supervisors today, and tomorrow the “big boss” is coming in from out of town and he’s sure to test the poor fellow too.
They’re all handing him work to do -- so much he can’t keep track of who handed him what, but that’s not really a big problem. The supervisors just want to know when he’s finished all of the work they’ve handed him so far and whether he made any mistakes since they last asked.
He might have made the mistake on work they didn’t actually hand him, but he can’t keep track of things at that level of detail. All he can remember is the most recent mistake that he made.
Here’s our worker_drone representation:
    struct worker_drone {
            errseq_t        wd_err; /* for recording errors */
    };

Every day, the worker_drone starts out with a blank slate:

    struct worker_drone wd;

    wd.wd_err = (errseq_t)0;
The supervisors come in and get an initial read for the day. They don’t care about anything that happened before their watch begins:

    struct supervisor {
            errseq_t        s_wd_err; /* private "cursor" for wd_err */
            spinlock_t      s_wd_err_lock; /* protects s_wd_err */
    };

    struct supervisor su;

    su.s_wd_err = errseq_sample(&wd.wd_err);
    spin_lock_init(&su.s_wd_err_lock);

Now they start handing him tasks to do. Every few minutes they ask him to finish up all of the work they’ve handed him so far. Then they ask him whether he made any mistakes on any of it:

    spin_lock(&su.s_wd_err_lock);
    err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
    spin_unlock(&su.s_wd_err_lock);
Up to this point, that just keeps returning 0.
Now, the owners of this company are quite miserly and have given him substandard equipment with which to do his job. Occasionally it glitches and he makes a mistake. He sighs a heavy sigh, and marks it down:

    errseq_set(&wd.wd_err, -EIO);

...and then gets back to work. The supervisors eventually poll again and they each get the error when they next check. Subsequent calls will return 0, until another error is recorded, at which point it’s reported to each of them once.
Note that the supervisors can’t tell how many mistakes he made, only whether one was made since they last checked, and the latest value recorded.
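The reporting behaviour in this part of the story can be tried out in userspace. What follows is a rough model built on C11 atomics, not the kernel code: `model_set`, `model_check_and_advance`, `supervisor_story` and the `M_*` macros are hypothetical names, MAX_ERRNO is assumed to be 4095, and the single-threaded demo needs no lock around the cursors.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define M_MAX_ERRNO 4095u	/* assumed: 12-bit errno field */
#define M_SEEN	    (1u << 12)	/* "seen" flag */
#define M_CTR_INC   (1u << 13)	/* lowest counter bit */

/* Stand-in for errseq_set(); returns the previous value. */
static uint32_t model_set(_Atomic uint32_t *eseq, int err)
{
	uint32_t old = atomic_load(eseq);

	assert(err < 0 && err >= -(int)M_MAX_ERRNO);

	for (;;) {
		/* Replace the error bits and clear the "seen" flag. */
		uint32_t next = (old & ~(M_MAX_ERRNO | M_SEEN)) | (uint32_t)-err;

		/* Bump the counter only if someone sampled the old value. */
		if (old & M_SEEN)
			next += M_CTR_INC;
		if (next == old)
			return old;
		/* On failure, old is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(eseq, &old, next))
			return old;
	}
}

/* Stand-in for errseq_check_and_advance(); caller serializes *since. */
static int model_check_and_advance(_Atomic uint32_t *eseq, uint32_t *since)
{
	uint32_t old = atomic_load(eseq);
	uint32_t next;

	if (old == *since)
		return 0;

	/* Mark the value as seen; losing the race to publish is harmless. */
	next = old | M_SEEN;
	if (next != old)
		atomic_compare_exchange_strong(eseq, &old, next);
	*since = next;
	return -(int)(next & M_MAX_ERRNO);
}

/* Two supervisors poll one worker; each result is stored in order. */
static void supervisor_story(int results[6])
{
	_Atomic uint32_t wd_err = 0;	/* blank slate */
	uint32_t su_a = 0, su_b = 0;	/* cursors sampled from the blank slate */

	results[0] = model_check_and_advance(&wd_err, &su_a);	/* nothing yet */

	model_set(&wd_err, -EIO);
	model_set(&wd_err, -ENOSPC);	/* second mistake before anyone polls */

	results[1] = model_check_and_advance(&wd_err, &su_a);	/* latest error */
	results[2] = model_check_and_advance(&wd_err, &su_b);	/* reported to B too */
	results[3] = model_check_and_advance(&wd_err, &su_a);	/* already reported */

	model_set(&wd_err, -ENOSPC);	/* same errno again, after it was seen */
	results[4] = model_check_and_advance(&wd_err, &su_a);
	results[5] = model_check_and_advance(&wd_err, &su_b);
}
```

In this model the two back-to-back mistakes collapse into a single `-ENOSPC` report per supervisor, and the counter bump is what lets the later, identical `-ENOSPC` be told apart from the one already reported.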
Occasionally the big boss comes in for a spot check and asks the worker to do a one-off job for him. He’s not really watching the worker full-time like the supervisors, but he does need to know whether a mistake occurred while his job was processing.
He can just sample the current errseq_t in the worker, and then use that to tell whether an error has occurred later:

    errseq_t since = errseq_sample(&wd.wd_err);
    /* submit some work and wait for it to complete */
    err = errseq_check(&wd.wd_err, since);

Since he’s just going to discard “since” after that point, he doesn’t need to advance it here. He also doesn’t need any locking since it’s not usable by anyone else.
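The big boss pattern can also be modelled in userspace. The sketch below is an assumption-laden illustration rather than the kernel implementation: `model_sample`, `model_check` and `one_off_job` are hypothetical names, MAX_ERRNO is assumed to be 4095, and returning the all-zero epoch for a not-yet-seen value is one way to get the documented behaviour that an unseen error is still reported to a new sampler.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define M_MAX_ERRNO 4095u	/* assumed: 12-bit errno field */
#define M_SEEN	    (1u << 12)	/* "seen" flag */
#define M_CTR_INC   (1u << 13)	/* lowest counter bit */

/* Stand-in for errseq_sample(). */
static uint32_t model_sample(_Atomic uint32_t *eseq)
{
	uint32_t cur = atomic_load(eseq);

	/* An error nobody has seen yet should still reach us: use the epoch. */
	return (cur & M_SEEN) ? cur : 0;
}

/* Stand-in for errseq_check(); the cursor is not advanced. */
static int model_check(_Atomic uint32_t *eseq, uint32_t since)
{
	uint32_t cur = atomic_load(eseq);

	return cur == since ? 0 : -(int)(cur & M_MAX_ERRNO);
}

/*
 * Sample, "run the job", then check. value_after_job is a hypothetical
 * stand-in for whatever the worker's errseq_t holds once the job is done.
 */
static int one_off_job(_Atomic uint32_t *wd_err, uint32_t value_after_job)
{
	uint32_t since;

	assert(wd_err != NULL);
	since = model_sample(wd_err);
	atomic_store(wd_err, value_after_job);	/* work happens here */
	return model_check(wd_err, since);
}
```

Starting from an old error that was already seen (`M_SEEN | EIO`), a job that leaves the value untouched yields 0, while a job during which a new error lands (for example `M_CTR_INC | ENOSPC`) yields `-ENOSPC`. Starting from an unseen `EIO`, the boss gets `-EIO` even if nothing new happens during his job.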
Serializing errseq_t cursor updates
Note that the errseq_t API does not protect the errseq_t cursor during a check_and_advance operation. Only the canonical error code is handled atomically. In a situation where more than one task might be using the same errseq_t cursor at the same time, it’s important to serialize updates to that cursor.
If that’s not done, then it’s possible for the cursor to go backward, in which case the same error could be reported more than once.
Because of this, it’s often advantageous to first do an errseq_check to see if anything has changed, and only later do an errseq_check_and_advance after taking the lock, e.g.:
    if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err))) {
            /* su.s_wd_err is protected by s_wd_err_lock */
            spin_lock(&su.s_wd_err_lock);
            err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
            spin_unlock(&su.s_wd_err_lock);
    }

That avoids the spinlock in the common case where nothing has changed since the last time it was checked.
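To see how often the lock is actually taken, the same check-then-lock pattern can be modelled in userspace. This is a sketch under several assumptions: every `model_*` and `supervisor_*` name is hypothetical, an `atomic_flag` busy-wait stands in for the kernel spinlock, a relaxed atomic load stands in for READ_ONCE, the `locked_polls` counter exists only for illustration, and MAX_ERRNO is assumed to be 4095.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define M_MAX_ERRNO 4095u	/* assumed: 12-bit errno field */
#define M_SEEN	    (1u << 12)	/* "seen" flag */

struct model_supervisor {
	_Atomic uint32_t s_wd_err;	/* cursor; atomic so the unlocked peek is defined */
	atomic_flag s_wd_err_lock;	/* tiny spinlock protecting s_wd_err */
	unsigned int locked_polls;	/* illustration only: times the lock was taken */
};

static void supervisor_init(struct model_supervisor *su, uint32_t sampled)
{
	atomic_init(&su->s_wd_err, sampled);
	atomic_flag_clear(&su->s_wd_err_lock);
	su->locked_polls = 0;
}

/* Stand-in for errseq_check(). */
static int model_check(_Atomic uint32_t *eseq, uint32_t since)
{
	uint32_t cur = atomic_load(eseq);

	return cur == since ? 0 : -(int)(cur & M_MAX_ERRNO);
}

/* Stand-in for errseq_check_and_advance(); call with the lock held. */
static int model_check_and_advance(_Atomic uint32_t *eseq,
				   _Atomic uint32_t *since)
{
	uint32_t old = atomic_load(eseq);
	uint32_t next;

	if (old == atomic_load(since))
		return 0;
	next = old | M_SEEN;
	if (next != old)
		atomic_compare_exchange_strong(eseq, &old, next);
	atomic_store(since, next);
	return -(int)(next & M_MAX_ERRNO);
}

static int supervisor_poll(_Atomic uint32_t *wd_err, struct model_supervisor *su)
{
	int err = 0;

	assert(wd_err != NULL && su != NULL);

	/* Unlocked peek at the cursor; skip the lock if nothing changed. */
	if (model_check(wd_err, atomic_load_explicit(&su->s_wd_err,
						     memory_order_relaxed))) {
		while (atomic_flag_test_and_set_explicit(&su->s_wd_err_lock,
							 memory_order_acquire))
			;	/* spin until the lock is free */
		err = model_check_and_advance(wd_err, &su->s_wd_err);
		atomic_flag_clear_explicit(&su->s_wd_err_lock,
					   memory_order_release);
		su->locked_polls++;
	}
	return err;
}
```

Polling an unchanged value leaves `locked_polls` at zero. Only a poll that follows a newly recorded error takes the lock, and the poll after that goes back to the lock-free path.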
Functions
- errseq_t errseq_set(errseq_t *eseq, int err)
Set an errseq_t for later reporting.
Parameters
errseq_t *eseq
    errseq_t field that should be set
int err
    error to set (must be between -1 and -MAX_ERRNO)
Description
This function sets the error in eseq, and increments the sequence counter if the last sequence was sampled at some point in the past.
Any error set will always overwrite an existing error.
Return
The previous value, primarily for debugging purposes. The return value should not be used as a previously sampled value in later calls as it will not have the SEEN flag set.
- errseq_t errseq_sample(errseq_t *eseq)
Grab current errseq_t value.
Parameters
errseq_t *eseq
    Pointer to errseq_t to be sampled.
Description
This function allows callers to initialise their errseq_t variable. If the error has been “seen”, new callers will not see an old error. If there is an unseen error in eseq, the caller of this function will see it the next time it checks for an error.
Context
Any context.
Return
The current errseq value.
- int errseq_check(errseq_t *eseq, errseq_t since)
Has an error occurred since a particular sample point?
Parameters
errseq_t *eseq
    Pointer to errseq_t value to be checked.
errseq_t since
    Previously-sampled errseq_t from which to check.
Description
Grab the value that eseq points to, and see if it has changed since the given value was sampled. The since value is not advanced, so there is no need to mark the value as seen.
Return
The latest error set in the errseq_t or 0 if it hasn’t changed.
- int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
Check an errseq_t and advance to current value.
Parameters
errseq_t *eseq
    Pointer to value being checked and reported.
errseq_t *since
    Pointer to previously-sampled errseq_t to check against and advance.
Description
Grab the eseq value, and see whether it matches the value that since points to. If it does, then just return 0.
If it doesn’t, then the value has changed. Set the “seen” flag, and try to swap it into place as the new eseq value. Then, set that value as the new “since” value, and return whatever the error portion is set to.
Note that no locking is provided here for concurrent updates to the “since” value. The caller must provide that if necessary. Because of this, callers may want to do a lockless errseq_check before taking the lock and calling this.
Return
Negative errno if one has been stored, or 0 if no new error has occurred.