The Linux Kernel API
Basic C Library Functions
When writing drivers, you cannot in general use routines from the C library. Some of these functions have been found generally useful, and they are listed below. The behaviour of these functions may vary slightly from that defined by ANSI C; these deviations are noted in the text.
String Conversions
- unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
convert a string to an unsigned long long
Parameters
const char *cp: The start of the string
char **endp: A pointer to the end of the parsed string will be placed here
unsigned int base: The number base to use
Description
This function has caveats. Please use kstrtoull instead.
- unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base)
convert a string to an unsigned long
Parameters
const char *cp: The start of the string
char **endp: A pointer to the end of the parsed string will be placed here
unsigned int base: The number base to use
Description
This function has caveats. Please use kstrtoul instead.
- long simple_strtol(const char *cp, char **endp, unsigned int base)
convert a string to a signed long
Parameters
const char *cp: The start of the string
char **endp: A pointer to the end of the parsed string will be placed here
unsigned int base: The number base to use
Description
This function has caveats. Please use kstrtol instead.
- long long simple_strtoll(const char *cp, char **endp, unsigned int base)
convert a string to a signed long long
Parameters
const char *cp: The start of the string
char **endp: A pointer to the end of the parsed string will be placed here
unsigned int base: The number base to use
Description
This function has caveats. Please use kstrtoll instead.
- int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
Format a string and place it in a buffer
Parameters
char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt_str: The format string to use
va_list args: Arguments for the format string
Description
This function generally follows C99 vsnprintf, but has some extensions and a few limitations:
%n is unsupported
%p* is handled by pointer()
See pointer() or How to get printk format specifiers right for a more extensive description.
Please update the documentation in both places when making changes.
The return value is the number of characters which would be generated for the given input, excluding the trailing ‘\0’, as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing ‘\0’), use vscnprintf(). If the return is greater than or equal to size, the resulting string is truncated.
If you’re not already dealing with a va_list, consider using snprintf().
- int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
Format a string and place it in a buffer
Parameters
char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt: The format string to use
va_list args: Arguments for the format string
Description
The return value is the number of characters which have been written into buf, not including the trailing ‘\0’. If size is 0, the function returns 0.
If you’re not already dealing with a va_list, consider using scnprintf().
See the vsnprintf() documentation for format string extensions over C99.
- int snprintf(char *buf, size_t size, const char *fmt, ...)
Format a string and place it in a buffer
Parameters
char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt: The format string to use
...: Arguments for the format string
Description
The return value is the number of characters which would be generated for the given input, excluding the trailing null, as per ISO C99. If the return is greater than or equal to size, the resulting string is truncated.
See the vsnprintf() documentation for format string extensions over C99.
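The truncation rule above is the same one the C99 library snprintf() follows, so it can be demonstrated in userspace. build_label() below is a hypothetical helper, not a kernel function: it treats a return value greater than or equal to size as truncation.

```c
#include <stdio.h>

/* Hypothetical helper: format "<name>-<id>" into buf.
 * Returns 0 if the whole string fit, -1 if it was truncated. */
static int build_label(char *buf, size_t size, const char *name, int id)
{
	int ret = snprintf(buf, size, "%s-%d", name, id);

	if (ret < 0)
		return -1;	/* userspace encoding error; not a kernel case */
	if ((size_t)ret >= size)
		return -1;	/* the full output would need ret + 1 bytes */
	return 0;
}
```

The check uses >= rather than > because a return value equal to size means the trailing null did not fit.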
- int scnprintf(char *buf, size_t size, const char *fmt, ...)
Format a string and place it in a buffer
Parameters
char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt: The format string to use
...: Arguments for the format string
Description
The return value is the number of characters written into buf, not including the trailing ‘\0’. If size is 0, the function returns 0.
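The difference between the snprintf() and scnprintf() return values matters when appending to one buffer in a loop. The sketch below assumes only C99 vsnprintf(): sketch_scnprintf() is a hypothetical userspace stand-in that clamps the count as described above, and list_ids() is a hypothetical caller that appends without ever stepping past the end of buf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for scnprintf(): report what actually landed in buf. */
static int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (i < 0)
		return 0;			/* userspace encoding error */
	if ((size_t)i < size)
		return i;			/* everything fit */
	return size ? (int)(size - 1) : 0;	/* truncated, or size == 0 */
}

/* Hypothetical caller: append "<id> " for each id; len never exceeds size - 1. */
static int list_ids(char *buf, size_t size, const int *ids, int n)
{
	int len = 0;

	for (int k = 0; k < n; k++)
		len += sketch_scnprintf(buf + len, size - len, "%d ", ids[k]);
	return len;
}
```

With snprintf() in the same loop, len could exceed size after truncation, and buf + len would then point outside the buffer.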
- int vsprintf(char *buf, const char *fmt, va_list args)
Format a string and place it in a buffer
Parameters
char *buf: The buffer to place the result into
const char *fmt: The format string to use
va_list args: Arguments for the format string
Description
The return value is the number of characters written into buf, not including the trailing ‘\0’. Use vsnprintf() or vscnprintf() in order to avoid buffer overflows.
If you’re not already dealing with a va_list, consider using sprintf().
See the vsnprintf() documentation for format string extensions over C99.
- int sprintf(char *buf, const char *fmt, ...)
Format a string and place it in a buffer
Parameters
char *buf: The buffer to place the result into
const char *fmt: The format string to use
...: Arguments for the format string
Description
The return value is the number of characters written into buf, not including the trailing ‘\0’. Use snprintf() or scnprintf() in order to avoid buffer overflows.
See the vsnprintf() documentation for format string extensions over C99.
- int vbin_printf(u32 *bin_buf, size_t size, const char *fmt_str, va_list args)
Parse a format string and place args’ binary value in a buffer
Parameters
u32 *bin_buf: The buffer to place args’ binary value
size_t size: The size of the buffer (in 32-bit words, not characters)
const char *fmt_str: The format string to use
va_list args: Arguments for the format string
Description
The format follows C99 vsnprintf, except that %n is ignored and its argument is skipped.
The return value is the number of 32-bit words which would be generated for the given input.
NOTE
If the return value is greater than size, the resulting bin_buf is NOT valid for bstr_printf().
- int bstr_printf(char *buf, size_t size, const char *fmt_str, const u32 *bin_buf)
Format a string from binary arguments and place it in a buffer
Parameters
char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt_str: The format string to use
const u32 *bin_buf: Binary arguments for the format string
Description
This function is like C99 vsnprintf, except that vsnprintf gets its arguments from the stack, whereas bstr_printf gets them from bin_buf, a binary buffer generated by vbin_printf.
The format follows C99 vsnprintf, but has some extensions: see the vsnprintf comment for details.
The return value is the number of characters which would be generated for the given input, excluding the trailing ‘\0’, as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing ‘\0’), use vscnprintf(). If the return is greater than or equal to size, the resulting string is truncated.
- int vsscanf(const char *buf, const char *fmt, va_list args)
Unformat a buffer into a list of arguments
Parameters
const char *buf: input buffer
const char *fmt: format of buffer
va_list args: arguments
- int sscanf(const char *buf, const char *fmt, ...)
Unformat a buffer into a list of arguments
Parameters
const char *buf: input buffer
const char *fmt: formatting of buffer
...: resulting arguments
- int kstrtoul(const char *s, unsigned int base, unsigned long *res)
convert a string to an unsigned long
Parameters
const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x, the number will be parsed as hexadecimal (case insensitive); if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
unsigned long *res: Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoul(). Return code must be checked.
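The parsing rules above can be illustrated with a small userspace sketch. sketch_kstrtoul() is a hypothetical name and this is not the kernel's implementation; it only mirrors the documented contract (optional plus sign, base 0 auto-detection, one tolerated trailing newline, -ERANGE on overflow, -EINVAL otherwise, res written only on success) so that it can be compiled and tested outside the kernel.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical sketch of the documented kstrtoul() rules. */
static int sketch_kstrtoul(const char *s, unsigned int base, unsigned long *res)
{
	unsigned long val = 0;
	int digits = 0;

	if (*s == '+')				/* a plus sign is allowed, a minus is not */
		s++;
	if (base == 0) {			/* auto-detect: 0x -> 16, 0 -> 8, else 10 */
		if (s[0] == '0' && tolower((unsigned char)s[1]) == 'x' &&
		    isxdigit((unsigned char)s[2])) {
			base = 16;
			s += 2;
		} else {
			base = (s[0] == '0') ? 8 : 10;
		}
	} else if (base == 16 && s[0] == '0' && tolower((unsigned char)s[1]) == 'x') {
		s += 2;				/* explicit base 16 may carry a 0x prefix */
	}
	if (base < 2 || base > 16)
		return -EINVAL;

	for (; isxdigit((unsigned char)*s); s++, digits++) {
		unsigned int d = isdigit((unsigned char)*s) ? (unsigned int)(*s - '0')
				: (unsigned int)(tolower((unsigned char)*s) - 'a' + 10);

		if (d >= base)
			return -EINVAL;		/* digit not valid in this base */
		if (val > (ULONG_MAX - d) / base)
			return -ERANGE;		/* val * base + d would overflow */
		val = val * base + d;
	}
	if (*s == '\n')				/* one trailing newline is tolerated */
		s++;
	if (digits == 0 || *s != '\0')
		return -EINVAL;
	*res = val;				/* written only on success */
	return 0;
}
```

The same structure, with a signed accumulator and an accepted leading minus sign, would cover the signed variants below.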
- int kstrtol(const char *s, unsigned int base, long *res)
convert a string to a long
Parameters
const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x, the number will be parsed as hexadecimal (case insensitive); if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
long *res: Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtol(). Return code must be checked.
- int kstrtoull(const char *s, unsigned int base, unsigned long long *res)
convert a string to an unsigned long long
Parameters
const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x, the number will be parsed as hexadecimal (case insensitive); if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
unsigned long long *res: Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoull(). Return code must be checked.
- int kstrtoll(const char *s, unsigned int base, long long *res)
convert a string to a long long
Parameters
const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x, the number will be parsed as hexadecimal (case insensitive); if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
long long *res: Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoll(). Return code must be checked.
- int kstrtouint(const char *s, unsigned int base, unsigned int *res)
convert a string to an unsigned int
Parameters
const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x, the number will be parsed as hexadecimal (case insensitive); if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
unsigned int *res: Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoul(). Return code must be checked.
- int kstrtoint(const char *s, unsigned int base, int *res)
convert a string to an int
Parameters
const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x, the number will be parsed as hexadecimal (case insensitive); if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
int *res: Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtol(). Return code must be checked.
- int kstrtobool(const char *s, bool *res)
convert common user inputs into boolean values
Parameters
const char *s: input string
bool *res: result
Description
This routine returns 0 iff the first character is one of ‘YyTt1NnFf0’, or [oO][NnFf] for “on” and “off”. Otherwise it will return -EINVAL. The value pointed to by res is updated upon finding a match.
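A userspace sketch of the matching rule just described; sketch_kstrtobool() is a hypothetical name, and the code illustrates the documented behaviour rather than reproducing the kernel source. Only the first character (or the first two, for “on” and “off”) is inspected, so inputs such as “yes”, “true” or “Off” followed by a newline are all accepted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch of the documented kstrtobool() rules. */
static int sketch_kstrtobool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':			/* "on" / "off" need a second character */
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;				/* *res is left untouched */
}
```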
- int string_get_size(u64 size, u64 blk_size, const enum string_size_units units, char *buf, int len)
get the size in the specified units
Parameters
u64 size: The size to be converted in blocks
u64 blk_size: Size of the block (use 1 for size in bytes)
const enum string_size_units units: Units to use (powers of 1000 or 1024), and whether to include a space separator
char *buf: buffer to format to
int len: length of buffer
Description
This function returns a string formatted to 3 significant figures giving the size in the required units. buf should have room for at least 9 bytes and will always be zero-terminated.
Return value: number of characters of output that would have been written (which may be greater than len, if output was truncated).
- int parse_int_array_user(const char __user *from, size_t count, int **array)
Split string into a sequence of integers
Parameters
const char __user *from: The user space buffer to read from
size_t count: The maximum number of bytes to read
int **array: Returned pointer to sequence of integers
Description
On success, array is allocated and initialized with a sequence of integers extracted from the from buffer, plus an additional element that begins the sequence and specifies the integer count.
The caller takes responsibility for freeing array when it is no longer needed.
- int string_unescape(char *src, char *dst, size_t size, unsigned int flags)
unquote characters in the given string
Parameters
char *src: source buffer (escaped)
char *dst: destination buffer (unescaped)
size_t size: size of the destination buffer (0 for unlimited)
unsigned int flags: combination of the flags.
Description
The function unquotes characters in the given string.
Because the size of the output will be the same as or less than the size of the input, the transformation may be performed in place.
The caller must provide valid source and destination pointers. Be aware that the destination buffer will always be NUL-terminated. The source string must be NUL-terminated as well. The supported flags are:
UNESCAPE_SPACE:
'\f' - form feed
'\n' - new line
'\r' - carriage return
'\t' - horizontal tab
'\v' - vertical tab
UNESCAPE_OCTAL:
'\NNN' - byte with octal value NNN (1 to 3 digits)
UNESCAPE_HEX:
'\xHH' - byte with hexadecimal value HH (1 to 2 digits)
UNESCAPE_SPECIAL:
'\"' - double quote
'\\' - backslash
'\a' - alert (BEL)
'\e' - escape
UNESCAPE_ANY:
all previous together
Return
The number of characters written to the destination buffer, excluding the trailing ‘\0’, is returned.
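Because the output is never longer than the input, unescaping can run in place with a read pointer and a write pointer that never overtakes it. The sketch below is hypothetical: sketch_unescape() handles only the UNESCAPE_SPACE and UNESCAPE_SPECIAL sequences listed above, ignores the size limit, and assumes that an unrecognized escape is copied through unchanged, backslash included.

```c
/* Hypothetical sketch: UNESCAPE_SPACE | UNESCAPE_SPECIAL only, no size limit.
 * Safe in place (src == dst) because out never runs ahead of src. */
static int sketch_unescape(const char *src, char *dst)
{
	char *out = dst;

	while (*src) {
		if (src[0] == '\\' && src[1] != '\0') {
			char c;

			switch (src[1]) {
			case 'f': c = '\f'; break;
			case 'n': c = '\n'; break;
			case 'r': c = '\r'; break;
			case 't': c = '\t'; break;
			case 'v': c = '\v'; break;
			case '"': c = '"'; break;
			case '\\': c = '\\'; break;
			case 'a': c = '\a'; break;
			case 'e': c = 0x1b; break;	/* ESC */
			default: c = 0; break;		/* not handled here */
			}
			if (c) {
				*out++ = c;
				src += 2;
				continue;
			}
		}
		*out++ = *src++;		/* plain byte, or unhandled backslash */
	}
	*out = '\0';
	return (int)(out - dst);		/* characters written, excluding '\0' */
}
```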
- int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz, unsigned int flags, const char *only)
quote characters in the given memory buffer
Parameters
const char *src: source buffer (unescaped)
size_t isz: source buffer size
char *dst: destination buffer (escaped)
size_t osz: destination buffer size
unsigned int flags: combination of the flags
const char *only: NUL-terminated string containing characters used to limit the selected escape class. If characters are included in only that would not normally be escaped by the classes selected in flags, they will be copied to dst unescaped.
Description
The process of escaping a byte buffer includes several parts. They are applied in the following sequence.
1. The character is not matched to the one from the only string and thus must go as-is to the output.
2. The character is matched to the printable and ASCII classes, if asked, and in case of a match it passes through to the output.
3. The character is matched to the printable or ASCII class, if asked, and in case of a match it passes through to the output.
4. The character is checked if it falls into the class given by flags. ESCAPE_OCTAL and ESCAPE_HEX go last since they cover any character. Note that they actually can’t go together, otherwise ESCAPE_HEX will be ignored.
The caller must provide valid source and destination pointers. Be aware that the destination buffer will not be NUL-terminated, so the caller has to append a terminator if needed. The supported flags are:
ESCAPE_SPACE: (special white space, not space itself)
'\f' - form feed
'\n' - new line
'\r' - carriage return
'\t' - horizontal tab
'\v' - vertical tab
ESCAPE_SPECIAL:
'\"' - double quote
'\\' - backslash
'\a' - alert (BEL)
'\e' - escape
ESCAPE_NULL:
'\0' - null
ESCAPE_OCTAL:
'\NNN' - byte with octal value NNN (3 digits)
ESCAPE_ANY:
all previous together
ESCAPE_NP:
escape only non-printable characters, checked by isprint()
ESCAPE_ANY_NP:
all previous together
ESCAPE_HEX:
'\xHH' - byte with hexadecimal value HH (2 digits)
ESCAPE_NA:
escape only non-ascii characters, checked by isascii()
ESCAPE_NAP:
escape only non-printable or non-ascii characters
ESCAPE_APPEND:
append characters from only to be escaped by the given classes
ESCAPE_APPEND helps to pass additional characters to be escaped when one of ESCAPE_NP, ESCAPE_NA, or ESCAPE_NAP is provided.
One notable caveat: ESCAPE_NAP, ESCAPE_NP and ESCAPE_NA have higher priority than the rest of the flags (ESCAPE_NAP is the highest). It doesn’t make much sense to use any of them without ESCAPE_OCTAL or ESCAPE_HEX, because they cover most of the other character classes. ESCAPE_NAP can utilize ESCAPE_SPACE or ESCAPE_SPECIAL in addition to the above.
Return
The total size of the escaped output that would be generated for the given input and flags. To check whether the output was truncated, compare the return value to osz. There is room left in dst for a ‘\0’ terminator if and only if ret < osz.
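The return-value contract (write only what fits, but report the full size) can be shown with a deliberately reduced escaper. sketch_escape_np_octal() is hypothetical and covers roughly ESCAPE_OCTAL combined with ESCAPE_NP, using the C library's isprint(); it does not implement the only string or any other flag. escape_to_cstring() is a hypothetical caller showing the ret < osz test before appending a terminator.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: escape non-printable bytes as \NNN, printable bytes as-is.
 * Writes only what fits in osz, returns the full size needed, adds no NUL. */
static size_t sketch_escape_np_octal(const char *src, size_t isz, char *dst, size_t osz)
{
	size_t pos = 0;

	for (size_t i = 0; i < isz; i++) {
		unsigned char c = (unsigned char)src[i];
		char tmp[4];
		size_t n = 1;

		tmp[0] = (char)c;
		if (!isprint(c)) {
			tmp[0] = '\\';
			tmp[1] = (char)('0' + ((c >> 6) & 7));
			tmp[2] = (char)('0' + ((c >> 3) & 7));
			tmp[3] = (char)('0' + (c & 7));
			n = 4;
		}
		for (size_t k = 0; k < n; k++, pos++)
			if (pos < osz)
				dst[pos] = tmp[k];	/* keep counting past osz */
	}
	return pos;
}

/* Hypothetical caller: returns 0 with a NUL-terminated result, -1 if truncated. */
static int escape_to_cstring(const char *in, size_t len, char *out, size_t osz)
{
	size_t ret = sketch_escape_np_octal(in, len, out, osz);

	if (ret >= osz)
		return -1;	/* no room left for the terminator */
	out[ret] = '\0';
	return 0;
}
```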
- char **kasprintf_strarray(gfp_t gfp, const char *prefix, size_t n)
allocate and fill array of sequential strings
Parameters
gfp_t gfp: flags for the slab allocator
const char *prefix: prefix to be used
size_t n: number of lines to be allocated and filled
Description
Allocates and fills n strings using the pattern “%s-%zu”, where prefix is provided by the caller. The caller is responsible for freeing them with kfree_strarray() after use.
Returns an array of strings, or NULL when memory can’t be allocated.
- void kfree_strarray(char **array, size_t n)
free a number of dynamically allocated strings contained in an array and the array itself
Parameters
char **array: Dynamically allocated array of strings to free.
size_t n: Number of strings (starting from the beginning of the array) to free.
Description
Passing a non-NULL array and n == 0, as well as a NULL array, are valid use-cases. If array is NULL, the function does nothing.
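A userspace analogue of this allocate/free pairing, with malloc() and free() standing in for the slab allocator. Both function names are hypothetical. The sketch assumes that on a partial failure the strings already allocated are released, which fits the kfree_strarray() contract above but is an assumption about kasprintf_strarray() itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical mirror of kfree_strarray(): free n strings, then the array. */
static void sketch_free_strarray(char **array, size_t n)
{
	if (!array)
		return;			/* NULL array: nothing to do */
	for (size_t i = 0; i < n; i++)
		free(array[i]);
	free(array);
}

/* Hypothetical mirror of kasprintf_strarray(): "prefix-0" .. "prefix-(n-1)". */
static char **sketch_strarray(const char *prefix, size_t n)
{
	char **names = calloc(n, sizeof(*names));

	if (!names)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		int len = snprintf(NULL, 0, "%s-%zu", prefix, i);

		names[i] = malloc((size_t)len + 1);
		if (!names[i]) {
			sketch_free_strarray(names, i);	/* undo partial work */
			return NULL;
		}
		snprintf(names[i], (size_t)len + 1, "%s-%zu", prefix, i);
	}
	return names;
}
```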
- char *skip_spaces(const char *str)
Removes leading whitespace from str.
Parameters
const char *str: The string to be stripped.
Description
Returns a pointer to the first non-whitespace character in str.
- char *strim(char *s)
Removes leading and trailing whitespace from s.
Parameters
char *s: The string to be stripped.
Description
Note that the first trailing whitespace is replaced with a NUL-terminator in the given string s. Returns a pointer to the first non-whitespace character in s.
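The split between skip_spaces() (read-only) and strim() (modifies s) is easiest to see in a sketch. The two functions below are hypothetical userspace versions that follow the descriptions above, using the C library's isspace().

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical: return a pointer past any leading whitespace. */
static char *sketch_skip_spaces(const char *str)
{
	while (isspace((unsigned char)*str))
		++str;
	return (char *)str;
}

/* Hypothetical: cut trailing whitespace in place, then skip leading whitespace. */
static char *sketch_strim(char *s)
{
	char *end = s + strlen(s);

	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';			/* first trailing whitespace becomes NUL */
	return sketch_skip_spaces(s);
}
```

Because the returned pointer can point past the start of s, keep the original pointer if the buffer has to be freed later.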
- bool sysfs_streq(const char *s1, const char *s2)
return true if strings are equal, modulo trailing newline
Parameters
const char *s1: one string
const char *s2: another string
Description
This routine returns true iff two strings are equal, treating both NUL and newline-then-NUL as equivalent string terminations. It’s geared for use with sysfs input strings, which generally terminate with newlines but are compared against values without newlines.
- int match_string(const char *const *array, size_t n, const char *string)
matches given string in an array
Parameters
const char *const *array: array of strings
size_t n: number of strings in the array or -1 for NULL-terminated arrays
const char *string: string to match with
Description
This routine will look for a string in an array of strings up to the n-th element in the array or until the first NULL element.
Historically the value of -1 for n was used to search in arrays that are NULL-terminated. However, the function does not make a distinction when finishing the search: either n elements have been compared OR the first NULL element was found.
Return
index of string in array if it matches, or -EINVAL otherwise.
- int __sysfs_match_string(const char *const *array, size_t n, const char *str)
matches given string in an array
Parameters
const char *const *array: array of strings
size_t n: number of strings in the array or -1 for NULL-terminated arrays
const char *str: string to match with
Description
Returns the index of str in array or -EINVAL, just like match_string(). Uses sysfs_streq instead of strcmp for matching.
This routine will look for a string in an array of strings up to the n-th element in the array or until the first NULL element.
Historically the value of -1 for n was used to search in arrays that are NULL-terminated. However, the function does not make a distinction when finishing the search: either n elements have been compared OR the first NULL element was found.
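How the newline tolerance and the NULL-entry stop interact can be sketched in userspace. sketch_sysfs_streq() and sketch_sysfs_match_string() are hypothetical names; the logic follows the descriptions of sysfs_streq() and __sysfs_match_string() above, with (size_t)-1 passed for n to search up to the NULL sentinel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: equal if identical, or if one side has a single extra '\n'. */
static bool sketch_sysfs_streq(const char *s1, const char *s2)
{
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}

/* Hypothetical: stop at n entries or at the first NULL entry, whichever comes first. */
static int sketch_sysfs_match_string(const char *const *array, size_t n, const char *str)
{
	for (size_t i = 0; i < n; i++) {
		if (!array[i])
			break;
		if (sketch_sysfs_streq(array[i], str))
			return (int)i;
	}
	return -EINVAL;
}
```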
- char *strreplace(char *str, char old, char new)
Replace all occurrences of character in string.
Parameters
char *str: The string to operate on.
char old: The character being replaced.
char new: The character old is replaced with.
Description
Replaces each old character with new in the given string str.
Return
pointer to the string str itself.
- void memcpy_and_pad(void *dest, size_t dest_len, const void *src, size_t count, int pad)
Copy one buffer to another with padding
Parameters
void *dest: Where to copy to
size_t dest_len: The destination buffer size
const void *src: Where to copy from
size_t count: The number of bytes to copy
int pad: Character to use for padding if space is left in destination.
String Manipulation
- unsafe_memcpy
unsafe_memcpy(dst, src, bytes, justification)
memcpy implementation with no FORTIFY bounds checking
Parameters
dst: Destination memory address to write to
src: Source memory address to read from
bytes: How many bytes to write to dst from src
justification: Free-form text or comment describing why the use is needed
Description
This should be used for corner cases where the compiler cannot do the right thing, or during transitions between APIs, etc. It should be used very rarely, and includes a place for justification detailing where bounds checking has happened, and why existing solutions cannot be employed.
- char *strncpy(char *const p, const char *q, __kernel_size_t size)
Copy a string to memory with non-guaranteed NUL padding
Parameters
char *const p: pointer to destination of copy
const char *q: pointer to NUL-terminated source string to copy
__kernel_size_t size: bytes to write at p
Description
If strlen(q) >= size, the copy of q will stop after size bytes, and p will NOT be NUL-terminated.
If strlen(q) < size, following the copy of q, trailing NUL bytes will be written to p until size total bytes have been written.
Do not use this function. While FORTIFY_SOURCE tries to avoid over-reads of q, it cannot defend against writing unterminated results to p. Using strncpy() remains ambiguous and fragile. Instead, please choose an alternative, so that the expectation of p’s contents is unambiguous:
| p needs to be: | padded to size | not padded |
|---|---|---|
| NUL-terminated | strscpy_pad() | strscpy() |
| not NUL-terminated | strtomem_pad() | strtomem() |
Note strscpy*()’s differing return values for detecting truncation, and strtomem*()’s expectation that the destination is marked with __nonstring when it is a character array.
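For the NUL-terminated, unpadded case in the table, the replacement is strscpy(). The sketch below is a hypothetical userspace approximation, sketch_strscpy(), returning long where the kernel uses ssize_t. It assumes the usual strscpy() contract: the destination is always terminated when size is non-zero, and the return value is the copied length or -E2BIG on truncation.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical strscpy()-style copy: always terminates when size > 0. */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return -E2BIG;
	while (len < size && src[len] != '\0')
		len++;
	if (len == size) {
		/* src plus its NUL does not fit: copy what fits and terminate */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);	/* includes the terminating NUL */
	return (long)len;
}
```

The _pad variants in the table would additionally zero the bytes after the terminator up to size.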
- __kernel_size_t strnlen(const char *const p, __kernel_size_t maxlen)
Return bounded count of characters in a NUL-terminated string
Parameters
const char *const p: pointer to NUL-terminated string to count.
__kernel_size_t maxlen: maximum number of characters to count.
Description
Returns the number of characters in p (NOT including the final NUL), or maxlen, if no NUL has been found up to there.
- strlen
strlen(p)
Return count of characters in a NUL-terminated string
Parameters
p: pointer to NUL-terminated string to count.
Description
Do not use this function unless the string length is known at compile-time. When p is unterminated, this function may crash or return unexpected counts that could lead to memory content exposures. Prefer strnlen().
Returns the number of characters in p (NOT including the final NUL).
- size_t strlcat(char *const p, const char *const q, size_t avail)
Append a string to an existing string
Parameters
char *const p: pointer to NUL-terminated string to append to
const char *const q: pointer to NUL-terminated string to append from
size_t avail: Maximum bytes available in p
Description
Appends NUL-terminated string q after the NUL-terminated string at p, but will not write beyond avail bytes total, potentially truncating the copy from q. p will stay NUL-terminated only if a NUL already existed within the avail bytes of p. If so, the resulting number of bytes copied from q will be at most “avail - strlen(p) - 1”.
Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the sizes of p and q are known to the compiler. Prefer building the string with formatting, via scnprintf(), seq_buf, or similar.
Returns the total bytes that _would_ have been contained by p regardless of truncation, similar to snprintf(). If the return value is >= avail, the string has been truncated.
- char *strcat(char *const p, const char *q)
Append a string to an existing string
Parameters
char *const p: pointer to NUL-terminated string to append to
const char *q: pointer to NUL-terminated source string to append from
Description
Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the destination buffer size is known to the compiler. Prefer building the string with formatting, via scnprintf() or similar. At the very least, use strncat().
Returns p.
- char *strncat(char *const p, const char *const q, __kernel_size_t count)
Append a string to an existing string
Parameters
char *const p: pointer to NUL-terminated string to append to
const char *const q: pointer to source string to append from
__kernel_size_t count: Maximum bytes to read from q
Description
Appends at most count bytes from q (stopping at the first NUL byte) after the NUL-terminated string at p. p will be NUL-terminated.
Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the sizes of p and q are known to the compiler. Prefer building the string with formatting, via scnprintf() or similar.
Returns p.
- char *strcpy(char *const p, const char *const q)
Copy a string into another string buffer
Parameters
char *const p: pointer to destination of copy
const char *const q: pointer to NUL-terminated source string to copy
Description
Do not use this function. While FORTIFY_SOURCE tries to avoid overflows, this is only possible when the sizes of q and p are known to the compiler. Prefer strscpy(), though note its different return values for detecting truncation.
Returns p.
- int strncasecmp(const char *s1, const char *s2, size_t len)
Case insensitive, length-limited string comparison
Parameters
const char *s1: One string
const char *s2: The other string
size_t len: the maximum number of characters to compare
- char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
copy a string from src to dest returning a pointer to the new end of dest, including src’s NUL-terminator. May overrun dest.
Parameters
char *__restrict__ dest: pointer to end of string being copied into. Must be large enough to receive copy.
const char *__restrict__ src: pointer to the beginning of string being copied from. Must not overlap dest.
Description
stpcpy differs from strcpy in a key way: the return value is a pointer to the new NUL-terminating character in dest. (For strcpy, the return value is a pointer to the start of dest.) This interface is considered unsafe as it doesn’t perform bounds checking of the inputs. As such, it’s not recommended for usage. Instead, its definition is provided in case the compiler lowers other libcalls to stpcpy.
- int strcmp(const char *cs, const char *ct)
Compare two strings
Parameters
const char *cs: One string
const char *ct: Another string
- int strncmp(const char *cs, const char *ct, size_t count)
Compare two length-limited strings
Parameters
const char *cs: One string
const char *ct: Another string
size_t count: The maximum number of bytes to compare
- char *strchr(const char *s, int c)
Find the first occurrence of a character in a string
Parameters
const char *s: The string to be searched
int c: The character to search for
Description
Note that the NUL-terminator is considered part of the string, and can be searched for.
- char *strchrnul(const char *s, int c)
Find and return a character in a string, or end of string
Parameters
const char *s: The string to be searched
int c: The character to search for
Description
Returns a pointer to the first occurrence of ‘c’ in s. If c is not found, then returns a pointer to the null byte at the end of s.
- char *strrchr(const char *s, int c)
Find the last occurrence of a character in a string
Parameters
const char *s: The string to be searched
int c: The character to search for
- char *strnchr(const char *s, size_t count, int c)
Find a character in a length limited string
Parameters
const char *s: The string to be searched
size_t count: The number of characters to be searched
int c: The character to search for
Description
Note that the NUL-terminator is considered part of the string, and can be searched for.
- size_t strspn(const char *s, const char *accept)
Calculate the length of the initial substring of s which only contains characters in accept
Parameters
const char *s: The string to be searched
const char *accept: The string to search for
- size_t strcspn(const char *s, const char *reject)
Calculate the length of the initial substring of s which does not contain characters in reject
Parameters
const char *s: The string to be searched
const char *reject: The string to avoid
- char *strpbrk(const char *cs, const char *ct)
Find the first occurrence of a set of characters
Parameters
const char *cs: The string to be searched
const char *ct: The characters to search for
- char *strsep(char **s, const char *ct)
Split a string into tokens
Parameters
char **s: The string to be searched
const char *ct: The characters to search for
Description
strsep() updates s to point after the token, ready for the next call.
It returns empty tokens, too, behaving exactly like the libc function of that name. In fact, it was stolen from glibc2 and de-fancy-fied. Same semantics, slimmer shape. ;)
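Returning empty tokens is what distinguishes strsep() from strtok()-style tokenizers, which skip runs of delimiters. sketch_strsep() below is a hypothetical userspace version following the description above; it splits the string in place with strpbrk() and sets *s to NULL after the last token.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strsep()-style tokenizer: modifies the string in place. */
static char *sketch_strsep(char **s, const char *ct)
{
	char *sbegin = *s;
	char *end;

	if (sbegin == NULL)
		return NULL;		/* previous call returned the last token */
	end = strpbrk(sbegin, ct);
	if (end)
		*end++ = '\0';		/* terminate the token, step past the delimiter */
	*s = end;			/* NULL once no delimiter remains */
	return sbegin;
}
```

For the input "a,,b" with delimiter ",", successive calls yield "a", an empty string, "b", and then NULL.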
- void*memset(void*s,intc,size_tcount)¶
Fill a region of memory with the given value
Parameters
void*sPointer to the start of the area.
intcThe byte to fill the area with
size_tcountThe size of the area.
Description
Do not usememset() to access IO space, usememset_io() instead.
- void*memset16(uint16_t*s,uint16_tv,size_tcount)¶
Fill a memory area with a uint16_t
Parameters
uint16_t*sPointer to the start of the area.
uint16_tvThe value to fill the area with
size_tcountThe number of values to store
Description
Differs frommemset() in that it fills with a uint16_t insteadof a byte. Remember thatcount is the number of uint16_ts tostore, not the number of bytes.
- void*memset32(uint32_t*s,uint32_tv,size_tcount)¶
Fill a memory area with a uint32_t
Parameters
uint32_t*sPointer to the start of the area.
uint32_tvThe value to fill the area with
size_tcountThe number of values to store
Description
Differs frommemset() in that it fills with a uint32_t insteadof a byte. Remember thatcount is the number of uint32_ts tostore, not the number of bytes.
- void *memset64(uint64_t *s, uint64_t v, size_t count)
Fill a memory area with a uint64_t
Parameters
uint64_t *s: Pointer to the start of the area.
uint64_t v: The value to fill the area with
size_t count: The number of values to store
Description
Differs from memset() in that it fills with a uint64_t instead of a byte. Remember that count is the number of uint64_ts to store, not the number of bytes.
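Because count is an element count rather than a byte count, a short user-space model of memset32() can make the difference visible. demo_memset32() is a hypothetical name and the loop only sketches the documented behaviour; it is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of memset32(): store count copies of v. */
static uint32_t *demo_memset32(uint32_t *s, uint32_t v, size_t count)
{
	uint32_t *p = s;

	assert(s != NULL || count == 0);
	while (count--)		/* count is in elements, not bytes */
		*p++ = v;
	return s;
}
```

Filling four elements of a uint32_t row[8] writes 16 bytes; passing sizeof(row) (32) instead of 4 would ask for 32 elements and overrun the 8-element array.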
- void *memcpy(void *dest, const void *src, size_t count)
Copy one area of memory to another
Parameters
void *dest: Where to copy to
const void *src: Where to copy from
size_t count: The size of the area.
Description
You should not use this function to access IO space; use memcpy_toio() or memcpy_fromio() instead.
- void *memmove(void *dest, const void *src, size_t count)
Copy one area of memory to another
Parameters
void *dest: Where to copy to
const void *src: Where to copy from
size_t count: The size of the area.
Description
Unlike memcpy(), memmove() copes with overlapping areas.
- __visible int memcmp(const void *cs, const void *ct, size_t count)
Compare two areas of memory
Parameters
const void *cs: One area of memory
const void *ct: Another area of memory
size_t count: The size of the area.
- int bcmp(const void *a, const void *b, size_t len)
returns 0 if and only if the buffers have identical contents.
Parameters
const void *a: pointer to first buffer.
const void *b: pointer to second buffer.
size_t len: size of buffers.
Description
The sign or magnitude of a non-zero return value has no particular meaning, and architectures may implement their own more efficient bcmp(). So while this particular implementation is a simple (tail) call to memcmp, do not rely on anything but whether the return value is zero or non-zero.
- void *memscan(void *addr, int c, size_t size)
Find a character in an area of memory.
Parameters
void *addr: The memory area
int c: The byte to search for
size_t size: The size of the area.
Description
Returns the address of the first occurrence of c, or 1 byte past the area if c is not found.
- char *strstr(const char *s1, const char *s2)
Find the first substring in a NUL terminated string
Parameters
const char *s1: The string to be searched
const char *s2: The string to search for
- char *strnstr(const char *s1, const char *s2, size_t len)
Find the first substring in a length-limited string
Parameters
const char *s1: The string to be searched
const char *s2: The string to search for
size_t len: the maximum number of characters to search
- void *memchr(const void *s, int c, size_t n)
Find a character in an area of memory.
Parameters
const void *s: The memory area
int c: The byte to search for
size_t n: The size of the area.
Description
Returns the address of the first occurrence of c, or NULL if c is not found.
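memscan() and memchr() differ only in how they report a miss. The user-space models below (demo_memscan() and demo_memchr() are hypothetical names, not the kernel implementations) make the contrast concrete: a memscan() caller compares the result against addr + size, while a memchr() caller tests for NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of memscan(): a miss returns addr + size. */
static void *demo_memscan(void *addr, int c, size_t size)
{
	unsigned char *p = addr;

	assert(addr != NULL || size == 0);
	while (size) {
		if (*p == (unsigned char)c)
			return p;
		p++;
		size--;
	}
	return p;			/* one byte past the area */
}

/* Hypothetical model of memchr(): a miss returns NULL. */
static void *demo_memchr(const void *s, int c, size_t n)
{
	const unsigned char *p = s;

	for (; n; n--, p++)
		if (*p == (unsigned char)c)
			return (void *)p;
	return NULL;
}
```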
- void *memchr_inv(const void *start, int c, size_t bytes)
Find an unmatching character in an area of memory.
Parameters
const void *start: The memory area
int c: Find a character other than c
size_t bytes: The size of the area.
Description
Returns the address of the first character other than c, or NULL if the whole buffer contains just c.
- void *memdup_array_user(const void __user *src, size_t n, size_t size)
duplicate array from user space
Parameters
const void __user *src: source address in user space
size_t n: number of array members to copy
size_t size: size of one array member
Return
an ERR_PTR() on failure. Result is physically contiguous, to be freed by kfree().
- void *vmemdup_array_user(const void __user *src, size_t n, size_t size)
duplicate array from user space
Parameters
const void __user *src: source address in user space
size_t n: number of array members to copy
size_t size: size of one array member
Return
an ERR_PTR() on failure. Result may not be physically contiguous. Use kvfree() to free.
- strscpy
strscpy(dst, src, ...)
Copy a C-string into a sized buffer
Parameters
dst: Where to copy the string to
src: Where to copy the string from
...: Size of destination buffer (optional)
Description
Copy the source string src, or as much of it as fits, into the destination dst buffer. The behavior is undefined if the string buffers overlap. The destination dst buffer is always NUL terminated, unless it’s zero-sized.
The size argument ... is only required when dst is not an array, or when the copy needs to be smaller than sizeof(dst).
Preferred to strncpy() since it always returns a valid string, and doesn’t unnecessarily force the tail of the destination buffer to be zero padded. If padding is desired please use strscpy_pad().
Returns the number of characters copied in dst (not including the trailing NUL) or -E2BIG if size is 0 or the copy from src was truncated.
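The return-value contract can be modelled in user space as below. demo_strscpy() is a hypothetical function that takes the size explicitly (the kernel macro can infer it when dst is an array), and E2BIG is taken from the host's errno.h. It is a sketch of the documented behaviour, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of strscpy() with an explicit size: returns the
 * length copied, or -E2BIG if size is 0 or src had to be truncated.
 */
static long demo_strscpy(char *dst, const char *src, size_t size)
{
	size_t len = 0;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;		/* nothing can be written, not even a NUL */

	while (len < size && src[len] != '\0')
		len++;			/* bounded strlen(): reads at most size bytes */

	if (len == size) {		/* src does not fit: truncate */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);	/* copy including the trailing NUL */
	return (long)len;
}
```

A caller detects truncation by checking for a negative result; even then the destination holds a valid, shortened, NUL-terminated string, and no zero padding is added after it.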
- strscpy_pad
strscpy_pad(dst, src, ...)
Copy a C-string into a sized buffer
Parameters
dst: Where to copy the string to
src: Where to copy the string from
...: Size of destination buffer
Description
Copy the string, or as much of it as fits, into the dest buffer. The behavior is undefined if the string buffers overlap. The destination buffer is always NUL terminated, unless it’s zero-sized.
If the source string is shorter than the destination buffer, the remaining bytes in the buffer will be filled with NUL bytes.
For a full explanation of why you may want to consider using the ‘strscpy’ functions, please see the function docstring for strscpy().
Return
The number of characters copied (not including the trailing NULs) or -E2BIG if the size is 0 or src was truncated.
- bool mem_is_zero(const void *s, size_t n)
Check if an area of memory is all 0’s.
Parameters
const void *s: The memory area
size_t n: The size of the area
Return
True if the area of memory is all 0’s.
- sysfs_match_string
sysfs_match_string(_a, _s)
matches given string in an array
Parameters
_a: array of strings
_s: string to match with
Description
Helper for __sysfs_match_string(). Calculates the size of _a automatically.
- void memzero_explicit(void *s, size_t count)
Fill a region of memory (e.g. sensitive keying data) with 0s.
Parameters
void *s: Pointer to the start of the area.
size_t count: The size of the area.
Note
Usually using memset() is just fine (!), but in cases where clearing out _local_ data at the end of a scope is necessary, memzero_explicit() should be used instead in order to prevent the compiler from optimising away zeroing.
memzero_explicit() doesn’t need an arch-specific version as it just invokes the one of memset() implicitly.
- const char *kbasename(const char *path)
return the last part of a pathname.
Parameters
const char *path: path to extract the filename from.
Return
Pointer to the filename portion inside path. If no ‘/’ exists, returns path unchanged.
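The rule above fits in a few lines of user-space C. demo_kbasename() is a hypothetical name used for illustration; it relies only on the standard strrchr() and returns a pointer into the original string rather than a copy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of kbasename(): the part after the last '/', or path itself. */
static const char *demo_kbasename(const char *path)
{
	const char *tail;

	assert(path != NULL);
	tail = strrchr(path, '/');
	return tail ? tail + 1 : path;
}
```

Under this model a path with a trailing slash, such as "dir/", yields an empty string, since only what follows the last '/' is returned.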
- strtomem_pad
strtomem_pad(dest, src, pad)
Copy NUL-terminated string to non-NUL-terminated buffer
Parameters
dest: Pointer to destination character array (marked as __nonstring)
src: Pointer to NUL-terminated string
pad: Padding character to fill any remaining bytes of dest after copy
Description
This is a replacement for strncpy() uses where the destination is not a NUL-terminated string, but with bounds checking on the source size, and an explicit padding character. If padding is not required, use strtomem().
Note that the size of dest is not an argument, as the length of dest must be discoverable by the compiler.
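The copy-then-pad behaviour can be sketched in user space as follows. demo_strtomem_pad() is hypothetical, and because a plain function cannot see sizeof(dest) the way the kernel macro does, the destination size is passed explicitly here; that parameter is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of strtomem_pad(): copy at most dest_size bytes of
 * src into a fixed-size, non-NUL-terminated field, then fill with pad.
 */
static void demo_strtomem_pad(char *dest, size_t dest_size,
			      const char *src, char pad)
{
	size_t len = 0;

	assert(dest != NULL && src != NULL);
	while (len < dest_size && src[len] != '\0')
		len++;				/* bounded by the field size */
	memcpy(dest, src, len);
	memset(dest + len, pad, dest_size - len);	/* no NUL is added */
}
```

Copying "abc" into a 6-byte field padded with spaces produces the six bytes "abc   "; a longer source is simply cut at the field width.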
- strtomem
strtomem(dest, src)
Copy NUL-terminated string to non-NUL-terminated buffer
Parameters
dest: Pointer to destination character array (marked as __nonstring)
src: Pointer to NUL-terminated string
Description
This is a replacement for strncpy() uses where the destination is not a NUL-terminated string, but with bounds checking on the source size, and without trailing padding. If padding is required, use strtomem_pad().
Note that the size of dest is not an argument, as the length of dest must be discoverable by the compiler.
- memtostr
memtostr(dest, src)
Copy a possibly non-NUL-term string to a NUL-term string
Parameters
dest: Pointer to destination NUL-terminated string
src: Pointer to character array (likely marked as __nonstring)
Description
This is a replacement for strncpy() uses where the source is not a NUL-terminated string.
Note that sizes of dest and src must be known at compile-time.
- memtostr_pad
memtostr_pad(dest, src)
Copy a possibly non-NUL-term string to a NUL-term string with NUL padding in the destination
Parameters
dest: Pointer to destination NUL-terminated string
src: Pointer to character array (likely marked as __nonstring)
Description
This is a replacement for strncpy() uses where the source is not a NUL-terminated string.
Note that sizes of dest and src must be known at compile-time.
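A user-space sketch of the memtostr_pad() direction (fixed-width field in, C string out) is shown below. demo_memtostr_pad() is hypothetical, and both sizes are explicit parameters because a function cannot recover them at compile time as the kernel macro does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of memtostr_pad(): copy a possibly unterminated
 * src field into dest, always NUL-terminating and zero-filling the rest.
 */
static void demo_memtostr_pad(char *dest, size_t dest_size,
			      const char *src, size_t src_size)
{
	size_t len = 0;

	assert(dest != NULL && src != NULL && dest_size > 0);
	/* Stop at a NUL in src, at the end of src, or when dest is full. */
	while (len < src_size && len < dest_size - 1 && src[len] != '\0')
		len++;
	memcpy(dest, src, len);
	memset(dest + len, '\0', dest_size - len);
}
```

A 4-byte field "ABCD" with no terminator becomes the string "ABCD" in an 8-byte buffer, or "AB" in a 3-byte one.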
- memset_after
memset_after(obj, v, member)
Set a value after a struct member to the end of a struct
Parameters
obj: Address of target struct instance
v: Byte value to repeatedly write
member: after which struct member to start writing bytes
Description
This is good for clearing padding following the given member.
- memset_startat
memset_startat(obj, v, member)
Set a value starting at a member to the end of a struct
Parameters
obj: Address of target struct instance
v: Byte value to repeatedly write
member: struct member to start writing at
Description
Note that if there is padding between the prior member and the target member, memset_after() should be used to clear the prior padding.
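Both macros reduce to a memset() over a tail of the struct whose start is computed from a member's offset. The sketch below uses offsetof() with hypothetical helpers (demo_memset_after(), demo_memset_startat()) and a hypothetical struct demo_pkt; in kernel code the equivalent calls would be written memset_after(&p, 0, tag) and memset_startat(&p, 0, payload).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_pkt {		/* hypothetical example struct */
	char tag;		/* likely followed by padding before len */
	int len;
	char payload[6];
};

/* Fill from the end of the member [off, off + size) to the end of obj. */
static void demo_memset_after(void *obj, size_t obj_size, int v,
			      size_t off, size_t size)
{
	assert(off + size <= obj_size);
	memset((char *)obj + off + size, v, obj_size - (off + size));
}

/* Fill from the start of the member at off to the end of obj. */
static void demo_memset_startat(void *obj, size_t obj_size, int v, size_t off)
{
	assert(off <= obj_size);
	memset((char *)obj + off, v, obj_size - off);
}
```

Starting after tag also clears the padding between tag and len, which is the case the memset_startat() note warns about.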
- size_t str_has_prefix(const char *str, const char *prefix)
Test if a string has a given prefix
Parameters
const char *str: The string to test
const char *prefix: The string to see if str starts with
Description
A common way to test a prefix of a string is to do:
    strncmp(str, prefix, sizeof(prefix) - 1)
But this can lead to bugs due to typos, or if prefix is a pointer and not a constant. Instead use str_has_prefix().
Return
strlen(prefix) if str starts with prefix
0 if str does not start with prefix
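Returning strlen(prefix) rather than a plain boolean lets the caller skip past the prefix in one step. The user-space model below (demo_str_has_prefix() is a hypothetical name) shows that idiom; it also avoids the sizeof(prefix) pitfall, since sizeof of a pointer is the pointer's size, not the string length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of str_has_prefix(): strlen(prefix) on a match, else 0. */
static size_t demo_str_has_prefix(const char *str, const char *prefix)
{
	size_t len = strlen(prefix);

	assert(str != NULL);
	return strncmp(str, prefix, len) == 0 ? len : 0;
}
```

For an option string "mode=fast", a match on "mode=" returns 5, so str + 5 points at the value "fast".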
- bool strstarts(const char *str, const char *prefix)
does str start with prefix?
Parameters
const char *str: string to examine
const char *prefix: prefix to look for.
Return
True if str begins with prefix. False in all other cases.
- bool strends(const char *str, const char *suffix)
Check if a string ends with another string.
Parameters
const char *str: NUL-terminated string to check against suffix
const char *suffix: NUL-terminated string defining the suffix to look for in str
Return
True if str ends with suffix. False in all other cases.
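A suffix test only needs the two lengths and one comparison at the right offset. demo_strends() below is a hypothetical user-space model, not the kernel source; note the length check, which keeps the pointer arithmetic inside str when the suffix is longer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of strends(): does str end with suffix? */
static bool demo_strends(const char *str, const char *suffix)
{
	size_t str_len, suffix_len;

	assert(str != NULL && suffix != NULL);
	str_len = strlen(str);
	suffix_len = strlen(suffix);
	/* A suffix longer than str can never match. */
	return str_len >= suffix_len &&
	       strcmp(str + str_len - suffix_len, suffix) == 0;
}
```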
- char *kstrdup(const char *s, gfp_t gfp)
allocate space for and copy an existing string
Parameters
const char *s: the string to duplicate
gfp_t gfp: the GFP mask used in the kmalloc() call when allocating memory
Return
newly allocated copy of s or NULL in case of error
- const char *kstrdup_const(const char *s, gfp_t gfp)
conditionally duplicate an existing const string
Parameters
const char *s: the string to duplicate
gfp_t gfp: the GFP mask used in the kmalloc() call when allocating memory
Note
Strings allocated by kstrdup_const() should be freed by kfree_const() and must not be passed to krealloc().
Return
the source string if it is in the .rodata section; otherwise falls back to kstrdup().
- char *kstrndup(const char *s, size_t max, gfp_t gfp)
allocate space for and copy an existing string
Parameters
const char *s: the string to duplicate
size_t max: read at most max chars from s
gfp_t gfp: the GFP mask used in the kmalloc() call when allocating memory
Note
Use kmemdup_nul() instead if the size is known exactly.
Return
newly allocated copy of s or NULL in case of error
- void *kmemdup(const void *src, size_t len, gfp_t gfp)
duplicate region of memory
Parameters
const void *src: memory region to duplicate
size_t len: memory region length
gfp_t gfp: GFP mask to use
Return
newly allocated copy of src or NULL in case of error. The result is physically contiguous; use kfree() to free it.
- char *kmemdup_nul(const char *s, size_t len, gfp_t gfp)
Create a NUL-terminated string from unterminated data
Parameters
const char *s: The data to stringify
size_t len: The size of the data
gfp_t gfp: the GFP mask used in the kmalloc() call when allocating memory
Return
newly allocated copy of s with NUL-termination or NULL in case of error
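The difference from kstrndup() is that exactly len bytes are copied and a terminator is appended, whatever the data contains. In the user-space sketch below, demo_kmemdup_nul() is a hypothetical name and malloc() stands in for kmalloc() with a GFP mask, so the result is released with free() here rather than kfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of kmemdup_nul(): copy len bytes, then add a NUL. */
static char *demo_kmemdup_nul(const char *s, size_t len)
{
	char *buf;

	if (!s)
		return NULL;
	assert(len != (size_t)-1);	/* len + 1 must not wrap around */
	buf = malloc(len + 1);		/* room for the terminating NUL */
	if (!buf)
		return NULL;
	memcpy(buf, s, len);
	buf[len] = '\0';
	return buf;
}
```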
- void *memdup_user(const void __user *src, size_t len)
duplicate memory region from user space
Parameters
const void __user *src: source address in user space
size_t len: number of bytes to copy
Return
an ERR_PTR() on failure. Result is physically contiguous, to be freed by kfree().
- void *vmemdup_user(const void __user *src, size_t len)
duplicate memory region from user space
Parameters
const void __user *src: source address in user space
size_t len: number of bytes to copy
Return
an ERR_PTR() on failure. Result may not be physically contiguous. Use kvfree() to free.
- char *strndup_user(const char __user *s, long n)
duplicate an existing string from user space
Parameters
const char __user *s: The string to duplicate
long n: Maximum number of bytes to copy, including the trailing NUL.
Return
newly allocated copy of s or an ERR_PTR() in case of error
- void *memdup_user_nul(const void __user *src, size_t len)
duplicate memory region from user space and NUL-terminate
Parameters
const void __user *src: source address in user space
size_t len: number of bytes to copy
Return
an ERR_PTR() on failure.
Basic Kernel Library Functions
The Linux kernel provides more basic utility functions.
Bit Operations
- void set_bit(long nr, volatile unsigned long *addr)
Atomically set a bit in memory
Parameters
long nr: the bit to set
volatile unsigned long *addr: the address to start counting from
Description
This is a relaxed atomic operation (no implied memory barriers).
Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.
- void clear_bit(long nr, volatile unsigned long *addr)
Clears a bit in memory
Parameters
long nr: Bit to clear
volatile unsigned long *addr: Address to start counting from
Description
This is a relaxed atomic operation (no implied memory barriers).
- void change_bit(long nr, volatile unsigned long *addr)
Toggle a bit in memory
Parameters
long nr: Bit to change
volatile unsigned long *addr: Address to start counting from
Description
This is a relaxed atomic operation (no implied memory barriers).
Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.
- bool test_and_set_bit(long nr, volatile unsigned long *addr)
Set a bit and return its old value
Parameters
long nr: Bit to set
volatile unsigned long *addr: Address to count from
Description
This is an atomic fully-ordered operation (implied full memory barrier).
- bool test_and_clear_bit(long nr, volatile unsigned long *addr)
Clear a bit and return its old value
Parameters
long nr: Bit to clear
volatile unsigned long *addr: Address to count from
Description
This is an atomic fully-ordered operation (implied full memory barrier).
- bool test_and_change_bit(long nr, volatile unsigned long *addr)
Change a bit and return its old value
Parameters
long nr: Bit to change
volatile unsigned long *addr: Address to count from
Description
This is an atomic fully-ordered operation (implied full memory barrier).
- void ___set_bit(unsigned long nr, volatile unsigned long *addr)
Set a bit in memory
Parameters
unsigned long nr: the bit to set
volatile unsigned long *addr: the address to start counting from
Description
Unlike set_bit(), this function is non-atomic. If it is called on the same region of memory concurrently, the effect may be that only one operation succeeds.
- void ___clear_bit(unsigned long nr, volatile unsigned long *addr)
Clears a bit in memory
Parameters
unsigned long nr: the bit to clear
volatile unsigned long *addr: the address to start counting from
Description
Unlike clear_bit(), this function is non-atomic. If it is called on the same region of memory concurrently, the effect may be that only one operation succeeds.
- void ___change_bit(unsigned long nr, volatile unsigned long *addr)
Toggle a bit in memory
Parameters
unsigned long nr: the bit to change
volatile unsigned long *addr: the address to start counting from
Description
Unlike change_bit(), this function is non-atomic. If it is called on the same region of memory concurrently, the effect may be that only one operation succeeds.
- bool ___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
Set a bit and return its old value
Parameters
unsigned long nr: Bit to set
volatile unsigned long *addr: Address to count from
Description
This operation is non-atomic. If two instances of this operation race, one can appear to succeed but actually fail.
- bool ___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
Clear a bit and return its old value
Parameters
unsigned long nr: Bit to clear
volatile unsigned long *addr: Address to count from
Description
This operation is non-atomic. If two instances of this operation race, one can appear to succeed but actually fail.
- bool ___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
Change a bit and return its old value
Parameters
unsigned long nr: Bit to change
volatile unsigned long *addr: Address to count from
Description
This operation is non-atomic. If two instances of this operation race, one can appear to succeed but actually fail.
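The difference between the atomic and non-atomic test-and-set variants can be sketched with C11 atomics in user space. The DEMO_ macros and both demo_ functions are hypothetical; atomic_fetch_or() with its default seq_cst ordering stands in for the fully ordered atomic form, and a plain load/store pair stands in for the non-atomic variants documented above.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BIT_WORD(nr)  ((nr) / DEMO_BITS_PER_LONG)
#define DEMO_BIT_MASK(nr)  (1UL << ((nr) % DEMO_BITS_PER_LONG))

/* Atomic and fully ordered, in the spirit of test_and_set_bit(). */
static bool demo_test_and_set_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long old = atomic_fetch_or(&addr[DEMO_BIT_WORD(nr)],
					    DEMO_BIT_MASK(nr));

	return old & DEMO_BIT_MASK(nr);
}

/* Non-atomic: a plain read-modify-write of the containing word. */
static bool demo_nonatomic_test_and_set_bit(unsigned long nr, unsigned long *addr)
{
	unsigned long *p = &addr[DEMO_BIT_WORD(nr)];
	unsigned long old = *p;

	*p = old | DEMO_BIT_MASK(nr);	/* racy if another thread writes *p too */
	return old & DEMO_BIT_MASK(nr);
}
```

The non-atomic form is only safe when the caller already excludes concurrent writers to the same word, for example under a lock; two racing callers could both read the old value as 0 and both "win".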
- bool _test_bit(unsigned long nr, volatile const unsigned long *addr)
Determine whether a bit is set
Parameters
unsigned long nr: bit number to test
const volatile unsigned long *addr: Address to start counting from
- bool _test_bit_acquire(unsigned long nr, volatile const unsigned long *addr)
Determine, with acquire semantics, whether a bit is set
Parameters
unsigned long nr: bit number to test
const volatile unsigned long *addr: Address to start counting from
- void clear_bit_unlock(long nr, volatile unsigned long *addr)
Clear a bit in memory, for unlock
Parameters
long nr: the bit to clear
volatile unsigned long *addr: the address to start counting from
Description
This operation is atomic and provides release barrier semantics.
- void __clear_bit_unlock(long nr, volatile unsigned long *addr)
Clears a bit in memory
Parameters
long nr: Bit to clear
volatile unsigned long *addr: Address to start counting from
Description
This is a non-atomic operation but implies a release barrier before the memory operation. It can be used for an unlock if no other CPUs can concurrently modify other bits in the word.
- bool test_and_set_bit_lock(long nr, volatile unsigned long *addr)
Set a bit and return its old value, for lock
Parameters
long nr: Bit to set
volatile unsigned long *addr: Address to count from
Description
This operation is atomic and provides acquire barrier semantics if the returned value is 0. It can be used to implement bit locks.
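A bit lock pairs an acquire on the successful test-and-set with a release on the clear, as the two descriptions above state. The user-space sketch below models that pairing with C11 explicit memory orders; demo_trylock_bit(), demo_lock_bit() and demo_unlock_bit() are hypothetical names, and the busy-wait loop is a simplification.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Like test_and_set_bit_lock(): true means the bit was already set (busy). */
static bool demo_trylock_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
	_Atomic unsigned long *p = addr + nr / DEMO_BITS_PER_LONG;

	/* Acquire: the critical section cannot be hoisted above this. */
	return atomic_fetch_or_explicit(p, mask, memory_order_acquire) & mask;
}

/* Like clear_bit_unlock(): release publishes the critical section's writes. */
static void demo_unlock_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);

	atomic_fetch_and_explicit(addr + nr / DEMO_BITS_PER_LONG, ~mask,
				  memory_order_release);
}

/* Spin until the lock bit is acquired (hypothetical helper). */
static void demo_lock_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	while (demo_trylock_bit(nr, addr))
		;	/* busy-wait; a real lock would back off or sleep */
}
```

Only the bit named by nr is the lock; other bits of the same word remain usable as ordinary atomic flags.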
- bool xor_unlock_is_negative_byte(unsigned long mask, volatile unsigned long *addr)
XOR a single byte in memory and test if it is negative, for unlock.
Parameters
unsigned long mask: Change the bits which are set in this mask.
volatile unsigned long *addr: The address of the word containing the byte to change.
Description
Changes some of bits 0-6 in the word pointed to by addr. This operation is atomic and provides release barrier semantics. Used to optimise some folio operations which are commonly paired with an unlock or end of writeback. Bit 7 is used as PG_waiters to indicate whether anybody is waiting for the unlock.
Return
Whether the top bit of the byte is set.
Bitmap Operations
Bitmaps provide an array of bits, implemented using an array of unsigned longs. The number of valid bits in a given bitmap does _not_ need to be an exact multiple of BITS_PER_LONG.
The possible unused bits in the last, partially used word of a bitmap are ‘don’t care’. The implementation makes no particular effort to keep them zero. It ensures that their value will not affect the results of any operation. The bitmap operations that return Boolean (bitmap_empty, for example) or scalar (bitmap_weight, for example) results carefully filter out these unused bits from impacting their results.
The byte ordering of bitmaps is more natural on little-endian architectures. See the big-endian headers include/asm-ppc64/bitops.h and include/asm-s390/bitops.h for the best explanations of this ordering.
The DECLARE_BITMAP(name, bits) macro, in linux/types.h, can be used to declare an array named ‘name’ of just enough unsigned longs to contain all bit positions from 0 to ‘bits’ - 1.
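To make the word layout and the ‘don’t care’ tail bits concrete, here is a user-space sketch. DEMO_DECLARE_BITMAP, DEMO_BITS_TO_LONGS, demo_set_bit() and demo_bitmap_weight() are hypothetical stand-ins for DECLARE_BITMAP(), BITS_TO_LONGS(), set_bit() and bitmap_weight(); only the idea of masking the partially used last word is illustrated, not the kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold nbits bits. */
#define DEMO_BITS_TO_LONGS(nbits) \
	(((nbits) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
#define DEMO_DECLARE_BITMAP(name, bits) \
	unsigned long name[DEMO_BITS_TO_LONGS(bits)]

static void demo_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

/* Mask of the valid low bits in the last word of an nbits-bit bitmap. */
static unsigned long demo_last_word_mask(unsigned int nbits)
{
	return ~0UL >> (-nbits & (DEMO_BITS_PER_LONG - 1));
}

static unsigned int demo_popcount(unsigned long x)
{
	unsigned int n = 0;

	for (; x; x &= x - 1)		/* clear the lowest set bit */
		n++;
	return n;
}

/* Count set bits in [0, nbits), ignoring the don't-care tail bits. */
static unsigned int demo_bitmap_weight(const unsigned long *map, unsigned int nbits)
{
	size_t full = nbits / DEMO_BITS_PER_LONG;
	unsigned int w = 0;

	assert(map != NULL);
	for (size_t k = 0; k < full; k++)
		w += demo_popcount(map[k]);
	if (nbits % DEMO_BITS_PER_LONG)	/* partially used last word */
		w += demo_popcount(map[full] & demo_last_word_mask(nbits));
	return w;
}
```

With a 70-bit bitmap, garbage in bits 70 and above of the last word does not change the weight, which is the filtering behaviour described above.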
The available bitmap operations and their rough meaning in the case that the bitmap is a single unsigned long are thus:
The generated code is more efficient when nbits is known at compile-time and at most BITS_PER_LONG.
    bitmap_zero(dst, nbits)                           *dst = 0UL
    bitmap_fill(dst, nbits)                           *dst = ~0UL
    bitmap_copy(dst, src, nbits)                      *dst = *src
    bitmap_and(dst, src1, src2, nbits)                *dst = *src1 & *src2
    bitmap_or(dst, src1, src2, nbits)                 *dst = *src1 | *src2
    bitmap_weighted_or(dst, src1, src2, nbits)        *dst = *src1 | *src2. Returns Hamming Weight of dst
    bitmap_xor(dst, src1, src2, nbits)                *dst = *src1 ^ *src2
    bitmap_andnot(dst, src1, src2, nbits)             *dst = *src1 & ~(*src2)
    bitmap_complement(dst, src, nbits)                *dst = ~(*src)
    bitmap_equal(src1, src2, nbits)                   Are *src1 and *src2 equal?
    bitmap_intersects(src1, src2, nbits)              Do *src1 and *src2 overlap?
    bitmap_subset(src1, src2, nbits)                  Is *src1 a subset of *src2?
    bitmap_empty(src, nbits)                          Are all bits zero in *src?
    bitmap_full(src, nbits)                           Are all bits set in *src?
    bitmap_weight(src, nbits)                         Hamming Weight: number of set bits
    bitmap_weight_and(src1, src2, nbits)              Hamming Weight of and'ed bitmap
    bitmap_weight_andnot(src1, src2, nbits)           Hamming Weight of andnot'ed bitmap
    bitmap_set(dst, pos, nbits)                       Set specified bit area
    bitmap_clear(dst, pos, nbits)                     Clear specified bit area
    bitmap_find_next_zero_area(buf, len, pos, n, mask)  Find bit free area
    bitmap_find_next_zero_area_off(buf, len, pos, n, mask, mask_off)  as above
    bitmap_shift_right(dst, src, n, nbits)            *dst = *src >> n
    bitmap_shift_left(dst, src, n, nbits)             *dst = *src << n
    bitmap_cut(dst, src, first, n, nbits)             Cut n bits from first, copy rest
    bitmap_replace(dst, old, new, mask, nbits)        *dst = (*old & ~(*mask)) | (*new & *mask)
    bitmap_scatter(dst, src, mask, nbits)             *dst = map(dense, sparse)(src)
    bitmap_gather(dst, src, mask, nbits)              *dst = map(sparse, dense)(src)
    bitmap_remap(dst, src, old, new, nbits)           *dst = map(old, new)(src)
    bitmap_bitremap(oldbit, old, new, nbits)          newbit = map(old, new)(oldbit)
    bitmap_onto(dst, orig, relmap, nbits)             *dst = orig relative to relmap
    bitmap_fold(dst, orig, sz, nbits)                 dst bits = orig bits mod sz
    bitmap_parse(buf, buflen, dst, nbits)             Parse bitmap dst from kernel buf
    bitmap_parse_user(ubuf, ulen, dst, nbits)         Parse bitmap dst from user buf
    bitmap_parselist(buf, dst, nbits)                 Parse bitmap dst from kernel buf
    bitmap_parselist_user(buf, dst, nbits)            Parse bitmap dst from user buf
    bitmap_find_free_region(bitmap, bits, order)      Find and allocate bit region
    bitmap_release_region(bitmap, pos, order)         Free specified bit region
    bitmap_allocate_region(bitmap, pos, order)        Allocate specified bit region
    bitmap_from_arr32(dst, buf, nbits)                Copy nbits from u32[] buf to dst
    bitmap_from_arr64(dst, buf, nbits)                Copy nbits from u64[] buf to dst
    bitmap_to_arr32(buf, src, nbits)                  Copy nbits from src to u32[] buf
    bitmap_to_arr64(buf, src, nbits)                  Copy nbits from src to u64[] buf
    bitmap_get_value8(map, start)                     Get 8bit value from map at start
    bitmap_set_value8(map, value, start)              Set 8bit value to map at start
    bitmap_read(map, start, nbits)                    Read an nbits-sized value from map at start
    bitmap_write(map, value, start, nbits)            Write an nbits-sized value to map at start
Note that bitmap_zero() and bitmap_fill() operate over the region of unsigned longs; that is, the bits past the end of the bitmap up to the unsigned long boundary will be zeroed or filled as well. Consider using bitmap_clear() or bitmap_set() to zero or fill explicitly.
Also the following operations in asm/bitops.h apply to bitmaps:
    set_bit(bit, addr)                            *addr |= bit
    clear_bit(bit, addr)                          *addr &= ~bit
    change_bit(bit, addr)                         *addr ^= bit
    test_bit(bit, addr)                           Is bit set in *addr?
    test_and_set_bit(bit, addr)                   Set bit and return old value
    test_and_clear_bit(bit, addr)                 Clear bit and return old value
    test_and_change_bit(bit, addr)                Change bit and return old value
    find_first_zero_bit(addr, nbits)              Position first zero bit in *addr
    find_first_bit(addr, nbits)                   Position first set bit in *addr
    find_next_zero_bit(addr, nbits, bit)          Position next zero bit in *addr >= bit
    find_next_bit(addr, nbits, bit)               Position next set bit in *addr >= bit
    find_next_and_bit(addr1, addr2, nbits, bit)   Same as find_next_bit, but in (*addr1 & *addr2)
- void __bitmap_shift_right(unsigned long *dst, const unsigned long *src, unsigned shift, unsigned nbits)
logical right shift of the bits in a bitmap
Parameters
unsigned long *dst: destination bitmap
const unsigned long *src: source bitmap
unsigned shift: shift by this many bits
unsigned nbits: bitmap size, in bits
Description
Shifting right (dividing) means moving bits in the MS -> LS bit direction. Zeros are fed into the vacated MS positions and the LS bits shifted off the bottom are lost.
- void __bitmap_shift_left(unsigned long *dst, const unsigned long *src, unsigned int shift, unsigned int nbits)
logical left shift of the bits in a bitmap
Parameters
unsigned long *dst: destination bitmap
const unsigned long *src: source bitmap
unsigned int shift: shift by this many bits
unsigned int nbits: bitmap size, in bits
Description
Shifting left (multiplying) means moving bits in the LS -> MS direction. Zeros are fed into the vacated LS bit positions and those MS bits shifted off the top are lost.
- void bitmap_cut(unsigned long *dst, const unsigned long *src, unsigned int first, unsigned int cut, unsigned int nbits)
remove bit region from bitmap and right shift remaining bits
Parameters
unsigned long *dst: destination bitmap, might overlap with src
const unsigned long *src: source bitmap
unsigned int first: start bit of region to be removed
unsigned int cut: number of bits to remove
unsigned int nbits: bitmap size, in bits
Description
Set the n-th bit of dst iff the n-th bit of src is set and n is less than first, or the m-th bit of src is set for any m such that first <= n < nbits, and m = n + cut.
In pictures, example for a big-endian 32-bit architecture:
The src bitmap is:
    31                                   63
    |                                    |
    10000000 11000001 11110010 00010101  10000000 11000001 01110010 00010101
                    |  |              |                                    |
                   16  14             0                                   32
If cut is 3, and first is 14, bits 14-16 in src are cut and dst is:
    31                                   63
    |                                    |
    10110000 00011000 00110010 00010101  00010000 00011000 00101110 01000010
                       |              |                                    |
                       14 (bit 17     0                                   32
                           from @src)
Note that dst and src might overlap partially or entirely.
This is implemented in the obvious way, with a shift and carry step for each moved bit. Optimisation is left as an exercise for the compiler.
- unsigned long bitmap_find_next_zero_area_off(unsigned long *map, unsigned long size, unsigned long start, unsigned int nr, unsigned long align_mask, unsigned long align_offset)
find a contiguous aligned zero area
Parameters
unsigned long *map: The address to base the search on
unsigned long size: The bitmap size in bits
unsigned long start: The bit number to start searching at
unsigned int nr: The number of zeroed bits we’re looking for
unsigned long align_mask: Alignment mask for zero area
unsigned long align_offset: Alignment offset for zero area.
Description
The align_mask should be one less than a power of 2; the effect is that the bit offset of all zero areas this function finds plus align_offset is a multiple of that power of 2.
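The alignment rule can be illustrated with a small user-space search. demo_find_zero_area_off() is hypothetical: it skips to a zero bit, rounds the candidate so that index + align_offset is a multiple of align_mask + 1, and restarts past any set bit inside the window. This sketch signals failure by returning a value greater than size and assumes nr > 0.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static int demo_test_bit(const unsigned long *map, unsigned long n)
{
	return (map[n / DEMO_BITS_PER_LONG] >> (n % DEMO_BITS_PER_LONG)) & 1UL;
}

static unsigned long demo_find_zero_area_off(const unsigned long *map,
		unsigned long size, unsigned long start, unsigned int nr,
		unsigned long align_mask, unsigned long align_offset)
{
	assert(nr > 0);
	for (;;) {
		unsigned long index = start, end, i;

		/* Skip set bits to reach the next zero bit. */
		while (index < size && demo_test_bit(map, index))
			index++;

		/* Round up so that index + align_offset is aligned. */
		index = ((index + align_offset + align_mask) & ~align_mask) - align_offset;

		end = index + nr;
		if (end > size)
			return end;	/* failure: a value greater than size */

		/* Is [index, end) entirely zero? */
		for (i = index; i < end && !demo_test_bit(map, i); i++)
			;
		if (i == end)
			return index;
		start = i + 1;		/* retry just past the blocking set bit */
	}
}
```

With bits 0, 1, 2 and 5 set, a 2-bit area aligned to 4 (align_mask = 3) is first tried at 4, rejected because bit 5 is set, and found at 8; with align_offset = 1 the candidates become 3, 7, 11 and the search succeeds at 3.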
- void bitmap_remap(unsigned long *dst, const unsigned long *src, const unsigned long *old, const unsigned long *new, unsigned int nbits)
Apply map defined by a pair of bitmaps to another bitmap
Parameters
unsigned long *dst: remapped result
const unsigned long *src: subset to be remapped
const unsigned long *old: defines domain of map
const unsigned long *new: defines range of map
unsigned int nbits: number of bits in each of these bitmaps
Description
Let old and new define a mapping of bit positions, such that whatever position is held by the n-th set bit in old is mapped to the n-th set bit in new. In the more general case, allowing for the possibility that the weight ‘w’ of new is less than the weight of old, map the position of the n-th set bit in old to the position of the m-th set bit in new, where m == n % w.
If either of the old and new bitmaps are empty, or if src and dst point to the same location, then this routine copies src to dst.
The positions of unset bits in old are mapped to themselves (the identity map).
Apply the above specified mapping to src, placing the result in dst, clearing any bits previously set in dst.
For example, let’s say that old has bits 4 through 7 set, and new has bits 12 through 15 set. This defines the mapping of bit position 4 to 12, 5 to 13, 6 to 14 and 7 to 15, and of all other bit positions unchanged. So if, say, src comes into this routine with bits 1, 5 and 7 set, then dst should leave with bits 1, 13 and 15 set.
- int bitmap_bitremap(int oldbit, const unsigned long *old, const unsigned long *new, int bits)
Apply map defined by a pair of bitmaps to a single bit
Parameters
int oldbit: bit position to be mapped
const unsigned long *old: defines domain of map
const unsigned long *new: defines range of map
int bits: number of bits in each of these bitmaps
Description
Let old and new define a mapping of bit positions, such that whatever position is held by the n-th set bit in old is mapped to the n-th set bit in new. In the more general case, allowing for the possibility that the weight ‘w’ of new is less than the weight of old, map the position of the n-th set bit in old to the position of the m-th set bit in new, where m == n % w.
The positions of unset bits in old are mapped to themselves (the identity map).
Apply the above specified mapping to bit position oldbit, returning the new bit position.
For example, let’s say that old has bits 4 through 7 set, and new has bits 12 through 15 set. This defines the mapping of bit position 4 to 12, 5 to 13, 6 to 14 and 7 to 15, and of all other bit positions unchanged. So if, say, oldbit is 5, then this routine returns 13.
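The mapping rule shared by bitmap_remap() and bitmap_bitremap() can be sketched with three small primitives: the ordinal of a set bit, the position of the n-th set bit, and the weight. All demo_ names are hypothetical, demo_remap_word() handles only single-word bitmaps for brevity, and the parameter is called new_map to keep the sketch readable; none of this is the kernel implementation.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static int demo_test(const unsigned long *map, unsigned int n)
{
	return (map[n / DEMO_BITS_PER_LONG] >> (n % DEMO_BITS_PER_LONG)) & 1UL;
}

/* Which set bit (0-based) is bit pos? -1 if pos is clear or out of range. */
static int demo_pos_to_ord(const unsigned long *map, unsigned int pos, unsigned int nbits)
{
	int ord = 0;

	if (pos >= nbits || !demo_test(map, pos))
		return -1;
	for (unsigned int i = 0; i < pos; i++)
		ord += demo_test(map, i);
	return ord;
}

/* Position of the ord-th (0-based) set bit, or nbits if there is none. */
static unsigned int demo_nth_set_bit(const unsigned long *map, unsigned int ord, unsigned int nbits)
{
	for (unsigned int i = 0; i < nbits; i++)
		if (demo_test(map, i) && ord-- == 0)
			return i;
	return nbits;
}

static unsigned int demo_weight(const unsigned long *map, unsigned int nbits)
{
	unsigned int w = 0;

	for (unsigned int i = 0; i < nbits; i++)
		w += demo_test(map, i);
	return w;
}

/* Map one bit position through old -> new_map (n-th set bit -> (n % w)-th). */
static unsigned int demo_bitremap(unsigned int oldbit, const unsigned long *old,
				  const unsigned long *new_map, unsigned int nbits)
{
	unsigned int w = demo_weight(new_map, nbits);
	int n = demo_pos_to_ord(old, oldbit, nbits);

	if (n < 0 || w == 0)
		return oldbit;		/* identity for unset bits or an empty map */
	return demo_nth_set_bit(new_map, (unsigned int)n % w, nbits);
}

/* Apply the mapping to every set bit of a single-word src. */
static unsigned long demo_remap_word(unsigned long src, unsigned long old,
				     unsigned long new_map, unsigned int nbits)
{
	unsigned long dst = 0;

	assert(nbits <= DEMO_BITS_PER_LONG);
	for (unsigned int i = 0; i < nbits; i++)
		if ((src >> i) & 1UL)
			dst |= 1UL << demo_bitremap(i, &old, &new_map, nbits);
	return dst;
}
```

Running the documented example through this model (old = bits 4-7, new = bits 12-15, src = bits 1, 5 and 7) gives bits 1, 13 and 15, and mapping bit 5 alone gives 13.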
- void bitmap_from_arr32(unsigned long *bitmap, const u32 *buf, unsigned int nbits)
copy the contents of u32 array of bits to bitmap
Parameters
unsigned long *bitmap: array of unsigned longs, the destination bitmap
const u32 *buf: array of u32 (in host byte order), the source bitmap
unsigned int nbits: number of bits in bitmap
- void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, unsigned int nbits)
copy the contents of bitmap to a u32 array of bits
Parameters
u32 *buf: array of u32 (in host byte order), the dest bitmap
const unsigned long *bitmap: array of unsigned longs, the source bitmap
unsigned int nbits: number of bits in bitmap
- void bitmap_from_arr64(unsigned long *bitmap, const u64 *buf, unsigned int nbits)
copy the contents of u64 array of bits to bitmap
Parameters
unsigned long *bitmap: array of unsigned longs, the destination bitmap
const u64 *buf: array of u64 (in host byte order), the source bitmap
unsigned int nbits: number of bits in bitmap
- void bitmap_to_arr64(u64 *buf, const unsigned long *bitmap, unsigned int nbits)
copy the contents of bitmap to a u64 array of bits
Parameters
u64 *buf: array of u64 (in host byte order), the dest bitmap
const unsigned long *bitmap: array of unsigned longs, the source bitmap
unsigned int nbits: number of bits in bitmap
- int bitmap_pos_to_ord(const unsigned long *buf, unsigned int pos, unsigned int nbits)
find ordinal of set bit at given position in bitmap
Parameters
const unsigned long *buf: pointer to a bitmap
unsigned int pos: a bit position in buf (0 <= pos < nbits)
unsigned int nbits: number of valid bit positions in buf
Description
Map the bit at position pos in buf (of length nbits) to the ordinal of which set bit it is. If it is not set or if pos is not a valid bit position, map to -1.
If, for example, just bits 4 through 7 are set in buf, then pos values 4 through 7 will get mapped to 0 through 3, respectively, and other pos values will get mapped to -1. When pos value 7 gets mapped to (returns) ord value 3 in this example, that means that bit 7 is the 3rd (starting with 0th) set bit in buf.
The bit positions 0 through nbits - 1 are valid positions in buf.
- void bitmap_onto(unsigned long *dst, const unsigned long *orig, const unsigned long *relmap, unsigned int bits)
translate one bitmap relative to another
Parameters
unsigned long *dst: resulting translated bitmap
const unsigned long *orig: original untranslated bitmap
const unsigned long *relmap: bitmap relative to which translated
unsigned int bits: number of bits in each of these bitmaps
Description
Set the n-th bit of dst iff there exists some m such that the n-th bit of relmap is set, the m-th bit of orig is set, and the n-th bit of relmap is also the m-th _set_ bit of relmap. (If you understood the previous sentence the first time you read it, you’re overqualified for your current job.)
In other words, orig is mapped onto (surjectively) dst, using the map { <n, m> | the n-th bit of relmap is the m-th set bit of relmap }.
Any set bits in orig at or above bit number W, where W is the weight of (number of set bits in) relmap, are mapped nowhere. In particular, if for all bits m set in orig, m >= W, then dst will end up empty. In situations where the possibility of such an empty result is not desired, one way to avoid it is to use the bitmap_fold() operator, below, to first fold the orig bitmap over itself so that all its set bits x are in the range 0 <= x < W. The bitmap_fold() operator does this by setting the bit (m % W) in dst, for each bit (m) set in orig.
- Example [1] for bitmap_onto():
Let’s say relmap has bits 30-39 set, and orig has bits 1, 3, 5, 7, 9 and 11 set. Then on return from this routine, dst will have bits 31, 33, 35, 37 and 39 set.
When bit 0 is set in orig, it means turn on the bit in dst corresponding to whatever is the first bit (if any) that is turned on in relmap. Since bit 0 was off in the above example, we leave off that bit (bit 30) in dst.
When bit 1 is set in orig (as in the above example), it means turn on the bit in dst corresponding to whatever is the second bit that is turned on in relmap. The second bit in relmap that was turned on in the above example was bit 31, so we turned on bit 31 in dst.
Similarly, we turned on bits 33, 35, 37 and 39 in dst, because they were the 4th, 6th, 8th and 10th set bits in relmap, and the 4th, 6th, 8th and 10th bits of orig (i.e. bits 3, 5, 7 and 9) were also set.
When bit 11 is set in orig, it means turn on the bit in dst corresponding to whatever is the twelfth bit that is turned on in relmap. In the above example, there were only ten bits turned on in relmap (30..39), so the fact that bit 11 was set in orig had no effect on dst.
- Example [2] for bitmap_fold() + bitmap_onto():
Let’s say relmap has these ten bits set:
40 41 42 43 45 48 53 61 74 95
(for the curious, that’s 40 plus the first ten terms of the Fibonacci sequence.)
Further, let’s say we use the following code, invoking bitmap_fold() then bitmap_onto(), as suggested above to avoid the possibility of an empty dst result:

    unsigned long *tmp;	// a temporary bitmap's bits

    bitmap_fold(tmp, orig, bitmap_weight(relmap, bits), bits);
    bitmap_onto(dst, tmp, relmap, bits);

Then this table shows what various values of dst would be, for various orig’s. I list the zero-based positions of each set bit. The tmp column shows the intermediate result, as computed by using bitmap_fold() to fold the orig bitmap modulo ten (the weight of relmap):
For these marked lines, if we hadn’t first done bitmap_fold() into tmp, then the dst result would have been empty.
If either of orig or relmap is empty (no set bits), then dst will be returned empty.
If (as explained above) the only set bits in orig are in positions m where m >= W (where W is the weight of relmap), then dst will once again be returned empty.
All bits in dst not set by the above rule are cleared.
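To make the rule concrete, here is a minimal single-word sketch, assuming bits <= 64 so that a uint64_t can stand in for each bitmap. onto_u64() is a hypothetical name and not the kernel implementation; it walks relmap upward and gives the m-th set bit of relmap the value of bit m of orig:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word model of bitmap_onto() for bits <= 64. */
static uint64_t onto_u64(uint64_t orig, uint64_t relmap, unsigned int bits)
{
	uint64_t dst = 0;	/* bits not set by the rule stay clear */
	unsigned int m = 0;	/* ordinal of the next set bit of relmap */

	for (unsigned int n = 0; n < bits && n < 64; n++) {
		if (!(relmap & ((uint64_t)1 << n)))
			continue;
		/* Bit n is the m-th set bit of relmap; it takes bit m of orig. */
		if (orig & ((uint64_t)1 << m))
			dst |= (uint64_t)1 << n;
		m++;
	}
	return dst;
}
```

Feeding it relmap = 0x3FF << 30 (bits 30-39) and orig = 0xAAA (bits 1, 3, 5, 7, 9 and 11) reproduces Example [1]: dst gets bits 31, 33, 35, 37 and 39, and orig bit 11 is mapped nowhere.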
- void bitmap_fold(unsigned long *dst, const unsigned long *orig, unsigned int sz, unsigned int nbits)
fold larger bitmap into smaller, modulo specified size
Parameters
unsigned long *dst: resulting smaller bitmap
const unsigned long *orig: original larger bitmap
unsigned int sz: specified size
unsigned int nbits: number of bits in each of these bitmaps
Description
For each bit oldbit in orig, set bit oldbit mod sz in dst. Clear all other bits in dst. See further the comment and Example [2] for bitmap_onto() for why and how to use this.
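A single-word sketch of the same folding rule, assuming nbits <= 64 and 0 < sz <= 64; fold_u64() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word model of bitmap_fold(); assumes 0 < sz <= 64. */
static uint64_t fold_u64(uint64_t orig, unsigned int sz, unsigned int nbits)
{
	uint64_t dst = 0;	/* every bit not set below stays clear */

	for (unsigned int oldbit = 0; oldbit < nbits && oldbit < 64; oldbit++)
		if (orig & ((uint64_t)1 << oldbit))
			dst |= (uint64_t)1 << (oldbit % sz);
	return dst;
}
```

Folding bits 0, 9, 18 and 27 modulo ten, for instance, yields bits 0, 9, 8 and 7, while bits 0, 10, 20 and 30 all collapse onto bit 0.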
- unsigned long bitmap_find_next_zero_area(unsigned long *map, unsigned long size, unsigned long start, unsigned int nr, unsigned long align_mask)
find a contiguous aligned zero area
Parameters
unsigned long *map: The address to base the search on
unsigned long size: The bitmap size in bits
unsigned long start: The bit number to start searching at
unsigned int nr: The number of zeroed bits we’re looking for
unsigned long align_mask: Alignment mask for zero area
Description
The align_mask should be one less than a power of 2; the effect is that the bit offsets of all zero areas this function finds are multiples of that power of 2. An align_mask of 0 means no alignment is required.
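The effect of align_mask can be seen in a single-word sketch (size <= 64). find_zero_area_u64() is hypothetical; candidate start offsets are always rounded up to a multiple of align_mask + 1. Returning size when nothing fits is a convention of this sketch only, since the failure return of bitmap_find_next_zero_area() is not described here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word sketch; align_mask must be one less than a power of 2. */
static unsigned int find_zero_area_u64(uint64_t map, unsigned int size,
				       unsigned int start, unsigned int nr,
				       unsigned int align_mask)
{
	/* Round start up to the next multiple of (align_mask + 1). */
	unsigned int index = (start + align_mask) & ~align_mask;

	while (index + nr <= size) {
		unsigned int i;

		/* Look for a set bit inside [index, index + nr). */
		for (i = index; i < index + nr; i++)
			if (map & ((uint64_t)1 << i))
				break;
		if (i == index + nr)
			return index;	/* every bit in the window is zero */
		/* Restart just past the set bit, rounded up to the alignment. */
		index = (i + 1 + align_mask) & ~align_mask;
	}
	return size;	/* sketch convention: no suitable area */
}
```

With bits 0-3 taken, asking for 4 free bits yields offset 4 with align_mask 0 but offset 8 with align_mask 7.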
- bool bitmap_or_equal(const unsigned long *src1, const unsigned long *src2, const unsigned long *src3, unsigned int nbits)
Check whether the or of two bitmaps is equal to a third
Parameters
const unsigned long *src1: Pointer to bitmap 1
const unsigned long *src2: Pointer to bitmap 2 will be or’ed with bitmap 1
const unsigned long *src3: Pointer to bitmap 3. Compare to the result of *src1 | *src2
unsigned int nbits: number of bits in each of these bitmaps
Return
True if (*src1 | *src2) == *src3, false otherwise
- void bitmap_scatter(unsigned long *dst, const unsigned long *src, const unsigned long *mask, unsigned int nbits)
Scatter a bitmap according to the given mask
Parameters
unsigned long *dst: scattered bitmap
const unsigned long *src: gathered bitmap
const unsigned long *mask: mask representing bits to assign to in the scattered bitmap
unsigned int nbits: number of bits in each of these bitmaps
Description
Scatters bitmap with sequential bits according to the given mask.
Example
If src bitmap = 0x005a, with mask = 0x1313, dst will be 0x0302.
Or in binary form:

    src                 mask                dst
    0000000001011010    0001001100010011    0000001100000010

(Bits 0, 1, 2, 3, 4, 5 are copied to the bits 0, 1, 4, 8, 9, 12)
A more ‘visual’ description of the operation:

    src:  0000000001011010
                    ||||||
             +------+|||||
             |  +----+||||
             |  |+----+|||
             |  ||   +-+||
             |  ||   |  ||
    mask: ...v..vv...v..vv
          ...0..11...0..10
    dst:  0000001100000010

A relationship exists between bitmap_scatter() and bitmap_gather(). See bitmap_gather() for the bitmap gather detailed operations. TL;DR: bitmap_gather() can be seen as the ‘reverse’ bitmap_scatter() operation.
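A single-word sketch of the scatter rule, assuming nbits <= 64; scatter_u64() is a hypothetical name, not the kernel implementation. It feeds the low bits of src, in order, into the positions of the set bits of mask:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word model of bitmap_scatter(). */
static uint64_t scatter_u64(uint64_t src, uint64_t mask)
{
	uint64_t dst = 0;
	unsigned int n = 0;	/* index of the next sequential src bit */

	for (unsigned int bit = 0; bit < 64; bit++) {
		if (!(mask & ((uint64_t)1 << bit)))
			continue;
		/* The n-th set bit of mask receives bit n of src. */
		if (src & ((uint64_t)1 << n))
			dst |= (uint64_t)1 << bit;
		n++;
	}
	return dst;
}
```

scatter_u64(0x005a, 0x1313) returns 0x0302, matching the example; src bit 6 is dropped because mask has only six set bits.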
- void bitmap_gather(unsigned long *dst, const unsigned long *src, const unsigned long *mask, unsigned int nbits)
Gather a bitmap according to given mask
Parameters
unsigned long *dst: gathered bitmap
const unsigned long *src: scattered bitmap
const unsigned long *mask: mask representing bits to extract from in the scattered bitmap
unsigned int nbits: number of bits in each of these bitmaps
Description
Gathers bitmap with sparse bits according to the given mask.
Example
If src bitmap = 0x0302, with mask = 0x1313, dst will be 0x001a.
Or in binary form:

    src                 mask                dst
    0000001100000010    0001001100010011    0000000000011010

(Bits 0, 1, 4, 8, 9, 12 are copied to the bits 0, 1, 2, 3, 4, 5)
A more ‘visual’ description of the operation:

    mask: ...v..vv...v..vv
    src:  0000001100000010
             ^  ^^   ^   0
             |  ||   |  10
             |  ||   > 010
             |  |+--> 1010
             |  +--> 11010
             +----> 011010
    dst:  0000000000011010

A relationship exists between bitmap_gather() and bitmap_scatter(). See bitmap_scatter() for the bitmap scatter detailed operations. TL;DR: bitmap_scatter() can be seen as the ‘reverse’ bitmap_gather() operation.
Suppose scattered is computed using bitmap_scatter(scattered, src, mask, n). The operation bitmap_gather(result, scattered, mask, n) leads to a result equal or equivalent to src.
The result can be ‘equivalent’ because bitmap_scatter() and bitmap_gather() are not bijective. The result and src values are equivalent in the sense that a call to bitmap_scatter(res, src, mask, n) and a call to bitmap_scatter(res, result, mask, n) will lead to the same res value.
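The reverse direction in the same single-word style, again assuming nbits <= 64; gather_u64() is a hypothetical name. It reads the src bits selected by mask and packs them into the low bits of dst:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word model of bitmap_gather(). */
static uint64_t gather_u64(uint64_t src, uint64_t mask)
{
	uint64_t dst = 0;
	unsigned int n = 0;	/* index of the next sequential dst bit */

	for (unsigned int bit = 0; bit < 64; bit++) {
		if (!(mask & ((uint64_t)1 << bit)))
			continue;
		/* Bit n of dst comes from the n-th set bit of mask. */
		if (src & ((uint64_t)1 << bit))
			dst |= (uint64_t)1 << n;
		n++;
	}
	return dst;
}
```

gather_u64(0x0302, 0x1313) returns 0x001a, as in the example above. Scattering 0x005a and gathering the result therefore gives 0x001a rather than 0x005a, which is the ‘equivalent but not equal’ case: bit 6 of the original src has no slot in mask.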
- void bitmap_release_region(unsigned long *bitmap, unsigned int pos, int order)
release allocated bitmap region
Parameters
unsigned long *bitmap: array of unsigned longs corresponding to the bitmap
unsigned int pos: beginning of bit region to release
int order: region size (log base 2 of number of bits) to release
Description
This is the complement to bitmap_find_free_region() and releases the found region (by clearing it in the bitmap).
- int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order)
allocate bitmap region
Parameters
unsigned long *bitmap: array of unsigned longs corresponding to the bitmap
unsigned int pos: beginning of bit region to allocate
int order: region size (log base 2 of number of bits) to allocate
Description
Allocate (set bits in) a specified region of a bitmap.
Return
0 on success, or -EBUSY if the specified region wasn’t free (not all bits were zero).
- int bitmap_find_free_region(unsigned long *bitmap, unsigned int bits, int order)
find a contiguous aligned mem region
Parameters
unsigned long *bitmap: array of unsigned longs corresponding to the bitmap
unsigned int bits: number of bits in the bitmap
int order: region size (log base 2 of number of bits) to find
Description
Find a region of free (zero) bits in a bitmap of bits bits and allocate them (set them to one). Only consider regions of length a power (order) of two, aligned to that power of two, which makes the search algorithm much faster.
Return
the bit offset in bitmap of the allocated region, or -errno on failure.
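A single-word sketch of the aligned search (bits <= 64, order <= 6). find_free_region_u64() is hypothetical; it tries only offsets that are multiples of 2^order, which is what keeps the search cheap. Returning -ENOMEM on failure is this sketch’s choice; the text above only promises a negative errno:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical single-word sketch; assumes bits <= 64 and 0 <= order <= 6. */
static int find_free_region_u64(uint64_t *bitmap, unsigned int bits, int order)
{
	unsigned int len = 1U << order;
	uint64_t region = (len >= 64) ? ~(uint64_t)0 : (((uint64_t)1 << len) - 1);

	/* Only offsets aligned to the region size are candidates. */
	for (unsigned int pos = 0; pos + len <= bits; pos += len) {
		if (!(*bitmap & (region << pos))) {
			*bitmap |= region << pos;	/* allocate: set the bits */
			return (int)pos;
		}
	}
	return -ENOMEM;
}
```

With bit 0 already used, an order-2 request lands at offset 4 and a following order-3 request at offset 8.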
- BITMAP_FROM_U64
BITMAP_FROM_U64(n)
Represent u64 value in the format suitable for bitmap.
Parameters
n: u64 value
Description
Linux bitmaps are internally arrays of unsigned longs, i.e. 32-bit integers in 32-bit environment, and 64-bit integers in 64-bit one.
There are four combinations of endianness and length of the word in Linux ABIs: LE64, BE64, LE32 and BE32.
On 64-bit kernels 64-bit LE and BE numbers are naturally ordered in bitmaps and therefore don’t require any special handling.
On 32-bit kernels 32-bit LE ABI orders lo word of 64-bit number in memory prior to hi, and 32-bit BE orders hi word prior to lo. The bitmap on the other hand is represented as an array of 32-bit words and the position of bit N may therefore be calculated as: word #(N/32) and bit #(N % 32) in that word. For example, bit #42 is located at 10th position of 2nd word. It matches 32-bit LE ABI, and we can simply let the compiler store 64-bit values in memory as it usually does. But for BE we need to swap hi and lo words manually.
With all that, the macro BITMAP_FROM_U64() does explicit reordering of hi and lo parts of u64. For LE32 it does nothing, and for BE environment it swaps hi and lo words, as is expected by bitmap.
- void bitmap_from_u64(unsigned long *dst, u64 mask)
Check and swap words within u64.
Parameters
unsigned long *dst: destination bitmap
u64 mask: source bitmap
Description
In 32-bit Big Endian kernel, when using (u32 *)(&val)[*] to read u64 mask, we will get the wrong word. That is (u32 *)(&val)[0] gets the upper 32 bits, but we expect the lower 32-bits of u64.
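A portable way to get the word order described above is to split the value arithmetically instead of reinterpreting its memory through a u32 pointer. The sketch below assumes 32-bit bitmap words; u32_words_from_u64() and test_bit_u32() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Word 0 always carries bits 0..31 and word 1 bits 32..63, whatever the
 * CPU's in-memory layout of a 64-bit integer is. */
static void u32_words_from_u64(uint32_t dst[2], uint64_t mask)
{
	dst[0] = (uint32_t)mask;		/* low word: bits 0..31 */
	dst[1] = (uint32_t)(mask >> 32);	/* high word: bits 32..63 */
}

/* Bit N lives in word N / 32 at position N % 32. */
static int test_bit_u32(const uint32_t *words, unsigned int n)
{
	return (words[n / 32] >> (n % 32)) & 1;
}
```

Bit 42 lands in word 1 at position 10 regardless of the host’s endianness, because shifts and casts operate on values, not on memory.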
- unsigned long bitmap_read(const unsigned long *map, unsigned long start, unsigned long nbits)
read a value of n-bits from the memory region
Parameters
const unsigned long *map: address to the bitmap memory region
unsigned long start: bit offset of the n-bit value
unsigned long nbits: size of value in bits, nonzero, up to BITS_PER_LONG
Return
value of nbits bits located at the start bit offset within the map memory region. For nbits = 0 and nbits > BITS_PER_LONG the return value is undefined.
- void bitmap_write(unsigned long *map, unsigned long value, unsigned long start, unsigned long nbits)
write n-bit value within a memory region
Parameters
unsigned long *map: address to the bitmap memory region
unsigned long value: value to write, clamped to nbits
unsigned long start: bit offset of the n-bit value
unsigned long nbits: size of value in bits, nonzero, up to BITS_PER_LONG.
Description
bitmap_write() behaves as-if implemented as nbits calls of __assign_bit(), i.e. bits beyond nbits are ignored:

    for (bit = 0; bit < nbits; bit++)
            __assign_bit(start + bit, bitmap, val & BIT(bit));

For nbits == 0 and nbits > BITS_PER_LONG no writes are performed.
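A userspace sketch of both operations on 64-bit words, assuming WORD_BITS plays the role of BITS_PER_LONG and 1 <= nbits <= WORD_BITS. read_bits() and write_bits() are hypothetical names; write_bits() follows the as-if loop above bit by bit, while read_bits() stitches together the two words a value may straddle:

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64	/* assumption: 64-bit words, like BITS_PER_LONG on 64-bit */

static uint64_t read_bits(const uint64_t *map, unsigned long start, unsigned long nbits)
{
	unsigned long index = start / WORD_BITS, offset = start % WORD_BITS;
	uint64_t mask = (nbits == WORD_BITS) ? ~(uint64_t)0 : (((uint64_t)1 << nbits) - 1);
	uint64_t value = map[index] >> offset;

	/* The remaining high bits come from the next word. */
	if (offset + nbits > WORD_BITS)
		value |= map[index + 1] << (WORD_BITS - offset);
	return value & mask;
}

static void write_bits(uint64_t *map, uint64_t value, unsigned long start, unsigned long nbits)
{
	/* One assignment per bit; bits of value beyond nbits are ignored. */
	for (unsigned long bit = 0; bit < nbits; bit++) {
		unsigned long pos = start + bit;
		uint64_t m = (uint64_t)1 << (pos % WORD_BITS);

		if ((value >> bit) & 1)
			map[pos / WORD_BITS] |= m;
		else
			map[pos / WORD_BITS] &= ~m;
	}
}
```

Writing 0xABC at offset 60 puts its low nibble in the top of word 0 and 0xAB in word 1; read_bits() reassembles 0xABC.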
Command-line Parsing
- int get_option(char **str, int *pint)
Parse integer from an option string
Parameters
char **str: option string
int *pint: (optional output) integer value parsed from str
Description
Read an int from an option string; if available accept a subsequent comma as well.
When pint is NULL the function can be used as a validator of the current option in the string.
Return values:
0 - no int in string
1 - int found, no subsequent comma
2 - int found including a subsequent comma
3 - hyphen found to denote a range
Leading hyphen without integer is no integer case, but we consume it for the sake of simplification.
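The return codes can be illustrated with a userspace sketch built on strtol() with base 0. get_option_sketch() is a hypothetical name, not the kernel function, and details such as overflow handling are not modeled:

```c
#include <assert.h>
#include <stdlib.h>

static int get_option_sketch(char **str, int *pint)
{
	char *cur = *str;
	long value;

	if (!cur || !*cur)
		return 0;			/* nothing to parse */
	if (*cur == '-')
		value = -strtol(++cur, str, 0);	/* the leading hyphen is consumed */
	else
		value = strtol(cur, str, 0);
	if (pint)
		*pint = (int)value;
	if (cur == *str)
		return 0;			/* no integer found */
	if (**str == ',') {
		(*str)++;			/* swallow the trailing comma */
		return 2;
	}
	if (**str == '-')
		return 3;			/* start of a range such as 7-9 */
	return 1;
}
```

On "10,-3,7-9" successive calls return 2, 2 and 3, leaving the string positioned at the range hyphen.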
- char *get_options(const char *str, int nints, int *ints)
Parse a string into a list of integers
Parameters
const char *str: String to be parsed
int nints: size of integer array
int *ints: integer array (must have room for at least one element)
Description
This function parses a string containing a comma-separated list of integers, a hyphen-separated range of _positive_ integers, or a combination of both. The parse halts when the array is full, or when no more numbers can be retrieved from the string.
When nints is 0, the function just validates the given str and returns the number of parseable integers as described below.
The first element is filled by the number of collected integers in the range. The rest is what was parsed from the str.
Return value is the character in the string which caused the parse to end (typically a null terminator, if str is completely parseable).
- unsigned long long memparse(const char *ptr, char **retptr)
parse a string with mem suffixes into a number
Parameters
const char *ptr: Where parse begins
char **retptr: (output) Optional pointer to next char after parse completes
Description
Parses a string into a number. The number stored at ptr is potentially suffixed with K, M, G, T, P, E, each of which multiplies it by a power of 1024 (K = 2^10 up to E = 2^60).
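A userspace sketch of the suffix handling, assuming strtoull() with base 0 for the numeric part (so 0x and leading-0 octal prefixes are accepted) and binary multiples for the suffixes. memparse_sketch() is a hypothetical name:

```c
#include <assert.h>
#include <stdlib.h>

static unsigned long long memparse_sketch(const char *ptr, char **retptr)
{
	char *endptr;
	unsigned long long ret = strtoull(ptr, &endptr, 0);
	int shift = 0;

	/* Each suffix is a power of 1024, accepted in either case. */
	switch (*endptr) {
	case 'E': case 'e': shift = 60; break;
	case 'P': case 'p': shift = 50; break;
	case 'T': case 't': shift = 40; break;
	case 'G': case 'g': shift = 30; break;
	case 'M': case 'm': shift = 20; break;
	case 'K': case 'k': shift = 10; break;
	}
	if (shift) {
		ret <<= shift;
		endptr++;	/* the suffix is part of the parsed text */
	}
	if (retptr)
		*retptr = endptr;
	return ret;
}
```

Parsing "512M,rest" returns 512 << 20 and leaves the end pointer on the comma, so a caller can continue with the next option.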
Error Pointers
- IS_ERR_VALUE
IS_ERR_VALUE(x)
Detect an error pointer.
Parameters
x: The pointer to check.
Description
Like IS_ERR(), but does not generate a compiler warning if result is unused.
- void *ERR_PTR(long error)
Create an error pointer.
Parameters
long error: A negative error code.
Description
Encodes error into a pointer value. Users should consider the result opaque and not assume anything about how the error is encoded.
Return
A pointer with error encoded within its value.
- INIT_ERR_PTR
INIT_ERR_PTR(error)
Init a const error pointer.
Parameters
error: A negative error code.
Description
Like ERR_PTR(), but usable to initialize static variables.
- long PTR_ERR(__force const void *ptr)
Extract the error code from an error pointer.
Parameters
__force const void *ptr: An error pointer.
Return
The error code within ptr.
- bool IS_ERR(__force const void *ptr)
Detect an error pointer.
Parameters
__force const void *ptr: The pointer to check.
Return
true if ptr is an error pointer, false otherwise.
- bool IS_ERR_OR_NULL(__force const void *ptr)
Detect an error pointer or a null pointer.
Parameters
__force const void *ptr: The pointer to check.
Description
Like IS_ERR(), but also returns true for a null pointer.
- void *ERR_CAST(__force const void *ptr)
Explicitly cast an error-valued pointer to another pointer type
Parameters
__force const void *ptr: The pointer to cast.
Description
Explicitly cast an error-valued pointer to another pointer type in such a way as to make it clear that’s what’s going on.
- int PTR_ERR_OR_ZERO(__force const void *ptr)
Extract the error code from a pointer if it has one.
Parameters
__force const void *ptr: A potential error pointer.
Description
Convenience function that can be used inside a function that returns an error code to propagate errors received as error pointers. For example, return PTR_ERR_OR_ZERO(ptr); replaces:

    if (IS_ERR(ptr))
            return PTR_ERR(ptr);
    else
            return 0;

Return
The error code within ptr if it is an error pointer; 0 otherwise.
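Why an error code fits in a pointer can be shown in userspace. This sketch assumes, as the kernel’s include/linux/err.h does with MAX_ERRNO, that the top 4095 addresses are never valid pointers, so -1 to -4095 can be stored in one. The lowercase names are hypothetical stand-ins; real code should keep treating the encoding as opaque and use the macros above:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_ERRNO 4095	/* assumption: the highest MAX_ERRNO addresses are never mapped */

static inline void *err_ptr(long error) { return (void *)error; }
static inline long ptr_err(const void *ptr) { return (long)ptr; }

/* An error pointer is any value in the last MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Same shape as the PTR_ERR_OR_ZERO() idiom shown above. */
static inline int ptr_err_or_zero(const void *ptr)
{
	return is_err(ptr) ? (int)ptr_err(ptr) : 0;
}
```

A NULL pointer is not an error pointer under this encoding, which is why IS_ERR_OR_NULL() exists as a separate check.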
Sorting
- void sort_r(void *base, size_t num, size_t size, cmp_r_func_t cmp_func, swap_r_func_t swap_func, const void *priv)
sort an array of elements
Parameters
void *base: pointer to data to sort
size_t num: number of elements
size_t size: size of each element
cmp_r_func_t cmp_func: pointer to comparison function
swap_r_func_t swap_func: pointer to swap function or NULL
const void *priv: third argument passed to comparison function
Description
This function does a heapsort on the given array. You may provide a swap_func function if you need to do something more than a memory copy (e.g. fix up pointers or auxiliary data), but the built-in swap avoids a slow retpoline and so is significantly faster.
The comparison function must adhere to specific mathematical properties to ensure correct and stable sorting:
- Antisymmetry: cmp_func(a, b) must return the opposite sign of cmp_func(b, a).
- Transitivity: if cmp_func(a, b) <= 0 and cmp_func(b, c) <= 0, then cmp_func(a, c) <= 0.
Sorting time is O(n log n) both on average and worst-case. While quicksort is slightly faster on average, it suffers from exploitable O(n*n) worst-case behavior and extra memory requirements that make it less suitable for kernel use.
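A compact userspace sketch of an in-place heapsort with a three-argument comparator, to show how priv reaches the comparison function. heapsort_r(), cmp_r_fn and cmp_int_dir() are hypothetical names; this is a textbook heapsort with a byte-wise swap, not the kernel’s optimized version:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical comparator type with the same (a, b, priv) shape as cmp_r_func_t. */
typedef int (*cmp_r_fn)(const void *a, const void *b, const void *priv);

/* Byte-wise swap, standing in for a built-in swap. */
static void swap_bytes(void *a, void *b, size_t size)
{
	unsigned char *x = a, *y = b;

	while (size--) {
		unsigned char t = *x;

		*x++ = *y;
		*y++ = t;
	}
}

/* Move the element at index root down until base[0..n) is a max-heap again. */
static void sift_down(char *base, size_t root, size_t n, size_t size,
		      cmp_r_fn cmp, const void *priv)
{
	for (;;) {
		size_t child = 2 * root + 1;

		if (child >= n)
			return;
		/* Pick the larger of the two children. */
		if (child + 1 < n &&
		    cmp(base + child * size, base + (child + 1) * size, priv) < 0)
			child++;
		if (cmp(base + root * size, base + child * size, priv) >= 0)
			return;
		swap_bytes(base + root * size, base + child * size, size);
		root = child;
	}
}

static void heapsort_r(void *base, size_t num, size_t size, cmp_r_fn cmp,
		       const void *priv)
{
	char *b = base;

	if (num < 2)
		return;
	/* Build a max-heap, starting from the last parent node. */
	for (size_t i = num / 2; i-- > 0; )
		sift_down(b, i, num, size, cmp, priv);
	/* Repeatedly move the maximum behind the shrinking heap. */
	for (size_t end = num - 1; end > 0; end--) {
		swap_bytes(b, b + end * size, size);
		sift_down(b, 0, end, size, cmp, priv);
	}
}

/* Example comparator: *priv is 1 for ascending order, -1 for descending. */
static int cmp_int_dir(const void *a, const void *b, const void *priv)
{
	int x = *(const int *)a, y = *(const int *)b;

	return *(const int *)priv * ((x > y) - (x < y));
}
```

The same comparator sorts in either direction depending only on priv, which is the kind of context sort_r() passes through. Note that heapsort does not preserve the order of equal elements.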
- void sort_r_nonatomic(void *base, size_t num, size_t size, cmp_r_func_t cmp_func, swap_r_func_t swap_func, const void *priv)
sort an array of elements, with cond_resched
Parameters
void *base: pointer to data to sort
size_t num: number of elements
size_t size: size of each element
cmp_r_func_t cmp_func: pointer to comparison function
swap_r_func_t swap_func: pointer to swap function or NULL
const void *priv: third argument passed to comparison function
Description
Same as sort_r(), but preferred for larger arrays as it does a periodic cond_resched().
- void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp)
sort a list
Parameters
void *priv: private data, opaque to list_sort(), passed to cmp
struct list_head *head: the list to sort
list_cmp_func_t cmp: the elements comparison function
Description
The comparison function cmp must return > 0 if a should sort after b (“a > b” if you want an ascending sort), and <= 0 if a should sort before b or their original order should be preserved. It is always called with the element that came first in the input in a, and list_sort is a stable sort, so it is not necessary to distinguish the a < b and a == b cases.
The comparison function must adhere to specific mathematical properties to ensure correct and stable sorting:
- Antisymmetry: cmp(a, b) must return the opposite sign of cmp(b, a).
- Transitivity: if cmp(a, b) <= 0 and cmp(b, c) <= 0, then cmp(a, c) <= 0.
This is compatible with two styles of cmp function:
- The traditional style which returns <0 / =0 / >0, or
- Returning a boolean 0/1.
The latter offers a chance to save a few cycles in the comparison (which is used by e.g. plug_ctx_cmp() in block/blk-mq.c).
A good way to write a multi-word comparison is:

    if (a->high != b->high)
            return a->high > b->high;
    if (a->middle != b->middle)
            return a->middle > b->middle;
    return a->low > b->low;
This mergesort is as eager as possible while always performing at least 2:1 balanced merges. Given two pending sublists of size 2^k, they are merged to a size-2^(k+1) list as soon as we have 2^k following elements.
Thus, it will avoid cache thrashing as long as 3*2^k elements can fit into the cache. Not quite as good as a fully-eager bottom-up mergesort, but it does use 0.2*n fewer comparisons, so is faster in the common case that everything fits into L1.
The merging is controlled by “count”, the number of elements in the pending lists. This is beautifully simple code, but rather subtle.
Each time we increment “count”, we set one bit (bit k) and clear bits k-1 .. 0. Each time this happens (except the very first time for each bit, when count increments to 2^k), we merge two lists of size 2^k into one list of size 2^(k+1).
This merge happens exactly when the count reaches an odd multiple of 2^k, which is when we have 2^k elements pending in smaller lists, so it’s safe to merge away two lists of size 2^k.
After this happens twice, we have created two lists of size 2^(k+1), which will be merged into a list of size 2^(k+2) before we create a third list of size 2^(k+1), so there are never more than two pending.
The number of pending lists of size 2^k is determined by the state of bit k of “count” plus two extra pieces of information:
- The state of bit k-1 (when k == 0, consider bit -1 always set), and
- Whether the higher-order bits are zero or non-zero (i.e. is count >= 2^(k+1)).
There are six states we distinguish. “x” represents some arbitrary bits, and “y” represents some arbitrary non-zero bits:

    0:  00x: 0 pending of size 2^k;           x pending of sizes < 2^k
    1:  01x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
    2: x10x: 0 pending of size 2^k; 2^k     + x pending of sizes < 2^k
    3: x11x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
    4: y00x: 1 pending of size 2^k; 2^k     + x pending of sizes < 2^k
    5: y01x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
    (merge and loop back to state 2)

We gain lists of size 2^k in the 2->3 and 4->5 transitions (because bit k-1 is set while the more significant bits are non-zero) and merge them away in the 5->2 transition. Note in particular that just before the 5->2 transition, all lower-order bits are 11 (state 3), so there is one list of each smaller size.
When we reach the end of the input, we merge all the pending lists, from smallest to largest. If you work through cases 2 to 5 above, you can see that the number of elements we merge with a list of size 2^k varies from 2^(k-1) (cases 3 and 5 when x == 0) to 2^(k+1) - 1 (second merge of case 5 when x == 2^(k-1) - 1).
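The bookkeeping described above can be checked without any list nodes. This hypothetical sketch tracks only the sizes of the pending lists, newest first, and applies the “count” rule for each new element: skip the trailing one bits of count to find k, merge the lists at positions k and k + 1 if any higher bit is set, then push a new list of size 1. It reports an error if a merge ever combines lists that are not both of size 2^k:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64

/* pending[0] is the newest list. Returns the number of pending lists after
 * n elements, or -1 if a merge ever combined lists not both of size 2^k. */
static int simulate_pending(size_t n, size_t pending[MAX_PENDING])
{
	int npending = 0;

	for (size_t count = 0; count < n; count++) {
		size_t bits = count;
		int k = 0;

		/* Skip the trailing one bits; k is the lowest clear bit. */
		while (bits & 1) {
			bits >>= 1;
			k++;
		}
		/* Merge only if a higher-order bit is set. */
		if (bits) {
			if (k + 1 >= npending || pending[k] != ((size_t)1 << k) ||
			    pending[k + 1] != pending[k])
				return -1;
			pending[k] += pending[k + 1];
			for (int i = k + 1; i < npending - 1; i++)
				pending[i] = pending[i + 1];
			npending--;
		}
		/* Push the next element as a new list of size 1. */
		for (int i = npending; i > 0; i--)
			pending[i] = pending[i - 1];
		pending[0] = 1;
		npending++;
	}
	return npending;
}
```

After 11 elements the pending sizes are 1, 2, 4, 4 (newest first), which add up to 11 and never include three lists of the same size.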
Text Searching
INTRODUCTION
The textsearch infrastructure provides text searching facilities for both linear and non-linear data. Individual search algorithms are implemented in modules and chosen by the user.
ARCHITECTURE

    User
    +----------------+
    |        finish()|<--------------(6)-----------------+
    |get_next_block()|<--------------(5)---------------+ |
    |                |                     Algorithm   | |
    |                |                    +------------------------------+
    |                |                    |   init()  find()   destroy() |
    |                |                    +------------------------------+
    |                |       Core API           ^       ^          ^
    |                |      +---------------+  (2)     (4)        (8)
    |             (1)|----->| prepare()     |---+       |          |
    |             (3)|----->| find()/next() |-----------+          |
    |             (7)|----->| destroy()     |----------------------+
    +----------------+      +---------------+

(1) User configures a search by calling textsearch_prepare() specifying the search parameters such as the pattern and algorithm name.
(2) Core requests the algorithm to allocate and initialize a search configuration according to the specified parameters.
(3) User starts the search(es) by calling textsearch_find() or textsearch_next() to fetch subsequent occurrences. A state variable is provided to the algorithm to store persistent variables.
(4) Core eventually resets the search offset and forwards the find() request to the algorithm.
(5) Algorithm calls get_next_block() provided by the user continuously to fetch the data to be searched in block by block.
(6) Algorithm invokes finish() after the last call to get_next_block to clean up any leftovers from get_next_block. (Optional)
(7) User destroys the configuration by calling textsearch_destroy().
(8) Core notifies the algorithm to destroy algorithm specific allocations. (Optional)
USAGE
Before a search can be performed, a configuration must be created by calling textsearch_prepare() specifying the searching algorithm, the pattern to look for and flags. As a flag, you can set TS_IGNORECASE to perform case insensitive matching. But it might slow down performance of the algorithm, so you should use it at your own risk. The returned configuration may then be used an arbitrary number of times and even in parallel as long as a separate struct ts_state variable is provided to every instance.
The actual search is performed by either calling textsearch_find_continuous() for linear data or by providing your own get_next_block() implementation and calling textsearch_find(). Both functions return the position of the first occurrence of the pattern or UINT_MAX if no match was found. Subsequent occurrences can be found by calling textsearch_next() regardless of the linearity of the data.
Once you’re done using a configuration it must be given back via textsearch_destroy().
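Searching non-linear data works because the algorithm keeps its matching progress between blocks. A hypothetical userspace sketch with a streaming Knuth-Morris-Pratt matcher (the technique behind the "kmp" algorithm name used in the example) and a get_next_block-style callback; stream_find(), next_block_fn, struct blocks and next_part() are not part of the textsearch API:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_PATTERN 64

/* Hypothetical callback: stores the next block in *dst and returns its
 * length, or returns 0 when there is no more data. */
typedef unsigned int (*next_block_fn)(const char **dst, void *ctx);

/* Returns the absolute offset of the first match, or UINT_MAX. */
static unsigned int stream_find(const char *pat, next_block_fn next, void *ctx)
{
	unsigned int fail[MAX_PATTERN];
	unsigned int m = (unsigned int)strlen(pat);
	unsigned int q = 0;		/* pattern chars matched; survives block ends */
	unsigned int consumed = 0;	/* bytes in blocks already finished */
	unsigned int len;
	const char *blk;

	if (m == 0 || m > MAX_PATTERN)
		return UINT_MAX;

	/* fail[i]: length of the longest proper border of pat[0..i]. */
	fail[0] = 0;
	for (unsigned int i = 1, k = 0; i < m; i++) {
		while (k > 0 && pat[i] != pat[k])
			k = fail[k - 1];
		if (pat[i] == pat[k])
			k++;
		fail[i] = k;
	}

	while ((len = next(&blk, ctx)) > 0) {
		for (unsigned int i = 0; i < len; i++) {
			while (q > 0 && blk[i] != pat[q])
				q = fail[q - 1];
			if (blk[i] == pat[q])
				q++;
			if (q == m)
				return consumed + i + 1 - m;
		}
		consumed += len;
	}
	return UINT_MAX;
}

/* Example data source: a NULL-terminated array of strings. */
struct blocks {
	const char **parts;
	unsigned int idx;
};

static unsigned int next_part(const char **dst, void *ctx)
{
	struct blocks *b = ctx;
	const char *p = b->parts[b->idx];

	if (!p)
		return 0;
	b->idx++;
	*dst = p;
	return (unsigned int)strlen(p);
}
```

Split as "We dance the fu", "nky chi" and "cken", the match straddling the last two blocks is still reported at offset 19.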
EXAMPLE:
    int pos;
    struct ts_config *conf;
    struct ts_state state;
    const char *pattern = "chicken";
    const char *example = "We dance the funky chicken";

    conf = textsearch_prepare("kmp", pattern, strlen(pattern),
                              GFP_KERNEL, TS_AUTOLOAD);
    if (IS_ERR(conf)) {
        err = PTR_ERR(conf);
        goto errout;
    }

    pos = textsearch_find_continuous(conf, &state, example, strlen(example));
    if (pos != UINT_MAX)
        panic("Oh my god, dancing chickens at %d\n", pos);

    textsearch_destroy(conf);

- int textsearch_register(struct ts_ops *ops)
register a textsearch module
Parameters
struct ts_ops *ops: operations lookup table
Description
This function must be called by textsearch modules to announce their presence. The specified ops must have name set to a unique identifier and the callbacks find(), init(), get_pattern(), and get_pattern_len() must be implemented.
Returns 0 or -EEXIST if another module has already registered with same name.
- int textsearch_unregister(struct ts_ops *ops)
unregister a textsearch module
Parameters
struct ts_ops *ops: operations lookup table
Description
This function must be called by textsearch modules to announce their disappearance, for example when the module gets unloaded. The ops parameter must be the same as the one during the registration.
Returns 0 on success or -ENOENT if no matching textsearch registration was found.
- unsigned int textsearch_find_continuous(struct ts_config *conf, struct ts_state *state, const void *data, unsigned int len)
search a pattern in continuous/linear data
Parameters
struct ts_config *conf: search configuration
struct ts_state *state: search state
const void *data: data to search in
unsigned int len: length of data
Description
A simplified version of textsearch_find() for continuous/linear data. Call textsearch_next() to retrieve subsequent matches.
Returns the position of first occurrence of the pattern or UINT_MAX if no occurrence was found.
- struct ts_config *textsearch_prepare(const char *algo, const void *pattern, unsigned int len, gfp_t gfp_mask, int flags)
Prepare a search
Parameters
const char *algo: name of search algorithm
const void *pattern: pattern data
unsigned int len: length of pattern
gfp_t gfp_mask: allocation mask
int flags: search flags
Description
Looks up the search algorithm module and creates a new textsearch configuration for the specified pattern.
Note
- The format of the pattern may not be compatible between the various search algorithms.
Returns a new textsearch configuration according to the specified parameters or an ERR_PTR(). If a zero length pattern is passed, this function returns ERR_PTR(-EINVAL).
- void textsearch_destroy(struct ts_config *conf)
destroy a search configuration
Parameters
struct ts_config *conf: search configuration
Description
Releases all references of the configuration and frees up the memory.
- unsigned int textsearch_next(struct ts_config *conf, struct ts_state *state)
continue searching for a pattern
Parameters
struct ts_config *conf: search configuration
struct ts_state *state: search state
Description
Continues a search looking for more occurrences of the pattern. textsearch_find() must be called to find the first occurrence in order to reset the state.
Returns the position of the next occurrence of the pattern or UINT_MAX if no match was found.
- unsigned int textsearch_find(struct ts_config *conf, struct ts_state *state)
start searching for a pattern
Parameters
struct ts_config *conf: search configuration
struct ts_state *state: search state
Description
Returns the position of first occurrence of the pattern or UINT_MAX if no match was found.
- void *textsearch_get_pattern(struct ts_config *conf)
return head of the pattern
Parameters
struct ts_config *conf: search configuration
- unsigned int textsearch_get_pattern_len(struct ts_config *conf)
return length of the pattern
Parameters
struct ts_config *conf: search configuration
CRC and Math Functions in Linux
Arithmetic Overflow Checking
- check_add_overflow
check_add_overflow(a, b, d)
Calculate addition with overflow checking
Parameters
a: first addend
b: second addend
d: pointer to store sum
Description
Returns true on wrap-around, false otherwise.
*d holds the results of the attempted addition, regardless of whether wrap-around occurred.
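In userspace the same contract is available through the GCC/Clang builtin __builtin_add_overflow(), which computes the sum as if with infinite precision, stores the truncated result in *d and returns whether it did not fit. A minimal sketch; add_u8_checked() is a hypothetical wrapper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true on wrap-around; *d always receives the truncated sum. */
static bool add_u8_checked(unsigned int a, unsigned int b, uint8_t *d)
{
	return __builtin_add_overflow(a, b, d);
}
```

200 + 100 does not fit in a uint8_t, so the call reports wrap-around and *d holds 44 (300 modulo 256), just as described for *d above.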
- wrapping_add
wrapping_add(type, a, b)
Intentionally perform a wrapping addition
Parameters
type: type for result of calculation
a: first addend
b: second addend
Description
Return the potentially wrapped-around addition without tripping any wrap-around sanitizers that may be enabled.
- wrapping_assign_add
wrapping_assign_add(var, offset)
Intentionally perform a wrapping increment assignment
Parameters
var: variable to be incremented
offset: amount to add
Description
Increments var by offset with wrap-around. Will not trip any wrap-around sanitizers.
Returns the new value of var.
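A sketch of how a wrapping helper can be built from the same builtin by discarding its overflow flag, assuming GNU C statement expressions and __typeof__. The _sketch macros are hypothetical and do not model any sanitizer annotations:

```c
#include <assert.h>
#include <stdint.h>

/* Yields a + b truncated to type; the overflow flag is deliberately ignored. */
#define wrapping_add_sketch(type, a, b)				\
	({							\
		type wa_val;					\
		(void)__builtin_add_overflow(a, b, &wa_val);	\
		wa_val;						\
	})

/* Adds offset to var in place, with wrap-around, and yields the new value. */
#define wrapping_assign_add_sketch(var, offset)				\
	({								\
		__typeof__(var) *wa_ptr = &(var);			\
		*wa_ptr = wrapping_add_sketch(__typeof__(var), *wa_ptr, offset); \
	})
```

For example, wrapping_add_sketch(uint8_t, 250, 10) yields 4, and applying wrapping_assign_add_sketch() to a uint16_t holding 65530 with offset 10 leaves it at 4.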
- check_sub_overflow
check_sub_overflow(a, b, d)
Calculate subtraction with overflow checking
Parameters
a: minuend; value to subtract from
b: subtrahend; value to subtract from a
d: pointer to store difference
Description
Returns true on wrap-around, false otherwise.
*d holds the results of the attempted subtraction, regardless of whether wrap-around occurred.
- wrapping_sub
wrapping_sub(type, a, b)
Intentionally perform a wrapping subtraction
Parameters
type: type for result of calculation
a: minuend; value to subtract from
b: subtrahend; value to subtract from a
Description
Return the potentially wrapped-around subtraction without tripping any wrap-around sanitizers that may be enabled.
- wrapping_assign_sub
wrapping_assign_sub(var, offset)
Intentionally perform a wrapping decrement assignment
Parameters
var: variable to be decremented
offset: amount to subtract
Description
Decrements var by offset with wrap-around. Will not trip any wrap-around sanitizers.
Returns the new value of var.
- check_mul_overflow
check_mul_overflow(a, b, d)
Calculate multiplication with overflow checking
Parameters
a: first factor
b: second factor
d: pointer to store product
Description
Returns true on wrap-around, false otherwise.
*d holds the results of the attempted multiplication, regardless of whether wrap-around occurred.
- wrapping_mul
wrapping_mul(type, a, b)
Intentionally perform a wrapping multiplication
Parameters
type: type for result of calculation
a: first factor
b: second factor
Description
Return the potentially wrapped-around multiplication without tripping any wrap-around sanitizers that may be enabled.
- check_shl_overflow
check_shl_overflow(a, s, d)
Calculate a left-shifted value and check overflow
Parameters
a: Value to be shifted
s: How many bits left to shift
d: Pointer to where to store the result
Description
Computes *d = (a << s)
Returns true if ‘*d’ cannot hold the result or when ‘a << s’ doesn’t make sense. Example conditions:
- ‘a << s’ causes bits to be lost when stored in *d.
- ‘s’ is garbage (e.g. negative) or so large that the result of ‘a << s’ is guaranteed to be 0.
- ‘a’ is negative.
- ‘a << s’ sets the sign bit, if any, in ‘*d’.
‘*d’ will hold the results of the attempted shift, but is not considered “safe for use” if true is returned.
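The conditions above can be mirrored in a function for a uint32_t destination. shl_u32_checked() is a hypothetical, non-generic sketch; since its destination is unsigned, the sign-bit condition does not arise:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stores the attempted shift in *d and returns true if it is unusable. */
static bool shl_u32_checked(long long a, int s, uint32_t *d)
{
	/* Out-of-range shift counts are not performed at all. */
	unsigned int to_shift = (s >= 0 && s < 32) ? (unsigned int)s : 0;

	*d = (uint32_t)((unsigned long long)a << to_shift);
	return to_shift != (unsigned int)s			/* negative or oversized s */
	       || a < 0					/* negative a */
	       || (*d >> to_shift) != (unsigned long long)a;	/* bits were lost */
}
```

Shifting 1 left by 31 fits, while shifting 3 by 31 loses a bit, and shift counts of 32 or -1 are rejected outright.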
- overflows_type
overflows_type(n, T)
helper for checking the overflows between value, variables, or data type
Parameters
n: source constant value or variable to be checked
T: destination variable or data type proposed to store n
Description
Compares the n expression for whether or not it can safely fit in the storage of the type in T. n and T can have different types. If n is a constant expression, this will also resolve to a constant expression.
Return
true if overflow can occur, false otherwise.
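One way to answer “does n fit in T” at run time is to let __builtin_add_overflow() store n + 0 into a temporary of type T. A sketch assuming GNU C; overflows_type_sketch() is hypothetical and, unlike the macro above, does not yield a constant expression for constant n:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the value of n cannot be represented in type T (or typeof(T)). */
#define overflows_type_sketch(n, T)				\
	({							\
		__typeof__(T) ot_tmp = 0;			\
		__builtin_add_overflow((n), 0, &ot_tmp);	\
	})
```

300 does not fit in a uint8_t, -1 does not fit in an unsigned int, and 70000 fits in an int32_t but not in an int16_t.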
- range_overflows
range_overflows(start, size, max)
Check if a range is out of bounds
Parameters
start: Start of the range.
size: Size of the range.
max: Exclusive upper boundary.
Description
A strict check to determine if the range [start, start + size) is invalid with respect to the allowable range [0, max). Any range starting at or beyond max is considered an overflow, even if size is 0.
Return
true if the range is out of bounds.
- range_overflows_t
range_overflows_t(type, start, size, max)
Check if a range is out of bounds
Parameters
type: Data type to use.
start: Start of the range.
size: Size of the range.
max: Exclusive upper boundary.
Description
Same as range_overflows() but forcing the parameters to type.
Return
true if the range is out of bounds.
- range_end_overflows
range_end_overflows(start, size, max)
Check if a range’s endpoint is out of bounds
Parameters
start: Start of the range.
size: Size of the range.
max: Exclusive upper boundary.
Description
Checks only if the endpoint of a range (start + size) exceeds max. Unlike range_overflows(), a zero-sized range at the boundary (start == max) is not considered an overflow. Useful for iterator-style checks.
Return
true if the endpoint exceeds the boundary.
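The difference between the two checks shows up only at the boundary. A sketch on unsigned long, following the semantics described above and written so that start + size is never computed (it could itself wrap); both function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Strict: any range starting at or past max is rejected, even an empty one. */
static bool range_overflows_ul(unsigned long start, unsigned long size, unsigned long max)
{
	return start >= max || size > max - start;
}

/* Endpoint only: start == max with size 0 is still acceptable. */
static bool range_end_overflows_ul(unsigned long start, unsigned long size, unsigned long max)
{
	return start > max || size > max - start;
}
```

An empty range at start == max is rejected by the strict check but accepted by the endpoint check, while [4, 11) overflows max = 10 under both.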
- range_end_overflows_t
range_end_overflows_t(type, start, size, max)
Check if a range’s endpoint is out of bounds
Parameters
type: Data type to use.
start: Start of the range.
size: Size of the range.
max: Exclusive upper boundary.
Description
Same as range_end_overflows() but forcing the parameters to type.
Return
true if the endpoint exceeds the boundary.
- castable_to_type
castable_to_type(n, T)
like __same_type(), but also allows for casted literals
Parameters
n: variable or constant value
T: variable or data type
Description
Unlike the __same_type() macro, this allows a constant value as the first argument. If this value would not overflow into an assignment of the second argument’s type, it returns true. Otherwise, this falls back to __same_type().
- size_t size_mul(size_t factor1, size_t factor2)
Calculate size_t multiplication with saturation at SIZE_MAX
Parameters
size_t factor1: first factor
size_t factor2: second factor
Return
calculate factor1 * factor2, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. The lvalue must be size_t to avoid implicit type conversion.
- size_tsize_add(size_taddend1,size_taddend2)¶
Calculate size_t addition with saturation at SIZE_MAX
Parameters
size_taddend1first addend
size_taddend2second addend
Return
calculateaddend1 +addend2, both promoted to size_t,with any overflow causing the return value to be SIZE_MAX. Thelvalue must be size_t to avoid implicit type conversion.
- size_tsize_sub(size_tminuend,size_tsubtrahend)¶
Calculate size_t subtraction with saturation at SIZE_MAX
Parameters
size_tminuendvalue to subtract from
size_tsubtrahendvalue to subtract fromminuend
Return
calculateminuend -subtrahend, both promoted to size_t,with any overflow causing the return value to be SIZE_MAX. Forcomposition with thesize_add() andsize_mul() helpers, neitherargument may be SIZE_MAX (or the result with be forced to SIZE_MAX).The lvalue must be size_t to avoid implicit type conversion.
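As a rough illustration of the saturation behaviour, here is a userspace sketch of the three helpers. It assumes GCC or Clang, whose __builtin_mul_overflow(), __builtin_add_overflow() and __builtin_sub_overflow() builtins report whether the exact result fit. The sketch_* names are hypothetical, and this is not presented as the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each __builtin_*_overflow() stores the wrapped result and returns
 * true when the exact result did not fit in a size_t. */
static size_t sketch_size_mul(size_t factor1, size_t factor2)
{
	size_t bytes;

	if (__builtin_mul_overflow(factor1, factor2, &bytes))
		return SIZE_MAX;
	return bytes;
}

static size_t sketch_size_add(size_t addend1, size_t addend2)
{
	size_t bytes;

	if (__builtin_add_overflow(addend1, addend2, &bytes))
		return SIZE_MAX;
	return bytes;
}

static size_t sketch_size_sub(size_t minuend, size_t subtrahend)
{
	size_t bytes;

	/* A SIZE_MAX input is treated as an earlier saturated result, so
	 * it stays SIZE_MAX instead of becoming a plausible small value. */
	if (minuend == SIZE_MAX || subtrahend == SIZE_MAX ||
	    __builtin_sub_overflow(minuend, subtrahend, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

Because SIZE_MAX propagates through every helper, a composed expression such as sketch_size_add(header, sketch_size_mul(count, elem_size)) yields SIZE_MAX if any step overflows, and an allocation request for SIZE_MAX bytes then fails instead of returning an undersized buffer.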
- array_size
array_size(a, b)
Calculate size of 2-dimensional array.
Parameters
a: dimension one
b: dimension two
Description
Calculates size of 2-dimensional array: a * b.
Return
number of bytes needed to represent the array or SIZE_MAX on overflow.
- array3_size
array3_size(a, b, c)
Calculate size of 3-dimensional array.
Parameters
a: dimension one
b: dimension two
c: dimension three
Description
Calculates size of 3-dimensional array: a * b * c.
Return
number of bytes needed to represent the array or SIZE_MAX on overflow.
- flex_array_size
flex_array_size(p, member, count)
Calculate size of a flexible array member within an enclosing structure.
Parameters
p: Pointer to the structure.
member: Name of the flexible array member.
count: Number of elements in the array.
Description
Calculates the size of a flexible array of count member elements at the end of structure p.
Return
number of bytes needed or SIZE_MAX on overflow.
- struct_size
struct_size(p, member, count)
Calculate size of structure with trailing flexible array.
Parameters
p: Pointer to the structure.
member: Name of the array member.
count: Number of elements in the array.
Description
Calculates the size of memory needed for the structure p points to, followed by an array of count member elements.
Return
number of bytes needed or SIZE_MAX on overflow.
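For illustration, here is a userspace sketch of what struct_size() computes for a hypothetical structure, struct sample_buf, with a trailing flexible array of uint32_t. It reimplements the saturating arithmetic with the GCC/Clang overflow builtins and uses malloc() in place of a kernel allocator. sketch_struct_size() hard-codes the element type, whereas the real macro derives it from p and member; none of these names are kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure with a trailing flexible array member. */
struct sample_buf {
	size_t count;
	uint32_t data[];
};

/* sizeof(struct sample_buf) + count * sizeof(uint32_t), saturating to
 * SIZE_MAX if either the multiplication or the addition overflows. */
static size_t sketch_struct_size(size_t count)
{
	size_t array_bytes, total;

	if (__builtin_mul_overflow(count, sizeof(uint32_t), &array_bytes))
		return SIZE_MAX;
	if (__builtin_add_overflow(sizeof(struct sample_buf), array_bytes, &total))
		return SIZE_MAX;
	return total;
}

static struct sample_buf *sample_buf_alloc(size_t count)
{
	size_t bytes = sketch_struct_size(count);
	struct sample_buf *buf;

	if (bytes == SIZE_MAX)	/* overflow: refuse rather than under-allocate */
		return NULL;
	buf = malloc(bytes);
	if (buf)
		buf->count = count;
	return buf;
}
```

In kernel code the same allocation is usually written as kmalloc(struct_size(buf, data, count), GFP_KERNEL), with buf being the pointer that is assigned; an overflowing request becomes SIZE_MAX and the allocation fails rather than returning a short buffer.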
- struct_size_t
struct_size_t(type, member, count)
Calculate size of structure with trailing flexible array
Parameters
type: structure type name.
member: Name of the array member.
count: Number of elements in the array.
Description
Calculates the size of memory needed for structure type followed by an array of count member elements. Prefer using struct_size() instead when possible, to keep calculations associated with a specific instance variable of type type.
Return
number of bytes needed or SIZE_MAX on overflow.
- struct_offset
struct_offset(p, member)
Calculate the offset of a member within a struct
Parameters
p: Pointer to the struct
member: Name of the member to get the offset of
Description
Calculates the offset of a particular member of the structure pointed to by p.
Return
number of bytes to the location of member.
- __DEFINE_FLEX
__DEFINE_FLEX(type, name, member, count, trailer...)
helper macro for DEFINE_FLEX() family. Enables caller macro to pass arbitrary trailing expressions
Parameters
type: structure type name, including “struct” keyword.
name: Name for a variable to define.
member: Name of the array member.
count: Number of elements in the array; must be compile-time const.
trailer...: Trailing expressions for attributes and/or initializers.
- _DEFINE_FLEX
_DEFINE_FLEX(type, name, member, count, initializer...)
helper macro for DEFINE_FLEX() family. Enables caller macro to pass (different) initializer.
Parameters
type: structure type name, including “struct” keyword.
name: Name for a variable to define.
member: Name of the array member.
count: Number of elements in the array; must be compile-time const.
initializer...: Initializer expression (e.g., pass = { } at minimum).
- DEFINE_RAW_FLEX
DEFINE_RAW_FLEX(type, name, member, count)
Define an on-stack instance of structure with a trailing flexible array member, when it does not have a __counted_by annotation.
Parameters
type: structure type name, including “struct” keyword.
name: Name for a variable to define.
member: Name of the array member.
count: Number of elements in the array; must be compile-time const.
Description
Define a zeroed, on-stack instance of type structure with a trailing flexible array member. Use __struct_size(name) to get its compile-time size afterwards. Use __member_size(name->member) to get the compile-time size of name members. Use STACK_FLEX_ARRAY_SIZE(name, member) to get the compile-time number of elements in array member.
- DEFINE_FLEX
DEFINE_FLEX(TYPE, NAME, MEMBER, COUNTER, COUNT)
Define an on-stack instance of structure with a trailing flexible array member.
Parameters
TYPE: structure type name, including “struct” keyword.
NAME: Name for a variable to define.
MEMBER: Name of the array member.
COUNTER: Name of the __counted_by member.
COUNT: Number of elements in the array; must be compile-time const.
Description
Define a zeroed, on-stack instance of TYPE structure with a trailing flexible array member. Use __struct_size(NAME) to get its compile-time size afterwards. Use __member_size(NAME->MEMBER) to get the compile-time size of NAME members. Use STACK_FLEX_ARRAY_SIZE(NAME, MEMBER) to get the compile-time number of elements in array MEMBER.
- STACK_FLEX_ARRAY_SIZE
STACK_FLEX_ARRAY_SIZE(name, array)
helper macro for DEFINE_FLEX() family. Returns the number of elements in array.
Parameters
name: Name for a variable defined in DEFINE_RAW_FLEX()/DEFINE_FLEX().
array: Name of the array member.
CRC Functions
- uint8_t crc4(uint8_t c, uint64_t x, int bits)
calculate the 4-bit crc of a value.
Parameters
uint8_t c: starting crc4
uint64_t x: value to checksum
int bits: number of bits in x to checksum
Description
Returns the crc4 value of x, using polynomial 0b10111.
The x value is treated as left-aligned, and any bits of x above the lowest bits bits are ignored in the crc calculations.
- u8 crc7_be(u8 crc, const u8 *buffer, size_t len)
update the CRC7 for the data buffer
Parameters
u8 crc: previous CRC7 value
const u8 *buffer: data pointer
size_t len: number of bytes in the buffer
Context
any
Description
Returns the updated CRC7 value. The CRC7 is left-aligned in the byte (the lsbit is always 0), as that makes the computation easier, and all callers want it in that form.
- void crc8_populate_msb(u8 table[CRC8_TABLE_SIZE], u8 polynomial)
fill crc table for given polynomial in reverse bit order.
Parameters
u8 table[CRC8_TABLE_SIZE]: table to be filled.
u8 polynomial: polynomial for which table is to be filled.
- void crc8_populate_lsb(u8 table[CRC8_TABLE_SIZE], u8 polynomial)
fill crc table for given polynomial in regular bit order.
Parameters
u8 table[CRC8_TABLE_SIZE]: table to be filled.
u8 polynomial: polynomial for which table is to be filled.
- u8 crc8(const u8 table[CRC8_TABLE_SIZE], const u8 *pdata, size_t nbytes, u8 crc)
calculate a crc8 over the given input data.
Parameters
const u8 table[CRC8_TABLE_SIZE]: crc table used for calculation.
const u8 *pdata: pointer to data buffer.
size_t nbytes: number of bytes in data buffer.
u8 crc: previous returned crc8 value.
- u16 crc16(u16 crc, const u8 *p, size_t len)
compute the CRC-16 for the data buffer
Parameters
u16 crc: previous CRC value
const u8 *p: data pointer
size_t len: number of bytes in the buffer
Description
Returns the updated CRC value.
- u16 crc_ccitt(u16 crc, u8 const *buffer, size_t len)
recompute the CRC (CRC-CCITT variant) for the data buffer
Parameters
u16 crc: previous CRC value
u8 const *buffer: data pointer
size_t len: number of bytes in the buffer
- u16 crc_itu_t(u16 crc, const u8 *buffer, size_t len)
Compute the CRC-ITU-T for the data buffer
Parameters
u16 crc: previous CRC value
const u8 *buffer: data pointer
size_t len: number of bytes in the buffer
Description
Returns the updated CRC value.
- u32 crc32_le(u32 crc, const void *p, size_t len)
Compute least-significant-bit-first IEEE CRC-32
Parameters
u32 crc: Initial CRC value. ~0 (recommended) or 0 for a new CRC computation, or the previous CRC value if computing incrementally.
const void *p: Pointer to the data buffer
size_t len: Length of data in bytes
Description
This implements the CRC variant that is often known as the IEEE CRC-32, or simply CRC-32, and is widely used in Ethernet and other applications:
Polynomial: x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + x^0
Bit order: Least-significant-bit-first
Polynomial in integer form: 0xedb88320
This does not invert the CRC at the beginning or end. The caller is expected to do that if it needs to. Inverting at both ends is recommended.
For new applications, prefer to use CRC-32C instead. See crc32c().
Context
Any context
Return
The new CRC value
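A bit-at-a-time sketch of the LSB-first CRC-32 described above follows, using the reflected polynomial 0xedb88320. The kernel uses faster table-driven or hardware-accelerated code, so treat this only as a model of the arithmetic; sketch_crc32_le() is a hypothetical name. As documented for crc32_le(), the function itself does not invert, and the caller applies ~ at both ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise LSB-first CRC-32 with the reflected IEEE polynomial.
 * Performs no pre- or post-inversion itself. */
static uint32_t sketch_crc32_le(uint32_t crc, const void *p, size_t len)
{
	const uint8_t *buf = p;

	while (len--) {
		crc ^= *buf++;
		for (int i = 0; i < 8; i++)
			/* Shift right; if the bit shifted out was 1, fold in the
			 * polynomial. -(crc & 1u) is either 0 or all ones. */
			crc = (crc >> 1) ^ (0xedb88320u & -(crc & 1u));
	}
	return crc;
}
```

With inversion at both ends, ~sketch_crc32_le(~0u, "123456789", 9) yields 0xcbf43926, the standard CRC-32 check value. Feeding the data in pieces, passing each call's return value as the next call's crc, gives the same result as a single call over the whole buffer, which is what the incremental use of the crc parameter relies on.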
- u32 crc32_be(u32 crc, const void *p, size_t len)
Compute most-significant-bit-first IEEE CRC-32
Parameters
u32 crc: Initial CRC value. ~0 (recommended) or 0 for a new CRC computation, or the previous CRC value if computing incrementally.
const void *p: Pointer to the data buffer
size_t len: Length of data in bytes
Description
crc32_be() is the same as crc32_le() except that crc32_be() computes the most-significant-bit-first variant of the CRC. I.e., within each byte, the most significant bit is processed first (treated as highest order polynomial coefficient). The same bit order is also used for the CRC value itself:
Polynomial: x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + x^0
Bit order: Most-significant-bit-first
Polynomial in integer form: 0x04c11db7
Context
Any context
Return
The new CRC value
- u32 crc32c(u32 crc, const void *p, size_t len)
Compute CRC-32C
Parameters
u32 crc: Initial CRC value. ~0 (recommended) or 0 for a new CRC computation, or the previous CRC value if computing incrementally.
const void *p: Pointer to the data buffer
size_t len: Length of data in bytes
Description
This implements CRC-32C, i.e. the Castagnoli CRC. This is the recommended CRC variant to use in new applications that want a 32-bit CRC.
Polynomial: x^32 + x^28 + x^27 + x^26 + x^25 + x^23 + x^22 + x^20 + x^19 + x^18 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + x^0
Bit order: Least-significant-bit-first
Polynomial in integer form: 0x82f63b78
This does not invert the CRC at the beginning or end. The caller is expected to do that if it needs to. Inverting at both ends is recommended.
Context
Any context
Return
The new CRC value
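The bit-at-a-time structure used for an LSB-first CRC-32 carries over to CRC-32C unchanged; only the reflected polynomial becomes 0x82f63b78. This is again a hypothetical sketch (sketch_crc32c()) rather than the kernel's accelerated code, and the caller performs the recommended inversion at both ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), LSB-first with the reflected polynomial
 * 0x82f63b78. No inversion is performed here. */
static uint32_t sketch_crc32c(uint32_t crc, const void *p, size_t len)
{
	const uint8_t *buf = p;

	while (len--) {
		crc ^= *buf++;
		for (int i = 0; i < 8; i++)
			/* Fold in the polynomial whenever a 1 bit is shifted out. */
			crc = (crc >> 1) ^ (0x82f63b78u & -(crc & 1u));
	}
	return crc;
}
```

With that convention, ~sketch_crc32c(~0u, "123456789", 9) gives 0xe3069283, the published CRC-32C check value.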
- u64 crc64_be(u64 crc, const void *p, size_t len)
Calculate bitwise big-endian ECMA-182 CRC64
Parameters
u64 crc: seed value for computation. 0 or (u64)~0 for a new CRC calculation, or the previous crc64 value if computing incrementally.
const void *p: pointer to buffer over which CRC64 is run
size_t len: length of buffer p
- u64 crc64_nvme(u64 crc, const void *p, size_t len)
Calculate CRC64-NVME
Parameters
u64 crc: seed value for computation. 0 for a new CRC calculation, or the previous crc64 value if computing incrementally.
const void *p: pointer to buffer over which CRC64 is run
size_t len: length of buffer p
Description
This computes the CRC64 defined in the NVME NVM Command Set Specification, including the bitwise inversion at the beginning and end.
Base 2 log and power Functions
- bool is_power_of_2(unsigned long n)
check if a value is a power of two
Parameters
unsigned long n: the value to check
Description
Determine whether some value is a power of two, where zero is not considered a power of two.
Return
true if n is a power of 2, otherwise false.
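The usual bit trick behind this check is shown below as a standalone sketch. sketch_is_power_of_2() is a hypothetical name; it follows the documented semantics, including the explicit rejection of zero.

```c
#include <assert.h>
#include <stdbool.h>

/* A power of two has exactly one bit set, so clearing its lowest set
 * bit with n & (n - 1) leaves zero. Zero itself is excluded explicitly,
 * because 0 & (0 - 1) is also zero. */
static bool sketch_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```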
- unsigned long __roundup_pow_of_two(unsigned long n)
round up to nearest power of two
Parameters
unsigned long n: value to round up
- unsigned long __rounddown_pow_of_two(unsigned long n)
round down to nearest power of two
Parameters
unsigned long n: value to round down
- const_ilog2
const_ilog2(n)
log base 2 of a 32-bit or 64-bit constant unsigned value
Parameters
n: parameter
Description
Use this where sparse expects a true constant expression, e.g. for array indices.
- ilog2
ilog2(n)
log base 2 of a 32-bit or 64-bit unsigned value
Parameters
n: parameter
Description
Constant-capable log base 2 calculation. This can be used to initialise global variables from constant data, hence the massive ternary operator construction.
It selects the appropriately-sized optimised version depending on sizeof(n).
- roundup_pow_of_two
roundup_pow_of_two(n)
round the given value up to nearest power of two
Parameters
n: parameter
Description
Round the given value up to the nearest power of two. The result is undefined when n == 0. This can be used to initialise global variables from constant data.
- rounddown_pow_of_two
rounddown_pow_of_two(n)
round the given value down to nearest power of two
Parameters
n: parameter
Description
Round the given value down to the nearest power of two. The result is undefined when n == 0. This can be used to initialise global variables from constant data.
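Both rounding operations can be expressed through the position of the most significant set bit (fls). The runtime sketch below assumes GCC or Clang for __builtin_clzl(). sketch_fls_long() and the other sketch_* names are hypothetical, and unlike the kernel macros the sketch makes no attempt to fold constant arguments at compile time.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* 1-based index of the most significant set bit, 0 for n == 0.
 * __builtin_clzl() is undefined for 0, hence the guard. */
static unsigned int sketch_fls_long(unsigned long n)
{
	return n ? (unsigned int)(SKETCH_BITS_PER_LONG - __builtin_clzl(n)) : 0;
}

/* Smallest power of two >= n. Undefined for 0 (asserted here) and for
 * n above the largest representable power of two. */
static unsigned long sketch_roundup_pow_of_two(unsigned long n)
{
	assert(n != 0);
	return 1UL << sketch_fls_long(n - 1);
}

/* Largest power of two <= n; undefined for n == 0. */
static unsigned long sketch_rounddown_pow_of_two(unsigned long n)
{
	assert(n != 0);
	return 1UL << (sketch_fls_long(n) - 1);
}
```

Rounding n - 1 rather than n in the round-up case is what keeps exact powers of two unchanged: for n = 8, n - 1 = 7 has its top bit at position 3, so the result is 1 << 3 = 8.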
- order_base_2
order_base_2(n)
calculate the (rounded up) base 2 order of the argument
Parameters
n: parameter
Description
The first few values calculated by this routine:
ob2(0) = 0
ob2(1) = 0
ob2(2) = 1
ob2(3) = 2
ob2(4) = 2
ob2(5) = 3
... and so on.
- bits_per
bits_per(n)
calculate the number of bits required for the argument
Parameters
n: parameter
Description
This is constant-capable and can be used for compile time initializations, e.g. bitfields.
The first few values calculated by this routine:
bf(0) = 1
bf(1) = 1
bf(2) = 2
bf(3) = 2
bf(4) = 3
... and so on.
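Runtime equivalents of ilog2(), order_base_2() and bits_per() are sketched below for unsigned long arguments. They reproduce the two tables above but do not attempt the constant-expression behaviour the kernel macros provide. The names are hypothetical, __builtin_clzl() assumes GCC or Clang, and the identities used (order_base_2(n) = ilog2(n - 1) + 1 for n > 1, and bits_per(n) = ilog2(n) + 1 for n >= 2) are our own formulation, checked against the tables.

```c
#include <assert.h>
#include <limits.h>

/* floor(log2(n)) for n != 0. */
static unsigned int sketch_ilog2(unsigned long n)
{
	assert(n != 0);
	return (unsigned int)(sizeof(unsigned long) * CHAR_BIT - 1) -
	       (unsigned int)__builtin_clzl(n);
}

/* ceil(log2(n)), with 0 and 1 both mapping to 0 as in the ob2 table. */
static unsigned int sketch_order_base_2(unsigned long n)
{
	return n > 1 ? sketch_ilog2(n - 1) + 1 : 0;
}

/* Number of bits needed to hold n; 0 still needs one bit. */
static unsigned int sketch_bits_per(unsigned long n)
{
	return n < 2 ? 1 : sketch_ilog2(n) + 1;
}
```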
- unsigned int max_pow_of_two_factor(unsigned int n)
return highest power-of-2 factor
Parameters
unsigned int n: parameter
Description
Find the highest power of 2 that evenly divides n. 0 is returned for n == 0.
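In two's complement, -n flips every bit above the lowest set bit of n, so n & -n isolates that bit, which is exactly the largest power of 2 dividing n. The one-line sketch below uses a hypothetical name; with this formulation n == 0 yields 0 and n == 1 yields 1 (that is, 2^0).

```c
#include <assert.h>

/* Largest power of 2 dividing n: n & -n keeps only the lowest set bit.
 * Negation of an unsigned value is well defined (modulo 2^N) in C. */
static unsigned int sketch_max_pow_of_two_factor(unsigned int n)
{
	return n & -n;
}
```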
Integer log and power Functions
- unsigned int intlog2(u32 value)
computes log2 of a value; the result is shifted left by 24 bits
Parameters
u32 value: The value (must be != 0)
Description
To use rational values, you can use the following method:
intlog2(value) = intlog2(value * 2^x) - x * 2^24
Some use-case examples:
intlog2(8) will give 3 << 24 = 3 * 2^24
intlog2(9) will give (3 << 24) + ... = 3.16... * 2^24
intlog2(1.5) = intlog2(3) - 2^24 = 0.584... * 2^24
Return
log2(value) * 2^24
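To show what the 8.24 fixed-point result represents, here is a sketch using the classic bit-by-bit method: the integer part is the index of the most significant set bit, and each fractional bit comes from squaring the normalised mantissa. This is a swapped-in technique, not the kernel's table-based implementation, so its low-order bits may differ slightly. sketch_intlog2() is hypothetical, and __builtin_clz() assumes GCC or Clang with a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* log2(value) * 2^24, computed by repeated squaring of the mantissa. */
static uint32_t sketch_intlog2(uint32_t value)
{
	assert(value != 0);

	/* Integer part: position of the most significant set bit. */
	uint32_t msb = 31 - (uint32_t)__builtin_clz(value);
	uint32_t result = msb << 24;

	/* Mantissa value / 2^msb lies in [1, 2); hold it as Q1.31. */
	uint64_t x = (uint64_t)value << (31 - msb);

	for (int bit = 23; bit >= 0; bit--) {
		/* Squaring doubles the mantissa's log; x < 2^32, so the
		 * product fits in 64 bits. */
		x = (x * x) >> 31;
		if (x >= (1ull << 32)) {	/* mantissa >= 2: emit a 1 bit */
			result |= 1u << bit;
			x >>= 1;
		}
	}
	return result;
}
```

The rational-value trick from the description works unchanged with this sketch: sketch_intlog2(3) - (1u << 24) approximates log2(1.5) * 2^24, about 0.585 * 2^24.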
- unsigned int intlog10(u32 value)
computes log10 of a value; the result is shifted left by 24 bits
Parameters
u32 value: The value (must be != 0)
Description
To use rational values, you can use the following method:
intlog10(value) = intlog10(value * 10^x) - x * 2^24
A use-case example:
intlog10(1000) will give 3 << 24 = 3 * 2^24
Due to the implementation, intlog10(1000) might not be exactly 3 * 2^24.
See intlog2 for similar examples.
Return
log10(value) * 2^24
- u64 int_pow(u64 base, unsigned int exp)
computes the exponentiation of the given base and exponent
Parameters
u64 base: base which will be raised to the given power
unsigned int exp: power to be raised to
Description
Computes: pow(base, exp), i.e. base raised to the exp power
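Exponentiation by squaring needs only O(log exp) multiplications instead of exp of them. The sketch below (hypothetical sketch_int_pow()) illustrates the idea; in this sketch, results that do not fit in 64 bits wrap modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

/* Walk the bits of exp from least to most significant, squaring base
 * at each step and multiplying it into the result when the bit is set. */
static uint64_t sketch_int_pow(uint64_t base, unsigned int exp)
{
	uint64_t result = 1;

	while (exp) {
		if (exp & 1)
			result *= base;
		exp >>= 1;
		base *= base;
	}
	return result;
}
```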
- unsigned long int_sqrt(unsigned long x)
computes the integer square root
Parameters
unsigned long x: integer of which to calculate the sqrt
Description
Computes: floor(sqrt(x))
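A floating-point-free way to compute floor(sqrt(x)) is the digit-by-digit (shift-and-subtract) method, which settles one bit of the root per iteration. This is a sketch with the hypothetical name sketch_int_sqrt(), assuming GCC or Clang for __builtin_clzl(); the kernel implementation may differ in detail.

```c
#include <assert.h>
#include <limits.h>

static unsigned long sketch_int_sqrt(unsigned long x)
{
	unsigned long m, y = 0;

	if (x <= 1)
		return x;

	/* m starts at the largest power of four that is <= x. */
	m = 1UL << ((sizeof(unsigned long) * CHAR_BIT - 1 -
		     (unsigned int)__builtin_clzl(x)) & ~1UL);

	while (m != 0) {
		unsigned long b = y + m;

		y >>= 1;
		if (x >= b) {	/* this bit belongs in the root */
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}
```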
- u32 int_sqrt64(u64 x)
strongly typed int_sqrt function when minimum 64 bit input is expected.
Parameters
u64 x: 64bit integer of which to calculate the sqrt
Division Functions
- do_div
do_div(n, base)
returns 2 values: calculate remainder and update new dividend
Parameters
n: uint64_t dividend (will be updated)
base: uint32_t divisor
Description
Summary:
uint32_t remainder = n % base;
n = n / base;
Return
(uint32_t)remainder
NOTE
macro parameter n is evaluated multiple times, beware of side effects!
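do_div() hands back two results through one macro: the remainder as its value and the quotient by overwriting n. The hypothetical sketch_do_div() below spells out that contract as an ordinary userspace function taking a pointer, which also makes the single evaluation of the dividend explicit. It is an illustration only; kernel code should use the real macro on a u64 lvalue.

```c
#include <assert.h>
#include <stdint.h>

/* *n becomes the quotient; the remainder is returned. */
static uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
	uint32_t remainder;

	assert(base != 0);
	remainder = (uint32_t)(*n % base);
	*n /= base;
	return remainder;
}
```

A typical use splits a nanosecond count into whole seconds and leftover nanoseconds: after rem = sketch_do_div(&ns, 1000000000u), ns holds the seconds and rem the nanoseconds. In kernel code the equivalent is rem = do_div(ns, NSEC_PER_SEC); and, per the NOTE above, ns should be a plain variable rather than an expression with side effects.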
- u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
unsigned 64bit divide with 32bit divisor with remainder
Parameters
u64 dividend: unsigned 64bit dividend
u32 divisor: unsigned 32bit divisor
u32 *remainder: pointer to unsigned 32bit remainder
Return
sets *remainder, then returns dividend / divisor
Description
This is commonly provided by 32bit archs to provide an optimized 64bit divide.
- s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder)
signed 64bit divide with 32bit divisor with remainder
Parameters
s64 dividend: signed 64bit dividend
s32 divisor: signed 32bit divisor
s32 *remainder: pointer to signed 32bit remainder
Return
sets *remainder, then returns dividend / divisor
- u64 div64_u64_rem(u64 dividend, u64 divisor, u64 *remainder)
unsigned 64bit divide with 64bit divisor and remainder
Parameters
u64 dividend: unsigned 64bit dividend
u64 divisor: unsigned 64bit divisor
u64 *remainder: pointer to unsigned 64bit remainder
Return
sets *remainder, then returns dividend / divisor
- u64 div64_u64(u64 dividend, u64 divisor)
unsigned 64bit divide with 64bit divisor
Parameters
u64 dividend: unsigned 64bit dividend
u64 divisor: unsigned 64bit divisor
Return
dividend / divisor
- s64 div64_s64(s64 dividend, s64 divisor)
signed 64bit divide with 64bit divisor
Parameters
s64 dividend: signed 64bit dividend
s64 divisor: signed 64bit divisor
Return
dividend / divisor
- u64 div_u64(u64 dividend, u32 divisor)
unsigned 64bit divide with 32bit divisor
Parameters
u64 dividend: unsigned 64bit dividend
u32 divisor: unsigned 32bit divisor
Description
This is the most common 64bit divide and should be used if possible, as many 32bit archs can optimize this variant better than a full 64bit divide.
Return
dividend / divisor
- s64 div_s64(s64 dividend, s32 divisor)
signed 64bit divide with 32bit divisor
Parameters
s64 dividend: signed 64bit dividend
s32 divisor: signed 32bit divisor
Return
dividend / divisor
- u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d)
unsigned 64bit multiply, add, and divide
Parameters
u64 a: first unsigned 64bit multiplicand
u64 b: second unsigned 64bit multiplicand
u64 c: unsigned 64bit addend
u64 d: unsigned 64bit divisor
Description
Multiply two 64bit values together to generate a 128bit product, add a third value and then divide by a fourth. The generic code divides by 0 if d is zero and returns ~0 on overflow. Architecture-specific code may trap on zero or overflow.
Return
(a * b + c) / d
- mul_u64_u64_div_u64
mul_u64_u64_div_u64(a, b, d)
unsigned 64bit multiply and divide
Parameters
a: first unsigned 64bit multiplicand
b: second unsigned 64bit multiplicand
d: unsigned 64bit divisor
Description
Multiply two 64bit values together to generate a 128bit product and then divide by a third value. The generic code divides by 0 if d is zero and returns ~0 on overflow. Architecture-specific code may trap on zero or overflow.
Return
a * b / d
- mul_u64_u64_div_u64_roundup
mul_u64_u64_div_u64_roundup(a, b, d)
unsigned 64bit multiply and divide rounded up
Parameters
a: first unsigned 64bit multiplicand
b: second unsigned 64bit multiplicand
d: unsigned 64bit divisor
Description
Multiply two 64bit values together to generate a 128bit product and then divide and round up. The generic code divides by 0 if d is zero and returns ~0 on overflow. Architecture-specific code may trap on zero or overflow.
Return
(a * b + d - 1) / d
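Where the compiler offers a 128-bit integer type, this whole family reduces to one helper. The sketch assumes GCC or Clang on a 64-bit target (for unsigned __int128); the sketch_* names are hypothetical. It saturates to ~0 on overflow, as the generic code is documented to do, but asserts rather than dividing by zero. The two macro variants are expressed as c = 0 and c = d - 1, which matches their documented return values.

```c
#include <assert.h>
#include <stdint.h>

/* (a * b + c) / d with a 128-bit intermediate. a * b + c is at most
 * 2^128 - 2^64, so the intermediate cannot wrap. */
static uint64_t sketch_mul_u64_add_u64_div_u64(uint64_t a, uint64_t b,
					       uint64_t c, uint64_t d)
{
	unsigned __int128 n = (unsigned __int128)a * b + c;
	unsigned __int128 q;

	assert(d != 0);
	q = n / d;
	return q > UINT64_MAX ? ~(uint64_t)0 : (uint64_t)q;	/* saturate */
}

static uint64_t sketch_mul_u64_u64_div_u64(uint64_t a, uint64_t b, uint64_t d)
{
	return sketch_mul_u64_add_u64_div_u64(a, b, 0, d);
}

static uint64_t sketch_mul_u64_u64_div_u64_roundup(uint64_t a, uint64_t b,
						   uint64_t d)
{
	return sketch_mul_u64_add_u64_div_u64(a, b, d - 1, d);
}
```

The point of the 128-bit intermediate is cases such as scaling a large 64-bit counter by a ratio: (1 << 40) * (1 << 40) / (1 << 30) overflows if computed in 64 bits, but the sketch returns 1 << 50 exactly.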
- DIV64_U64_ROUND_UP
DIV64_U64_ROUND_UP(ll, d)
unsigned 64bit divide with 64bit divisor rounded up
Parameters
ll: unsigned 64bit dividend
d: unsigned 64bit divisor
Description
Divide unsigned 64bit dividend by unsigned 64bit divisor and round up.
Return
dividend / divisor rounded up
- DIV_U64_ROUND_UP
DIV_U64_ROUND_UP(ll, d)
unsigned 64bit divide with 32bit divisor rounded up
Parameters
ll: unsigned 64bit dividend
d: unsigned 32bit divisor
Description
Divide unsigned 64bit dividend by unsigned 32bit divisor and round up.
Return
dividend / divisor rounded up
- DIV64_U64_ROUND_CLOSEST
DIV64_U64_ROUND_CLOSEST(dividend, divisor)
unsigned 64bit divide with 64bit divisor rounded to nearest integer
Parameters
dividend: unsigned 64bit dividend
divisor: unsigned 64bit divisor
Description
Divide unsigned 64bit dividend by unsigned 64bit divisor and round to closest integer.
Return
dividend / divisor rounded to nearest integer
- DIV_U64_ROUND_CLOSEST
DIV_U64_ROUND_CLOSEST(dividend, divisor)
unsigned 64bit divide with 32bit divisor rounded to nearest integer
Parameters
dividend: unsigned 64bit dividend
divisor: unsigned 32bit divisor
Description
Divide unsigned 64bit dividend by unsigned 32bit divisor and round to closest integer.
Return
dividend / divisor rounded to nearest integer
- DIV_S64_ROUND_CLOSEST
DIV_S64_ROUND_CLOSEST(dividend, divisor)
signed 64bit divide with 32bit divisor rounded to nearest integer
Parameters
dividend: signed 64bit dividend
divisor: signed 32bit divisor
Description
Divide signed 64bit dividend by signed 32bit divisor and round to closest integer.
Return
dividend / divisor rounded to nearest integer
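Round-to-nearest division is commonly implemented by offsetting the dividend by half the divisor before truncating. The sketch below uses that formulation for the unsigned 64-bit case and the signed case; the names are hypothetical, and whether the kernel macros are written exactly this way is an assumption. In the signed case the offset is applied away from zero, so exact halves round away from zero.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned: adding half the divisor before truncating rounds halves up. */
static uint64_t sketch_div64_u64_round_closest(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return (dividend + divisor / 2) / divisor;
}

/* Signed: move half the divisor in the direction of the true quotient's
 * sign; C division truncates toward zero, so halves round away from 0. */
static int64_t sketch_div_s64_round_closest(int64_t dividend, int32_t divisor)
{
	assert(divisor != 0);
	if ((dividend > 0) == (divisor > 0))
		return (dividend + divisor / 2) / divisor;
	return (dividend - divisor / 2) / divisor;
}
```

With this formulation the offset addition can itself overflow for dividends within half a divisor of the type's limit, so callers near those limits need a different approach.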
- u64 roundup_u64(u64 x, u32 y)
Round up a 64bit value to the next specified 32bit multiple
Parameters
u64 x: the value to round up
u32 y: 32bit multiple to round up to
Description
Rounds x to the next multiple of y. For 32bit x values, see roundup() and the faster round_up() for powers of 2.
Return
rounded up value.
- unsigned long gcd(unsigned long a, unsigned long b)
calculate and return the greatest common divisor of 2 unsigned longs
Parameters
unsigned long a: first value
unsigned long b: second value
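As an illustration of the contract only, here is the classic Euclidean algorithm as a sketch (hypothetical sketch_gcd()). The in-kernel implementation may use a different algorithm, such as a binary GCD, so do not read this as its source.

```c
#include <assert.h>

/* Euclid: gcd(a, b) = gcd(b, a mod b), stopping when b reaches 0.
 * gcd(a, 0) is a, and gcd(0, 0) comes out as 0. */
static unsigned long sketch_gcd(unsigned long a, unsigned long b)
{
	while (b != 0) {
		unsigned long r = a % b;

		a = b;
		b = r;
	}
	return a;
}
```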
UUID/GUID
- void generate_random_uuid(unsigned char uuid[16])
generate a random UUID
Parameters
unsigned char uuid[16]: where to put the generated UUID
Description
Random UUID interface
Used to create a Boot ID or a filesystem UUID/GUID, but can be useful for other kernel drivers.
- bool uuid_is_valid(const char *uuid)
checks if a UUID string is valid
Parameters
const char *uuid: UUID string to check
Description
It checks whether the UUID string follows the format:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
where x is a hex digit.
Return
true if input is a valid UUID string.
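The format check amounts to a fixed-position scan: hyphens at offsets 8, 13, 18 and 23, and hex digits at every other offset of the first 36 characters. A hypothetical userspace sketch follows. It only inspects those 36 characters, so it does not reject trailing text after a valid UUID; a shorter string is rejected safely because its terminating '\0' fails both tests before the scan can run past it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

static bool sketch_uuid_is_valid(const char *uuid)
{
	for (unsigned int i = 0; i < 36; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (uuid[i] != '-')
				return false;
		} else if (!isxdigit((unsigned char)uuid[i])) {
			return false;
		}
	}
	return true;
}
```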
Kernel IPC facilities
IPC utilities
- int ipc_init(void)
initialise ipc subsystem
Parameters
void: no arguments
Description
The various sysv ipc resources (semaphores, messages and shared memory) are initialised.
A callback routine is registered into the memory hotplug notifier chain: since msgmni scales to lowmem, this callback routine will be called upon successful memory add / remove to recompute msgmni.
- void ipc_init_ids(struct ipc_ids *ids)
initialise ipc identifiers
Parameters
struct ipc_ids *ids: ipc identifier set
Description
Set up the sequence range to use for the ipc identifier range (limited below ipc_mni), then initialise the keys hashtable and ids idr.
- void ipc_init_proc_interface(const char *path, const char *header, int ids, int (*show)(struct seq_file *, void *))
create a proc interface for sysvipc types using a seq_file interface.
Parameters
const char *path: Path in procfs
const char *header: Banner to be printed at the beginning of the file.
int ids: ipc id table to iterate.
int (*show)(struct seq_file *, void *): show routine.
- struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
find a key in an ipc identifier set
Parameters
struct ipc_ids *ids: ipc identifier set
key_t key: key to find
Description
Returns the locked pointer to the ipc structure if found or NULL otherwise. If the key is found, ipc points to the owning ipc structure.
Called with writer ipc_ids.rwsem held.
- int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
add an ipc identifier
Parameters
struct ipc_ids *ids: ipc identifier set
struct kern_ipc_perm *new: new ipc permission set
int limit: limit for the number of used ids
Description
Add an entry ‘new’ to the ipc ids idr. The permissions object is initialised, the first free entry is set up, and the assigned index is returned. The ‘new’ entry is returned in a locked state on success.
On failure the entry is not locked and a negative err-code is returned. The caller must use ipc_rcu_putref() to free the identifier.
Called with writer ipc_ids.rwsem held.
- int ipcget_new(struct ipc_namespace *ns, struct ipc_ids *ids, const struct ipc_ops *ops, struct ipc_params *params)
create a new ipc object
Parameters
struct ipc_namespace *ns: ipc namespace
struct ipc_ids *ids: ipc identifier set
const struct ipc_ops *ops: the actual creation routine to call
struct ipc_params *params: its parameters
Description
This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is IPC_PRIVATE.
- int ipc_check_perms(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp, const struct ipc_ops *ops, struct ipc_params *params)
check security and permissions for an ipc object
Parameters
struct ipc_namespace *ns: ipc namespace
struct kern_ipc_perm *ipcp: ipc permission set
const struct ipc_ops *ops: the actual security routine to call
struct ipc_params *params: its parameters
Description
This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is not IPC_PRIVATE and that key already exists in the ids IDR.
On success, the ipc id is returned.
It is called with ipc_ids.rwsem and ipcp->lock held.
- int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids, const struct ipc_ops *ops, struct ipc_params *params)
get an ipc object or create a new one
Parameters
struct ipc_namespace *ns: ipc namespace
struct ipc_ids *ids: ipc identifier set
const struct ipc_ops *ops: the actual creation routine to call
struct ipc_params *params: its parameters
Description
This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is not IPC_PRIVATE. It adds a new entry if the key is not found and does some permission / security checks if the key is found.
On success, the ipc id is returned.
- void ipc_kht_remove(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
remove an ipc from the key hashtable
Parameters
struct ipc_ids *ids: ipc identifier set
struct kern_ipc_perm *ipcp: ipc perm structure containing the key to remove
Description
ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on exit.
- int ipc_search_maxidx(struct ipc_ids *ids, int limit)
search for the highest assigned index
Parameters
struct ipc_ids *ids: ipc identifier set
int limit: known upper limit for highest assigned index
Description
The function determines the highest assigned index in ids. It is intended to be called when ids->max_idx needs to be updated. Updating ids->max_idx is necessary when the current highest index ipc object is deleted. If no ipc object is allocated, then -1 is returned.
ipc_ids.rwsem needs to be held by the caller.
- void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
remove an ipc identifier
Parameters
struct ipc_ids *ids: ipc identifier set
struct kern_ipc_perm *ipcp: ipc perm structure containing the identifier to remove
Description
ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on exit.
- void ipc_set_key_private(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
switch the key of an existing ipc to IPC_PRIVATE
Parameters
struct ipc_ids *ids: ipc identifier set
struct kern_ipc_perm *ipcp: ipc perm structure containing the key to modify
Description
ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on exit.
- int ipcperms(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp, short flag)
check ipc permissions
Parameters
struct ipc_namespace *ns: ipc namespace
struct kern_ipc_perm *ipcp: ipc permission set
short flag: desired permission set
Description
Check user, group, other permissions for access to ipc resources. Returns 0 if allowed.
flag will most probably be 0 or S_...UGO from <linux/stat.h>
- void kernel_to_ipc64_perm(struct kern_ipc_perm *in, struct ipc64_perm *out)
convert kernel ipc permissions to user
Parameters
struct kern_ipc_perm *in: kernel permissions
struct ipc64_perm *out: new style ipc permissions
Description
Turn the kernel object in into a set of permission descriptions for returning to userspace (out).
- void ipc64_perm_to_ipc_perm(struct ipc64_perm *in, struct ipc_perm *out)
convert new ipc permissions to old
Parameters
struct ipc64_perm *in: new style ipc permissions
struct ipc_perm *out: old style ipc permissions
Description
Turn the new style permissions object in into a compatibility object and store it into the out pointer.
- struct kern_ipc_perm *ipc_obtain_object_idr(struct ipc_ids *ids, int id)
Look for an id in the ipc ids idr and return associated ipc object.
Parameters
struct ipc_ids *ids: ipc identifier set
int id: ipc id to look for
Description
Call inside the RCU critical section. The ipc object is not locked on exit.
- struct kern_ipc_perm *ipc_obtain_object_check(struct ipc_ids *ids, int id)
Similar to ipc_obtain_object_idr() but also checks the ipc object sequence number.
Parameters
struct ipc_ids *ids: ipc identifier set
int id: ipc id to look for
Description
Call inside the RCU critical section. The ipc object is not locked on exit.
- int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids, const struct ipc_ops *ops, struct ipc_params *params)
Common sys_*get() code
Parameters
struct ipc_namespace *ns: namespace
struct ipc_ids *ids: ipc identifier set
const struct ipc_ops *ops: operations to be called on ipc object creation, permission checks and further checks
struct ipc_params *params: the parameters needed by the previous operations.
Description
Common routine called by sys_msgget(), sys_semget() and sys_shmget().
- int ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out)
update the permissions of an ipc object
Parameters
struct ipc64_perm *in: the permission given as input.
struct kern_ipc_perm *out: the permission of the ipc to set.
- struct kern_ipc_perm *ipcctl_obtain_check(struct ipc_namespace *ns, struct ipc_ids *ids, int id, int cmd, struct ipc64_perm *perm, int extra_perm)
retrieve an ipc object and check permissions
Parameters
struct ipc_namespace *ns: ipc namespace
struct ipc_ids *ids: the table of ids where to look for the ipc
int id: the id of the ipc to retrieve
int cmd: the cmd to check
struct ipc64_perm *perm: the permission to set
int extra_perm: one extra permission parameter used by msq
Description
This function does some common audit and permission checks for some IPC_XXX cmd and is called from semctl_down(), shmctl_down() and msgctl_down().
It retrieves the ipc object with the given id in the given table, performs some audit and permission checks depending on the given cmd, and returns a pointer to the ipc object or otherwise the corresponding error.
Call holding both the rwsem and the RCU read lock.
- int ipc_parse_version(int *cmd)
ipc call version
Parameters
int *cmd: pointer to command
Description
Return IPC_64 for new style IPC and IPC_OLD for old style IPC. The cmd value is turned from an encoded command and version into just the command code.
- struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t *pos)
Find and lock the ipc structure based on seq pos
Parameters
struct ipc_ids *ids: ipc identifier set
loff_t *pos: expected position
Description
The function finds an ipc structure, based on the sequence file position pos. If there is no ipc structure at position pos, then the successor is selected. If a structure is found, then it is locked (both rcu_read_lock() and ipc_lock_object()) and pos is set to the position needed to locate the found ipc structure. If nothing is found (i.e. EOF), pos is not modified.
The function returns the found ipc structure, or NULL at EOF.
FIFO Buffer
kfifo interface
- DECLARE_KFIFO_PTR
DECLARE_KFIFO_PTR(fifo, type)
macro to declare a fifo pointer object
Parameters
fifo: name of the declared fifo
type: type of the fifo elements
- DECLARE_KFIFO
DECLARE_KFIFO(fifo, type, size)
macro to declare a fifo object
Parameters
fifo: name of the declared fifo
type: type of the fifo elements
size: the number of elements in the fifo; this must be a power of 2
- INIT_KFIFO
INIT_KFIFO(fifo)
Initialize a fifo declared by DECLARE_KFIFO
Parameters
fifo: name of the declared fifo datatype
- DEFINE_KFIFO
DEFINE_KFIFO(fifo, type, size)
macro to define and initialize a fifo
Parameters
fifo: name of the declared fifo datatype
type: type of the fifo elements
size: the number of elements in the fifo; this must be a power of 2
Note
the macro can be used for global and local fifo data type variables.
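The power-of-2 size requirement comes from index arithmetic. A ring buffer of this style keeps two free-running unsigned counters, in and out, and maps them to slots with index & (size - 1). That mask equals index % size, and stays continuous when the counters wrap past the maximum unsigned value, only when size is a power of 2. The following is a hypothetical single-threaded sketch of that scheme (a fixed-size byte buffer, with sketch_fifo_* names invented for illustration), not the kfifo implementation, and it omits the memory barriers a lockless reader/writer pair would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Capacity must be a power of 2 so masking matches modulo across wraps. */
#define SKETCH_FIFO_SIZE 8u

struct sketch_fifo {
	unsigned int in;	/* elements ever written (free-running) */
	unsigned int out;	/* elements ever read (free-running) */
	unsigned char data[SKETCH_FIFO_SIZE];
};

static unsigned int sketch_fifo_len(const struct sketch_fifo *f)
{
	return f->in - f->out;	/* modular subtraction survives wrap-around */
}

static bool sketch_fifo_put(struct sketch_fifo *f, unsigned char c)
{
	if (sketch_fifo_len(f) == SKETCH_FIFO_SIZE)
		return false;	/* full */
	f->data[f->in & (SKETCH_FIFO_SIZE - 1)] = c;
	f->in++;
	return true;
}

static bool sketch_fifo_get(struct sketch_fifo *f, unsigned char *c)
{
	if (f->in == f->out)
		return false;	/* empty */
	*c = f->data[f->out & (SKETCH_FIFO_SIZE - 1)];
	f->out++;
	return true;
}
```

Because in and out are never reset to zero, full and empty are distinguished by the length alone (SIZE versus 0) without sacrificing a slot.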
- kfifo_initialized(fifo)
Check if the fifo is initialized
Parameters
fifo: address of the fifo to check
Description
Return true if fifo is initialized, otherwise false. Assumes the fifo was 0 before.
- kfifo_esize(fifo)
returns the size of the element managed by the fifo
Parameters
fifo: address of the fifo to be used
- kfifo_recsize(fifo)
returns the size of the record length field
Parameters
fifo: address of the fifo to be used
- kfifo_size(fifo)
returns the size of the fifo in elements
Parameters
fifo: address of the fifo to be used
- kfifo_reset(fifo)
removes the entire fifo content
Parameters
fifo: address of the fifo to be used
Note
Usage of kfifo_reset() is dangerous. It should only be called when the fifo is exclusively locked or when it is certain that no other thread is accessing the fifo.
- kfifo_reset_out(fifo)
skip fifo content
Parameters
fifo: address of the fifo to be used
Note
Using kfifo_reset_out() is safe only as long as it is called from the reader thread and there is only one concurrent reader. Otherwise it is dangerous and must be handled in the same way as kfifo_reset().
- kfifo_len(fifo)
returns the number of used elements in the fifo
Parameters
fifo: address of the fifo to be used
- kfifo_is_empty(fifo)
returns true if the fifo is empty
Parameters
fifo: address of the fifo to be used
- kfifo_is_empty_spinlocked(fifo, lock)
returns true if the fifo is empty using a spinlock for locking
Parameters
fifo: address of the fifo to be used
lock: spinlock to be used for locking
- kfifo_is_empty_spinlocked_noirqsave(fifo, lock)
returns true if the fifo is empty using a spinlock for locking, doesn’t disable interrupts
Parameters
fifo: address of the fifo to be used
lock: spinlock to be used for locking
- kfifo_is_full(fifo)
returns true if the fifo is full
Parameters
fifo: address of the fifo to be used
- kfifo_avail(fifo)
returns the number of unused elements in the fifo
Parameters
fifo: address of the fifo to be used
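Because the in and out counters are unsigned and only masked when an element is accessed, the used count can be computed as in - out even after the counters wrap around, and the free count as size - used. A minimal model of that arithmetic, assuming 32-bit unsigned int counters; fifo_len(), fifo_avail(), fifo_is_empty() and fifo_is_full() are hypothetical helpers mirroring kfifo_len(), kfifo_avail(), kfifo_is_empty() and kfifo_is_full():

```c
#include <assert.h>

/* Used elements: unsigned subtraction is well defined modulo 2^N. */
unsigned int fifo_len(unsigned int in, unsigned int out)
{
	return in - out;
}

/* Unused elements in a fifo of `size` elements. */
unsigned int fifo_avail(unsigned int size, unsigned int in, unsigned int out)
{
	assert(fifo_len(in, out) <= size); /* invariant of a valid fifo */
	return size - fifo_len(in, out);
}

int fifo_is_empty(unsigned int in, unsigned int out)
{
	return in == out;
}

int fifo_is_full(unsigned int size, unsigned int in, unsigned int out)
{
	return fifo_len(in, out) == size;
}
```

The wraparound case is the interesting one: with out just below UINT_MAX and in already wrapped to a small value, in - out still yields the correct element count.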
- kfifo_skip_count(fifo, count)
skip output data
Parameters
fifo: address of the fifo to be used
count: count of data to skip
- kfifo_skip(fifo)
skip output data
Parameters
fifo: address of the fifo to be used
- kfifo_peek_len(fifo)
gets the size of the next fifo record
Parameters
fifo: address of the fifo to be used
Description
This function returns the size of the next fifo record in number of bytes.
- kfifo_alloc(fifo, size, gfp_mask)
dynamically allocates a new fifo buffer
Parameters
fifo: pointer to the fifo
size: the number of elements in the fifo; this should be a power of 2
gfp_mask: get_free_pages mask, passed to kmalloc()
Description
This macro dynamically allocates a new fifo buffer.
The number of elements will be rounded up to a power of 2. The fifo will be released with kfifo_free(). Returns 0 if no error, otherwise an error code.
- kfifo_alloc_node(fifo, size, gfp_mask, node)
dynamically allocates a new fifo buffer on a NUMA node
Parameters
fifo: pointer to the fifo
size: the number of elements in the fifo; this should be a power of 2
gfp_mask: get_free_pages mask, passed to kmalloc()
node: NUMA node to allocate memory on
Description
This macro dynamically allocates a new fifo buffer with NUMA node awareness.
The number of elements will be rounded up to a power of 2. The fifo will be released with kfifo_free(). Returns 0 if no error, otherwise an error code.
- kfifo_free(fifo)
frees the fifo
Parameters
fifo: the fifo to be freed
- kfifo_init(fifo, buffer, size)
initialize a fifo using a preallocated buffer
Parameters
fifo: the fifo to assign the buffer
buffer: the preallocated buffer to be used
size: the size of the internal buffer; this has to be a power of 2
Description
This macro initializes a fifo using a preallocated buffer.
The number of elements will be rounded down to a power of 2, since the preallocated buffer cannot grow. Returns 0 if no error, otherwise an error code.
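The two sizing paths differ in direction: kfifo_alloc() rounds the requested element count up to a power of 2, while a preallocated buffer handed to kfifo_init() cannot be enlarged, so lib/kfifo.c rounds that element count down instead. The helpers below are userspace re-implementations written only for illustration (the kernel's own are roundup_pow_of_two() and rounddown_pow_of_two() in <linux/log2.h>), assuming a 32-bit unsigned int.

```c
#include <assert.h>

/* Smallest power of 2 >= n (the kfifo_alloc() direction). */
unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n != 0 && n <= 0x80000000u); /* result must fit in 32 bits */
	while (p < n)
		p <<= 1;
	return p;
}

/* Largest power of 2 <= n (the kfifo_init() direction). */
unsigned int round_down_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n != 0);
	while (p <= n / 2)
		p <<= 1;
	return p;
}
```

For example, a request for 12 elements yields a 16-element fifo from the allocating path, but a 12-element preallocated buffer only provides 8 usable elements.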
- kfifo_put(fifo, val)
put data into the fifo
Parameters
fifo: address of the fifo to be used
val: the data to be added
Description
This macro copies the given value into the fifo. It returns 0 if the fifo was full. Otherwise it returns the number of processed elements.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_get(fifo, val)
get data from the fifo
Parameters
fifo: address of the fifo to be used
val: address where to store the data
Description
This macro reads the data from the fifo. It returns 0 if the fifo was empty. Otherwise it returns the number of processed elements.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_peek(fifo, val)
get data from the fifo without removing
Parameters
fifo: address of the fifo to be used
val: address where to store the data
Description
This reads the data from the fifo without removing it from the fifo. It returns 0 if the fifo was empty. Otherwise it returns the number of processed elements.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_in(fifo, buf, n)
put data into the fifo
Parameters
fifo: address of the fifo to be used
buf: the data to be added
n: number of elements to be added
Description
This macro copies the given buffer into the fifo and returns the number of copied elements.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
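When a kfifo_in() style copy runs past the end of the underlying array it has to continue at the start. A sketch of that wrap-around copy for a byte fifo, assuming an 8-byte buffer: struct byte_fifo and byte_fifo_in() are hypothetical, the copy is split into at most two memcpy() calls, and the count is clamped to the free space so that the return value is the number of copied elements, as described above.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 8u /* power of 2 */

/* Hypothetical byte fifo used only for this model. */
struct byte_fifo {
	unsigned int in, out;         /* free-running counters */
	unsigned char data[BUF_SIZE];
};

/*
 * Copy up to n bytes into the fifo. A copy that runs past the end of the
 * array is split in two: first up to the end, then from the start.
 * Returns the number of bytes actually copied.
 */
unsigned int byte_fifo_in(struct byte_fifo *f, const unsigned char *src, unsigned int n)
{
	unsigned int avail, off, first;

	assert(f->in - f->out <= BUF_SIZE); /* fifo invariant */
	avail = BUF_SIZE - (f->in - f->out);
	off = f->in & (BUF_SIZE - 1);

	if (n > avail)
		n = avail;                          /* clamp to free space */
	first = BUF_SIZE - off;
	if (first > n)
		first = n;
	memcpy(f->data + off, src, first);         /* up to the end of the array */
	memcpy(f->data, src + first, n - first);   /* wrapped remainder, may be 0 */
	f->in += n;
	return n;
}
```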
- kfifo_in_spinlocked(fifo, buf, n, lock)
put data into the fifo using a spinlock for locking
Parameters
fifo: address of the fifo to be used
buf: the data to be added
n: number of elements to be added
lock: pointer to the spinlock to use for locking
Description
This macro copies the given buffer of values into the fifo and returns the number of copied elements.
- kfifo_in_spinlocked_noirqsave(fifo, buf, n, lock)
put data into fifo using a spinlock for locking, don’t disable interrupts
Parameters
fifo: address of the fifo to be used
buf: the data to be added
n: number of elements to be added
lock: pointer to the spinlock to use for locking
Description
This is a variant of kfifo_in_spinlocked() but uses spin_lock/unlock() for locking and doesn’t disable interrupts.
- kfifo_out(fifo, buf, n)
get data from the fifo
Parameters
fifo: address of the fifo to be used
buf: pointer to the storage buffer
n: max. number of elements to get
Description
This macro gets some data from the fifo and returns the number of elements copied.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_out_spinlocked(fifo, buf, n, lock)
get data from the fifo using a spinlock for locking
Parameters
fifo: address of the fifo to be used
buf: pointer to the storage buffer
n: max. number of elements to get
lock: pointer to the spinlock to use for locking
Description
This macro gets the data from the fifo and returns the number of elements copied.
- kfifo_out_spinlocked_noirqsave(fifo, buf, n, lock)
get data from the fifo using a spinlock for locking, don’t disable interrupts
Parameters
fifo: address of the fifo to be used
buf: pointer to the storage buffer
n: max. number of elements to get
lock: pointer to the spinlock to use for locking
Description
This is a variant of kfifo_out_spinlocked() which uses spin_lock/unlock() for locking and doesn’t disable interrupts.
- kfifo_from_user(fifo, from, len, copied)
puts some data from user space into the fifo
Parameters
fifo: address of the fifo to be used
from: pointer to the data to be added
len: the length of the data to be added
copied: pointer to output variable to store the number of copied bytes
Description
This macro copies at most len bytes from the user buffer from into the fifo, depending on the available space, and returns -EFAULT/0.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_to_user(fifo, to, len, copied)
copies data from the fifo into user space
Parameters
fifo: address of the fifo to be used
to: where the data must be copied
len: the size of the destination buffer
copied: pointer to output variable to store the number of copied bytes
Description
This macro copies at most len bytes from the fifo into the to buffer and returns -EFAULT/0.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_dma_in_prepare_mapped(fifo, sgl, nents, len, dma)
setup a scatterlist for DMA input
Parameters
fifo: address of the fifo to be used
sgl: pointer to the scatterlist array
nents: number of entries in the scatterlist array
len: number of elements to transfer
dma: mapped dma address to fill into sgl
Description
This macro fills a scatterlist for DMA input. It returns the number of entries in the scatterlist array.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_dma_in_finish(fifo, len)
finish a DMA IN operation
Parameters
fifo: address of the fifo to be used
len: number of bytes received
Description
This macro finishes a DMA IN operation. The in counter will be updated by the len parameter. No error checking will be done.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_dma_out_prepare_mapped(fifo, sgl, nents, len, dma)
setup a scatterlist for DMA output
Parameters
fifo: address of the fifo to be used
sgl: pointer to the scatterlist array
nents: number of entries in the scatterlist array
len: number of elements to transfer
dma: mapped dma address to fill into sgl
Description
This macro fills a scatterlist for DMA output with at most len bytes to transfer. It returns the number of entries in the scatterlist array. A zero means there is no space available and the scatterlist is not filled.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_dma_out_finish(fifo, len)
finish a DMA OUT operation
Parameters
fifo: address of the fifo to be used
len: number of bytes transferred
Description
This macro finishes a DMA OUT operation. The out counter will be updated by the len parameter. No error checking will be done.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_out_peek(fifo, buf, n)
gets some data from the fifo
Parameters
fifo: address of the fifo to be used
buf: pointer to the storage buffer
n: max. number of elements to get
Description
This macro gets the data from the fifo and returns the number of elements copied. The data is not removed from the fifo.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_out_linear(fifo, tail, n)
gets the tail offset of the available data
Parameters
fifo: address of the fifo to be used
tail: pointer to an unsigned int to store the value of tail
n: max. number of elements to point at
Description
This macro obtains the offset (tail) to the available data in the fifo buffer and returns the number of elements available. The count stops at the end of the available data or at the end of the buffer, whichever comes first, so that it can be used for linear data processing (like memcpy() of (fifo->data + tail) with the count returned).
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_out_linear_ptr(fifo, ptr, n)
gets a pointer to the available data
Parameters
fifo: address of the fifo to be used
ptr: pointer to data to store the pointer to tail
n: max. number of elements to point at
Description
Similarly to kfifo_out_linear(), this macro obtains the pointer to the available data in the fifo buffer and returns the number of elements available. The count stops at the end of the available data or at the end of the buffer, whichever comes first, so that it can be used for linear data processing (like memcpy() of ptr with the count returned).
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
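The count returned by kfifo_out_linear() and kfifo_out_linear_ptr() is bounded three ways: by n, by the data in the fifo, and by the distance to the end of the buffer. A caller can therefore process one contiguous chunk and then drop it, for example with kfifo_skip_count(). A minimal model of that bound, assuming free-running unsigned counters and a power-of-2 size; out_linear_model() is a hypothetical helper, not the kernel macro.

```c
#include <assert.h>

/*
 * Store the tail offset of the readable data in *tail and return how many
 * elements starting there are contiguous: the smallest of n, the elements
 * in the fifo, and the elements left before the end of the array.
 */
unsigned int out_linear_model(unsigned int size, unsigned int in, unsigned int out,
			      unsigned int n, unsigned int *tail)
{
	unsigned int len = in - out;            /* elements in the fifo */
	unsigned int off = out & (size - 1);    /* tail offset */
	unsigned int to_end = size - off;       /* room before wrapping */

	assert(size != 0 && (size & (size - 1)) == 0);
	if (n > len)
		n = len;
	if (n > to_end)
		n = to_end;
	*tail = off;
	return n;
}
```

With 6 elements stored in an 8-element fifo and reading starting at slot 5, only 3 elements are contiguous; the remaining 3 become available at offset 0 after those are skipped.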
relay interface support
Relay interface support is designed to provide an efficient mechanism for tools and facilities to relay large amounts of data from kernel space to user space.
relay interface
- int relay_buf_full(struct rchan_buf *buf)
boolean, is the channel buffer full?
Parameters
struct rchan_buf *buf: channel buffer
Description
Returns 1 if the buffer is full, 0 otherwise.
- void relay_reset(struct rchan *chan)
reset the channel
Parameters
struct rchan *chan: the channel
Description
This has the effect of erasing all data from all channel buffers and restarting the channel in its initial state. The buffers are not freed, so any mappings are still in effect.
NOTE. Care should be taken that the channel isn’t actually being used by anything when this call is made.
- struct rchan *relay_open(const char *base_filename, struct dentry *parent, size_t subbuf_size, size_t n_subbufs, const struct rchan_callbacks *cb, void *private_data)
create a new relay channel
Parameters
const char *base_filename: base name of files to create
struct dentry *parent: dentry of parent directory, NULL for root directory or buffer
size_t subbuf_size: size of sub-buffers
size_t n_subbufs: number of sub-buffers
const struct rchan_callbacks *cb: client callback functions
void *private_data: user-defined data
Description
Returns channel pointer if successful, NULL otherwise.
Creates a channel buffer for each cpu using the sizes and attributes specified. The created channel buffer files will be named base_filename0...base_filenameN-1. File permissions will be S_IRUSR.
- size_t relay_switch_subbuf(struct rchan_buf *buf, size_t length)
switch to a new sub-buffer
Parameters
struct rchan_buf *buf: channel buffer
size_t length: size of current event
Description
Returns either the length passed in or 0 if full.
Performs sub-buffer-switch tasks such as invoking callbacks, updating padding counts, waking up readers, etc.
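The interplay between sub-buffer switching, padding and the "full" condition can be illustrated with a simplified userspace model of a channel buffer in no-overwrite mode. This is a sketch under stated assumptions, not the relay implementation: struct relay_model, switch_subbuf_model() and reserve_model() are hypothetical, callbacks, reader wakeups and mmap are not modelled, and the buffer is treated as full when every sub-buffer has been produced but not yet consumed (the consumed counter plays the role that relay_subbufs_consumed() updates). In the kernel, writers normally reach relay_switch_subbuf() through relay_write() or relay_reserve().

```c
#include <assert.h>
#include <stddef.h>

#define N_SUBBUFS 4

/* Hypothetical model of one per-cpu channel buffer. */
struct relay_model {
	size_t subbuf_size;
	size_t offset;              /* write offset in the current sub-buffer */
	size_t produced;            /* sub-buffers finished by the writer */
	size_t consumed;            /* sub-buffers released by the reader */
	size_t padding[N_SUBBUFS];  /* unused bytes at the end of each sub-buffer */
	int stalled;                /* current sub-buffer already closed, no room yet */
};

/* Close the current sub-buffer and open the next. Returns length, or 0 if full. */
size_t switch_subbuf_model(struct relay_model *b, size_t length)
{
	if (!b->stalled) {
		b->padding[b->produced % N_SUBBUFS] = b->subbuf_size - b->offset;
		b->produced++;
	}
	if (b->produced - b->consumed == N_SUBBUFS) {
		b->stalled = 1;     /* no-overwrite: wait for the reader */
		return 0;
	}
	b->stalled = 0;
	b->offset = 0;
	return length;
}

/* Reserve length bytes; returns the offset in the sub-buffer or -1 if dropped. */
long reserve_model(struct relay_model *b, size_t length)
{
	long pos;

	assert(length > 0 && length <= b->subbuf_size);
	if (b->stalled || b->offset + length > b->subbuf_size) {
		if (switch_subbuf_model(b, length) == 0)
			return -1;
	}
	pos = (long)b->offset;
	b->offset += length;
	return pos;
}
```

An event never straddles two sub-buffers: when it does not fit, the rest of the current sub-buffer is recorded as padding, which is what lets readers treat each sub-buffer as a self-contained unit.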
- void relay_subbufs_consumed(struct rchan *chan, unsigned int cpu, size_t subbufs_consumed)
update the buffer’s sub-buffers-consumed count
Parameters
struct rchan *chan: the channel
unsigned int cpu: the cpu associated with the channel buffer to update
size_t subbufs_consumed: number of sub-buffers to add to current buf’s count
Description
Adds to the channel buffer’s consumed sub-buffer count. subbufs_consumed should be the number of sub-buffers newly consumed, not the total consumed.
NOTE. Kernel clients don’t need to call this function if the channel mode is ‘overwrite’.
- void relay_close(struct rchan *chan)
close the channel
Parameters
struct rchan *chan: the channel
Description
Closes all channel buffers and frees the channel.
- void relay_flush(struct rchan *chan)
flush the channel
Parameters
struct rchan *chan: the channel
Description
Flushes all channel buffers, i.e. forces buffer switch.
- int relay_mmap_prepare_buf(struct rchan_buf *buf, struct vm_area_desc *desc)
mmap channel buffer to process address space
Parameters
struct rchan_buf *buf: the relay channel buffer
struct vm_area_desc *desc: describing what to map
Description
Returns 0 if ok, negative on error.
Caller should already have grabbed mmap_lock.
- void *relay_alloc_buf(struct rchan_buf *buf, size_t *size)
allocate a channel buffer
Parameters
struct rchan_buf *buf: the buffer struct
size_t *size: total size of the buffer
Description
Returns a pointer to the resulting buffer, NULL if unsuccessful. The passed in size will get page aligned, if it isn’t already.
- struct rchan_buf *relay_create_buf(struct rchan *chan)
allocate and initialize a channel buffer
Parameters
struct rchan *chan: the relay channel
Description
Returns channel buffer if successful, NULL otherwise.
Parameters
struct kref *kref: target kernel reference that contains the relay channel
Description
Should only be called from kref_put().
- void relay_destroy_buf(struct rchan_buf *buf)
destroy an rchan_buf struct and associated buffer
Parameters
struct rchan_buf *buf: the buffer struct
Parameters
struct kref *kref: target kernel reference that contains the relay buffer
Description
Removes the file from the filesystem, which also frees the rchan_buf_struct and the channel buffer. Should only be called from kref_put().
- int relay_buf_empty(struct rchan_buf *buf)
boolean, is the channel buffer empty?
Parameters
struct rchan_buf *buf: channel buffer
Description
Returns 1 if the buffer is empty, 0 otherwise.
- void wakeup_readers(struct irq_work *work)
wake up readers waiting on a channel
Parameters
struct irq_work *work: contains the channel buffer
Description
This is the function used to defer reader waking.
- void __relay_reset(struct rchan_buf *buf, unsigned int init)
reset a channel buffer
Parameters
struct rchan_buf *buf: the channel buffer
unsigned int init: 1 if this is a first-time initialization
Description
See relay_reset() for description of effect.
- void relay_close_buf(struct rchan_buf *buf)
close a channel buffer
Parameters
struct rchan_buf *buf: channel buffer
Description
Marks the buffer finalized and restores the default callbacks. The channel buffer and channel buffer data structure are then freed automatically when the last reference is given up.
- size_t relay_stats(struct rchan *chan, int flags)
get channel buffer statistics
Parameters
struct rchan *chan: the channel
int flags: select particular information to get
Description
Returns the count of the field the caller specifies.
Parameters
struct inode *inode: the inode
struct file *filp: the file
Description
Increments the channel buffer refcount.
- int relay_file_mmap_prepare(struct vm_area_desc *desc)
mmap file op for relay files
Parameters
struct vm_area_desc *desc: describing what to map
Description
Calls upon relay_mmap_prepare_buf() to map the file into user space.
Parameters
struct file *filp: the file
poll_table *wait: poll table
Description
Poll implementation.
Parameters
struct inode *inode: the inode
struct file *filp: the file
Description
Decrements the channel refcount, as the filesystem is no longer using it.
- size_t relay_file_read_subbuf_avail(size_t read_pos, struct rchan_buf *buf)
return bytes available in sub-buffer
Parameters
size_t read_pos: file read position
struct rchan_buf *buf: relay channel buffer
- size_t relay_file_read_start_pos(struct rchan_buf *buf)
find the first available byte to read
Parameters
struct rchan_buf *buf: relay channel buffer
Description
If the read_pos is in the middle of padding, return the position of the first actually available byte, otherwise return the original value.
- size_t relay_file_read_end_pos(struct rchan_buf *buf, size_t read_pos, size_t count)
return the new read position
Parameters
struct rchan_buf *buf: relay channel buffer
size_t read_pos: file read position
size_t count: number of bytes to be read
Module Support
Kernel module auto-loading
- int __request_module(bool wait, const char *fmt, ...)
try to load a kernel module
Parameters
bool wait: wait (or not) for the operation to complete
const char *fmt: printf style format string for the name of the module
...: arguments as specified in the format string
Description
Load a module using the user mode module loader. The function returns zero on success or a negative errno code or positive exit code from “modprobe” on failure. Note that a successful module load does not mean the module did not then unload and exit on an error of its own. Callers must check that the service they requested is now available, not blindly invoke it.
If module auto-loading support is disabled then this function simply returns -ENOENT.
Module debugging
Enabling CONFIG_MODULE_STATS enables module debugging statistics which are useful to monitor and root cause memory pressure issues with module loading. These statistics are useful to allow us to improve production workloads.
The current module debugging statistics supported help keep track of module loading failures to enable improvements either for kernel module auto-loading usage (request_module()) or interactions with userspace. Statistics are provided to track all possible failures in the finit_module() path and memory wasted in this process space. Each of the failure counters is associated with a type of module loading failure which is known to incur a certain amount of memory allocation loss. In the worst case loading a module will fail after a 3 step memory allocation process:
a) memory allocated with kernel_read_file_from_fd()
b) module decompression processes the file read from kernel_read_file_from_fd(), and vmap() is used to map the decompressed module to a new local buffer which represents a copy of the decompressed module passed from userspace. The buffer from kernel_read_file_from_fd() is freed right away.
c) layout_and_allocate() allocates space for the final resting place where we would keep the module if it were to be processed successfully.
If a failure occurs after these three different allocations, only one counter will be incremented, with the sum of the bytes allocated and then freed during this failure. Likewise, if module loading failed only after step b), a separate counter is used and incremented for the bytes freed and not used during both of those allocations.
Virtual memory space can be limited, for example on x86 virtual memory size defaults to 128 MiB. We should strive to limit and avoid wasting virtual memory allocations when possible. These module debugging statistics help to evaluate how much memory is being wasted on bootup due to module loading failures.
All counters are designed to be incremental. Atomic counters are used so as to remain simple and avoid delays and deadlocks.
dup_failed_modules - tracks duplicate failed modules
Linked list of modules which failed to be loaded because an already existing module with the same name was already being processed or already loaded. The finit_module() system call incurs heavy virtual memory allocations. In the worst case an finit_module() system call can end up allocating virtual memory 3 times, as in steps a) to c) above.
In practice on a typical boot today most finit_module() calls fail due to the module with the same name already being loaded or about to be processed. All virtual memory allocated to these failed modules will be freed with no functional use.
To help with this, dup_failed_modules allows us to track modules which failed to load because a module was already loaded or being processed. There are only two points at which we can fail such calls; we list them below along with the number of virtual memory allocation calls:
- FAIL_DUP_MOD_BECOMING: at the end of early_mod_check() before layout_and_allocate().
  - with module decompression: 2 virtual memory allocation calls
  - without module decompression: 1 virtual memory allocation call
- FAIL_DUP_MOD_LOAD: after layout_and_allocate() on add_unformed_module()
  - with module decompression: 3 virtual memory allocation calls
  - without module decompression: 2 virtual memory allocation calls
We should strive to keep this list as small as possible. If this list is not empty, it reflects possible work or optimizations either in-kernel or in userspace.
module statistics debugfs counters
The total amount of wasted virtual memory allocation space during module loading can be computed by adding the total from the summation:
invalid_kread_bytes + invalid_decompress_bytes + invalid_becoming_bytes + invalid_mod_bytes
The following debugfs counters are available to inspect module loading failures:
- total_mod_size: total bytes ever used by all modules we’ve dealt with on this system
- total_text_size: total bytes of the .text and .init.text ELF section sizes we’ve dealt with on this system
- invalid_kread_bytes: bytes allocated and then freed on failures which happen due to the initial kernel_read_file_from_fd(). kernel_read_file_from_fd() uses vmalloc(). These should typically not happen unless your system is under memory pressure.
- invalid_decompress_bytes: number of bytes allocated and freed due to memory allocations in the module decompression path that use vmap(). These typically should not happen unless your system is under memory pressure.
- invalid_becoming_bytes: total number of bytes allocated and freed while reading the kernel module userspace wants us to read, before we promote it for processing and add it to our modules linked list. These failures can happen if we had a check in between a successful kernel_read_file_from_fd() call and right before we allocate our private memory for the module, which would be kept if the module is successfully loaded. The most common reason for this failure is when userspace is racing to load a module which it does not yet see loaded. The first module to succeed in add_unformed_module() will add a module to our modules list and subsequent loads of modules with the same name will error out at the end of early_mod_check(). The check for module_patient_check_exists() at the end of early_mod_check() prevents duplicate allocations on layout_and_allocate() for modules already being processed. These duplicate failed modules are non-fatal; however, they typically indicate that userspace does not yet see a module as loaded and is unnecessarily trying to load it before the kernel even has a chance to begin to process prior requests. Although duplicate failures can be non-fatal, we should try to reduce vmalloc() pressure proactively, so ideally after boot this will be as close to 0 as possible. If module decompression was used we also add to this counter the cost of the initial kernel_read_file_from_fd() of the compressed module. If module decompression was not used the value represents the total allocated and freed bytes in kernel_read_file_from_fd() calls for this type of failure. These failures can occur because:
  - module_sig_check() - module signature checks
  - elf_validity_cache_copy() - some ELF validation issue
  - early_mod_check():
    - blacklisting
    - failed to rewrite section headers
    - version magic
    - live patch requirements didn’t check out
    - the module was detected as being already present
- invalid_mod_bytes: these are the total number of bytes allocated and freed due to failures after we did all the sanity checks of the module which userspace passed to us and after our first check that the module is unique. A module can still fail to load if we detect the module is loaded after we allocate space for it with layout_and_allocate(); we do this check right before processing the module as live and running its initialization routines. Note that if you have a failure of this type, it also means the respective kernel_read_file_from_fd() memory space was also freed and not used, and so we increment this counter with twice the size of the module. Additionally, if you used module decompression, the size of the compressed module is also added to this counter.
- modcount: how many modules we’ve loaded in our kernel lifetime
- failed_kreads: how many modules failed due to failed kernel_read_file_from_fd()
- failed_decompress: how many failed module decompression attempts we’ve had. These really should not happen unless your compression/decompression is broken.
- failed_becoming: how many modules failed after we kernel_read_file_from_fd() it and before we allocate memory for it with layout_and_allocate(). This counter is never incremented if you manage to validate the module and call layout_and_allocate() for it.
- failed_load_modules: how many modules failed once we’ve allocated our private space for our module using layout_and_allocate(). These failures should hopefully mostly be dealt with already. Races in theory could still exist here, but it would just mean the kernel had started processing two threads concurrently up to early_mod_check() and one thread won. These failures are good signs the kernel or userspace is doing something seriously stupid or that could be improved. We should strive to fix these, but it is perhaps not easy to fix them. A recent example is the module requests incurred for frequency modules: a separate module request was being issued for each CPU on a system.
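The arithmetic described for these counters can be written out directly: the total waste is the sum of the four invalid_* counters, and one late failure counted under invalid_mod_bytes adds twice the module size, plus the compressed size when decompression was used. This is a sketch of that arithmetic only; struct mod_waste, total_wasted() and invalid_mod_increment() are hypothetical names, and the kernel's own accounting lives in kernel/module/.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the four waste counters. */
struct mod_waste {
	uint64_t invalid_kread_bytes;
	uint64_t invalid_decompress_bytes;
	uint64_t invalid_becoming_bytes;
	uint64_t invalid_mod_bytes;
};

/* Total wasted virtual memory: the summation given in the text. */
uint64_t total_wasted(const struct mod_waste *w)
{
	assert(w != 0);
	return w->invalid_kread_bytes + w->invalid_decompress_bytes +
	       w->invalid_becoming_bytes + w->invalid_mod_bytes;
}

/*
 * Bytes added to invalid_mod_bytes for one late failure: twice the module
 * size, plus the compressed size when decompression was used
 * (compressed_len == 0 means no decompression).
 */
uint64_t invalid_mod_increment(uint64_t mod_len, uint64_t compressed_len)
{
	return 2 * mod_len + compressed_len;
}
```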
Inter Module support
Refer to the files in kernel/module/ for more information.
Hardware Interfaces
DMA Channels
- int request_dma(unsigned int dmanr, const char *device_id)
request and reserve a system DMA channel
Parameters
unsigned int dmanr: DMA channel number
const char *device_id: reserving device ID string, used in /proc/dma
- void free_dma(unsigned int dmanr)
free a reserved system DMA channel
Parameters
unsigned int dmanr: DMA channel number
Resources Management
- struct resource *request_resource_conflict(struct resource *root, struct resource *new)
request and reserve an I/O or memory resource
Parameters
struct resource *root: root resource descriptor
struct resource *new: resource descriptor desired by caller
Description
Returns NULL on success, or the conflicting resource on error.
- int find_next_iomem_res(resource_size_t start, resource_size_t end, unsigned long flags, unsigned long desc, struct resource *res)
Finds the lowest iomem resource that covers part of [start..end].
Parameters
resource_size_t start: start address of the resource searched for
resource_size_t end: end address of same resource
unsigned long flags: flags which the resource must have
unsigned long desc: descriptor the resource must have
struct resource *res: return ptr, if resource found
Description
If a resource is found, returns 0 and *res is overwritten with the part of the resource that’s within [start..end]; if none is found, returns -ENODEV. Returns -EINVAL for invalid parameters.
The caller must specify start, end, flags, and desc (which may be IORES_DESC_NONE).
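The search-and-clip behaviour can be modelled over a flat table sorted by start address. This is a sketch, not the kernel code: the kernel walks the iomem resource tree, whereas struct range, MODEL_DESC_NONE (a stand-in for IORES_DESC_NONE) and find_next_model() are hypothetical. The model keeps the documented return values by using -EINVAL and -ENODEV from <errno.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat stand-in for struct resource: inclusive [start, end]. */
struct range {
	uint64_t start, end;
	unsigned long flags, desc;
};

#define MODEL_DESC_NONE 0UL /* stands in for IORES_DESC_NONE */

/*
 * Find the lowest entry overlapping [start..end] that has all of `flags`
 * and, unless desc is MODEL_DESC_NONE, the same desc; copy it to *out
 * clipped to [start..end].
 */
int find_next_model(const struct range *tab, size_t n, uint64_t start, uint64_t end,
		    unsigned long flags, unsigned long desc, struct range *out)
{
	assert(tab != NULL && out != NULL);
	if (start > end)
		return -EINVAL;
	for (size_t i = 0; i < n; i++) {
		const struct range *r = &tab[i];

		if (r->end < start || r->start > end)
			continue;                  /* no overlap */
		if ((r->flags & flags) != flags)
			continue;                  /* missing a required flag */
		if (desc != MODEL_DESC_NONE && r->desc != desc)
			continue;
		*out = *r;
		out->start = r->start > start ? r->start : start;
		out->end = r->end < end ? r->end : end;
		return 0;
	}
	return -ENODEV;
}
```

Only the part of the resource inside the query window is reported, which is what lets a caller advance start past the returned end and search again.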
- int reallocate_resource(struct resource *root, struct resource *old, resource_size_t newsize, struct resource_constraint *constraint)
allocate a slot in the resource tree given range & alignment. The resource will be relocated if the new size cannot be reallocated in the current location.
Parameters
struct resource *root: root resource descriptor
struct resource *old: resource descriptor desired by caller
resource_size_t newsize: new size of the resource descriptor
struct resource_constraint *constraint: the memory range and alignment constraints to be met.
- struct resource *lookup_resource(struct resource *root, resource_size_t start)
find an existing resource by a resource start address
Parameters
struct resource *root: root resource descriptor
resource_size_t start: resource start address
Description
Returns a pointer to the resource if found, NULL otherwise.
- struct resource *insert_resource_conflict(struct resource *parent, struct resource *new)
Inserts resource in the resource tree
Parameters
struct resource *parent: parent of the new resource
struct resource *new: new resource to insert
Description
Returns NULL on success, or the conflicting resource if the resource can’t be inserted.
This function is equivalent to request_resource_conflict when no conflict happens. If a conflict happens, and the conflicting resources entirely fit within the range of the new resource, then the new resource is inserted and the conflicting resources become children of the new resource.
This function is intended for producers of resources, such as FW modules and bus drivers.
- resource_size_t resource_alignment(struct resource *res)
calculate resource’s alignment
Parameters
struct resource *res: resource pointer
Description
Returns alignment on success, 0 (invalid alignment) on failure.
- void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
release a previously reserved memory region
Parameters
resource_size_t start: resource start address
resource_size_t size: resource region size
Description
This interface is intended for memory hot-delete. The requested region is released from a currently busy memory resource. The requested region must either match exactly or fit into a single busy resource entry. In the latter case, the remaining resource is adjusted accordingly.
Note
Additional release conditions, such as overlapping regions, can be supported after they are confirmed as valid cases.
When a busy memory resource gets split into two entries, its children are reassigned to the correct parent based on their range. If a child memory resource overlaps with more than one parent, enhance the logic as needed.
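The adjustment rule above (an exact match removes the entry; a region strictly inside leaves one or two remaining pieces) can be sketched as a pure range computation. This model assumes inclusive ranges; struct span and release_adjustable_model() are hypothetical, and the reassignment of child resources is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inclusive range; stands in for a busy struct resource. */
struct span {
	uint64_t start, end;
};

/*
 * Release [start, start + size - 1] from *busy. The region must lie inside
 * *busy. Writes the surviving pieces to out[] and returns their count:
 * 0 for an exact match, 1 when a head or tail is trimmed, 2 for a split.
 */
int release_adjustable_model(const struct span *busy, uint64_t start, uint64_t size,
			     struct span out[2])
{
	uint64_t end = start + size - 1;
	int n = 0;

	assert(size != 0 && start >= busy->start && end <= busy->end);
	if (start > busy->start)   /* piece left of the released region */
		out[n++] = (struct span){ busy->start, start - 1 };
	if (end < busy->end)       /* piece right of the released region */
		out[n++] = (struct span){ end + 1, busy->end };
	return n;
}
```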
- void merge_system_ram_resource(struct resource *res)
mark the System RAM resource mergeable and try to merge it with adjacent, mergeable resources
Parameters
struct resource *res: resource descriptor
Description
This interface is intended for memory hotplug, whereby lots of contiguous System RAM resources are added (e.g., via add_memory*()) by a driver, and the actual resource boundaries are not of interest (e.g., it might be relevant for DIMMs). Only resources that are marked mergeable, that have the same parent, and that don’t have any children are considered. All mergeable resources must be immutable during the request.
Note
The caller has to make sure that no pointers to resources that are marked mergeable are used anymore after this call; the resource might be freed and the pointer might be stale!
release_mem_region_adjustable() will split on demand on memory hot unplug.
- int request_resource(struct resource *root, struct resource *new)
request and reserve an I/O or memory resource
Parameters
struct resource *root: root resource descriptor
struct resource *new: resource descriptor desired by caller
Description
Returns 0 for success, negative error code on error.
- int release_resource(struct resource *old)
release a previously reserved resource
Parameters
struct resource *old: resource pointer
- int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end, void *arg, int (*func)(struct resource *, void *))
Walks through iomem resources and calls func() with matching resource ranges.
Parameters
unsigned long desc: I/O resource descriptor. Use IORES_DESC_NONE to skip the desc check.
unsigned long flags: I/O resource flags
u64 start: start addr
u64 end: end addr
void *arg: function argument for the callback func
int (*func)(struct resource *, void *): callback function that is called for each qualifying resource area
Description
All the memory ranges which overlap [start, end] and also match flags and desc are valid candidates.
NOTE
For a new descriptor search, define a new IORES_DESC in <linux/ioport.h> and set it in ‘desc’ of a target resource entry.
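The selection rule (overlap with [start, end], every requested flag bit present, and a descriptor match unless IORES_DESC_NONE is passed) can be sketched in userspace. This is an illustrative model only: a flat array stands in for the iomem resource tree, only the selection rule is modelled, and struct res, DESC_NONE (a stand-in for IORES_DESC_NONE), walk_res() and count_match() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_NONE 0UL  /* stand-in for IORES_DESC_NONE in this sketch */

/* Hypothetical flat model of one iomem resource entry. */
struct res {
    uint64_t start, end;    /* closed interval [start, end] */
    unsigned long flags;
    unsigned long desc;
};

/*
 * Call func() for each entry that overlaps [start, end], has every bit of
 * flags set and, unless desc is DESC_NONE, carries the same descriptor.
 * A nonzero return from func() stops the walk and is passed back; this
 * sketch returns -1 when nothing matched.
 */
static int walk_res(const struct res *tab, size_t n, unsigned long desc,
                    unsigned long flags, uint64_t start, uint64_t end,
                    void *arg, int (*func)(const struct res *, void *))
{
    int ret = -1;

    for (size_t i = 0; i < n; i++) {
        const struct res *p = &tab[i];

        if (p->end < start || p->start > end)
            continue;                   /* no overlap */
        if ((p->flags & flags) != flags)
            continue;                   /* missing a requested flag */
        if (desc != DESC_NONE && p->desc != desc)
            continue;                   /* descriptor mismatch */
        ret = func(p, arg);
        if (ret)
            break;
    }
    return ret;
}

/* Example callback: count the qualifying entries. */
static int count_match(const struct res *r, void *arg)
{
    (void)r;
    ++*(int *)arg;
    return 0;
}
```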
- int region_intersects(resource_size_t start, size_t size, unsigned long flags, unsigned long desc)
determine intersection of region with known resources
Parameters
resource_size_t start: region start address
size_t size: size of region
unsigned long flags: flags of resource (in iomem_resource)
unsigned long desc: descriptor of resource (in iomem_resource) or IORES_DESC_NONE
Description
Check if the specified region partially overlaps or fully eclipses a resource identified by flags and desc (optional with IORES_DESC_NONE). Return REGION_DISJOINT if the region does not overlap flags/desc, return REGION_MIXED if the region overlaps flags/desc and another resource, and return REGION_INTERSECTS if the region overlaps flags/desc and no other defined resource. Note that REGION_INTERSECTS is also returned in the case when the specified region overlaps RAM and undefined memory holes.
region_intersects() is used by memory remapping functions to ensure the user is not remapping RAM, and it is a vast speed-up over walking through the resource table page by page.
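The three outcomes can be modelled by counting the overlapping resources that do and do not match flags/desc. This userspace sketch assumes a flat, non-overlapping resource array; region_classify(), struct res and DESC_NONE are hypothetical, the enum values are the sketch's own, and address ranges covered by no entry (undefined holes) are simply ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_NONE 0UL  /* stand-in for IORES_DESC_NONE in this sketch */

/* Names follow the text; the numeric values are this sketch's own. */
enum { REGION_INTERSECTS, REGION_DISJOINT, REGION_MIXED };

struct res {
    uint64_t start, end;    /* closed interval [start, end] */
    unsigned long flags;
    unsigned long desc;
};

static int region_classify(const struct res *tab, size_t n, uint64_t start,
                           uint64_t size, unsigned long flags,
                           unsigned long desc)
{
    uint64_t end = start + size - 1;
    size_t match = 0, other = 0;

    for (size_t i = 0; i < n; i++) {
        const struct res *p = &tab[i];

        if (p->end < start || p->start > end)
            continue;       /* no overlap; uncovered holes never count */
        if ((p->flags & flags) == flags &&
            (desc == DESC_NONE || p->desc == desc))
            match++;
        else
            other++;
    }
    if (match == 0)
        return REGION_DISJOINT;
    return other ? REGION_MIXED : REGION_INTERSECTS;
}
```

With one entry flagged 0x1 ("RAM") at 0x0..0xfff, another resource at 0x2000..0x2fff and nothing in between, the region 0x800..0x17ff reports REGION_INTERSECTS even though half of it lies in a hole, while 0x800..0x27ff reports REGION_MIXED.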
- int find_resource_space(struct resource *root, struct resource *new, resource_size_t size, struct resource_constraint *constraint)
Find empty space in the resource tree
Parameters
struct resource *root: Root resource descriptor
struct resource *new: Resource descriptor awaiting an empty resource space
resource_size_t size: The minimum size of the empty space
struct resource_constraint *constraint: The range and alignment constraints to be met
Description
Finds an empty space under root in the resource tree satisfying range and alignment constraints.
Return
0 - if successful, new members start, end, and flags are altered.
-EBUSY - if no empty space was found.
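A first-fit search over the gaps between a root's children illustrates how range and alignment constraints interact. This is a hedged userspace sketch, not the kernel algorithm: find_space(), align_up() and struct range are hypothetical; children are assumed to be sorted and non-overlapping; alignment is a plain power-of-two round-up (the kernel additionally allows a custom alignment callback); and -1 stands in for -EBUSY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };  /* closed interval, like struct resource */

/* Round x up to a multiple of align, which must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * First-fit search for size bytes inside root that avoids the sorted,
 * non-overlapping children kids[0..n-1], stays within [min, max] and
 * starts on an align boundary. Fills *out and returns 0 on success,
 * or returns -1 (standing in for -EBUSY). Overflow near UINT64_MAX is
 * not handled.
 */
static int find_space(struct range root, const struct range *kids, size_t n,
                      uint64_t size, uint64_t min, uint64_t max,
                      uint64_t align, struct range *out)
{
    uint64_t gap_start = root.start;

    assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
    for (size_t i = 0; i <= n; i++) {
        if (i < n && kids[i].start == gap_start) {
            gap_start = kids[i].end + 1;    /* no gap before this child */
            continue;
        }
        /* The gap ends just before the next child, or at the end of root. */
        uint64_t gap_end = (i < n) ? kids[i].start - 1 : root.end;
        uint64_t s = align_up(gap_start > min ? gap_start : min, align);
        uint64_t e = gap_end < max ? gap_end : max;

        if (s <= e && e - s + 1 >= size) {
            out->start = s;
            out->end = s + size - 1;
            return 0;
        }
        if (i < n)
            gap_start = kids[i].end + 1;
    }
    return -1;
}
```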
- int allocate_resource(struct resource *root, struct resource *new, resource_size_t size, resource_size_t min, resource_size_t max, resource_size_t align, resource_alignf alignf, void *alignf_data)
allocate empty slot in the resource tree given range & alignment. The resource will be reallocated with a new size if it was already allocated.
Parameters
struct resource *root: root resource descriptor
struct resource *new: resource descriptor desired by caller
resource_size_t size: requested resource region size
resource_size_t min: minimum boundary to allocate
resource_size_t max: maximum boundary to allocate
resource_size_t align: alignment requested, in bytes
resource_alignf alignf: alignment function, optional, called if not NULL
void *alignf_data: arbitrary data to pass to the alignf function
- int insert_resource(struct resource *parent, struct resource *new)
Inserts a resource in the resource tree
Parameters
struct resource *parent: parent of the new resource
struct resource *new: new resource to insert
Description
Returns 0 on success, -EBUSY if the resource can’t be inserted.
This function is intended for producers of resources, such as FW modules and bus drivers.
- void insert_resource_expand_to_fit(struct resource *root, struct resource *new)
Insert a resource into the resource tree
Parameters
struct resource *root: root resource descriptor
struct resource *new: new resource to insert
Description
Insert a resource into the resource tree, possibly expanding it in order to make it encompass any conflicting resources.
- int remove_resource(struct resource *old)
Remove a resource in the resource tree
Parameters
struct resource *old: resource to remove
Description
Returns 0 on success, -EINVAL if the resource is not valid.
This function removes a resource previously inserted by insert_resource() or insert_resource_conflict(), and moves the children (if any) up to where they were before. insert_resource() and insert_resource_conflict() insert a new resource, and move any conflicting resources down to the children of the new resource.
insert_resource(), insert_resource_conflict() and remove_resource() are intended for producers of resources, such as FW modules and bus drivers.
- int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size)
modify a resource’s start and size
Parameters
struct resource *res: resource to modify
resource_size_t start: new start value
resource_size_t size: new size
Description
Given an existing resource, change its start and size to match the arguments. Returns 0 on success, -EBUSY if it can’t fit. Existing children of the resource are assumed to be immutable.
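The text does not spell out what "can’t fit" covers. One reasonable reading, used in this hedged userspace sketch, is that the new range must stay inside the parent, must not overlap any sibling and, because children are immutable, must still contain every child. can_adjust(), contains() and struct range are hypothetical names, and -1 stands in for -EBUSY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };  /* closed interval */

/* True if [s, e] lies entirely within r. */
static bool contains(struct range r, uint64_t s, uint64_t e)
{
    return r.start <= s && e <= r.end;
}

/*
 * Could a resource be moved to [start, start + size - 1]?
 * Returns 0 if it fits, -1 (standing in for -EBUSY) otherwise.
 */
static int can_adjust(struct range parent,
                      const struct range *sibs, size_t nsibs,
                      const struct range *kids, size_t nkids,
                      uint64_t start, uint64_t size)
{
    uint64_t end = start + size - 1;

    if (size == 0 || !contains(parent, start, end))
        return -1;              /* must stay inside the parent */
    for (size_t i = 0; i < nsibs; i++)
        if (sibs[i].start <= end && start <= sibs[i].end)
            return -1;          /* would overlap a sibling */
    for (size_t i = 0; i < nkids; i++)
        if (!contains((struct range){ start, end }, kids[i].start, kids[i].end))
            return -1;          /* immutable children must still fit */
    return 0;
}
```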
- struct resource *__request_region(struct resource *parent, resource_size_t start, resource_size_t n, const char *name, int flags)
create a new busy resource region
Parameters
struct resource *parent: parent resource descriptor
resource_size_t start: resource start address
resource_size_t n: resource region size
const char *name: reserving caller’s ID string
int flags: IO resource flags
- void __release_region(struct resource *parent, resource_size_t start, resource_size_t n)
release a previously reserved resource region
Parameters
struct resource *parent: parent resource descriptor
resource_size_t start: resource start address
resource_size_t n: resource region size
Description
The described resource region must match a currently busy region.
- int devm_request_resource(struct device *dev, struct resource *root, struct resource *new)
request and reserve an I/O or memory resource
Parameters
struct device *dev: device for which to request the resource
struct resource *root: root of the resource tree from which to request the resource
struct resource *new: descriptor of the resource to request
Description
This is a device-managed version of request_resource(). There is usually no need to release resources requested by this function explicitly since that will be taken care of when the device is unbound from its driver. If for some reason the resource needs to be released explicitly, because of ordering issues for example, drivers must call devm_release_resource() rather than the regular release_resource().
When a conflict is detected between any existing resources and the newly requested resource, an error message will be printed.
Returns 0 on success or a negative error code on failure.
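Why an explicit release must go through the managed variant can be seen in a toy model of a device-managed list: an early managed release also drops the record, and whatever remains is released newest first at unbind. Everything here (struct fake_dev, devm_add(), devm_release(), unbind()) is hypothetical and only mimics the general device-managed pattern. Calling a plain release function instead would leave the record behind, so unbind would release the same resource a second time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECORDS 8

/* Hypothetical stand-in for a device's list of managed resources. */
struct fake_dev {
    int id[MAX_RECORDS];
    size_t n;
};

static int released[MAX_RECORDS];  /* ids in the order they were released */
static size_t nreleased;

static void release_now(int id)
{
    assert(nreleased < MAX_RECORDS);
    released[nreleased++] = id;
}

/* Like a devm_*() request: record the resource for automatic release. */
static void devm_add(struct fake_dev *d, int id)
{
    assert(d->n < MAX_RECORDS);
    d->id[d->n++] = id;
}

/* Like devm_release_resource(): release now AND forget the record. */
static void devm_release(struct fake_dev *d, int id)
{
    for (size_t i = 0; i < d->n; i++) {
        if (d->id[i] != id)
            continue;
        release_now(id);
        for (size_t j = i + 1; j < d->n; j++)
            d->id[j - 1] = d->id[j];
        d->n--;
        return;
    }
}

/* Driver unbind: release whatever is still recorded, newest first. */
static void unbind(struct fake_dev *d)
{
    while (d->n > 0)
        release_now(d->id[--d->n]);
}
```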
- void devm_release_resource(struct device *dev, struct resource *new)
release a previously requested resource
Parameters
struct device *dev: device for which to release the resource
struct resource *new: descriptor of the resource to release
Description
Releases a resource previously requested using devm_request_resource().
- struct resource *devm_request_free_mem_region(struct device *dev, struct resource *base, unsigned long size)
find free region for device private memory
Parameters
struct device *dev: device struct to bind the resource to
struct resource *base: resource tree to look in
unsigned long size: size in bytes of the device memory to add
Description
This function tries to find an empty range of physical address big enough to contain the new resource, so that it can later be hotplugged as ZONE_DEVICE memory, which in turn allocates struct pages.
- struct resource *alloc_free_mem_region(struct resource *base, unsigned long size, unsigned long align, const char *name)
find a free region relative to base
Parameters
struct resource *base: resource that will parent the new resource
unsigned long size: size in bytes of memory to allocate from base
unsigned long align: alignment requirements for the allocation
const char *name: resource name
Description
Buses like CXL, which can dynamically instantiate new memory regions, need a method to allocate physical address space for those regions. Allocate and insert a new resource covering a free range within the span of base, that is, a range not already claimed by any descendant of base.
MTRR Handling
- int arch_phys_wc_add(unsigned long base, unsigned long size)
add a WC MTRR and handle errors if PAT is unavailable
Parameters
unsigned long base: Physical base address
unsigned long size: Size of region
Description
If PAT is available, this does nothing. If PAT is unavailable, it attempts to add a WC MTRR covering size bytes starting at base and logs an error if this fails.
The caller should provide a power-of-two size on an equivalent power-of-two boundary.
Drivers must store the return value to pass to arch_phys_wc_del(), but drivers should not try to interpret that return value.
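The power-of-two requirement can be checked before calling arch_phys_wc_add(). The helpers below, is_pow2() and wc_region_ok(), are hypothetical userspace helpers, not kernel APIs. In a driver, the returned handle would simply be stored and later passed back at teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a nonzero power of two. */
static bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* True if size is a power of two and base is aligned to size. */
static bool wc_region_ok(uint64_t base, uint64_t size)
{
    return is_pow2(size) && (base & (size - 1)) == 0;
}
```

For example, a 256 MiB region (0x10000000) starting at 0xe0000000 satisfies the rule, while the same size starting at 0xe8000000 does not, because that base is only 128 MiB aligned.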
Security Framework
Parameters
struct file *file: the file that needs a blob
Description
Allocate the file blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_blob_alloc(void **dest, size_t size, gfp_t gfp)
allocate a composite blob
Parameters
void **dest: the destination for the blob
size_t size: the size of the blob
gfp_t gfp: allocation type
Description
Allocate a blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
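"Composite" here means one allocation shared by every active module, with each module owning a slice at its own offset. The following is a minimal userspace sketch, assuming each module reserves its size once at startup. reserve() and blob_alloc() are hypothetical, calloc() stands in for the kernel allocator, -1 stands in for -ENOMEM, and returning no blob for a zero size is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Reserve need bytes for one module: return that module's offset into
 * the shared blob and grow the running total.
 */
static size_t reserve(size_t *total, size_t need)
{
    size_t offset = *total;

    *total += need;
    return offset;
}

/* Allocate one zeroed blob for all modules; none is needed for size 0. */
static int blob_alloc(void **dest, size_t size)
{
    if (size == 0) {
        *dest = NULL;
        return 0;
    }
    *dest = calloc(1, size);
    return *dest ? 0 : -1;  /* -1 stands in for -ENOMEM */
}
```

A module that reserved 8 bytes after another module's 16 finds its data at offset 16 in every blob of that kind.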
Parameters
struct cred *cred: the cred that needs a blob
gfp_t gfp: allocation type
Description
Allocate the cred blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
Parameters
struct inode *inode: the inode that needs a blob
gfp_t gfp: allocation flags
Description
Allocate the inode blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_task_alloc(struct task_struct *task)
allocate a composite task blob
Parameters
struct task_struct *task: the task that needs a blob
Description
Allocate the task blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_ipc_alloc(struct kern_ipc_perm *kip)
allocate a composite ipc blob
Parameters
struct kern_ipc_perm *kip: the ipc that needs a blob
Description
Allocate the ipc blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
Parameters
struct key *key: the key that needs a blob
Description
Allocate the key blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_msg_msg_alloc(struct msg_msg *mp)
allocate a composite msg_msg blob
Parameters
struct msg_msg *mp: the msg_msg that needs a blob
Description
Allocate the msg_msg blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_bdev_alloc(struct block_device *bdev)
allocate a composite block_device blob
Parameters
struct block_device *bdev: the block_device that needs a blob
Description
Allocate the block_device blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_bpf_map_alloc(struct bpf_map *map)
allocate a composite bpf_map blob
Parameters
struct bpf_map *map: the bpf_map that needs a blob
Description
Allocate the bpf_map blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_bpf_prog_alloc(struct bpf_prog *prog)
allocate a composite bpf_prog blob
Parameters
struct bpf_prog *prog: the bpf_prog that needs a blob
Description
Allocate the bpf_prog blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_bpf_token_alloc(struct bpf_token *token)
allocate a composite bpf_token blob
Parameters
struct bpf_token *token: the bpf_token that needs a blob
Description
Allocate the bpf_token blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_superblock_alloc(struct super_block *sb)
allocate a composite superblock blob
Parameters
struct super_block *sb: the superblock that needs a blob
Description
Allocate the superblock blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- int lsm_fill_user_ctx(struct lsm_ctx __user *uctx, u32 *uctx_len, void *val, size_t val_len, u64 id, u64 flags)
Fill a user space lsm_ctx structure
Parameters
struct lsm_ctx __user *uctx: a userspace LSM context to be filled
u32 *uctx_len: available uctx size (input), used uctx size (output)
void *val: the new LSM context value
size_t val_len: the size of the new LSM context value
u64 id: LSM id
u64 flags: LSM defined flags
Description
Fill all of the fields in a userspace lsm_ctx structure. If uctx is NULL, simply calculate the required size, output it via uctx_len, and return success.
Returns 0 on success, -E2BIG if the userspace buffer is not large enough, -EFAULT on a copyout error, -ENOMEM if memory can’t be allocated.
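The size negotiation can be sketched in userspace as shown below. struct lsm_ctx_model restates the four 64-bit header fields of the UAPI struct lsm_ctx (<linux/lsm.h>), followed by the context bytes. The sketch assumes two behaviors rather than taking them from this text: the size is rounded up to a multiple of sizeof(void *), and the required size is also reported on -E2BIG. fill_size() is hypothetical and performs only the size check, not the copy to user space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace restatement of the UAPI struct lsm_ctx layout. */
struct lsm_ctx_model {
    uint64_t id;
    uint64_t flags;
    uint64_t len;       /* size of the whole record */
    uint64_t ctx_len;   /* size of ctx[] */
    uint8_t ctx[];
};

/*
 * Size check only; have_buf == 0 models uctx == NULL. The required size
 * is written to *uctx_len in every case, and -E2BIG is returned when a
 * real buffer is too small.
 */
static int fill_size(int have_buf, uint32_t *uctx_len, size_t val_len)
{
    size_t need = sizeof(struct lsm_ctx_model) + val_len;
    int rc;

    /* Pad to a multiple of sizeof(void *) (an assumption of this sketch). */
    need = (need + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);
    rc = (have_buf && need > *uctx_len) ? -E2BIG : 0;
    *uctx_len = (uint32_t)need;
    return rc;
}
```

A caller can first pass no buffer to learn the size (40 bytes for a 5-byte context in this model), allocate that much, and then call again.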
- int security_binder_set_context_mgr(const struct cred *mgr)
Check if becoming binder ctx mgr is ok
Parameters
const struct cred *mgr: task credentials of current binder process
Description
Check whether mgr is allowed to be the binder context manager.
Return
Return 0 if permission is granted.
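When several LSMs are stacked, each module's hook is consulted in turn, and a single denial is enough to refuse. The sketch below assumes the usual convention for these int-returning permission hooks: the first nonzero return ends the walk and is passed back. perm_hook, call_hooks() and the two sample hooks are hypothetical. Real hooks take the documented arguments (for example const struct cred *mgr) rather than an int.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*perm_hook)(int subject);  /* hypothetical simplified hook */

/* Consult each module in order; 0 from all of them means granted. */
static int call_hooks(const perm_hook *hooks, size_t n, int subject)
{
    for (size_t i = 0; i < n; i++) {
        int rc = hooks[i](subject);

        if (rc != 0)
            return rc;      /* first denial or error wins */
    }
    return 0;
}

/* Sample modules: one grants everything, one denies odd subjects. */
static int allow_all(int subject)
{
    (void)subject;
    return 0;
}

static int deny_odd(int subject)
{
    return (subject & 1) ? -EPERM : 0;
}
```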
- int security_binder_transaction(const struct cred *from, const struct cred *to)
Check if a binder transaction is allowed
Parameters
const struct cred *from: sending process
const struct cred *to: receiving process
Description
Check whether from is allowed to invoke a binder transaction call to to.
Return
Returns 0 if permission is granted.
- int security_binder_transfer_binder(const struct cred *from, const struct cred *to)
Check if a binder transfer is allowed
Parameters
const struct cred *from: sending process
const struct cred *to: receiving process
Description
Check whether from is allowed to transfer a binder reference to to.
Return
Returns 0 if permission is granted.
- int security_binder_transfer_file(const struct cred *from, const struct cred *to, const struct file *file)
Check if a binder file xfer is allowed
Parameters
const struct cred *from: sending process
const struct cred *to: receiving process
const struct file *file: file being transferred
Description
Check whether from is allowed to transfer file to to.
Return
Returns 0 if permission is granted.
- int security_ptrace_access_check(struct task_struct *child, unsigned int mode)
Check if tracing is allowed
Parameters
struct task_struct *child: target process
unsigned int mode: PTRACE_MODE flags
Description
Check permission before allowing the current process to trace the child process. Security modules may also want to perform a process tracing check during an execve if the process is being traced and its security attributes would be changed by the execve.
Return
Returns 0 if permission is granted.
- int security_ptrace_traceme(struct task_struct *parent)
Check if tracing is allowed
Parameters
struct task_struct *parent: tracing process
Description
Check that the parent process has sufficient permission to trace the current process before allowing the current process to present itself to the parent process for tracing.
Return
Returns 0 if permission is granted.
- int security_capget(const struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted)
Get the capability sets for a process
Parameters
const struct task_struct *target: target process
kernel_cap_t *effective: effective capability set
kernel_cap_t *inheritable: inheritable capability set
kernel_cap_t *permitted: permitted capability set
Description
Get the effective, inheritable, and permitted capability sets for the target process. The hook may also perform permission checking to determine if the current process is allowed to see the capability sets of the target process.
Return
Returns 0 if the capability sets were successfully obtained.
- int security_capset(struct cred *new, const struct cred *old, const kernel_cap_t *effective, const kernel_cap_t *inheritable, const kernel_cap_t *permitted)
Set the capability sets for a process
Parameters
struct cred *new: new credentials for the target process
const struct cred *old: current credentials of the target process
const kernel_cap_t *effective: effective capability set
const kernel_cap_t *inheritable: inheritable capability set
const kernel_cap_t *permitted: permitted capability set
Description
Set the effective, inheritable, and permitted capability sets for the current process.
Return
Returns 0 and updates new if permission is granted.
- int security_capable(const struct cred *cred, struct user_namespace *ns, int cap, unsigned int opts)
Check if a process has the necessary capability
Parameters
const struct cred *cred: credentials to examine
struct user_namespace *ns: user namespace
int cap: capability requested
unsigned int opts: capability check options
Description
Check whether the credentials cred grant the cap capability in the user namespace ns. cap contains the capability <include/linux/capability.h>. opts contains options for the capable check <include/linux/security.h>.
Return
Returns 0 if the capability is granted.
- int security_quotactl(int cmds, int type, int id, const struct super_block *sb)
Check if a quotactl() syscall is allowed for this fs
Parameters
int cmds: commands
int type: type
int id: id
const struct super_block *sb: filesystem
Description
Check whether the quotactl syscall is allowed for this sb.
Return
Returns 0 if permission is granted.
Parameters
struct dentry *dentry: dentry
Description
Check whether QUOTAON is allowed for dentry.
Return
Returns 0 if permission is granted.
- int security_syslog(int type)
Check if accessing the kernel message ring is allowed
Parameters
int type: SYSLOG_ACTION_* type
Description
Check permission before accessing the kernel message ring or changing logging to the console. See the syslog(2) manual page for an explanation of the type values.
Return
Return 0 if permission is granted.
- int security_settime64(const struct timespec64 *ts, const struct timezone *tz)
Check if changing the system time is allowed
Parameters
const struct timespec64 *ts: new time
const struct timezone *tz: timezone
Description
Check permission to change the system time. struct timespec64 is defined in <include/linux/time64.h> and timezone is defined in <include/linux/time.h>.
Return
Returns 0 if permission is granted.
- int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
Check if allocating a new mem map is allowed
Parameters
struct mm_struct *mm: mm struct
long pages: number of pages
Description
Check permissions for allocating a new virtual mapping. If all LSMs return a positive value, __vm_enough_memory() will be called with cap_sys_admin set. If at least one LSM returns 0 or negative, __vm_enough_memory() will be called with cap_sys_admin cleared.
Return
Returns 0 if permission is granted by the LSM infrastructure to the caller.
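The cap_sys_admin decision described above reduces to an "all positive" test over the per-LSM results. cap_sys_admin_for() is a hypothetical helper in a userspace sketch. An empty list keeps cap_sys_admin set, which is consistent with the rule that the flag is cleared only when some LSM returns 0 or a negative value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Derive the cap_sys_admin argument from each LSM's return value. */
static bool cap_sys_admin_for(const int *lsm_rc, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (lsm_rc[i] <= 0)
            return false;   /* one LSM returning 0 or negative clears it */
    return true;            /* all positive: keep it set */
}
```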
- int security_bprm_creds_for_exec(struct linux_binprm *bprm)
Prepare the credentials for exec()
Parameters
struct linux_binprm *bprm: binary program information
Description
If the setup in prepare_exec_creds did not set up bprm->cred->security properly for executing bprm->file, update the LSM’s portion of bprm->cred->security to be what commit_creds needs to install for the new program. This hook may also optionally check permissions (e.g. for transitions between security domains). The hook must set bprm->secureexec to 1 if AT_SECURE should be set to request libc enable secure mode. bprm contains the linux_binprm structure.
If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is set. The result must be the same as without this flag even if the execution will never really happen and bprm will always be dropped.
This hook must not change current->cred, only bprm->cred.
Return
Returns 0 if the hook is successful and permission is granted.
- int security_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
Update linux_binprm creds based on file
Parameters
struct linux_binprm *bprm: binary program information
const struct file *file: associated file
Description
If file is setpcap, suid, sgid or otherwise marked to change privilege upon exec, update bprm->cred to reflect that change. This is called after finding the binary that will be executed without an interpreter. This ensures that the credentials will not be derived from a script that the binary will need to reopen, which when reopened may end up being a completely different file. This hook may also optionally check permissions (e.g. for transitions between security domains). The hook must set bprm->secureexec to 1 if AT_SECURE should be set to request libc enable secure mode. The hook must add to bprm->per_clear any personality flags that should be cleared from current->personality. bprm contains the linux_binprm structure.
Return
Returns 0 if the hook is successful and permission is granted.
- int security_bprm_check(struct linux_binprm *bprm)
Mediate binary handler search
Parameters
struct linux_binprm *bprm: binary program information
Description
This hook mediates the point when a search for a binary handler will begin. It allows a check against the bprm->cred->security value which was set in the preceding creds_for_exec call. The argv list and envp list are reliably available in bprm. This hook may be called multiple times during a single execve. bprm contains the linux_binprm structure.
Return
Returns 0 if the hook is successful and permission is granted.
- void security_bprm_committing_creds(const struct linux_binprm *bprm)
Install creds for a process during exec()
Parameters
const struct linux_binprm *bprm: binary program information
Description
Prepare to install the new security attributes of a process being transformed by an execve operation, based on the old credentials pointed to by current->cred and the information set in bprm->cred by the bprm_creds_for_exec hook. bprm points to the linux_binprm structure. This hook is a good place to perform state changes on the process such as closing open file descriptors to which access will no longer be granted when the attributes are changed. This is called immediately before commit_creds().
- void security_bprm_committed_creds(const struct linux_binprm *bprm)
Tidy up after cred install during exec()
Parameters
const struct linux_binprm *bprm: binary program information
Description
Tidy up after the installation of the new security attributes of a process being transformed by an execve operation. The new credentials have, by this point, been set to current->cred. bprm points to the linux_binprm structure. This hook is a good place to perform state changes on the process such as clearing out non-inheritable signal state. This is called immediately after commit_creds().
- int security_fs_context_submount(struct fs_context *fc, struct super_block *reference)
Initialise fc->security
Parameters
struct fs_context *fc: new filesystem context
struct super_block *reference: dentry reference for submount/remount
Description
Fill out the ->security field for a new fs_context.
Return
Returns 0 on success or negative error code on failure.
- int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc)
Duplicate a fs_context LSM blob
Parameters
struct fs_context *fc: destination filesystem context
struct fs_context *src_fc: source filesystem context
Description
Allocate and attach a security structure to fc->security. This pointer is initialised to NULL by the caller. fc indicates the new filesystem context. src_fc indicates the original filesystem context.
Return
Returns 0 on success or a negative error code on failure.
- int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *param)
Configure a filesystem context
Parameters
struct fs_context *fc: filesystem context
struct fs_parameter *param: filesystem parameter
Description
Userspace provided a parameter to configure a superblock. The LSM can consume the parameter or return it to the caller for use elsewhere.
Return
If the parameter is used by the LSM it should return 0, if it is returned to the caller -ENOPARAM is returned, otherwise a negative error code is returned.
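When several LSMs are active, their per-module results can be combined as shown in the userspace sketch below. A return of 0 marks the parameter as consumed, -ENOPARAM means "not mine", and any other error stops the walk immediately. param_hook, parse_param_all() and the sample hooks are hypothetical. The ENOPARAM value is defined locally only so that the sketch compiles outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef ENOPARAM
#define ENOPARAM 519    /* kernel-internal errno; defined here only for the sketch */
#endif

typedef int (*param_hook)(const char *key);  /* hypothetical simplified hook */

static int parse_param_all(const param_hook *hooks, size_t n, const char *key)
{
    int result = -ENOPARAM;    /* nobody has consumed the parameter yet */

    for (size_t i = 0; i < n; i++) {
        int rc = hooks[i](key);

        if (rc == 0)
            result = 0;        /* consumed; later modules still see it */
        else if (rc != -ENOPARAM)
            return rc;         /* a real error stops the walk */
    }
    return result;
}

/* Sample modules: one consumes "context", one rejects "bad". */
static int wants_context(const char *key)
{
    return strcmp(key, "context") == 0 ? 0 : -ENOPARAM;
}

static int rejects_bad(const char *key)
{
    return strcmp(key, "bad") == 0 ? -EINVAL : -ENOPARAM;
}
```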
- int security_sb_alloc(struct super_block *sb)
Allocate a super_block LSM blob
Parameters
struct super_block *sb: filesystem superblock
Description
Allocate and attach a security structure to the sb->s_security field. The s_security field is initialized to NULL when the structure is allocated. sb contains the super_block structure to be modified.
Return
Returns 0 if operation was successful.
- void security_sb_delete(struct super_block *sb)
Release super_block LSM associated objects
Parameters
struct super_block *sb: filesystem superblock
Description
Release objects tied to a superblock (e.g. inodes). sb contains the super_block structure being released.
- void security_sb_free(struct super_block *sb)
Free a super_block LSM blob
Parameters
struct super_block *sb: filesystem superblock
Description
Deallocate and clear the sb->s_security field. sb contains the super_block structure to be modified.
- int security_sb_kern_mount(const struct super_block *sb)
Check if a kernel mount is allowed
Parameters
const struct super_block *sb: filesystem superblock
Description
Mount this sb if allowed by permissions.
Return
Returns 0 if permission is granted.
- int security_sb_show_options(struct seq_file *m, struct super_block *sb)
Output the mount options for a superblock
Parameters
struct seq_file *m: output file
struct super_block *sb: filesystem superblock
Description
Show (print on m) mount options for this sb.
Return
Returns 0 on success, negative values on failure.
Parameters
struct dentry *dentry: superblock handle
Description
Check permission before obtaining filesystem statistics for the filesystem that dentry belongs to. dentry is a handle on the superblock for the filesystem.
Return
Returns 0 if permission is granted.
- int security_sb_mount(const char *dev_name, const struct path *path, const char *type, unsigned long flags, void *data)
Check permission for mounting a filesystem
Parameters
const char *dev_name: filesystem backing device
const struct path *path: mount point
const char *type: filesystem type
unsigned long flags: mount flags
void *data: filesystem specific data
Description
Check permission before an object specified by dev_name is mounted on the mount point named by path. For an ordinary mount, dev_name identifies a device if the file system type requires a device. For a remount (flags & MS_REMOUNT), dev_name is irrelevant. For a loopback/bind mount (flags & MS_BIND), dev_name identifies the pathname of the object being mounted.
Return
Returns 0 if permission is granted.
- int security_sb_umount(struct vfsmount *mnt, int flags)
Check permission for unmounting a filesystem
Parameters
struct vfsmount *mnt: mounted filesystem
int flags: unmount flags
Description
Check permission before the mnt file system is unmounted.
Return
Returns 0 if permission is granted.
- int security_sb_pivotroot(const struct path *old_path, const struct path *new_path)
Check permissions for pivoting the rootfs
Parameters
const struct path *old_path: new location for current rootfs
const struct path *new_path: location of the new rootfs
Description
Check permission before pivoting the root filesystem.
Return
Returns 0 if permission is granted.
- int security_move_mount(const struct path *from_path, const struct path *to_path)
Check permissions for moving a mount
Parameters
const struct path *from_path: source mount point
const struct path *to_path: destination mount point
Description
Check permission before a mount is moved.
Return
Returns 0 if permission is granted.
- int security_path_notify(const struct path *path, u64 mask, unsigned int obj_type)
Check if setting a watch is allowed
Parameters
const struct path *path: file path
u64 mask: event mask
unsigned int obj_type: file path type
Description
Check permissions before setting a watch on events as defined by mask, on an object at path, whose type is defined by obj_type.
Return
Returns 0 if permission is granted.
Parameters
struct inode *inode: the inode
gfp_t gfp: allocation flags
Description
Allocate and attach a security structure to inode->i_security. The i_security field is initialized to NULL when the inode structure is allocated.
Return
Return 0 if operation was successful.
Parameters
struct inode *inode: the inode
Description
Release any LSM resources associated with inode, although due to the inode’s RCU protections it is possible that the resources will not be fully released until after the current RCU grace period has elapsed.
It is important for LSMs to note that despite being present in a call to security_inode_free(), inode may still be referenced in a VFS path walk and calls to security_inode_permission() may be made during, or after, a call to security_inode_free(). For this reason the inode->i_security field is released via a call_rcu() callback and any LSMs which need to retain inode state for use in security_inode_permission() should only release that state in the inode_free_security_rcu() LSM hook callback.
- int security_inode_init_security_anon(struct inode *inode, const struct qstr *name, const struct inode *context_inode)
Initialize an anonymous inode
Parameters
struct inode *inode: the inode
const struct qstr *name: the anonymous inode class
const struct inode *context_inode: an optional related inode
Description
Set up the incore security field for the new anonymous inode and return whether the inode creation is permitted by the security module or not.
Return
Returns 0 on success, -EACCES if the security module denies the creation of this inode, or another -errno upon other errors.
- void security_path_post_mknod(struct mnt_idmap *idmap, struct dentry *dentry)
Update inode security after reg file creation
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct dentry *dentry: new file
Description
Update inode security field after a regular file has been created.
- int security_path_rmdir(const struct path *dir, struct dentry *dentry)
Check if removing a directory is allowed
Parameters
const struct path *dir: parent directory
struct dentry *dentry: directory to remove
Description
Check the permission to remove a directory.
Return
Returns 0 if permission is granted.
- int security_path_symlink(const struct path *dir, struct dentry *dentry, const char *old_name)
Check if creating a symbolic link is allowed
Parameters
const struct path *dir: parent directory
struct dentry *dentry: symbolic link
const char *old_name: file pathname
Description
Check the permission to create a symbolic link to a file.
Return
Returns 0 if permission is granted.
- int security_path_link(struct dentry *old_dentry, const struct path *new_dir, struct dentry *new_dentry)
Check if creating a hard link is allowed
Parameters
struct dentry *old_dentry: existing file
const struct path *new_dir: new parent directory
struct dentry *new_dentry: new link
Description
Check permission before creating a new hard link to a file.
Return
Returns 0 if permission is granted.
Parameters
const struct path *path: file
Description
Check permission before truncating the file indicated by path. Note that truncation permissions may also be checked based on already opened files, using the security_file_truncate() hook.
Return
Returns 0 if permission is granted.
- int security_path_chmod(const struct path *path, umode_t mode)
Check if changing the file’s mode is allowed
Parameters
const struct path *path: file
umode_t mode: new mode
Description
Check for permission to change the mode of the file path. The new mode is specified in mode, which is a bitmask of constants from <include/uapi/linux/stat.h>.
Return
Returns 0 if permission is granted.
- int security_path_chown(const struct path *path, kuid_t uid, kgid_t gid)
Check if changing the file’s owner/group is allowed
Parameters
const struct path *path: file
kuid_t uid: file owner
kgid_t gid: file group
Description
Check for permission to change owner/group of a file or directory.
Return
Returns 0 if permission is granted.
Parameters
const struct path *path: directory
Description
Check for permission to change root directory.
Return
Returns 0 if permission is granted.
- voidsecurity_inode_post_create_tmpfile(structmnt_idmap*idmap,structinode*inode)¶
Update inode security of new tmpfile
Parameters
structmnt_idmap*idmapidmap of the mount
structinode*inodeinode of the new tmpfile
Description
Update inode security data after a tmpfile has been created.
- int security_inode_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
Check if creating a hard link is allowed
Parameters
struct dentry *old_dentry: existing file
struct inode *dir: new parent directory
struct dentry *new_dentry: new link
Description
Check permission before creating a new hard link to a file.
Return
Returns 0 if permission is granted.
- int security_inode_unlink(struct inode *dir, struct dentry *dentry)
Check if removing a hard link is allowed
Parameters
struct inode *dir: parent directory
struct dentry *dentry: file
Description
Check the permission to remove a hard link to a file.
Return
Returns 0 if permission is granted.
- int security_inode_symlink(struct inode *dir, struct dentry *dentry, const char *old_name)
Check if creating a symbolic link is allowed
Parameters
struct inode *dir: parent directory
struct dentry *dentry: symbolic link
const char *old_name: existing filename
Description
Check the permission to create a symbolic link to a file.
Return
Returns 0 if permission is granted.
- int security_inode_rmdir(struct inode *dir, struct dentry *dentry)
Check if removing a directory is allowed
Parameters
struct inode *dir: parent directory
struct dentry *dentry: directory to be removed
Description
Check the permission to remove a directory.
Return
Returns 0 if permission is granted.
- int security_inode_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
Check if creating a special file is allowed
Parameters
struct inode *dir: parent directory
struct dentry *dentry: new file
umode_t mode: new file mode
dev_t dev: device number
Description
Check permissions when creating a special file (or a socket or a FIFO created via the mknod system call). Note that if the mknod operation is being done for a regular file, the create hook is called instead of this hook.
Return
Returns 0 if permission is granted.
- int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry, unsigned int flags)
Check if renaming a file is allowed
Parameters
struct inode *old_dir: parent directory of the old file
struct dentry *old_dentry: the old file
struct inode *new_dir: parent directory of the new file
struct dentry *new_dentry: the new file
unsigned int flags: flags
Description
Check for permission to rename a file or directory.
Return
Returns 0 if permission is granted.
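- int security_inode_readlink(struct dentry *dentry)
Check if reading a symbolic link is allowed
Parameters
struct dentry *dentry: link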
Description
Check the permission to read the symbolic link.
Return
Returns 0 if permission is granted.
- int security_inode_follow_link(struct dentry *dentry, struct inode *inode, bool rcu)
Check if following a symbolic link is allowed
Parameters
struct dentry *dentry: link dentry
struct inode *inode: link inode
bool rcu: true if in RCU-walk mode
Description
Check permission to follow a symbolic link when looking up a pathname. If rcu is true, inode is not stable.
Return
Returns 0 if permission is granted.
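A hook must not sleep in RCU-walk mode. The userspace sketch below shows the usual pattern for a hypothetical toy LSM whose label may need a blocking reload. Returning -ECHILD asks the path walk to retry in ref-walk mode, where rcu is false and the inode is stable. struct toy_inode_sec and toy_inode_follow_link() are illustrative only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-inode state for a toy LSM. */
struct toy_inode_sec {
	bool label_valid;	/* false if the label must be reloaded (may sleep) */
	bool may_follow;	/* policy decision once the label is known */
};

/*
 * In RCU-walk mode (rcu == true) the hook cannot block, so if it would
 * need to reload the label it returns -ECHILD and lets the VFS retry the
 * lookup in ref-walk mode.
 */
int toy_inode_follow_link(struct toy_inode_sec *isec, bool rcu)
{
	if (!isec->label_valid) {
		if (rcu)
			return -ECHILD;
		isec->label_valid = true;	/* stand-in for a blocking reload */
	}
	return isec->may_follow ? 0 : -EACCES;
}
```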
- int security_inode_permission(struct inode *inode, int mask)
Check if accessing an inode is allowed
Parameters
struct inode *inode: inode
int mask: access mask
Description
Check permission before accessing an inode. This hook is called by the existing Linux permission function, so a security module can use it to provide additional checking for existing Linux permission checks. Notice that this hook is called when a file is opened (as well as during many other operations), whereas the file_security_ops permission hook is called when the actual read/write operations are performed.
Return
Returns 0 if permission is granted.
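When several LSMs are stacked, each module's inode_permission hook runs in turn and the first non-zero result is returned, so any one module can deny access. Below is a simplified userspace model of that composition. The hook type and the two toy modules are hypothetical; the mask values mirror the kernel's MAY_WRITE and MAY_READ bits.

```c
#include <errno.h>
#include <stddef.h>

/* Values mirror the kernel's MAY_WRITE (0x2) and MAY_READ (0x4) mask bits. */
#define TOY_MAY_WRITE 0x2
#define TOY_MAY_READ  0x4

/* Hypothetical hook type modelling inode_permission(inode, mask). */
typedef int (*perm_hook_fn)(int inode_id, int mask);

/* Stop at the first non-zero result: any module can deny access. */
int call_perm_hooks(const perm_hook_fn *hooks, size_t n, int inode_id, int mask)
{
	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i](inode_id, mask);
		if (rc != 0)
			return rc;
	}
	return 0;	/* every module (or none at all) granted access */
}

/* Two toy modules: one grants everything, one denies writes to inode 42. */
int toy_allow_all(int inode_id, int mask)
{
	(void)inode_id;
	(void)mask;
	return 0;
}

int toy_deny_write_42(int inode_id, int mask)
{
	return (inode_id == 42 && (mask & TOY_MAY_WRITE)) ? -EACCES : 0;
}
```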
- void security_inode_post_setattr(struct mnt_idmap *idmap, struct dentry *dentry, int ia_valid)
Update the inode after a setattr operation
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct dentry *dentry: file
int ia_valid: file attributes set
Description
Update the inode security field after successfully setting file attributes.
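- int security_inode_getattr(const struct path *path)
Check if getting file attributes is allowed
Parameters
const struct path *path: file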
Description
Check permission before obtaining file attributes.
Return
Returns 0 if permission is granted.
- int security_inode_setxattr(struct mnt_idmap *idmap, struct dentry *dentry, const char *name, const void *value, size_t size, int flags)
Check if setting file xattrs is allowed
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct dentry *dentry: file
const char *name: xattr name
const void *value: xattr value
size_t size: size of xattr value
int flags: flags
Description
This hook performs the desired permission checks before setting the extended attributes (xattrs) on dentry. Note that some additional logic runs before the main LSM implementations are called. It detects whether an additional capability check must be performed at the LSM layer.
Normally a capability check is enforced before the various LSM hook implementations run. An LSM that wants to avoid this capability check can register an ‘inode_xattr_skipcap’ hook and return 1 for the xattrs it wants to exempt, which leaves the LSM fully responsible for enforcing access control on those xattrs. If none of the enabled LSMs registers an ‘inode_xattr_skipcap’ hook, or all of them return 0 (the default return value), the capability check is still performed.
Return
Returns 0 if permission is granted.
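The skipcap decision above reduces to a small predicate. In this userspace sketch, skipcap_fn, need_cap_check() and the two toy hooks are hypothetical stand-ins for registered inode_xattr_skipcap hooks. The sketch models only whether the generic capability check runs, not the check itself. The removexattr hook is described with the same rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical skipcap hook: returns 1 to take over the capability check for name. */
typedef int (*skipcap_fn)(const char *name);

/*
 * The generic capability check is skipped only if some registered hook
 * returns a non-default (non-zero) value for this xattr name.
 */
bool need_cap_check(const skipcap_fn *hooks, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (hooks[i](name) != 0)
			return false;
	return true;	/* no hooks registered, or all returned 0 */
}

/* Toy hooks: one claims "security.toy", the other keeps the default. */
int toy_skipcap(const char *name)
{
	return strcmp(name, "security.toy") == 0;
}

int default_skipcap(const char *name)
{
	(void)name;
	return 0;
}
```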
- int security_inode_set_acl(struct mnt_idmap *idmap, struct dentry *dentry, const char *acl_name, struct posix_acl *kacl)
Check if setting POSIX ACLs is allowed
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct dentry *dentry: file
const char *acl_name: ACL name
struct posix_acl *kacl: ACL struct
Description
Check permission before setting POSIX ACLs. The POSIX ACLs in kacl are identified by acl_name.
Return
Returns 0 if permission is granted.
- void security_inode_post_set_acl(struct dentry *dentry, const char *acl_name, struct posix_acl *kacl)
Update inode security from POSIX ACLs set
Parameters
struct dentry *dentry: file
const char *acl_name: ACL name
struct posix_acl *kacl: ACL struct
Description
Update inode security data after successfully setting POSIX ACLs on dentry. The POSIX ACLs in kacl are identified by acl_name.
- int security_inode_get_acl(struct mnt_idmap *idmap, struct dentry *dentry, const char *acl_name)
Check if reading POSIX ACLs is allowed
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct dentry *dentry: file
const char *acl_name: ACL name
Description
Check permission before getting POSIX ACLs. The POSIX ACLs are identified by acl_name.
Return
Returns 0 if permission is granted.
- int security_inode_remove_acl(struct mnt_idmap *idmap, struct dentry *dentry, const char *acl_name)
Check if removing a POSIX ACL is allowed
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct dentry *dentry: file
const char *acl_name: ACL name
Description
Check permission before removing POSIX ACLs. The POSIX ACLs are identified by acl_name.
Return
Returns 0 if permission is granted.
- void security_inode_post_remove_acl(struct mnt_idmap *idmap, struct dentry *dentry, const char *acl_name)
Update inode security after removing POSIX ACLs
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct dentry *dentry: file
const char *acl_name: ACL name
Description
Update inode security data after successfully removing POSIX ACLs on dentry in idmap. The POSIX ACLs are identified by acl_name.
- void security_inode_post_setxattr(struct dentry *dentry, const char *name, const void *value, size_t size, int flags)
Update the inode after a setxattr operation
Parameters
struct dentry *dentry: file
const char *name: xattr name
const void *value: xattr value
size_t size: xattr value size
int flags: flags
Description
Update the inode security field after a successful setxattr operation.
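- int security_inode_getxattr(struct dentry *dentry, const char *name)
Check if xattr access is allowed
Parameters
struct dentry *dentry: file
const char *name: xattr name
Description
Check permission before obtaining the extended attribute identified by name for dentry.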
Return
Returns 0 if permission is granted.
- int security_inode_listxattr(struct dentry *dentry)
Check if listing xattrs is allowed
Parameters
struct dentry *dentry: file
Description
Check permission before obtaining the list of extended attribute names for dentry.
Return
Returns 0 if permission is granted.
- int security_inode_removexattr(struct mnt_idmap *idmap, struct dentry *dentry, const char *name)
Check if removing an xattr is allowed
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct dentry *dentry: file
const char *name: xattr name
Description
This hook performs the desired permission checks before removing the extended attributes (xattrs) on dentry. Note that some additional logic runs before the main LSM implementations are called. It detects whether an additional capability check must be performed at the LSM layer.
Normally a capability check is enforced before the various LSM hook implementations run. An LSM that wants to avoid this capability check can register an ‘inode_xattr_skipcap’ hook and return 1 for the xattrs it wants to exempt, which leaves the LSM fully responsible for enforcing access control on those xattrs. If none of the enabled LSMs registers an ‘inode_xattr_skipcap’ hook, or all of them return 0 (the default return value), the capability check is still performed.
Return
Returns 0 if permission is granted.
- void security_inode_post_removexattr(struct dentry *dentry, const char *name)
Update the inode after a removexattr op
Parameters
struct dentry *dentry: file
const char *name: xattr name
Description
Update the inode after a successful removexattr operation.
- int security_inode_file_setattr(struct dentry *dentry, struct file_kattr *fa)
Check if setting fsxattr is allowed
Parameters
struct dentry *dentry: file to set filesystem extended attributes on
struct file_kattr *fa: extended attributes to set on the inode
Description
Called when the file_setattr() syscall or the FS_IOC_FSSETXATTR ioctl() is called on the inode.
Return
Returns 0 if permission is granted.
- int security_inode_file_getattr(struct dentry *dentry, struct file_kattr *fa)
Check if retrieving fsxattr is allowed
Parameters
struct dentry *dentry: file to retrieve filesystem extended attributes from
struct file_kattr *fa: extended attributes to get
Description
Called when the file_getattr() syscall or the FS_IOC_FSGETXATTR ioctl() is called on the inode.
Return
Returns 0 if permission is granted.
- int security_inode_need_killpriv(struct dentry *dentry)
Check if security_inode_killpriv() is required
Parameters
struct dentry *dentry: associated dentry
Description
Called when an inode has been changed to determine if security_inode_killpriv() should be called.
Return
Returns <0 on error to abort the inode change operation, 0 if security_inode_killpriv() does not need to be called, and >0 if security_inode_killpriv() does need to be called.
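A caller can map the three documented return ranges onto its next step, as in the sketch below. The names (enum killpriv_action, killpriv_action_for()) are hypothetical. This is a userspace model, not the VFS code that actually consumes the result.

```c
/* Hypothetical next steps after consulting security_inode_need_killpriv(). */
enum killpriv_action { KILLPRIV_ABORT, KILLPRIV_SKIP, KILLPRIV_CALL };

/* Map the tri-state return value onto what the caller should do next. */
enum killpriv_action killpriv_action_for(int need_killpriv_rc)
{
	if (need_killpriv_rc < 0)
		return KILLPRIV_ABORT;	/* error: abort the inode change */
	if (need_killpriv_rc == 0)
		return KILLPRIV_SKIP;	/* security_inode_killpriv() not needed */
	return KILLPRIV_CALL;		/* security_inode_killpriv() must be called */
}
```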
- int security_inode_killpriv(struct mnt_idmap *idmap, struct dentry *dentry)
The setuid bit is removed, update LSM state
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct dentry *dentry: associated dentry
Description
The setuid bit of dentry is being removed. Remove similar security labels. Called with the dentry->d_inode->i_mutex held.
Return
Returns 0 on success. If an error is returned, the operation causing the setuid bit removal fails.
- int security_inode_getsecurity(struct mnt_idmap *idmap, struct inode *inode, const char *name, void **buffer, bool alloc)
Get the xattr security label of an inode
Parameters
struct mnt_idmap *idmap: idmap of the mount
struct inode *inode: inode
const char *name: xattr name
void **buffer: security label buffer
bool alloc: allocation flag
Description
Retrieve a copy of the extended attribute representation of the security label associated with name for inode via buffer. Note that name is the remainder of the attribute name after the security prefix has been removed. alloc specifies whether the call should return a value via the buffer or just the value length.
Return
Returns size of buffer on success.
- int security_inode_setsecurity(struct inode *inode, const char *name, const void *value, size_t size, int flags)
Set the xattr security label of an inode
Parameters
struct inode *inode: inode
const char *name: xattr name
const void *value: security label
size_t size: length of security label
int flags: flags
Description
Set the security label associated with name for inode from the extended attribute value value. size indicates the size of value in bytes. flags may be XATTR_CREATE, XATTR_REPLACE, or 0. Note that name is the remainder of the attribute name after the ‘security.’ prefix has been removed.
Return
Returns 0 on success.
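- void security_inode_getlsmprop(struct inode *inode, struct lsm_prop *prop)
Get an inode’s LSM data
Parameters
struct inode *inode: inode
struct lsm_prop *prop: LSM-specific information to return
Description
Get the LSM-specific information associated with the inode.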
- int security_kernfs_init_security(struct kernfs_node *kn_dir, struct kernfs_node *kn)
Init LSM context for a kernfs node
Parameters
struct kernfs_node *kn_dir: parent kernfs node
struct kernfs_node *kn: the kernfs node to initialize
Description
Initialize the security context of a newly created kernfs node based on its own and its parent’s attributes.
Return
Returns 0 if permission is granted.
- int security_file_permission(struct file *file, int mask)
Check file permissions
Parameters
struct file *file: file
int mask: requested permissions
Description
Check file permissions before accessing an open file. This hook is called by various operations that read or write files. A security module can use this hook to perform additional checking on these operations, e.g. to revalidate permissions on use to support privilege bracketing or policy changes. Notice that this hook is used when the actual read/write operations are performed, whereas the inode_security_ops hook is called when a file is opened (as well as during many other operations). Although this hook can be used to revalidate permissions for various system call operations that read or write files, it does not address the revalidation of permissions for memory-mapped files. Security modules must handle this separately if they need such revalidation.
Return
Returns 0 if permission is granted.
- int security_file_alloc(struct file *file)
Allocate and init a file’s LSM blob
Parameters
struct file *file: the file
Description
Allocate and attach a security structure to the file->f_security field. The security field is initialized to NULL when the structure is first created.
Return
Return 0 if the hook is successful and permission is granted.
- void security_file_release(struct file *file)
Perform actions before releasing the file ref
Parameters
struct file *file: the file
Description
Perform actions before releasing the last reference to a file.
- void security_file_free(struct file *file)
Free a file’s LSM blob
Parameters
struct file *file: the file
Description
Deallocate and free any security structures stored in file->f_security.
- int security_mmap_file(struct file *file, unsigned long prot, unsigned long flags)
Check if mmap’ing a file is allowed
Parameters
struct file *file: file
unsigned long prot: protection applied by the kernel
unsigned long flags: flags
Description
Check permissions for an mmap operation. file may be NULL, e.g. if mapping anonymous memory.
Return
Returns 0 if permission is granted.
- int security_mmap_addr(unsigned long addr)
Check if mmap’ing an address is allowed
Parameters
unsigned long addr: address
Description
Check permissions for an mmap operation at addr.
Return
Returns 0 if permission is granted.
- int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, unsigned long prot)
Check if changing memory protections is allowed
Parameters
struct vm_area_struct *vma: memory region
unsigned long reqprot: application requested protection
unsigned long prot: protection applied by the kernel
Description
Check permissions before changing memory access permissions.
Return
Returns 0 if permission is granted.
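The hook receives both reqprot and prot because they can differ. For example, under the READ_IMPLIES_EXEC personality the kernel adds PROT_EXEC to a readable mapping. The sketch below implements a hypothetical W^X policy that checks prot, the protection the kernel will actually apply. The policy and wx_mprotect_check() are illustrative and not taken from any in-tree LSM.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Hypothetical W^X policy: deny any mapping that would end up both
 * writable and executable. It inspects prot (what the kernel applies),
 * not reqprot (what the application asked for).
 */
int wx_mprotect_check(unsigned long reqprot, unsigned long prot)
{
	(void)reqprot;	/* kept only to mirror the hook's signature */
	if ((prot & PROT_WRITE) && (prot & PROT_EXEC))
		return -EACCES;
	return 0;
}
```

With this choice, a request for PROT_READ | PROT_WRITE is still denied if the kernel turns it into a writable, executable mapping.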
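- int security_file_lock(struct file *file, unsigned int cmd)
Check if a file lock is allowed
Parameters
struct file *file: file
unsigned int cmd: lock operation (e.g. F_RDLCK, F_WRLCK)
Description
Check permission before performing file locking operations. Note that the hook mediates both flock and fcntl style locks.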
Return
Returns 0 if permission is granted.
- int security_file_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
Check if fcntl() op is allowed
Parameters
struct file *file: file
unsigned int cmd: fcntl command
unsigned long arg: command argument
Description
Check permission before allowing the file operation specified by cmd to be performed on the file file. Note that arg sometimes represents a user space pointer; in other cases, it may be a simple integer value. When arg represents a user space pointer, it should never be used by the security module.
Return
Returns 0 if permission is granted.
Parameters
struct file *file: the file
Description
Save owner security information (typically from current->security) in file->f_security for later use by the send_sigiotask hook.
This hook is called with file->f_owner.lock held.
Return
Returns 0 on success.
- int security_file_send_sigiotask(struct task_struct *tsk, struct fown_struct *fown, int sig)
Check if sending SIGIO/SIGURG is allowed
Parameters
struct task_struct *tsk: target task
struct fown_struct *fown: signal sender
int sig: signal to be sent, SIGIO is sent if 0
Description
Check permission for the file owner fown to send SIGIO or SIGURG to the process tsk. Note that this hook is sometimes called from interrupt context. Also note that the fown_struct, fown, is never outside the context of a struct file, so the file structure (and associated security information) can always be obtained: container_of(fown, struct file, f_owner).
Return
Returns 0 if permission is granted.
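The container_of() recovery described above works because f_owner sits at a fixed offset inside struct file. The userspace sketch below re-creates the idiom with offsetof(). mock_file and mock_fown are simplified, hypothetical stand-ins, and the layout they use (an embedded f_owner) is the one the description assumes.

```c
#include <stddef.h>

/* Userspace re-creation of the kernel's container_of() idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock structures with f_owner embedded in the file, as described above. */
struct mock_fown { int signum; };
struct mock_file {
	void *f_security;		/* the LSM blob */
	struct mock_fown f_owner;	/* embedded owner structure */
};

/* Recover the owning file, and from it the LSM blob, given only fown. */
void *fown_to_file_security(struct mock_fown *fown)
{
	struct mock_file *file = container_of(fown, struct mock_file, f_owner);

	return file->f_security;
}
```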
- int security_file_receive(struct file *file)
Check if receiving a file via IPC is allowed
Parameters
struct file *file: file being received
Description
This hook allows security modules to control the ability of a process to receive an open file descriptor via socket IPC.
Return
Returns 0 if permission is granted.
- int security_file_open(struct file *file)
Save open-time state for later use by the LSM
Parameters
struct file *file
Description
Save open-time permission checking state for later use upon file_permission, and recheck access if anything has changed since inode_permission.
A module can check whether a file is opened for execution (e.g. by an execve(2) call), either directly or indirectly (e.g. by ELF’s ld.so), by checking file->f_flags & __FMODE_EXEC.
Return
Returns 0 if permission is granted.
- int security_file_truncate(struct file *file)
Check if truncating a file is allowed
Parameters
struct file *file: file
Description
Check permission before truncating a file, i.e. using ftruncate. Note that truncation permission may also be checked based on the path, using the path_truncate hook.
Return
Returns 0 if permission is granted.
- int security_task_alloc(struct task_struct *task, u64 clone_flags)
Allocate a task’s LSM blob
Parameters
struct task_struct *task: the task
u64 clone_flags: flags indicating what is being shared
Description
Handle allocation of task-related resources.
Return
Returns 0 on success, negative values on failure.
- void security_task_free(struct task_struct *task)
Free a task’s LSM blob and related resources
Parameters
struct task_struct *task: task
Description
Handle release of task-related resources. Note that this can be called from interrupt context.
- int security_cred_alloc_blank(struct cred *cred, gfp_t gfp)
Allocate the min memory to allow cred_transfer
Parameters
struct cred *cred: credentials
gfp_t gfp: gfp flags
Description
Only allocate sufficient memory and attach to cred such that cred_transfer() will not get ENOMEM.
Return
Returns 0 on success, negative values on failure.
- void security_cred_free(struct cred *cred)
Free the cred’s LSM blob and associated resources
Parameters
struct cred *cred: credentials
Description
Deallocate and clear the cred->security field in a set of credentials.
- int security_prepare_creds(struct cred *new, const struct cred *old, gfp_t gfp)
Prepare a new set of credentials
Parameters
struct cred *new: new credentials
const struct cred *old: original credentials
gfp_t gfp: gfp flags
Description
Prepare a new set of credentials by copying the data from the old set.
Return
Returns 0 on success, negative values on failure.
- void security_transfer_creds(struct cred *new, const struct cred *old)
Transfer creds
Parameters
struct cred *new: target credentials
const struct cred *old: original credentials
Description
Transfer data from original creds to new creds.
- int security_kernel_act_as(struct cred *new, u32 secid)
Set the kernel credentials to act as secid
Parameters
struct cred *new: credentials
u32 secid: secid
Description
Set the credentials for a kernel service to act as (subjective context). The current task must be the one that nominated secid.
Return
Returns 0 if successful.
- int security_kernel_create_files_as(struct cred *new, struct inode *inode)
Set file creation context using an inode
Parameters
struct cred *new: target credentials
struct inode *inode: reference inode
Description
Set the file creation context in a set of credentials to be the same as the objective context of the specified inode. The current task must be the one that nominated inode.
Return
Returns 0 if successful.
- int security_kernel_module_request(char *kmod_name)
Check if loading a module is allowed
Parameters
char *kmod_name: module name
Description
Check the ability to trigger the kernel to automatically upcall to userspace so that userspace loads a kernel module with the given name.
Return
Returns 0 if successful.
- int security_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
Update LSM with new user id attributes
Parameters
struct cred *new: updated credentials
const struct cred *old: credentials being replaced
int flags: LSM_SETID_* flag values
Description
Update the module’s state after setting one or more of the user identity attributes of the current process. The flags parameter indicates which of the set*uid system calls invoked this hook. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.
Return
Returns 0 on success.
- int security_task_fix_setgid(struct cred *new, const struct cred *old, int flags)
Update LSM with new group id attributes
Parameters
struct cred *new: updated credentials
const struct cred *old: credentials being replaced
int flags: LSM_SETID_* flag value
Description
Update the module’s state after setting one or more of the group identity attributes of the current process. The flags parameter indicates which of the set*gid system calls invoked this hook. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.
Return
Returns 0 on success.
- int security_task_fix_setgroups(struct cred *new, const struct cred *old)
Update LSM with new supplementary groups
Parameters
struct cred *new: updated credentials
const struct cred *old: credentials being replaced
Description
Update the module’s state after setting the supplementary group identity attributes of the current process. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.
Return
Returns 0 on success.
- int security_task_setpgid(struct task_struct *p, pid_t pgid)
Check if setting the pgid is allowed
Parameters
struct task_struct *p: task being modified
pid_t pgid: new pgid
Description
Check permission before setting the process group identifier of the process p to pgid.
Return
Returns 0 if permission is granted.
- int security_task_getpgid(struct task_struct *p)
Check if getting the pgid is allowed
Parameters
struct task_struct *p: task
Description
Check permission before getting the process group identifier of the process p.
Return
Returns 0 if permission is granted.
- int security_task_getsid(struct task_struct *p)
Check if getting the session id is allowed
Parameters
struct task_struct *p: task
Description
Check permission before getting the session identifier of the process p.
Return
Returns 0 if permission is granted.
- int security_task_setnice(struct task_struct *p, int nice)
Check if setting a task’s nice value is allowed
Parameters
struct task_struct *p: target task
int nice: nice value
Description
Check permission before setting the nice value of p to nice.
Return
Returns 0 if permission is granted.
- int security_task_setioprio(struct task_struct *p, int ioprio)
Check if setting a task’s ioprio is allowed
Parameters
struct task_struct *p: target task
int ioprio: ioprio value
Description
Check permission before setting the ioprio value of p to ioprio.
Return
Returns 0 if permission is granted.
- int security_task_getioprio(struct task_struct *p)
Check if getting a task’s ioprio is allowed
Parameters
struct task_struct *p: task
Description
Check permission before getting the ioprio value of p.
Return
Returns 0 if permission is granted.
- int security_task_prlimit(const struct cred *cred, const struct cred *tcred, unsigned int flags)
Check if getting/setting resource limits is allowed
Parameters
const struct cred *cred: current task credentials
const struct cred *tcred: target task credentials
unsigned int flags: LSM_PRLIMIT_* flag bits indicating a get, a set, or both
Description
Check permission before getting and/or setting the resource limits of another task.
Return
Returns 0 if permission is granted.
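The flags bits let one hook tell reads of another task's limits apart from writes, and a prlimit(2) call that does both passes both bits. In the sketch below, the TOY_PRLIMIT_* values mirror LSM_PRLIMIT_READ and LSM_PRLIMIT_WRITE. struct toy_cred and the policy itself are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Values mirror LSM_PRLIMIT_READ / LSM_PRLIMIT_WRITE in include/linux/security.h. */
#define TOY_PRLIMIT_READ  1u
#define TOY_PRLIMIT_WRITE 2u

/* Hypothetical, simplified credentials. */
struct toy_cred { unsigned int uid; bool privileged; };

/*
 * Hypothetical policy: reading the limits of a same-uid task is allowed,
 * and writing limits requires privilege. When both bits are set, both
 * checks apply.
 */
int toy_task_prlimit(const struct toy_cred *cred, const struct toy_cred *tcred,
		     unsigned int flags)
{
	if ((flags & TOY_PRLIMIT_READ) && cred->uid != tcred->uid && !cred->privileged)
		return -EPERM;
	if ((flags & TOY_PRLIMIT_WRITE) && !cred->privileged)
		return -EPERM;
	return 0;
}
```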
- int security_task_setrlimit(struct task_struct *p, unsigned int resource, struct rlimit *new_rlim)
Check if setting a new rlimit value is allowed
Parameters
struct task_struct *p: target task’s group leader
unsigned int resource: resource whose limit is being set
struct rlimit *new_rlim: new resource limit
Description
Check permission before setting the resource limits of process p for resource to new_rlim. The old resource limit values can be examined by dereferencing (p->signal->rlim + resource).
Return
Returns 0 if permission is granted.
- int security_task_setscheduler(struct task_struct *p)
Check if setting sched policy/param is allowed
Parameters
struct task_struct *p: target task
Description
Check permission before setting scheduling policy and/or parameters of process p.
Return
Returns 0 if permission is granted.
- int security_task_getscheduler(struct task_struct *p)
Check if getting scheduling info is allowed
Parameters
struct task_struct *p: target task
Description
Check permission before obtaining scheduling information for process p.
Return
Returns 0 if permission is granted.
- int security_task_movememory(struct task_struct *p)
Check if moving memory is allowed
Parameters
struct task_struct *p: task
Description
Check permission before moving memory owned by process p.
Return
Returns 0 if permission is granted.
- int security_task_kill(struct task_struct *p, struct kernel_siginfo *info, int sig, const struct cred *cred)
Check if sending a signal is allowed
Parameters
struct task_struct *p: target process
struct kernel_siginfo *info: signal information
int sig: signal value
const struct cred *cred: credentials of the signal sender, NULL if current
Description
Check permission before sending signal sig to p. info can be NULL, the constant 1, or a pointer to a kernel_siginfo structure. If info is 1 or SI_FROMKERNEL(info) is true, the signal should be viewed as coming from the kernel and should typically be permitted. SIGIO signals are handled separately by the send_sigiotask hook in file_security_ops.
Return
Returns 0 if permission is granted.
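The three forms of info can be classified as in this userspace sketch. TOY_SEND_SIG_PRIV stands in for the constant 1 mentioned above. TOY_SI_FROMKERNEL() mirrors SI_FROMKERNEL(), which tests for a positive si_code. A NULL info is treated as an ordinary signal sent by the current task. All names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for struct kernel_siginfo: only si_code matters here. */
struct toy_siginfo { int si_code; };

/* Stand-in for the constant 1 that marks a kernel-originated signal. */
#define TOY_SEND_SIG_PRIV ((struct toy_siginfo *)1)

/* Mirrors SI_FROMKERNEL(): positive si_code values originate in the kernel. */
#define TOY_SI_FROMKERNEL(info) ((info)->si_code > 0)

/* True when a task_kill-style hook should treat the signal as coming from the kernel. */
bool signal_from_kernel(const struct toy_siginfo *info)
{
	if (info == NULL)
		return false;		/* plain signal from the current task */
	if (info == TOY_SEND_SIG_PRIV)
		return true;		/* never dereferenced: not a real pointer */
	return TOY_SI_FROMKERNEL(info);
}
```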
- int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5)
Check if a prctl op is allowed
Parameters
int option: operation
unsigned long arg2: argument
unsigned long arg3: argument
unsigned long arg4: argument
unsigned long arg5: argument
Description
Check permission before performing a process control operation on the current process.
Return
Returns -ENOSYS if no one wanted to handle this op. Any other value causes prctl() to return immediately with that value.
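Because -ENOSYS means "not handled", stacked modules need a rule for combining their answers. The sketch below approximates how recent kernels combine task_prctl hooks; it is based on a reading of security/security.c, and details may vary between versions. The first result other than -ENOSYS is kept, and a non-zero one ends the walk. The hook type, option numbers and toy modules are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook type: task_prctl(option, arg2, ...) trimmed to two args. */
typedef int (*prctl_hook_fn)(int option, unsigned long arg2);

/*
 * -ENOSYS means "not mine". Any other value is recorded. A non-zero
 * value also stops the walk, while 0 lets later hooks still run.
 */
int call_prctl_hooks(const prctl_hook_fn *hooks, size_t n, int option,
		     unsigned long arg2)
{
	int rc = -ENOSYS;

	for (size_t i = 0; i < n; i++) {
		int thisrc = hooks[i](option, arg2);

		if (thisrc != -ENOSYS) {
			rc = thisrc;
			if (thisrc != 0)
				break;
		}
	}
	return rc;
}

/* Toy modules using made-up option numbers 1000 and 1001. */
int toy_ignore_all(int option, unsigned long arg2)
{
	(void)option;
	(void)arg2;
	return -ENOSYS;
}

int toy_accept_1000(int option, unsigned long arg2)
{
	(void)arg2;
	return option == 1000 ? 0 : -ENOSYS;
}

int toy_deny_1001(int option, unsigned long arg2)
{
	(void)arg2;
	return option == 1001 ? -EPERM : -ENOSYS;
}
```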
- void security_task_to_inode(struct task_struct *p, struct inode *inode)
Set the security attributes of a task’s inode
Parameters
struct task_struct *p: task
struct inode *inode: inode
Description
Set the security attributes for an inode based on an associated task’s security attributes, e.g. for /proc/pid inodes.
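- int security_create_user_ns(const struct cred *cred)
Check if creating a new userns is allowed
Parameters
const struct cred *cred: prepared creds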
Description
Check permission prior to creating a new user namespace.
Return
Returns 0 if successful, otherwise < 0 error code.
- int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag)
Check if sysv ipc access is allowed
Parameters
struct kern_ipc_perm *ipcp: ipc permission structure
short flag: requested permissions
Description
Check permissions for access to IPC.
Return
Returns 0 if permission is granted.
- void security_ipc_getlsmprop(struct kern_ipc_perm *ipcp, struct lsm_prop *prop)
Get the sysv ipc object LSM data
Parameters
struct kern_ipc_perm *ipcp: ipc permission structure
struct lsm_prop *prop: pointer to LSM information
Description
Get the LSM information associated with the IPC object.
- int security_msg_msg_alloc(struct msg_msg *msg)
Allocate a sysv ipc message LSM blob
Parameters
struct msg_msg *msg: message structure
Description
Allocate and attach a security structure to the msg->security field. The security field is initialized to NULL when the structure is first created.
Return
Return 0 if operation was successful and permission is granted.
- void security_msg_msg_free(struct msg_msg *msg)
Free a sysv ipc message LSM blob
Parameters
struct msg_msg *msg: message structure
Description
Deallocate the security structure for this message.
- int security_msg_queue_alloc(struct kern_ipc_perm *msq)
Allocate a sysv ipc msg queue LSM blob
Parameters
struct kern_ipc_perm *msq: sysv ipc permission structure
Description
Allocate and attach a security structure to msq. The security field is initialized to NULL when the structure is first created.
Return
Returns 0 if operation was successful and permission is granted.
- void security_msg_queue_free(struct kern_ipc_perm *msq)
Free a sysv ipc msg queue LSM blob
Parameters
struct kern_ipc_perm *msq: sysv ipc permission structure
Description
Deallocate the security field msq->security for the message queue.
- int security_msg_queue_associate(struct kern_ipc_perm *msq, int msqflg)
Check if a msg queue operation is allowed
Parameters
struct kern_ipc_perm *msq: sysv ipc permission structure
int msqflg: operation flags
Description
Check permission when a message queue is requested through the msgget system call. This hook is only called when returning the message queue identifier for an existing message queue, not when a new message queue is created.
Return
Return 0 if permission is granted.
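The associate hooks are reached only for existing objects. The simplified userspace model below traces which step a msgget(2) call with a non-private key reaches, assuming the standard IPC_CREAT/IPC_EXCL semantics. The enum and msgget_step_for() are hypothetical, and DAC permission checks are omitted. shmget(2) and semget(2) follow the same shape with their own alloc and associate hooks.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/ipc.h>

/* Which step a msgget(2) request reaches in this simplified model. */
enum msgget_step { STEP_ALLOC_HOOK, STEP_ASSOCIATE_HOOK, STEP_ERROR };

/* *err is set to a negative errno when no LSM hook is reached, else 0. */
enum msgget_step msgget_step_for(bool queue_exists, int msgflg, int *err)
{
	*err = 0;
	if (!queue_exists) {
		if (!(msgflg & IPC_CREAT)) {
			*err = -ENOENT;
			return STEP_ERROR;
		}
		return STEP_ALLOC_HOOK;		/* new queue: msg_queue_alloc */
	}
	if ((msgflg & IPC_CREAT) && (msgflg & IPC_EXCL)) {
		*err = -EEXIST;
		return STEP_ERROR;
	}
	return STEP_ASSOCIATE_HOOK;		/* existing queue: msg_queue_associate */
}
```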
- int security_msg_queue_msgctl(struct kern_ipc_perm *msq, int cmd)
Check if a msg queue operation is allowed
Parameters
struct kern_ipc_perm *msq: sysv ipc permission structure
int cmd: operation
Description
Check permission when a message control operation specified by cmd is to be performed on the message queue with permissions msq.
Return
Returns 0 if permission is granted.
- int security_msg_queue_msgsnd(struct kern_ipc_perm *msq, struct msg_msg *msg, int msqflg)
Check if sending a sysv ipc message is allowed
Parameters
struct kern_ipc_perm *msq: sysv ipc permission structure
struct msg_msg *msg: message
int msqflg: operation flags
Description
Check permission before a message, msg, is enqueued on the message queue with permissions specified in msq.
Return
Returns 0 if permission is granted.
- int security_msg_queue_msgrcv(struct kern_ipc_perm *msq, struct msg_msg *msg, struct task_struct *target, long type, int mode)
Check if receiving a sysv ipc msg is allowed
Parameters
struct kern_ipc_perm *msq: sysv ipc permission structure
struct msg_msg *msg: message
struct task_struct *target: target task
long type: type of message requested
int mode: operation flags
Description
Check permission before a message, msg, is removed from the message queue. The target task structure contains a pointer to the process that will be receiving the message (not equal to the current process when inline receives are being performed).
Return
Returns 0 if permission is granted.
- int security_shm_alloc(struct kern_ipc_perm *shp)
Allocate a sysv shm LSM blob
Parameters
struct kern_ipc_perm *shp: sysv ipc permission structure
Description
Allocate and attach a security structure to the shp security field. The security field is initialized to NULL when the structure is first created.
Return
Returns 0 if operation was successful and permission is granted.
- void security_shm_free(struct kern_ipc_perm *shp)
Free a sysv shm LSM blob
Parameters
struct kern_ipc_perm *shp: sysv ipc permission structure
Description
Deallocate the security structure shp->security for the memory segment.
- int security_shm_associate(struct kern_ipc_perm *shp, int shmflg)
Check if a sysv shm operation is allowed
Parameters
struct kern_ipc_perm *shp: sysv ipc permission structure
int shmflg: operation flags
Description
Check permission when a shared memory region is requested through the shmget system call. This hook is only called when returning the shared memory region identifier for an existing region, not when a new shared memory region is created.
Return
Returns 0 if permission is granted.
- int security_shm_shmctl(struct kern_ipc_perm *shp, int cmd)
Check if a sysv shm operation is allowed
Parameters
struct kern_ipc_perm *shp: sysv ipc permission structure
int cmd: operation
Description
Check permission when a shared memory control operation specified by cmd is to be performed on the shared memory region with permissions in shp.
Return
Return 0 if permission is granted.
- int security_shm_shmat(struct kern_ipc_perm *shp, char __user *shmaddr, int shmflg)
Check if a sysv shm attach operation is allowed
Parameters
struct kern_ipc_perm *shp: sysv ipc permission structure
char __user *shmaddr: address of memory region to attach
int shmflg: operation flags
Description
Check permissions prior to allowing the shmat system call to attach the shared memory segment with permissions shp to the data segment of the calling process. The attaching address is specified by shmaddr.
Return
Returns 0 if permission is granted.
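The shm hooks are easier to place when you follow the userspace calls that reach them. The following is a minimal userspace sketch, not kernel code: shm_hook_tour() is a hypothetical helper, and the call-to-hook mapping in its comments is inferred from the descriptions above rather than taken from the kernel sources. With no LSM loaded every hook returns 0, so the program behaves like plain SysV IPC.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: returns 0 if every step succeeded, -1 otherwise. */
int shm_hook_tour(void)
{
	/* New segment: the kernel attaches the LSM blob here
	 * (security_shm_alloc()). security_shm_associate() is not
	 * consulted because the segment did not exist before. */
	int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	if (id < 0)
		return -1;

	/* Attach: checked by security_shm_shmat(). A NULL shmaddr lets
	 * the kernel choose the attach address. */
	void *addr = shmat(id, NULL, 0);
	if (addr == (void *)-1) {
		shmctl(id, IPC_RMID, NULL);
		return -1;
	}
	((char *)addr)[0] = 'x';

	/* Control operations such as IPC_STAT: security_shm_shmctl(). */
	struct shmid_ds ds;
	int rc = shmctl(id, IPC_STAT, &ds);

	shmdt(addr);
	/* IPC_RMID is also a shmctl() command; once the segment is
	 * destroyed its blob is released via security_shm_free(). */
	if (shmctl(id, IPC_RMID, NULL) < 0 || rc < 0)
		return -1;
	return 0;
}
```

Because IPC_PRIVATE always creates a fresh segment, this path exercises security_shm_alloc() but never security_shm_associate(); looking up an existing segment by key would take the associate path instead.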
- int security_sem_alloc(struct kern_ipc_perm *sma)
Allocate a sysv semaphore LSM blob
Parameters
struct kern_ipc_perm *sma: sysv ipc permission structure
Description
Allocate and attach a security structure to the sma security field. The security field is initialized to NULL when the structure is first created.
Return
Returns 0 if the operation was successful and permission is granted.
- void security_sem_free(struct kern_ipc_perm *sma)
Free a sysv semaphore LSM blob
Parameters
struct kern_ipc_perm *sma: sysv ipc permission structure
Description
Deallocate the security structure sma->security for the semaphore.
- int security_sem_associate(struct kern_ipc_perm *sma, int semflg)
Check if a sysv semaphore operation is allowed
Parameters
struct kern_ipc_perm *sma: sysv ipc permission structure
int semflg: operation flags
Description
Check permission when a semaphore is requested through the semget system call. This hook is only called when returning the semaphore identifier for an existing semaphore, not when a new one must be created.
Return
Returns 0 if permission is granted.
- int security_sem_semctl(struct kern_ipc_perm *sma, int cmd)
Check if a sysv semaphore operation is allowed
Parameters
struct kern_ipc_perm *sma: sysv ipc permission structure
int cmd: operation
Description
Check permission when a semaphore operation specified by cmd is to be performed on the semaphore.
Return
Returns 0 if permission is granted.
- int security_sem_semop(struct kern_ipc_perm *sma, struct sembuf *sops, unsigned nsops, int alter)
Check if a sysv semaphore operation is allowed
Parameters
struct kern_ipc_perm *sma: sysv ipc permission structure
struct sembuf *sops: operations to perform
unsigned nsops: number of operations
int alter: flag indicating changes will be made
Description
Check permissions before performing operations on members of the semaphore set. If the alter flag is nonzero, the semaphore set may be modified.
Return
Returns 0 if permission is granted.
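A matching userspace sketch for semaphores, with hypothetical helper names. sops_alter() mirrors, as an assumption based on a reading of ipc/sem.c, how the kernel derives the alter argument: an operation array may modify the set if any element has a non-zero sem_op, while sem_op == 0 only waits for the value to reach zero.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the caller of semctl() must define this union itself. */
union semun {
	int val;
	struct semid_ds *buf;
	unsigned short *array;
};

/* Hypothetical helper: 1 if any operation would change a semaphore
 * value (the alter flag passed to security_sem_semop()), else 0. */
int sops_alter(const struct sembuf *sops, unsigned nsops)
{
	for (unsigned i = 0; i < nsops; i++)
		if (sops[i].sem_op != 0)
			return 1;
	return 0;
}

/* Hypothetical helper: returns 0 on success, -1 on any failure. */
int sem_hook_tour(void)
{
	/* New set: security_sem_alloc(); security_sem_associate() is
	 * skipped because IPC_PRIVATE always creates a new set. */
	int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	if (id < 0)
		return -1;

	union semun arg = { .val = 1 };
	struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
	struct sembuf up = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
	int rc = 0;

	/* SETVAL is a control command (security_sem_semctl()); each
	 * semop() call is checked by security_sem_semop() with a
	 * nonzero alter flag. */
	if (semctl(id, 0, SETVAL, arg) < 0 ||
	    semop(id, &down, 1) < 0 ||
	    semop(id, &up, 1) < 0)
		rc = -1;

	/* IPC_RMID is another semctl() command. */
	if (semctl(id, 0, IPC_RMID) < 0)
		rc = -1;
	return rc;
}
```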
- int security_getselfattr(unsigned int attr, struct lsm_ctx __user *uctx, u32 __user *size, u32 flags)
Read an LSM attribute of the current process.
Parameters
unsigned int attr: which attribute to return
struct lsm_ctx __user *uctx: the user-space destination for the information, or NULL
u32 __user *size: pointer to the size of space available to receive the data
u32 flags: special handling options. LSM_FLAG_SINGLE indicates that only attributes associated with the LSM identified in the passed uctx are reported.
Description
A NULL value for uctx can be used to get both the number of attributes and the size of the data.
Returns the number of attributes found on success, or a negative value on error. size is reset to the total size of the data. If size is insufficient to contain the data, -E2BIG is returned.
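The NULL-probe and -E2BIG rules above imply a two-call pattern for callers: probe for the size, allocate, then fetch. The sketch below models that contract entirely in userspace. model_getselfattr() and fetch_attr() are hypothetical; the model hands back a plain string rather than the struct lsm_ctx records the real interface fills in, and it does not call the lsm_get_self_attr() system call.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute data standing in for what one LSM reports. */
static const char model_label[] = "unconfined";

/* Models the documented contract: *size is always reset to the total
 * size of the data; a NULL or too-small buffer yields -E2BIG; success
 * returns the number of attributes (always 1 in this model). */
int model_getselfattr(void *uctx, uint32_t *size)
{
	uint32_t total = sizeof(model_label);
	uint32_t avail = *size;

	*size = total;
	if (uctx == NULL || avail < total)
		return -E2BIG;
	memcpy(uctx, model_label, total);
	return 1;
}

/* Caller side: probe with NULL, allocate exactly *size bytes, fetch.
 * Returns a malloc()ed buffer the caller frees, or NULL on failure. */
char *fetch_attr(uint32_t *out_size)
{
	uint32_t size = 0;

	if (model_getselfattr(NULL, &size) != -E2BIG)
		return NULL;
	char *buf = malloc(size);
	if (buf == NULL)
		return NULL;
	if (model_getselfattr(buf, &size) < 0) {
		free(buf);
		return NULL;
	}
	*out_size = size;
	return buf;
}
```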
- int security_setselfattr(unsigned int attr, struct lsm_ctx __user *uctx, u32 size, u32 flags)
Set an LSM attribute on the current process.
Parameters
unsigned int attr: which attribute to set
struct lsm_ctx __user *uctx: the user-space source for the information
u32 size: the size of the data
u32 flags: reserved for future use, must be 0
Description
Set an LSM attribute for the current process. The LSM, attribute and new value are included in uctx.
Returns 0 on success, -EINVAL if the input is inconsistent, -EFAULT if the user buffer is inaccessible, -E2BIG if size is too big, or an LSM-specific failure.
- int security_getprocattr(struct task_struct *p, int lsmid, const char *name, char **value)
Read an attribute for a task
Parameters
struct task_struct *p: the task
int lsmid: LSM identification
const char *name: attribute name
char **value: attribute value
Description
Read attribute name for task p and store it into value if allowed.
Return
Returns the length of value on success, a negative value otherwise.
- int security_setprocattr(int lsmid, const char *name, void *value, size_t size)
Set an attribute for a task
Parameters
int lsmid: LSM identification
const char *name: attribute name
void *value: attribute value
size_t size: attribute value size
Description
Write (set) the current task’s attribute name to value, size size, if allowed.
Return
Returns bytes written on success, a negative value otherwise.
- int security_post_notification(const struct cred *w_cred, const struct cred *cred, struct watch_notification *n)
Check if a watch notification can be posted
Parameters
const struct cred *w_cred: credentials of the task that set the watch
const struct cred *cred: credentials of the task which triggered the watch
struct watch_notification *n: the notification
Description
Check to see if a watch notification can be posted to a particular queue.
Return
Returns 0 if permission is granted.
Parameters
struct key *key: the key to watch
Description
Check to see if a process is allowed to watch for event notifications from a key or keyring.
Return
Returns 0 if permission is granted.
- int security_netlink_send(struct sock *sk, struct sk_buff *skb)
Save info and check if netlink sending is allowed
Parameters
struct sock *sk: sending socket
struct sk_buff *skb: netlink message
Description
Save security information for a netlink message so that permission checking can be performed when the message is processed. The security information can be saved using the eff_cap field of the netlink_skb_parms structure. Also may be used to provide fine-grained control over message transmission.
Return
Returns 0 if the information was successfully saved and the message is allowed to be transmitted.
- int security_socket_create(int family, int type, int protocol, int kern)
Check if creating a new socket is allowed
Parameters
int family: protocol family
int type: communications type
int protocol: requested protocol
int kern: set to 1 if a kernel socket is requested
Description
Check permissions prior to creating a new socket.
Return
Returns 0 if permission is granted.
- int security_socket_post_create(struct socket *sock, int family, int type, int protocol, int kern)
Initialize a newly created socket
Parameters
struct socket *sock: socket
int family: protocol family
int type: communications type
int protocol: requested protocol
int kern: set to 1 if a kernel socket is requested
Description
This hook allows a module to update or allocate a per-socket security structure. Note that the security field was not added directly to the socket structure, but rather, the socket security information is stored in the associated inode. Typically, the inode alloc_security hook will allocate and attach security information to SOCK_INODE(sock)->i_security. This hook may be used to update the SOCK_INODE(sock)->i_security field with additional information that wasn’t available when the inode was allocated.
Return
Returns 0 if permission is granted.
- int security_socket_bind(struct socket *sock, struct sockaddr *address, int addrlen)
Check if a socket bind operation is allowed
Parameters
struct socket *sock: socket
struct sockaddr *address: requested bind address
int addrlen: length of address
Description
Check permission before the socket protocol layer bind operation is performed and the socket sock is bound to the address specified in the address parameter.
Return
Returns 0 if permission is granted.
- int security_socket_connect(struct socket *sock, struct sockaddr *address, int addrlen)
Check if a socket connect operation is allowed
Parameters
struct socket *sock: socket
struct sockaddr *address: address of remote connection point
int addrlen: length of address
Description
Check permission before the socket protocol layer connect operation attempts to connect socket sock to a remote address, address.
Return
Returns 0 if permission is granted.
Parameters
struct socket *sock: socket
int backlog: connection queue size
Description
Check permission before the socket protocol layer listen operation.
Return
Returns 0 if permission is granted.
- int security_socket_accept(struct socket *sock, struct socket *newsock)
Check if a socket is allowed to accept connections
Parameters
struct socket *sock: listening socket
struct socket *newsock: newly created connection socket
Description
Check permission before accepting a new connection. Note that the new socket, newsock, has been created and some information copied to it, but the accept operation has not actually been performed.
Return
Returns 0 if permission is granted.
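The socket hooks fire in the order a server and a client make their calls. This userspace sketch runs both ends over TCP loopback in one process; socket_hook_tour() is a hypothetical helper, the hook named in each comment is taken from the descriptions above, and a working loopback interface is assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a byte round trip succeeded. */
int socket_hook_tour(void)
{
	int rc = -1, cli = -1, conn = -1;
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	char buf[4];

	/* socket(): security_socket_create(), then
	 * security_socket_post_create() on the new socket. */
	int srv = socket(AF_INET, SOCK_STREAM, 0);
	if (srv < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* port 0: let the kernel pick a free port */

	/* bind(): security_socket_bind() */
	if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	/* listen(): the listen check (the entry taking backlog) */
	if (listen(srv, 1) < 0)
		goto out;
	/* getsockname(): the local-address check; also learns the port */
	if (getsockname(srv, (struct sockaddr *)&addr, &len) < 0)
		goto out;

	cli = socket(AF_INET, SOCK_STREAM, 0);
	if (cli < 0)
		goto out;
	/* connect(): security_socket_connect(). On loopback the handshake
	 * completes into the listen backlog, so one thread is enough. */
	if (connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	/* accept(): security_socket_accept() */
	conn = accept(srv, NULL, NULL);
	if (conn < 0)
		goto out;

	/* send()/recv(): security_socket_sendmsg()/security_socket_recvmsg() */
	if (send(cli, "ping", 4, 0) != 4 ||
	    recv(conn, buf, sizeof(buf), MSG_WAITALL) != 4)
		goto out;
	rc = memcmp(buf, "ping", 4) == 0 ? 0 : -1;
out:
	if (conn >= 0)
		close(conn);
	if (cli >= 0)
		close(cli);
	close(srv);
	return rc;
}
```

With no LSM loaded each hook returns 0; under an LSM policy that denies a step, the corresponding call fails with the errno the hook returned (commonly EACCES or EPERM), and the helper returns -1.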
- int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size)
Check if sending a message is allowed
Parameters
struct socket *sock: sending socket
struct msghdr *msg: message to send
int size: size of message
Description
Check permission before transmitting a message to another socket.
Return
Returns 0 if permission is granted.
- int security_socket_recvmsg(struct socket *sock, struct msghdr *msg, int size, int flags)
Check if receiving a message is allowed
Parameters
struct socket *sock: receiving socket
struct msghdr *msg: message to receive
int size: size of message
int flags: operational flags
Description
Check permission before receiving a message from a socket.
Return
Returns 0 if permission is granted.
Parameters
struct socket *sock: socket
Description
Check permission before reading the local address (name) of the socket object.
Return
Returns 0 if permission is granted.
Parameters
struct socket *sock: socket
Description
Check permission before reading the remote address (name) of a socket object.
Return
Returns 0 if permission is granted.
- int security_socket_getsockopt(struct socket *sock, int level, int optname)
Check if reading a socket option is allowed
Parameters
struct socket *sock: socket
int level: option’s protocol level
int optname: option name
Description
Check permissions before retrieving the options associated with socket sock.
Return
Returns 0 if permission is granted.
- int security_socket_setsockopt(struct socket *sock, int level, int optname)
Check if setting a socket option is allowed
Parameters
struct socket *sock: socket
int level: option’s protocol level
int optname: option name
Description
Check permissions before setting the options associated with socket sock.
Return
Returns 0 if permission is granted.
Parameters
struct socket *sock: socket
int how: flag indicating how sends and receives are handled
Description
Check permission before all or part of a connection on the socket sock is shut down.
Return
Returns 0 if permission is granted.
- int security_socket_getpeersec_stream(struct socket *sock, sockptr_t optval, sockptr_t optlen, unsigned int len)
Get the remote peer label
Parameters
struct socket *sock: socket
sockptr_t optval: destination buffer
sockptr_t optlen: size of peer label copied into the buffer
unsigned int len: maximum size of the destination buffer
Description
This hook allows the security module to provide peer socket security state for Unix or connected TCP sockets to userspace via getsockopt SO_PEERSEC. For TCP sockets this can be meaningful if the socket is associated with an IPsec SA.
Return
Returns 0 if all is well, otherwise the typical getsockopt return values.
Parameters
struct sock *sock: the sock that needs a blob
gfp_t gfp: allocation mode
Description
Allocate the sock blob for all the modules.
Returns 0, or -ENOMEM if memory can’t be allocated.
- int security_sk_alloc(struct sock *sk, int family, gfp_t priority)
Allocate and initialize a sock’s LSM blob
Parameters
struct sock *sk: sock
int family: protocol family
gfp_t priority: gfp flags
Description
Allocate and attach a security structure to the sk->sk_security field, which is used to copy security attributes between local stream sockets.
Return
Returns 0 on success, error on failure.
Parameters
struct sock *sk: sock
Description
Deallocate security structure.
- void security_inet_csk_clone(struct sock *newsk, const struct request_sock *req)
Set new sock LSM state based on request_sock
Parameters
struct sock *newsk: new sock
const struct request_sock *req: connection request_sock
Description
Set the LSM state of newsk using the LSM state from req.
- int security_mptcp_add_subflow(struct sock *sk, struct sock *ssk)
Inherit the LSM label from the MPTCP socket
Parameters
struct sock *sk: the owning MPTCP socket
struct sock *ssk: the new subflow
Description
Update the labeling for the given MPTCP subflow to match that of the owning MPTCP socket. This hook has to be called after the socket creation and initialization via the security_socket_create() and security_socket_post_create() LSM hooks.
Return
Returns 0 on success or a negative error code on failure.
- int security_xfrm_policy_clone(struct xfrm_sec_ctx *old_ctx, struct xfrm_sec_ctx **new_ctxp)
Clone xfrm policy LSM state
Parameters
struct xfrm_sec_ctx *old_ctx: xfrm security context
struct xfrm_sec_ctx **new_ctxp: target xfrm security context
Description
Allocate a security structure in new_ctxp that contains the information from the old_ctx structure.
Return
Returns 0 if the operation was successful.
- int security_xfrm_policy_delete(struct xfrm_sec_ctx *ctx)
Check if deleting an xfrm policy is allowed
Parameters
struct xfrm_sec_ctx *ctx: xfrm security context
Description
Authorize deletion of an SPD entry.
Return
Returns 0 if permission is granted.
- int security_xfrm_state_alloc_acquire(struct xfrm_state *x, struct xfrm_sec_ctx *polsec, u32 secid)
Allocate an xfrm state LSM blob
Parameters
struct xfrm_state *x: xfrm state being added to the SAD
struct xfrm_sec_ctx *polsec: associated policy’s security context
u32 secid: secid from the flow
Description
Allocate a security structure to the x->security field; the security field is initialized to NULL when the xfrm_state is allocated. Set the context to correspond to secid.
Return
Returns 0 if the operation was successful.
- void security_xfrm_state_free(struct xfrm_state *x)
Free an xfrm state
Parameters
struct xfrm_state *x: xfrm state
Description
Deallocate x->security.
- int security_xfrm_policy_lookup(struct xfrm_sec_ctx *ctx, u32 fl_secid)
Check if using an xfrm policy is allowed
Parameters
struct xfrm_sec_ctx *ctx: target xfrm security context
u32 fl_secid: flow secid used to authorize access
Description
Check permission when a flow selects an xfrm_policy for processing XFRMs on a packet. The hook is called when selecting either a per-socket policy or a generic xfrm policy.
Return
Returns 0 if permission is granted, -ESRCH otherwise, or -errno on other errors.
- int security_xfrm_state_pol_flow_match(struct xfrm_state *x, struct xfrm_policy *xp, const struct flowi_common *flic)
Check for an xfrm match
Parameters
struct xfrm_state *x: xfrm state to match
struct xfrm_policy *xp: xfrm policy to check for a match
const struct flowi_common *flic: flow to check for a match
Description
Check xp and flic for a match with x.
Return
Returns 1 if there is a match.
Parameters
struct sk_buff *skb: xfrm packet
u32 *secid: secid
Description
Decode the packet in skb and return the security label in secid.
Return
Returns 0 if all xfrms used have the same secid.
- int security_key_alloc(struct key *key, const struct cred *cred, unsigned long flags)
Allocate and initialize a kernel key LSM blob
Parameters
struct key *key: key
const struct cred *cred: credentials
unsigned long flags: allocation flags
Description
Permit allocation of a key and assign security data. Note that key does not have a serial number assigned at this point.
Return
Returns 0 if permission is granted, a negative error code otherwise.
Parameters
struct key *key: key
Description
Notification of destruction; free security data.
- int security_key_permission(key_ref_t key_ref, const struct cred *cred, enum key_need_perm need_perm)
Check if a kernel key operation is allowed
Parameters
key_ref_t key_ref: key reference
const struct cred *cred: credentials of actor requesting access
enum key_need_perm need_perm: requested permissions
Description
See whether a specific operational right is granted to a process on a key.
Return
Returns 0 if permission is granted, a negative error code otherwise.
Parameters
struct key *key: key
char **buffer: security label buffer
Description
Get a textual representation of the security context attached to a key for the purposes of honouring KEYCTL_GETSECURITY. This function allocates the storage for the NUL-terminated string and the caller should free it.
Return
Returns the length of buffer (including the terminating NUL) or a negative error code if an error occurs. May also return 0 (and a NULL buffer pointer) if there is no security label assigned to the key.
- void security_key_post_create_or_update(struct key *keyring, struct key *key, const void *payload, size_t payload_len, unsigned long flags, bool create)
Notification of key create or update
Parameters
struct key *keyring: keyring to which the key is linked
struct key *key: created or updated key
const void *payload: data used to instantiate or update the key
size_t payload_len: length of payload
unsigned long flags: key flags
bool create: flag indicating whether the key was created or updated
Description
Notify the caller of a key creation or update.
- int security_audit_rule_init(u32 field, u32 op, char *rulestr, void **lsmrule, gfp_t gfp)
Allocate and init an LSM audit rule struct
Parameters
u32 field: audit action
u32 op: rule operator
char *rulestr: rule context
void **lsmrule: receive buffer for audit rule struct
gfp_t gfp: GFP flag used for kmalloc
Description
Allocate and initialize an LSM audit rule structure.
Return
Returns 0 if lsmrule has been successfully set, -EINVAL in case of an invalid rule.
- int security_audit_rule_known(struct audit_krule *krule)
Check if an audit rule contains LSM fields
Parameters
struct audit_krule *krule: audit rule
Description
Specifies whether the given krule contains any fields related to the current LSM.
Return
Returns 1 in case of relation found, 0 otherwise.
- void security_audit_rule_free(void *lsmrule)
Free an LSM audit rule struct
Parameters
void *lsmrule: audit rule struct
Description
Deallocate the LSM audit rule structure previously allocated by security_audit_rule_init().
- int security_audit_rule_match(struct lsm_prop *prop, u32 field, u32 op, void *lsmrule)
Check if a label matches an audit rule
Parameters
struct lsm_prop *prop: security label
u32 field: LSM audit field
u32 op: matching operator
void *lsmrule: audit rule
Description
Determine if the given security label, prop, matches a rule previously approved by security_audit_rule_known().
Return
Returns 1 if prop matches the rule, 0 if it does not, or a negative error code on failure.
- int security_bpf(int cmd, union bpf_attr *attr, unsigned int size, bool kernel)
Check if the bpf syscall operation is allowed
Parameters
int cmd: command
union bpf_attr *attr: bpf attribute
unsigned int size: size
bool kernel: whether or not the call originated from the kernel
Description
Perform an initial check for all bpf syscalls after the attribute is copied into the kernel. Individual security modules can implement their own rules to check the specific cmd they need.
Return
Returns 0 if permission is granted.
- int security_bpf_map(struct bpf_map *map, fmode_t fmode)
Check if access to a bpf map is allowed
Parameters
struct bpf_map *map: bpf map
fmode_t fmode: mode
Description
Do a check when the kernel generates and returns a file descriptor for eBPF maps.
Return
Returns 0 if permission is granted.
- int security_bpf_prog(struct bpf_prog *prog)
Check if access to a bpf program is allowed
Parameters
struct bpf_prog *prog: bpf program
Description
Do a check when the kernel generates and returns a file descriptor for eBPF programs.
Return
Returns 0 if permission is granted.
- int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr, struct bpf_token *token, bool kernel)
Check if BPF map creation is allowed
Parameters
struct bpf_map *map: BPF map object
union bpf_attr *attr: BPF syscall attributes used to create BPF map
struct bpf_token *token: BPF token used to grant user access
bool kernel: whether or not the call originated from the kernel
Description
Do a check when the kernel creates a new BPF map. This is also the point where the LSM blob is allocated for LSMs that need one.
Return
Returns 0 on success, error on failure.
- int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr, struct bpf_token *token, bool kernel)
Check if loading a BPF program is allowed
Parameters
struct bpf_prog *prog: BPF program object
union bpf_attr *attr: BPF syscall attributes used to create BPF program
struct bpf_token *token: BPF token used to grant user access to BPF subsystem
bool kernel: whether or not the call originated from the kernel
Description
Perform an access control check when the kernel loads a BPF program and allocates the associated BPF program object. This hook is also responsible for allocating any required LSM state for the BPF program.
Return
Returns 0 on success, error on failure.
- int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr, const struct path *path)
Check if creating a BPF token is allowed
Parameters
struct bpf_token *token: BPF token object
union bpf_attr *attr: BPF syscall attributes used to create BPF token
const struct path *path: path pointing to BPF FS mount point from which BPF token is created
Description
Do a check when the kernel instantiates a new BPF token object from a BPF FS instance. This is also the point where the LSM blob can be allocated for LSMs.
Return
Returns 0 on success, error on failure.
- int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
Check if BPF token is allowed to delegate requested BPF syscall command
Parameters
const struct bpf_token *token: BPF token object
enum bpf_cmd cmd: BPF syscall command requested to be delegated by BPF token
Description
Do a check when the kernel decides whether the provided BPF token should allow delegation of the requested BPF syscall command.
Return
Returns 0 on success, error on failure.
- int security_bpf_token_capable(const struct bpf_token *token, int cap)
Check if BPF token is allowed to delegate requested BPF-related capability
Parameters
const struct bpf_token *token: BPF token object
int cap: capability requested to be delegated by BPF token
Description
Do a check when the kernel decides whether the provided BPF token should allow delegation of the requested BPF-related capabilities.
Return
Returns 0 on success, error on failure.
- void security_bpf_map_free(struct bpf_map *map)
Free a bpf map’s LSM blob
Parameters
struct bpf_map *map: bpf map
Description
Clean up the security information stored inside bpf map.
- void security_bpf_prog_free(struct bpf_prog *prog)
Free a BPF program’s LSM blob
Parameters
struct bpf_prog *prog: BPF program struct
Description
Clean up the security information stored inside BPF program.
- void security_bpf_token_free(struct bpf_token *token)
Free a BPF token’s LSM blob
Parameters
struct bpf_token *token: BPF token struct
Description
Clean up the security information stored inside BPF token.
- int security_perf_event_open(int type)
Check if a perf event open is allowed
Parameters
int type: type of event
Description
Check whether the type of perf_event_open syscall is allowed.
Return
Returns 0 if permission is granted.
- int security_perf_event_alloc(struct perf_event *event)
Allocate a perf event LSM blob
Parameters
struct perf_event *event: perf event
Description
Allocate and save perf_event security info.
Return
Returns 0 on success, error on failure.
- void security_perf_event_free(struct perf_event *event)
Free a perf event LSM blob
Parameters
struct perf_event *event: perf event
Description
Release (free) perf_event security info.
- int security_perf_event_read(struct perf_event *event)
Check if reading a perf event label is allowed
Parameters
struct perf_event *event: perf event
Description
Read perf_event security info if allowed.
Return
Returns 0 if permission is granted.
- int security_perf_event_write(struct perf_event *event)
Check if writing a perf event label is allowed
Parameters
struct perf_event *event: perf event
Description
Write perf_event security info if allowed.
Return
Returns 0 if permission is granted.
- int security_uring_override_creds(const struct cred *new)
Check if overriding creds is allowed
Parameters
const struct cred *new: new credentials
Description
Check if the current task, executing an io_uring operation, is allowed to override its credentials with new.
Return
Returns 0 if permission is granted.
- int security_uring_sqpoll(void)
Check if IORING_SETUP_SQPOLL is allowed
Parameters
void: no arguments
Description
Check whether the current task is allowed to spawn an io_uring polling thread (IORING_SETUP_SQPOLL).
Return
Returns 0 if permission is granted.
- int security_uring_cmd(struct io_uring_cmd *ioucmd)
Check if an io_uring passthrough command is allowed
Parameters
struct io_uring_cmd *ioucmd: command
Description
Check whether the file_operations uring_cmd is allowed to run.
Return
Returns 0 if permission is granted.
- int security_uring_allowed(void)
Check if io_uring_setup() is allowed
Parameters
void: no arguments
Description
Check whether the current task is allowed to call io_uring_setup().
Return
Returns 0 if permission is granted.
- void security_initramfs_populated(void)
Notify LSMs that initramfs has been loaded
Parameters
void: no arguments
Description
Tells the LSMs the initramfs has been unpacked into the rootfs.
- struct dentry *securityfs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops)
create a file in the securityfs filesystem
Parameters
const char *name: a pointer to a string containing the name of the file to create.
umode_t mode: the permission that the file should have
struct dentry *parent: a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the securityfs filesystem.
void *data: a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open() call.
const struct file_operations *fops: a pointer to a struct file_operations that should be used for this file.
Description
This function creates a file in securityfs with the given name.
This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, the function will return the error value (via ERR_PTR).
If securityfs is not enabled in the kernel, the value -ENODEV is returned.
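Because errors come back via ERR_PTR rather than as NULL, callers must test the result with IS_ERR() and extract the code with PTR_ERR(). The kernel defines these helpers in include/linux/err.h; the sketch below re-creates them in userspace only to show the convention, and fake_securityfs_create_file() and create_or_errno() are hypothetical stand-ins for a kernel built without securityfs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the ERR_PTR convention: the top MAX_ERRNO
 * addresses of the pointer range encode a negative errno value. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stub: behaves like securityfs_create_file() on a
 * kernel without securityfs support. */
void *fake_securityfs_create_file(const char *name)
{
	(void)name;
	return ERR_PTR(-ENODEV);
}

/* Caller pattern: returns 0 on success or the negative errno. */
long create_or_errno(const char *name)
{
	void *dentry = fake_securityfs_create_file(name);

	if (IS_ERR(dentry)) /* a NULL check would miss this error */
		return PTR_ERR(dentry);
	return 0;
}
```

Note that IS_ERR(NULL) is false under this scheme, so a plain `if (!dentry)` test would treat ERR_PTR(-ENODEV) as a valid dentry.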
- struct dentry *securityfs_create_dir(const char *name, struct dentry *parent)
create a directory in the securityfs filesystem
Parameters
const char *name: a pointer to a string containing the name of the directory to create.
struct dentry *parent: a pointer to the parent dentry for this directory. This should be a directory dentry if set. If this parameter is NULL, then the directory will be created in the root of the securityfs filesystem.
Description
This function creates a directory in securityfs with the given name.
This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove() function when the directory is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, the function will return the error value (via ERR_PTR).
If securityfs is not enabled in the kernel, the value -ENODEV is returned.
- struct dentry *securityfs_create_symlink(const char *name, struct dentry *parent, const char *target, const struct inode_operations *iops)
create a symlink in the securityfs filesystem
Parameters
const char *name: a pointer to a string containing the name of the symlink to create.
struct dentry *parent: a pointer to the parent dentry for the symlink. This should be a directory dentry if set. If this parameter is NULL, then the symlink will be created in the root of the securityfs filesystem.
const char *target: a pointer to a string containing the name of the symlink’s target. If this parameter is NULL, then the iops parameter needs to be set up to handle .readlink and .get_link inode_operations.
const struct inode_operations *iops: a pointer to the struct inode_operations to use for the symlink. If this parameter is NULL, then the default simple_symlink_inode_operations will be used.
Description
This function creates a symlink in securityfs with the given name.
This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove() function when the symlink is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, the function will return the error value (via ERR_PTR).
If securityfs is not enabled in the kernel, the value -ENODEV is returned.
- void securityfs_remove(struct dentry *dentry)
removes a file or directory from the securityfs filesystem
Parameters
struct dentry *dentry: a pointer to the dentry of the file or directory to be removed.
Description
This function removes a file or directory in securityfs that was previously created with a call to another securityfs function (like securityfs_create_file() or variants thereof).
This function is required to be called in order for the file to be removed. No automatic cleanup of files will happen when a module is removed; you are responsible here.
AV: when applied to a directory it will take all children out; there is no need to call it for descendants if an ancestor is getting killed.
Audit Interfaces
- struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type)
obtain an audit buffer
Parameters
struct audit_context *ctx: audit_context (may be NULL)
gfp_t gfp_mask: type of allocation
int type: audit message type
Description
Returns audit_buffer pointer on success or NULL on error.
Obtain an audit buffer. This routine does locking to obtain the audit buffer, but then no locking is required for calls to audit_log_*format. If the task (ctx) is a task that is currently in a syscall, then the syscall is marked as auditable and an audit record will be written at syscall exit. If there is no associated task, then task context (ctx) should be NULL.
- void audit_log_format(struct audit_buffer *ab, const char *fmt, ...)
format a message into the audit buffer.
Parameters
struct audit_buffer *ab: audit_buffer
const char *fmt: format string
...: optional parameters matching fmt string
Description
All the work is done in audit_log_vformat.
- int audit_log_subj_ctx(struct audit_buffer *ab, struct lsm_prop *prop)
Add LSM subject information
Parameters
struct audit_buffer *ab: audit_buffer
struct lsm_prop *prop: LSM subject properties.
Description
Add a subj= field and, if necessary, an AUDIT_MAC_TASK_CONTEXTS record.
- void audit_log_end(struct audit_buffer *ab)
end one audit record
Parameters
struct audit_buffer *ab: the audit_buffer
Description
We cannot do a netlink send inside an irq context because it blocks (last arg, flags, is not set to MSG_DONTWAIT), so the audit buffer is placed on a queue and a kthread is scheduled to remove it from the queue outside the irq context. May be called in any context.
- void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type, const char *fmt, ...)
Log an audit record
Parameters
struct audit_context *ctx: audit context
gfp_t gfp_mask: type of allocation
int type: audit message type
const char *fmt: format string to use
...: variable parameters matching the format string
Description
This is a convenience function that calls audit_log_start, audit_log_vformat, and audit_log_end. It may be called in any context.
- int __audit_filter_op(struct task_struct *tsk, struct audit_context *ctx, struct list_head *list, struct audit_names *name, unsigned long op)
common filter helper for operations (syscall/uring/etc)
Parameters
struct task_struct *tsk: associated task
struct audit_context *ctx: audit context
struct list_head *list: audit filter list
struct audit_names *name: audit_name (can be NULL)
unsigned long op: current syscall/uring_op
Description
Run the audit filters specified in list against tsk using ctx, name, and op, as necessary; the caller is responsible for ensuring that the call is made while the RCU read lock is held. The name parameter can be NULL, but all others must be specified. Returns 1/true if the filter finds a match, 0/false if none are found.
- void audit_filter_uring(struct task_struct *tsk, struct audit_context *ctx)
apply filters to an io_uring operation
Parameters
struct task_struct *tsk: associated task
struct audit_context *ctx: audit context
- void audit_reset_context(struct audit_context *ctx)
reset an audit_context structure
Parameters
struct audit_context *ctx: the audit_context to reset
Description
All fields in the audit_context will be reset to an initial state, all references held by fields will be dropped, and private memory will be released. When this function returns the audit_context will be suitable for reuse, so long as the passed context is not NULL or a dummy context.
- int audit_alloc(struct task_struct *tsk)
allocate an audit context block for a task
Parameters
struct task_struct *tsk: task
Description
Filter on the task information and allocate a per-task audit context if necessary. Doing so turns on system call auditing for the specified task. This is called from copy_process, so no lock is needed.
- void audit_log_uring(struct audit_context *ctx)
generate an AUDIT_URINGOP record
Parameters
struct audit_context *ctx: the audit context
- void __audit_free(struct task_struct *tsk)
free a per-task audit context
Parameters
struct task_struct *tsk: task whose audit context block to free
Description
Called from copy_process, do_exit, and the io_uring code.
- void audit_return_fixup(struct audit_context *ctx, int success, long code)
fix up the return codes in the audit_context
Parameters
struct audit_context *ctx: the audit_context
int success: true/false value to indicate whether the operation succeeded
long code: operation return code
Description
We need to fix up the return code in the audit logs if the actual return codes are later going to be fixed by the arch-specific signal handlers.
- void __audit_uring_entry(u8 op)
prepare the kernel task’s audit context for io_uring
Parameters
u8 op: the io_uring opcode
Description
This is similar to audit_syscall_entry() but is intended for use by io_uring operations. This function should only ever be called from audit_uring_entry(), as we rely on the audit context checking present in that function.
- void __audit_uring_exit(int success, long code)
wrap up the kernel task’s audit context after io_uring
Parameters
int success: true/false value to indicate whether the operation succeeded
long code: operation return code
Description
This is similar to audit_syscall_exit() but is intended for use by io_uring operations. This function should only ever be called from audit_uring_exit(), as we rely on the audit context checking present in that function.
- void __audit_syscall_entry(int major, unsigned long a1, unsigned long a2, unsigned long a3, unsigned long a4)
fill in an audit record at syscall entry
Parameters
int major: major syscall type (function)
unsigned long a1: additional syscall register 1
unsigned long a2: additional syscall register 2
unsigned long a3: additional syscall register 3
unsigned long a4: additional syscall register 4
Description
Fill in the audit context at syscall entry. This only happens if the audit context was created when the task was created and the state or filters demand that the audit context be built. If the state from the per-task filter or from the per-syscall filter is AUDIT_STATE_RECORD, then the record will be written at syscall exit time (otherwise, it will only be written if another part of the kernel requests that it be written).
- void __audit_syscall_exit(int success, long return_code)
deallocate audit context after a system call
Parameters
int success: success value of the syscall
long return_code: return value of the syscall
Description
Tear down after a system call. If the audit context has been marked as auditable (either because of the AUDIT_STATE_RECORD state from filtering, or because some other part of the kernel wrote an audit message), then write out the syscall information. In all cases, free the names stored from getname().
- struct filename *__audit_reusename(const __user char *uptr)
fill out filename with info from existing entry
Parameters
const __user char *uptr: userland ptr to pathname
Description
Search the audit_names list for the current audit context. If there is an existing entry with a matching “uptr”, then return the filename associated with that audit_name. If not, return NULL.
- void __audit_getname(struct filename *name)
add a name to the list
Parameters
struct filename *name: name to add
Description
Add a name to the list of audit names for this context. Called from fs/namei.c:getname().
- void __audit_inode(struct filename *name, const struct dentry *dentry, unsigned int flags)
store the inode and device from a lookup
Parameters
struct filename *name: name being audited
const struct dentry *dentry: dentry being audited
unsigned int flags: attributes for this particular entry
- int auditsc_get_stamp(struct audit_context *ctx, struct audit_stamp *stamp)
get local copies of audit_context values
Parameters
struct audit_context *ctx: audit_context for the task
struct audit_stamp *stamp: timestamp to record
Description
Also sets the context as auditable.
- void __audit_mq_open(int oflag, umode_t mode, struct mq_attr *attr)
record audit data for a POSIX MQ open
Parameters
int oflag: open flag
umode_t mode: mode bits
struct mq_attr *attr: queue attributes
- void __audit_mq_sendrecv(mqd_t mqdes, size_t msg_len, unsigned int msg_prio, const struct timespec64 *abs_timeout)
record audit data for a POSIX MQ timed send/receive
Parameters
mqd_t mqdes: MQ descriptor
size_t msg_len: Message length
unsigned int msg_prio: Message priority
const struct timespec64 *abs_timeout: Message timeout in absolute time
- void __audit_mq_notify(mqd_t mqdes, const struct sigevent *notification)
record audit data for a POSIX MQ notify
Parameters
mqd_t mqdes: MQ descriptor
const struct sigevent *notification: Notification event
- void __audit_mq_getsetattr(mqd_t mqdes, struct mq_attr *mqstat)
record audit data for a POSIX MQ get/set attribute
Parameters
mqd_t mqdes: MQ descriptor
struct mq_attr *mqstat: MQ flags
- void __audit_ipc_obj(struct kern_ipc_perm *ipcp)
record audit data for ipc object
Parameters
struct kern_ipc_perm *ipcp: ipc permissions
- void __audit_ipc_set_perm(unsigned long qbytes, uid_t uid, gid_t gid, umode_t mode)
record audit data for new ipc permissions
Parameters
unsigned long qbytes: msgq bytes
uid_t uid: msgq user id
gid_t gid: msgq group id
umode_t mode: msgq mode (permissions)
Description
Called only after audit_ipc_obj().
- int __audit_socketcall(int nargs, unsigned long *args)
record audit data for sys_socketcall
Parameters
int nargs: number of args, which should not be more than AUDITSC_ARGS
unsigned long *args: args array
- void __audit_fd_pair(int fd1, int fd2)
record audit data for pipe and socketpair
Parameters
int fd1: the first file descriptor
int fd2: the second file descriptor
- int __audit_sockaddr(int len, void *a)
record audit data for sys_bind, sys_connect, sys_sendto
Parameters
int len: data length in user space
void *a: data address in kernel space
Description
Returns 0 on success or for a NULL context, or < 0 on error.
- int audit_signal_info_syscall(struct task_struct *t)
record signal info for syscalls
Parameters
struct task_struct *t: task being signaled
Description
If the audit subsystem is being terminated, record the task (pid) and uid that is doing that.
- int __audit_log_bprm_fcaps(struct linux_binprm *bprm, const struct cred *new, const struct cred *old)
store information about a loading bprm and relevant fcaps
Parameters
struct linux_binprm *bprm: pointer to the bprm being processed
const struct cred *new: the proposed new credentials
const struct cred *old: the old credentials
Description
Simply check if the process already has the caps given by the file and, if not, store the privilege escalation info for later auditing at the end of the syscall.
-Eric
- void __audit_log_capset(const struct cred *new, const struct cred *old)
store information about the arguments to the capset syscall
Parameters
const struct cred *new: the new credentials
const struct cred *old: the old (current) credentials
Description
Record the arguments userspace sent to sys_capset for later printing by the audit system, if applicable.
- void audit_core_dumps(long signr)
record information about processes that end abnormally
Parameters
long signr: signal value
Description
If a process ends with a core dump, something fishy is going on and we should record the event for investigation.
- void audit_seccomp(unsigned long syscall, long signr, int code)
record information about a seccomp action
Parameters
unsigned long syscall: syscall number
long signr: signal value
int code: the seccomp action
Description
Record the information associated with a seccomp action. Event filtering for seccomp actions that are not to be logged is done in seccomp_log(). Therefore, this function forces auditing independent of the audit_enabled and dummy context state because seccomp actions should be logged even when audit is not in use.
- int audit_rule_change(int type, int seq, void *data, size_t datasz)
apply all rules to the specified message type
Parameters
int type: audit message type
int seq: netlink audit message sequence (serial) number
void *data: payload data
size_t datasz: size of payload data
Parameters
struct sk_buff *request_skb: skb of request we are replying to (used to target the reply)
int seq: netlink audit message sequence (serial) number
- int parent_len(const char *path)
find the length of the parent portion of a pathname
Parameters
const char *path: pathname of which to determine length
- int audit_compare_dname_path(const struct qstr *dname, const char *path, int parentlen)
compare the given dentry name with the last component of the given path. A return value of 0 indicates a match.
Parameters
const struct qstr *dname: dentry name that we’re comparing
const char *path: full pathname that we’re comparing
int parentlen: length of the parent if known. Passing in AUDIT_NAME_FULL here indicates that we must compute this value.
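To show how these two helpers fit together, here is a userspace sketch that computes the parent length (including its trailing '/') and then compares the final path component against a dentry name. It is modeled on the behavior described above rather than copied from the kernel. The qstr_model struct and the MODEL_NAME_FULL sentinel are hypothetical stand-ins for struct qstr and AUDIT_NAME_FULL, and the exact-length check on the last component is an assumption of this sketch.

```c
#include <string.h>

/* Hypothetical stand-in for struct qstr: a name plus its length. */
struct qstr_model {
	const char *name;
	unsigned int len;
};

/* Hypothetical sentinel meaning "parent length unknown, compute it". */
#define MODEL_NAME_FULL (-1)

/* Length of the parent portion of path, including its trailing '/'. */
static int parent_len(const char *path)
{
	int plen = (int)strlen(path);
	const char *p;

	if (plen == 0)
		return 0;

	/* Ignore trailing slashes. */
	p = path + plen - 1;
	while (*p == '/' && p > path)
		p--;

	/* Walk back to the previous '/' or the start of the string. */
	while (*p != '/' && p > path)
		p--;

	/* If we stopped on a '/', include it in the parent portion. */
	if (*p == '/')
		p++;

	return (int)(p - path);
}

/* Returns 0 if dname matches the last component of path, nonzero otherwise. */
static int compare_dname_path(const struct qstr_model *dname,
			      const char *path, int parentlen)
{
	int dlen = (int)dname->len;
	int pathlen = (int)strlen(path);

	if (pathlen < dlen)
		return 1;

	if (parentlen == MODEL_NAME_FULL)
		parentlen = parent_len(path);

	/* The last component must be exactly dlen bytes long. */
	if (pathlen - parentlen != dlen)
		return 1;

	return strncmp(path + parentlen, dname->name, (size_t)dlen);
}
```

With this model, parent_len("/usr/bin/ls") is 9 (the length of "/usr/bin/"), and comparing the name "ls" against "/usr/bin/ls" yields 0, while "/usr/bin/lsx" does not match.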
Accounting Framework
- long sys_acct(const char __user *name)
enable/disable process accounting
Parameters
const char __user *name: file name for accounting records, or NULL to shut down accounting
Description
sys_acct() is the only system call needed to implement process accounting. It takes the name of the file where accounting records should be written. If the filename is NULL, accounting will be shut down.
Return
0 for success or negative errno values for failure.
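From userspace, this system call is reached through the acct() wrapper that glibc declares in <unistd.h>. The sketch below only turns accounting off (a NULL filename), which is the least invasive call. It needs CAP_SYS_PACCT to succeed and otherwise fails, typically with EPERM. The libc wrapper reports failure as -1 with errno set, so the hypothetical helper disable_process_accounting() converts that back into the kernel's 0 or negative-errno convention.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose acct() in glibc's <unistd.h> */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Turn process accounting off. Returns 0 on success or a negative errno. */
static int disable_process_accounting(void)
{
	if (acct(NULL) == 0)
		return 0;

	int err = errno; /* save before any other library call */
	fprintf(stderr, "acct(NULL) failed: %s\n", strerror(err));
	return -err;
}
```

Enabling accounting works the same way with a path to an existing file, for example acct("/var/log/pacct"), subject to the same privilege requirement.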
- void acct_collect(long exitcode, int group_dead)
collect accounting information into pacct_struct
Parameters
long exitcode: task exit code
int group_dead: nonzero if this thread is the last one in the process
- void acct_process(void)
handles process accounting for an exiting task
Parameters
void: no arguments
Block Devices
Parameters
struct bio *bio: bio to advance
unsigned int nbytes: number of bytes to complete
Description
This updates bi_sector, bi_size and bi_idx; if the number of bytes to complete doesn’t align with a bvec boundary, then bv_len and bv_offset will be updated on the last bvec as well.
bio will then represent the remaining, uncompleted portion of the I/O.
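To make this bookkeeping concrete, here is a userspace model of advancing an iterator over an array of segments, following the description above. The start sector and remaining size move forward, fully consumed segments are skipped by bumping the index, and a partially consumed segment has its length and offset trimmed. The seg and seg_iter types are hypothetical simplifications of a bio and its bio_vecs, and 512-byte sectors are assumed. Current kernels may record partial progress in the iterator instead of editing the bvec, so treat this as an illustration of the semantics only.

```c
#define MODEL_SECTOR_SHIFT 9 /* assumed 512-byte sectors */

struct seg {
	unsigned int len;    /* bytes remaining in this segment */
	unsigned int offset; /* offset of the first remaining byte */
};

struct seg_iter {
	unsigned long long sector; /* current start sector */
	unsigned int size;         /* bytes left to complete */
	unsigned int idx;          /* index of the current segment */
	struct seg *segs;
};

/* Mark nbytes as completed; nbytes must not exceed it->size. */
static void seg_iter_advance(struct seg_iter *it, unsigned int nbytes)
{
	it->sector += nbytes >> MODEL_SECTOR_SHIFT;
	it->size -= nbytes;

	while (nbytes) {
		struct seg *s = &it->segs[it->idx];

		if (nbytes >= s->len) {
			/* Whole segment completed: move to the next one. */
			nbytes -= s->len;
			it->idx++;
		} else {
			/* Partial segment: trim the completed bytes off its front. */
			s->len -= nbytes;
			s->offset += nbytes;
			nbytes = 0;
		}
	}
}
```

Advancing an 8 KiB iterator made of two 4 KiB segments by 5120 bytes moves it forward 10 sectors, leaves 3072 bytes, and points at the second segment with len 3072 and offset 1024.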
- struct folio_iter
State for iterating all folios in a bio.
Definition:
struct folio_iter {
    struct folio *folio;
    size_t offset;
    size_t length;
};
Members
folio: The current folio we’re iterating. NULL after the last folio.
offset: The byte offset within the current folio.
length: The number of bytes in this iteration (will not cross a folio boundary).
- bio_for_each_folio_all
bio_for_each_folio_all(fi, bio)
Iterate over each folio in a bio.
Parameters
fi: struct folio_iter which is updated for each folio.
bio: struct bio to iterate over.
- struct bio *bio_next_split(struct bio *bio, int sectors, gfp_t gfp, struct bio_set *bs)
get the next “sectors” sectors from a bio, splitting if necessary
Parameters
struct bio *bio: bio to split
int sectors: number of sectors to split from the front of bio
gfp_t gfp: gfp mask
struct bio_set *bs: bio set to allocate from
Return
a bio representing the next “sectors” sectors of bio. If bio is not larger than that, the original bio is returned unchanged.
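Because the original bio is returned once it fits, a natural way to use this helper is a loop that stops when the returned bio is the original one. The userspace model below shows that loop shape, using a hypothetical sector_range type in place of struct bio: each call carves at most max_sectors off the front and advances the original, or returns the original itself once it fits. Allocation from a bio_set, GFP flags, and completion chaining are deliberately left out.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a bio: a run of sectors starting at start. */
struct sector_range {
	unsigned long long start;
	unsigned int nr_sectors;
};

/*
 * Return the next max_sectors of r. If r already fits, return r itself;
 * otherwise allocate a new range for the front part and advance r.
 * Returns NULL if allocation fails.
 */
static struct sector_range *next_split(struct sector_range *r,
				       unsigned int max_sectors)
{
	struct sector_range *front;

	if (r->nr_sectors <= max_sectors)
		return r;

	front = malloc(sizeof(*front));
	if (!front)
		return NULL;

	front->start = r->start;
	front->nr_sectors = max_sectors;
	r->start += max_sectors;
	r->nr_sectors -= max_sectors;
	return front;
}

/* Split r into pieces of at most max_sectors, recording each piece's size. */
static int split_all(struct sector_range *r, unsigned int max_sectors,
		     unsigned int *sizes, int max_chunks)
{
	int n = 0;
	int done = 0;

	while (!done) {
		struct sector_range *part;

		if (n == max_chunks)
			return -1;
		part = next_split(r, max_sectors);
		if (!part)
			return -1;
		sizes[n++] = part->nr_sectors;
		done = (part == r);
		if (!done)
			free(part); /* a real caller would submit the piece here */
	}
	return n;
}
```

Splitting a 10-sector range with a limit of 4 produces pieces of 4, 4, and 2 sectors, and the last piece is the original range itself, now starting 8 sectors further in.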
- unsigned int bio_add_max_vecs(void *kaddr, unsigned int len)
number of bio_vecs needed to add data to a bio
Parameters
void *kaddr: kernel virtual address to add
unsigned int len: length in bytes to add
Description
Calculate how many bio_vecs need to be allocated, in the worst case, to add the kernel virtual address range of len bytes starting at kaddr.
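The worst case arises when no two pages of the range can be assumed to be physically contiguous (as with vmalloc memory), so one bio_vec is needed per page that the range touches. The arithmetic is sketched below in userspace with an assumed 4 KiB page size. Directly mapped kernel memory can need fewer entries, and the kernel helper may special-case it; that distinction is not modeled here.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size */

/* Number of pages touched by [addr, addr + len): the worst-case bio_vec count. */
static unsigned int max_vecs_for_range(uintptr_t addr, unsigned int len)
{
	unsigned int offset = (unsigned int)(addr % MODEL_PAGE_SIZE);

	/* Round (offset within first page + len) up to whole pages. */
	return (offset + len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

A page-aligned 4096-byte range needs one entry, but the same length starting 100 bytes into a page straddles two pages and needs two.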
Parameters
struct bio *bio: bio to check
Description
Check if bio is a zone append operation. Core block layer code and end_io handlers must use this instead of an open-coded REQ_OP_ZONE_APPEND check because the block layer can rewrite REQ_OP_ZONE_APPEND to REQ_OP_WRITE if it is not natively supported.
- void blk_queue_flag_set(unsigned int flag, struct request_queue *q)
atomically set a queue flag
Parameters
unsigned int flag: flag to be set
struct request_queue *q: request queue
- void blk_queue_flag_clear(unsigned int flag, struct request_queue *q)
atomically clear a queue flag
Parameters
unsigned int flag: flag to be cleared
struct request_queue *q: request queue
- const char *blk_op_str(enum req_op op)
return the string XXX for REQ_OP_XXX
Parameters
enum req_op op: REQ_OP_XXX
Description
Centralized block layer function to convert REQ_OP_XXX into string format. Useful in debugging and tracing bios or requests. For an invalid REQ_OP_XXX it returns the string “UNKNOWN”.
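A common way to implement this kind of lookup is a sparse table indexed by opcode, built with designated initializers, plus a fallback for out-of-range values or gaps in the table. The sketch below shows that pattern with a hypothetical, trimmed-down enum. The real enum req_op has more members and different values, so only the technique carries over.

```c
/* Hypothetical opcode set; values are illustrative, not the kernel's. */
enum model_req_op {
	MODEL_OP_READ = 0,
	MODEL_OP_WRITE = 1,
	MODEL_OP_FLUSH = 2,
	MODEL_OP_DISCARD = 3,
	MODEL_OP_ZONE_RESET = 9, /* leaves a gap at indexes 4..8 */
};

/* Designated initializers keep each name next to its opcode. */
#define OP_NAME(name) [MODEL_OP_##name] = #name
static const char *const op_names[] = {
	OP_NAME(READ),
	OP_NAME(WRITE),
	OP_NAME(FLUSH),
	OP_NAME(DISCARD),
	OP_NAME(ZONE_RESET),
};

static const char *op_str(unsigned int op)
{
	/* Out-of-range values and unnamed gaps both map to "UNKNOWN". */
	if (op < sizeof(op_names) / sizeof(op_names[0]) && op_names[op])
		return op_names[op];
	return "UNKNOWN";
}
```

The NULL check matters as much as the bounds check: an opcode inside the table range but without a name (such as 5 here) would otherwise return a null pointer.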
- void blk_sync_queue(struct request_queue *q)
cancel any pending callbacks on a queue
Parameters
struct request_queue *q: the queue
Description
The block layer may perform asynchronous callback activity on a queue, such as calling the unplug function after a timeout. A block device may call blk_sync_queue to ensure that any such activity is cancelled, thus allowing it to release resources that the callbacks might use. The caller must already have made sure that its ->submit_bio will not re-add plugging prior to calling this function.
This function does not cancel any asynchronous activity arising out of elevator or throttling code. That would require elevator_exit() and blkcg_exit_queue() to be called with the queue lock initialized.
- void blk_set_pm_only(struct request_queue *q)
increment pm_only counter
Parameters
struct request_queue *q: request queue pointer
- void blk_put_queue(struct request_queue *q)
decrement the request_queue refcount
Parameters
struct request_queue *q: the request_queue structure to decrement the refcount for
Description
Decrements the refcount of the request_queue and frees it when the refcount reaches 0.
- bool blk_get_queue(struct request_queue *q)
increment the request_queue refcount
Parameters
struct request_queue *q: the request_queue structure to increment the refcount for
Description
Increment the refcount of the request_queue kobject.
Context
Any context.
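The get/put pair above follows the usual reference-counting discipline: each successful get must be balanced by exactly one put, and whoever drops the last reference frees the object. A minimal single-threaded userspace model is below. The struct and function names are hypothetical, and kernel code typically uses atomic refcount_t operations rather than a plain int. The dying flag is included only to illustrate how a get can fail, which is why a get function like this returns bool.

```c
#include <stdbool.h>
#include <stdlib.h>

struct model_queue {
	int refs;   /* plain int: this model is single-threaded */
	bool dying; /* set once teardown has started */
};

static int frees; /* counts releases so the behavior can be checked */

static struct model_queue *model_queue_alloc(void)
{
	struct model_queue *q = calloc(1, sizeof(*q));

	if (q)
		q->refs = 1; /* the creator holds the initial reference */
	return q;
}

/* Take a reference unless teardown has begun. */
static bool model_get_queue(struct model_queue *q)
{
	if (q->dying)
		return false;
	q->refs++;
	return true;
}

/* Drop a reference; free the queue when the last one goes away. */
static void model_put_queue(struct model_queue *q)
{
	if (--q->refs == 0) {
		free(q);
		frees++;
	}
}
```

A caller that gets a reference and later puts it leaves the object alive; only the creator's final put frees it, and no new references can be taken once it is marked dying.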
Parameters
struct bio *bio: The bio describing the location in memory and on the device.
Description
This is a version of submit_bio() that shall only be used for I/O that is resubmitted to lower-level drivers by stacking block drivers. All file systems and other upper-level users of the block layer should use submit_bio() instead.
Parameters
struct bio *bio: The struct bio which describes the I/O
Description
submit_bio() is used to submit I/O requests to block devices. It is passed a fully set up struct bio that describes the I/O that needs to be done. The bio will be sent to the device described by the bi_bdev field.
The success/failure status of the request, along with notification of completion, is delivered asynchronously through the ->bi_end_io() callback in bio. The bio must NOT be touched by the caller until ->bi_end_io() has been called.
Parameters
struct bio *bio: bio to poll for
struct io_comp_batch *iob: batches of I/O
unsigned int flags: BLK_POLL_* flags that control the behavior
Description
Poll for completions on the queue associated with the bio. Returns the number of completed entries found.
Note
the caller must either be the context that submitted bio, or be in an RCU critical section to prevent freeing of bio.
Parameters
struct bio *bio: bio to start accounting for
Description
Returns the start time that should be passed back to bio_end_io_acct().
- int blk_lld_busy(struct request_queue *q)
Check if underlying low-level drivers of a device are busy
Parameters
struct request_queue *q: the queue of the device being checked
Description
Check if underlying low-level drivers of a device are busy. If the drivers want to export their busy state, they must set their own exporting function using blk_queue_lld_busy() first.
Basically, this function is used only by request stacking drivers to stop dispatching requests to underlying devices when underlying devices are busy. This behavior allows more I/O merging on the queue of the request stacking driver and prevents I/O throughput regression on burst I/O load.
Return
0 - Not busy (the request stacking driver should dispatch requests)
1 - Busy (the request stacking driver should stop dispatching requests)
- void blk_start_plug(struct blk_plug *plug)
initialize blk_plug and track it inside the task_struct
Parameters
struct blk_plug *plug: The struct blk_plug that needs to be initialized
Description
blk_start_plug() indicates to the block layer an intent by the caller to submit multiple I/O requests in a batch. The block layer may use this hint to defer submitting I/Os from the caller until blk_finish_plug() is called. However, the block layer may choose to submit requests before a call to blk_finish_plug() if the number of queued I/Os exceeds BLK_MAX_REQUEST_COUNT, or if the size of the I/O is larger than BLK_PLUG_FLUSH_SIZE. The queued I/Os may also be submitted early if the task schedules (see below).
Tracking blk_plug inside the task_struct will help with auto-flushing the pending I/O should the task end up blocking between blk_start_plug() and blk_finish_plug(). This is important from a performance perspective, but also ensures that we don’t deadlock. For instance, if the task is blocking for a memory allocation, memory reclaim could end up wanting to free a page belonging to that request that is currently residing in our private plug. By flushing the pending I/O when the process goes to sleep, we avoid this kind of deadlock.
- void blk_finish_plug(struct blk_plug *plug)
mark the end of a batch of submitted I/O
Parameters
struct blk_plug *plug: The struct blk_plug passed to blk_start_plug()
Description
Indicate that a batch of I/O submissions is complete. This function must be paired with an initial call to blk_start_plug(). The intent is to allow the block layer to optimize I/O submission. See the documentation for blk_start_plug() for more information.
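The batching idea can be modeled in userspace as a per-caller list that collects requests between start and finish and hands them to the device in one go, flushing early once a count limit is reached. In this sketch the limit, the struct names, and the dispatch bookkeeping are hypothetical. The real plug also flushes when the task sleeps and applies a size limit, and neither is modeled here.

```c
#define MODEL_MAX_PLUGGED 4 /* hypothetical stand-in for the count limit */

struct model_plug {
	int pending[MODEL_MAX_PLUGGED]; /* queued request ids */
	int count;
};

static int dispatched[64]; /* order in which requests reached the "device" */
static int nr_dispatched;
static int nr_flushes;     /* number of batches sent */

static void model_start_plug(struct model_plug *plug)
{
	plug->count = 0;
}

/* Send everything queued so far as one batch. */
static void model_flush_plug(struct model_plug *plug)
{
	if (plug->count == 0)
		return;
	for (int i = 0; i < plug->count; i++)
		dispatched[nr_dispatched++] = plug->pending[i];
	plug->count = 0;
	nr_flushes++;
}

/* Queue a request; flush early if the plug is already full. */
static void model_submit(struct model_plug *plug, int id)
{
	if (plug->count == MODEL_MAX_PLUGGED)
		model_flush_plug(plug);
	plug->pending[plug->count++] = id;
}

static void model_finish_plug(struct model_plug *plug)
{
	model_flush_plug(plug);
}
```

Submitting six requests under one plug sends a batch of four as soon as the fifth arrives, then the remaining two when the plug is finished, so the device sees two batches instead of six individual submissions.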
- int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
try to increase q->q_usage_counter
Parameters
struct request_queue *q: request queue pointer
blk_mq_req_flags_t flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PM
- int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, struct rq_map_data *map_data, const struct iov_iter *iter, gfp_t gfp_mask)
map user data to a request, for passthrough requests
Parameters
struct request_queue *q: request queue where request should be inserted
struct request *rq: request to map data to
struct rq_map_data *map_data: pointer to the rq_map_data holding pages (if necessary)
const struct iov_iter *iter: iovec iterator
gfp_t gfp_mask: memory allocation flags
Description
Data will be mapped directly for zero-copy I/O, if possible. Otherwise a kernel bounce buffer is used.
A matching blk_rq_unmap_user() must be issued at the end of I/O, while still in process context.
Parameters
struct bio *bio: start of bio list
Description
Unmap a rq previously mapped by blk_rq_map_user(). The caller must supply the original rq->bio from the blk_rq_map_user() return, since the I/O completion may have changed rq->bio.
- int blk_rq_map_kern(struct request *rq, void *kbuf, unsigned int len, gfp_t gfp_mask)
map kernel data to a request, for passthrough requests
Parameters
struct request *rq: request to fill
void *kbuf: the kernel buffer
unsigned int len: length of kernel data
gfp_t gfp_mask: memory allocation flags
Description
Data will be mapped directly if possible. Otherwise a bounce buffer is used. Can be called multiple times to append multiple buffers.
- int blk_register_queue(struct gendisk *disk)
register a block layer queue with sysfs
Parameters
struct gendisk *disk: Disk of which the request queue should be registered with sysfs.
- void blk_unregister_queue(struct gendisk *disk)
counterpart of blk_register_queue()
Parameters
struct gendisk *disk: Disk of which the request queue should be unregistered from sysfs.
Note
the caller is responsible for guaranteeing that this function is called after blk_register_queue() has finished.
- void blk_set_stacking_limits(struct queue_limits *lim)
set default limits for stacking devices
Parameters
struct queue_limits *lim: the queue_limits structure to reset
Description
Prepare queue limits for applying limits from underlying devices using blk_stack_limits().
- int queue_limits_commit_update(struct request_queue *q, struct queue_limits *lim)
commit an atomic update of queue limits
Parameters
struct request_queue *q: queue to update
struct queue_limits *lim: limits to apply
Description
Apply the limits in lim that were obtained from queue_limits_start_update() and updated by the caller to q. The caller must have frozen the queue or ensure that there are no outstanding I/Os by other means.
Returns 0 if successful, else a negative error code.
- int queue_limits_commit_update_frozen(struct request_queue *q, struct queue_limits *lim)
commit an atomic update of queue limits
Parameters
struct request_queue *q: queue to update
struct queue_limits *lim: limits to apply
Description
Apply the limits in lim that were obtained from queue_limits_start_update() and updated with the new values by the caller to q. Freezes the queue before the update and unfreezes it after.
Returns 0 if successful, else a negative error code.
- int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
apply queue limits to queue
Parameters
struct request_queue *q: queue to update
struct queue_limits *lim: limits to apply
Description
Apply the limits in lim that were freshly initialized to q. To update existing limits, use queue_limits_start_update() and queue_limits_commit_update() instead.
Returns 0 if successful, else a negative error code.
- int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, sector_t start)
adjust queue_limits for stacked devices
Parameters
struct queue_limits *t: the stacking driver limits (top device)
struct queue_limits *b: the underlying queue limits (bottom, component device)
sector_t start: first data sector within component device
Description
This function is used by stacking drivers like MD and DM to ensure that all component devices have compatible block sizes and alignments. The stacking driver must provide a queue_limits struct (top) and then iteratively call the stacking function for all component (bottom) devices. The stacking function will attempt to combine the values and ensure proper alignment.
Returns 0 if the top and bottom queue_limits are compatible. The top device’s block sizes and alignment offsets may be adjusted to ensure alignment with the bottom device. If no compatible sizes and alignments exist, -1 is returned and the resulting top queue_limits will have the misaligned flag set to indicate that the alignment_offset is undefined.
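The combining step can be pictured as keeping the most restrictive value of each limit across all components: the smallest nonzero maximum transfer size and the largest block sizes. The sketch below models just those three fields with a hypothetical struct and folds two component devices into the top limits one at a time. The real function covers many more limits and also performs the alignment checks that can make it return -1, none of which is modeled here.

```c
/* Hypothetical subset of queue limits. */
struct model_limits {
	unsigned int max_sectors;         /* 0 means "no limit set yet" */
	unsigned int logical_block_size;  /* bytes */
	unsigned int physical_block_size; /* bytes */
};

/* Smaller of a and b, treating 0 as "unset". */
static unsigned int min_not_zero(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Fold bottom-device limits b into top-device limits t. */
static void model_stack_limits(struct model_limits *t,
			       const struct model_limits *b)
{
	t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
	t->logical_block_size = max_u(t->logical_block_size,
				      b->logical_block_size);
	t->physical_block_size = max_u(t->physical_block_size,
				       b->physical_block_size);
}
```

Stacking a 512-byte-sector disk with a 256-sector limit and a 4 KiB-sector disk with a 128-sector limit yields top limits of 128 sectors and 4096-byte logical and physical blocks, which both components can honor.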
- void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev, sector_t offset, const char *pfx)
adjust queue_limits for stacked devices
Parameters
struct queue_limits *t: the stacking driver limits (top device)
struct block_device *bdev: the underlying block device (bottom)
sector_t offset: offset to beginning of data within component device
const char *pfx: prefix to use for warnings logged
Description
This function is used by stacking drivers like MD and DM to ensure that all component devices have compatible block sizes and alignments. The stacking driver must provide a queue_limits struct (top) and then iteratively call the stacking function for all component (bottom) devices. The stacking function will attempt to combine the values and ensure proper alignment.
- bool queue_limits_stack_integrity(struct queue_limits *t, struct queue_limits *b)
stack integrity profile
Parameters
struct queue_limits *t: target queue limits
struct queue_limits *b: base queue limits
Description
Check if the integrity profile in b can be stacked into the target t. Stacking is possible if either:
t does not have any integrity information stacked into it yet, or
the integrity profile in b is identical to the one in t.
If b can be stacked into t, return true. Else return false and clear the integrity information in t.
- void blk_set_queue_depth(struct request_queue *q, unsigned int depth)
tell the block layer about the device queue depth
Parameters
struct request_queue *q: the request queue for the device
unsigned int depth: queue depth
- int blkdev_issue_flush(struct block_device *bdev)
queue a flush
Parameters
struct block_device *bdev: blockdev to issue flush for
Description
Issue a flush for the block device in question.
- int blkdev_issue_discard(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask)
queue a discard
Parameters
struct block_device *bdev: blockdev to issue discard for
sector_t sector: start sector
sector_t nr_sects: number of sectors to discard
gfp_t gfp_mask: memory allocation flags (for bio_alloc)
Description
Issue a discard request for the sectors in question.
- int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, struct bio **biop, unsigned flags)
generate a number of zero-filled write bios
Parameters
struct block_device *bdev: blockdev to issue
sector_t sector: start sector
sector_t nr_sects: number of sectors to write
gfp_t gfp_mask: memory allocation flags (for bio_alloc)
struct bio **biop: pointer to anchor bio
unsigned flags: controls detailed behavior
Description
Zero-fill a block range, either using hardware offload or by explicitly writing zeroes to the device.
If a device is using logical block provisioning, the underlying space will not be released if flags contains BLKDEV_ZERO_NOUNMAP.
If flags contains BLKDEV_ZERO_NOFALLBACK, the function will return -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.
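The flag handling described above amounts to a small decision procedure, sketched below in userspace. The flag values, the zero_method enum, and choose_zero_method() are all hypothetical: the function only decides which strategy would apply and issues no I/O. Whether the offload path actually releases space depends on the device's provisioning, so the "may unmap" outcome is an assumption for illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_ZERO_NOUNMAP    (1u << 0) /* hypothetical flag values */
#define MODEL_ZERO_NOFALLBACK (1u << 1)

enum zero_method {
	ZERO_OFFLOAD_MAY_UNMAP,  /* hardware offload; space may be released */
	ZERO_OFFLOAD_KEEP_SPACE, /* hardware offload; space stays allocated */
	ZERO_WRITE_ZEROES,       /* fallback: write zero-filled bios */
};

/*
 * Pick a zeroing strategy. Returns 0 and sets *method, or -EOPNOTSUPP
 * when no offload exists and the caller forbade the fallback.
 */
static int choose_zero_method(bool has_offload, unsigned int flags,
			      enum zero_method *method)
{
	if (has_offload) {
		*method = (flags & MODEL_ZERO_NOUNMAP) ? ZERO_OFFLOAD_KEEP_SPACE
						       : ZERO_OFFLOAD_MAY_UNMAP;
		return 0;
	}
	if (flags & MODEL_ZERO_NOFALLBACK)
		return -EOPNOTSUPP;
	*method = ZERO_WRITE_ZEROES;
	return 0;
}
```

A caller that must not trigger slow explicit writes passes the no-fallback flag and handles -EOPNOTSUPP itself, for example by zeroing lazily later.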
- int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
zero-fill a block range
Parameters
struct block_device *bdev: blockdev to write
sector_t sector: start sector
sector_t nr_sects: number of sectors to write
gfp_t gfp_mask: memory allocation flags (for bio_alloc)
unsigned flags: controls detailed behavior
Description
Zero-fill a block range, either using hardware offload or by explicitly writing zeroes to the device. See __blkdev_issue_zeroout() for the valid values for flags.
- int blk_trace_ioctl(struct block_device *bdev, unsigned cmd, char __user *arg)
handle the ioctls associated with tracing
Parameters
struct block_device *bdev: the block device
unsigned cmd: the ioctl cmd
char __user *arg: the argument data, if any
- void blk_trace_shutdown(struct request_queue *q)
stop and clean up trace structures
Parameters
struct request_queue *q: the request queue associated with the device
- void blk_add_trace_rq(struct request *rq, blk_status_t error, unsigned int nr_bytes, u64 what, u64 cgid)
Add a trace for a request-oriented action
Parameters
struct request *rq: the source request
blk_status_t error: return status to log
unsigned int nr_bytes: number of completed bytes
u64 what: the action
u64 cgid: the cgroup info
Description
Records an action against a request. Will log the bio offset + size.
- void blk_add_trace_bio(struct request_queue *q, struct bio *bio, u64 what, int error)
Add a trace for a bio-oriented action
Parameters
struct request_queue *q: queue the I/O is for
struct bio *bio: the source bio
u64 what: the action
int error: error, if any
Description
Records an action against a bio. Will log the bio offset + size.
- void blk_add_trace_bio_remap(void *ignore, struct bio *bio, dev_t dev, sector_t from)
Add a trace for a bio-remap operation
Parameters
void *ignore: trace callback data parameter (not used)
struct bio *bio: the source bio
dev_t dev: source device
sector_t from: source sector
Description
Called after a bio is remapped to a different device and/or sector.
- void blk_add_trace_rq_remap(void *ignore, struct request *rq, dev_t dev, sector_t from)
Add a trace for a request-remap operation
Parameters
void *ignore: trace callback data parameter (not used)
struct request *rq: the source request
dev_t dev: target device
sector_t from: source sector
Description
Device mapper remaps a request to other devices. Add a trace for that action.
Parameters
struct device *dev: the device representing this disk
Description
This function releases all allocated resources of the gendisk.
Drivers which used device_add_disk() have a gendisk with a request_queue assigned. Since the request_queue sits on top of the gendisk for these drivers, we also call blk_put_queue() for them, and we expect the request_queue refcount to reach 0 at this point, so the request_queue will also be freed prior to the disk.
Context
can sleep
- unsigned int bdev_count_inflight(struct block_device *part)
get the number of inflight I/Os for a block device.
Parameters
struct block_device *part: the block device.
Description
Inflight here means started I/O accounting, from bdev_start_io_acct() for bio-based block devices, and from blk_account_io_start() for rq-based block devices.
- int __register_blkdev(unsigned int major, const char *name, void (*probe)(dev_t devt))
register a new block device
Parameters
unsigned int major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If major = 0, try to allocate any unused major number.
const char *name: the name of the new block device as a zero-terminated string
void (*probe)(dev_t devt): pre-devtmpfs / pre-udev callback used to create disks when their pre-created device node is accessed. When a probe call uses add_disk() and it fails, the driver must clean up resources. This interface may soon be removed.
Description
The name must be unique within the system.
The return value depends on the major input parameter:
if a major device number was requested in range [1..BLKDEV_MAJOR_MAX-1], then the function returns zero on success, or a negative error code otherwise
if any unused major number was requested with the major = 0 parameter, then the return value is the allocated major number in range [1..BLKDEV_MAJOR_MAX-1], or a negative error code otherwise
See Linux allocated devices (4.x+ version) for the list of allocated major numbers.
Use register_blkdev instead for any new code.
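The two return conventions above are easy to mix up: a fixed request returns 0 on success, while a dynamic request returns the allocated major itself. The userspace model below illustrates both with a small table of names. The table size, the top-down search order for a free slot, and the choice of -EBUSY and -EINVAL for the error cases are assumptions of this sketch, not a statement of what the kernel does in every case.

```c
#include <errno.h>

#define MODEL_MAJOR_MAX 8 /* hypothetical; valid majors are 1..MODEL_MAJOR_MAX-1 */

static const char *major_names[MODEL_MAJOR_MAX];

static int model_register_blkdev(unsigned int major, const char *name)
{
	if (major == 0) {
		/* Dynamic request: take the highest free slot. */
		for (unsigned int i = MODEL_MAJOR_MAX - 1; i > 0; i--) {
			if (!major_names[i]) {
				major_names[i] = name;
				return (int)i; /* the allocated major number */
			}
		}
		return -EBUSY;
	}

	if (major >= MODEL_MAJOR_MAX)
		return -EINVAL;
	if (major_names[major])
		return -EBUSY;
	major_names[major] = name;
	return 0; /* fixed request: zero on success */
}
```

A caller therefore has to treat a positive return as "this is your major" only when it passed 0; with a fixed major, any nonzero return is an error.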
- int add_disk_fwnode(struct device *parent, struct gendisk *disk, const struct attribute_group **groups, struct fwnode_handle *fwnode)
add disk information to kernel list with fwnode
Parameters
struct device *parent: parent device for the disk
struct gendisk *disk: per-device partitioning information
const struct attribute_group **groups: Additional per-device sysfs groups
struct fwnode_handle *fwnode: attached disk fwnode
Description
This function registers the partitioning information in disk with the kernel. It also attaches a fwnode to the disk device.
- int device_add_disk(struct device *parent, struct gendisk *disk, const struct attribute_group **groups)
add disk information to kernel list
Parameters
struct device *parent: parent device for the disk
struct gendisk *disk: per-device partitioning information
const struct attribute_group **groups: Additional per-device sysfs groups
Description
This function registers the partitioning information in disk with the kernel.
- void blk_mark_disk_dead(struct gendisk *disk)
mark a disk as dead
Parameters
struct gendisk *disk: disk to mark as dead
Description
Mark the disk as dead (e.g. surprise removed) and don’t accept any new I/O to this disk.
- void del_gendisk(struct gendisk *disk)
remove the gendisk
Parameters
struct gendisk *disk: the struct gendisk to remove
Description
Removes the gendisk and all its associated resources. This deletes the partitions associated with the gendisk, and unregisters the associated request_queue.
This is the counterpart to the respective device_add_disk() call.
The final removal of the struct gendisk happens when its refcount reaches 0 with put_disk(), which should be called after del_gendisk(), if device_add_disk() was used.
Drivers exist which depend on the release of the gendisk being synchronous, so it should not be deferred.
Context
can sleep
- void invalidate_disk(struct gendisk *disk)
invalidate the disk
Parameters
struct gendisk *disk: the struct gendisk to invalidate
Description
A helper to invalidate the disk. It will clean the disk’s associated buffer/page caches and reset its internal states so that the disk can be reused by the drivers.
Context
can sleep
- void put_disk(struct gendisk *disk)
decrements the gendisk refcount
Parameters
struct gendisk *disk: the struct gendisk to decrement the refcount for
Description
This decrements the refcount for the struct gendisk. When this reaches 0, we’ll have disk_release() called.
Note
for blk-mq disks, put_disk must be called before freeing the tag_set when handling probe errors (that is, before add_disk() is called).
Context
Any context, but the last reference must not be dropped from atomic context.
- void set_disk_ro(struct gendisk *disk, bool read_only)
set a gendisk read-only
Parameters
struct gendisk *disk: gendisk to operate on
bool read_only: true to set the disk read-only, false to set the disk read/write
Description
This function is used to indicate whether a given disk device should have its read-only flag set. set_disk_ro() is typically used by device drivers to indicate whether the underlying physical device is write-protected.
- int bdev_validate_blocksize(struct block_device *bdev, int block_size)
check that this block size is acceptable
Parameters
struct block_device *bdev: blockdevice to check
int block_size: block size to check
Description
For block device users that do not use buffer heads or the block device page cache, make sure that this block size can be used with the device.
- intbdev_freeze(structblock_device*bdev)¶
lock a filesystem and force it into a consistent state
Parameters
structblock_device*bdevblockdevice to lock
Description
If a superblock is found on this device, we take the s_umount semaphoreon it to make sure nobody unmounts until the snapshot creation is done.The reference counter (bd_fsfreeze_count) guarantees that only the lastunfreeze process can unfreeze the frozen filesystem actually when multiplefreeze requests arrive simultaneously. It counts up inbdev_freeze() andcount down inbdev_thaw(). When it becomes 0,thaw_bdev() will unfreezeactually.
Return
On success zero is returned, negative error code on failure.
- intbdev_thaw(structblock_device*bdev)¶
unlock filesystem
Parameters
structblock_device*bdevblockdevice to unlock
Description
Unlocks the filesystem and marks it writeable again afterbdev_freeze().
Return
On success zero is returned, negative error code on failure.
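The nesting rule for bd_fsfreeze_count can be sketched as a counter where only the first freeze and the last thaw change the filesystem state. This is a hypothetical userspace model: struct model_bdev, model_bdev_freeze() and model_bdev_thaw() are invented names, and returning -EINVAL for an unbalanced thaw is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a block device's freeze state. */
struct model_bdev {
    int fsfreeze_count; /* plays the role of bd_fsfreeze_count */
    bool fs_frozen;     /* whether the filesystem is actually frozen */
};

/* Only the first freeze request actually freezes the filesystem. */
static int model_bdev_freeze(struct model_bdev *bdev)
{
    if (bdev->fsfreeze_count++ == 0)
        bdev->fs_frozen = true;
    return 0;
}

/* Only the thaw that brings the count back to 0 unfreezes it. */
static int model_bdev_thaw(struct model_bdev *bdev)
{
    if (bdev->fsfreeze_count == 0)
        return -EINVAL; /* unbalanced thaw (error code assumed) */
    if (--bdev->fsfreeze_count == 0)
        bdev->fs_frozen = false;
    return 0;
}
```

With two concurrent freezers, the filesystem stays frozen after the first thaw and is released only by the second.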
- int bd_prepare_to_claim(struct block_device *bdev, void *holder, const struct blk_holder_ops *hops)
claim a block device
Parameters
struct block_device *bdev: block device of interest
void *holder: holder trying to claim bdev
const struct blk_holder_ops *hops: holder ops.
Description
Claim bdev. This function fails if bdev is already claimed by another holder and waits if another claiming is in progress. On successful return, the caller has ownership of bd_claiming and bd_holder[s].
Return
0 if bdev can be claimed, -EBUSY otherwise.
- void bd_abort_claiming(struct block_device *bdev, void *holder)
abort claiming of a block device
Parameters
struct block_device *bdev: block device of interest
void *holder: holder that has claimed bdev
Description
Abort claiming of a block device when the exclusive open failed. This can also be used when exclusive open is not actually desired and we just needed to block other exclusive openers for a while.
Parameters
struct file *bdev_file: open block device
Description
Yield claim on the block device and put the file. Ensure that the block device can be reclaimed before the file is closed, which is a deferred operation.
- int lookup_bdev(const char *pathname, dev_t *dev)
Look up a struct block_device by name.
Parameters
const char *pathname: Name of the block device in the filesystem.
dev_t *dev: Pointer to the block device’s dev_t, if found.
Description
Look up the block device’s dev_t at pathname in the current namespace if possible and return it in dev.
Context
May sleep.
Return
0 if succeeded, negative errno otherwise.
- void bdev_mark_dead(struct block_device *bdev, bool surprise)
mark a block device as dead
Parameters
struct block_device *bdev: block device to operate on
bool surprise: indicate a surprise removal
Description
Tell the file system that this device or media is dead. If surprise is set to true, the device or media is already gone; if not, we are preparing for an orderly removal.
This calls into the file system, which then typically syncs out all dirty data and writes back inodes and then invalidates any cached data in the inodes on the file system. In addition, we also invalidate the block device mapping.
Char devices
- int register_chrdev_region(dev_t from, unsigned count, const char *name)
register a range of device numbers
Parameters
dev_t from: the first in the desired range of device numbers; must include the major number.
unsigned count: the number of consecutive device numbers required
const char *name: the name of the device or driver.
Description
Return value is zero on success, a negative error code on failure.
- int alloc_chrdev_region(dev_t *dev, unsigned baseminor, unsigned count, const char *name)
register a range of char device numbers
Parameters
dev_t *dev: output parameter for first assigned number
unsigned baseminor: first of the requested range of minor numbers
unsigned count: the number of minor numbers required
const char *name: the name of the associated device or driver
Description
Allocates a range of char device numbers. The major number will be chosen dynamically, and returned (along with the first minor number) in dev. Returns zero or a negative error code.
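alloc_chrdev_region() reports only the first dev_t of the range; the remaining numbers are the next count - 1 minors under the same major. The arithmetic below is a userspace copy of the kernel-internal dev_t layout (MKDEV()/MAJOR()/MINOR() with 20 minor bits, as in include/linux/kdev_t.h). The model_ names and the major number 240 are hypothetical values chosen for illustration, and the sketch assumes the range does not overflow the minor field.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the kernel-internal dev_t layout: 12-bit major,
 * 20-bit minor. */
#define MODEL_MINORBITS 20
#define MODEL_MINORMASK ((1U << MODEL_MINORBITS) - 1)

typedef uint32_t model_dev_t;

static model_dev_t model_mkdev(unsigned int major, unsigned int minor)
{
    return ((model_dev_t)major << MODEL_MINORBITS) | minor;
}

static unsigned int model_major(model_dev_t dev)
{
    return dev >> MODEL_MINORBITS;
}

static unsigned int model_minor(model_dev_t dev)
{
    return dev & MODEL_MINORMASK;
}

/* The i-th device number of a region starting at 'first'
 * (assumes minor(first) + i stays below 1 << MODEL_MINORBITS). */
static model_dev_t model_region_nth(model_dev_t first, unsigned int i)
{
    return first + i;
}
```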
- int __register_chrdev(unsigned int major, unsigned int baseminor, unsigned int count, const char *name, const struct file_operations *fops)
create and register a cdev occupying a range of minors
Parameters
unsigned int major: major device number or 0 for dynamic allocation
unsigned int baseminor: first of the requested range of minor numbers
unsigned int count: the number of minor numbers required
const char *name: name of this range of devices
const struct file_operations *fops: file operations associated with these devices
Description
If major == 0, this function will dynamically allocate a major and return its number.
If major > 0, this function will attempt to reserve a device with the given major number and will return zero on success.
Returns a negative errno on failure.
The name of this device has nothing to do with the name of the device in /dev. It only helps to keep track of the different owners of devices. If your module has only one type of device, it’s OK to use, e.g., the name of the module here.
- void unregister_chrdev_region(dev_t from, unsigned count)
unregister a range of device numbers
Parameters
dev_t from: the first in the range of numbers to unregister
unsigned count: the number of device numbers to unregister
Description
This function will unregister a range of count device numbers, starting with from. The caller should normally be the one who allocated those numbers in the first place.
- void __unregister_chrdev(unsigned int major, unsigned int baseminor, unsigned int count, const char *name)
unregister and destroy a cdev
Parameters
unsigned int major: major device number
unsigned int baseminor: first of the range of minor numbers
unsigned int count: the number of minor numbers this cdev is occupying
const char *name: name of this range of devices
Description
Unregister and destroy the cdev occupying the region described by major, baseminor and count. This function undoes what __register_chrdev() did.
- int cdev_add(struct cdev *p, dev_t dev, unsigned count)
add a char device to the system
Parameters
struct cdev *p: the cdev structure for the device
dev_t dev: the first device number for which this device is responsible
unsigned count: the number of consecutive minor numbers corresponding to this device
Description
cdev_add() adds the device represented by p to the system, making it live immediately. A negative error code is returned on failure.
- void cdev_set_parent(struct cdev *p, struct kobject *kobj)
set the parent kobject for a char device
Parameters
struct cdev *p: the cdev structure
struct kobject *kobj: the kobject to take a reference to
Description
cdev_set_parent() sets a parent kobject which will be referenced appropriately so the parent is not freed before the cdev. This should be called before cdev_add.
- int cdev_device_add(struct cdev *cdev, struct device *dev)
add a char device and its corresponding struct device, linking them
Parameters
struct cdev *cdev: the cdev structure
struct device *dev: the device structure
Description
cdev_device_add() adds the char device represented by cdev to the system, just as cdev_add does. It then adds dev to the system using device_add. The dev_t for the char device will be taken from the struct device, which needs to be initialized first. This helper function correctly takes a reference to the parent device so the parent will not get released until all references to the cdev are released.
This helper uses dev->devt for the device number. If it is not set, it will not add the cdev and it will be equivalent to device_add.
This function should be used whenever the struct cdev and the struct device are members of the same structure whose lifetime is managed by the struct device.
NOTE
Callers must assume that userspace was able to open the cdev and can call cdev fops callbacks at any time, even if this function fails.
Parameters
struct cdev *cdev: the cdev structure
struct device *dev: the device structure
Description
cdev_device_del() is a helper function to call cdev_del and device_del. It should be used whenever cdev_device_add is used.
If dev->devt is not set, it will not remove the cdev and will be equivalent to device_del.
NOTE
This guarantees that associated sysfs callbacks are not running or runnable; however, any cdevs already open will remain and their fops will still be callable even after this function returns.
- void cdev_del(struct cdev *p)
remove a cdev from the system
Parameters
struct cdev *p: the cdev structure to be removed
Description
cdev_del() removes p from the system, possibly freeing the structure itself.
NOTE
This guarantees that the cdev device will no longer be able to be opened; however, any cdevs already open will remain and their fops will still be callable even after cdev_del returns.
- struct cdev *cdev_alloc(void)
allocate a cdev structure
Parameters
void: no arguments
Description
Allocates and returns a cdev structure, or NULL on failure.
Parameters
struct cdev *cdev: the structure to initialize
const struct file_operations *fops: the file_operations for this device
Description
Initializes cdev, remembering fops, making it ready to add to the system with cdev_add().
Clock Framework
The clock framework defines programming interfaces to support software management of the system clock tree. This framework is widely used with System-On-Chip (SOC) platforms to support power management and various devices which may need custom clock rates. Note that these “clocks” don’t relate to timekeeping or real time clocks (RTCs), each of which has a separate framework. These struct clk instances may be used to manage, for example, a 96 MHz signal that is used to shift bits into and out of peripherals or busses, or otherwise trigger synchronous state machine transitions in system hardware.
Power management is supported by explicit software clock gating: unused clocks are disabled, so the system doesn’t waste power changing the state of transistors that aren’t in active use. On some systems this may be backed by hardware clock gating, where clocks are gated without being disabled in software. Sections of chips that are powered but not clocked may be able to retain their last state. This low power state is often called a retention mode. This mode still incurs leakage currents, especially with finer circuit geometries, but for CMOS circuits power is mostly used by clocked state changes.
Power-aware drivers only enable their clocks when the device they manage is in active use. Also, system sleep states often differ according to which clock domains are active: while a “standby” state may allow wakeup from several active domains, a “mem” (suspend-to-RAM) state may require a more wholesale shutdown of clocks derived from higher speed PLLs and oscillators, limiting the number of possible wakeup event sources. A driver’s suspend method may need to be aware of system-specific clock constraints on the target sleep state.
Some platforms support programmable clock generators. These can be used by external chips of various kinds, such as other CPUs, multimedia codecs, and devices with strict requirements for interface clocking.
- struct clk_notifier
associate a clk with a notifier
Definition:
struct clk_notifier {
    struct clk *clk;
    struct srcu_notifier_head notifier_head;
    struct list_head node;
};
Members
clk: struct clk * to associate the notifier with
notifier_head: an srcu_notifier_head for this clk
node: linked list pointers
Description
A list of struct clk_notifier is maintained by the notifier code. An entry is created whenever code registers the first notifier on a particular clk. Future notifiers on that clk are added to the notifier_head.
- struct clk_notifier_data
rate data to pass to the notifier callback
Definition:
struct clk_notifier_data {
    struct clk *clk;
    unsigned long old_rate;
    unsigned long new_rate;
};
Members
clk: struct clk * being changed
old_rate: previous rate of this clk
new_rate: new rate of this clk
Description
For a pre-notifier, old_rate is the clk’s rate before this rate change, and new_rate is what the rate will be in the future. For a post-notifier, old_rate and new_rate are both set to the clk’s current rate (this was done to optimize the implementation).
- struct clk_bulk_data
Data used for bulk clk operations.
Definition:
struct clk_bulk_data {
    const char *id;
    struct clk *clk;
};
Members
id: clock consumer ID
clk: struct clk * to store the associated clock
Description
The CLK APIs provide a series of clk_bulk_*() API calls as a convenience to consumers which require multiple clks. This structure is used to manage data for these calls.
- int clk_notifier_register(struct clk *clk, struct notifier_block *nb)
register a clock rate-change notifier callback
Parameters
struct clk *clk: clock whose rate we are interested in
struct notifier_block *nb: notifier block with callback function pointer
Description
ProTip: debugging across notifier chains can be frustrating. Make sure that your notifier callback function prints a nice big warning in case of failure.
- int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb)
unregister a clock rate-change notifier callback
Parameters
struct clk *clk: clock whose rate we are no longer interested in
struct notifier_block *nb: notifier block which will be unregistered
- int devm_clk_notifier_register(struct device *dev, struct clk *clk, struct notifier_block *nb)
register a managed rate-change notifier callback
Parameters
struct device *dev: device for clock “consumer”
struct clk *clk: clock whose rate we are interested in
struct notifier_block *nb: notifier block with callback function pointer
Description
Returns 0 on success, -EERROR otherwise
- long clk_get_accuracy(struct clk *clk)
obtain the clock accuracy in ppb (parts per billion) for a clock source.
Parameters
struct clk *clk: clock source
Description
This gets the clock source accuracy expressed in ppb. A perfect clock returns 0.
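Because accuracy is expressed in parts per billion, the worst-case frequency error in Hz at a given rate is rate × ppb / 10^9. The helper below only illustrates that arithmetic; model_ppb_error_hz() is a hypothetical name and not part of the clk API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: worst-case deviation in Hz for a clock running at
 * rate_hz whose accuracy is accuracy_ppb (as clk_get_accuracy() reports).
 * 64-bit math avoids overflow for realistic rates and accuracies; the
 * result is rounded down.
 */
static uint64_t model_ppb_error_hz(uint64_t rate_hz, uint64_t accuracy_ppb)
{
    return rate_hz * accuracy_ppb / 1000000000ULL;
}
```

For example, a 100 MHz source with 50000 ppb (50 ppm) accuracy may be off by up to 5000 Hz, while a perfect clock (0 ppb) has no error.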
Parameters
struct clk *clk: clock signal source
int degrees: number of degrees the signal is shifted
Description
Shifts the phase of a clock signal by the specified degrees. Returns 0 on success, -EERROR otherwise.
Parameters
struct clk *clk: clock signal source
Description
Returns the phase shift of a clock node in degrees, otherwise returns -EERROR.
- int clk_set_duty_cycle(struct clk *clk, unsigned int num, unsigned int den)
adjust the duty cycle ratio of a clock signal
Parameters
struct clk *clk: clock signal source
unsigned int num: numerator of the duty cycle ratio to be applied
unsigned int den: denominator of the duty cycle ratio to be applied
Description
Adjust the duty cycle of a clock signal by the specified ratio. Returns 0 on success, -EERROR otherwise.
- int clk_get_scaled_duty_cycle(struct clk *clk, unsigned int scale)
return the duty cycle ratio of a clock signal
Parameters
struct clk *clk: clock signal source
unsigned int scale: scaling factor to be applied to represent the ratio as an integer
Description
Returns the duty cycle ratio multiplied by the scale provided, otherwise returns -EERROR.
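To make the num/den and scale conventions concrete, here is a small model of how a duty cycle stored as a ratio might be reported as a scaled integer. struct model_duty and model_scaled_duty() are hypothetical, and rounding down plus rejecting ratios outside [0, 1] with -EINVAL are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical duty cycle ratio, as passed to clk_set_duty_cycle(). */
struct model_duty {
    unsigned int num;  /* duty cycle numerator */
    unsigned int den;  /* duty cycle denominator */
};

/* num/den expressed as an integer out of 'scale', rounded down.
 * Returns -EINVAL for a ratio that is not in [0, 1]. */
static int model_scaled_duty(const struct model_duty *duty, unsigned int scale)
{
    if (duty->den == 0 || duty->num > duty->den)
        return -EINVAL;
    /* 64-bit intermediate so num * scale cannot overflow */
    return (int)((uint64_t)duty->num * scale / duty->den);
}
```

A 50% duty cycle (1/2) reported with a scale of 100000 comes back as 50000; 1/3 with a scale of 100 comes back as 33.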
- bool clk_is_match(const struct clk *p, const struct clk *q)
check if two clks point to the same hardware clock
Parameters
const struct clk *p: clk compared against q
const struct clk *q: clk compared against p
Description
Returns true if the two struct clk pointers both point to the same hardware clock node. Put differently, returns true if p and q share the same struct clk_core object.
Returns false otherwise. Note that two NULL clks are treated as matching.
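The matching rule can be modelled with consumer handles that point at a shared core object: distinct handles for the same hardware clock match, and two NULL handles match because the pointers are identical. struct model_clk, struct model_clk_core and model_clk_is_match() are hypothetical stand-ins, not the kernel's internal types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: each consumer handle points at a shared core. */
struct model_clk_core { const char *name; };
struct model_clk { struct model_clk_core *core; };

/* Identical pointers (including two NULLs) match; otherwise two non-NULL
 * handles match only if they share the same core object. */
static bool model_clk_is_match(const struct model_clk *p,
                               const struct model_clk *q)
{
    if (p == q)
        return true;
    if (p && q && p->core == q->core)
        return true;
    return false;
}
```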
Parameters
struct clk *clk: clock source
Description
This function allows drivers to get exclusive control over the rate of a provider. It prevents any other consumer from executing, even indirectly, operations which could alter the rate of the provider or cause glitches.
If exclusivity is claimed more than once on a clock, even by the same driver, the rate effectively gets locked, as exclusivity can’t be preempted.
Must not be called from within atomic context.
Returns success (0) or negative errno.
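The "rate effectively gets locked" behaviour can be seen in a small model, loosely patterned on the idea that a consumer holding exclusivity may set aside one of its own claims while it changes the rate. Every name here (struct model_core, struct model_consumer, model_rate_exclusive_get(), model_rate_exclusive_put(), model_set_rate()) is a hypothetical userspace stand-in, and -EBUSY is this sketch's choice of error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of rate protection on a shared clock. */
struct model_core {
    unsigned long rate;
    int protect_count;   /* total exclusivity claims on this clock */
};

struct model_consumer {
    struct model_core *core;
    int exclusive_count; /* claims held by this consumer */
};

static void model_rate_exclusive_get(struct model_consumer *c)
{
    c->core->protect_count++;
    c->exclusive_count++;
}

static void model_rate_exclusive_put(struct model_consumer *c)
{
    assert(c->exclusive_count > 0); /* gets and puts must balance */
    c->core->protect_count--;
    c->exclusive_count--;
}

/* The caller's own claim is set aside once, so a single claim does not
 * block its owner; any other claim (including a second claim by the same
 * consumer) leaves the rate locked. */
static int model_set_rate(struct model_consumer *c, unsigned long rate)
{
    int others = c->core->protect_count - (c->exclusive_count ? 1 : 0);

    if (others > 0)
        return -EBUSY;
    c->core->rate = rate;
    return 0;
}
```

In this model a single claimant can still retune its clock, other consumers are refused, and a second claim by the same consumer locks the rate until the claims are released.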
- int devm_clk_rate_exclusive_get(struct device *dev, struct clk *clk)
devm variant of clk_rate_exclusive_get
Parameters
struct device *dev: device the exclusivity is bound to
struct clk *clk: clock source
Description
Calls clk_rate_exclusive_get() on clk and registers a devm cleanup handler on dev to call clk_rate_exclusive_put().
Must not be called from within atomic context.
Parameters
struct clk *clk: clock source
Description
This function allows drivers to release the exclusivity they previously got from clk_rate_exclusive_get().
The caller must balance the number of clk_rate_exclusive_get() and clk_rate_exclusive_put() calls.
Must not be called from within atomic context.
Parameters
struct clk *clk: clock source
Description
This prepares the clock source for use.
Must not be called from within atomic context.
Parameters
struct clk *clk: clock source
Description
Returns true if clk_prepare() implicitly enables the clock, effectively making clk_enable()/clk_disable() no-ops, false otherwise.
This is of interest mainly to the power management code where actually disabling the clock also requires unpreparing it to have any material effect.
Regardless of the value returned here, the caller must always invoke clk_enable() or clk_prepare_enable() and counterparts for usage counts to be right.
Parameters
struct clk *clk: clock source
Description
This undoes a previously prepared clock. The caller must balance the number of prepare and unprepare calls.
Must not be called from within atomic context.
Parameters
struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
Description
Returns a struct clk corresponding to the clock producer, or a valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.)
Drivers must assume that the clock source is not enabled.
clk_get should not be called from within interrupt context.
- int clk_bulk_get(struct device *dev, int num_clks, struct clk_bulk_data *clks)
lookup and obtain a number of references to clock producer.
Parameters
struct device *dev: device for clock “consumer”
int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: the clk_bulk_data table of consumer
Description
This helper function allows drivers to get several clk consumers in one operation. If any of the clks cannot be acquired, then any clks that were obtained will be freed before returning to the caller.
Returns 0 if all clocks specified in the clk_bulk_data table are obtained successfully, or a valid IS_ERR() condition containing errno. The implementation uses dev and clk_bulk_data.id to determine the clock consumer, and thereby the clock producer. The clock returned is stored in each clk_bulk_data.clk field.
Drivers must assume that the clock source is not enabled.
clk_bulk_get should not be called from within interrupt context.
- int clk_bulk_get_all(struct device *dev, struct clk_bulk_data **clks)
lookup and obtain all available references to clock producer.
Parameters
struct device *dev: device for clock “consumer”
struct clk_bulk_data **clks: pointer to the clk_bulk_data table of consumer
Description
This helper function allows drivers to get all clk consumers in one operation. If any of the clks cannot be acquired, then any clks that were obtained will be freed before returning to the caller.
Returns a positive value for the number of clocks obtained while the clock references are stored in the clk_bulk_data table in the clks field. Returns 0 if there are none and a negative value if something failed.
Drivers must assume that the clock source is not enabled.
clk_bulk_get_all should not be called from within interrupt context.
- int clk_bulk_get_optional(struct device *dev, int num_clks, struct clk_bulk_data *clks)
lookup and obtain a number of references to clock producer
Parameters
struct device *dev: device for clock “consumer”
int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: the clk_bulk_data table of consumer
Description
Behaves the same as clk_bulk_get() except where there is no clock producer. In this case, instead of returning -ENOENT, the function returns 0 and NULL for a clk for which a clock producer could not be determined.
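The all-or-nothing behaviour of clk_bulk_get() and the NULL-for-missing behaviour of clk_bulk_get_optional() can be sketched together. This is a hypothetical userspace model: struct model_clk, struct model_bulk, the provider table and model_bulk_get() with its optional flag are invented for illustration, and a lookup miss is assumed to mean -ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct clk and struct clk_bulk_data. */
struct model_clk { const char *name; int refs; };
struct model_bulk { const char *id; struct model_clk *clk; };

/* Hypothetical provider table: look a clock up by consumer ID. */
static struct model_clk *model_lookup(struct model_clk *table, size_t n,
                                      const char *id)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, id) == 0)
            return &table[i];
    return NULL;
}

/*
 * A missing clock is an error (-ENOENT) unless 'optional' is set, in which
 * case its slot stays NULL and the call continues. On error, every
 * reference obtained so far is dropped again (all-or-nothing).
 */
static int model_bulk_get(struct model_clk *table, size_t n,
                          struct model_bulk *clks, int num_clks, bool optional)
{
    for (int i = 0; i < num_clks; i++) {
        clks[i].clk = model_lookup(table, n, clks[i].id);
        if (!clks[i].clk) {
            if (optional)
                continue;
            while (--i >= 0) {          /* roll back earlier acquisitions */
                if (clks[i].clk)
                    clks[i].clk->refs--;
                clks[i].clk = NULL;
            }
            return -ENOENT;
        }
        clks[i].clk->refs++;
    }
    return 0;
}
```

Requesting a clock with no provider fails the whole non-optional call and leaves no references held, whereas the optional variant succeeds and leaves that slot NULL.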
- int devm_clk_bulk_get(struct device *dev, int num_clks, struct clk_bulk_data *clks)
managed get multiple clk consumers
Parameters
struct device *dev: device for clock “consumer”
int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: the clk_bulk_data table of consumer
Description
Returns 0 on success, a negative errno on failure.
This helper function allows drivers to get several clk consumers in one operation with management; the clks will automatically be freed when the device is unbound.
- int devm_clk_bulk_get_optional(struct device *dev, int num_clks, struct clk_bulk_data *clks)
managed get multiple optional consumer clocks
Parameters
struct device *dev: device for clock “consumer”
int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: pointer to the clk_bulk_data table of consumer
Description
Behaves the same as devm_clk_bulk_get() except where there is no clock producer. In this case, instead of returning -ENOENT, the function returns NULL for the given clk. It is assumed all clocks in clk_bulk_data are optional.
Returns 0 if every clock specified in the clk_bulk_data table was either obtained successfully or had no clk provider available; otherwise returns a valid IS_ERR() condition containing errno. The implementation uses dev and clk_bulk_data.id to determine the clock consumer, and thereby the clock producer. The clock returned is stored in each clk_bulk_data.clk field.
Drivers must assume that the clock source is not enabled.
devm_clk_bulk_get_optional should not be called from within interrupt context.
- int devm_clk_bulk_get_all(struct device *dev, struct clk_bulk_data **clks)
managed get multiple clk consumers
Parameters
struct device *dev: device for clock “consumer”
struct clk_bulk_data **clks: pointer to the clk_bulk_data table of consumer
Description
Returns a positive value for the number of clocks obtained while the clock references are stored in the clk_bulk_data table in the clks field. Returns 0 if there are none and a negative value if something failed.
This helper function allows drivers to get several clk consumers in one operation with management; the clks will automatically be freed when the device is unbound.
- int devm_clk_bulk_get_all_enabled(struct device *dev, struct clk_bulk_data **clks)
Get and enable all clocks of the consumer (managed)
Parameters
struct device *dev: device for clock “consumer”
struct clk_bulk_data **clks: pointer to the clk_bulk_data table of consumer
Description
Returns a positive value for the number of clocks obtained while the clock references are stored in the clk_bulk_data table in the clks field. Returns 0 if there are none and a negative value if something failed.
This helper function allows drivers to get all clocks of the consumer and enable them in one operation with management. The clks will automatically be disabled and freed when the device is unbound.
- struct clk *devm_clk_get(struct device *dev, const char *id)
lookup and obtain a managed reference to a clock producer.
Parameters
struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, or a valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.)
Description
Drivers must assume that the clock source is neither prepared nor enabled.
The clock will automatically be freed when the device is unbound from the bus.
Parameters
struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, or a valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.)
Description
The returned clk (if valid) is prepared. Drivers must however assume that the clock is not enabled.
The clock will automatically be unprepared and freed when the device is unbound from the bus.
- struct clk *devm_clk_get_enabled(struct device *dev, const char *id)
devm_clk_get() + clk_prepare_enable()
Parameters
struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, or a valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.)
Description
The returned clk (if valid) is prepared and enabled.
The clock will automatically be disabled, unprepared and freed when the device is unbound from the bus.
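The devm_* getters tie cleanup to the device’s lifetime. One way to picture this is a per-device stack of release actions that runs newest-first when the device is unbound (the kernel’s devres entries are likewise released in reverse order of registration). The sketch below is a hypothetical userspace model, not the devres API: struct model_device, model_devm_add_action(), model_device_unbind() and the trace helper are invented for illustration.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_ACTIONS 8

typedef void (*model_action_fn)(void *data);

/* Hypothetical device with a fixed-size stack of cleanup actions. */
struct model_device {
    model_action_fn fn[MODEL_MAX_ACTIONS];
    void *data[MODEL_MAX_ACTIONS];
    int count;
};

/* Register a cleanup action to run when the device is unbound. */
static int model_devm_add_action(struct model_device *dev,
                                 model_action_fn fn, void *data)
{
    if (dev->count == MODEL_MAX_ACTIONS)
        return -ENOMEM;
    dev->fn[dev->count] = fn;
    dev->data[dev->count] = data;
    dev->count++;
    return 0;
}

/* Unbind: run every registered action, newest first. */
static void model_device_unbind(struct model_device *dev)
{
    while (dev->count > 0) {
        dev->count--;
        dev->fn[dev->count](dev->data[dev->count]);
    }
}

/* Demo action: append the one-character tag it was given to a trace. */
static char model_trace[MODEL_MAX_ACTIONS + 1];
static int model_trace_len;

static void model_record(void *data)
{
    model_trace[model_trace_len++] = *(const char *)data;
}
```

If a driver registers actions standing in for put ('p'), unprepare ('u') and disable ('d') in that order, unbinding runs them as disable, unprepare, put, which is the reverse of how the clock was acquired.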
- struct clk *devm_clk_get_optional(struct device *dev, const char *id)
lookup and obtain a managed reference to an optional clock producer.
Parameters
struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, or a valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL, which serves as a dummy clk. That’s the only difference compared to devm_clk_get().
Description
Drivers must assume that the clock source is neither prepared nor enabled.
The clock will automatically be freed when the device is unbound from the bus.
Parameters
struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, or a valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL, which serves as a dummy clk. That’s the only difference compared to devm_clk_get_prepared().
Description
The returned clk (if valid) is prepared. Drivers must however assume that the clock is not enabled.
The clock will automatically be unprepared and freed when the device is unbound from the bus.
- struct clk *devm_clk_get_optional_enabled(struct device *dev, const char *id)
devm_clk_get_optional() + clk_prepare_enable()
Parameters
struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, or a valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL, which serves as a dummy clk. That’s the only difference compared to devm_clk_get_enabled().
Description
The returned clk (if valid) is prepared and enabled.
The clock will automatically be disabled, unprepared and freed when the device is unbound from the bus.
- struct clk *devm_clk_get_optional_enabled_with_rate(struct device *dev, const char *id, unsigned long rate)
devm_clk_get_optional() + clk_set_rate() + clk_prepare_enable()
Parameters
struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
unsigned long rate: new clock rate
Context
May sleep.
Return
a struct clk corresponding to the clock producer, or a valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL, which serves as a dummy clk. That’s the only difference compared to devm_clk_get_enabled().
Description
The returned clk (if valid) is prepared and enabled, and its rate has been set.
The clock will automatically be disabled, unprepared and freed when the device is unbound from the bus.
- struct clk *devm_get_clk_from_child(struct device *dev, struct device_node *np, const char *con_id)
lookup and obtain a managed reference to a clock producer from child node.
Parameters
struct device *dev: device for clock “consumer”
struct device_node *np: pointer to clock consumer node
const char *con_id: clock consumer ID
Description
This function parses the clocks, and uses them to look up the struct clk from the registered list of clock providers by using np and con_id.
The clock will automatically be freed when the device is unbound from the bus.
Parameters
struct clk *clk: clock source
Description
If the clock cannot be enabled/disabled, this should return success.
May be called from atomic contexts.
Returns success (0) or negative errno.
- int clk_bulk_enable(int num_clks, const struct clk_bulk_data *clks)
inform the system when the set of clks should be running.
Parameters
int num_clks: the number of clk_bulk_data
const struct clk_bulk_data *clks: the clk_bulk_data table of consumer
Description
May be called from atomic contexts.
Returns success (0) or negative errno.
Parameters
struct clk *clk: clock source
Description
Inform the system that a clock source is no longer required by a driver and may be shut down.
May be called from atomic contexts.
Implementation detail: if the clock source is shared between multiple drivers, clk_enable() calls must be balanced by the same number of clk_disable() calls for the clock source to be disabled.
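The balancing rule for a shared clock amounts to two nested usage counts: the hardware stays on until the last enabler disables it, and enabling requires a prior prepare. The sketch below is a hypothetical userspace model (struct model_clock and the model_ functions are invented); returning -ESHUTDOWN when enabling an unprepared clock is this model’s choice, not a statement of the clk API’s contract.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a shared clock's prepare and enable counts. */
struct model_clock {
    int prepare_count;
    int enable_count;
    bool running;     /* whether the hardware clock is actually on */
};

static void model_prepare(struct model_clock *c)
{
    c->prepare_count++;
}

static void model_unprepare(struct model_clock *c)
{
    assert(c->prepare_count > 0);
    c->prepare_count--;
}

/* Enabling an unprepared clock is refused in this model. */
static int model_enable(struct model_clock *c)
{
    if (c->prepare_count == 0)
        return -ESHUTDOWN;
    if (c->enable_count++ == 0)
        c->running = true;   /* first user turns the clock on */
    return 0;
}

static void model_disable(struct model_clock *c)
{
    assert(c->enable_count > 0);
    if (--c->enable_count == 0)
        c->running = false;  /* last user turns it off */
}
```

With two drivers sharing the clock, the first clk_disable()-style call leaves it running; only the matching second call stops it.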
- void clk_bulk_disable(int num_clks, const struct clk_bulk_data *clks)
inform the system when the set of clks is no longer required.
Parameters
int num_clks: the number of clk_bulk_data
const struct clk_bulk_data *clks: the clk_bulk_data table of consumer
Description
Inform the system that a set of clks is no longer required by a driver and may be shut down.
May be called from atomic contexts.
Implementation detail: if the set of clks is shared between multiple drivers, clk_bulk_enable() calls must be balanced by the same number of clk_bulk_disable() calls for the clock source to be disabled.
- unsigned long clk_get_rate(struct clk *clk)
obtain the current clock rate (in Hz) for a clock source. This is only valid once the clock source has been enabled.
Parameters
struct clk *clk: clock source
Parameters
struct clk *clk: clock source
Note
Drivers must ensure that all clk_enable calls made on this clock source are balanced by clk_disable calls prior to calling this function.
clk_put should not be called from within interrupt context.
- void clk_bulk_put(int num_clks, struct clk_bulk_data *clks)
“free” the clock source
Parameters
int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: the clk_bulk_data table of consumer
Note
Drivers must ensure that all clk_bulk_enable calls made on this clock source are balanced by clk_bulk_disable calls prior to calling this function.
clk_bulk_put should not be called from within interrupt context.
- void clk_bulk_put_all(int num_clks, struct clk_bulk_data *clks)
“free” all the clock sources
Parameters
int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: the clk_bulk_data table of consumer
Note
Drivers must ensure that all clk_bulk_enable calls made on this clock source are balanced by clk_bulk_disable calls prior to calling this function.
clk_bulk_put_all should not be called from within interrupt context.
Parameters
struct device *dev: device used to acquire the clock
struct clk *clk: clock source acquired with devm_clk_get()
Note
Drivers must ensure that all clk_enable calls made on this clock source are balanced by clk_disable calls prior to calling this function.
clk_put should not be called from within interrupt context.
- long clk_round_rate(struct clk *clk, unsigned long rate)
adjust a rate to the exact rate a clock can provide
Parameters
struct clk *clk: clock source
unsigned long rate: desired clock rate in Hz
Description
This answers the question “if I were to pass rate to clk_set_rate(), what clock rate would I end up with?” without changing the hardware in any way. In other words:
rate = clk_round_rate(clk, r);
and:
clk_set_rate(clk, r);
rate = clk_get_rate(clk);
are equivalent except the former does not modify the clock hardware in any way.
Returns rounded clock rate in Hz, or negative errno.
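To make that equivalence concrete, here is a hypothetical integer-divider clock: a 96 MHz parent feeding a divider of 1 to 16, which picks the highest achievable rate not above the request (clamping to the slowest rate when the request is below range). This rounding policy is an assumption of the sketch; real providers choose their own. The point is only that model_round_rate() computes, without touching state, what model_set_rate() followed by model_get_rate() would yield. All names are invented for illustration.

```c
#include <assert.h>

#define MODEL_PARENT_HZ 96000000UL  /* hypothetical parent rate */
#define MODEL_MAX_DIV   16U         /* hypothetical divider range: 1..16 */

struct model_divider {
    unsigned int div;  /* current divider setting */
};

/* Smallest divider whose output does not exceed 'rate'. */
static unsigned int model_best_div(unsigned long rate)
{
    for (unsigned int div = 1; div <= MODEL_MAX_DIV; div++)
        if (MODEL_PARENT_HZ / div <= rate)
            return div;
    return MODEL_MAX_DIV;  /* request below range: slowest setting */
}

/* Like clk_round_rate(): report the resulting rate, change nothing. */
static long model_round_rate(const struct model_divider *clk, unsigned long rate)
{
    (void)clk;  /* the result depends only on the request in this model */
    return (long)(MODEL_PARENT_HZ / model_best_div(rate));
}

/* Like clk_set_rate(): actually program the divider. */
static int model_set_rate(struct model_divider *clk, unsigned long rate)
{
    clk->div = model_best_div(rate);
    return 0;
}

static unsigned long model_get_rate(const struct model_divider *clk)
{
    return MODEL_PARENT_HZ / clk->div;
}
```

Asking for 25 MHz rounds to 24 MHz (divider 4); the divider itself changes only when model_set_rate() is called, after which model_get_rate() reports the same 24 MHz.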
- int clk_set_rate(struct clk *clk, unsigned long rate)
set the clock rate for a clock source
Parameters
struct clk *clk: clock source
unsigned long rate: desired clock rate in Hz
Description
Updating the rate starts at the top-most affected clock and then walks the tree down to the bottom-most clock that needs updating.
Returns success (0) or negative errno.
- int clk_set_rate_exclusive(struct clk *clk, unsigned long rate)
set the clock rate and claim exclusivity over clock source
Parameters
struct clk *clk: clock source
unsigned long rate: desired clock rate in Hz
Description
This helper function allows drivers to atomically set the rate of a producer and claim exclusivity over the rate control of the producer.
It is essentially a combination of clk_set_rate() and clk_rate_exclusive_get(). The caller must balance this call with a call to clk_rate_exclusive_put().
Returns success (0) or negative errno.
- bool clk_has_parent(const struct clk *clk, const struct clk *parent)
check if a clock is a possible parent for another
Parameters
const struct clk *clk: clock source
const struct clk *parent: parent clock source
Description
This function can be used in drivers that need to check that a clock can be the parent of another without actually changing the parent.
Returns true if parent is a possible parent for clk, false otherwise.
- int clk_set_rate_range(struct clk *clk, unsigned long min, unsigned long max)
set a rate range for a clock source
Parameters
struct clk *clk: clock source
unsigned long min: desired minimum clock rate in Hz, inclusive
unsigned long max: desired maximum clock rate in Hz, inclusive
Description
Returns success (0) or negative errno.
- int clk_set_min_rate(struct clk *clk, unsigned long rate)
set a minimum clock rate for a clock source
Parameters
struct clk *clk: clock source
unsigned long rate: desired minimum clock rate in Hz, inclusive
Description
Returns success (0) or negative errno.
- int clk_set_max_rate(struct clk *clk, unsigned long rate)
set a maximum clock rate for a clock source
Parameters
struct clk *clk: clock source
unsigned long rate: desired maximum clock rate in Hz, inclusive
Description
Returns success (0) or negative errno.
- int clk_set_parent(struct clk *clk, struct clk *parent)
set the parent clock source for this clock
Parameters
struct clk *clk: clock source
struct clk *parent: parent clock source
Description
Returns success (0) or negative errno.
- struct clk *clk_get_parent(struct clk *clk)
get the parent clock source for this clock
Parameters
struct clk *clk: clock source
Description
Returns struct clk corresponding to parent clock source, or valid IS_ERR() condition containing errno.
- struct clk *clk_get_sys(const char *dev_id, const char *con_id)
get a clock based upon the device name
Parameters
const char *dev_id: device name
const char *con_id: connection ID
Description
Returns a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev_id and con_id to determine the clock consumer, and thereby the clock producer. In contrast to clk_get(), this function takes the device name instead of the device itself for identification.
Drivers must assume that the clock source is not enabled.
clk_get_sys() should not be called from within interrupt context.
- int clk_save_context(void)
save clock context for poweroff
Parameters
void: no arguments
Description
Saves the context of the clock registers for power states in which the contents of the registers will be lost. This occurs deep within the suspend code, so locking is not necessary.
- void clk_restore_context(void)
restore clock context after poweroff
Parameters
void: no arguments
Description
This occurs with all clocks enabled. It occurs deep within the resume code, so locking is not necessary.
- int clk_drop_range(struct clk *clk)
Reset any range set on that clock
Parameters
struct clk *clk: clock source
Description
Returns success (0) or negative errno.
- struct clk *clk_get_optional(struct device *dev, const char *id)
lookup and obtain a reference to an optional clock producer.
Parameters
struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
Description
Behaves the same as clk_get() except where there is no clock producer. In this case, instead of returning -ENOENT, the function returns NULL.
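Several lookups in this section return either a valid pointer or an errno encoded in the pointer value, which callers test with IS_ERR(). The userspace sketch below re-creates that convention, with errno values occupying the top 4095 addresses as the kernel's MAX_ERRNO does. It then applies the clk_get_optional() rule of turning -ENOENT into NULL. The `model_*` helpers and the one-entry provider table are hypothetical stand-ins; real drivers use the kernel's own ERR_PTR()/IS_ERR()/PTR_ERR().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace re-creation of the kernel's error-pointer convention. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long error) { return (void *)error; }
static inline long model_ptr_err(const void *ptr) { return (long)ptr; }
static inline int model_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MODEL_MAX_ERRNO;
}

struct model_clk { const char *name; };

/* Hypothetical provider table standing in for the clock framework. */
static struct model_clk model_uart_clk = { "uart" };

/* Like clk_get(): a clock, or an error pointer when nothing matches. */
static struct model_clk *model_clk_get(const char *id)
{
	if (id && strcmp(id, "uart") == 0)
		return &model_uart_clk;
	return model_err_ptr(-ENOENT);
}

/* Like clk_get_optional(): "no such clock" becomes NULL, other errors pass. */
static struct model_clk *model_clk_get_optional(const char *id)
{
	struct model_clk *clk = model_clk_get(id);

	if (clk == model_err_ptr(-ENOENT))
		return NULL;
	return clk;
}
```

With the optional variant, a missing clock comes back as NULL rather than an error pointer, so callers only need IS_ERR() for genuine failures.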
Synchronization Primitives
Read-Copy Update (RCU)
- bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned long oldstate2)
Are two old-state values identical?
Parameters
unsigned long oldstate1: First old-state value.
unsigned long oldstate2: Second old-state value.
Description
The two old-state values must have been obtained from either get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or get_completed_synchronize_rcu(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.
- bool rcu_trace_implies_rcu_gp(void)
does an RCU Tasks Trace grace period imply an RCU grace period?
Parameters
void: no arguments
Description
As an accident of implementation, an RCU Tasks Trace grace period also acts as an RCU grace period. However, this could change at any time. Code relying on this accident must call this function to verify that this accident is still happening.
You have been warned!
- cond_resched_tasks_rcu_qs()
Report potential quiescent states to RCU
Description
This macro resembles cond_resched(), except that it is defined to report potential quiescent states to RCU-tasks even if the cond_resched() machinery were to be shut off, as some advocate for PREEMPTION kernels.
- rcu_softirq_qs_periodic(old_ts)
Report RCU and RCU-Tasks quiescent states
Parameters
old_ts: jiffies at start of processing.
Description
This helper is for long-running softirq handlers, such as NAPI threads in networking. The caller should initialize the variable passed in as old_ts at the beginning of the softirq handler. When invoked frequently, this macro will invoke rcu_softirq_qs() every 100 milliseconds thereafter, which will provide both RCU and RCU-Tasks quiescent states. Note that this macro modifies its old_ts argument.
Because regions of code that have disabled softirq act as RCU read-side critical sections, this macro should be invoked with softirq (and preemption) enabled.
The macro is not needed when CONFIG_PREEMPT_RT is defined. RT kernels have more opportunities to invoke schedule() and thereby provide the necessary quiescent states. In contrast, calling cond_resched() alone does not achieve the same effect, because cond_resched() does not provide RCU-Tasks quiescent states.
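The throttling described above can be modelled in userspace. This sketch makes several assumptions:

- HZ is taken to be 1000.
- `model_jiffies` stands in for jiffies.
- `model_qs_count++` stands in for rcu_softirq_qs().
- The handler is a hypothetical `model_run_handler()`.

It follows the documented behaviour: old_ts is initialized at handler entry, a report happens once more than HZ/10 ticks have passed, and old_ts is then reset. The comparison uses the wrap-safe idiom of the kernel's time_after(). The CONFIG_PREEMPT_RT case is not modelled.

```c
#include <assert.h>

#define MODEL_HZ 1000UL                 /* assumed tick rate */

static unsigned long model_jiffies;     /* stand-in for jiffies */
static unsigned long model_qs_count;    /* counts simulated rcu_softirq_qs() calls */

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static int model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Modelled on rcu_softirq_qs_periodic(): report, then reset old_ts. */
#define model_softirq_qs_periodic(old_ts)                                   \
	do {                                                                \
		if (model_time_after(model_jiffies, (old_ts) + MODEL_HZ / 10)) { \
			model_qs_count++;   /* rcu_softirq_qs() would go here */ \
			(old_ts) = model_jiffies;                           \
		}                                                           \
	} while (0)

/* Hypothetical long-running handler doing one work item per tick. */
static unsigned long model_run_handler(unsigned long ticks)
{
	unsigned long old_ts = model_jiffies;   /* initialize at handler entry */

	model_qs_count = 0;
	for (unsigned long i = 0; i < ticks; i++) {
		model_jiffies++;                /* pretend one tick of work */
		model_softirq_qs_periodic(old_ts);
	}
	return model_qs_count;
}
```

Because the test is a strict "after", a report occurs every 101 ticks in this model. A 1000-tick handler therefore reports 9 quiescent states, and the count is unchanged when jiffies wraps around mid-handler.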
- RCU_LOCKDEP_WARN(c, s)
emit lockdep splat if specified condition is met
Parameters
c: condition to check
s: informative message
Description
This checks debug_lockdep_rcu_enabled() before checking (c) to prevent early boot splats due to lockdep not yet being initialized, and rechecks it after checking (c) to prevent false-positive splats due to races with lockdep being disabled. See commit 3066820034b5dd (“rcu: Reject RCU_LOCKDEP_WARN() false positives”) for more detail.
- lockdep_assert_in_rcu_read_lock()
WARN if not protected by rcu_read_lock()
Description
Splats if lockdep is enabled and there is no rcu_read_lock() in effect.
- lockdep_assert_in_rcu_read_lock_bh()
WARN if not protected by rcu_read_lock_bh()
Description
Splats if lockdep is enabled and there is no rcu_read_lock_bh() in effect. Note that local_bh_disable() and friends do not suffice here; instead, an actual rcu_read_lock_bh() is required.
- lockdep_assert_in_rcu_read_lock_sched()
WARN if not protected by rcu_read_lock_sched()
Description
Splats if lockdep is enabled and there is no rcu_read_lock_sched() in effect. Note that preempt_disable() and friends do not suffice here; instead, an actual rcu_read_lock_sched() is required.
- lockdep_assert_in_rcu_reader()
WARN if not within some type of RCU reader
Description
Splats if lockdep is enabled and there is no RCU reader of any type in effect. Note that regions of code protected by things like preempt_disable(), local_bh_disable(), and local_irq_disable() all qualify as RCU readers.
Note that this will never trigger in PREEMPT_NONE or PREEMPT_VOLUNTARY kernels that are not also built with PREEMPT_COUNT. But if you have lockdep enabled, you might as well also enable PREEMPT_COUNT.
- unrcu_pointer(p)
mark a pointer as not being RCU protected
Parameters
p: pointer needing to lose its __rcu property
Description
Converts p from an __rcu pointer to a __kernel pointer. This allows an __rcu pointer to be used with xchg() and friends.
- RCU_INITIALIZER(v)
statically initialize an RCU-protected global variable
Parameters
v: The value to statically initialize with.
- rcu_assign_pointer(p, v)
assign to RCU-protected pointer
Parameters
p: pointer to assign to
v: value to assign (publish)
Description
Assigns the specified value to the specified RCU-protected pointer, ensuring that any concurrent RCU readers will see any prior initialization.
Inserts memory barriers on architectures that require them (which is most of them), and also prevents the compiler from reordering the code that initializes the structure after the pointer assignment. More importantly, this call documents which pointers will be dereferenced by RCU read-side code.
In some special cases, you may use RCU_INIT_POINTER() instead of rcu_assign_pointer(). RCU_INIT_POINTER() is a bit faster due to the fact that it does not constrain either the CPU or the compiler. That said, using RCU_INIT_POINTER() when you should have used rcu_assign_pointer() is a very bad thing that results in impossible-to-diagnose memory corruption. So please be careful. See the RCU_INIT_POINTER() comment header for details.
Note that rcu_assign_pointer() evaluates each of its arguments only once, appearances notwithstanding. One of the “extra” evaluations is in typeof() and the other is visible only to sparse (__CHECKER__), and neither of them actually executes the argument. As with most cpp macros, this execute-arguments-only-once property is important, so please be careful when making changes to rcu_assign_pointer() and the other macros that it invokes.
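As a rough userspace analogy, and not the kernel implementation, C11 release stores and acquire loads give the same "initialize, then publish" ordering that rcu_assign_pointer() and rcu_dereference() provide. The acquire load is a conservative stand-in: rcu_dereference() relies on the weaker address-dependency ordering. The `model_*` names and `struct model_cfg` are hypothetical. The kernel macros also carry sparse and lockdep annotations that this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct model_cfg {
	int timeout_ms;
	int retries;
};

/* The RCU-protected global pointer, modelled as a C11 atomic (starts NULL). */
static _Atomic(struct model_cfg *) model_gp;

/* Updater: fully initialize, then publish with release ordering, the
 * guarantee rcu_assign_pointer() documents. Any previous version is not
 * freed here; under RCU it could be freed only after a grace period. */
static struct model_cfg *model_publish(int timeout_ms, int retries)
{
	struct model_cfg *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->timeout_ms = timeout_ms;
	p->retries = retries;
	atomic_store_explicit(&model_gp, p, memory_order_release);
	return p;
}

/* Reader: acquire load as a conservative stand-in for rcu_dereference(). */
static int model_read_timeout(void)
{
	struct model_cfg *p = atomic_load_explicit(&model_gp, memory_order_acquire);

	return p ? p->timeout_ms : -1;
}
```

A reader that observes the new pointer is guaranteed to also observe the fields written before the release store. This is the property that makes it unsafe to swap in a relaxed (RCU_INIT_POINTER()-style) store here.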
- rcu_replace_pointer(rcu_ptr, ptr, c)
replace an RCU pointer, returning its old value
Parameters
rcu_ptr: RCU pointer, whose old value is returned
ptr: regular pointer
c: the lockdep conditions under which the dereference will take place
Description
Perform a replacement, where rcu_ptr is an RCU-annotated pointer and c is the lockdep argument that is passed to the rcu_dereference_protected() call used to read that pointer. The old value of rcu_ptr is returned, and rcu_ptr is set to ptr.
- rcu_access_pointer(p)
fetch RCU pointer with no dereferencing
Parameters
p: The pointer to read
Description
Return the value of the specified RCU-protected pointer, but omit the lockdep checks for being in an RCU read-side critical section. This is useful when the value of this pointer is accessed, but the pointer is not dereferenced, for example, when testing an RCU-protected pointer against NULL. Although rcu_access_pointer() may also be used in cases where update-side locks prevent the value of the pointer from changing, you should instead use rcu_dereference_protected() for this use case. Within an RCU read-side critical section, there is little reason to use rcu_access_pointer().
It is usually best to test the rcu_access_pointer() return value directly in order to avoid accidental dereferences being introduced by later inattentive changes. In other words, assigning the rcu_access_pointer() return value to a local variable results in an accident waiting to happen.
It is also permissible to use rcu_access_pointer() when read-side access to the pointer was removed at least one grace period ago, as is the case in the context of the RCU callback that is freeing up the data, or after a synchronize_rcu() returns. This can be useful when tearing down multi-linked structures after a grace period has elapsed. However, rcu_dereference_protected() is normally preferred for this use case.
- rcu_dereference_check(p, c)
rcu_dereference with debug checking
Parameters
p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place
Description
Do an rcu_dereference(), but check that the conditions under which the dereference will take place are correct. Typically the conditions indicate the various locking conditions that should be held at that point. The check should return true if the conditions are satisfied. An implicit check for being in an RCU read-side critical section (rcu_read_lock()) is included.
For example:
    bar = rcu_dereference_check(foo->bar, lockdep_is_held(&foo->lock));
could be used to indicate to lockdep that foo->bar may only be dereferenced if either rcu_read_lock() is held, or that the lock required to replace the bar struct at foo->bar is held.
Note that the list of conditions may also include indications of when a lock need not be held, for example during initialisation or destruction of the target struct:
    bar = rcu_dereference_check(foo->bar, lockdep_is_held(&foo->lock) ||
                                          atomic_read(&foo->usage) == 0);
Inserts memory barriers on architectures that require them (currently only the Alpha), prevents the compiler from refetching (and from merging fetches), and, more importantly, documents exactly which pointers are protected by RCU and checks that the pointer is annotated as __rcu.
- rcu_dereference_bh_check(p, c)
rcu_dereference_bh with debug checking
Parameters
p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place
Description
This is the RCU-bh counterpart to rcu_dereference_check(). However, please note that starting in v5.0 kernels, vanilla RCU grace periods wait for local_bh_disable() regions of code in addition to regions of code demarked by rcu_read_lock() and rcu_read_unlock(). This means that synchronize_rcu(), call_rcu(), and friends all take not only rcu_read_lock() but also rcu_read_lock_bh() into account.
- rcu_dereference_sched_check(p, c)
rcu_dereference_sched with debug checking
Parameters
p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place
Description
This is the RCU-sched counterpart to rcu_dereference_check(). However, please note that starting in v5.0 kernels, vanilla RCU grace periods wait for preempt_disable() regions of code in addition to regions of code demarked by rcu_read_lock() and rcu_read_unlock(). This means that synchronize_rcu(), call_rcu(), and friends all take not only rcu_read_lock() but also rcu_read_lock_sched() into account.
- rcu_dereference_all_check(p, c)
rcu_dereference_all with debug checking
Parameters
p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place
Description
This is similar to rcu_dereference_check(), but allows protection by all forms of vanilla RCU readers, including preemption-disabled, bh-disabled, and interrupt-disabled regions of code. Note that “vanilla RCU” excludes SRCU and the various Tasks RCU flavors. Please note that this macro should not be backported to any Linux-kernel version preceding v5.0 due to changes in synchronize_rcu() semantics prior to that version.
- rcu_dereference_protected(p, c)
fetch RCU pointer when updates prevented
Parameters
p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place
Description
Return the value of the specified RCU-protected pointer, but omit the READ_ONCE(). This is useful in cases where update-side locks prevent the value of the pointer from changing. Please note that this primitive does not prevent the compiler from repeating this reference or combining it with other references, so it should not be used without protection of appropriate locks.
This function is only for update-side use. Using this function when protected only by rcu_read_lock() will result in infrequent but very ugly failures.
- rcu_dereference(p)
fetch RCU-protected pointer for dereferencing
Parameters
p: The pointer to read, prior to dereferencing
Description
This is a simple wrapper around rcu_dereference_check().
- rcu_dereference_bh(p)
fetch an RCU-bh-protected pointer for dereferencing
Parameters
p: The pointer to read, prior to dereferencing
Description
Makes rcu_dereference_check() do the dirty work.
- rcu_dereference_sched(p)
fetch RCU-sched-protected pointer for dereferencing
Parameters
p: The pointer to read, prior to dereferencing
Description
Makes rcu_dereference_check() do the dirty work.
- rcu_dereference_all(p)
fetch RCU-all-protected pointer for dereferencing
Parameters
p: The pointer to read, prior to dereferencing
Description
Makes rcu_dereference_check() do the dirty work.
- rcu_pointer_handoff(p)
Hand off a pointer from RCU to other mechanism
Parameters
p: The pointer to hand off
Description
This is simply an identity function, but it documents where a pointer is handed off from RCU to some other synchronization mechanism, for example, reference counting or locking. In C11, it would map to kill_dependency(). It could be used as follows:
    rcu_read_lock();
    p = rcu_dereference(gp);
    long_lived = is_long_lived(p);
    if (long_lived) {
        if (!atomic_inc_not_zero(&p->refcnt))
            long_lived = false;
        else
            p = rcu_pointer_handoff(p);
    }
    rcu_read_unlock();
- void rcu_read_lock(void)
mark the beginning of an RCU read-side critical section
Parameters
void: no arguments
Description
When synchronize_rcu() is invoked on one CPU while other CPUs are within RCU read-side critical sections, then the synchronize_rcu() is guaranteed to block until after all the other CPUs exit their critical sections. Similarly, if call_rcu() is invoked on one CPU while other CPUs are within RCU read-side critical sections, invocation of the corresponding RCU callback is deferred until after all the other CPUs exit their critical sections.
Both synchronize_rcu() and call_rcu() also wait for regions of code with preemption disabled, including regions of code with interrupts or softirqs disabled.
Note, however, that RCU callbacks are permitted to run concurrently with new RCU read-side critical sections. One way that this can happen is via the following sequence of events: (1) CPU 0 enters an RCU read-side critical section, (2) CPU 1 invokes call_rcu() to register an RCU callback, (3) CPU 0 exits the RCU read-side critical section, (4) CPU 2 enters an RCU read-side critical section, (5) the RCU callback is invoked. This is legal, because the RCU read-side critical section that was running concurrently with the call_rcu() (and which therefore might be referencing something that the corresponding RCU callback would free up) has completed before the corresponding RCU callback is invoked.
RCU read-side critical sections may be nested. Any deferred actions will be deferred until the outermost RCU read-side critical section completes.
You can avoid reading and understanding the next paragraph by following this rule: don’t put anything in an rcu_read_lock() RCU read-side critical section that would block in a !PREEMPTION kernel. But if you want the full story, read on!
In non-preemptible RCU implementations (pure TREE_RCU and TINY_RCU), it is illegal to block while in an RCU read-side critical section. In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION kernel builds, RCU read-side critical sections may be preempted, but explicit blocking is illegal. Finally, in preemptible RCU implementations in real-time (with -rt patchset) kernel builds, RCU read-side critical sections may be preempted and they may also block, but only when acquiring spinlocks that are subject to priority inheritance.
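The nesting rule above can be pictured with a toy depth counter: a task counts as quiescent only when its depth is back at zero, so ending an inner section defers nothing. This is a hypothetical illustration of the documented rule only. The kernel's actual reader bookkeeping differs by RCU implementation, and the `model_*` functions are invented.

```c
#include <assert.h>

/* Hypothetical per-task nesting depth (single task, single thread). */
static int model_nesting;

static void model_read_lock(void)
{
	model_nesting++;
}

static void model_read_unlock(void)
{
	assert(model_nesting > 0);   /* an unbalanced unlock is a bug */
	model_nesting--;
}

/* A grace period may treat this task as quiescent only at depth 0. */
static int model_task_quiescent(void)
{
	return model_nesting == 0;
}

/* Reports whether the task looked quiescent right after an inner unlock. */
static int model_quiescent_after_inner_unlock(void)
{
	int quiescent;

	model_read_lock();      /* outer section */
	model_read_lock();      /* nested section */
	model_read_unlock();    /* inner section ends... */
	quiescent = model_task_quiescent();   /* ...but the outer one is open */
	model_read_unlock();
	return quiescent;
}
```

The inner unlock leaves the task non-quiescent, and only the outermost unlock returns the depth to zero. That is the point at which deferred actions may proceed.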
- void rcu_read_unlock(void)
marks the end of an RCU read-side critical section.
Parameters
void: no arguments
Description
In almost all situations, rcu_read_unlock() is immune from deadlock. This deadlock immunity also extends to the scheduler’s runqueue and priority-inheritance spinlocks, courtesy of the quiescent-state deferral that is carried out when rcu_read_unlock() is invoked with interrupts disabled.
See rcu_read_lock() for more information.
- void rcu_read_lock_bh(void)
mark the beginning of an RCU-bh critical section
Parameters
void: no arguments
Description
This is equivalent to rcu_read_lock(), but also disables softirqs. Note that anything else that disables softirqs can also serve as an RCU read-side critical section. However, please note that this equivalence applies only to v5.0 and later. Before v5.0, rcu_read_lock() and rcu_read_lock_bh() were unrelated.
Note that rcu_read_lock_bh() and the matching rcu_read_unlock_bh() must occur in the same context; for example, it is illegal to invoke rcu_read_unlock_bh() from one task if the matching rcu_read_lock_bh() was invoked from some other task.
- void rcu_read_unlock_bh(void)
marks the end of a softirq-only RCU critical section
- void rcu_read_lock_sched(void)
mark the beginning of an RCU-sched critical section
Parameters
void: no arguments
Description
This is equivalent to rcu_read_lock(), but also disables preemption. Read-side critical sections can also be introduced by anything else that disables preemption, including local_irq_disable() and friends. However, please note that the equivalence to rcu_read_lock() applies only to v5.0 and later. Before v5.0, rcu_read_lock() and rcu_read_lock_sched() were unrelated.
Note that rcu_read_lock_sched() and the matching rcu_read_unlock_sched() must occur in the same context; for example, it is illegal to invoke rcu_read_unlock_sched() from process context if the matching rcu_read_lock_sched() was invoked from an NMI handler.
- void rcu_read_unlock_sched(void)
marks the end of an RCU-classic critical section
- RCU_INIT_POINTER(p, v)
initialize an RCU protected pointer
Parameters
p: The pointer to be initialized.
v: The value to initialize the pointer to.
Description
Initialize an RCU-protected pointer in special cases where readers do not need ordering constraints on the CPU or the compiler. These special cases are:
1. This use of RCU_INIT_POINTER() is NULLing out the pointer, or
2. The caller has taken whatever steps are required to prevent RCU readers from concurrently accessing this pointer, or
3. The referenced data structure has already been exposed to readers either at compile time or via rcu_assign_pointer(), and
   a. You have not made any reader-visible changes to this structure since then, or
   b. It is OK for readers accessing this structure from its new location to see the old state of the structure. (For example, the changes were to statistical counters or to other state where exact synchronization is not required.)
Failure to follow these rules governing use of RCU_INIT_POINTER() will result in impossible-to-diagnose memory corruption. That is, the structures will look OK in crash dumps, but any concurrent RCU readers might see pre-initialized values of the referenced data structure. So please be very careful how you use RCU_INIT_POINTER()!!!
If you are creating an RCU-protected linked structure that is accessed by a single external-to-structure RCU-protected pointer, then you may use RCU_INIT_POINTER() to initialize the internal RCU-protected pointers, but you must use rcu_assign_pointer() to initialize the external-to-structure pointer after you have completely initialized the reader-accessible portions of the linked structure.
Note that unlike rcu_assign_pointer(), RCU_INIT_POINTER() provides no ordering guarantees for either the CPU or the compiler.
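The linked-structure rule above maps loosely onto C11 atomics, as an analogy rather than the kernel's code. The internal pointers, which no reader can reach yet, get a plain atomic_init() with no ordering (the RCU_INIT_POINTER() role). Only the single external pointer is stored with release ordering (the rcu_assign_pointer() role), and only after everything it leads to is initialized. The list type and `model_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct model_node {
	int key;
	_Atomic(struct model_node *) next;   /* internal RCU-style pointer */
};

static _Atomic(struct model_node *) model_head;   /* external pointer */

/* Build a private chain with keys 0..n-1, then publish it in one step. */
static int model_build_and_publish(int n)
{
	struct model_node *first = NULL;

	for (int key = n - 1; key >= 0; key--) {
		struct model_node *node = malloc(sizeof(*node));

		if (!node)
			return -1;   /* sketch: nodes built so far leak on failure */
		node->key = key;
		/* Not yet reachable by readers: no ordering needed
		 * (the RCU_INIT_POINTER() case). */
		atomic_init(&node->next, first);
		first = node;
	}
	/* Publish last, with release ordering (the rcu_assign_pointer() case). */
	atomic_store_explicit(&model_head, first, memory_order_release);
	return 0;
}

/* Reader: walk the published chain and sum the keys. */
static int model_sum_keys(void)
{
	int sum = 0;
	struct model_node *p = atomic_load_explicit(&model_head, memory_order_acquire);

	while (p) {
		sum += p->key;
		p = atomic_load_explicit(&p->next, memory_order_relaxed);
	}
	return sum;
}
```

The acquire load of the head makes every write before the release store visible, including the internal `next` fields. This is why the inner pointers could be initialized without ordering.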
- RCU_POINTER_INITIALIZER(p, v)
statically initialize an RCU protected pointer
Parameters
p: The pointer to be initialized.
v: The value to initialize the pointer to.
Description
GCC-style initialization for an RCU-protected pointer in a structure field.
- kfree_rcu(ptr, rhf)
kfree an object after a grace period.
Parameters
ptr: pointer to kfree for double-argument invocations.
rhf: the name of the struct rcu_head within the type of ptr.
Description
Many RCU callback functions just call kfree() on the base structure. These functions are trivial, but their size adds up, and furthermore when they are used in a kernel module, that module must invoke the high-latency rcu_barrier() function at module-unload time.
The kfree_rcu() function handles this issue. In order to have a universal callback function handling different offsets of rcu_head, the callback needs to determine the starting address of the freed object, which can be a large kmalloc or vmalloc allocation. To allow simply aligning the pointer down to a page boundary for those, only offsets up to 4095 bytes can be accommodated. If the offset is larger than 4095 bytes, a compile-time error will be generated in kvfree_rcu_arg_2(). If this error is triggered, you can either fall back to use of call_rcu() or rearrange the structure to position the rcu_head structure within the first 4096 bytes.
The object to be freed can be allocated either by kmalloc() or kmem_cache_alloc().
Note that the allowable offset might decrease in the future.
The BUILD_BUG_ON check must not involve any function calls, hence the checks are done in macros here.
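The 4096-byte limit concerns where the rcu_head sits inside the object. The userspace sketch below checks a structure's layout against that limit with offsetof() and a C11 compile-time assertion. `struct model_rcu_head` is a hypothetical stand-in for the kernel's struct rcu_head (a next pointer and a callback). The check mirrors the documented rule rather than reproducing the kernel's own BUILD_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct rcu_head: a next pointer and a callback. */
struct model_rcu_head {
	struct model_rcu_head *next;
	void (*func)(struct model_rcu_head *head);
};

/* Documented limit: kfree_rcu() accepts rcu_head offsets up to 4095 bytes. */
#define MODEL_KFREE_RCU_MAX_OFFSET 4095

#define MODEL_RCU_OFFSET_OK(type, member) \
	(offsetof(type, member) <= MODEL_KFREE_RCU_MAX_OFFSET)

/* Acceptable: the head sits near the front of a large object. */
struct model_small {
	int id;
	struct model_rcu_head rcu;
	char payload[8192];
};

/* Would be rejected: the head follows an 8 KiB array. */
struct model_big {
	char payload[8192];
	struct model_rcu_head rcu;
};

/* Compile-time check, similar in spirit to the kernel's BUILD_BUG_ON. */
_Static_assert(MODEL_RCU_OFFSET_OK(struct model_small, rcu),
	       "rcu_head must lie within the first 4096 bytes");
```

Moving `rcu` ahead of `payload` in `struct model_big` is the "rearrange the structure" remedy described above, and it makes the check pass.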
- kfree_rcu_mightsleep(ptr)
kfree an object after a grace period.
Parameters
ptr: pointer to kfree for single-argument invocations.
Description
When it comes to the head-less variant, only one argument is passed, and that is just a pointer which has to be freed after a grace period. Therefore the semantic is
    kfree_rcu_mightsleep(ptr);
where ptr is the pointer to be freed by kvfree().
Please note that the head-less way of freeing may only be used from a context that is allowed to sleep, that is, one that satisfies a might_sleep() annotation. Otherwise, please switch to the two-argument form and embed the rcu_head structure within the type of ptr.
- void rcu_head_init(struct rcu_head *rhp)
Initialize rcu_head for rcu_head_after_call_rcu()
Parameters
struct rcu_head *rhp: The rcu_head structure to initialize.
Description
If you intend to invoke rcu_head_after_call_rcu() to test whether a given rcu_head structure has already been passed to call_rcu(), then you must also invoke this rcu_head_init() function on it just after allocating that structure. Calls to this function must not race with calls to call_rcu(), rcu_head_after_call_rcu(), or callback invocation.
- bool rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
Has this rcu_head been passed to call_rcu()?
Parameters
struct rcu_head *rhp: The rcu_head structure to test.
rcu_callback_t f: The function passed to call_rcu() along with rhp.
Description
Returns true if rhp has been passed to call_rcu() with f, and false otherwise. Emits a warning in any other case, including the case where rhp has already been invoked after a grace period. Calls to this function must not race with callback invocation. One way to avoid such races is to enclose the call to rcu_head_after_call_rcu() in an RCU read-side critical section that includes a read-side fetch of the pointer to the structure containing rhp.
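One way to implement the contract described for rcu_head_init() and rcu_head_after_call_rcu() is to use a sentinel. The init function stores a sentinel in the callback field, queuing overwrites that field with the real callback, and the test compares against both. The userspace sketch below models this idea with a hypothetical sentinel function. It illustrates the documented behaviour and should not be read as the kernel's exact code. All `model_*` names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct model_rcu_head {
	struct model_rcu_head *next;
	void (*func)(struct model_rcu_head *head);
};

typedef void (*model_cb_t)(struct model_rcu_head *head);

/* Sentinel meaning "initialized, never queued"; never actually invoked. */
static void model_unqueued(struct model_rcu_head *head) { (void)head; }

/* Counterpart of rcu_head_init(). */
static void model_head_init(struct model_rcu_head *rhp)
{
	rhp->func = model_unqueued;
}

/* Counterpart of call_rcu(), minus the grace period: just record @f. */
static void model_call_rcu(struct model_rcu_head *rhp, model_cb_t f)
{
	rhp->func = f;
}

/* Counterpart of rcu_head_after_call_rcu(). */
static bool model_head_after_call_rcu(struct model_rcu_head *rhp, model_cb_t f)
{
	if (rhp->func == f)
		return true;
	if (rhp->func != model_unqueued)   /* neither queued with f nor fresh */
		fprintf(stderr, "warning: unexpected rcu_head state\n");
	return false;
}

static void model_free_cb(struct model_rcu_head *head) { (void)head; }

/* Walks one head through init and queueing; returns 1 if both checks hold. */
static int model_lifecycle_ok(void)
{
	struct model_rcu_head h;

	model_head_init(&h);
	if (model_head_after_call_rcu(&h, model_free_cb))
		return 0;   /* must report "not queued" before call_rcu() */
	model_call_rcu(&h, model_free_cb);
	return model_head_after_call_rcu(&h, model_free_cb);
}
```

Without the init step, the callback field would hold garbage and the "not yet queued" answer could not be trusted. This is why the text requires rcu_head_init() right after allocation.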
- void rcu_softirq_qs(void)
Provide a set of RCU quiescent states in softirq processing
Parameters
void: no arguments
Description
Mark a quiescent state for RCU, Tasks RCU, and Tasks Trace RCU. This is a special-purpose function to be used in the softirq infrastructure and perhaps the occasional long-running softirq handler.
Note that from RCU’s viewpoint, a call to rcu_softirq_qs() is equivalent to momentarily completely enabling preemption. For example, given this code:
    local_bh_disable();
    do_something();
    rcu_softirq_qs();    // A
    do_something_else();
    local_bh_enable();   // B
A call to synchronize_rcu() that began concurrently with the call to do_something() would be guaranteed to wait only until execution reached statement A. Without that rcu_softirq_qs(), that same synchronize_rcu() would instead be guaranteed to wait until execution reached statement B.
- bool rcu_watching_snap_stopped_since(struct rcu_data *rdp, int snap)
Has RCU stopped watching a given CPU since the specified snap?
Parameters
struct rcu_data *rdp: The rcu_data corresponding to the CPU for which to check EQS.
int snap: rcu_watching snapshot taken when the CPU wasn’t in an EQS.
Description
Returns true if the CPU corresponding to rdp has spent some time in an extended quiescent state since snap. Note that this doesn’t check whether it is still in an EQS, just that it went through one since snap.
This is meant to be used in a loop waiting for a CPU to go through an EQS.
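The snapshot-and-compare pattern above can be modelled with a hypothetical per-CPU sequence counter that is bumped on every EQS entry and exit. If the snapshot was taken while the CPU was watching, any later change in the counter means it passed through at least one EQS, whether or not it is still in one. The counter and the `model_*` names are invented for illustration; the kernel derives the real value from its own per-CPU state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU counter, bumped on every EQS entry and exit. */
struct model_cpu {
	atomic_ulong watching_seq;
};

static void model_eqs_enter(struct model_cpu *c) { atomic_fetch_add(&c->watching_seq, 1); }
static void model_eqs_exit(struct model_cpu *c)  { atomic_fetch_add(&c->watching_seq, 1); }

/* Snapshot, to be taken while the CPU is not in an EQS. */
static unsigned long model_snap(struct model_cpu *c)
{
	return atomic_load(&c->watching_seq);
}

/* Any movement of the counter means at least one EQS entry happened. */
static bool model_stopped_since(struct model_cpu *c, unsigned long snap)
{
	return atomic_load(&c->watching_seq) != snap;
}

/* Simulated wait loop in which the remote CPU idles once on the third poll. */
static int model_polls_until_eqs(void)
{
	static struct model_cpu cpu;   /* zero-initialized, CPU is watching */
	unsigned long snap = model_snap(&cpu);
	int polls = 0;

	while (!model_stopped_since(&cpu, snap)) {
		polls++;
		if (polls == 3) {          /* pretend the CPU idled briefly */
			model_eqs_enter(&cpu);
			model_eqs_exit(&cpu);
		}
	}
	return polls;
}
```

In this model, the waiter detects the brief idle period even though the CPU is already watching again by the next poll. This matches the "went through one since snap" semantics.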
- int rcu_is_cpu_rrupt_from_idle(void)
see if ‘interrupted’ from idle
Parameters
void: no arguments
Description
If the current CPU is idle and running at a first-level (not nested) interrupt, or directly, from idle, return true.
The caller must have at least disabled IRQs.
- void rcu_irq_exit_check_preempt(void)
Validate that scheduling is possible
Parameters
void: no arguments
- void __rcu_irq_enter_check_tick(void)
Enable scheduler tick on CPU if RCU needs it.
Parameters
void: no arguments
Description
The scheduler tick is not normally enabled when CPUs enter the kernel from nohz_full userspace execution. After all, nohz_full userspace execution is an RCU quiescent state and the time executing in the kernel is quite short. Except of course when it isn’t. And it is not hard to cause a large system to spend tens of seconds or even minutes looping in the kernel, which can cause a number of problems, including RCU CPU stall warnings.
Therefore, if a nohz_full CPU fails to report a quiescent state in a timely manner, the RCU grace-period kthread sets that CPU’s ->rcu_urgent_qs flag with the expectation that the next interrupt or exception will invoke this function, which will turn on the scheduler tick, which will enable RCU to detect that CPU’s quiescent states, for example, due to cond_resched() calls in CONFIG_PREEMPT=n kernels. The tick will be disabled once a quiescent state is reported for this CPU.
Of course, in carefully tuned systems, there might never be an interrupt or exception. In that case, the RCU grace-period kthread will eventually cause one to happen. However, in less carefully controlled environments, this function allows RCU to get what it needs without creating otherwise useless interruptions.
- notrace bool rcu_is_watching(void)
RCU read-side critical sections permitted on current CPU?
Parameters
void: no arguments
Description
Return true if RCU is watching the running CPU and false otherwise. A true return means that this CPU can safely enter RCU read-side critical sections.
Although calls to rcu_is_watching() from most parts of the kernel will return true, there are important exceptions. For example, if the current CPU is deep within its idle loop, in kernel entry/exit code, or offline, rcu_is_watching() will return false.
This function is marked notrace because it can be called by the internal functions of ftrace, and marking it notrace avoids unnecessary recursive calls.
- void rcu_set_gpwrap_lag(unsigned long lag_gps)
Set RCU GP sequence overflow lag value.
Parameters
unsigned long lag_gps: Set overflow lag to this many grace periods’ worth of counters, which is used by rcutorture to quickly force a gpwrap situation. lag_gps = 0 means we reset it back to the boot-time value.
- voidcall_rcu_hurry(structrcu_head*head,rcu_callback_tfunc)¶
Queue RCU callback for invocation after grace period, and flush all lazy callbacks (including the new one) to the main ->cblist while doing so.
Parameters
structrcu_head*headstructure to be used for queueing the RCU updates.
rcu_callback_tfuncactual callback function to be invoked after the grace period
Description
The callback function will be invoked some time after a full graceperiod elapses, in other words after all pre-existing RCU read-sidecritical sections have completed.
Use this API instead ofcall_rcu() if you don’t want the callback to bedelayed for very long periods of time, which can happen on systems withoutmemory pressure and on systems which are lightly loaded or mostly idle.This function will cause callbacks to be invoked sooner than later at theexpense of extra power. Other than that, this function is identical to, andreusescall_rcu()’s logic. Refer tocall_rcu() for more details about memoryordering and other functionality.
- void call_rcu(struct rcu_head *head, rcu_callback_t func)
Queue an RCU callback for invocation after a grace period. By default (in kernels built with CONFIG_RCU_LAZY=y), the callbacks are ‘lazy’ and are kept hidden from the main ->cblist to avoid starting grace periods too soon. If you want grace periods to start very soon, use call_rcu_hurry().
Parameters
struct rcu_head *head: structure to be used for queueing the RCU updates.
rcu_callback_t func: actual callback function to be invoked after the grace period
Description
The callback function will be invoked some time after a full grace period elapses, in other words after all pre-existing RCU read-side critical sections have completed. However, the callback function might well execute concurrently with RCU read-side critical sections that started after call_rcu() was invoked.
It is perfectly legal to repost an RCU callback, potentially with a different callback function, from within its callback function. The specified function will be invoked after another full grace period has elapsed. This use case is similar in form to the common practice of reposting a timer from within its own handler.
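The reposting pattern can be sketched with a single-threaded userspace model. The model assumes a hypothetical model_call_rcu() that queues a callback and a hypothetical model_end_grace_period() that invokes only the callbacks queued before that grace period. container_of() is defined locally in the usual offsetof() form, without the kernel version’s type checking. The queue is detached before any callback runs, so a callback that reposts itself lands in the next batch. This matches the "after another full grace period" rule above.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct rcu_head. */
struct model_head {
	struct model_head *next;
	void (*func)(struct model_head *head);
};

static struct model_head *pending;	/* callbacks awaiting a grace period */

static void model_call_rcu(struct model_head *head,
			   void (*func)(struct model_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* End a grace period: invoke only callbacks queued before it; return count. */
static int model_end_grace_period(void)
{
	struct model_head *list = pending;
	int n = 0;

	pending = NULL;			/* reposts go to the next batch */
	while (list) {
		struct model_head *next = list->next;	/* read before func may requeue */

		list->func(list);
		list = next;
		n++;
	}
	return n;
}

/* Hypothetical object that re-arms itself a fixed number of times. */
struct poller {
	int remaining;
	struct model_head rh;
};

static void poller_cb(struct model_head *head)
{
	struct poller *p = container_of(head, struct poller, rh);

	if (--p->remaining > 0)
		model_call_rcu(&p->rh, poller_cb);	/* repost, like a timer */
}
```

Starting with remaining = 3, each model grace period invokes poller_cb() once. After the third grace period, nothing is left pending.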
RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. In addition, but only in v5.0 and later, regions of code across which interrupts, preemption, or softirqs have been disabled also serve as RCU read-side critical sections. This includes hardware interrupt handlers, softirq handlers, and NMI handlers.
Note that all CPUs must agree that the grace period extended beyond all pre-existing RCU read-side critical sections. On systems with more than one CPU, this means that when “func()” is invoked, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU read-side critical section whose beginning preceded the call to call_rcu(). It also means that each CPU executing an RCU read-side critical section that continues beyond the start of “func()” must have executed a memory barrier after the call_rcu() but before the beginning of that RCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.
Furthermore, if CPU A invoked call_rcu() and CPU B invoked the resulting RCU callback function “func()”, then both CPU A and CPU B are guaranteed to execute a full memory barrier during the time interval between the call to call_rcu() and the invocation of “func()”. This holds even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).
Implementation of these memory-ordering guarantees is described here: A Tour Through TREE_RCU’s Grace-Period Memory Ordering.
Specific to call_rcu() (as opposed to the other call_rcu*() functions), in kernels built with CONFIG_RCU_LAZY=y, call_rcu() might delay for many seconds before starting the grace period needed by the corresponding callback. This delay can significantly improve energy efficiency on low-utilization battery-powered devices. To avoid this delay in latency-sensitive kernel code, use call_rcu_hurry().
- void synchronize_rcu(void)
wait until a grace period has elapsed.
Parameters
void: no arguments
Description
Control will return to the caller some time after a full grace period has elapsed, in other words after all currently executing RCU read-side critical sections have completed. Note, however, that upon return from synchronize_rcu(), the caller might well be executing concurrently with new RCU read-side critical sections that began while synchronize_rcu() was waiting.
RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. In addition, but only in v5.0 and later, regions of code across which interrupts, preemption, or softirqs have been disabled also serve as RCU read-side critical sections. This includes hardware interrupt handlers, softirq handlers, and NMI handlers.
Note that this guarantee implies further memory-ordering guarantees. On systems with more than one CPU, when synchronize_rcu() returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU read-side critical section whose beginning preceded the call to synchronize_rcu(). In addition, each CPU having an RCU read-side critical section that extends beyond the return from synchronize_rcu() is guaranteed to have executed a full memory barrier after the beginning of synchronize_rcu() and before the beginning of that RCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.
Furthermore, if CPU A invoked synchronize_rcu(), which returned to its caller on CPU B, then both CPU A and CPU B are guaranteed to have executed a full memory barrier during the execution of synchronize_rcu(). This holds even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).
Implementation of these memory-ordering guarantees is described here: A Tour Through TREE_RCU’s Grace-Period Memory Ordering.
- void get_completed_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
Return a full pre-completed polled state cookie
Parameters
struct rcu_gp_oldstate *rgosp: Place to put state cookie
Description
Stores into rgosp a value that will always be treated by functions like poll_state_synchronize_rcu_full() as a cookie whose grace period has already completed.
- unsigned long get_state_synchronize_rcu(void)
Snapshot current RCU state
Parameters
void: no arguments
Description
Returns a cookie that is used by a later call to cond_synchronize_rcu() or poll_state_synchronize_rcu() to determine whether or not a full grace period has elapsed in the meantime.
- void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
Snapshot RCU state, both normal and expedited
Parameters
struct rcu_gp_oldstate *rgosp: location to place combined normal/expedited grace-period state
Description
Places the normal and expedited grace-period states in rgosp. This state value can be passed to a later call to cond_synchronize_rcu_full() or poll_state_synchronize_rcu_full() to determine whether or not a grace period (whether normal or expedited) has elapsed in the meantime. The rcu_gp_oldstate structure takes up twice the memory of an unsigned long, but is guaranteed to see all grace periods. In contrast, the single unsigned long cookie returned by get_state_synchronize_rcu() occupies less memory, but can sometimes fail to take grace periods into account.
This does not guarantee that the needed grace period will actually start.
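The next sketch shows why a two-part snapshot sees every grace period. It is a userspace model that keeps separate hypothetical counters for normal and expedited grace periods and treats the snapshot as satisfied when either counter has advanced far enough. The struct model_gp_oldstate, the two-bit state mask, and the ULONG_CMP_GE() comparison are modeling assumptions, not the layout of the kernel’s struct rcu_gp_oldstate.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SEQ_STATE_MASK 3UL				/* two low-order state bits */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))	/* wrap-safe a >= b */

/* Hypothetical model of a full snapshot: one value per flavor. */
struct model_gp_oldstate {
	unsigned long norm;
	unsigned long exp;
};

static unsigned long norm_seq, exp_seq;	/* normal and expedited counters */

/* Counter value at which a full grace period starting after now has ended. */
static unsigned long snap(unsigned long seq)
{
	return (seq + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

/* Run one complete grace period of a flavor; assumes none is in progress. */
static void model_run_full_gp(unsigned long *seq)
{
	*seq = (*seq | SEQ_STATE_MASK) + 1;
}

static void model_get_state_full(struct model_gp_oldstate *s)
{
	s->norm = snap(norm_seq);
	s->exp = snap(exp_seq);
}

/* Completion of either flavor of grace period satisfies the snapshot. */
static bool model_poll_state_full(const struct model_gp_oldstate *s)
{
	return ULONG_CMP_GE(norm_seq, s->norm) || ULONG_CMP_GE(exp_seq, s->exp);
}
```

In this model, a snapshot is satisfied by an expedited grace period even though no normal grace period has run, and the reverse also holds.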
- unsigned long start_poll_synchronize_rcu(void)
Snapshot and start RCU grace period
Parameters
void: no arguments
Description
Returns a cookie that is used by a later call to cond_synchronize_rcu() or poll_state_synchronize_rcu() to determine whether or not a full grace period has elapsed in the meantime. If the needed grace period is not already slated to start, notifies RCU core of the need for that grace period.
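The following userspace model shows what a polled-state cookie means. It is loosely modeled on the sequence-counter scheme in kernel/rcu/rcu.h, in which two low-order bits record whether a grace period is in progress and the upper bits count grace periods. It is a simplified single-threaded sketch, and every model_* name is hypothetical. The cookie is the counter value at which a full grace period that started after the snapshot will have ended. For that reason, a grace period that is already in flight when the snapshot is taken does not satisfy it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SEQ_CTR_SHIFT  2
#define SEQ_STATE_MASK ((1UL << SEQ_CTR_SHIFT) - 1)
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))	/* wrap-safe a >= b */

static unsigned long gp_seq;	/* low bits: GP in progress; high bits: count */

static void model_gp_start(void) { gp_seq++; }
static void model_gp_end(void)   { gp_seq = (gp_seq | SEQ_STATE_MASK) + 1; }

/* Snapshot: the gp_seq value at which a full later grace period has ended. */
static unsigned long model_get_state(void)
{
	return (gp_seq + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

static bool model_poll_state(unsigned long cookie)
{
	return ULONG_CMP_GE(gp_seq, cookie);
}
```

A cookie taken while idle is satisfied by the next complete grace period. A cookie taken mid-grace-period needs the current grace period and one more.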
- void start_poll_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
Take a full snapshot and start RCU grace period
Parameters
struct rcu_gp_oldstate *rgosp: Place to put snapshot of grace-period state
Description
Places the normal and expedited grace-period states in rgosp. This state value can be passed to a later call to cond_synchronize_rcu_full() or poll_state_synchronize_rcu_full() to determine whether or not a grace period (whether normal or expedited) has elapsed in the meantime. If the needed grace period is not already slated to start, notifies RCU core of the need for that grace period.
- bool poll_state_synchronize_rcu(unsigned long oldstate)
Has the specified RCU grace period completed?
Parameters
unsigned long oldstate: value from get_state_synchronize_rcu() or start_poll_synchronize_rcu()
Description
If a full RCU grace period has elapsed since the earlier call from which oldstate was obtained, return true, otherwise return false. If false is returned, it is the caller’s responsibility to invoke this function later on until it does return true. Alternatively, the caller can explicitly wait for a grace period, for example, by passing oldstate to either cond_synchronize_rcu() or cond_synchronize_rcu_expedited() on the one hand or by directly invoking either synchronize_rcu() or synchronize_rcu_expedited() on the other.
Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than a billion grace periods (and way more on a 64-bit system!). Those needing to keep old state values for very long time periods (many hours even on 32-bit systems) should check them occasionally and either refresh them or set a flag indicating that the grace period has completed. Alternatively, they can use get_completed_synchronize_rcu() to get a guaranteed-completed grace-period state.
In addition, because oldstate compresses the grace-period state for both normal and expedited grace periods into a single unsigned long, it can miss a grace period when synchronize_rcu() runs concurrently with synchronize_rcu_expedited(). If this is unacceptable, please instead use the _full() variant of these polling APIs.
This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided oldstate, and that returned at the end of this function.
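The sketch below shows why counter wrap is harmless for these comparisons. It assumes the wrap-safe modular comparison idiom ULONG_CMP_GE(), defined locally here as an assumption. The comparison stays correct when the counter wraps past zero. A cookie that has been left untouched for more than ULONG_MAX / 2 counts merely reads as "not yet completed", which errs on the side of waiting longer.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a >= b" for free-running unsigned counters. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Model of the poll check: has counter value 'now' reached 'cookie'? */
static bool model_done(unsigned long now, unsigned long cookie)
{
	return ULONG_CMP_GE(now, cookie);
}
```

Occasionally refreshing long-lived cookies, as recommended above, keeps them well inside that half-range window.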
- bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
Has the specified RCU grace period completed?
Parameters
struct rcu_gp_oldstate *rgosp: value from get_state_synchronize_rcu_full() or start_poll_synchronize_rcu_full()
Description
If a full RCU grace period has elapsed since the earlier call from which rgosp was obtained, return true, otherwise return false. If false is returned, it is the caller’s responsibility to invoke this function later on until it does return true. Alternatively, the caller can explicitly wait for a grace period, for example, by passing rgosp to cond_synchronize_rcu_full() or by directly invoking synchronize_rcu().
Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than a billion grace periods (and way more on a 64-bit system!). Those needing to keep rcu_gp_oldstate values for very long time periods (many hours even on 32-bit systems) should check them occasionally and either refresh them or set a flag indicating that the grace period has completed. Alternatively, they can use get_completed_synchronize_rcu_full() to get a guaranteed-completed grace-period state.
This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided rgosp, and that returned at the end of this function. This guarantee requires that the root rcu_node structure’s ->gp_seq field be checked instead of that of the rcu_state structure. The problem is that the just-ending grace period’s callbacks can be invoked between the time that the root rcu_node structure’s ->gp_seq field is updated and the time that the rcu_state structure’s ->gp_seq field is updated. Therefore, if a single synchronize_rcu() is to cause a subsequent poll_state_synchronize_rcu_full() to return true, then the root rcu_node structure is the one that needs to be polled.
- void cond_synchronize_rcu(unsigned long oldstate)
Conditionally wait for an RCU grace period
Parameters
unsigned long oldstate: value from get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited()
Description
If a full RCU grace period has elapsed since the earlier call to get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just return. Otherwise, invoke synchronize_rcu() to wait for a full grace period.
Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for a couple of additional grace periods should be just fine.
This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided oldstate and that returned at the end of this function.
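The conditional wait amounts to "poll, and block only if the poll fails". The single-threaded model below assumes a hypothetical model_run_gp() that runs one complete grace period and a hypothetical model_synchronize() that stands in for synchronize_rcu(). It counts how often the caller actually had to block, so you can see that a grace period completed by someone else lets the caller return at once.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SEQ_STATE_MASK 3UL
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))	/* wrap-safe a >= b */

static unsigned long gp_seq;	/* always idle (low bits clear) in this model */
static int waits;		/* times the caller had to block */

static unsigned long model_get_state(void)
{
	return (gp_seq + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

/* One complete grace period, started and finished by anyone. */
static void model_run_gp(void)
{
	gp_seq = (gp_seq | SEQ_STATE_MASK) + 1;
}

/* Stand-in for synchronize_rcu(): the caller blocks for a full GP. */
static void model_synchronize(void)
{
	model_run_gp();
	waits++;
}

static void model_cond_synchronize(unsigned long oldstate)
{
	if (!ULONG_CMP_GE(gp_seq, oldstate))
		model_synchronize();
}
```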
- void cond_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
Conditionally wait for an RCU grace period
Parameters
struct rcu_gp_oldstate *rgosp: value from get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full()
Description
If a full RCU grace period has elapsed since the call to get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full() from which rgosp was obtained, just return. Otherwise, invoke synchronize_rcu() to wait for a full grace period.
Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for a couple of additional grace periods should be just fine.
This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided rgosp and that returned at the end of this function.
- void rcu_barrier(void)
Wait until all in-flight call_rcu() callbacks complete.
Parameters
void: no arguments
Description
Note that this primitive does not necessarily wait for an RCU grace period to complete. For example, if there are no RCU callbacks queued anywhere in the system, then rcu_barrier() is within its rights to return immediately, without waiting for anything, much less an RCU grace period. In fact, rcu_barrier() will normally not result in any RCU grace periods beyond those that were already destined to be executed.
In kernels built with CONFIG_RCU_LAZY=y, this function also hurries all pending lazy RCU callbacks.
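The difference between "wait for callbacks" and "wait for a grace period" shows up in a small single-threaded model. It assumes one hypothetical global callback list. In the model, model_rcu_barrier() returns immediately when nothing is queued. Otherwise it lets exactly one grace period elapse, so that every previously queued callback runs. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback node. */
struct model_cb {
	struct model_cb *next;
	void (*func)(struct model_cb *cb);
};

static struct model_cb *queued;	/* callbacks not yet invoked */
static int gps_run;		/* grace periods the model has run */
static int invoked;		/* callbacks invoked so far */

static void count_cb(struct model_cb *cb)
{
	(void)cb;
	invoked++;
}

static void model_call_rcu(struct model_cb *cb, void (*func)(struct model_cb *))
{
	cb->func = func;
	cb->next = queued;
	queued = cb;
}

/* One grace period elapses; callbacks queued before it are invoked. */
static void model_gp_and_invoke(void)
{
	struct model_cb *list = queued;

	queued = NULL;
	gps_run++;
	while (list) {
		struct model_cb *next = list->next;

		list->func(list);
		list = next;
	}
}

/* Return only once every callback queued before this call has run. */
static void model_rcu_barrier(void)
{
	if (!queued)
		return;		/* nothing in flight: no grace period needed */
	model_gp_and_invoke();
}
```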
- void rcu_barrier_throttled(void)
Do rcu_barrier(), but limit to one per second
Parameters
void: no arguments
Description
This can be thought of as guard rails around rcu_barrier() that permit unrestricted userspace use, at least assuming the hardware’s try_cmpxchg() is robust. There will be at most one call per second to rcu_barrier() system-wide from use of this function, which means that callers might needlessly wait a second or three.
This is intended for use by test suites to avoid OOM by flushing RCU callbacks from the previous test before starting the next. See the rcutree.do_rcu_barrier module parameter for more information.
Why not simply make rcu_barrier() more scalable? That might be the eventual endpoint, but let’s keep it simple for the time being. Note that the module parameter infrastructure serializes calls to a given .set() function, but should concurrent .set() invocation ever be possible, we are ready!
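The "at most one per second" rule can be enforced with a compare-and-swap on a last-run timestamp. The sketch below models only that claim step. It uses C11 atomic_compare_exchange_strong() in place of the kernel’s try_cmpxchg() and takes a caller-supplied time in hypothetical milliseconds. It does not model how the real function makes losing callers wait. last_barrier, PERIOD_MS, and model_claim_barrier() are illustrative names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PERIOD_MS 1000UL	/* one second, in hypothetical milliseconds */

static atomic_ulong last_barrier;	/* time of the last claimed barrier */

/*
 * Returns true if the caller should run the real barrier now. Returns false
 * if another caller claimed one within the last PERIOD_MS.
 */
static bool model_claim_barrier(unsigned long now)
{
	unsigned long old = atomic_load(&last_barrier);

	if (now - old < PERIOD_MS)
		return false;
	/* Only one concurrent caller can move last_barrier from old to now. */
	return atomic_compare_exchange_strong(&last_barrier, &old, now);
}
```

Because a losing compare-and-swap fails instead of overwriting the timestamp, two racing callers cannot both claim the same one-second window.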
- void synchronize_rcu_expedited(void)
Brute-force RCU grace period
Parameters
void: no arguments
Description
Wait for an RCU grace period, but expedite it. The basic idea is to IPI all non-idle non-nohz online CPUs. The IPI handler checks whether the CPU is in an RCU critical section, and if so, it sets a flag that causes the outermost rcu_read_unlock() to report the quiescent state for RCU-preempt or asks the scheduler for help for RCU-sched. On the other hand, if the CPU is not in an RCU read-side critical section, the IPI handler reports the quiescent state immediately.
Although this is a great improvement over previous expedited implementations, it is still unfriendly to real-time workloads, and is thus not recommended for any sort of common-case code. In fact, if you are using synchronize_rcu_expedited() in a loop, please restructure your code to batch your updates, and then use a single synchronize_rcu() instead.
This has the same semantics as (but is more brutal than) synchronize_rcu().
- unsigned long start_poll_synchronize_rcu_expedited(void)
Snapshot current RCU state and start expedited grace period
Parameters
void: no arguments
Description
Returns a cookie to pass to a call to cond_synchronize_rcu(), cond_synchronize_rcu_expedited(), or poll_state_synchronize_rcu(), allowing them to determine whether or not any sort of grace period has elapsed in the meantime. If the needed expedited grace period is not already slated to start, initiates that grace period.
- void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
Take a full snapshot and start expedited grace period
Parameters
struct rcu_gp_oldstate *rgosp: Place to put snapshot of grace-period state
Description
Places the normal and expedited grace-period states in rgosp. This state value can be passed to a later call to cond_synchronize_rcu_full() or poll_state_synchronize_rcu_full() to determine whether or not a grace period (whether normal or expedited) has elapsed in the meantime. If the needed expedited grace period is not already slated to start, initiates that grace period.
- void cond_synchronize_rcu_expedited(unsigned long oldstate)
Conditionally wait for an expedited RCU grace period
Parameters
unsigned long oldstate: value from get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited()
Description
If any type of full RCU grace period has elapsed since the earlier call to get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited(), just return. Otherwise, invoke synchronize_rcu_expedited() to wait for a full grace period.
Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for a couple of additional grace periods should be just fine.
This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided oldstate and that returned at the end of this function.
- void cond_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
Conditionally wait for an expedited RCU grace period
Parameters
struct rcu_gp_oldstate *rgosp: value from get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full()
Description
If a full RCU grace period has elapsed since the call to get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full() from which rgosp was obtained, just return. Otherwise, invoke synchronize_rcu_expedited() to wait for a full grace period.
Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for a couple of additional grace periods should be just fine.
This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided rgosp and that returned at the end of this function.
- bool rcu_read_lock_held_common(bool *ret)
might we be in an RCU-sched read-side critical section?
Parameters
bool *ret: Best guess answer if lockdep cannot be relied on
Description
Returns true if lockdep must be ignored, in which case *ret contains the best guess described below. Otherwise returns false, in which case *ret tells the caller nothing and the caller should instead consult lockdep.
If CONFIG_DEBUG_LOCK_ALLOC is selected, set *ret to nonzero iff in an RCU-sched read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side critical section unless it can prove otherwise. Note that disabling of preemption (including disabling irqs) counts as an RCU-sched read-side critical section. This is useful for debug checks in functions that require that they be called within an RCU-sched read-side critical section.
Check debug_lockdep_rcu_enabled() to prevent false positives during boot and while lockdep is disabled.
Note that if the CPU is in the idle loop from an RCU point of view (i.e., that we are in the section between ct_idle_enter() and ct_idle_exit()), then rcu_read_lock_held() sets *ret to false even if the CPU did an rcu_read_lock(). The reason for this is that RCU ignores CPUs that are in such a section, considering these as in extended quiescent state, so such a CPU is effectively never in an RCU read-side critical section regardless of what RCU primitives it invokes. This state of affairs is required: we need to keep an RCU-free window in idle where the CPU may possibly enter into low power mode. This way we can signal an extended quiescent state to other CPUs that started a grace period. Otherwise we would delay any grace period as long as we run in the idle task.
Similarly, we avoid claiming an RCU read lock held if the current CPU is offline.
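The decision rules above can be written as a small pure function. This sketch assumes a hypothetical struct model_cpu_state that captures the three facts the text says matter: whether lockdep checking is active (debug_lockdep_rcu_enabled()), whether RCU is watching the CPU, and whether the CPU is online. It models the described behavior only. It is not the kernel’s implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the real function consults. */
struct model_cpu_state {
	bool lockdep_active;	/* debug_lockdep_rcu_enabled() */
	bool rcu_watching;	/* false in the idle-loop extended QS */
	bool cpu_online;
};

/*
 * Returns true if lockdep must be ignored, with *ret holding the best guess.
 * Returns false if the caller should consult lockdep instead.
 */
static bool model_read_lock_held_common(const struct model_cpu_state *s, bool *ret)
{
	if (!s->lockdep_active) {
		*ret = true;	/* cannot check: assume we are in a reader */
		return true;
	}
	if (!s->rcu_watching || !s->cpu_online) {
		*ret = false;	/* RCU ignores idle and offline CPUs */
		return true;
	}
	return false;		/* defer to lockdep */
}
```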
- void rcu_async_hurry(void)
Make future async RCU callbacks not lazy.
Parameters
void: no arguments
Description
After a call to this function, future calls to call_rcu() will be processed in a timely fashion.
- void rcu_async_relax(void)
Make future async RCU callbacks lazy.
Parameters
void: no arguments
Description
After a call to this function, future calls to call_rcu() will be processed in a lazy fashion.
- void rcu_expedite_gp(void)
Expedite future RCU grace periods
Parameters
void: no arguments
Description
After a call to this function, future calls to synchronize_rcu() and friends act as if the corresponding synchronize_rcu_expedited() function had instead been called.
- void rcu_unexpedite_gp(void)
Cancel prior rcu_expedite_gp() invocation
Parameters
void: no arguments
Description
Undo a prior call to rcu_expedite_gp(). If all prior calls to rcu_expedite_gp() are undone by a subsequent call to rcu_unexpedite_gp(), and if the rcu_expedited sysfs/boot parameter is not set, then all subsequent calls to synchronize_rcu() and friends will return to their normal non-expedited behavior.
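The "all prior calls are undone" rule behaves like a nesting counter combined with the boot/sysfs flag. The sketch below models it with a C11 atomic counter. rcu_expedited_param, expedite_nesting, and the model_* functions are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static bool rcu_expedited_param;	/* models the rcu_expedited parameter */
static atomic_int expedite_nesting;	/* outstanding expedite requests */

static void model_expedite_gp(void)   { atomic_fetch_add(&expedite_nesting, 1); }
static void model_unexpedite_gp(void) { atomic_fetch_sub(&expedite_nesting, 1); }

/* Normal grace periods are expedited while any request is outstanding. */
static bool model_gp_is_expedited(void)
{
	return rcu_expedited_param || atomic_load(&expedite_nesting) > 0;
}
```

Two expedite calls followed by one unexpedite call still leave grace periods expedited. Only the second unexpedite call restores normal behavior, and even then only while the parameter is clear.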
- int notrace rcu_read_lock_held(void)
might we be in an RCU read-side critical section?
Parameters
void: no arguments
Description
If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU read-side critical section unless it can prove otherwise. This is useful for debug checks in functions that require that they be called within an RCU read-side critical section.
Checks debug_lockdep_rcu_enabled() to prevent false positives during boot and while lockdep is disabled.
Note that rcu_read_lock() and the matching rcu_read_unlock() must occur in the same context; for example, it is illegal to invoke rcu_read_unlock() in process context if the matching rcu_read_lock() was invoked from within an irq handler.
Note that rcu_read_lock() is disallowed if the CPU is either idle or offline from an RCU perspective, so check for those as well.
- int notrace rcu_read_lock_bh_held(void)
might we be in an RCU-bh read-side critical section?
Parameters
void: no arguments
Description
Check for bottom half being disabled, which covers both the CONFIG_PROVE_RCU and !CONFIG_PROVE_RCU cases. Note that if someone uses rcu_read_lock_bh(), but then later enables BH, lockdep (if enabled) will show the situation. This is useful for debug checks in functions that require that they be called within an RCU read-side critical section.
Check debug_lockdep_rcu_enabled() to prevent false positives during boot.
Note that rcu_read_lock_bh() is disallowed if the CPU is either idle or offline from an RCU perspective, so check for those as well.
- void wakeme_after_rcu(struct rcu_head *head)
Callback function to awaken a task after grace period
Parameters
struct rcu_head *head: Pointer to rcu_head member within rcu_synchronize structure
Description
Awaken the corresponding task now that a grace period has elapsed.
- void init_rcu_head_on_stack(struct rcu_head *head)
initialize on-stack rcu_head for debugobjects
Parameters
struct rcu_head *head: pointer to rcu_head structure to be initialized
Description
This function informs debugobjects of a new rcu_head structure that has been allocated as an auto variable on the stack. This function is not required for rcu_head structures that are statically defined or that are dynamically allocated on the heap. This function has no effect for !CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.
- void destroy_rcu_head_on_stack(struct rcu_head *head)
destroy on-stack rcu_head for debugobjects
Parameters
struct rcu_head *head: pointer to the on-stack rcu_head structure that is about to go out of scope
Description
This function informs debugobjects that an on-stack rcu_head structure is about to go out of scope. As with init_rcu_head_on_stack(), this function is not required for rcu_head structures that are statically defined or that are dynamically allocated on the heap. Also as with init_rcu_head_on_stack(), this function has no effect for !CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.
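Together with wakeme_after_rcu(), these functions support a common shape for blocking on a grace period with an on-stack rcu_head. The caller initializes the head, queues wakeme_after_rcu(), waits for completion, and destroys the head before the stack frame goes away. The single-threaded model below shows that ordering. The completion is replaced by a bool, the debugobjects hooks are replaced by a counter, and the callback queue has one slot. All model_* names are hypothetical, and container_of() is defined locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_rcu_head {
	void (*func)(struct model_rcu_head *head);
};

/* Same shape as an rcu_head paired with a completion. */
struct model_rcu_synchronize {
	struct model_rcu_head head;
	bool done;			/* stand-in for struct completion */
};

static struct model_rcu_head *pending;	/* one-slot callback queue */
static int debug_objects;		/* stand-in for debugobjects tracking */

static void model_init_head_on_stack(struct model_rcu_head *h)    { (void)h; debug_objects++; }
static void model_destroy_head_on_stack(struct model_rcu_head *h) { (void)h; debug_objects--; }

/* Like wakeme_after_rcu(): find the container and signal completion. */
static void model_wakeme_after_rcu(struct model_rcu_head *head)
{
	struct model_rcu_synchronize *rs =
		container_of(head, struct model_rcu_synchronize, head);

	rs->done = true;
}

static void model_gp_elapses(void)
{
	struct model_rcu_head *h = pending;

	if (h) {
		pending = NULL;
		h->func(h);
	}
}

static void model_wait_for_gp(void)
{
	struct model_rcu_synchronize rs = { .done = false };

	model_init_head_on_stack(&rs.head);
	rs.head.func = model_wakeme_after_rcu;
	pending = &rs.head;			/* queue the callback */
	while (!rs.done)			/* wait for the completion */
		model_gp_elapses();
	model_destroy_head_on_stack(&rs.head);	/* before rs leaves scope */
}
```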
- unsigned long get_completed_synchronize_rcu(void)
Return a pre-completed polled state cookie
Parameters
void: no arguments
Description
Returns a value that will always be treated by functions like poll_state_synchronize_rcu() as a cookie whose grace period has already completed.
- unsigned long get_completed_synchronize_srcu(void)
Return a pre-completed polled state cookie
Parameters
void: no arguments
Description
Returns a value that poll_state_synchronize_srcu() will always treat as a cookie whose grace period has already completed.
- bool same_state_synchronize_srcu(unsigned long oldstate1, unsigned long oldstate2)
Are two old-state values identical?
Parameters
unsigned long oldstate1: First old-state value.
unsigned long oldstate2: Second old-state value.
Description
The two old-state values must have been obtained from either get_state_synchronize_srcu(), start_poll_synchronize_srcu(), or get_completed_synchronize_srcu(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.
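The list-header idea can be sketched as follows. Consecutive retired objects that carry the same cookie share one header, so the objects themselves need not store the cookie. This userspace model assumes a fixed-size array of hypothetical struct batch headers. model_same_state() implements only the documented "identical values" semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Documented semantics: true iff the two values are identical. */
static bool model_same_state(unsigned long a, unsigned long b)
{
	return a == b;
}

/* Hypothetical header shared by every object retired under one cookie. */
struct batch {
	unsigned long cookie;
	size_t nobjs;
};

static struct batch batches[8];
static size_t nbatches;

/* Record one retired object; reuse the newest header when cookies match. */
static void model_retire(unsigned long cookie)
{
	if (nbatches && model_same_state(batches[nbatches - 1].cookie, cookie)) {
		batches[nbatches - 1].nobjs++;
		return;
	}
	assert(nbatches < sizeof(batches) / sizeof(batches[0]));
	batches[nbatches].cookie = cookie;
	batches[nbatches].nobjs = 1;
	nbatches++;
}
```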
- int srcu_read_lock_held(const struct srcu_struct *ssp)
might we be in an SRCU read-side critical section?
Parameters
const struct srcu_struct *ssp: The srcu_struct structure to check
Description
If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an SRCU read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an SRCU read-side critical section unless it can prove otherwise.
Checks debug_lockdep_rcu_enabled() to prevent false positives during boot and while lockdep is disabled.
Note that SRCU is based on its own state machine and does not rely on normal RCU, so this function can be called from a CPU that is in the idle loop from an RCU point of view, or that is offline.
- srcu_dereference_check
srcu_dereference_check(p, ssp, c)
fetch SRCU-protected pointer for later dereferencing
Parameters
p: the pointer to fetch and protect for later dereferencing
ssp: pointer to the srcu_struct, which is used to check that we really are in an SRCU read-side critical section.
c: condition to check for update-side use
Description
If PROVE_RCU is enabled, invoking this outside of an SRCU read-side critical section will result in an RCU-lockdep splat, unless c evaluates to 1. The c argument will normally be a logical expression containing lockdep_is_held() calls.
- srcu_dereference
srcu_dereference(p, ssp)
fetch SRCU-protected pointer for later dereferencing
Parameters
p: the pointer to fetch and protect for later dereferencing
ssp: pointer to the srcu_struct, which is used to check that we really are in an SRCU read-side critical section.
Description
Makes srcu_dereference_check() do the dirty work. If PROVE_RCU is enabled, invoking this outside of an SRCU read-side critical section will result in an RCU-lockdep splat.
- srcu_dereference_notrace
srcu_dereference_notrace(p, ssp)
no tracing and no lockdep calls from here
Parameters
p: the pointer to fetch and protect for later dereferencing
ssp: pointer to the srcu_struct, which is used to check that we really are in an SRCU read-side critical section.
- int srcu_read_lock(struct srcu_struct *ssp)
register a new reader for an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to register the new reader.
Description
Enter an SRCU read-side critical section. Note that SRCU read-side critical sections may be nested. However, it is illegal to call anything that waits on an SRCU grace period for the same srcu_struct, whether directly or indirectly. Please note that one way to indirectly wait on an SRCU grace period is to acquire a mutex that is held elsewhere while calling synchronize_srcu() or synchronize_srcu_expedited().
The return value from srcu_read_lock() is guaranteed to be non-negative. This value must be passed unaltered to the matching srcu_read_unlock(). Note that srcu_read_lock() and the matching srcu_read_unlock() must occur in the same context; for example, it is illegal to invoke srcu_read_unlock() in an irq handler if the matching srcu_read_lock() was invoked in process context. Nor, for that matter, may srcu_read_unlock() be invoked from one task and the matching srcu_read_lock() from another.
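The reason the index must be passed back unaltered can be seen in a simplified two-slot reader-counter model. This sketch assumes a hypothetical single-CPU struct model_srcu with one lock counter and one unlock counter per slot. An updater flips which slot new readers use and then waits for the old slot to drain. If an unlock were charged to the current slot instead of the one returned by the lock, the old slot would never drain. This is an illustration of the idea, not the kernel’s srcu_struct layout.

```c
#include <assert.h>

/* Hypothetical single-CPU model of a two-slot SRCU reader counter. */
struct model_srcu {
	unsigned int idx;		/* slot that new readers use */
	unsigned long lock[2];
	unsigned long unlock[2];
};

static int model_srcu_read_lock(struct model_srcu *s)
{
	int idx = s->idx & 1;

	s->lock[idx]++;
	return idx;			/* always 0 or 1, hence non-negative */
}

/* idx must be exactly what model_srcu_read_lock() returned. */
static void model_srcu_read_unlock(struct model_srcu *s, int idx)
{
	s->unlock[idx]++;
}

/* Readers still inside a critical section that entered via slot idx. */
static int model_readers_in(const struct model_srcu *s, int idx)
{
	return (int)(s->lock[idx] - s->unlock[idx]);
}
```

In this model, a reader that entered before the updater flipped idx is still counted in slot 0 until it unlocks with its own returned index.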
- struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *ssp)
register a new reader for an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to register the new reader.
Description
Enter an SRCU read-side critical section, but for a light-weight smp_mb()-free reader. See srcu_read_lock() for more information. This function is NMI-safe, in a manner similar to srcu_read_lock_nmisafe().
For srcu_read_lock_fast() to be used on an srcu_struct structure, that structure must have been defined using either DEFINE_SRCU_FAST() or DEFINE_STATIC_SRCU_FAST() on the one hand or initialized with init_srcu_struct_fast() on the other. Such an srcu_struct structure cannot be passed to any non-fast variant of srcu_read_{,un}lock() or srcu_{down,up}_read(). In kernels built with CONFIG_PROVE_RCU=y, __srcu_check_read_flavor() will complain bitterly if you ignore this restriction.
Grace-period auto-expediting is disabled for SRCU-fast srcu_struct structures because SRCU-fast expedited grace periods invoke synchronize_rcu_expedited(), IPIs and all. If you need expedited SRCU-fast grace periods, use synchronize_srcu_expedited().
The srcu_read_lock_fast() function can be invoked only from those contexts where RCU is watching, that is, from contexts where it would be legal to invoke rcu_read_lock(). Otherwise, lockdep will complain.
- struct srcu_ctr __percpu *srcu_read_lock_fast_updown(struct srcu_struct *ssp)
register a new reader for an SRCU-fast-updown structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to register the new reader.
Description
Enter an SRCU read-side critical section, but for a light-weight smp_mb()-free reader. See srcu_read_lock() for more information. This function is compatible with srcu_down_read_fast(), but is not NMI-safe.
For srcu_read_lock_fast_updown() to be used on an srcu_struct structure, that structure must have been defined using either DEFINE_SRCU_FAST_UPDOWN() or DEFINE_STATIC_SRCU_FAST_UPDOWN() on the one hand or initialized with init_srcu_struct_fast_updown() on the other. Such an srcu_struct structure cannot be passed to any non-fast-updown variant of srcu_read_{,un}lock() or srcu_{down,up}_read(). In kernels built with CONFIG_PROVE_RCU=y, __srcu_check_read_flavor() will complain bitterly if you ignore this restriction.
Grace-period auto-expediting is disabled for SRCU-fast-updown srcu_struct structures because SRCU-fast-updown expedited grace periods invoke synchronize_rcu_expedited(), IPIs and all. If you need expedited SRCU-fast-updown grace periods, use synchronize_srcu_expedited().
The srcu_read_lock_fast_updown() function can be invoked only from those contexts where RCU is watching, that is, from contexts where it would be legal to invoke rcu_read_lock(). Otherwise, lockdep will complain.
- struct srcu_ctr __percpu *srcu_down_read_fast(struct srcu_struct *ssp)
register a new reader for an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to register the new reader.
Description
Enter a semaphore-like SRCU read-side critical section, but for a light-weight smp_mb()-free reader. See srcu_read_lock_fast() and srcu_down_read() for more information.
The same srcu_struct may be used concurrently by srcu_down_read_fast() and srcu_read_lock_fast(). However, the same definition/initialization requirements called out for srcu_read_lock_fast() apply.
- int srcu_read_lock_nmisafe(struct srcu_struct *ssp)
register a new reader for an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to register the new reader.
Description
Enter an SRCU read-side critical section, but in an NMI-safe manner. See srcu_read_lock() for more information.
If srcu_read_lock_nmisafe() is ever used on an srcu_struct structure, then none of the other flavors may be used, whether before, during, or after.
- int srcu_down_read(struct srcu_struct *ssp)¶
register a new reader for an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to register the new reader.
Description
Enter a semaphore-like SRCU read-side critical section. Note that SRCU read-side critical sections may be nested. However, it is illegal to call anything that waits on an SRCU grace period for the same srcu_struct, whether directly or indirectly. Please note that one way to indirectly wait on an SRCU grace period is to acquire a mutex that is held elsewhere while calling synchronize_srcu() or synchronize_srcu_expedited(). But if you want lockdep to help you keep this stuff straight, you should instead use srcu_read_lock().
The semaphore-like nature of srcu_down_read() means that the matching srcu_up_read() can be invoked from some other context, for example, from some other task or from an irq handler. However, neither srcu_down_read() nor srcu_up_read() may be invoked from an NMI handler.
Calls to srcu_down_read() may be nested, similar to the manner in which calls to down_read() may be nested. The same srcu_struct may be used concurrently by srcu_down_read() and srcu_read_lock().
- void srcu_read_unlock(struct srcu_struct *ssp, int idx)¶
unregister an old reader from an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to unregister the old reader.
int idx: return value from the corresponding srcu_read_lock().
Description
Exit an SRCU read-side critical section.
- void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)¶
unregister an old reader from an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to unregister the old reader.
struct srcu_ctr __percpu *scp: return value from the corresponding srcu_read_lock_fast().
Description
Exit a light-weight SRCU read-side critical section.
- void srcu_read_unlock_fast_updown(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)¶
unregister an old reader from an SRCU-fast-updown structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to unregister the old reader.
struct srcu_ctr __percpu *scp: return value from the corresponding srcu_read_lock_fast_updown().
Description
Exit an SRCU-fast-updown read-side critical section.
- void srcu_up_read_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)¶
unregister an old reader from an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to unregister the old reader.
struct srcu_ctr __percpu *scp: return value from the corresponding srcu_down_read_fast().
Description
Exit an SRCU read-side critical section, but not necessarily from the same context as the matching srcu_down_read_fast().
- void srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)¶
unregister an old reader from an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to unregister the old reader.
int idx: return value from the corresponding srcu_read_lock_nmisafe().
Description
Exit an SRCU read-side critical section, but in an NMI-safe manner.
- void srcu_up_read(struct srcu_struct *ssp, int idx)¶
unregister an old reader from an SRCU-protected structure.
Parameters
struct srcu_struct *ssp: srcu_struct in which to unregister the old reader.
int idx: return value from the corresponding srcu_down_read().
Description
Exit an SRCU read-side critical section, but not necessarily from the same context as the matching srcu_down_read().
- void smp_mb__after_srcu_read_unlock(void)¶
ensure full ordering after srcu_read_unlock
Parameters
void: no arguments
Description
Converts the preceding srcu_read_unlock into a two-way memory barrier.
Call this after srcu_read_unlock, to guarantee that all memory operations that occur after smp_mb__after_srcu_read_unlock will appear to happen after the preceding srcu_read_unlock.
- void smp_mb__after_srcu_read_lock(void)¶
ensure full ordering after srcu_read_lock
Parameters
void: no arguments
Description
Converts the preceding srcu_read_lock into a two-way memory barrier.
Call this after srcu_read_lock, to guarantee that all memory operations that occur after smp_mb__after_srcu_read_lock will appear to happen after the preceding srcu_read_lock.
- int init_srcu_struct(struct srcu_struct *ssp)¶
initialize a sleep-RCU structure
Parameters
struct srcu_struct *ssp: structure to initialize.
Description
Use this in place of DEFINE_SRCU() and DEFINE_STATIC_SRCU() for non-static srcu_struct structures that are to be passed to srcu_read_lock(), srcu_read_lock_nmisafe(), and friends. It is necessary to invoke this on a given srcu_struct before passing that srcu_struct to any other function. Each srcu_struct represents a separate domain of SRCU protection.
- int init_srcu_struct_fast(struct srcu_struct *ssp)¶
initialize a fast-reader sleep-RCU structure
Parameters
struct srcu_struct *ssp: structure to initialize.
Description
Use this in place of DEFINE_SRCU_FAST() and DEFINE_STATIC_SRCU_FAST() for non-static srcu_struct structures that are to be passed to srcu_read_lock_fast() and friends. It is necessary to invoke this on a given srcu_struct before passing that srcu_struct to any other function. Each srcu_struct represents a separate domain of SRCU protection.
- int init_srcu_struct_fast_updown(struct srcu_struct *ssp)¶
initialize a fast-reader up/down sleep-RCU structure
Parameters
struct srcu_struct *ssp: structure to initialize.
Description
Use this function in place of DEFINE_SRCU_FAST_UPDOWN() and DEFINE_STATIC_SRCU_FAST_UPDOWN() for non-static srcu_struct structures that are to be passed to srcu_read_lock_fast_updown(), srcu_down_read_fast(), and friends. It is necessary to invoke this on a given srcu_struct before passing that srcu_struct to any other function. Each srcu_struct represents a separate domain of SRCU protection.
- bool srcu_readers_active(struct srcu_struct *ssp)¶
returns true if there are readers, and false otherwise
Parameters
struct srcu_struct *ssp: the srcu_struct whose active readers (those holding srcu_read_lock()) are to be counted.
Description
Note that this is not an atomic primitive, and can therefore suffer severe errors when invoked on an active srcu_struct. That said, it can be useful as an error check at cleanup time.
- void cleanup_srcu_struct(struct srcu_struct *ssp)¶
deconstruct a sleep-RCU structure
Parameters
struct srcu_struct *ssp: structure to clean up.
Description
You must invoke this after you are finished using a given srcu_struct that was initialized via init_srcu_struct(); otherwise, you leak memory.
- void call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp, rcu_callback_t func)¶
Queue a callback for invocation after an SRCU grace period
Parameters
struct srcu_struct *ssp: srcu_struct on which to queue the callback.
struct rcu_head *rhp: structure to be used for queueing the SRCU callback.
rcu_callback_t func: function to be invoked after the SRCU grace period.
Description
The callback function will be invoked some time after a full SRCU grace period elapses, in other words after all pre-existing SRCU read-side critical sections have completed. However, the callback function might well execute concurrently with other SRCU read-side critical sections that started after call_srcu() was invoked. SRCU read-side critical sections are delimited by srcu_read_lock() and srcu_read_unlock(), and may be nested.
The callback will be invoked from process context, but with bh disabled. The callback function must therefore be fast and must not block.
See the description of call_rcu() for more detailed information on memory ordering guarantees.
- void synchronize_srcu_expedited(struct srcu_struct *ssp)¶
Brute-force SRCU grace period
Parameters
struct srcu_struct *ssp: srcu_struct with which to synchronize.
Description
Wait for an SRCU grace period to elapse, but be more aggressive about spinning rather than blocking when waiting.
Note that synchronize_srcu_expedited() has the same deadlock and memory-ordering properties as does synchronize_srcu().
- void synchronize_srcu(struct srcu_struct *ssp)¶
wait for prior SRCU read-side critical-section completion
Parameters
struct srcu_struct *ssp: srcu_struct with which to synchronize.
Description
Wait for the counts of both indexes to drain to zero. To avoid possible starvation of synchronize_srcu(), it first waits for the count of index !(ssp->srcu_ctrp - ssp->sda->srcu_ctrs[0]) to drain to zero, then flips ->srcu_ctrp and waits for the count of the other index.
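As a rough illustration of the two-index scheme described above, here is a hypothetical single-threaded toy model in userspace C. It is not the kernel implementation: struct toy_srcu and the toy_* functions are invented, there are no per-CPU counters or memory barriers, and "waiting" is reduced to checking whether an index's counter has drained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of SRCU's two reader counters (not kernel code). */
struct toy_srcu {
	int ctrs[2]; /* readers currently counted against each index */
	int cur;     /* index new readers use; models ->srcu_ctrp */
};

/* Models srcu_read_lock(): count the reader and return its index. */
static int toy_read_lock(struct toy_srcu *s)
{
	int idx = s->cur;

	s->ctrs[idx]++;
	return idx;
}

/* Models srcu_read_unlock(): the caller passes back the index it got. */
static void toy_read_unlock(struct toy_srcu *s, int idx)
{
	assert(s->ctrs[idx] > 0);
	s->ctrs[idx]--;
}

/* Models the flip: new readers move to the other index, so the old
 * index can only shrink from now on. Returns the old index. */
static int toy_flip(struct toy_srcu *s)
{
	int old = s->cur;

	s->cur = !old;
	return old;
}

/* A real synchronize_srcu() would block until this returns true. */
static bool toy_drained(const struct toy_srcu *s, int idx)
{
	return s->ctrs[idx] == 0;
}

/* Models srcu_readers_active(): any reader on either index. */
static bool toy_readers_active(const struct toy_srcu *s)
{
	return s->ctrs[0] + s->ctrs[1] != 0;
}
```

Because the flip steers new readers to the other index, a steady stream of newly arriving readers cannot keep the old index from draining, which is the starvation the two-step wait is meant to avoid.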
Can block; must be called from process context.
Note that it is illegal to call synchronize_srcu() from the corresponding SRCU read-side critical section; doing so will result in deadlock. However, it is perfectly legal to call synchronize_srcu() on one srcu_struct from some other srcu_struct’s read-side critical section, as long as the resulting graph of srcu_structs is acyclic.
There are memory-ordering constraints implied by synchronize_srcu(). On systems with more than one CPU, when synchronize_srcu() returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last corresponding SRCU read-side critical section whose beginning preceded the call to synchronize_srcu(). In addition, each CPU having an SRCU read-side critical section that extends beyond the return from synchronize_srcu() is guaranteed to have executed a full memory barrier after the beginning of synchronize_srcu() and before the beginning of that SRCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.
Furthermore, if CPU A invoked synchronize_srcu(), which returned to its caller on CPU B, then both CPU A and CPU B are guaranteed to have executed a full memory barrier during the execution of synchronize_srcu(). This guarantee applies even if CPU A and CPU B are the same CPU, but again only if the system has more than one CPU.
Of course, these memory-ordering guarantees apply only when synchronize_srcu(), srcu_read_lock(), and srcu_read_unlock() are passed the same srcu_struct structure.
Implementation of these memory-ordering guarantees is similar to that of synchronize_rcu().
If SRCU is likely idle as determined by srcu_should_expedite(), expedite the first request. This semantic was provided by Classic SRCU, and is relied upon by its users, so Tree SRCU must also provide it. Note that detecting idleness is heuristic and subject to both false positives and false negatives.
- unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)¶
Provide an end-of-grace-period cookie
Parameters
struct srcu_struct *ssp: srcu_struct to provide cookie for.
Description
This function returns a cookie that can be passed to poll_state_synchronize_srcu(), which will return true if a full grace period has elapsed in the meantime. It is the caller’s responsibility to make sure that the grace period happens, for example, by invoking call_srcu() after return from get_state_synchronize_srcu().
- unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)¶
Provide cookie and start grace period
Parameters
struct srcu_struct *ssp: srcu_struct to provide cookie for.
Description
This function returns a cookie that can be passed to poll_state_synchronize_srcu(), which will return true if a full grace period has elapsed in the meantime. Unlike get_state_synchronize_srcu(), this function also ensures that any needed SRCU grace period will be started. This convenience does come at a cost in terms of CPU overhead.
- bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)¶
Has cookie’s grace period ended?
Parameters
struct srcu_struct *ssp: srcu_struct to provide cookie for.
unsigned long cookie: return value from get_state_synchronize_srcu() or start_poll_synchronize_srcu().
Description
This function takes the cookie that was returned from either get_state_synchronize_srcu() or start_poll_synchronize_srcu(), and returns true if an SRCU grace period elapsed since the time that the cookie was created.
Because cookies are finite in size, wrapping/overflow is possible. This is more pronounced on 32-bit systems where cookies are 32 bits, where in theory wrapping could happen in about 14 hours assuming 25-microsecond expedited SRCU grace periods. However, a more likely overflow lower bound is on the order of 24 days in the case of one-millisecond SRCU grace periods. Of course, wrapping in a 64-bit system requires geologic timespans, as in more than seven million years even for expedited SRCU grace periods.
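The wrap horizons quoted above can be reproduced with a little arithmetic. This sketch assumes (our reading, not something the text states) that a cookie comparison stays valid for half of the counter's range, that is, 2^31 grace periods for a 32-bit cookie and 2^63 for a 64-bit one, and it ignores any counter bits a real implementation might reserve for other purposes. The helper name wrap_horizon_seconds() is hypothetical.

```c
#include <assert.h>

#define SECS_PER_HOUR 3600.0
#define SECS_PER_DAY  86400.0
#define SECS_PER_YEAR (365.25 * SECS_PER_DAY)

/* Hypothetical helper: seconds of headroom before a cookie comparison can
 * be fooled, ASSUMING comparisons are valid for 2^(bits-1) grace periods
 * of gp_seconds each. bits must be between 1 and 64. */
static double wrap_horizon_seconds(unsigned int bits, double gp_seconds)
{
	assert(bits >= 1 && bits <= 64);
	return (double)(1ULL << (bits - 1)) * gp_seconds;
}
```

Under these assumptions, wrap_horizon_seconds(32, 25e-6) / SECS_PER_HOUR is roughly 14.9 hours, wrap_horizon_seconds(32, 1e-3) / SECS_PER_DAY roughly 24.9 days, and wrap_horizon_seconds(64, 25e-6) / SECS_PER_YEAR roughly 7.3 million years, consistent with the figures in the text.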
Wrapping/overflow is much more of an issue for CONFIG_SMP=n systems that also have CONFIG_PREEMPTION=n, which selects Tiny SRCU. This uses a 16-bit cookie, which rcutorture routinely wraps in a matter of a few minutes. If this proves to be a problem, this counter will be expanded to the same size as for Tree SRCU.
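To see why counter width matters, here is a hypothetical sketch of a wrap-tolerant cookie check on a 16-bit counter, in the spirit of the kernel's ULONG_CMP_GE() comparison (which operates on unsigned long). The names u16_cmp_ge() and toy_poll_state() are invented, and the "completed" value is a stand-in for whatever grace-period sequence number a real implementation maintains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-bit analogue of ULONG_CMP_GE(): true if a is at or
 * "after" b, measuring the distance modulo 2^16. Only meaningful while
 * the two values are less than 2^15 apart. */
static bool u16_cmp_ge(uint16_t a, uint16_t b)
{
	return (uint16_t)(a - b) <= UINT16_MAX / 2;
}

/* Toy poll: the cookie names the counter value at which the needed
 * grace period will have completed. */
static bool toy_poll_state(uint16_t completed, uint16_t cookie)
{
	return u16_cmp_ge(completed, cookie);
}
```

The comparison survives the counter wrapping from 65535 to 0, but once the counter has advanced more than half its range past a cookie, a grace period that really did complete is reported as still pending. With only 2^15 steps of headroom, a busy 16-bit counter gets there quickly.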
- void srcu_barrier(struct srcu_struct *ssp)¶
Wait until all in-flight call_srcu() callbacks complete.
Parameters
struct srcu_struct *ssp: srcu_struct on which to wait for in-flight callbacks.
- void srcu_expedite_current(struct srcu_struct *ssp)¶
Expedite the current SRCU grace period
Parameters
struct srcu_struct *ssp: srcu_struct to expedite.
Description
Cause the current SRCU grace period to become expedited. The grace period following the current one might also be expedited. If there is no current grace period, one might be created. If the current grace period is currently sleeping, that sleep will complete before expediting will take effect.
- unsigned long srcu_batches_completed(struct srcu_struct *ssp)¶
return batches completed.
Parameters
struct srcu_struct *ssp: srcu_struct on which to report batch completion.
Description
Report the number of batches, correlated with, but not necessarily precisely the same as, the number of grace periods that have elapsed.
- void hlist_bl_del_rcu(struct hlist_bl_node *n)¶
deletes entry from hash list without re-initialization
Parameters
struct hlist_bl_node *n: the element to delete from the hash list.
Note
hlist_bl_unhashed() on entry does not return true after this; the entry is in an undefined state. It is useful for RCU-based lockfree traversal.
In particular, it means that we cannot poison the forward pointers that may still be used for walking the hash list.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_bl_add_head_rcu() or hlist_bl_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_bl_for_each_entry_rcu().
- void hlist_bl_add_head_rcu(struct hlist_bl_node *n, struct hlist_bl_head *h)¶
Parameters
struct hlist_bl_node *n: the element to add to the hash list.
struct hlist_bl_head *h: the list to add to.
Description
Adds the specified element to the specified hlist_bl, while permitting racing traversals.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_bl_add_head_rcu() or hlist_bl_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_bl_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock().
- hlist_bl_for_each_entry_rcu¶
hlist_bl_for_each_entry_rcu(tpos, pos, head, member)
iterate over an RCU list of the given type
Parameters
tpos: the type * to use as a loop cursor.
pos: the struct hlist_bl_node to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_bl_node within the struct.
- list_for_each_rcu¶
list_for_each_rcu(pos, head)
Iterate over a list in an RCU-safe fashion
Parameters
pos: the struct list_head to use as a loop cursor.
head: the head for your list.
- list_tail_rcu¶
list_tail_rcu(head)
returns the prev pointer of the head of the list
Parameters
head: the head of the list
Note
This should only be used with the list header, and even then only if list_del() and similar primitives are not also used on the list header.
- void list_add_rcu(struct list_head *new, struct list_head *head)¶
add a new entry to an RCU-protected list
Parameters
struct list_head *new: new entry to be added
struct list_head *head: list head to add it after
Description
Insert a new entry after the specified head. This is good for implementing stacks.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_add_rcu() or list_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu().
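The publication step that lets list_add_rcu() run concurrently with readers can be sketched in userspace C11. This is a hypothetical model, not the kernel code: struct node and the sketch_* and node_* functions are invented, a release store stands in for rcu_assign_pointer(), and readers use an acquire load, which is stronger than what rcu_dereference() needs. Updaters are assumed to be serialized by an external lock, as the text requires.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct list_head. Readers only follow
 * ->next, so only ->next is atomic in this sketch. */
struct node {
	_Atomic(struct node *) next;
	struct node *prev;
};

static void node_init_head(struct node *h)
{
	atomic_init(&h->next, h);
	h->prev = h;
}

/* Link new between prev and next. Caller holds the updaters' lock. */
static void sketch_add_between(struct node *new, struct node *prev,
			       struct node *next)
{
	/* new is not yet reachable, so plain initialization is fine. */
	atomic_init(&new->next, next);
	new->prev = prev;
	/* Publish: this release store plays the role of rcu_assign_pointer(). */
	atomic_store_explicit(&prev->next, new, memory_order_release);
	next->prev = new;
}

/* Like list_add_rcu(): insert right after head, giving stack (LIFO) order. */
static void sketch_add_rcu(struct node *new, struct node *head)
{
	struct node *first = atomic_load_explicit(&head->next,
						  memory_order_relaxed);

	sketch_add_between(new, head, first);
}

/* Reader step: acquire is stronger than rcu_dereference() requires. */
static struct node *sketch_next(struct node *n)
{
	return atomic_load_explicit(&n->next, memory_order_acquire);
}
```

The design point is ordering: new->next and new->prev are written before the single release store that makes new reachable, so a reader can never follow a pointer into a node whose ->next is still uninitialized.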
- void list_add_tail_rcu(struct list_head *new, struct list_head *head)¶
add a new entry to an RCU-protected list
Parameters
struct list_head *new: new entry to be added
struct list_head *head: list head to add it before
Description
Insert a new entry before the specified head. This is useful for implementing queues.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_add_tail_rcu() or list_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu().
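Linking the new entry just before the head, that is, between head->prev and the head, is what yields queue (FIFO) order. Below is a hypothetical userspace C11 sketch: struct item and the helper names are invented, a release store stands in for rcu_assign_pointer(), readers use acquire loads, and a single updater (or an external lock) is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node with a payload; readers only follow ->next. */
struct item {
	_Atomic(struct item *) next;
	struct item *prev;
	int val;
};

static void item_init_head(struct item *h)
{
	atomic_init(&h->next, h);
	h->prev = h;
}

/* Sketch of the list_add_tail_rcu() idea: link new between head->prev
 * and head. new is fully set up before the release store publishes it. */
static void sketch_add_tail_rcu(struct item *new, struct item *head)
{
	struct item *prev = head->prev;

	atomic_init(&new->next, head);
	new->prev = prev;
	atomic_store_explicit(&prev->next, new, memory_order_release);
	head->prev = new;
}

/* Reader: copy up to max payloads in list order; returns how many. */
static size_t sketch_collect(struct item *head, int *out, size_t max)
{
	size_t n = 0;
	struct item *p = atomic_load_explicit(&head->next, memory_order_acquire);

	while (p != head && n < max) {
		out[n++] = p->val;
		p = atomic_load_explicit(&p->next, memory_order_acquire);
	}
	return n;
}
```

Adding items 1, 2 and 3 in that order and then collecting them yields 1, 2, 3: the oldest entry is first, as expected of a queue.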
- void list_del_rcu(struct list_head *entry)¶
deletes entry from list without re-initialization
Parameters
struct list_head *entry: the element to delete from the list.
Note
list_empty() on entry does not return true after this; the entry is in an undefined state. It is useful for RCU-based lockfree traversal.
In particular, it means that we cannot poison the forward pointers that may still be used for walking the list.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_del_rcu() or list_add_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu().
Note that the caller is not permitted to immediately free the newly deleted entry. Instead, either synchronize_rcu() or call_rcu() must be used to defer freeing until an RCU grace period has elapsed.
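The pointer states left behind by list_del_rcu() can be illustrated with a small, hypothetical single-threaded sketch. struct lnode, the helper names, and SKETCH_POISON (an arbitrary stand-in for the kernel's LIST_POISON2) are invented; no concurrency or memory ordering is modeled, only which pointers are changed and which are left alone.

```c
#include <assert.h>
#include <stdint.h>

struct lnode {
	struct lnode *next;
	struct lnode *prev;
};

/* Hypothetical poison value standing in for the kernel's LIST_POISON2. */
#define SKETCH_POISON ((struct lnode *)(uintptr_t)0xdead0122u)

static void lnode_init_head(struct lnode *h)
{
	h->next = h;
	h->prev = h;
}

static void lnode_add_tail(struct lnode *n, struct lnode *head)
{
	n->next = head;
	n->prev = head->prev;
	head->prev->next = n;
	head->prev = n;
}

/* Sketch of list_del_rcu(): unlink entry from its neighbours, poison
 * only ->prev, and keep ->next intact so that a reader already standing
 * on entry can still walk forward to the rest of the list. */
static void sketch_del_rcu(struct lnode *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->prev = SKETCH_POISON;
}
```

After deleting b from a list a, b, c, new traversals go straight from a to c, while a reader that was paused on b still reaches c through b->next.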
- void list_bidir_del_rcu(struct list_head *entry)¶
deletes entry from list without re-initialization
Parameters
struct list_head *entry: the element to delete from the list.
Description
In contrast to list_del_rcu(), this function doesn’t poison the prev pointer, thus allowing backwards traversal via list_bidir_prev_rcu().
Note
list_empty() on entry does not return true after this because the entry is in a special undefined state that permits RCU-based lockfree reverse traversal. In particular, this means that we cannot poison the forward and backward pointers that may still be used for walking the list.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_bidir_del_rcu() or list_add_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu().
Note that list_del_rcu() and list_bidir_del_rcu() must not be used on the same list.
Note that the caller is not permitted to immediately free the newly deleted entry. Instead, either synchronize_rcu() or call_rcu() must be used to defer freeing until an RCU grace period has elapsed.
- void hlist_del_init_rcu(struct hlist_node *n)¶
deletes entry from hash list with re-initialization
Parameters
struct hlist_node *n: the element to delete from the hash list.
Note
hlist_unhashed() on the node returns true after this. It is useful for RCU-based read lockfree traversal if the writer side must know whether the list entry is still hashed or already unhashed.
In particular, it means that we cannot poison the forward pointers that may still be used for walking the hash list, and we can only zero the pprev pointer so that hlist_unhashed() will return true after this.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu() or hlist_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu().
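A hypothetical single-threaded sketch of the pointer updates described above: the node is unlinked through its ->pprev, its ->next is left untouched for readers, and only ->pprev is zeroed. As with the kernel's hlist_unhashed(), "unhashed" here simply means ->pprev is NULL. The struct and function names are invented, and no memory ordering is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hnode {
	struct hnode *next;
	struct hnode **pprev; /* address of the pointer that points at us */
};

struct hhead {
	struct hnode *first;
};

static bool sketch_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

/* Setup helper: push n at the front of the chain. */
static void sketch_add_head(struct hnode *n, struct hhead *h)
{
	struct hnode *first = h->first;

	n->next = first;
	n->pprev = &h->first;
	h->first = n;
	if (first)
		first->pprev = &n->next;
}

/* Sketch of hlist_del_init_rcu(): unlink n, keep n->next for readers
 * that may still be on n, and clear ->pprev so the updater side can
 * tell the node is no longer hashed. Calling it twice is harmless. */
static void sketch_del_init_rcu(struct hnode *n)
{
	if (!sketch_unhashed(n)) {
		struct hnode *next = n->next;

		*n->pprev = next;
		if (next)
			next->pprev = n->pprev;
		n->pprev = NULL;
	}
}
```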
- void list_replace_rcu(struct list_head *old, struct list_head *new)¶
replace old entry by new one
Parameters
struct list_head *old: the element to be replaced
struct list_head *new: the new element to insert
Description
The old entry will be replaced with the new entry atomically from the perspective of concurrent readers. It is the caller’s responsibility to synchronize with concurrent updaters, if any.
Note
old should not be empty.
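One way to see why the replacement looks atomic to readers is a hypothetical userspace C11 sketch: new is filled in completely, then a single release store into the predecessor's ->next (standing in for rcu_assign_pointer()) swaps it in. struct rnode, the helpers, and SKETCH_POISON (an arbitrary stand-in for LIST_POISON2) are invented, and updaters are assumed to be serialized by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical list node; readers only follow ->next. */
struct rnode {
	_Atomic(struct rnode *) next;
	struct rnode *prev;
};

/* Hypothetical poison value standing in for the kernel's LIST_POISON2. */
#define SKETCH_POISON ((struct rnode *)(uintptr_t)0xdead0122u)

static void rnode_init_head(struct rnode *h)
{
	atomic_init(&h->next, h);
	h->prev = h;
}

/* Setup helper: append n before head. */
static void rnode_add_tail(struct rnode *n, struct rnode *head)
{
	struct rnode *prev = head->prev;

	atomic_init(&n->next, head);
	n->prev = prev;
	atomic_store_explicit(&prev->next, n, memory_order_release);
	head->prev = n;
}

/* Sketch of list_replace_rcu(): fully initialize new, then publish it
 * with one release store. A reader sees either old or new at that
 * position, never a half-linked node. */
static void sketch_replace_rcu(struct rnode *old, struct rnode *new)
{
	struct rnode *next = atomic_load_explicit(&old->next,
						  memory_order_relaxed);

	atomic_init(&new->next, next);
	new->prev = old->prev;
	atomic_store_explicit(&new->prev->next, new, memory_order_release);
	next->prev = new;
	old->prev = SKETCH_POISON; /* old->next stays valid for in-flight readers */
}
```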
- void __list_splice_init_rcu(struct list_head *list, struct list_head *prev, struct list_head *next, void (*sync)(void))¶
join an RCU-protected list into an existing list.
Parameters
struct list_head *list: the RCU-protected list to splice
struct list_head *prev: points to the last element of the existing list
struct list_head *next: points to the first element of the existing list
void (*sync)(void): synchronize_rcu, synchronize_rcu_expedited, ...
Description
The list pointed to by prev and next can be RCU-read traversed concurrently with this function.
Note that this function blocks.
Important note: the caller must take whatever action is necessary to prevent any other updates to the existing list. In principle, it is possible to modify the list as soon as sync() begins execution. If this sort of thing becomes necessary, an alternative version based on call_rcu() could be created. But only if -really- needed: there is no shortage of RCU API members.
- void list_splice_init_rcu(struct list_head *list, struct list_head *head, void (*sync)(void))¶
splice an RCU-protected list into an existing list, designed for stacks.
Parameters
struct list_head *list: the RCU-protected list to splice
struct list_head *head: the place in the existing list to splice the first list into
void (*sync)(void): synchronize_rcu, synchronize_rcu_expedited, ...
- void list_splice_tail_init_rcu(struct list_head *list, struct list_head *head, void (*sync)(void))¶
splice an RCU-protected list into an existing list, designed for queues.
Parameters
struct list_head *list: the RCU-protected list to splice
struct list_head *head: the place in the existing list to splice the first list into
void (*sync)(void): synchronize_rcu, synchronize_rcu_expedited, ...
- list_entry_rcu¶
list_entry_rcu(ptr, type, member)
get the struct for this entry
Parameters
ptr: the struct list_head pointer.
type: the type of the struct this is embedded in.
member: the name of the list_head within the struct.
Description
This primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as it’s guarded by rcu_read_lock().
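The "get the struct for this entry" step is the container_of() idiom: subtract the offset of the embedded list member from the member's address. Here is a userspace sketch; sketch_container_of() omits the type checking of the kernel's container_of(), and the kernel's list_entry_rcu() additionally fetches ptr with READ_ONCE(), which this single-threaded example leaves out. struct widget and struct list_link are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of container_of(), without the kernel's type check. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct list_link {
	struct list_link *next, *prev;
};

/* Hypothetical structure with an embedded list node that is deliberately
 * not the first member, so the offset subtraction is visible. */
struct widget {
	int id;
	struct list_link link;
	const char *name;
};
```

Given a pointer to w.link, sketch_container_of(p, struct widget, link) recovers &w, so code walking a list of struct list_link pointers can get back to the enclosing objects.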
- list_first_or_null_rcu¶
list_first_or_null_rcu(ptr, type, member)
get the first element from a list
Parameters
ptr: the list head to take the element from.
type: the type of the struct this is embedded in.
member: the name of the list_head within the struct.
Description
Note that if the list is empty, it returns NULL.
This primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as it’s guarded by rcu_read_lock().
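The safe way to implement "first element or NULL" under concurrent updates is to read head->next exactly once and use that single value for both the emptiness test and the result. Checking list_empty() and then reading head->next again could see a non-empty list, lose a race with a deletion, and hand back the head itself. A hypothetical userspace C11 sketch of that pattern follows; struct flink, struct job, and the function names are invented, and the acquire load stands in for the kernel's READ_ONCE()/rcu_dereference() pairing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical list link; readers only follow ->next. */
struct flink {
	_Atomic(struct flink *) next;
	struct flink *prev;
};

struct job {
	int id;
	struct flink link;
};

/* One load of head->next decides both "empty?" and "which entry?". */
static struct job *sketch_first_job_or_null(struct flink *head)
{
	struct flink *next = atomic_load_explicit(&head->next,
						  memory_order_acquire);

	return next != head ? sketch_container_of(next, struct job, link) : NULL;
}
```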
- list_next_or_null_rcu¶
list_next_or_null_rcu(head, ptr, type, member)
get the next element from a list
Parameters
head: the head for the list.
ptr: the list head to take the next element from.
type: the type of the struct this is embedded in.
member: the name of the list_head within the struct.
Description
Note that if ptr is at the end of the list, NULL is returned.
This primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as it’s guarded by rcu_read_lock().
- list_for_each_entry_rcu¶
list_for_each_entry_rcu(pos, head, member, cond...)
iterate over an RCU list of the given type
Parameters
pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.
cond...: optional lockdep expression if called from non-RCU protection.
Description
This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as the traversal is guarded by rcu_read_lock().
- list_for_each_entry_srcu¶
list_for_each_entry_srcu(pos, head, member, cond)
iterate over an RCU list of the given type
Parameters
pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.
cond: lockdep expression for the lock required to traverse the list.
Description
This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as the traversal is guarded by srcu_read_lock(). The lockdep expression srcu_read_lock_held() can be passed as the cond argument from the read side.
- list_entry_lockless¶
list_entry_lockless(ptr, type, member)
get the struct for this entry
Parameters
ptr: the struct list_head pointer.
type: the type of the struct this is embedded in.
member: the name of the list_head within the struct.
Description
This primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu(), but requires some implicit RCU read-side guarding. One example is running within a special exception-time environment where preemption is disabled and where lockdep cannot be invoked. Another example is when items are added to the list, but never deleted.
- list_for_each_entry_lockless¶
list_for_each_entry_lockless(pos, head, member)
iterate over an RCU list of the given type
Parameters
pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.
Description
This primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu(), but requires some implicit RCU read-side guarding. One example is running within a special exception-time environment where preemption is disabled and where lockdep cannot be invoked. Another example is when items are added to the list, but never deleted.
- list_for_each_entry_continue_rcu¶
list_for_each_entry_continue_rcu(pos, head, member)
continue iteration over list of given type
Parameters
pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.
Description
Continue to iterate over a list of the given type, continuing after the current position, which must have been in the list when the RCU read lock was taken. This would typically require either that you obtained the node from a previous walk of the list in the same RCU read-side critical section, or that you held some sort of non-RCU reference (such as a reference count) to keep the node alive and in the list.
This iterator is similar to list_for_each_entry_from_rcu() except that this one starts after the given position and that one starts at the given position.
- list_for_each_entry_from_rcu¶
list_for_each_entry_from_rcu(pos, head, member)
iterate over a list from current point
Parameters
pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.
Description
Iterate over the tail of a list starting from a given position, which must have been in the list when the RCU read lock was taken. This would typically require either that you obtained the node from a previous walk of the list in the same RCU read-side critical section, or that you held some sort of non-RCU reference (such as a reference count) to keep the node alive and in the list.
This iterator is similar to list_for_each_entry_continue_rcu() except that this one starts from the given position and that one starts from the position after the given position.
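The only difference between the "continue" and "from" iterators is the starting node. A hypothetical single-threaded sketch on a circular list with a dummy head makes this concrete; struct cnode and the two helpers are invented and model only the loop bounds, not RCU protection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular singly linked list with a dummy head node. */
struct cnode {
	struct cnode *next;
	int val;
};

/* Mirrors list_for_each_entry_continue_rcu(): start AFTER pos. */
static size_t sketch_continue(const struct cnode *head, const struct cnode *pos,
			      int *out, size_t max)
{
	size_t n = 0;

	for (pos = pos->next; pos != head && n < max; pos = pos->next)
		out[n++] = pos->val;
	return n;
}

/* Mirrors list_for_each_entry_from_rcu(): start AT pos. */
static size_t sketch_from(const struct cnode *head, const struct cnode *pos,
			  int *out, size_t max)
{
	size_t n = 0;

	for (; pos != head && n < max; pos = pos->next)
		out[n++] = pos->val;
	return n;
}
```

On a list holding 1, 2, 3 with pos at the first entry, the "continue" form visits 2 and 3, while the "from" form visits 1, 2 and 3.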
- void hlist_del_rcu(struct hlist_node *n)¶
deletes entry from hash list without re-initialization
Parameters
struct hlist_node *n: the element to delete from the hash list.
Note
hlist_unhashed() on entry does not return true after this; the entry is in an undefined state. It is useful for RCU-based lockfree traversal.
In particular, it means that we cannot poison the forward pointers that may still be used for walking the hash list.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu() or hlist_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu().
- void hlist_replace_rcu(struct hlist_node *old, struct hlist_node *new)¶
replace old entry by new one
Parameters
struct hlist_node *old: the element to be replaced
struct hlist_node *new: the new element to insert
Description
The old entry will be replaced with the new entry atomically from the perspective of concurrent readers. It is the caller’s responsibility to synchronize with concurrent updaters, if any.
- void hlists_swap_heads_rcu(struct hlist_head *left, struct hlist_head *right)¶
swap the lists the hlist heads point to
Parameters
struct hlist_head *left: The hlist head on the left
struct hlist_head *right: The hlist head on the right
Description
The lists start out as [left][node1 ...] and [right][node2 ...], and end up as [left][node2 ...] and [right][node1 ...].
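Swapping two hlist heads takes more than exchanging the ->first pointers: each chain's first node records in ->pprev the address of the pointer that points at it, so those back-pointers must be redirected to the new heads. Here is a hypothetical single-threaded sketch; struct snode, struct shead, and sketch_swap_heads() are invented. The sketch tolerates empty lists via NULL checks (the description above only shows the non-empty case), and a real RCU version would publish the new ->first values with rcu_assign_pointer().

```c
#include <assert.h>
#include <stddef.h>

struct snode {
	struct snode *next;
	struct snode **pprev; /* address of the pointer that points at us */
};

struct shead {
	struct snode *first;
};

/* Exchange the chains, then fix each chain's first ->pprev so it names
 * the head that now points at it. */
static void sketch_swap_heads(struct shead *left, struct shead *right)
{
	struct snode *node1 = left->first;
	struct snode *node2 = right->first;

	left->first = node2;
	right->first = node1;
	if (node2)
		node2->pprev = &left->first;
	if (node1)
		node1->pprev = &right->first;
}
```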
- void hlist_add_head_rcu(struct hlist_node *n, struct hlist_head *h)¶
Parameters
struct hlist_node *n: the element to add to the hash list.
struct hlist_head *h: the list to add to.
Description
Adds the specified element to the specified hlist, while permitting racing traversals.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu() or hlist_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock().
- void hlist_add_tail_rcu(struct hlist_node *n, struct hlist_head *h)¶
Parameters
struct hlist_node *n: the element to add to the hash list.
struct hlist_head *h: the list to add to.
Description
Adds the specified element to the specified hlist, while permitting racing traversals.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu() or hlist_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock().
- void hlist_add_before_rcu(struct hlist_node *n, struct hlist_node *next)¶
Parameters
struct hlist_node *n: the new element to add to the hash list.
struct hlist_node *next: the existing element to add the new element before.
Description
Adds the specified element to the specified hlist before the specified node while permitting racing traversals.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu() or hlist_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs.
- void hlist_add_behind_rcu(struct hlist_node *n, struct hlist_node *prev)¶
Parameters
struct hlist_node *n: the new element to add to the hash list.
struct hlist_node *prev: the existing element to add the new element after.
Description
Adds the specified element to the specified hlist after the specified node while permitting racing traversals.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu() or hlist_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs.
- hlist_for_each_entry_rcu
hlist_for_each_entry_rcu(pos, head, member, cond...)
iterate over rcu list of given type
Parameters
pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_node within the struct.
cond...: optional lockdep expression if called from non-RCU protection.
Description
This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().
- hlist_for_each_entry_srcu
hlist_for_each_entry_srcu(pos, head, member, cond)
iterate over rcu list of given type
Parameters
pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_node within the struct.
cond: lockdep expression for the lock required to traverse the list.
Description
This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by srcu_read_lock(). The lockdep expression srcu_read_lock_held() can be passed as the cond argument from the read side.
- hlist_for_each_entry_rcu_notrace
hlist_for_each_entry_rcu_notrace(pos, head, member)
iterate over rcu list of given type (for tracing)
Parameters
pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_node within the struct.
Description
This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().
This is the same as hlist_for_each_entry_rcu() except that it does not do any RCU debugging or tracing.
- hlist_for_each_entry_rcu_bh
hlist_for_each_entry_rcu_bh(pos, head, member)
iterate over rcu list of given type
Parameters
pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_node within the struct.
Description
This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().
- hlist_for_each_entry_continue_rcu
hlist_for_each_entry_continue_rcu(pos, member)
iterate over a hlist continuing after current point
Parameters
pos: the type * to use as a loop cursor.
member: the name of the hlist_node within the struct.
- hlist_for_each_entry_continue_rcu_bh
hlist_for_each_entry_continue_rcu_bh(pos, member)
iterate over a hlist continuing after current point
Parameters
pos: the type * to use as a loop cursor.
member: the name of the hlist_node within the struct.
- hlist_for_each_entry_from_rcu
hlist_for_each_entry_from_rcu(pos, member)
iterate over a hlist continuing from current point
Parameters
pos: the type * to use as a loop cursor.
member: the name of the hlist_node within the struct.
- void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
deletes entry from hash list with re-initialization
Parameters
struct hlist_nulls_node *n: the element to delete from the hash list.
Note
hlist_nulls_unhashed() on the node returns true after this. It is useful for RCU-based read lockfree traversal if the writer side must know if the list entry is still hashed or already unhashed.
In particular, it means that we cannot poison the forward pointers that may still be used for walking the hash list and we can only zero the pprev pointer so list_unhashed() will return true after this.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().
- hlist_nulls_first_rcu
hlist_nulls_first_rcu(head)
returns the first element of the hash list.
Parameters
head: the head of the list.
- hlist_nulls_next_rcu
hlist_nulls_next_rcu(node)
returns the element of the list after node.
Parameters
node: element of the list.
- hlist_nulls_pprev_rcu
hlist_nulls_pprev_rcu(node)
returns the dereferenced pprev of node.
Parameters
node: element of the list.
- void hlist_nulls_del_rcu(struct hlist_nulls_node *n)
deletes entry from hash list without re-initialization
Parameters
struct hlist_nulls_node *n: the element to delete from the hash list.
Note
hlist_nulls_unhashed() on entry does not return true after this; the entry is in an undefined state. It is useful for RCU-based lockfree traversal.
In particular, it means that we cannot poison the forward pointers that may still be used for walking the hash list.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().
- void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n, struct hlist_nulls_head *h)
Parameters
struct hlist_nulls_node *n: the element to add to the hash list.
struct hlist_nulls_head *h: the list to add to.
Description
Adds the specified element to the specified hlist_nulls, while permitting racing traversals.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock().
- void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n, struct hlist_nulls_head *h)
Parameters
struct hlist_nulls_node *n: the element to add to the hash list.
struct hlist_nulls_head *h: the list to add to.
Description
Adds the specified element to the specified hlist_nulls, while permitting racing traversals.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock().
- void hlist_nulls_replace_rcu(struct hlist_nulls_node *old, struct hlist_nulls_node *new)
replace an old entry by a new one
Parameters
struct hlist_nulls_node *old: the element to be replaced
struct hlist_nulls_node *new: the new element to insert
Description
Replace the old entry with the new one in an RCU-protected hlist_nulls, while permitting racing traversals.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().
- void hlist_nulls_replace_init_rcu(struct hlist_nulls_node *old, struct hlist_nulls_node *new)
replace an old entry by a new one and initialize the old
Parameters
struct hlist_nulls_node *old: the element to be replaced
struct hlist_nulls_node *new: the new element to insert
Description
Replace the old entry with the new one in an RCU-protected hlist_nulls, while permitting racing traversals, and reinitialize the old entry.
Note
old must be hashed.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().
- hlist_nulls_for_each_entry_rcu
hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)
iterate over rcu list of given type
Parameters
tpos: the type * to use as a loop cursor.
pos: the struct hlist_nulls_node to use as a loop cursor.
head: the head of the list.
member: the name of the hlist_nulls_node within the struct.
Description
The barrier() is needed to make sure the compiler doesn’t cache the first element [1], as this loop can be restarted [2].
[1] Documentation/memory-barriers.txt around line 1533
[2] Using RCU hlist_nulls to protect list and objects around line 146
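The restart mentioned above comes from the nulls marker that terminates each chain: it is an odd value, never a real node address, and its upper bits record which bucket the chain belongs to. A reader that ends on another bucket's marker knows it was carried onto a different chain and must start over. The sketch below is a hypothetical, single-threaded userspace model of that check; struct nnode, nulls_marker() and nulls_lookup() are illustrative names, and it omits the reference-count-and-recheck step that the kernel pattern also needs when objects can be freed and reused.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct hlist_nulls_node. */
struct nnode {
    struct nnode *next;
    int key;
};

/* End-of-chain marker: an odd "pointer" whose upper bits carry a value,
 * here the bucket index. Real struct nnode objects are pointer-aligned,
 * so their low bit is always 0. */
static struct nnode *nulls_marker(uintptr_t value)
{
    return (struct nnode *)(1UL | (value << 1));
}

static bool is_a_nulls(const struct nnode *p)
{
    return ((uintptr_t)p & 1) != 0;
}

static uintptr_t nulls_value(const struct nnode *p)
{
    return (uintptr_t)p >> 1;
}

/* Lookup in bucket 'slot'. If the walk ends on a marker that belongs to a
 * different bucket, a node we passed through was moved (or freed and
 * reused) onto another chain during the walk, so the lookup restarts. */
static struct nnode *nulls_lookup(struct nnode **table, uintptr_t slot,
                                  int key, unsigned *restarts)
{
    struct nnode *p;

begin:
    for (p = table[slot]; !is_a_nulls(p); p = p->next)
        if (p->key == key)
            return p;
    if (nulls_value(p) != slot) {
        ++*restarts;
        goto begin;
    }
    return NULL;                           /* clean miss */
}
```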
- hlist_nulls_for_each_entry_safe
hlist_nulls_for_each_entry_safe(tpos, pos, head, member)
iterate over list of given type safe against removal of list entry
Parameters
tpos: the type * to use as a loop cursor.
pos: the struct hlist_nulls_node to use as a loop cursor.
head: the head of the list.
member: the name of the hlist_nulls_node within the struct.
- bool rcu_sync_is_idle(struct rcu_sync *rsp)
Are readers permitted to use their fastpaths?
Parameters
struct rcu_sync *rsp: Pointer to rcu_sync structure to use for synchronization
Description
Returns true if readers are permitted to use their fastpaths. Must be invoked within some flavor of RCU read-side critical section.
- void rcu_sync_init(struct rcu_sync *rsp)
Initialize an rcu_sync structure
Parameters
struct rcu_sync *rsp: Pointer to rcu_sync structure to be initialized
- void rcu_sync_func(struct rcu_head *rhp)
Callback function managing reader access to fastpath
Parameters
struct rcu_head *rhp: Pointer to rcu_head in rcu_sync structure to use for synchronization
Description
This function is passed to call_rcu() by rcu_sync_enter() and rcu_sync_exit(), so that it is invoked after a grace period following that invocation of enter/exit.
If it is called by rcu_sync_enter(), it signals that all the readers were switched onto the slowpath.
If it is called by rcu_sync_exit(), it takes action based on events that have taken place in the meantime, so that closely spaced rcu_sync_enter() and rcu_sync_exit() pairs need not wait for a grace period.
If another rcu_sync_enter() is invoked before the grace period ended, reset state to allow the next rcu_sync_exit() to let the readers back onto their fastpaths (after a grace period). If both another rcu_sync_enter() and its matching rcu_sync_exit() are invoked before the grace period ended, re-invoke call_rcu() on behalf of that rcu_sync_exit(). Otherwise, set all state back to idle so that readers can again use their fastpaths.
- void rcu_sync_enter(struct rcu_sync *rsp)
Force readers onto slowpath
Parameters
struct rcu_sync *rsp: Pointer to rcu_sync structure to use for synchronization
Description
This function is used by updaters who need readers to make use of a slowpath during the update. After this function returns, all subsequent calls to rcu_sync_is_idle() will return false, which tells readers to stay off their fastpaths. A later call to rcu_sync_exit() re-enables reader fastpaths.
When called in isolation, rcu_sync_enter() must wait for a grace period; however, closely spaced calls to rcu_sync_enter() can optimize away the grace-period wait via a state machine implemented by rcu_sync_enter(), rcu_sync_exit(), and rcu_sync_func().
- void rcu_sync_exit(struct rcu_sync *rsp)
Allow readers back onto fast path after grace period
Parameters
struct rcu_sync *rsp: Pointer to rcu_sync structure to use for synchronization
Description
This function is used by updaters who have completed, and can therefore now allow readers to make use of their fastpaths after a grace period has elapsed. After this grace period has completed, all subsequent calls to rcu_sync_is_idle() will return true, which tells readers that they can once again use their fastpaths.
- void rcu_sync_dtor(struct rcu_sync *rsp)
Clean up an rcu_sync structure
Parameters
struct rcu_sync *rsp: Pointer to rcu_sync structure to be cleaned up
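The rcu_sync_func() description above amounts to a small state machine driven by a count of active updaters. The following is a hypothetical, single-threaded model of it, not kernel/rcu/sync.c: the state names, struct sync_model and the sync_model_* functions are assumptions made for illustration, and grace periods are simulated by calling sync_model_gp_end(), which runs whatever callback is pending in place of call_rcu() completing.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of the rcu_sync state machine. */
enum sync_model_state {
    MODEL_GP_IDLE,        /* readers may use fastpaths */
    MODEL_GP_ENTER,       /* waiting for the first grace period */
    MODEL_GP_PASSED,      /* all readers are on the slowpath */
    MODEL_GP_EXIT,        /* waiting for a grace period after the last exit */
    MODEL_GP_REPLAY       /* an enter/exit pair happened during MODEL_GP_EXIT */
};

struct sync_model {
    enum sync_model_state state;
    int gp_count;         /* number of active updaters */
    bool cb_pending;      /* a simulated call_rcu() callback is queued */
};

static bool sync_model_is_idle(const struct sync_model *s)
{
    return s->state == MODEL_GP_IDLE;
}

/* Plays the role of rcu_sync_func(), run one grace period after queuing. */
static void sync_model_func(struct sync_model *s)
{
    if (s->gp_count > 0) {
        s->state = MODEL_GP_PASSED;       /* an updater is active again */
    } else if (s->state == MODEL_GP_REPLAY) {
        s->state = MODEL_GP_EXIT;         /* wait one more grace period */
        s->cb_pending = true;
    } else {
        s->state = MODEL_GP_IDLE;         /* readers back on fastpaths */
    }
}

/* Simulates the end of a grace period. */
static void sync_model_gp_end(struct sync_model *s)
{
    if (s->cb_pending) {
        s->cb_pending = false;
        sync_model_func(s);
    }
}

static void sync_model_enter(struct sync_model *s)
{
    bool was_idle = (s->state == MODEL_GP_IDLE);

    if (was_idle)
        s->state = MODEL_GP_ENTER;
    s->gp_count++;
    if (was_idle)
        sync_model_func(s);               /* stands in for synchronize_rcu() */
    /* Otherwise the state is already PASSED, EXIT or REPLAY, so readers
     * are already off their fastpaths and no grace-period wait is needed. */
}

static void sync_model_exit(struct sync_model *s)
{
    if (--s->gp_count == 0) {
        if (s->state == MODEL_GP_PASSED) {
            s->state = MODEL_GP_EXIT;
            s->cb_pending = true;         /* models call_rcu(rcu_sync_func) */
        } else if (s->state == MODEL_GP_EXIT) {
            s->state = MODEL_GP_REPLAY;   /* callback already queued */
        }
    }
}
```

Keeping the updater count is what lets a closely spaced enter/exit pair piggyback on the callback that is already pending, at the cost of one extra grace period before readers return to their fastpaths.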
- struct rcu_tasks_percpu
Per-CPU component of definition for a Tasks-RCU-like mechanism.
Definition:
    struct rcu_tasks_percpu {
        struct rcu_segcblist cblist;
        raw_spinlock_t lock;
        unsigned long rtp_jiffies;
        unsigned long rtp_n_lock_retries;
        struct timer_list lazy_timer;
        unsigned int urgent_gp;
        struct work_struct rtp_work;
        struct irq_work rtp_irq_work;
        struct rcu_head barrier_q_head;
        struct list_head rtp_blkd_tasks;
        struct list_head rtp_exit_list;
        int cpu;
        int index;
        struct rcu_tasks *rtpp;
    };
Members
cblist: Callback list.
lock: Lock protecting per-CPU callback list.
rtp_jiffies: Jiffies counter value for statistics.
rtp_n_lock_retries: Rough lock-contention statistic.
lazy_timer: Timer to unlazify callbacks.
urgent_gp: Number of additional non-lazy grace periods.
rtp_work: Work queue for invoking callbacks.
rtp_irq_work: IRQ work queue for deferred wakeups.
barrier_q_head: RCU callback for barrier operation.
rtp_blkd_tasks: List of tasks blocked as readers.
rtp_exit_list: List of tasks in the latter portion of do_exit().
cpu: CPU number corresponding to this entry.
index: Index of this CPU in rtpcp_array of the rcu_tasks structure.
rtpp: Pointer to the rcu_tasks structure.
- struct rcu_tasks
Definition for a Tasks-RCU-like mechanism.
Definition:
    struct rcu_tasks {
        struct rcuwait cbs_wait;
        raw_spinlock_t cbs_gbl_lock;
        struct mutex tasks_gp_mutex;
        int gp_state;
        int gp_sleep;
        int init_fract;
        unsigned long gp_jiffies;
        unsigned long gp_start;
        unsigned long tasks_gp_seq;
        unsigned long n_ipis;
        unsigned long n_ipis_fails;
        struct task_struct *kthread_ptr;
        unsigned long lazy_jiffies;
        rcu_tasks_gp_func_t gp_func;
        pregp_func_t pregp_func;
        pertask_func_t pertask_func;
        postscan_func_t postscan_func;
        holdouts_func_t holdouts_func;
        postgp_func_t postgp_func;
        call_rcu_func_t call_func;
        unsigned int wait_state;
        struct rcu_tasks_percpu __percpu *rtpcpu;
        struct rcu_tasks_percpu **rtpcp_array;
        int percpu_enqueue_shift;
        int percpu_enqueue_lim;
        int percpu_dequeue_lim;
        unsigned long percpu_dequeue_gpseq;
        struct mutex barrier_q_mutex;
        atomic_t barrier_q_count;
        struct completion barrier_q_completion;
        unsigned long barrier_q_seq;
        unsigned long barrier_q_start;
        char *name;
        char *kname;
    };
Members
cbs_wait: RCU wait allowing a new callback to get kthread’s attention.
cbs_gbl_lock: Lock protecting callback list.
tasks_gp_mutex: Mutex protecting grace period, needed during mid-boot dead zone.
gp_state: Grace period’s most recent state transition (debugging).
gp_sleep: Per-grace-period sleep to prevent CPU-bound looping.
init_fract: Initial backoff sleep interval.
gp_jiffies: Time of last gp_state transition.
gp_start: Most recent grace-period start in jiffies.
tasks_gp_seq: Number of grace periods completed since boot in upper bits.
n_ipis: Number of IPIs sent to encourage grace periods to end.
n_ipis_fails: Number of IPI-send failures.
kthread_ptr: This flavor’s grace-period/callback-invocation kthread.
lazy_jiffies: Number of jiffies to allow callbacks to be lazy.
gp_func: This flavor’s grace-period-wait function.
pregp_func: This flavor’s pre-grace-period function (optional).
pertask_func: This flavor’s per-task scan function (optional).
postscan_func: This flavor’s post-task scan function (optional).
holdouts_func: This flavor’s holdout-list scan function (optional).
postgp_func: This flavor’s post-grace-period function (optional).
call_func: This flavor’s call_rcu()-equivalent function.
wait_state: Task state for synchronous grace-period waits (default TASK_UNINTERRUPTIBLE).
rtpcpu: This flavor’s rcu_tasks_percpu structure.
rtpcp_array: Array of pointers to rcu_tasks_percpu structure of CPUs in cpu_possible_mask.
percpu_enqueue_shift: Shift down CPU ID this much when enqueuing callbacks.
percpu_enqueue_lim: Number of per-CPU callback queues in use for enqueuing.
percpu_dequeue_lim: Number of per-CPU callback queues in use for dequeuing.
percpu_dequeue_gpseq: RCU grace-period number to propagate enqueue limit to dequeuers.
barrier_q_mutex: Serialize barrier operations.
barrier_q_count: Number of queues being waited on.
barrier_q_completion: Barrier wait/wakeup mechanism.
barrier_q_seq: Sequence number for barrier operations.
barrier_q_start: Most recent barrier start in jiffies.
name: This flavor’s textual name.
kname: This flavor’s kthread name.
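Two of the members above encode small arithmetic conventions. The sketch below illustrates them under stated assumptions that have not been checked against kernel/rcu/tasks.h: it assumes an enqueuing CPU picks queue cpu >> percpu_enqueue_shift (ignoring any adjustment for CPUs outside cpu_possible_mask), and that tasks_gp_seq follows the rcu_seq convention in which the low two bits hold grace-period state. All helper names are hypothetical.

```c
/* Hypothetical helpers illustrating two rcu_tasks members. */

/* Queue chosen by an enqueuing CPU, assuming index = cpu >> shift.
 * A shift of 0 gives one queue per CPU; a shift at least as large as the
 * bit width of the highest CPU number funnels every CPU onto queue 0. */
static unsigned int queue_for_cpu(unsigned int cpu, int percpu_enqueue_shift)
{
    return cpu >> percpu_enqueue_shift;
}

/* Assumed rcu_seq layout: the low two bits are grace-period state,
 * the remaining upper bits count completed grace periods. */
#define SEQ_MODEL_CTR_SHIFT 2

static unsigned long seq_model_count(unsigned long seq)
{
    return seq >> SEQ_MODEL_CTR_SHIFT;
}

static unsigned long seq_model_state(unsigned long seq)
{
    return seq & ((1UL << SEQ_MODEL_CTR_SHIFT) - 1);
}
```

For example, with a shift of 2, CPUs 0 to 3 share queue 0 and CPUs 4 to 7 share queue 1, trading some lock contention for fewer queues to scan.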
- void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
Queue an RCU callback for invocation after a task-based grace period
Parameters
struct rcu_head *rhp: structure to be used for queueing the RCU updates.
rcu_callback_t func: actual callback function to be invoked after the grace period
Description
The callback function will be invoked some time after a full grace period elapses, in other words after all currently executing RCU read-side critical sections have completed. call_rcu_tasks() assumes that the read-side critical sections end at a voluntary context switch (not a preemption!), cond_resched_tasks_rcu_qs(), entry into idle, or transition to usermode execution. As such, there are no read-side primitives analogous to rcu_read_lock() and rcu_read_unlock() because this primitive is intended to determine that all tasks have passed through a safe state, not so much for data-structure synchronization.
See the description of call_rcu() for more detailed information on memory ordering guarantees.
- void synchronize_rcu_tasks(void)
wait until an rcu-tasks grace period has elapsed.
Parameters
void: no arguments
Description
Control will return to the caller some time after a full rcu-tasks grace period has elapsed, in other words after all currently executing rcu-tasks read-side critical sections have elapsed. These read-side critical sections are delimited by calls to schedule(), cond_resched_tasks_rcu_qs(), idle execution, userspace execution, calls to synchronize_rcu_tasks(), and (in theory, anyway) cond_resched().
This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks() function is not (yet) intended for heavy use from multiple CPUs.
See the description of synchronize_rcu() for more detailed information on memory ordering guarantees.
- void rcu_barrier_tasks(void)
Wait for in-flight call_rcu_tasks() callbacks.
Parameters
void: no arguments
Description
Although the current implementation is guaranteed to wait, it is not obligated to, for example, if there are no pending callbacks.
- void synchronize_rcu_tasks_rude(void)
wait for a rude rcu-tasks grace period
Parameters
void: no arguments
Description
Control will return to the caller some time after a rude rcu-tasks grace period has elapsed, in other words after all currently executing rcu-tasks read-side critical sections have elapsed. These read-side critical sections are delimited by calls to schedule(), cond_resched_tasks_rcu_qs(), userspace execution (which is a schedulable context), and (in theory, anyway) cond_resched().
This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks_rude() function is not (yet) intended for heavy use from multiple CPUs.
See the description of synchronize_rcu() for more detailed information on memory ordering guarantees.
- void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func)
Queue a callback for invocation after a trace task-based grace period
Parameters
struct rcu_head *rhp: structure to be used for queueing the RCU updates.
rcu_callback_t func: actual callback function to be invoked after the grace period
Description
The callback function will be invoked some time after a trace rcu-tasks grace period elapses, in other words after all currently executing trace rcu-tasks read-side critical sections have completed. These read-side critical sections are delimited by calls to rcu_read_lock_trace() and rcu_read_unlock_trace().
See the description of call_rcu() for more detailed information on memory ordering guarantees.
- void synchronize_rcu_tasks_trace(void)
wait for a trace rcu-tasks grace period
Parameters
void: no arguments
Description
Control will return to the caller some time after a trace rcu-tasks grace period has elapsed, in other words after all currently executing trace rcu-tasks read-side critical sections have elapsed. These read-side critical sections are delimited by calls to rcu_read_lock_trace() and rcu_read_unlock_trace().
This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks_trace() function is not (yet) intended for heavy use from multiple CPUs.
See the description of synchronize_rcu() for more detailed information on memory ordering guarantees.
- void rcu_barrier_tasks_trace(void)
Wait for in-flight call_rcu_tasks_trace() callbacks.
Parameters
void: no arguments
Description
Although the current implementation is guaranteed to wait, it is not obligated to, for example, if there are no pending callbacks.
- void rcu_cpu_stall_reset(void)
restart stall-warning timeout for current grace period
Parameters
void: no arguments
Description
To perform the reset request from the caller, disable stall detection until 3 fqs loops have passed. This is required to ensure a fresh jiffies is loaded. It should be safe to do from the fqs loop as enough timer interrupts and context switches should have passed.
The caller must disable hard irqs.
- int rcu_stall_chain_notifier_register(struct notifier_block *n)
Add an RCU CPU stall notifier
Parameters
struct notifier_block *n: Entry to add.
Description
Adds an RCU CPU stall notifier to an atomic notifier chain. The action passed to a notifier will be RCU_STALL_NOTIFY_NORM or friends. The data will be the duration of the stalled grace period, in jiffies, coerced to a void* pointer.
Returns 0 on success, -EEXIST on error.
- int rcu_stall_chain_notifier_unregister(struct notifier_block *n)
Remove an RCU CPU stall notifier
Parameters
struct notifier_block *n: Entry to remove.
Description
Removes an RCU CPU stall notifier from an atomic notifier chain.
Returns zero on success, -ENOENT on failure.
- void rcu_read_lock_trace(void)
mark beginning of RCU-trace read-side critical section
Parameters
void: no arguments
Description
When synchronize_rcu_tasks_trace() is invoked by one task, then that task is guaranteed to block until all other tasks exit their read-side critical sections. Similarly, if call_rcu_tasks_trace() is invoked on one task while other tasks are within RCU read-side critical sections, invocation of the corresponding RCU callback is deferred until after all the other tasks exit their critical sections.
For more details, please see the documentation for rcu_read_lock().
- void rcu_read_unlock_trace(void)
mark end of RCU-trace read-side critical section
Parameters
void: no arguments
Description
Pairs with a preceding call to rcu_read_lock_trace(), and nesting is allowed. Invoking an rcu_read_unlock_trace() when there is no matching rcu_read_lock_trace() is verboten, and will result in lockdep complaints.
For more details, please see the documentation for rcu_read_unlock().
- synchronize_rcu_mult
synchronize_rcu_mult(...)
Wait concurrently for multiple grace periods
Parameters
...: List of call_rcu() functions for different grace periods to wait on
Description
This macro waits concurrently for multiple types of RCU grace periods. For example, synchronize_rcu_mult(call_rcu, call_rcu_tasks) would wait on concurrent RCU and RCU-tasks grace periods. Waiting on a given SRCU domain requires you to write a wrapper function for that SRCU domain’s call_srcu() function, with this wrapper supplying the pointer to the corresponding srcu_struct.
Note that call_rcu_hurry() should be used instead of call_rcu() because in kernels built with CONFIG_RCU_LAZY=y the delay between the invocation of call_rcu() and that of the corresponding RCU callback can be multiple seconds.
The first argument tells Tiny RCU’s _wait_rcu_gp() not to bother waiting for RCU. The reason is that anywhere synchronize_rcu_mult() can be called is automatically already a full grace period.
- void rcuref_init(rcuref_t *ref, unsigned int cnt)
Initialize an rcuref reference count with the given reference count
Parameters
rcuref_t *ref: Pointer to the reference count
unsigned int cnt: The initial reference count, typically ‘1’
- unsigned int rcuref_read(rcuref_t *ref)
Read the number of held reference counts of an rcuref
Parameters
rcuref_t *ref: Pointer to the reference count
Return
The number of held references (0 ... N). The value 0 does not indicate that it is safe to schedule the object, protected by this reference counter, for deconstruction. If you want to know if the reference counter has been marked DEAD (as signaled by rcuref_put()), please use rcuref_is_dead().
- bool rcuref_is_dead(rcuref_t *ref)
Check if the rcuref has been already marked dead
Parameters
rcuref_t *ref: Pointer to the reference count
Return
True if the object has been marked DEAD. This signals that a previous invocation of rcuref_put() returned true on this reference counter, meaning the protected object can safely be scheduled for deconstruction. Otherwise, returns false.
- bool rcuref_get(rcuref_t *ref)
Acquire one reference on an rcuref reference count
Parameters
rcuref_t *ref: Pointer to the reference count
Description
Similar to atomic_inc_not_zero() but saturates at RCUREF_MAXREF.
Provides no memory ordering; it is assumed the caller has guaranteed the object memory to be stable (RCU, etc.). It does provide a control dependency and thereby orders future stores. See documentation in lib/rcuref.c
Return
True if a reference was successfully acquired.
False if the attempt to acquire a reference failed. This happens when the last reference has been put already.
- bool rcuref_put_rcusafe(rcuref_t *ref)
Release one reference for an rcuref reference count (RCU safe)
Parameters
rcuref_t *ref: Pointer to the reference count
Description
Provides release memory ordering, such that prior loads and stores are done before, and provides an acquire ordering on success such that free() must come after.
Can be invoked from contexts that guarantee that no grace period can happen which would free the object concurrently if the decrement drops the last reference and the slowpath races against a concurrent get() and put() pair. rcu_read_lock()’ed and atomic contexts qualify.
Return
True if this was the last reference with no future references possible. This signals the caller that it can safely release the object which is protected by the reference counter.
False if there are still active references or the put() raced with a concurrent get()/put() pair. Caller is not allowed to release the protected object.
- bool rcuref_put(rcuref_t *ref)
Release one reference for an rcuref reference count
Parameters
rcuref_t *ref: Pointer to the reference count
Description
Can be invoked from any context.
Provides release memory ordering, such that prior loads and stores are done before, and provides an acquire ordering on success such that free() must come after.
Return
True if this was the last reference with no future references possible. This signals the caller that it can safely schedule the object, which is protected by the reference counter, for deconstruction.
False if there are still active references or the put() raced with a concurrent get()/put() pair. Caller is not allowed to deconstruct the protected object.
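The get/put contract described in this group can be illustrated with a deliberately simplified userspace model. It is not lib/rcuref.c, which uses a biased counter with separate saturation and dead zones; this sketch instead assumes a plain atomic int, a hypothetical REF_MODEL_DEAD value of -1 and an arbitrary saturation point, and all ref_model_* names are hypothetical. It shows the properties the text relies on: a get never revives a counter that has dropped to zero, a saturated counter is never released, and only the put that drops the last reference returns true.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REF_MODEL_MAXREF (INT_MAX / 2)     /* assumed saturation point */
#define REF_MODEL_DEAD   (-1)              /* assumed "marked dead" value */

typedef struct { atomic_int cnt; } ref_model_t;

static void ref_model_init(ref_model_t *r, int cnt)
{
    atomic_init(&r->cnt, cnt);
}

/* Like atomic_inc_not_zero(), but also refuses dead counters and saturates.
 * Relaxed ordering mirrors "provides no memory ordering" above. */
static bool ref_model_get(ref_model_t *r)
{
    int c = atomic_load_explicit(&r->cnt, memory_order_relaxed);

    do {
        if (c <= 0)                        /* zero or dead: object is going away */
            return false;
        if (c >= REF_MODEL_MAXREF)         /* saturated: stays pinned forever */
            return true;
    } while (!atomic_compare_exchange_weak_explicit(&r->cnt, &c, c + 1,
                 memory_order_relaxed, memory_order_relaxed));
    return true;
}

/* Returns true only for the caller that drops the last reference, which also
 * marks the counter dead so later gets fail. */
static bool ref_model_put(ref_model_t *r)
{
    int c = atomic_load_explicit(&r->cnt, memory_order_relaxed);

    do {
        if (c >= REF_MODEL_MAXREF || c <= 0)   /* saturated, or put without get */
            return false;
    } while (!atomic_compare_exchange_weak_explicit(&r->cnt, &c,
                 c == 1 ? REF_MODEL_DEAD : c - 1,
                 memory_order_release, memory_order_relaxed));
    if (c == 1)
        atomic_thread_fence(memory_order_acquire);  /* free() must come after */
    return c == 1;
}

static bool ref_model_is_dead(ref_model_t *r)
{
    return atomic_load(&r->cnt) == REF_MODEL_DEAD;
}

static int ref_model_read(ref_model_t *r)
{
    int c = atomic_load(&r->cnt);

    return c < 0 ? 0 : c;
}
```

Folding "last reference dropped" and "marked dead" into one compare-and-swap is what stops a racing get from succeeding after the final put has returned true.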
- bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1, struct rcu_gp_oldstate *rgosp2)
Are two old-state values identical?
Parameters
struct rcu_gp_oldstate *rgosp1: First old-state value.
struct rcu_gp_oldstate *rgosp2: Second old-state value.
Description
The two old-state values must have been obtained from either get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or get_completed_synchronize_rcu_full(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.
Note that equality is judged on a bitwise basis, so that an rcu_gp_oldstate structure with an already-completed state in one field will compare not-equal to a structure with an already-completed state in the other field. After all, the rcu_gp_oldstate structure is opaque so how did such a situation come to pass in the first place?
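The bitwise-equality note can be made concrete with a small userspace mirror. This is a sketch under assumptions: struct oldstate_model stands in for the opaque struct rcu_gp_oldstate, its two field names follow Tree RCU's normal and expedited snapshots but are assumed here, and MODEL_STATE_COMPLETED is a hypothetical cookie value chosen only for illustration. Two values that are each "already completed", but in different fields, compare not-equal.

```c
#include <stdbool.h>

/* Hypothetical userspace mirror of struct rcu_gp_oldstate; callers of the
 * real API must treat that structure as opaque. */
struct oldstate_model {
    unsigned long rgos_norm;   /* normal grace-period snapshot (assumed name) */
    unsigned long rgos_exp;    /* expedited grace-period snapshot (assumed name) */
};

/* Hypothetical "already completed" cookie, for illustration only. */
#define MODEL_STATE_COMPLETED 0x1UL

/* Field-by-field comparison: equal only if both snapshots match exactly. */
static bool oldstate_model_same(const struct oldstate_model *a,
                                const struct oldstate_model *b)
{
    return a->rgos_norm == b->rgos_norm && a->rgos_exp == b->rgos_exp;
}
```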