The Linux Kernel API

Basic C Library Functions

When writing drivers, you cannot in general use routines which are from the C Library. Some of the functions have been found generally useful and they are listed below. The behaviour of these functions may vary slightly from those defined by ANSI, and these deviations are noted in the text.

String Conversions

unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base): convert a string to an unsigned long long

Parameters

const char *cp: The start of the string
char **endp: A pointer to the end of the parsed string will be placed here
unsigned int base: The number base to use

Description

This function has caveats. Please use kstrtoull instead.

unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base): convert a string to an unsigned long

Parameters

const char *cp: The start of the string
char **endp: A pointer to the end of the parsed string will be placed here
unsigned int base: The number base to use

Description

This function has caveats. Please use kstrtoul instead.

long simple_strtol(const char *cp, char **endp, unsigned int base): convert a string to a signed long

Parameters

const char *cp: The start of the string
char **endp: A pointer to the end of the parsed string will be placed here
unsigned int base: The number base to use

Description

This function has caveats. Please use kstrtol instead.

long long simple_strtoll(const char *cp, char **endp, unsigned int base): convert a string to a signed long long

Parameters

const char *cp: The start of the string
char **endp: A pointer to the end of the parsed string will be placed here
unsigned int base: The number base to use

Description

This function has caveats. Please use kstrtoll instead.

int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args): Format a string and place it in a buffer

Parameters

char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt_str: The format string to use
va_list args: Arguments for the format string

Description

This function generally follows C99 vsnprintf, but has some extensions and a few limitations:

%n is unsupported
%p* is handled by pointer()

See pointer() or "How to get printk format specifiers right" for a more extensive description.

Please update the documentation in both places when making changes.

The return value is the number of characters which would be generated for the given input, excluding the trailing '\0', as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing '\0'), use vscnprintf(). If the return is greater than or equal to size, the resulting string is truncated.

If you’re not already dealing with a va_list, consider using snprintf().

int vscnprintf(char *buf, size_t size, const char *fmt, va_list args): Format a string and place it in a buffer

Parameters

char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt: The format string to use
va_list args: Arguments for the format string

Description

The return value is the number of characters which have been written into buf, not including the trailing '\0'. If size is 0, the function returns 0.

If you’re not already dealing with a va_list, consider using scnprintf().

See the vsnprintf() documentation for format string extensions over C99.

int snprintf(char *buf, size_t size, const char *fmt, ...): Format a string and place it in a buffer

Parameters

char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt: The format string to use
...: Arguments for the format string

Description

The return value is the number of characters which would be generated for the given input, excluding the trailing null, as per ISO C99. If the return is greater than or equal to size, the resulting string is truncated.

See the vsnprintf() documentation for format string extensions over C99.
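The truncation rule can be illustrated in userspace with the C library's snprintf(), which follows the same ISO C99 return convention. This is only a sketch: format_dev_name() is a hypothetical helper, not a kernel function, and the "dev-%d" format is chosen purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "dev-<id>" into buf and report whether the whole string fit.
 * format_dev_name() is a hypothetical helper used only for this example.
 */
bool format_dev_name(char *buf, size_t size, int id)
{
	int len = snprintf(buf, size, "dev-%d", id);

	/* The C library reports an output error as a negative value. */
	if (len < 0)
		return false;

	/*
	 * len is the length the full string would have had, excluding the
	 * trailing NUL, so len >= size means the output was truncated.
	 */
	return (size_t)len < size;
}
```

With an 8-byte buffer and id 12345, the full string "dev-12345" needs 10 bytes including the terminator, so the helper reports truncation and the buffer holds "dev-123".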

int scnprintf(char *buf, size_t size, const char *fmt, ...): Format a string and place it in a buffer

Parameters

char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt: The format string to use
...: Arguments for the format string

Description

The return value is the number of characters written into buf, not including the trailing '\0'. If size is 0, the function returns 0.
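The difference from snprintf() matters most when appending to a buffer in several steps, because the return value can be added to a write offset without ever pointing past the end. The following is a userspace approximation built on the C library's vsnprintf(), not the kernel implementation; sketch_scnprintf() is a hypothetical name, and returning 0 on a library error is a choice made for this sketch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace approximation of the scnprintf() return convention. */
int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int len;

	/* Nothing can be written into an empty buffer. */
	if (size == 0)
		return 0;

	va_start(args, fmt);
	len = vsnprintf(buf, size, fmt, args);
	va_end(args);

	/* Assumption for this sketch: treat a library error as "nothing written". */
	if (len < 0) {
		buf[0] = '\0';
		return 0;
	}

	/* On truncation, report what actually landed in buf (without the NUL). */
	if ((size_t)len >= size)
		return (int)(size - 1);

	return len;
}
```

A caller can then write `pos += sketch_scnprintf(buf + pos, size - pos, ...)` repeatedly; pos never exceeds size - 1, so `size - pos` stays positive.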

int vsprintf(char *buf, const char *fmt, va_list args): Format a string and place it in a buffer

Parameters

char *buf: The buffer to place the result into
const char *fmt: The format string to use
va_list args: Arguments for the format string

Description

The return value is the number of characters written into buf, not including the trailing '\0'. Use vsnprintf() or vscnprintf() in order to avoid buffer overflows.

If you’re not already dealing with a va_list, consider using sprintf().

See the vsnprintf() documentation for format string extensions over C99.

int sprintf(char *buf, const char *fmt, ...): Format a string and place it in a buffer

Parameters

char *buf: The buffer to place the result into
const char *fmt: The format string to use
...: Arguments for the format string

Description

The return value is the number of characters written into buf, not including the trailing '\0'. Use snprintf() or scnprintf() in order to avoid buffer overflows.

See the vsnprintf() documentation for format string extensions over C99.

int vbin_printf(u32 *bin_buf, size_t size, const char *fmt_str, va_list args): Parse a format string and place args’ binary value in a buffer

Parameters

u32 *bin_buf: The buffer to place args’ binary value
size_t size: The size of the buffer (in 32-bit words, not characters)
const char *fmt_str: The format string to use
va_list args: Arguments for the format string

Description

The format follows C99 vsnprintf, except %n is ignored, and its argument is skipped.

The return value is the number of words (32 bits) which would be generated for the given input.

NOTE

If the return value is greater than size, the resulting bin_buf is NOT valid for bstr_printf().

int bstr_printf(char *buf, size_t size, const char *fmt_str, const u32 *bin_buf): Format a string from binary arguments and place it in a buffer

Parameters

char *buf: The buffer to place the result into
size_t size: The size of the buffer, including the trailing null space
const char *fmt_str: The format string to use
const u32 *bin_buf: Binary arguments for the format string

Description

This function is like C99 vsnprintf, but the difference is that vsnprintf gets arguments from the stack, and bstr_printf gets arguments from bin_buf, which is a binary buffer generated by vbin_printf.

The format follows C99 vsnprintf, but has some extensions: see the vsnprintf() documentation for details.

int vsscanf(const char *buf, const char *fmt, va_list args): Unformat a buffer into a list of arguments

Parameters

const char *buf: input buffer
const char *fmt: format of buffer
va_list args: arguments

int sscanf(const char *buf, const char *fmt, ...): Unformat a buffer into a list of arguments

Parameters

const char *buf: input buffer
const char *fmt: formatting of buffer
...: resulting arguments

int kstrtoul(const char *s, unsigned int base, unsigned long *res): convert a string to an unsigned long

Parameters

const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
unsigned long *res: Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoul(). Return code must be checked.
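The input rules above are stricter than those of the C library's strtoul(), which silently accepts leading whitespace, a minus sign, and trailing garbage. The sketch below approximates the documented contract in userspace on top of strtoul(); it is not the kernel implementation, sketch_kstrtoul() is a hypothetical name, and rejecting bases above 16 is an assumption drawn from the "maximum supported base" wording.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace approximation of the kstrtoul() contract described above. */
int sketch_kstrtoul(const char *s, unsigned int base, unsigned long *res)
{
	char *end;
	unsigned long val;

	/* Assumption: treat unsupported bases as a parsing error. */
	if (base == 1 || base > 16)
		return -EINVAL;

	/* A single leading plus sign is allowed, a minus sign is not. */
	if (*s == '+')
		s++;

	/* strtoul() would skip whitespace and accept a sign here; reject both. */
	if (!isxdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, (int)base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)
		return -EINVAL;

	/* At most one newline may precede the terminating NUL. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	/* The result is only written on success. */
	*res = val;
	return 0;
}
```

Because the result is written only on success, a caller that ignores the return code may end up using a stale or uninitialized value, which is why the return code must be checked.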

int kstrtol(const char *s, unsigned int base, long *res): convert a string to a long

Parameters

const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
long *res: Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtol(). Return code must be checked.

int kstrtoull(const char *s, unsigned int base, unsigned long long *res): convert a string to an unsigned long long

Parameters

const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
unsigned long long *res: Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoull(). Return code must be checked.

int kstrtoll(const char *s, unsigned int base, long long *res): convert a string to a long long

Parameters

const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
long long *res: Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoll(). Return code must be checked.

int kstrtouint(const char *s, unsigned int base, unsigned int *res): convert a string to an unsigned int

Parameters

const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
unsigned int *res: Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoul(). Return code must be checked.

int kstrtoint(const char *s, unsigned int base, int *res): convert a string to an int

Parameters

const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
int *res: Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtol(). Return code must be checked.

int kstrtobool(const char *s, bool *res): convert common user inputs into boolean values

Parameters

const char *s: input string
bool *res: result

Description

This routine returns 0 iff the first character is one of ‘YyTt1NnFf0’, or [oO][NnFf] for “on” and “off”. Otherwise it will return -EINVAL. The value pointed to by res is updated upon finding a match.
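The matching rule can be written out as a small decision on the first one or two characters. The following userspace sketch mirrors the description above rather than the kernel source; sketch_kstrtobool() is a hypothetical name, and rejecting a NULL input with -EINVAL is an assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace sketch of the first-character rule described above. */
int sketch_kstrtobool(const char *s, bool *res)
{
	/* Assumption: a NULL input is treated as a parsing error. */
	if (s == NULL)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* "on" and "off" are told apart by the second character. */
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		default:
			break;
		}
		break;
	default:
		break;
	}

	/* No match: leave *res untouched. */
	return -EINVAL;
}
```

Only the leading characters are inspected, so inputs such as "yes", "y\n" and "yellow" would all be accepted as true under this rule.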

int string_get_size(u64 size, u64 blk_size, const enum string_size_units units, char *buf, int len): get the size in the specified units

Parameters

u64 size: The size to be converted in blocks
u64 blk_size: Size of the block (use 1 for size in bytes)
const enum string_size_units units: Units to use (powers of 1000 or 1024), whether to include space separator
char *buf: buffer to format to
int len: length of buffer

Description

This function returns a string formatted to 3 significant figures giving the size in the required units. buf should have room for at least 9 bytes and will always be zero terminated.

Return value: number of characters of output that would have been written(which may be greater than len, if output was truncated).

int parse_int_array_user(const char __user *from, size_t count, int **array): Split string into a sequence of integers

Parameters

const char __user *from: The user space buffer to read from
size_t count: The maximum number of bytes to read
int **array: Returned pointer to sequence of integers

Description

On success, array is allocated and initialized with the sequence of integers extracted from from, plus an additional element that begins the sequence and specifies the number of integers.

The caller takes responsibility for freeing array when it is no longer needed.

int string_unescape(char *src, char *dst, size_t size, unsigned int flags): unquote characters in the given string

Parameters

char *src: source buffer (escaped)
char *dst: destination buffer (unescaped)
size_t size: size of the destination buffer (0 for unlimited)
unsigned int flags: combination of the flags.

Description

The function unquotes characters in the given string.

Because the size of the output will be the same as or less than the size of the input, the transformation may be performed in place.

The caller must provide valid source and destination pointers. Be aware that the destination buffer will always be NUL-terminated. The source string must be NUL-terminated as well. The supported flags are:

UNESCAPE_SPACE:
        '\f' - form feed
        '\n' - new line
        '\r' - carriage return
        '\t' - horizontal tab
        '\v' - vertical tab
UNESCAPE_OCTAL:
        '\NNN' - byte with octal value NNN (1 to 3 digits)
UNESCAPE_HEX:
        '\xHH' - byte with hexadecimal value HH (1 to 2 digits)
UNESCAPE_SPECIAL:
        '\"' - double quote
        '\\' - backslash
        '\a' - alert (BEL)
        '\e' - escape
UNESCAPE_ANY:
        all previous together

Return

The number of characters written to the destination buffer, excluding the trailing '\0', is returned.

int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz, unsigned int flags, const char *only): quote characters in the given memory buffer

Parameters

const char *src: source buffer (unescaped)
size_t isz: source buffer size
char *dst: destination buffer (escaped)
size_t osz: destination buffer size
unsigned int flags: combination of the flags
const char *only: NUL-terminated string containing characters used to limit the selected escape class. If characters are included in only that would not normally be escaped by the classes selected in flags, they will be copied to dst unescaped.

Description

The process of escaping a byte buffer includes several parts. They are applied in the following sequence.

1. The character is not matched to the one from the only string and thus must go as-is to the output.
2. The character is matched to the printable and ASCII classes, if asked, and in case of match it passes through to the output.
3. The character is matched to the printable or ASCII class, if asked, and in case of match it passes through to the output.
4. The character is checked if it falls into the class given by flags.
5. ESCAPE_OCTAL and ESCAPE_HEX are going last since they cover any character. Note that they actually can’t go together, otherwise ESCAPE_HEX will be ignored.

The caller must provide valid source and destination pointers. Be aware that the destination buffer will not be NUL-terminated, so the caller has to append the terminator if needed. The supported flags are:

ESCAPE_SPACE: (special white space, not space itself)
        '\f' - form feed
        '\n' - new line
        '\r' - carriage return
        '\t' - horizontal tab
        '\v' - vertical tab
ESCAPE_SPECIAL:
        '\"' - double quote
        '\\' - backslash
        '\a' - alert (BEL)
        '\e' - escape
ESCAPE_NULL:
        '\0' - null
ESCAPE_OCTAL:
        '\NNN' - byte with octal value NNN (3 digits)
ESCAPE_ANY:
        all previous together
ESCAPE_NP:
        escape only non-printable characters, checked by isprint()
ESCAPE_ANY_NP:
        all previous together
ESCAPE_HEX:
        '\xHH' - byte with hexadecimal value HH (2 digits)
ESCAPE_NA:
        escape only non-ascii characters, checked by isascii()
ESCAPE_NAP:
        escape only non-printable or non-ascii characters
ESCAPE_APPEND:
        append characters from only to be escaped by the given classes

ESCAPE_APPEND helps to pass additional characters to be escaped when one of ESCAPE_NP, ESCAPE_NA, or ESCAPE_NAP is provided.

One notable caveat: ESCAPE_NAP, ESCAPE_NP and ESCAPE_NA have higher priority than the rest of the flags (ESCAPE_NAP is the highest). It doesn’t make much sense to use any of them without ESCAPE_OCTAL or ESCAPE_HEX, because they cover most of the other character classes. ESCAPE_NAP can utilize ESCAPE_SPACE or ESCAPE_SPECIAL in addition to the above.

Return

The total size of the escaped output that would be generated for the given input and flags. To check whether the output was truncated, compare the return value to osz. There is room left in dst for a '\0' terminator if and only if ret < osz.

char **kasprintf_strarray(gfp_t gfp, const char *prefix, size_t n): allocate and fill array of sequential strings

Parameters

gfp_t gfp: flags for the slab allocator
const char *prefix: prefix to be used
size_t n: amount of lines to be allocated and filled

Description

Allocates and fills n strings using the pattern "%s-%zu", where prefix is provided by the caller. The caller is responsible for freeing them with kfree_strarray() after use.

Returns array of strings or NULL when memory can’t be allocated.

void kfree_strarray(char **array, size_t n): free a number of dynamically allocated strings contained in an array and the array itself

Parameters

char **array: Dynamically allocated array of strings to free.
size_t n: Number of strings (starting from the beginning of the array) to free.

Description

Passing a non-NULL array and n == 0 as well as a NULL array are valid use-cases. If array is NULL, the function does nothing.

char *skip_spaces(const char *str): Removes leading whitespace from str.

Parameters

const char *str: The string to be stripped.

Description

Returns a pointer to the first non-whitespace character in str.

char *strim(char *s): Removes leading and trailing whitespace from s.

Parameters

char *s: The string to be stripped.

Description

Note that the first trailing whitespace is replaced with a NUL-terminator in the given string s. Returns a pointer to the first non-whitespace character in s.

bool sysfs_streq(const char *s1, const char *s2): return true if strings are equal, modulo trailing newline

Parameters

const char *s1: one string
const char *s2: another string

Description

This routine returns true iff two strings are equal, treating both NUL and newline-then-NUL as equivalent string terminations. It’s geared for use with sysfs input strings, which generally terminate with newlines but are compared against values without newlines.
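The comparison rule can be sketched in a few lines of portable C. This is a userspace illustration of the documented behaviour, with the hypothetical name sketch_sysfs_streq(); it should not be read as the authoritative kernel source.

```c
#include <stdbool.h>

/* Userspace sketch: equal if identical or differing only by one final newline. */
bool sketch_sysfs_streq(const char *s1, const char *s2)
{
	/* Skip the common prefix. */
	while (*s1 != '\0' && *s1 == *s2) {
		s1++;
		s2++;
	}

	/* Both strings ended (or still agree) at the same position. */
	if (*s1 == *s2)
		return true;

	/* s1 ended, s2 has exactly one trailing newline left. */
	if (*s1 == '\0' && *s2 == '\n' && s2[1] == '\0')
		return true;

	/* s2 ended, s1 has exactly one trailing newline left. */
	if (*s1 == '\n' && s1[1] == '\0' && *s2 == '\0')
		return true;

	return false;
}
```

Under this rule "on\n" matches "on", while "on\n\n" does not, because only a single newline immediately before the terminator is tolerated.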

int match_string(const char *const *array, size_t n, const char *string): matches given string in an array

Parameters

const char *const *array: array of strings
size_t n: number of strings in the array or -1 for NULL terminated arrays
const char *string: string to match with

Description

This routine will look for a string in an array of strings up to the n-th element in the array or until the first NULL element.

Historically the value of -1 for n was used to search in arrays that are NULL terminated. However, the function does not make a distinction when finishing the search: either n elements have been compared OR the first NULL element was found.

Return

index of string in the array if it matches, or -EINVAL otherwise.
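The two stopping conditions can be seen in a short userspace sketch. The name sketch_match_string() is hypothetical, and the body follows the description above rather than quoting the kernel source.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace sketch: search until n entries were compared or a NULL entry is hit. */
int sketch_match_string(const char *const *array, size_t n, const char *string)
{
	size_t index;

	for (index = 0; index < n; index++) {
		const char *item = array[index];

		/* A NULL entry ends the search regardless of n. */
		if (item == NULL)
			break;
		if (strcmp(item, string) == 0)
			return (int)index;
	}

	return -EINVAL;
}
```

Passing (size_t)-1 for n makes the bound effectively unlimited, so the search relies entirely on the NULL sentinel; the array must then really be NULL terminated.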

int __sysfs_match_string(const char *const *array, size_t n, const char *str): matches given string in an array

Parameters

const char *const *array: array of strings
size_t n: number of strings in the array or -1 for NULL terminated arrays
const char *str: string to match with

Description

Returns index of str in the array or -EINVAL, just like match_string(). Uses sysfs_streq instead of strcmp for matching.

This routine will look for a string in an array of strings up to the n-th element in the array or until the first NULL element.

char *strreplace(char *str, char old, char new): Replace all occurrences of character in string.

Parameters

char *str: The string to operate on.
char old: The character being replaced.
char new: The character old is replaced with.

Description

Replaces each old character with a new one in the given string str.

Return

pointer to the string str itself.
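A minimal userspace sketch of the in-place replacement follows. The name sketch_strreplace() and the parameter names old_ch and new_ch are chosen for this example only.

```c
/* Userspace sketch: replace every old_ch in str with new_ch, in place. */
char *sketch_strreplace(char *str, char old_ch, char new_ch)
{
	char *s;

	for (s = str; *s != '\0'; s++) {
		if (*s == old_ch)
			*s = new_ch;
	}

	/* Return the start of the string so calls can be chained. */
	return str;
}
```

Because the string is modified in place, it must live in writable memory; passing a string literal would be undefined behaviour.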

void memcpy_and_pad(void *dest, size_t dest_len, const void *src, size_t count, int pad): Copy one buffer to another with padding

Parameters

void *dest: Where to copy to
size_t dest_len: The destination buffer size
const void *src: Where to copy from
size_t count: The number of bytes to copy
int pad: Character to use for padding if space is left in destination.

String Manipulation

unsafe_memcpy

unsafe_memcpy(dst, src, bytes, justification)

memcpy implementation with no FORTIFY bounds checking

Parameters

dst: Destination memory address to write to
src: Source memory address to read from
bytes: How many bytes to write to dst from src
justification: Free-form text or comment describing why the use is needed

Description

This should be used for corner cases where the compiler cannot do the right thing, or during transitions between APIs, etc. It should be used very rarely, and includes a place for justification detailing where bounds checking has happened, and why existing solutions cannot be employed.

char *strncpy(char *const p, const char *q, __kernel_size_t size): Copy a string to memory with non-guaranteed NUL padding

Parameters

char *const p: pointer to destination of copy
const char *q: pointer to NUL-terminated source string to copy
__kernel_size_t size: bytes to write at p

Description

If strlen(q) >= size, the copy of q will stop after size bytes, and p will NOT be NUL-terminated.

If strlen(q) < size, following the copy of q, trailing NUL bytes will be written to p until size total bytes have been written.

Do not use this function. While FORTIFY_SOURCE tries to avoid over-reads of q, it cannot defend against writing unterminated results to p. Using strncpy() remains ambiguous and fragile. Instead, please choose an alternative, so that the expectation of p’s contents is unambiguous:

p needs to be:	padded to size	not padded
NUL-terminated	`strscpy_pad()`	`strscpy()`
not NUL-terminated	`strtomem_pad()`	`strtomem()`

Note strscpy*()’s differing return values for detecting truncation, and strtomem*()’s expectation that the destination is marked with __nonstring when it is a character array.
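The two cases can be observed in userspace with the C library's strncpy(), assuming it pads and truncates as described above (ISO C specifies that behaviour). copy_is_terminated() is a hypothetical helper written only to make the hazard visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy q into the size bytes at p with strncpy() and report whether the
 * result contains a NUL terminator anywhere within those size bytes.
 */
bool copy_is_terminated(char *p, const char *q, size_t size)
{
	strncpy(p, q, size);

	/* No NUL within size bytes means p is not a valid C string. */
	return memchr(p, '\0', size) != NULL;
}
```

Copying "abcdef" into a 4-byte buffer leaves the bytes 'a', 'b', 'c', 'd' with no terminator, while copying "ab" fills the remaining two bytes with NULs.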

__kernel_size_t strnlen(const char *const p, __kernel_size_t maxlen): Return bounded count of characters in a NUL-terminated string

Parameters

const char *const p: pointer to NUL-terminated string to count.
__kernel_size_t maxlen: maximum number of characters to count.

Description

Returns number of characters in p (NOT including the final NUL), or maxlen, if no NUL has been found up to there.
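The bounded count amounts to a scan that stops at whichever comes first, the NUL byte or maxlen. A userspace sketch, using the hypothetical name sketch_strnlen():

```c
#include <stddef.h>

/* Userspace sketch: count characters, never looking past maxlen bytes. */
size_t sketch_strnlen(const char *p, size_t maxlen)
{
	size_t len = 0;

	while (len < maxlen && p[len] != '\0')
		len++;

	return len;
}
```

A return value equal to maxlen therefore means either that the string is exactly that long or that no terminator was found within the bound; callers that need to tell these apart have to check the byte at p[maxlen] themselves, if it is within the buffer.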

strlen

strlen(p)

Return count of characters in a NUL-terminated string

Parameters

p: pointer to NUL-terminated string to count.

Description

Do not use this function unless the string length is known at compile-time. When p is unterminated, this function may crash or return unexpected counts that could lead to memory content exposures. Prefer strnlen().

Returns number of characters in p (NOT including the final NUL).

size_t strlcat(char *const p, const char *const q, size_t avail): Append a string to an existing string

Parameters

char *const p: pointer to NUL-terminated string to append to
const char *const q: pointer to NUL-terminated string to append from
size_t avail: Maximum bytes available in p

Description

Appends NUL-terminated string q after the NUL-terminated string at p, but will not write beyond avail bytes total, potentially truncating the copy from q. p will stay NUL-terminated only if a NUL already existed within the avail bytes of p. If so, the resulting number of bytes copied from q will be at most “avail - strlen(p) - 1”.

Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the sizes of p and q are known to the compiler. Prefer building the string with formatting, via scnprintf(), seq_buf, or similar.

Returns total bytes that _would_ have been contained by p regardless of truncation, similar to snprintf(). If return value is >= avail, the string has been truncated.

char *strcat(char *const p, const char *q): Append a string to an existing string

Parameters

char *const p: pointer to NUL-terminated string to append to
const char *q: pointer to NUL-terminated source string to append from

Description

Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the destination buffer size is known to the compiler. Prefer building the string with formatting, via scnprintf() or similar. At the very least, use strncat().

Returns p.

char *strncat(char *const p, const char *const q, __kernel_size_t count): Append a string to an existing string

Parameters

char *const p: pointer to NUL-terminated string to append to
const char *const q: pointer to source string to append from
__kernel_size_t count: Maximum bytes to read from q

Description

Appends at most count bytes from q (stopping at the first NUL byte) after the NUL-terminated string at p. p will be NUL-terminated.

Returns p.

char *strcpy(char *const p, const char *const q): Copy a string into another string buffer

Parameters

char *const p: pointer to destination of copy
const char *const q: pointer to NUL-terminated source string to copy

Description

Do not use this function. While FORTIFY_SOURCE tries to avoid overflows, this is only possible when the sizes of q and p are known to the compiler. Prefer strscpy(), though note its different return values for detecting truncation.

Returns p.

int strncasecmp(const char *s1, const char *s2, size_t len): Case insensitive, length-limited string comparison

Parameters

const char *s1: One string
const char *s2: The other string
size_t len: the maximum number of characters to compare

char *stpcpy(char *__restrict__ dest, const char *__restrict__ src): copy a string from src to dest returning a pointer to the new end of dest, including src’s NUL-terminator. May overrun dest.

Parameters

char *__restrict__ dest: pointer to end of string being copied into. Must be large enough to receive copy.
const char *__restrict__ src: pointer to the beginning of string being copied from. Must not overlap dest.

Description

stpcpy differs from strcpy in a key way: the return value is a pointer to the new NUL-terminating character in dest. (For strcpy, the return value is a pointer to the start of dest). This interface is considered unsafe as it doesn’t perform bounds checking of the inputs. As such it’s not recommended for usage. Instead, its definition is provided in case the compiler lowers other libcalls to stpcpy.

int strcmp(const char *cs, const char *ct): Compare two strings

Parameters

const char *cs: One string
const char *ct: Another string

int strncmp(const char *cs, const char *ct, size_t count): Compare two length-limited strings

Parameters

const char *cs: One string
const char *ct: Another string
size_t count: The maximum number of bytes to compare

char *strchr(const char *s, int c): Find the first occurrence of a character in a string

Parameters

const char *s: The string to be searched
int c: The character to search for

Description

Note that the NUL-terminator is considered part of the string, and can be searched for.

char *strchrnul(const char *s, int c): Find and return a character in a string, or end of string

Parameters

const char *s: The string to be searched
int c: The character to search for

Description

Returns pointer to first occurrence of ‘c’ in s. If c is not found, then return a pointer to the null byte at the end of s.
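Since the result is never NULL, callers can compute a token length without a separate "not found" branch. A userspace sketch, with sketch_strchrnul() as a hypothetical name:

```c
/* Userspace sketch: like strchr(), but returns the end of s instead of NULL. */
char *sketch_strchrnul(const char *s, int c)
{
	while (*s != '\0' && *s != (char)c)
		s++;

	/* Points at the match, or at the terminating NUL if there was none. */
	return (char *)s;
}
```

For example, `sketch_strchrnul(arg, '=') - arg` gives the length of the key in "key=value" and the length of the whole string when no '=' is present.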

char *strrchr(const char *s, int c): Find the last occurrence of a character in a string

Parameters

const char *s: The string to be searched
int c: The character to search for

char *strnchr(const char *s, size_t count, int c): Find a character in a length limited string

Parameters

const char *s: The string to be searched
size_t count: The number of characters to be searched
int c: The character to search for

Description

Note that the NUL-terminator is considered part of the string, and can be searched for.

size_t strspn(const char *s, const char *accept): Calculate the length of the initial substring of s which only contains letters in accept

Parameters

const char *s: The string to be searched
const char *accept: The string to search for

size_t strcspn(const char *s, const char *reject): Calculate the length of the initial substring of s which does not contain letters in reject

Parameters

const char *s: The string to be searched
const char *reject: The string to avoid

char *strpbrk(const char *cs, const char *ct): Find the first occurrence of a set of characters

Parameters

const char *cs: The string to be searched
const char *ct: The characters to search for

char *strsep(char **s, const char *ct): Split a string into tokens

Parameters

char **s: The string to be searched
const char *ct: The characters to search for

Description

strsep() updates s to point after the token, ready for the next call.

It returns empty tokens, too, behaving exactly like the libc function of that name. In fact, it was stolen from glibc2 and de-fancy-fied. Same semantics, slimmer shape. ;)
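The handling of empty tokens is easiest to see in code. The sketch below is a userspace illustration built on the standard strpbrk(), named sketch_strsep() to make clear it is not the kernel symbol; it follows the semantics described above.

```c
#include <stddef.h>
#include <string.h>

/* Userspace sketch: return the next token and advance *s past the delimiter. */
char *sketch_strsep(char **s, const char *ct)
{
	char *sbegin = *s;
	char *end;

	/* A NULL cursor means the previous call consumed the last token. */
	if (sbegin == NULL)
		return NULL;

	/* Cut the string at the first delimiter, if there is one. */
	end = strpbrk(sbegin, ct);
	if (end != NULL)
		*end++ = '\0';

	/* end is NULL when no delimiter remained, which ends the iteration. */
	*s = end;
	return sbegin;
}
```

Splitting "a,,b" on "," yields "a", then an empty token, then "b", after which the cursor becomes NULL. The input buffer is modified, so it must be writable.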

void *memset(void *s, int c, size_t count): Fill a region of memory with the given value

Parameters

void *s: Pointer to the start of the area.
int c: The byte to fill the area with
size_t count: The size of the area.

Description

Do not use memset() to access IO space, use memset_io() instead.

void *memset16(uint16_t *s, uint16_t v, size_t count): Fill a memory area with a uint16_t

Parameters

uint16_t *s: Pointer to the start of the area.
uint16_t v: The value to fill the area with
size_t count: The number of values to store

Description

Differs from memset() in that it fills with a uint16_t instead of a byte. Remember that count is the number of uint16_ts to store, not the number of bytes.
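The element-count convention is a common source of off-by-a-factor bugs, so a short userspace sketch may help. sketch_memset16() is a hypothetical stand-in written as a plain loop; an architecture may provide an optimized version with the same semantics.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace sketch: store v into count consecutive uint16_t slots. */
void *sketch_memset16(uint16_t *s, uint16_t v, size_t count)
{
	uint16_t *xs = s;

	/* count is a number of elements, so this writes count * 2 bytes. */
	while (count--)
		*xs++ = v;

	return s;
}
```

For an array `uint16_t pixels[4]`, the count for filling it completely is 4 (the number of elements), not `sizeof(pixels)`, which is 8 and would overrun the array.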

void *memset32(uint32_t *s, uint32_t v, size_t count): Fill a memory area with a uint32_t

Parameters

uint32_t *s: Pointer to the start of the area.
uint32_t v: The value to fill the area with
size_t count: The number of values to store

Description

Differs from memset() in that it fills with a uint32_t instead of a byte. Remember that count is the number of uint32_ts to store, not the number of bytes.

void*memset64(uint64_t*s,uint64_tv,size_tcount)¶: Fill a memory area with a uint64_t

Parameters

uint64_t*s: Pointer to the start of the area.
uint64_tv: The value to fill the area with
size_tcount: The number of values to store

Description

Differs from memset() in that it fills with a uint64_t instead of a byte. Remember that count is the number of uint64_ts to store, not the number of bytes.

void*memcpy(void*dest,constvoid*src,size_tcount)¶: Copy one area of memory to another

Parameters

void*dest: Where to copy to
constvoid*src: Where to copy from
size_tcount: The size of the area.

Description

You should not use this function to access IO space; use memcpy_toio() or memcpy_fromio() instead.

void*memmove(void*dest,constvoid*src,size_tcount)¶: Copy one area of memory to another

Parameters

void*dest: Where to copy to
constvoid*src: Where to copy from
size_tcount: The size of the area.

Description

Unlike memcpy(), memmove() copes with overlapping areas.
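A minimal userspace sketch of an overlapping copy, using the libc memmove() as a stand-in for the kernel routine. The shift_right_one() helper is a hypothetical name used only here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: shift a copy of s right by one byte, in place. */
static const char *shift_right_one(const char *s)
{
	static char buf[64];
	size_t len;

	snprintf(buf, sizeof(buf), "%s", s);
	len = strlen(buf);
	if (len > 1)
		memmove(buf + 1, buf, len - 1); /* source and destination overlap */
	return buf;
}
```

For "abcdef" the result is "aabcde". Doing the same with memcpy() would be undefined, because the regions overlap.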

__visibleintmemcmp(constvoid*cs,constvoid*ct,size_tcount)¶: Compare two areas of memory

Parameters

constvoid*cs: One area of memory
constvoid*ct: Another area of memory
size_tcount: The size of the area.

intbcmp(constvoid*a,constvoid*b,size_tlen)¶: returns 0 if and only if the buffers have identical contents.

Parameters

constvoid*a: pointer to first buffer.
constvoid*b: pointer to second buffer.
size_tlen: size of buffers.

Description

The sign or magnitude of a non-zero return value has no particular meaning, and architectures may implement their own more efficient bcmp(). So while this particular implementation is a simple (tail) call to memcmp, do not rely on anything but whether the return value is zero or non-zero.

void*memscan(void*addr,intc,size_tsize)¶: Find a character in an area of memory.

Parameters

void*addr: The memory area
intc: The byte to search for
size_tsize: The size of the area.

Description

Returns the address of the first occurrence of c, or 1 byte past the area if c is not found.
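The "one byte past the area" convention means a miss is reported as addr + size rather than NULL. A userspace model of that contract, written from the description above and not copied from the kernel source (model_memscan() and scan_offset() are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

/* Model of the documented contract: pointer to the match, or addr + size. */
static void *model_memscan(void *addr, int c, size_t size)
{
	unsigned char *p = addr;

	while (size) {
		if (*p == (unsigned char)c)
			return p;
		p++;
		size--;
	}
	return p; /* not found: one byte past the area */
}

/* Hypothetical helper: report the result as an offset into buf. */
static size_t scan_offset(const char *buf, size_t size, int c)
{
	char *hit = model_memscan((void *)buf, c, size);

	return (size_t)(hit - buf);
}
```

A caller therefore compares the result against the end of the area instead of testing for NULL.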

char*strstr(constchar*s1,constchar*s2)¶: Find the first substring in a NUL terminated string

Parameters

constchar*s1: The string to be searched
constchar*s2: The string to search for

char*strnstr(constchar*s1,constchar*s2,size_tlen)¶: Find the first substring in a length-limited string

Parameters

constchar*s1: The string to be searched
constchar*s2: The string to search for
size_tlen: the maximum number of characters to search

void*memchr(constvoid*s,intc,size_tn)¶: Find a character in an area of memory.

Parameters

constvoid*s: The memory area
intc: The byte to search for
size_tn: The size of the area.

Description

Returns the address of the first occurrence of c, or NULL if c is not found.

void*memchr_inv(constvoid*start,intc,size_tbytes)¶: Find an unmatching character in an area of memory.

Parameters

constvoid*start: The memory area
intc: Find a character other than c
size_tbytes: The size of the area.

Description

Returns the address of the first character other than c, or NULL if the whole buffer contains just c.

void*memdup_array_user(constvoid__user*src,size_tn,size_tsize)¶: duplicate array from user space

Parameters

constvoid__user*src: source address in user space
size_tn: number of array members to copy
size_tsize: size of one array member

Return

An ERR_PTR() on failure. Result is physically contiguous, to be freed by kfree().

void*vmemdup_array_user(constvoid__user*src,size_tn,size_tsize)¶: duplicate array from user space

Parameters

constvoid__user*src: source address in user space
size_tn: number of array members to copy
size_tsize: size of one array member

Return

An ERR_PTR() on failure. Result may not be physically contiguous. Use kvfree() to free.

strscpy¶

strscpy(dst,src,...)

Copy a C-string into a sized buffer

Parameters

dst: Where to copy the string to
src: Where to copy the string from
...: Size of destination buffer (optional)

Description

Copy the source string src, or as much of it as fits, into the destination dst buffer. The behavior is undefined if the string buffers overlap. The destination dst buffer is always NUL terminated, unless it’s zero-sized.

The size argument ... is only required when dst is not an array, or when the copy needs to be smaller than sizeof(dst).

Preferred to strncpy() since it always returns a valid string, and doesn’t unnecessarily force the tail of the destination buffer to be zero padded. If padding is desired please use strscpy_pad().

Returns the number of characters copied in dst (not including the trailing NUL) or -E2BIG if size is 0 or the copy from src was truncated.
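The return-value rules can be modelled in userspace. The sketch below follows only the description above (always terminate, report truncation as -E2BIG); it is not the kernel implementation, and model_strscpy() and copy_into_8() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Model of the documented contract, with an explicit size argument. */
static long model_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG; /* no room even for the terminator */
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0'; /* always NUL terminated */
	return src[i] == '\0' ? (long)i : -E2BIG;
}

/* Hypothetical helper: copy into an 8-byte buffer and return the result. */
static long copy_into_8(const char *src)
{
	char dst[8];

	return model_strscpy(dst, src, sizeof(dst));
}
```

A seven-character source fits exactly and returns 7. An eight-character source is truncated to seven characters plus the NUL and returns -E2BIG, so callers can detect truncation with a single negative check.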

strscpy_pad¶

strscpy_pad(dst,src,...)

Copy a C-string into a sized buffer

Parameters

dst: Where to copy the string to
src: Where to copy the string from
...: Size of destination buffer

Description

Copy the string, or as much of it as fits, into the dest buffer. The behavior is undefined if the string buffers overlap. The destination buffer is always NUL terminated, unless it’s zero-sized.

If the source string is shorter than the destination buffer, the remaining bytes in the buffer will be filled with NUL bytes.

For full explanation of why you may want to consider using the ‘strscpy’ functions please see the function docstring for strscpy().

Return

The number of characters copied (not including the trailing NULs)
-E2BIG if count is 0 or src was truncated.

boolmem_is_zero(constvoid*s,size_tn)¶: Check if an area of memory is all 0’s.

Parameters

constvoid*s: The memory area
size_tn: The size of the area

Return

True if the area of memory is all 0’s.

sysfs_match_string¶

sysfs_match_string(_a,_s)

matches given string in an array

Parameters

_a: array of strings
_s: string to match with

Description

Helper for __sysfs_match_string(). Calculates the size of _a automatically.

voidmemzero_explicit(void*s,size_tcount)¶: Fill a region of memory (e.g. sensitive keying data) with 0s.

Parameters

void*s: Pointer to the start of the area.
size_tcount: The size of the area.

Note

Usually using memset() is just fine (!), but in cases where clearing out _local_ data at the end of a scope is necessary, memzero_explicit() should be used instead in order to prevent the compiler from optimising away zeroing.

memzero_explicit() doesn’t need an arch-specific version as it just invokes the one of memset() implicitly.

constchar*kbasename(constchar*path)¶: return the last part of a pathname.

Parameters

constchar*path: path to extract the filename from.

Return

Pointer to the filename portion inside path. If no ‘/’ exists, returns path unchanged.

strtomem_pad¶

strtomem_pad(dest,src,pad)

Copy NUL-terminated string to non-NUL-terminated buffer

Parameters

dest: Pointer of destination character array (marked as __nonstring)
src: Pointer to NUL-terminated string
pad: Padding character to fill any remaining bytes of dest after copy

Description

This is a replacement for strncpy() uses where the destination is not a NUL-terminated string, but with bounds checking on the source size, and an explicit padding character. If padding is not required, use strtomem().

Note that the size of dest is not an argument, as the length of dest must be discoverable by the compiler.

strtomem¶

strtomem(dest,src)

Copy NUL-terminated string to non-NUL-terminated buffer

Parameters

dest: Pointer of destination character array (marked as __nonstring)
src: Pointer to NUL-terminated string

Description

This is a replacement for strncpy() uses where the destination is not a NUL-terminated string, but with bounds checking on the source size, and without trailing padding. If padding is required, use strtomem_pad().

Note that the size of dest is not an argument, as the length of dest must be discoverable by the compiler.

memtostr¶

memtostr(dest,src)

Copy a possibly non-NUL-term string to a NUL-term string

Parameters

dest: Pointer to destination NUL-terminated string
src: Pointer to character array (likely marked as __nonstring)

Description

This is a replacement for strncpy() uses where the source is not a NUL-terminated string.

Note that sizes of dest and src must be known at compile-time.

memtostr_pad¶

memtostr_pad(dest,src)

Copy a possibly non-NUL-term string to a NUL-term string with NUL padding in the destination

Parameters

dest: Pointer to destination NUL-terminated string
src: Pointer to character array (likely marked as __nonstring)

Description

This is a replacement for strncpy() uses where the source is not a NUL-terminated string.

Note that sizes of dest and src must be known at compile-time.

memset_after¶

memset_after(obj,v,member)

Set a value after a struct member to the end of a struct

Parameters

obj: Address of target struct instance
v: Byte value to repeatedly write
member: after which struct member to start writing bytes

Description

This is good for clearing padding following the given member.

memset_startat¶

memset_startat(obj,v,member)

Set a value starting at a member to the end of a struct

Parameters

obj: Address of target struct instance
v: Byte value to repeatedly write
member: struct member to start writing at

Description

Note that if there is padding between the prior member and the target member, memset_after() should be used to clear the prior padding.
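The difference between the two helpers comes down to where the fill starts: at the end of the named member (memset_after) or at its first byte (memset_startat). The userspace sketch below models both with offsetof(). The MODEL_ macros, struct demo and the two check functions are hypothetical, and __typeof__ is a GNU C extension assumed to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill from the end of member to the end of *obj (padding included). */
#define MODEL_MEMSET_AFTER(obj, v, member) do {                            \
	size_t off_ = offsetof(__typeof__(*(obj)), member) +               \
		      sizeof((obj)->member);                               \
	memset((unsigned char *)(obj) + off_, (v), sizeof(*(obj)) - off_); \
} while (0)

/* Fill from the first byte of member to the end of *obj. */
#define MODEL_MEMSET_STARTAT(obj, v, member) do {                          \
	size_t off_ = offsetof(__typeof__(*(obj)), member);                \
	memset((unsigned char *)(obj) + off_, (v), sizeof(*(obj)) - off_); \
} while (0)

struct demo { int a; char b; long c; int d; };

/* Everything after b is zeroed, a and b keep their 0xFF fill. */
static int check_after_b(void)
{
	struct demo e;

	memset(&e, 0xFF, sizeof(e));
	MODEL_MEMSET_AFTER(&e, 0, b);
	return e.a == -1 && (unsigned char)e.b == 0xFF && e.c == 0 && e.d == 0;
}

/* Only d (and any trailing padding) is zeroed. */
static int check_startat_d(void)
{
	struct demo e;

	memset(&e, 0xFF, sizeof(e));
	MODEL_MEMSET_STARTAT(&e, 0, d);
	return e.a == -1 && (unsigned char)e.b == 0xFF && e.c == -1L && e.d == 0;
}
```

In struct demo the padding between b and c is cleared by the "after b" form but would be left alone by a "start at c" form, which is the case the note above warns about.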

size_tstr_has_prefix(constchar*str,constchar*prefix)¶: Test if a string has a given prefix

Parameters

constchar*str: The string to test
constchar*prefix: The string to see if str starts with

Description

A common way to test a prefix of a string is to do:

    strncmp(str, prefix, sizeof(prefix) - 1)

But this can lead to bugs due to typos, or if prefix is a pointer and not a constant. Instead use str_has_prefix().

Return

strlen(prefix) if str starts with prefix
0 if str does not start with prefix
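A userspace model of the documented return values, using strlen() so that it works whether prefix is an array or a pointer. model_str_has_prefix() is a hypothetical name for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns strlen(prefix) when str starts with prefix, otherwise 0. */
static size_t model_str_has_prefix(const char *str, const char *prefix)
{
	size_t len = strlen(prefix);

	/* sizeof(prefix) - 1 here would be the pointer size minus one. */
	return strncmp(str, prefix, len) == 0 ? len : 0;
}
```

Because the return value is the prefix length, a caller can skip the prefix with str + len after a successful test. Note that in this model an empty prefix also yields 0.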

boolstrstarts(constchar*str,constchar*prefix)¶: does str start with prefix?

Parameters

constchar*str: string to examine
constchar*prefix: prefix to look for.

Return

True if str begins with prefix. False in all other cases.

boolstrends(constchar*str,constchar*suffix)¶: Check if a string ends with another string.

Parameters

constchar*str: NUL-terminated string to check against suffix
constchar*suffix: NUL-terminated string defining the suffix to look for in str

Return

True if str ends with suffix. False in all other cases.

char*kstrdup(constchar*s,gfp_tgfp)¶: allocate space for and copy an existing string

Parameters

constchar*s: the string to duplicate
gfp_tgfp: the GFP mask used in the kmalloc() call when allocating memory

Return

newly allocated copy of s or NULL in case of error

constchar*kstrdup_const(constchar*s,gfp_tgfp)¶: conditionally duplicate an existing const string

Parameters

constchar*s: the string to duplicate
gfp_tgfp: the GFP mask used in the kmalloc() call when allocating memory

Note

Strings allocated by kstrdup_const should be freed by kfree_const and must not be passed to krealloc().

Return

The source string if it is in the .rodata section; otherwise falls back to kstrdup.

char*kstrndup(constchar*s,size_tmax,gfp_tgfp)¶: allocate space for and copy an existing string

Parameters

constchar*s: the string to duplicate
size_tmax: read at most max chars from s
gfp_tgfp: the GFP mask used in the kmalloc() call when allocating memory

Note

Use kmemdup_nul() instead if the size is known exactly.

Return

newly allocated copy of s or NULL in case of error

void*kmemdup(constvoid*src,size_tlen,gfp_tgfp)¶: duplicate region of memory

Parameters

constvoid*src: memory region to duplicate
size_tlen: memory region length
gfp_tgfp: GFP mask to use

Return

Newly allocated copy of src, or NULL in case of error. The result is physically contiguous. Use kfree() to free.

char*kmemdup_nul(constchar*s,size_tlen,gfp_tgfp)¶: Create a NUL-terminated string from unterminated data

Parameters

constchar*s: The data to stringify
size_tlen: The size of the data
gfp_tgfp: the GFP mask used in the kmalloc() call when allocating memory

Return

newly allocated copy of s with NUL-termination or NULL in case of error

void*memdup_user(constvoid__user*src,size_tlen)¶: duplicate memory region from user space

Parameters

constvoid__user*src: source address in user space
size_tlen: number of bytes to copy

Return

An ERR_PTR() on failure. Result is physically contiguous, to be freed by kfree().

void*vmemdup_user(constvoid__user*src,size_tlen)¶: duplicate memory region from user space

Parameters

constvoid__user*src: source address in user space
size_tlen: number of bytes to copy

Return

An ERR_PTR() on failure. Result may not be physically contiguous. Use kvfree() to free.

char*strndup_user(constchar__user*s,longn)¶: duplicate an existing string from user space

Parameters

constchar__user*s: The string to duplicate
longn: Maximum number of bytes to copy, including the trailing NUL.

Return

newly allocated copy of s or an ERR_PTR() in case of error

void*memdup_user_nul(constvoid__user*src,size_tlen)¶: duplicate memory region from user space and NUL-terminate

Parameters

constvoid__user*src: source address in user space
size_tlen: number of bytes to copy

Return

An ERR_PTR() on failure.

Basic Kernel Library Functions¶

The Linux kernel provides more basic utility functions.

Bit Operations¶

voidset_bit(longnr,volatileunsignedlong*addr)¶: Atomically set a bit in memory

Parameters

longnr: the bit to set
volatileunsignedlong*addr: the address to start counting from

Description

This is a relaxed atomic operation (no implied memory barriers).

Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.

voidclear_bit(longnr,volatileunsignedlong*addr)¶: Clears a bit in memory

Parameters

longnr: Bit to clear
volatileunsignedlong*addr: Address to start counting from

Description

This is a relaxed atomic operation (no implied memory barriers).

voidchange_bit(longnr,volatileunsignedlong*addr)¶: Toggle a bit in memory

Parameters

longnr: Bit to change
volatileunsignedlong*addr: Address to start counting from

Description

This is a relaxed atomic operation (no implied memory barriers).

Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.

booltest_and_set_bit(longnr,volatileunsignedlong*addr)¶: Set a bit and return its old value

Parameters

longnr: Bit to set
volatileunsignedlong*addr: Address to count from

Description

This is an atomic fully-ordered operation (implied full memory barrier).

booltest_and_clear_bit(longnr,volatileunsignedlong*addr)¶: Clear a bit and return its old value

Parameters

longnr: Bit to clear
volatileunsignedlong*addr: Address to count from

Description

This is an atomic fully-ordered operation (implied full memory barrier).

booltest_and_change_bit(longnr,volatileunsignedlong*addr)¶: Change a bit and return its old value

Parameters

longnr: Bit to change
volatileunsignedlong*addr: Address to count from

Description

This is an atomic fully-ordered operation (implied full memory barrier).

void___set_bit(unsignedlongnr,volatileunsignedlong*addr)¶: Set a bit in memory

Parameters

unsignedlongnr: the bit to set
volatileunsignedlong*addr: the address to start counting from

Description

Unlike set_bit(), this function is non-atomic. If it is called on the same region of memory concurrently, the effect may be that only one operation succeeds.
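The bit number is an index into an array of unsigned longs, which is why nr is not limited to a single word. A non-atomic userspace model of that indexing follows; MODEL_BITS_PER_LONG, model_set_bit() and word_touched() are hypothetical names, and nothing here provides the atomicity of set_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Plain read-modify-write: word index is nr / BPL, bit index is nr % BPL. */
static void model_set_bit(unsigned long nr, unsigned long *addr)
{
	addr[nr / MODEL_BITS_PER_LONG] |= 1UL << (nr % MODEL_BITS_PER_LONG);
}

/* Hypothetical helper: set bit nr in a zeroed 4-word map, return the word hit. */
static int word_touched(unsigned long nr)
{
	unsigned long map[4] = { 0 };
	int i;

	if (nr >= 4 * MODEL_BITS_PER_LONG)
		return -1; /* outside this demo map */
	model_set_bit(nr, map);
	for (i = 0; i < 4; i++)
		if (map[i])
			return i;
	return -1;
}
```

Two unsynchronised callers running this read-modify-write on the same word can overwrite each other's update, which is the lost-operation effect described above.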

void___clear_bit(unsignedlongnr,volatileunsignedlong*addr)¶: Clears a bit in memory

Parameters

unsignedlongnr: the bit to clear
volatileunsignedlong*addr: the address to start counting from

Description

Unlike clear_bit(), this function is non-atomic. If it is called on the same region of memory concurrently, the effect may be that only one operation succeeds.

void___change_bit(unsignedlongnr,volatileunsignedlong*addr)¶: Toggle a bit in memory

Parameters

unsignedlongnr: the bit to change
volatileunsignedlong*addr: the address to start counting from

Description

Unlike change_bit(), this function is non-atomic. If it is called on the same region of memory concurrently, the effect may be that only one operation succeeds.

bool___test_and_set_bit(unsignedlongnr,volatileunsignedlong*addr)¶: Set a bit and return its old value

Parameters

unsignedlongnr: Bit to set
volatileunsignedlong*addr: Address to count from

Description

This operation is non-atomic. If two instances of this operation race, one can appear to succeed but actually fail.

bool___test_and_clear_bit(unsignedlongnr,volatileunsignedlong*addr)¶: Clear a bit and return its old value

Parameters

unsignedlongnr: Bit to clear
volatileunsignedlong*addr: Address to count from

Description

This operation is non-atomic. If two instances of this operation race, one can appear to succeed but actually fail.

bool___test_and_change_bit(unsignedlongnr,volatileunsignedlong*addr)¶: Change a bit and return its old value

Parameters

unsignedlongnr: Bit to change
volatileunsignedlong*addr: Address to count from

Description

This operation is non-atomic. If two instances of this operation race, one can appear to succeed but actually fail.

bool_test_bit(unsignedlongnr,volatileconstunsignedlong*addr)¶: Determine whether a bit is set

Parameters

unsignedlongnr: bit number to test
constvolatileunsignedlong*addr: Address to start counting from

bool_test_bit_acquire(unsignedlongnr,volatileconstunsignedlong*addr)¶: Determine, with acquire semantics, whether a bit is set

Parameters

unsignedlongnr: bit number to test
constvolatileunsignedlong*addr: Address to start counting from

voidclear_bit_unlock(longnr,volatileunsignedlong*addr)¶: Clear a bit in memory, for unlock

Parameters

longnr: the bit to clear
volatileunsignedlong*addr: the address to start counting from

Description

This operation is atomic and provides release barrier semantics.

void__clear_bit_unlock(longnr,volatileunsignedlong*addr)¶: Clears a bit in memory

Parameters

longnr: Bit to clear
volatileunsignedlong*addr: Address to start counting from

Description

This is a non-atomic operation but implies a release barrier before the memory operation. It can be used for an unlock if no other CPUs can concurrently modify other bits in the word.

booltest_and_set_bit_lock(longnr,volatileunsignedlong*addr)¶: Set a bit and return its old value, for lock

Parameters

longnr: Bit to set
volatileunsignedlong*addr: Address to count from

Description

This operation is atomic and provides acquire barrier semantics if the returned value is 0. It can be used to implement bit locks.
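As an analogy only, the lock/unlock pairing can be sketched in userspace with C11 atomics: a fetch-or with acquire ordering to take the bit, and a fetch-and with release ordering to drop it. The kernel uses its own atomic primitives, not <stdatomic.h>; the model_ functions and demo_lock_word below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_ulong demo_lock_word; /* static storage, starts at zero */

/* Returns the old value of the bit; false means the caller now owns the lock. */
static bool model_test_and_set_bit_lock(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask;
}

/* Clears the bit with release ordering so earlier writes become visible. */
static void model_clear_bit_unlock(unsigned int nr, atomic_ulong *word)
{
	atomic_fetch_and_explicit(word, ~(1UL << nr), memory_order_release);
}
```

A trylock built on this succeeds when the returned old value is 0. A second attempt on the same bit sees 1 until the owner calls the unlock side.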

boolxor_unlock_is_negative_byte(unsignedlongmask,volatileunsignedlong*addr)¶: XOR a single byte in memory and test if it is negative, for unlock.

Parameters

unsignedlongmask: Change the bits which are set in this mask.
volatileunsignedlong*addr: The address of the word containing the byte to change.

Description

Changes some of bits 0-6 in the word pointed to by addr. This operation is atomic and provides release barrier semantics. Used to optimise some folio operations which are commonly paired with an unlock or end of writeback. Bit 7 is used as PG_waiters to indicate whether anybody is waiting for the unlock.

Return

Whether the top bit of the byte is set.

Bitmap Operations¶

Bitmaps provide an array of bits, implemented using an array of unsigned longs. The number of valid bits in a given bitmap does _not_ need to be an exact multiple of BITS_PER_LONG.

The possible unused bits in the last, partially used word of a bitmap are ‘don’t care’. The implementation makes no particular effort to keep them zero. It ensures that their value will not affect the results of any operation. The bitmap operations that return Boolean (bitmap_empty, for example) or scalar (bitmap_weight, for example) results carefully filter out these unused bits from impacting their results.

The byte ordering of bitmaps is more natural on little endian architectures. See the big-endian headers include/asm-ppc64/bitops.h and include/asm-s390/bitops.h for the best explanations of this ordering.

The DECLARE_BITMAP(name,bits) macro, in linux/types.h, can be used to declare an array named ‘name’ of just enough unsigned longs to contain all bit positions from 0 to ‘bits’ - 1.
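"Just enough unsigned longs" is a round-up division of the bit count by the word size. A userspace sketch of that sizing, with MODEL_-prefixed names standing in for the kernel macros (the prefixed names are hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold bits bit positions, rounded up. */
#define MODEL_BITS_TO_LONGS(bits) \
	(((bits) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/* Declares an array large enough for bit positions 0 .. bits - 1. */
#define MODEL_DECLARE_BITMAP(name, bits) \
	unsigned long name[MODEL_BITS_TO_LONGS(bits)]

/* Example: a 100-bit map (two words on a 64-bit build, four on 32-bit). */
static MODEL_DECLARE_BITMAP(demo_map, 100);
```

The last word is only partially used whenever bits is not a multiple of the word size, which is where the ‘don’t care’ bits mentioned earlier come from.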

The available bitmap operations and their rough meaning in the case that the bitmap is a single unsigned long are thus:

The generated code is more efficient when nbits is known at compile-time and at most BITS_PER_LONG.

bitmap_zero(dst, nbits)                     *dst = 0UL
bitmap_fill(dst, nbits)                     *dst = ~0UL
bitmap_copy(dst, src, nbits)                *dst = *src
bitmap_and(dst, src1, src2, nbits)          *dst = *src1 & *src2
bitmap_or(dst, src1, src2, nbits)           *dst = *src1 | *src2
bitmap_weighted_or(dst, src1, src2, nbits)  *dst = *src1 | *src2. Returns Hamming Weight of dst
bitmap_xor(dst, src1, src2, nbits)          *dst = *src1 ^ *src2
bitmap_andnot(dst, src1, src2, nbits)       *dst = *src1 & ~(*src2)
bitmap_complement(dst, src, nbits)          *dst = ~(*src)
bitmap_equal(src1, src2, nbits)             Are *src1 and *src2 equal?
bitmap_intersects(src1, src2, nbits)        Do *src1 and *src2 overlap?
bitmap_subset(src1, src2, nbits)            Is *src1 a subset of *src2?
bitmap_empty(src, nbits)                    Are all bits zero in *src?
bitmap_full(src, nbits)                     Are all bits set in *src?
bitmap_weight(src, nbits)                   Hamming Weight: number set bits
bitmap_weight_and(src1, src2, nbits)        Hamming Weight of and'ed bitmap
bitmap_weight_andnot(src1, src2, nbits)     Hamming Weight of andnot'ed bitmap
bitmap_set(dst, pos, nbits)                 Set specified bit area
bitmap_clear(dst, pos, nbits)               Clear specified bit area
bitmap_find_next_zero_area(buf, len, pos, n, mask)  Find bit free area
bitmap_find_next_zero_area_off(buf, len, pos, n, mask, mask_off)  as above
bitmap_shift_right(dst, src, n, nbits)      *dst = *src >> n
bitmap_shift_left(dst, src, n, nbits)       *dst = *src << n
bitmap_cut(dst, src, first, n, nbits)       Cut n bits from first, copy rest
bitmap_replace(dst, old, new, mask, nbits)  *dst = (*old & ~(*mask)) | (*new & *mask)
bitmap_scatter(dst, src, mask, nbits)       *dst = map(dense, sparse)(src)
bitmap_gather(dst, src, mask, nbits)        *dst = map(sparse, dense)(src)
bitmap_remap(dst, src, old, new, nbits)     *dst = map(old, new)(src)
bitmap_bitremap(oldbit, old, new, nbits)    newbit = map(old, new)(oldbit)
bitmap_onto(dst, orig, relmap, nbits)       *dst = orig relative to relmap
bitmap_fold(dst, orig, sz, nbits)           dst bits = orig bits mod sz
bitmap_parse(buf, buflen, dst, nbits)       Parse bitmap dst from kernel buf
bitmap_parse_user(ubuf, ulen, dst, nbits)   Parse bitmap dst from user buf
bitmap_parselist(buf, dst, nbits)           Parse bitmap dst from kernel buf
bitmap_parselist_user(buf, dst, nbits)      Parse bitmap dst from user buf
bitmap_find_free_region(bitmap, bits, order)  Find and allocate bit region
bitmap_release_region(bitmap, pos, order)   Free specified bit region
bitmap_allocate_region(bitmap, pos, order)  Allocate specified bit region
bitmap_from_arr32(dst, buf, nbits)          Copy nbits from u32[] buf to dst
bitmap_from_arr64(dst, buf, nbits)          Copy nbits from u64[] buf to dst
bitmap_to_arr32(buf, src, nbits)            Copy nbits from src to u32[] buf
bitmap_to_arr64(buf, src, nbits)            Copy nbits from src to u64[] buf
bitmap_get_value8(map, start)               Get 8-bit value from map at start
bitmap_set_value8(map, value, start)        Set 8-bit value to map at start
bitmap_read(map, start, nbits)              Read an nbits-sized value from map at start
bitmap_write(map, value, start, nbits)      Write an nbits-sized value to map at start

Note, bitmap_zero() and bitmap_fill() operate over the region of unsigned longs, that is, bits behind bitmap till the unsigned long boundary will be zeroed or filled as well. Consider using bitmap_clear() or bitmap_set() to make explicit zeroing or filling respectively.

Also the following operations in asm/bitops.h apply to bitmaps:

set_bit(bit, addr)                  *addr |= bit
clear_bit(bit, addr)                *addr &= ~bit
change_bit(bit, addr)               *addr ^= bit
test_bit(bit, addr)                 Is bit set in *addr?
test_and_set_bit(bit, addr)         Set bit and return old value
test_and_clear_bit(bit, addr)       Clear bit and return old value
test_and_change_bit(bit, addr)      Change bit and return old value
find_first_zero_bit(addr, nbits)    Position first zero bit in *addr
find_first_bit(addr, nbits)         Position first set bit in *addr
find_next_zero_bit(addr, nbits, bit)
                                    Position next zero bit in *addr >= bit
find_next_bit(addr, nbits, bit)     Position next set bit in *addr >= bit
find_next_and_bit(addr1, addr2, nbits, bit)
                                    Same as find_next_bit, but in
                                    (*addr1 & *addr2)

void__bitmap_shift_right(unsignedlong*dst,constunsignedlong*src,unsignedshift,unsignednbits)¶: logical right shift of the bits in a bitmap

Parameters

unsignedlong*dst: destination bitmap
constunsignedlong*src: source bitmap
unsignedshift: shift by this many bits
unsignednbits: bitmap size, in bits

Description

Shifting right (dividing) means moving bits in the MS -> LS bit direction. Zeros are fed into the vacated MS positions and the LS bits shifted off the bottom are lost.

void__bitmap_shift_left(unsignedlong*dst,constunsignedlong*src,unsignedintshift,unsignedintnbits)¶: logical left shift of the bits in a bitmap

Parameters

unsignedlong*dst: destination bitmap
constunsignedlong*src: source bitmap
unsignedintshift: shift by this many bits
unsignedintnbits: bitmap size, in bits

Description

Shifting left (multiplying) means moving bits in the LS -> MS direction. Zeros are fed into the vacated LS bit positions and those MS bits shifted off the top are lost.

voidbitmap_cut(unsignedlong*dst,constunsignedlong*src,unsignedintfirst,unsignedintcut,unsignedintnbits)¶: remove bit region from bitmap and right shift remaining bits

Parameters

unsignedlong*dst: destination bitmap, might overlap with src
constunsignedlong*src: source bitmap
unsignedintfirst: start bit of region to be removed
unsignedintcut: number of bits to remove
unsignedintnbits: bitmap size, in bits

Description

Set the n-th bit of dst iff the n-th bit of src is set and n is less than first, or the m-th bit of src is set for any m such that first <= n < nbits, and m = n + cut.

In pictures, example for a big-endian 32-bit architecture:

The src bitmap is:

31                                   63
|                                    |
10000000 11000001 11110010 00010101  10000000 11000001 01110010 00010101
                |  |              |                                    |
               16  14             0                                   32

if cut is 3, and first is 14, bits 14-16 in src are cut and dst is:

31                                   63
|                                    |
10110000 00011000 00110010 00010101  00010000 00011000 00101110 01000010
                   |              |                                    |
                   14 (bit 17     0                                   32
                       from @src)

Note that dst and src might overlap partially or entirely.

This is implemented in the obvious way, with a shift and carry step for each moved bit. Optimisation is left as an exercise for the compiler.
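The rule above can be checked against the pictured example with a single-word userspace model: keep the bits below first, and take every higher bit from cut positions further up. The sketch treats the two 32-bit words of the picture as one 64-bit value and assumes first and cut are both below 64; model_bitmap_cut() is a hypothetical name and this is not the kernel's shift-and-carry loop.

```c
#include <assert.h>
#include <stdint.h>

/* dst bit n = src bit n for n < first, else src bit n + cut (zeros fill the top). */
static uint64_t model_bitmap_cut(uint64_t src, unsigned int first,
				 unsigned int cut)
{
	uint64_t low_mask = (UINT64_C(1) << first) - 1; /* bits 0 .. first - 1 */

	return (src & low_mask) | ((src >> cut) & ~low_mask);
}
```

With the pictured source words 0x80C1F215 (bits 0-31) and 0x80C17215 (bits 32-63), first = 14 and cut = 3, the model yields 0xB0183215 and 0x10182E42, matching the dst picture.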

unsignedlongbitmap_find_next_zero_area_off(unsignedlong*map,unsignedlongsize,unsignedlongstart,unsignedintnr,unsignedlongalign_mask,unsignedlongalign_offset)¶: find a contiguous aligned zero area

Parameters

unsignedlong*map: The address to base the search on
unsignedlongsize: The bitmap size in bits
unsignedlongstart: The bitnumber to start searching at
unsignedintnr: The number of zeroed bits we’re looking for
unsignedlongalign_mask: Alignment mask for zero area
unsignedlongalign_offset: Alignment offset for zero area.

Description

The align_mask should be one less than a power of 2; the effect is that the bit offset of all zero areas this function finds plus align_offset is a multiple of that power of 2.
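The alignment rule amounts to rounding (index + align_offset) up to the next multiple of the power of 2, then subtracting align_offset again. A small sketch of that arithmetic follows; aligned_start() is a hypothetical helper showing only the candidate-position computation, not the search loop.

```c
#include <assert.h>

/* Smallest candidate >= index with (candidate + align_offset) aligned. */
static unsigned long aligned_start(unsigned long index,
				   unsigned long align_mask,
				   unsigned long align_offset)
{
	return ((index + align_offset + align_mask) & ~align_mask) - align_offset;
}
```

For example, with align_mask = 7 (alignment 8) and align_offset = 2, a search starting at bit 5 moves to bit 6, because 6 + 2 is a multiple of 8.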

voidbitmap_remap(unsignedlong*dst,constunsignedlong*src,constunsignedlong*old,constunsignedlong*new,unsignedintnbits)¶: Apply map defined by a pair of bitmaps to another bitmap

Parameters

unsignedlong*dst: remapped result
constunsignedlong*src: subset to be remapped
constunsignedlong*old: defines domain of map
constunsignedlong*new: defines range of map
unsignedintnbits: number of bits in each of these bitmaps

Description

Let old and new define a mapping of bit positions, such that whatever position is held by the n-th set bit in old is mapped to the n-th set bit in new. In the more general case, allowing for the possibility that the weight ‘w’ of new is less than the weight of old, map the position of the n-th set bit in old to the position of the m-th set bit in new, where m == n % w.

If either of the old and new bitmaps are empty, or if src and dst point to the same location, then this routine copies src to dst.

The positions of unset bits in old are mapped to themselves (the identity map).

Apply the above specified mapping to src, placing the result in dst, clearing any bits previously set in dst.

For example, let’s say that old has bits 4 through 7 set, and new has bits 12 through 15 set. This defines the mapping of bit position 4 to 12, 5 to 13, 6 to 14 and 7 to 15, and of all other bit positions unchanged. So if say src comes into this routine with bits 1, 5 and 7 set, then dst should leave with bits 1, 13 and 15 set.
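The mapping can be modelled on single 64-bit words, which is enough to replay the example. The sketch below follows the description above (ordinal in old, modulo the weight of new, identity for bits not in old); all function names are hypothetical and the kernel routine works on arbitrary-length bitmaps.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in map. */
static unsigned int weight64(uint64_t map)
{
	unsigned int w = 0;

	for (; map; map &= map - 1)
		w++;
	return w;
}

/* Ordinal of the set bit at pos among map's set bits, or -1 if it is clear. */
static int pos_to_ord(uint64_t map, unsigned int pos)
{
	if (pos >= 64 || !((map >> pos) & 1))
		return -1;
	return (int)weight64(map & ((UINT64_C(1) << pos) - 1));
}

/* Position of the ord-th (0-based) set bit in map, or -1 if there is none. */
static int ord_to_pos(uint64_t map, unsigned int ord)
{
	unsigned int pos;

	for (pos = 0; pos < 64; pos++) {
		if (!((map >> pos) & 1))
			continue;
		if (ord == 0)
			return (int)pos;
		ord--;
	}
	return -1;
}

static uint64_t model_bitmap_remap(uint64_t src, uint64_t old_map,
				   uint64_t new_map)
{
	unsigned int w = weight64(new_map);
	uint64_t dst = 0;
	unsigned int pos;

	for (pos = 0; pos < 64; pos++) {
		int n;

		if (!((src >> pos) & 1))
			continue;
		n = pos_to_ord(old_map, pos);
		if (n < 0 || w == 0)
			dst |= UINT64_C(1) << pos; /* identity map */
		else
			dst |= UINT64_C(1) << ord_to_pos(new_map, (unsigned int)n % w);
	}
	return dst;
}
```

Replaying the example: old = 0xF0, new = 0xF000 and src = 0xA2 (bits 1, 5, 7) gives 0xA002 (bits 1, 13, 15).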

intbitmap_bitremap(intoldbit,constunsignedlong*old,constunsignedlong*new,intbits)¶: Apply map defined by a pair of bitmaps to a single bit

Parameters

intoldbit: bit position to be mapped
constunsignedlong*old: defines domain of map
constunsignedlong*new: defines range of map
intbits: number of bits in each of these bitmaps

Description

The positions of unset bits in old are mapped to themselves (the identity map).

Apply the above specified mapping to bit position oldbit, returning the new bit position.

voidbitmap_from_arr32(unsignedlong*bitmap,constu32*buf,unsignedintnbits)¶: copy the contents of u32 array of bits to bitmap

Parameters

unsignedlong*bitmap: array of unsigned longs, the destination bitmap
constu32*buf: array of u32 (in host byte order), the source bitmap
unsignedintnbits: number of bits in bitmap

voidbitmap_to_arr32(u32*buf,constunsignedlong*bitmap,unsignedintnbits)¶: copy the contents of bitmap to a u32 array of bits

Parameters

u32*buf: array of u32 (in host byte order), the dest bitmap
constunsignedlong*bitmap: array of unsigned longs, the source bitmap
unsignedintnbits: number of bits in bitmap

voidbitmap_from_arr64(unsignedlong*bitmap,constu64*buf,unsignedintnbits)¶: copy the contents of u64 array of bits to bitmap

Parameters

unsignedlong*bitmap: array of unsigned longs, the destination bitmap
constu64*buf: array of u64 (in host byte order), the source bitmap
unsignedintnbits: number of bits in bitmap

voidbitmap_to_arr64(u64*buf,constunsignedlong*bitmap,unsignedintnbits)¶: copy the contents of bitmap to a u64 array of bits

Parameters

u64*buf: array of u64 (in host byte order), the dest bitmap
constunsignedlong*bitmap: array of unsigned longs, the source bitmap
unsignedintnbits: number of bits in bitmap

intbitmap_pos_to_ord(constunsignedlong*buf,unsignedintpos,unsignedintnbits)¶: find ordinal of set bit at given position in bitmap

Parameters

constunsignedlong*buf: pointer to a bitmap
unsignedintpos: a bit position in buf (0 <= pos < nbits)
unsignedintnbits: number of valid bit positions in buf

Description

Map the bit at position pos in buf (of length nbits) to the ordinal of which set bit it is. If it is not set or if pos is not a valid bit position, map to -1.

If for example, just bits 4 through 7 are set in buf, then pos values 4 through 7 will get mapped to 0 through 3, respectively, and other pos values will get mapped to -1. When pos value 7 gets mapped to (returns) ord value 3 in this example, that means that bit 7 is the 3rd (starting with 0th) set bit in buf.
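The example can be replayed with a single-word userspace model: the ordinal of a set bit is the number of set bits below it. model_bitmap_pos_to_ord() is a hypothetical name, and the sketch assumes nbits is at most 64.

```c
#include <assert.h>
#include <stdint.h>

/* Ordinal of the set bit at pos, or -1 if pos is invalid or the bit is clear. */
static int model_bitmap_pos_to_ord(uint64_t buf, unsigned int pos,
				   unsigned int nbits)
{
	uint64_t below;
	int ord = 0;

	if (nbits > 64 || pos >= nbits || !((buf >> pos) & 1))
		return -1;
	/* Count the set bits strictly below pos. */
	for (below = buf & ((UINT64_C(1) << pos) - 1); below; below &= below - 1)
		ord++;
	return ord;
}
```

With bits 4 through 7 set (0xF0), positions 4 and 7 map to 0 and 3, while position 3 (clear) maps to -1.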

The bit positions 0 through nbits - 1 are valid positions in buf.

void bitmap_onto(unsigned long *dst, const unsigned long *orig, const unsigned long *relmap, unsigned int bits)¶: translate one bitmap relative to another

Parameters

unsigned long *dst: resulting translated bitmap
const unsigned long *orig: original untranslated bitmap
const unsigned long *relmap: bitmap relative to which translated
unsigned int bits: number of bits in each of these bitmaps

Description

Set the n-th bit of dst iff there exists some m such that the n-th bit of relmap is set, the m-th bit of orig is set, and the n-th bit of relmap is also the m-th _set_ bit of relmap. (If you understood the previous sentence the first time you read it, you’re overqualified for your current job.)

In other words, orig is mapped onto (surjectively) dst, using the map { <n, m> | the n-th bit of relmap is the m-th set bit of relmap }.

Any set bits in orig above bit number W, where W is the weight of (number of set bits in) relmap, are mapped nowhere. In particular, if for all bits m set in orig, m >= W, then dst will end up empty. In situations where the possibility of such an empty result is not desired, one way to avoid it is to use the bitmap_fold() operator, below, to first fold the orig bitmap over itself so that all its set bits x are in the range 0 <= x < W. The bitmap_fold() operator does this by setting the bit (m % W) in dst, for each bit (m) set in orig.

Example [1] for bitmap_onto():

Let’s say relmap has bits 30-39 set, and orig has bits 1, 3, 5, 7, 9 and 11 set. Then on return from this routine, dst will have bits 31, 33, 35, 37 and 39 set.

When bit 0 is set in orig, it means turn on the bit in dst corresponding to whatever is the first bit (if any) that is turned on in relmap. Since bit 0 was off in the above example, we leave off that bit (bit 30) in dst.

When bit 1 is set in orig (as in the above example), it means turn on the bit in dst corresponding to whatever is the second bit that is turned on in relmap. The second bit in relmap that was turned on in the above example was bit 31, so we turned on bit 31 in dst.

Similarly, we turned on bits 33, 35, 37 and 39 in dst, because they were the 4th, 6th, 8th and 10th set bits in relmap, and the 4th, 6th, 8th and 10th bits of orig (i.e. bits 3, 5, 7 and 9) were also set.

When bit 11 is set in orig, it means turn on the bit in dst corresponding to whatever is the twelfth bit that is turned on in relmap. In the above example, there were only ten bits turned on in relmap (30..39), so the fact that bit 11 was set in orig had no effect on dst.

Example [2] for bitmap_fold() + bitmap_onto():

Let’s say relmap has these ten bits set:

40 41 42 43 45 48 53 61 74 95

(for the curious, that’s 40 plus the first ten terms of the Fibonacci sequence.)

Further, let’s say we use the following code, invoking bitmap_fold() then bitmap_onto(), as suggested above, to avoid the possibility of an empty dst result:

unsigned long *tmp;     // a temporary bitmap's bits

bitmap_fold(tmp, orig, bitmap_weight(relmap, bits), bits);
bitmap_onto(dst, tmp, relmap, bits);

Then this table shows what various values of dst would be, for various orig’s. I list the zero-based positions of each set bit. The tmp column shows the intermediate result, as computed by using bitmap_fold() to fold the orig bitmap modulo ten (the weight of relmap):

orig            tmp             dst
0               0               40
1               1               41
9               9               95
10              0               40 [1]
1 3 5 7         1 3 5 7         41 43 48 61
0 1 2 3 4       0 1 2 3 4       40 41 42 43 45
0 9 18 27       0 9 8 7         40 61 74 95
0 10 20 30      0               40
0 11 22 33      0 1 2 3         40 41 42 43
0 12 24 36      0 2 4 6         40 42 45 53
78 102 211      1 2 8           41 42 74 [1]

[1] For these marked lines, if we hadn’t first done bitmap_fold() into tmp, then the dst result would have been empty.

If either of orig or relmap is empty (no set bits), then dst will be returned empty.

If (as explained above) the only set bits in orig are in positions m where m >= W (where W is the weight of relmap), then dst will once again be returned empty.

All bits in dst not set by the above rule are cleared.
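The rule above can be checked against Example [1] with a small userspace sketch limited to one 64-bit word. onto64() is a hypothetical name for this model; the kernel routine works on unsigned long arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-word model of bitmap_onto(); not kernel code. */
uint64_t onto64(uint64_t orig, uint64_t relmap, unsigned int bits)
{
	uint64_t dst = 0;
	unsigned int m = 0;	/* ordinal of the next set bit of relmap */

	assert(bits <= 64);
	for (unsigned int n = 0; n < bits; n++) {
		if (!((relmap >> n) & 1))
			continue;
		/* Bit n of relmap is its m-th set bit: copy bit m of orig. */
		if ((orig >> m) & 1)
			dst |= UINT64_C(1) << n;
		m++;
	}
	return dst;	/* bits of orig at or above the weight of relmap are dropped */
}
```

For relmap with bits 30-39 set and orig = 0xAAA (bits 1, 3, 5, 7, 9 and 11), the sketch yields bits 31, 33, 35, 37 and 39, and bit 11 of orig has no effect.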

void bitmap_fold(unsigned long *dst, const unsigned long *orig, unsigned int sz, unsigned int nbits)¶: fold larger bitmap into smaller, modulo specified size

Parameters

unsigned long *dst: resulting smaller bitmap
const unsigned long *orig: original larger bitmap
unsigned int sz: specified size
unsigned int nbits: number of bits in each of these bitmaps

Description

For each bit oldbit in orig, set bit oldbit mod sz in dst. Clear all other bits in dst. See further the comment and Example [2] for bitmap_onto() for why and how to use this.
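As a rough illustration, the folding rule can be modelled in userspace on one 64-bit word. fold64() is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-word model of bitmap_fold(); not kernel code. */
uint64_t fold64(uint64_t orig, unsigned int sz, unsigned int nbits)
{
	uint64_t dst = 0;

	assert(sz > 0 && sz <= 64 && nbits <= 64);
	/* Every set bit oldbit lands on bit (oldbit mod sz). */
	for (unsigned int oldbit = 0; oldbit < nbits; oldbit++)
		if ((orig >> oldbit) & 1)
			dst |= UINT64_C(1) << (oldbit % sz);
	return dst;
}
```

Folding bits 0, 9, 18 and 27 modulo ten gives bits 0, 9, 8 and 7, matching the tmp column of the corresponding row in Example [2].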

unsigned long bitmap_find_next_zero_area(unsigned long *map, unsigned long size, unsigned long start, unsigned int nr, unsigned long align_mask)¶: find a contiguous aligned zero area

Parameters

unsigned long *map: The address to base the search on
unsigned long size: The bitmap size in bits
unsigned long start: The bit number to start searching at
unsigned int nr: The number of zeroed bits we’re looking for
unsigned long align_mask: Alignment mask for zero area

Description

The align_mask should be one less than a power of 2; the effect is that the bit offset of every zero area this function finds is a multiple of that power of 2. An align_mask of 0 means no alignment is required.
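The effect of align_mask can be seen in a userspace sketch over a single 64-bit word. The function name and the failure convention (returning size + 1) are assumptions of this sketch; the text above does not state what the kernel function returns when no area is found.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-word model of an aligned zero-area search. */
unsigned long find_zero_area64(uint64_t map, unsigned long size,
			       unsigned long start, unsigned int nr,
			       unsigned long align_mask)
{
	unsigned long index = start;

	assert(size <= 64);
	for (;;) {
		/* Skip set bits, then round up to the requested alignment. */
		while (index < size && ((map >> index) & 1))
			index++;
		index = (index + align_mask) & ~align_mask;
		if (index + nr > size)
			return size + 1;	/* sketch-only failure value */

		/* Check that all nr bits starting at index are zero. */
		unsigned long i = index;

		while (i < index + nr && !((map >> i) & 1))
			i++;
		if (i == index + nr)
			return index;
		index = i + 1;		/* retry after the set bit found */
	}
}
```

With bits 0-3 set in a 16-bit map, a request for 4 bits finds offset 4 when align_mask is 0 and offset 8 when align_mask is 7.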

bool bitmap_or_equal(const unsigned long *src1, const unsigned long *src2, const unsigned long *src3, unsigned int nbits)¶: Check whether the or of two bitmaps is equal to a third

Parameters

const unsigned long *src1: Pointer to bitmap 1
const unsigned long *src2: Pointer to bitmap 2, which will be or’ed with bitmap 1
const unsigned long *src3: Pointer to bitmap 3. Compare to the result of *src1 | *src2
unsigned int nbits: number of bits in each of these bitmaps

Return

True if (*src1 | *src2) == *src3, false otherwise

void bitmap_scatter(unsigned long *dst, const unsigned long *src, const unsigned long *mask, unsigned int nbits)¶: Scatter a bitmap according to the given mask

Parameters

unsigned long *dst: scattered bitmap
const unsigned long *src: gathered bitmap
const unsigned long *mask: mask representing bits to assign to in the scattered bitmap
unsigned int nbits: number of bits in each of these bitmaps

Description

Scatters bitmap with sequential bits according to the given mask.

Example

If src bitmap = 0x005a, with mask = 0x1313, dst will be 0x0302.

Or in binary form:

src                 mask                dst
0000000001011010    0001001100010011    0000001100000010

(Bits 0, 1, 2, 3, 4, 5 are copied to the bits 0, 1, 4, 8, 9, 12)

A more ‘visual’ description of the operation:

src:  0000000001011010
                ||||||
         +------+|||||
         |  +----+||||
         |  |+----+|||
         |  ||   +-+||
         |  ||   |  ||
mask: ...v..vv...v..vv
      ...0..11...0..10
dst:  0000001100000010

A relationship exists between bitmap_scatter() and bitmap_gather(). See bitmap_gather() for the bitmap gather detailed operations. TL;DR: bitmap_gather() can be seen as the ‘reverse’ bitmap_scatter() operation.
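The example values can be reproduced with a userspace sketch on one 64-bit word. scatter64() is a hypothetical model, not the kernel routine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-word model of bitmap_scatter(); not kernel code. */
uint64_t scatter64(uint64_t src, uint64_t mask, unsigned int nbits)
{
	uint64_t dst = 0;
	unsigned int n = 0;	/* next sequential bit of src to consume */

	assert(nbits <= 64);
	for (unsigned int bit = 0; bit < nbits; bit++) {
		if (!((mask >> bit) & 1))
			continue;
		/* The n-th set bit of mask receives bit n of src. */
		if ((src >> n) & 1)
			dst |= UINT64_C(1) << bit;
		n++;
	}
	return dst;
}
```

Calling scatter64(0x005a, 0x1313, 16) gives 0x0302, as in the example above.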

void bitmap_gather(unsigned long *dst, const unsigned long *src, const unsigned long *mask, unsigned int nbits)¶: Gather a bitmap according to given mask

Parameters

unsigned long *dst: gathered bitmap
const unsigned long *src: scattered bitmap
const unsigned long *mask: mask representing bits to extract from in the scattered bitmap
unsigned int nbits: number of bits in each of these bitmaps

Description

Gathers bitmap with sparse bits according to the given mask.

Example

If src bitmap = 0x0302, with mask = 0x1313, dst will be 0x001a.

Or in binary form:

src                 mask                dst
0000001100000010    0001001100010011    0000000000011010

(Bits 0, 1, 4, 8, 9, 12 are copied to the bits 0, 1, 2, 3, 4, 5)

A more ‘visual’ description of the operation:

mask: ...v..vv...v..vv
src:  0000001100000010
         ^  ^^   ^   0
         |  ||   |  10
         |  ||   > 010
         |  |+--> 1010
         |  +--> 11010
         +----> 011010
dst:  0000000000011010

A relationship exists between bitmap_gather() and bitmap_scatter(). See bitmap_scatter() for the bitmap scatter detailed operations. TL;DR: bitmap_scatter() can be seen as the ‘reverse’ bitmap_gather() operation.

Suppose scattered is computed using bitmap_scatter(scattered, src, mask, n). The operation bitmap_gather(result, scattered, mask, n) leads to a result equal or equivalent to src.

The result can be ‘equivalent’ because bitmap_scatter() and bitmap_gather() are not bijective. The result and src values are equivalent in the sense that a call to bitmap_scatter(res, src, mask, n) and a call to bitmap_scatter(res, result, mask, n) will lead to the same res value.
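A userspace sketch of the gather direction, again limited to one 64-bit word, reproduces the example. gather64() is a hypothetical name and not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-word model of bitmap_gather(); not kernel code. */
uint64_t gather64(uint64_t src, uint64_t mask, unsigned int nbits)
{
	uint64_t dst = 0;
	unsigned int n = 0;	/* next sequential bit of dst to fill */

	assert(nbits <= 64);
	for (unsigned int bit = 0; bit < nbits; bit++) {
		if (!((mask >> bit) & 1))
			continue;
		/* Bit n of dst receives the n-th masked bit of src. */
		if ((src >> bit) & 1)
			dst |= UINT64_C(1) << n;
		n++;
	}
	return dst;
}
```

Gathering 0x0302 with mask 0x1313 gives 0x001a rather than the 0x005a used in the scatter example. Bit 6 of 0x005a lies beyond the six bits selected by the mask, so it cannot be recovered, which is the sense in which the two values are only equivalent.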

void bitmap_release_region(unsigned long *bitmap, unsigned int pos, int order)¶: release allocated bitmap region

Parameters

unsigned long *bitmap: array of unsigned longs corresponding to the bitmap
unsigned int pos: beginning of bit region to release
int order: region size (log base 2 of number of bits) to release

Description

This is the complement to __bitmap_find_free_region() and releases the found region (by clearing it in the bitmap).

int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order)¶: allocate bitmap region

Parameters

unsigned long *bitmap: array of unsigned longs corresponding to the bitmap
unsigned int pos: beginning of bit region to allocate
int order: region size (log base 2 of number of bits) to allocate

Description

Allocate (set bits in) a specified region of a bitmap.

Return

0 on success, or -EBUSY if the specified region wasn’t free (not all bits were zero).

int bitmap_find_free_region(unsigned long *bitmap, unsigned int bits, int order)¶: find a contiguous aligned mem region

Parameters

unsigned long *bitmap: array of unsigned longs corresponding to the bitmap
unsigned int bits: number of bits in the bitmap
int order: region size (log base 2 of number of bits) to find

Description

Find a region of free (zero) bits in a bitmap of bits bits and allocate them (set them to one). Only consider regions whose length is a power of two (2^order bits), aligned to that power of two, which makes the search algorithm much faster.

Return

the bit offset in bitmap of the allocated region, or -errno on failure.
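The region functions can be pictured with a userspace model on a single 64-bit word. All names here are hypothetical, and the choice of -ENOMEM as the failure code is an assumption of the sketch, since the text above only promises "-errno".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical one-word model of find-and-allocate; not kernel code. */
int find_free_region64(uint64_t *bitmap, unsigned int bits, int order)
{
	assert(bits <= 64 && order >= 0 && order <= 5);
	unsigned int len = 1u << order;		/* region length in bits */
	uint64_t region = (UINT64_C(1) << len) - 1;

	/* Only offsets that are multiples of len are candidates. */
	for (unsigned int pos = 0; pos + len <= bits; pos += len) {
		if (*bitmap & (region << pos))
			continue;		/* at least one bit is busy */
		*bitmap |= region << pos;	/* allocate: set the bits */
		return (int)pos;
	}
	return -ENOMEM;	/* assumption: exact errno not given in the text */
}

/* Hypothetical counterpart that clears a previously allocated region. */
void release_region64(uint64_t *bitmap, unsigned int pos, int order)
{
	assert(order >= 0 && order <= 5 && pos + (1u << order) <= 64);
	*bitmap &= ~(((UINT64_C(1) << (1u << order)) - 1) << pos);
}
```

Starting from a bitmap with only bit 0 set, an order-2 request skips the partially busy region at offset 0 and allocates bits 4-7.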

BITMAP_FROM_U64¶

BITMAP_FROM_U64(n)

Represent u64 value in the format suitable for bitmap.

Parameters

n: u64 value

Description

Linux bitmaps are internally arrays of unsigned longs, i.e. 32-bit integers in a 32-bit environment, and 64-bit integers in a 64-bit one.

There are four combinations of endianness and length of the word in Linux ABIs: LE64, BE64, LE32 and BE32.

On 64-bit kernels, 64-bit LE and BE numbers are naturally ordered in bitmaps and therefore don’t require any special handling.

On 32-bit kernels, the 32-bit LE ABI orders the lo word of a 64-bit number in memory prior to hi, and 32-bit BE orders the hi word prior to lo. The bitmap, on the other hand, is represented as an array of 32-bit words, and the position of bit N may therefore be calculated as: word #(N/32) and bit #(N%32) in that word. For example, bit #42 is located at the 10th position of the 2nd word. This matches the 32-bit LE ABI, and we can simply let the compiler store 64-bit values in memory as it usually does. But for BE we need to swap hi and lo words manually.

With all that, the macro BITMAP_FROM_U64() does explicit reordering of hi and lo parts of u64. For LE32 it does nothing, and for the BE environment it swaps hi and lo words, as is expected by bitmap.
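The word and bit arithmetic can be checked with a small userspace sketch that stores a 64-bit value as two 32-bit bitmap words, lo word first, independent of host endianness. The helper names are hypothetical and this is not the macro itself.

```c
#include <stdint.h>

/* Hypothetical helper: lay out a u64 in bitmap order for 32-bit words. */
void u64_to_words32(uint32_t words[2], uint64_t n)
{
	words[0] = (uint32_t)n;		/* bits 0..31 */
	words[1] = (uint32_t)(n >> 32);	/* bits 32..63 */
}

/* Bit N lives in word N / 32 at position N % 32. */
int test_bit32(const uint32_t *words, unsigned int nr)
{
	return (words[nr / 32] >> (nr % 32)) & 1;
}
```

For a value with only bit 42 set, the sketch leaves the first word zero and sets position 10 of the second word.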

void bitmap_from_u64(unsigned long *dst, u64 mask)¶: Check and swap words within u64.

Parameters

unsigned long *dst: destination bitmap
u64 mask: source bitmap

Description

In a 32-bit Big Endian kernel, when using (u32 *)(&val)[*] to read a u64 mask, we will get the wrong word. That is, (u32 *)(&val)[0] gets the upper 32 bits, but we expect the lower 32 bits of the u64.

unsigned long bitmap_read(const unsigned long *map, unsigned long start, unsigned long nbits)¶: read a value of n-bits from the memory region

Parameters

const unsigned long *map: address to the bitmap memory region
unsigned long start: bit offset of the n-bit value
unsigned long nbits: size of value in bits, nonzero, up to BITS_PER_LONG

Return

value of nbits bits located at the start bit offset within the map memory region. For nbits = 0 and nbits > BITS_PER_LONG the return value is undefined.

void bitmap_write(unsigned long *map, unsigned long value, unsigned long start, unsigned long nbits)¶: write n-bit value within a memory region

Parameters

unsigned long *map: address to the bitmap memory region
unsigned long value: value to write, clamped to nbits
unsigned long start: bit offset of the n-bit value
unsigned long nbits: size of value in bits, nonzero, up to BITS_PER_LONG.

Description

bitmap_write() behaves as if implemented as nbits calls of __assign_bit(), i.e. bits of value beyond nbits are ignored:

for (bit = 0; bit < nbits; bit++)
        __assign_bit(start + bit, map, value & BIT(bit));

For nbits == 0 and nbits > BITS_PER_LONG no writes are performed.
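The as-if loop translates directly into a userspace model. The sketch below assumes 64-bit words in place of unsigned long, and write_bits() and read_bits() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of bitmap_write() over 64-bit words. */
void write_bits(uint64_t *map, uint64_t value,
		unsigned long start, unsigned long nbits)
{
	if (nbits == 0 || nbits > 64)
		return;				/* no writes are performed */
	for (unsigned long bit = 0; bit < nbits; bit++) {
		unsigned long pos = start + bit;
		uint64_t m = UINT64_C(1) << (pos % 64);

		if ((value >> bit) & 1)
			map[pos / 64] |= m;
		else
			map[pos / 64] &= ~m;
	}
}

/* Hypothetical model of bitmap_read(); nbits must be 1..64 here. */
uint64_t read_bits(const uint64_t *map, unsigned long start,
		   unsigned long nbits)
{
	uint64_t value = 0;

	assert(nbits >= 1 && nbits <= 64);
	for (unsigned long bit = 0; bit < nbits; bit++) {
		unsigned long pos = start + bit;

		if ((map[pos / 64] >> (pos % 64)) & 1)
			value |= UINT64_C(1) << bit;
	}
	return value;
}
```

Writing the 12-bit value 0xABC at offset 60 straddles two words: the low four bits land at the top of the first word and the remaining eight at the bottom of the second.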

Command-line Parsing¶

int get_option(char **str, int *pint)¶: Parse integer from an option string

Parameters

char **str: option string
int *pint: (optional output) integer value parsed from str

Description

Read an int from an option string; if available accept a subsequent comma as well.

When pint is NULL the function can be used as a validator of the current option in the string.

Return values:
0 - no int in string
1 - int found, no subsequent comma
2 - int found including a subsequent comma
3 - hyphen found to denote a range

A leading hyphen without an integer is treated as the no-integer case, but we consume it for the sake of simplification.

char *get_options(const char *str, int nints, int *ints)¶: Parse a string into a list of integers

Parameters

const char *str: String to be parsed
int nints: size of integer array
int *ints: integer array (must have room for at least one element)

Description

This function parses a string containing a comma-separated list of integers, a hyphen-separated range of _positive_ integers, or a combination of both. The parse halts when the array is full, or when no more numbers can be retrieved from the string.

When nints is 0, the function just validates the given str and returns the amount of parseable integers as described below.

The first element is filled by the number of collected integers in the range. The rest is what was parsed from the str.

Return value is the character in the string which caused the parse to end (typically a null terminator, if str is completely parseable).

unsigned long long memparse(const char *ptr, char **retptr)¶: parse a string with mem suffixes into a number

Parameters

const char *ptr: Where parse begins
char **retptr: (output) Optional pointer to next char after parse completes

Description

Parses a string into a number. The number stored at ptr is potentially suffixed with K, M, G, T, P, E.
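A userspace approximation of this behaviour can be built on strtoull(). The sketch assumes each suffix multiplies by a further factor of 1024 and that lower-case suffixes are accepted too; both points, like the name memparse_sketch(), are assumptions rather than statements from the text above.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-in for memparse(); not kernel code. */
unsigned long long memparse_sketch(const char *ptr, char **retptr)
{
	char *endptr;
	unsigned long long ret = strtoull(ptr, &endptr, 0);

	/* Each case shifts by 10 and falls into the next smaller unit. */
	switch (*endptr) {
	case 'E': case 'e':
		ret <<= 10;	/* fall through */
	case 'P': case 'p':
		ret <<= 10;	/* fall through */
	case 'T': case 't':
		ret <<= 10;	/* fall through */
	case 'G': case 'g':
		ret <<= 10;	/* fall through */
	case 'M': case 'm':
		ret <<= 10;	/* fall through */
	case 'K': case 'k':
		ret <<= 10;
		endptr++;	/* consume the suffix character */
		break;
	default:
		break;
	}
	if (retptr)
		*retptr = endptr;
	return ret;
}
```

Under these assumptions "64K" parses to 65536, and parsing "2M,rest" leaves the returned pointer at the comma.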

Error Pointers¶

IS_ERR_VALUE¶

IS_ERR_VALUE(x)

Detect an error pointer.

Parameters

x: The pointer to check.

Description

Like IS_ERR(), but does not generate a compiler warning if result is unused.

void *ERR_PTR(long error)¶: Create an error pointer.

Parameters

long error: A negative error code.

Description

Encodes error into a pointer value. Users should consider the result opaque and not assume anything about how the error is encoded.

Return

A pointer with error encoded within its value.

INIT_ERR_PTR¶

INIT_ERR_PTR(error)

Init a const error pointer.

Parameters

error: A negative error code.

Description

Like ERR_PTR(), but usable to initialize static variables.

long PTR_ERR(__force const void *ptr)¶: Extract the error code from an error pointer.

Parameters

__force const void *ptr: An error pointer.

Return

The error code within ptr.

bool IS_ERR(__force const void *ptr)¶: Detect an error pointer.

Parameters

__force const void *ptr: The pointer to check.

Return

true if ptr is an error pointer, false otherwise.

bool IS_ERR_OR_NULL(__force const void *ptr)¶: Detect an error pointer or a null pointer.

Parameters

__force const void *ptr: The pointer to check.

Description

Like IS_ERR(), but also returns true for a null pointer.

void *ERR_CAST(__force const void *ptr)¶: Explicitly cast an error-valued pointer to another pointer type

Parameters

__force const void *ptr: The pointer to cast.

Description

Explicitly cast an error-valued pointer to another pointer type in such a way as to make it clear that’s what’s going on.

int PTR_ERR_OR_ZERO(__force const void *ptr)¶: Extract the error code from a pointer if it has one.

Parameters

__force const void *ptr: A potential error pointer.

Description

Convenience function that can be used inside a function that returns an error code to propagate errors received as error pointers. For example, return PTR_ERR_OR_ZERO(ptr); replaces:

if (IS_ERR(ptr))
        return PTR_ERR(ptr);
else
        return 0;

Return

The error code within ptr if it is an error pointer; 0 otherwise.
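To make the calling pattern concrete, here is a userspace model of the helpers. The encoding used (negative error codes stored in the top 4095 address values) is an assumption of this sketch; as noted for ERR_PTR(), real callers should treat the encoding as opaque. All lower-case names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095	/* assumption made for this model only */

void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

bool is_err(const void *ptr)
{
	/* Error pointers occupy the last SKETCH_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

int ptr_err_or_zero(const void *ptr)
{
	return is_err(ptr) ? (int)ptr_err(ptr) : 0;
}
```

A function returning a pointer can then report failure with err_ptr(-ENOMEM), and its caller can propagate the code with ptr_err_or_zero() in a single statement.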

Sorting¶

void sort_r(void *base, size_t num, size_t size, cmp_r_func_t cmp_func, swap_r_func_t swap_func, const void *priv)¶: sort an array of elements

Parameters

void *base: pointer to data to sort
size_t num: number of elements
size_t size: size of each element
cmp_r_func_t cmp_func: pointer to comparison function
swap_r_func_t swap_func: pointer to swap function or NULL
const void *priv: third argument passed to comparison function

Description

This function does a heapsort on the given array. You may provide a swap_func function if you need to do something more than a memory copy (e.g. fix up pointers or auxiliary data), but the built-in swap avoids a slow retpoline and so is significantly faster.

The comparison function must adhere to specific mathematical properties to ensure correct and stable sorting:
- Antisymmetry: cmp_func(a, b) must return the opposite sign of cmp_func(b, a).
- Transitivity: if cmp_func(a, b) <= 0 and cmp_func(b, c) <= 0, then cmp_func(a, c) <= 0.

Sorting time is O(n log n) both on average and worst-case. While quicksort is slightly faster on average, it suffers from exploitable O(n*n) worst-case behavior and extra memory requirements that make it less suitable for kernel use.
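A comparison function that meets both properties can be sketched in userspace. The example uses the C library's qsort() as a stand-in because sort_r() is not available outside the kernel; struct item and the function names are hypothetical, and a real cmp_r_func_t would also receive the priv argument.

```c
#include <stddef.h>
#include <stdlib.h>

struct item {
	int key;
};

/*
 * Three-way comparison without subtraction: "a->key - b->key" can
 * overflow for large magnitudes and would then violate antisymmetry.
 */
int cmp_item(const void *a, const void *b)
{
	int ka = ((const struct item *)a)->key;
	int kb = ((const struct item *)b)->key;

	return (ka > kb) - (ka < kb);
}

/* Userspace stand-in for the kernel sort call. */
void sort_items(struct item *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_item);
}
```

Because cmp_item() only ever returns -1, 0 or 1, swapping its arguments always flips the sign, even for keys at the extremes of the int range.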

void sort_r_nonatomic(void *base, size_t num, size_t size, cmp_r_func_t cmp_func, swap_r_func_t swap_func, const void *priv)¶: sort an array of elements, with cond_resched

Parameters

void *base: pointer to data to sort
size_t num: number of elements
size_t size: size of each element
cmp_r_func_t cmp_func: pointer to comparison function
swap_r_func_t swap_func: pointer to swap function or NULL
const void *priv: third argument passed to comparison function

Description

Same as sort_r(), but preferred for larger arrays as it does a periodic cond_resched().

void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp)¶: sort a list

Parameters

void *priv: private data, opaque to list_sort(), passed to cmp
struct list_head *head: the list to sort
list_cmp_func_t cmp: the elements comparison function

Description

The comparison function cmp must return > 0 if a should sort after b (“a > b” if you want an ascending sort), and <= 0 if a should sort before b or their original order should be preserved. It is always called with the element that came first in the input in a, and list_sort is a stable sort, so it is not necessary to distinguish the a < b and a == b cases.

The comparison function must adhere to specific mathematical properties to ensure correct and stable sorting:
- Antisymmetry: cmp(a, b) must return the opposite sign of cmp(b, a).
- Transitivity: if cmp(a, b) <= 0 and cmp(b, c) <= 0, then cmp(a, c) <= 0.

This is compatible with two styles of cmp function:
- The traditional style which returns <0 / =0 / >0, or
- Returning a boolean 0/1.

The latter offers a chance to save a few cycles in the comparison (which is used by e.g. plug_ctx_cmp() in block/blk-mq.c).

A good way to write a multi-word comparison is:

if (a->high != b->high)
        return a->high > b->high;
if (a->middle != b->middle)
        return a->middle > b->middle;
return a->low > b->low;

This mergesort is as eager as possible while always performing at least 2:1 balanced merges. Given two pending sublists of size 2^k, they are merged to a size-2^(k+1) list as soon as we have 2^k following elements.

Thus, it will avoid cache thrashing as long as 3*2^k elements can fit into the cache. Not quite as good as a fully-eager bottom-up mergesort, but it does use 0.2*n fewer comparisons, so is faster in the common case that everything fits into L1.

The merging is controlled by “count”, the number of elements in the pending lists. This is beautifully simple code, but rather subtle.

Each time we increment “count”, we set one bit (bit k) and clear bits k-1 .. 0. Each time this happens (except the very first time for each bit, when count increments to 2^k), we merge two lists of size 2^k into one list of size 2^(k+1).

This merge happens exactly when the count reaches an odd multiple of 2^k, which is when we have 2^k elements pending in smaller lists, so it’s safe to merge away two lists of size 2^k.

After this happens twice, we have created two lists of size 2^(k+1), which will be merged into a list of size 2^(k+2) before we create a third list of size 2^(k+1), so there are never more than two pending.
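The counting rule can be explored with a tiny userspace function that, for a given value of count, reports which merge (if any) accompanies the increment to count + 1. This is a sketch of the rule as described above, with a hypothetical name, and not the list_sort() source.

```c
#include <stddef.h>

/*
 * Returns k if incrementing count to count + 1 is accompanied by a merge
 * of two pending lists of size 2^k, or -1 if no merge takes place.
 */
int merge_order(size_t count)
{
	int k = 0;

	/* Bit k is the lowest clear bit: the increment sets it. */
	while (count & 1) {
		count >>= 1;
		k++;
	}
	/* First time bit k is set (no higher bits yet): nothing to merge. */
	return count ? k : -1;
}
```

For example, the increment from 5 to 6 sets bit 1 for the second time and merges two lists of size 2, while the increment from 3 to 4 sets bit 2 for the first time and merges nothing.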

The number of pending lists of size 2^k is determined by the state of bit k of “count” plus two extra pieces of information:

The state of bit k-1 (when k == 0, consider bit -1 always set), and
Whether the higher-order bits are zero or non-zero (i.e. is count >= 2^(k+1)).

There are six states we distinguish. “x” represents some arbitrary bits, and “y” represents some arbitrary non-zero bits:

0:  00x: 0 pending of size 2^k;           x pending of sizes < 2^k
1:  01x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
2: x10x: 0 pending of size 2^k; 2^k     + x pending of sizes < 2^k
3: x11x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
4: y00x: 1 pending of size 2^k; 2^k     + x pending of sizes < 2^k
5: y01x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
(merge and loop back to state 2)

We gain lists of size 2^k in the 2->3 and 4->5 transitions (because bit k-1 is set while the more significant bits are non-zero) and merge them away in the 5->2 transition. Note in particular that just before the 5->2 transition, all lower-order bits are 11 (state 3), so there is one list of each smaller size.

When we reach the end of the input, we merge all the pending lists, from smallest to largest. If you work through cases 2 to 5 above, you can see that the number of elements we merge with a list of size 2^k varies from 2^(k-1) (cases 3 and 5 when x == 0) to 2^(k+1) - 1 (second merge of case 5 when x == 2^(k-1) - 1).

Text Searching¶

INTRODUCTION

The textsearch infrastructure provides text searching facilities for both linear and non-linear data. Individual search algorithms are implemented in modules and chosen by the user.

ARCHITECTURE

  User
  +----------------+
  |        finish()|<--------------(6)-----------------+
  |get_next_block()|<--------------(5)---------------+ |
  |                |                     Algorithm   | |
  |                |                    +------------------------------+
  |                |                    |  init()   find()   destroy() |
  |                |                    +------------------------------+
  |                |       Core API           ^       ^          ^
  |                |      +---------------+  (2)     (4)        (8)
  |             (1)|----->| prepare()     |---+       |          |
  |             (3)|----->| find()/next() |-----------+          |
  |             (7)|----->| destroy()     |----------------------+
  +----------------+      +---------------+

(1) User configures a search by calling textsearch_prepare() specifying
    the search parameters such as the pattern and algorithm name.
(2) Core requests the algorithm to allocate and initialize a search
    configuration according to the specified parameters.
(3) User starts the search(es) by calling textsearch_find() or
    textsearch_next() to fetch subsequent occurrences. A state variable
    is provided to the algorithm to store persistent variables.
(4) Core eventually resets the search offset and forwards the find()
    request to the algorithm.
(5) Algorithm calls get_next_block() provided by the user continuously
    to fetch the data to be searched in block by block.
(6) Algorithm invokes finish() after the last call to get_next_block
    to clean up any leftovers from get_next_block. (Optional)
(7) User destroys the configuration by calling textsearch_destroy().
(8) Core notifies the algorithm to destroy algorithm specific
    allocations. (Optional)

USAGE

Before a search can be performed, a configuration must be created by calling textsearch_prepare() specifying the searching algorithm, the pattern to look for and flags. As a flag, you can set TS_IGNORECASE to perform case-insensitive matching. This may slow down the algorithm, so use it at your own risk. The returned configuration may then be used an arbitrary number of times and even in parallel, as long as a separate struct ts_state variable is provided to every instance.

The actual search is performed either by calling textsearch_find_continuous() for linear data, or by providing your own get_next_block() implementation and calling textsearch_find(). Both functions return the position of the first occurrence of the pattern, or UINT_MAX if no match was found. Subsequent occurrences can be found by calling textsearch_next() regardless of the linearity of the data.

Once you’re done using a configuration, it must be given back via textsearch_destroy().

EXAMPLE:

unsigned int pos;
int err;
struct ts_config *conf;
struct ts_state state;
const char *pattern = "chicken";
const char *example = "We dance the funky chicken";

conf = textsearch_prepare("kmp", pattern, strlen(pattern),
                          GFP_KERNEL, TS_AUTOLOAD);
if (IS_ERR(conf)) {
    err = PTR_ERR(conf);
    goto errout;
}

pos = textsearch_find_continuous(conf, &state, example, strlen(example));
if (pos != UINT_MAX)
    panic("Oh my god, dancing chickens at %u\n", pos);

textsearch_destroy(conf);

int textsearch_register(struct ts_ops *ops)¶: register a textsearch module

Parameters

struct ts_ops *ops: operations lookup table

Description

This function must be called by textsearch modules to announce their presence. The specified ops must have name set to a unique identifier, and the callbacks find(), init(), get_pattern(), and get_pattern_len() must be implemented.

Returns 0 on success, or -EEXIST if another module has already registered with the same name.

int textsearch_unregister(struct ts_ops *ops)¶: unregister a textsearch module

Parameters

struct ts_ops *ops: operations lookup table

Description

This function must be called by textsearch modules to announce their disappearance, for example when the module gets unloaded. The ops parameter must be the same as the one used during registration.

Returns 0 on success or -ENOENT if no matching textsearch registration was found.

unsigned int textsearch_find_continuous(struct ts_config *conf, struct ts_state *state, const void *data, unsigned int len)¶: search a pattern in continuous/linear data

Parameters

struct ts_config *conf: search configuration
struct ts_state *state: search state
const void *data: data to search in
unsigned int len: length of data

Description

A simplified version of textsearch_find() for continuous/linear data. Call textsearch_next() to retrieve subsequent matches.

Returns the position of first occurrence of the pattern or UINT_MAX if no occurrence was found.

struct ts_config *textsearch_prepare(const char *algo, const void *pattern, unsigned int len, gfp_t gfp_mask, int flags)¶: Prepare a search

Parameters

const char *algo: name of search algorithm
const void *pattern: pattern data
unsigned int len: length of pattern
gfp_t gfp_mask: allocation mask
int flags: search flags

Description

Looks up the search algorithm module and creates a new textsearch configuration for the specified pattern.

Note

The format of the pattern may not be compatible between the various search algorithms.

Returns a new textsearch configuration according to the specified parameters or an ERR_PTR(). If a zero-length pattern is passed, this function returns EINVAL (encoded as an ERR_PTR()).

void textsearch_destroy(struct ts_config *conf)¶: destroy a search configuration

Parameters

struct ts_config *conf: search configuration

Description

Releases all references of the configuration and frees up the memory.

unsigned int textsearch_next(struct ts_config *conf, struct ts_state *state)¶: continue searching for a pattern

Parameters

struct ts_config *conf: search configuration
struct ts_state *state: search state

Description

Continues a search looking for more occurrences of the pattern. textsearch_find() must be called to find the first occurrence in order to reset the state.

Returns the position of the next occurrence of the pattern or UINT_MAX if no match was found.

unsigned int textsearch_find(struct ts_config *conf, struct ts_state *state)¶: start searching for a pattern

Parameters

struct ts_config *conf: search configuration
struct ts_state *state: search state

Description

Returns the position of first occurrence of the pattern or UINT_MAX if no match was found.

void *textsearch_get_pattern(struct ts_config *conf)¶: return head of the pattern

Parameters

struct ts_config *conf: search configuration

unsigned int textsearch_get_pattern_len(struct ts_config *conf)¶: return length of the pattern

Parameters

struct ts_config *conf: search configuration

CRC and Math Functions in Linux¶

Arithmetic Overflow Checking¶

check_add_overflow¶

check_add_overflow(a,b,d)

Calculate addition with overflow checking

Parameters

a: first addend
b: second addend
d: pointer to store sum

Description

Returns true on wrap-around, false otherwise.

*d holds the results of the attempted addition, regardless of whether wrap-around occurred.

wrapping_add¶

wrapping_add(type,a,b)

Intentionally perform a wrapping addition

Parameters

type: type for result of calculation
a: first addend
b: second addend

Description

Return the potentially wrapped-around addition without tripping any wrap-around sanitizers that may be enabled.

wrapping_assign_add¶

wrapping_assign_add(var,offset)

Intentionally perform a wrapping increment assignment

Parameters

var: variable to be incremented
offset: amount to add

Description

Increments var by offset with wrap-around. Will not trip any wrap-around sanitizers.

Returns the new value of var.

check_sub_overflow¶

check_sub_overflow(a,b,d)

Calculate subtraction with overflow checking

Parameters

a: minuend; value to subtract from
b: subtrahend; value to subtract from a
d: pointer to store difference

Description

Returns true on wrap-around, false otherwise.

*d holds the results of the attempted subtraction, regardless of whether wrap-around occurred.

wrapping_sub¶

wrapping_sub(type,a,b)

Intentionally perform a wrapping subtraction

Parameters

type: type for result of calculation
a: minuend; value to subtract from
b: subtrahend; value to subtract from a

Description

Return the potentially wrapped-around subtraction without tripping any wrap-around sanitizers that may be enabled.

wrapping_assign_sub¶

wrapping_assign_sub(var,offset)

Intentionally perform a wrapping decrement assign

Parameters

var: variable to be decremented
offset: amount to subtract

Description

Decrements var by offset with wrap-around. Will not trip any wrap-around sanitizers.

Returns the new value of var.

check_mul_overflow¶

check_mul_overflow(a,b,d)

Calculate multiplication with overflow checking

Parameters

a: first factor
b: second factor
d: pointer to store product

Description

Returns true on wrap-around, false otherwise.

*d holds the results of the attempted multiplication, regardless of whether wrap-around occurred.
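A common use of a checked multiplication is computing an allocation size. The userspace sketch below uses the GCC/Clang builtin __builtin_mul_overflow() as a stand-in; array_bytes() is a hypothetical helper and not a kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: compute n * size into *bytes.
 * Returns true on success, false if the product wrapped around.
 */
bool array_bytes(size_t n, size_t size, size_t *bytes)
{
	return !__builtin_mul_overflow(n, size, bytes);
}
```

When the helper returns false, *bytes still holds the wrapped product, so it must not be passed on to an allocator.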

wrapping_mul¶

wrapping_mul(type,a,b)

Intentionally perform a wrapping multiplication

Parameters

type: type for result of calculation
a: first factor
b: second factor

Description

Return the potentially wrapped-around multiplication without tripping any wrap-around sanitizers that may be enabled.

check_shl_overflow¶

check_shl_overflow(a,s,d)

Calculate a left-shifted value and check overflow

Parameters

a: Value to be shifted
s: How many bits left to shift
d: Pointer to where to store the result

Description

Computes *d = (a << s)

Returns true if ‘*d’ cannot hold the result or when ‘a << s’ doesn’t make sense. Example conditions:

‘a << s’ causes bits to be lost when stored in *d.
‘s’ is garbage (e.g. negative) or so large that the result of ‘a << s’ is guaranteed to be 0.
‘a’ is negative.
‘a << s’ sets the sign bit, if any, in ‘*d’.

‘*d’ will hold the results of the attempted shift, but is not considered “safe for use” if true is returned.
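Some of these conditions can be modelled in userspace for the special case of an unsigned 32-bit value and destination, where the sign-related conditions do not apply. The function name is hypothetical, and storing an unshifted value when the count is unusable is a choice made by this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model for an unsigned 32-bit destination only. */
bool shl_overflows_u32(uint32_t a, int s, uint32_t *d)
{
	bool bad_shift = s < 0 || s >= 32;
	/* Sketch choice: shift by 0 when the count is unusable. */
	unsigned int to_shift = bad_shift ? 0 : (unsigned int)s;

	*d = a << to_shift;
	/* Overflow if the count was unusable or shifting back loses bits. */
	return bad_shift || (*d >> to_shift) != a;
}
```

Shifting 1 left by 31 fits, whereas shifting 3 left by 31 loses a bit and shifting by 32 or by -1 is rejected outright.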

overflows_type¶

overflows_type(n,T)

helper for checking the overflows between value, variables, or data type

Parameters

n: source constant value or variable to be checked
T: destination variable or data type proposed to store n

Description

Compares the n expression for whether or not it can safely fit in the storage of the type in T. n and T can have different types. If n is a constant expression, this will also resolve to a constant expression.

Return

true if overflow can occur, false otherwise.
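One way to picture the check is to add zero to the value with an overflow-checking builtin whose destination has the target type. The sketch below assumes GCC or Clang (statement expressions, `__typeof__`, `__builtin_add_overflow()`); `demo_overflows_type()` is a hypothetical name, and nothing here claims to reproduce the constant-expression behaviour described above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-in (hypothetical name). Adding zero reports whether
 * the value of 'n' is representable in type 'T'. GNU C extensions only.
 */
#define demo_overflows_type(n, T) ({				\
	__typeof__(T) demo_v__ = 0;				\
	__builtin_add_overflow((n), demo_v__, &demo_v__);	\
})

static void demo_overflows_type_example(void)
{
	assert(!demo_overflows_type(255, uint8_t));	/* fits in 8 bits */
	assert(demo_overflows_type(256, uint8_t));	/* needs 9 bits */
	assert(demo_overflows_type(-1, uint32_t));	/* negative into unsigned */
}
```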

range_overflows¶

range_overflows(start,size,max)

Check if a range is out of bounds

Parameters

start: Start of the range.
size: Size of the range.
max: Exclusive upper boundary.

Description

A strict check to determine if the range [start, start + size) is invalid with respect to the allowable range [0, max). Any range starting at or beyond max is considered an overflow, even if size is 0.

Return

true if the range is out of bounds.

range_overflows_t¶

range_overflows_t(type,start,size,max)

Check if a range is out of bounds

Parameters

type: Data type to use.
start: Start of the range.
size: Size of the range.
max: Exclusive upper boundary.

Description

Same as range_overflows() but forcing the parameters to type.

Return

true if the range is out of bounds.

range_end_overflows¶

range_end_overflows(start,size,max)

Check if a range’s endpoint is out of bounds

Parameters

start: Start of the range.
size: Size of the range.
max: Exclusive upper boundary.

Description

Checks only if the endpoint of a range (start + size) exceeds max. Unlike range_overflows(), a zero-sized range at the boundary (start == max) is not considered an overflow. Useful for iterator-style checks.

Return

true if the endpoint exceeds the boundary.
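The difference between the strict check and the endpoint-only check shows up only at the boundary. A minimal sketch, assuming `size_t` operands and hypothetical `demo_` function names (the real helpers are type-generic macros):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Strict check: a range starting at or beyond max is out of bounds. */
static bool demo_range_overflows(size_t start, size_t size, size_t max)
{
	return start >= max || size > max - start;
}

/* Endpoint-only check: start == max with size 0 is still acceptable. */
static bool demo_range_end_overflows(size_t start, size_t size, size_t max)
{
	return start > max || size > max - start;
}

static void demo_range_example(void)
{
	/* Zero-sized range exactly at the boundary: the two checks disagree. */
	assert(demo_range_overflows(10, 0, 10));
	assert(!demo_range_end_overflows(10, 0, 10));

	/* [4, 10) fits in [0, 10); [4, 11) does not. */
	assert(!demo_range_overflows(4, 6, 10));
	assert(demo_range_overflows(4, 7, 10));
	assert(demo_range_end_overflows(4, 7, 10));
}
```

Comparing `size` against `max - start` instead of computing `start + size` keeps the check itself free of wrap-around.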

range_end_overflows_t¶

range_end_overflows_t(type,start,size,max)

Check if a range’s endpoint is out of bounds

Parameters

type: Data type to use.
start: Start of the range.
size: Size of the range.
max: Exclusive upper boundary.

Description

Same as range_end_overflows() but forcing the parameters to type.

Return

true if the endpoint exceeds the boundary.

castable_to_type¶

castable_to_type(n,T)

like __same_type(), but also allows for casted literals

Parameters

n: variable or constant value
T: variable or data type

Description

Unlike the __same_type() macro, this allows a constant value as the first argument. If this value would not overflow into an assignment of the second argument’s type, it returns true. Otherwise, this falls back to __same_type().

size_tsize_mul(size_tfactor1,size_tfactor2)¶: Calculate size_t multiplication with saturation at SIZE_MAX

Parameters

size_tfactor1: first factor
size_tfactor2: second factor

Return

calculate factor1 * factor2, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. The lvalue must be size_t to avoid implicit type conversion.

size_tsize_add(size_taddend1,size_taddend2)¶: Calculate size_t addition with saturation at SIZE_MAX

Parameters

size_taddend1: first addend
size_taddend2: second addend

Return

calculate addend1 + addend2, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. The lvalue must be size_t to avoid implicit type conversion.

size_tsize_sub(size_tminuend,size_tsubtrahend)¶: Calculate size_t subtraction with saturation at SIZE_MAX

Parameters

size_tminuend: value to subtract from
size_tsubtrahend: value to subtract fromminuend

Return

calculate minuend - subtrahend, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. For composition with the size_add() and size_mul() helpers, neither argument may be SIZE_MAX (or the result will be forced to SIZE_MAX). The lvalue must be size_t to avoid implicit type conversion.
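The saturating behaviour and the composition rule for subtraction can be sketched in userspace as below. The `demo_` names are hypothetical and the GCC/Clang overflow builtins are assumed; the point is only to show how a saturated intermediate value propagates through a chain of calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace equivalents; any overflow saturates at SIZE_MAX. */
static size_t demo_size_mul(size_t factor1, size_t factor2)
{
	size_t bytes;

	if (__builtin_mul_overflow(factor1, factor2, &bytes))
		return SIZE_MAX;
	return bytes;
}

static size_t demo_size_add(size_t addend1, size_t addend2)
{
	size_t bytes;

	if (__builtin_add_overflow(addend1, addend2, &bytes))
		return SIZE_MAX;
	return bytes;
}

static size_t demo_size_sub(size_t minuend, size_t subtrahend)
{
	size_t bytes;

	/* A saturated input stays saturated so that the helpers compose. */
	if (minuend == SIZE_MAX || subtrahend == SIZE_MAX ||
	    __builtin_sub_overflow(minuend, subtrahend, &bytes))
		return SIZE_MAX;
	return bytes;
}

static void demo_size_helpers_example(void)
{
	/* 16 + count * 4 - 1, with a count large enough to overflow. */
	size_t total = demo_size_sub(
		demo_size_add(16, demo_size_mul(SIZE_MAX / 2, 4)), 1);

	assert(total == SIZE_MAX);
	assert(demo_size_add(16, demo_size_mul(10, 4)) == 56);
	assert(demo_size_sub(3, 5) == SIZE_MAX);
}
```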

array_size¶

array_size(a,b)

Calculate size of 2-dimensional array.

Parameters

a: dimension one
b: dimension two

Description

Calculates size of 2-dimensional array: a * b.

Return

number of bytes needed to represent the array or SIZE_MAX on overflow.

array3_size¶

array3_size(a,b,c)

Calculate size of 3-dimensional array.

Parameters

a: dimension one
b: dimension two
c: dimension three

Description

Calculates size of 3-dimensional array: a * b * c.

Return

number of bytes needed to represent the array or SIZE_MAX on overflow.

flex_array_size¶

flex_array_size(p,member,count)

Calculate size of a flexible array member within an enclosing structure.

Parameters

p: Pointer to the structure.
member: Name of the flexible array member.
count: Number of elements in the array.

Description

Calculates size of a flexible array of count number of member elements, at the end of structure p.

Return

number of bytes needed or SIZE_MAX on overflow.

struct_size¶

struct_size(p,member,count)

Calculate size of structure with trailing flexible array.

Parameters

p: Pointer to the structure.
member: Name of the array member.
count: Number of elements in the array.

Description

Calculates size of memory needed for structure of p followed by an array of count number of member elements.

Return

number of bytes needed or SIZE_MAX on overflow.
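As a rough illustration of the calculation, the sketch below computes the same quantity for one fixed, hypothetical structure (`struct demo_msg`) using the GCC/Clang overflow builtins. It is a userspace approximation under those assumptions, not the generic macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with a trailing flexible array member. */
struct demo_msg {
	uint32_t count;
	uint16_t items[];
};

/* Bytes needed for a struct demo_msg holding 'count' items, or SIZE_MAX. */
static size_t demo_msg_size(size_t count)
{
	size_t array_bytes, total;

	if (__builtin_mul_overflow(count, sizeof(uint16_t), &array_bytes) ||
	    __builtin_add_overflow(sizeof(struct demo_msg), array_bytes, &total))
		return SIZE_MAX;
	return total;
}

static void demo_msg_size_example(void)
{
	/* Header plus four 16-bit elements. */
	assert(demo_msg_size(4) ==
	       sizeof(struct demo_msg) + 4 * sizeof(uint16_t));

	/* An absurd element count saturates instead of wrapping. */
	assert(demo_msg_size(SIZE_MAX) == SIZE_MAX);
}
```

Saturating at SIZE_MAX turns a wrapped, too-small size into a request that cannot be satisfied, which is easier to detect than a short allocation.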

struct_size_t¶

struct_size_t(type,member,count)

Calculate size of structure with trailing flexible array

Parameters

type: structure type name.
member: Name of the array member.
count: Number of elements in the array.

Description

Calculates size of memory needed for structure type followed by an array of count number of member elements. Prefer using struct_size() when possible instead, to keep calculations associated with a specific instance variable of type type.

Return

number of bytes needed or SIZE_MAX on overflow.

struct_offset¶

struct_offset(p,member)

Calculate the offset of a member within a struct

Parameters

p: Pointer to the struct
member: Name of the member to get the offset of

Description

Calculates the offset of a particular member of the structure pointed to by p.

Return

number of bytes to the location of member.

__DEFINE_FLEX¶

__DEFINE_FLEX(type,name,member,count,trailer...)

helper macro for DEFINE_FLEX() family. Enables caller macro to pass arbitrary trailing expressions

Parameters

type: structure type name, including “struct” keyword.
name: Name for a variable to define.
member: Name of the array member.
count: Number of elements in the array; must be compile-time const.
trailer...: Trailing expressions for attributes and/or initializers.

_DEFINE_FLEX¶

_DEFINE_FLEX(type,name,member,count,initializer...)

helper macro for DEFINE_FLEX() family. Enables caller macro to pass (different) initializer.

Parameters

type: structure type name, including “struct” keyword.
name: Name for a variable to define.
member: Name of the array member.
count: Number of elements in the array; must be compile-time const.
initializer...: Initializer expression (e.g., pass = { } at minimum).

DEFINE_RAW_FLEX¶

DEFINE_RAW_FLEX(type,name,member,count)

Define an on-stack instance of structure with a trailing flexible array member, when it does not have a __counted_by annotation.

Parameters

type: structure type name, including “struct” keyword.
name: Name for a variable to define.
member: Name of the array member.
count: Number of elements in the array; must be compile-time const.

Description

Define a zeroed, on-stack, instance of type structure with a trailing flexible array member. Use __struct_size(name) to get compile-time size of it afterwards. Use __member_size(name->member) to get compile-time size of name members. Use STACK_FLEX_ARRAY_SIZE(name, member) to get compile-time number of elements in array member.

DEFINE_FLEX¶

DEFINE_FLEX(TYPE,NAME,MEMBER,COUNTER,COUNT)

Define an on-stack instance of structure with a trailing flexible array member.

Parameters

TYPE: structure type name, including “struct” keyword.
NAME: Name for a variable to define.
MEMBER: Name of the array member.
COUNTER: Name of the __counted_by member.
COUNT: Number of elements in the array; must be compile-time const.

Description

Define a zeroed, on-stack, instance of TYPE structure with a trailing flexible array member. Use __struct_size(NAME) to get compile-time size of it afterwards. Use __member_size(NAME->MEMBER) to get compile-time size of NAME members. Use STACK_FLEX_ARRAY_SIZE(NAME, MEMBER) to get compile-time number of elements in array MEMBER.

STACK_FLEX_ARRAY_SIZE¶

STACK_FLEX_ARRAY_SIZE(name,array)

helper macro for DEFINE_FLEX() family. Returns the number of elements in array.

Parameters

name: Name for a variable defined in DEFINE_RAW_FLEX()/DEFINE_FLEX().
array: Name of the array member.

CRC Functions¶

uint8_tcrc4(uint8_tc,uint64_tx,intbits)¶: calculate the 4-bit crc of a value.

Parameters

uint8_tc: starting crc4
uint64_tx: value to checksum
intbits: number of bits in x to checksum

Description

Returns the crc4 value of x, using polynomial 0b10111.

The x value is treated as left-aligned, and bits above bits are ignored in the crc calculations.

u8crc7_be(u8crc,constu8*buffer,size_tlen)¶: update the CRC7 for the data buffer

Parameters

u8crc: previous CRC7 value
constu8*buffer: data pointer
size_tlen: number of bytes in the buffer

Context

any

Description

Returns the updated CRC7 value. The CRC7 is left-aligned in the byte (the lsbit is always 0), as that makes the computation easier, and all callers want it in that form.

voidcrc8_populate_msb(u8table[CRC8_TABLE_SIZE],u8polynomial)¶: fill crc table for given polynomial in reverse bit order.

Parameters

u8table[CRC8_TABLE_SIZE]: table to be filled.
u8polynomial: polynomial for which table is to be filled.

voidcrc8_populate_lsb(u8table[CRC8_TABLE_SIZE],u8polynomial)¶: fill crc table for given polynomial in regular bit order.

Parameters

u8table[CRC8_TABLE_SIZE]: table to be filled.
u8polynomial: polynomial for which table is to be filled.

u8crc8(constu8table[CRC8_TABLE_SIZE],constu8*pdata,size_tnbytes,u8crc)¶: calculate a crc8 over the given input data.

Parameters

constu8table[CRC8_TABLE_SIZE]: crc table used for calculation.
constu8*pdata: pointer to data buffer.
size_tnbytes: number of bytes in data buffer.
u8crc: previous returned crc8 value.

u16crc16(u16crc,constu8*p,size_tlen)¶: compute the CRC-16 for the data buffer

Parameters

u16crc: previous CRC value
constu8*p: data pointer
size_tlen: number of bytes in the buffer

Description

Returns the updated CRC value.

u16crc_ccitt(u16crc,u8const*buffer,size_tlen)¶: recompute the CRC (CRC-CCITT variant) for the data buffer

Parameters

u16crc: previous CRC value
u8const*buffer: data pointer
size_tlen: number of bytes in the buffer

u16crc_itu_t(u16crc,constu8*buffer,size_tlen)¶: Compute the CRC-ITU-T for the data buffer

Parameters

u16crc: previous CRC value
constu8*buffer: data pointer
size_tlen: number of bytes in the buffer

Description

Returns the updated CRC value

u32crc32_le(u32crc,constvoid*p,size_tlen)¶: Compute least-significant-bit-first IEEE CRC-32

Parameters

u32crc: Initial CRC value. ~0 (recommended) or 0 for a new CRC computation, or the previous CRC value if computing incrementally.
constvoid*p: Pointer to the data buffer
size_tlen: Length of data in bytes

Description

This implements the CRC variant that is often known as the IEEE CRC-32, or simply CRC-32, and is widely used in Ethernet and other applications:

Polynomial: x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 +
x^7 + x^5 + x^4 + x^2 + x^1 + x^0
Bit order: Least-significant-bit-first
Polynomial in integer form: 0xedb88320

This does not invert the CRC at the beginning or end. The caller is expected to do that if it needs to. Inverting at both ends is recommended.

For new applications, prefer to use CRC-32C instead. See crc32c().

Context

Any context

Return

The new CRC value
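To make the calling convention concrete, here is a bit-at-a-time userspace sketch of the least-significant-bit-first CRC with polynomial 0xedb88320. `demo_crc32_le()` is a hypothetical name and the loop is deliberately simple; an optimized implementation would look different. Like the function documented above, the sketch does not invert the CRC itself, so the caller inverts at both ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise LSB-first CRC-32 update, no inversion (hypothetical sketch). */
static uint32_t demo_crc32_le(uint32_t crc, const void *p, size_t len)
{
	const unsigned char *bytes = p;

	while (len--) {
		crc ^= *bytes++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320u : 0);
	}
	return crc;
}

static void demo_crc32_example(void)
{
	/* Start from ~0 and feed the data in two chunks (incremental use). */
	uint32_t crc = demo_crc32_le(~(uint32_t)0, "1234", 4);

	/* Invert again at the end, as recommended. */
	crc = ~demo_crc32_le(crc, "56789", 5);
	assert(crc == 0xcbf43926u);	/* standard CRC-32 check value */
}
```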

u32crc32_be(u32crc,constvoid*p,size_tlen)¶: Compute most-significant-bit-first IEEE CRC-32

Parameters

u32crc: Initial CRC value. ~0 (recommended) or 0 for a new CRC computation, or the previous CRC value if computing incrementally.
constvoid*p: Pointer to the data buffer
size_tlen: Length of data in bytes

Description

crc32_be() is the same as crc32_le() except that crc32_be() computes the most-significant-bit-first variant of the CRC. I.e., within each byte, the most significant bit is processed first (treated as highest order polynomial coefficient). The same bit order is also used for the CRC value itself:

Polynomial: x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 +
x^7 + x^5 + x^4 + x^2 + x^1 + x^0
Bit order: Most-significant-bit-first
Polynomial in integer form: 0x04c11db7

Context

Any context

Return

The new CRC value

u32crc32c(u32crc,constvoid*p,size_tlen)¶: Compute CRC-32C

Parameters

u32crc: Initial CRC value. ~0 (recommended) or 0 for a new CRC computation, or the previous CRC value if computing incrementally.
constvoid*p: Pointer to the data buffer
size_tlen: Length of data in bytes

Description

This implements CRC-32C, i.e. the Castagnoli CRC. This is the recommended CRC variant to use in new applications that want a 32-bit CRC.

Polynomial: x^32 + x^28 + x^27 + x^26 + x^25 + x^23 + x^22 + x^20 + x^19 +
x^18 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + x^0
Bit order: Least-significant-bit-first
Polynomial in integer form: 0x82f63b78

This does not invert the CRC at the beginning or end. The caller is expected to do that if it needs to. Inverting at both ends is recommended.

Context

Any context

Return

The new CRC value

u64crc64_be(u64crc,constvoid*p,size_tlen)¶: Calculate bitwise big-endian ECMA-182 CRC64

Parameters

u64crc: seed value for computation. 0 or (u64)~0 for a new CRC calculation, or the previous crc64 value if computing incrementally.
constvoid*p: pointer to buffer over which CRC64 is run
size_tlen: length of buffer p

u64crc64_nvme(u64crc,constvoid*p,size_tlen)¶: Calculate CRC64-NVME

Parameters

u64crc: seed value for computation. 0 for a new CRC calculation, or the previous crc64 value if computing incrementally.
constvoid*p: pointer to buffer over which CRC64 is run
size_tlen: length of buffer p

Description

This computes the CRC64 defined in the NVME NVM Command Set Specification, including the bitwise inversion at the beginning and end.

Base 2 log and power Functions¶

boolis_power_of_2(unsignedlongn)¶: check if a value is a power of two

Parameters

unsignedlongn: the value to check

Description

Determine whether some value is a power of two, where zero is not considered a power of two.

Return

true if n is a power of 2, otherwise false.
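A common way to express this test is the single-bit trick shown below: a power of two has exactly one bit set, so clearing the lowest set bit leaves zero. The function name `demo_is_power_of_2()` is hypothetical and the sketch is meant only to illustrate the rule, including the special case for zero.

```c
#include <assert.h>
#include <stdbool.h>

/* n & (n - 1) clears the lowest set bit; zero is rejected explicitly. */
static bool demo_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

static void demo_is_power_of_2_example(void)
{
	assert(!demo_is_power_of_2(0));		/* zero is not a power of two */
	assert(demo_is_power_of_2(1));		/* 2^0 */
	assert(demo_is_power_of_2(64));		/* 2^6 */
	assert(!demo_is_power_of_2(96));	/* two bits set */
}
```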

unsignedlong__roundup_pow_of_two(unsignedlongn)¶: round up to nearest power of two

Parameters

unsignedlongn: value to round up

unsignedlong__rounddown_pow_of_two(unsignedlongn)¶: round down to nearest power of two

Parameters

unsignedlongn: value to round down

const_ilog2¶

const_ilog2(n)

log base 2 of 32-bit or a 64-bit constant unsigned value

Parameters

n: parameter

Description

Use this where sparse expects a true constant expression, e.g. for array indices.

ilog2¶

ilog2(n)

log base 2 of 32-bit or a 64-bit unsigned value

Parameters

n: parameter

Description

Constant-capable log of base 2 calculation. This can be used to initialise global variables from constant data, hence the massive ternary operator construction.

Selects the appropriately-sized optimised version depending on sizeof(n).

roundup_pow_of_two¶

roundup_pow_of_two(n)

round the given value up to nearest power of two

Parameters

n: parameter

Description

Round the given value up to the nearest power of two. The result is undefined when n == 0. This can be used to initialise global variables from constant data.

rounddown_pow_of_two¶

rounddown_pow_of_two(n)

round the given value down to nearest power of two

Parameters

n: parameter

Description

Round the given value down to the nearest power of two. The result is undefined when n == 0. This can be used to initialise global variables from constant data.
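The two rounding operations can be sketched with a count-leading-zeros builtin. This assumes GCC or Clang (`__builtin_clzl()`) and uses hypothetical `demo_` names. As in the descriptions above, n == 0 is not handled, and the round-up sketch additionally assumes the result fits in an unsigned long.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Round up to a power of two; n must be nonzero and the result must fit. */
static unsigned long demo_roundup_pow_of_two(unsigned long n)
{
	if (n == 1)
		return 1;
	/* Position just above the highest set bit of n - 1. */
	return 1UL << (DEMO_BITS_PER_LONG - (unsigned int)__builtin_clzl(n - 1));
}

/* Round down to a power of two; n must be nonzero. */
static unsigned long demo_rounddown_pow_of_two(unsigned long n)
{
	/* Keep only the highest set bit of n. */
	return 1UL << (DEMO_BITS_PER_LONG - 1 - (unsigned int)__builtin_clzl(n));
}

static void demo_round_pow_of_two_example(void)
{
	assert(demo_roundup_pow_of_two(5) == 8);
	assert(demo_roundup_pow_of_two(8) == 8);	/* already a power of two */
	assert(demo_rounddown_pow_of_two(5) == 4);
	assert(demo_rounddown_pow_of_two(8) == 8);
}
```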

order_base_2¶

order_base_2(n)

calculate the (rounded up) base 2 order of the argument

Parameters

n: parameter

Description

The first few values calculated by this routine:

ob2(0) = 0
ob2(1) = 0
ob2(2) = 1
ob2(3) = 2
ob2(4) = 2
ob2(5) = 3
... and so on.

bits_per¶

bits_per(n)

calculate the number of bits required for the argument

Parameters

n: parameter

Description

This is constant-capable and can be used for compile time initializations, e.g. bitfields.

The first few values calculated by this routine:

bf(0) = 1
bf(1) = 1
bf(2) = 2
bf(3) = 2
bf(4) = 3
... and so on.
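The two value tables above can be reproduced with a small userspace sketch. It assumes a 64-bit `unsigned long long`, the GCC/Clang `__builtin_clzll()` builtin, and hypothetical `demo_` names; it is a runtime illustration and does not attempt the constant-expression behaviour of the real macros.

```c
#include <assert.h>

/* floor(log2(n)) for n != 0, assuming 64-bit unsigned long long. */
static unsigned int demo_ilog2(unsigned long long n)
{
	return 63 - (unsigned int)__builtin_clzll(n);
}

/* Rounded-up base 2 order: smallest k with 2^k >= n (0 for n <= 1). */
static unsigned int demo_order_base_2(unsigned long long n)
{
	return n > 1 ? demo_ilog2(n - 1) + 1 : 0;
}

/* Number of bits needed to store n (1 for n <= 1). */
static unsigned int demo_bits_per(unsigned long long n)
{
	return n < 2 ? 1 : demo_ilog2(n) + 1;
}

static void demo_order_example(void)
{
	static const unsigned int ob2[] = { 0, 0, 1, 2, 2, 3 };
	static const unsigned int bf[] = { 1, 1, 2, 2, 3 };

	for (unsigned int n = 0; n < 6; n++)
		assert(demo_order_base_2(n) == ob2[n]);
	for (unsigned int n = 0; n < 5; n++)
		assert(demo_bits_per(n) == bf[n]);
}
```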

unsignedintmax_pow_of_two_factor(unsignedintn)¶: return highest power-of-2 factor

Parameters

unsignedintn: parameter

Description

Find the highest power of 2 which is evenly divisible into n. 0 is returned for n == 0 or 1.

Integer log and power Functions¶

unsignedintintlog2(u32value)¶: computes log2 of a value; the result is shifted left by 24 bits

Parameters

u32value: The value (must be != 0)

Description

to use rational values you can use the following method:

intlog2(value) = intlog2(value * 2^x) - x * 2^24

Some use-case examples:

intlog2(8) will give 3 << 24 = 3 * 2^24
intlog2(9) will give 3 << 24 + ... = 3.16... * 2^24
intlog2(1.5) = intlog2(3) - 2^24 = 0.584... * 2^24

Return

log2(value) * 2^24

unsignedintintlog10(u32value)¶: computes log10 of a value; the result is shifted left by 24 bits

Parameters

u32value: The value (must be != 0)

Description

to use rational values you can use the following method:

intlog10(value) = intlog10(value * 10^x) - x * 2^24

A use-case example:

intlog10(1000) will give 3 << 24 = 3 * 2^24
Due to the implementation, intlog10(1000) might not be exactly 3 * 2^24.

See intlog2 for similar examples.

Return

log10(value) * 2^24

u64int_pow(u64base,unsignedintexp)¶: computes the exponentiation of the given base and exponent

Parameters

u64base: base which will be raised to the given power
unsignedintexp: power to be raised to

Description

Computes: pow(base, exp), i.e. base raised to the exp power

unsignedlongint_sqrt(unsignedlongx)¶: computes the integer square root

Parameters

unsignedlongx: integer of which to calculate the sqrt

Description

Computes: floor(sqrt(x))
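One well-known way to compute floor(sqrt(x)) with integer operations only is the digit-by-digit (shift and subtract) method sketched below. `demo_int_sqrt()` is a hypothetical userspace function written for illustration; it should not be read as a copy of the kernel's implementation.

```c
#include <assert.h>
#include <limits.h>

/* Digit-by-digit integer square root: returns floor(sqrt(x)). */
static unsigned long demo_int_sqrt(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	/* Start with the largest power of four that does not exceed x. */
	m = 1UL << (sizeof(unsigned long) * CHAR_BIT - 2);
	while (m > x)
		m >>= 2;

	/* Decide one result bit per iteration, from the top down. */
	while (m != 0) {
		b = y + m;
		y >>= 1;
		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}

static void demo_int_sqrt_example(void)
{
	assert(demo_int_sqrt(15) == 3);	/* rounds down */
	assert(demo_int_sqrt(16) == 4);
	assert(demo_int_sqrt(17) == 4);
	assert(demo_int_sqrt(99) == 9);
}
```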

u32int_sqrt64(u64x)¶: strongly typed int_sqrt function when minimum 64 bit input is expected.

Parameters

u64x: 64bit integer of which to calculate the sqrt

Division Functions¶

do_div¶

do_div(n,base)

returns 2 values: calculate remainder and update new dividend

Parameters

n: uint64_t dividend (will be updated)
base: uint32_t divisor

Description

Summary: uint32_t remainder = n % base; n = n / base;

Return

(uint32_t)remainder

NOTE

macro parameter n is evaluated multiple times, beware of side effects!
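The "two results" convention is easy to get backwards, so here is a userspace sketch of the semantics. `demo_do_div()` is a hypothetical function that takes a pointer instead of being a macro, which sidesteps the multiple-evaluation caveat; the real do_div() updates its first argument in place by name.

```c
#include <assert.h>
#include <stdint.h>

/* Divide *n by base in place and return the remainder (illustration only). */
static uint32_t demo_do_div(uint64_t *n, uint32_t base)
{
	uint32_t remainder = (uint32_t)(*n % base);

	*n /= base;
	return remainder;
}

/* Split a nanosecond count into whole seconds and leftover nanoseconds. */
static void demo_do_div_example(void)
{
	uint64_t ns = 5000000123ULL;
	uint32_t rem = demo_do_div(&ns, 1000000000U);

	assert(ns == 5);	/* the dividend now holds the quotient */
	assert(rem == 123);	/* the remainder is the return value */
}
```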

u64div_u64_rem(u64dividend,u32divisor,u32*remainder)¶: unsigned 64bit divide with 32bit divisor with remainder

Parameters

u64dividend: unsigned 64bit dividend
u32divisor: unsigned 32bit divisor
u32*remainder: pointer to unsigned 32bit remainder

Return

sets *remainder, then returns dividend / divisor

Description

This is commonly provided by 32bit archs to provide an optimized 64bit divide.

s64div_s64_rem(s64dividend,s32divisor,s32*remainder)¶: signed 64bit divide with 32bit divisor with remainder

Parameters

s64dividend: signed 64bit dividend
s32divisor: signed 32bit divisor
s32*remainder: pointer to signed 32bit remainder

Return

sets *remainder, then returns dividend / divisor

u64div64_u64_rem(u64dividend,u64divisor,u64*remainder)¶: unsigned 64bit divide with 64bit divisor and remainder

Parameters

u64dividend: unsigned 64bit dividend
u64divisor: unsigned 64bit divisor
u64*remainder: pointer to unsigned 64bit remainder

Return

sets *remainder, then returns dividend / divisor

u64div64_u64(u64dividend,u64divisor)¶: unsigned 64bit divide with 64bit divisor

Parameters

u64dividend: unsigned 64bit dividend
u64divisor: unsigned 64bit divisor

Return

dividend / divisor

s64div64_s64(s64dividend,s64divisor)¶: signed 64bit divide with 64bit divisor

Parameters

s64dividend: signed 64bit dividend
s64divisor: signed 64bit divisor

Return

dividend / divisor

u64div_u64(u64dividend,u32divisor)¶: unsigned 64bit divide with 32bit divisor

Parameters

u64dividend: unsigned 64bit dividend
u32divisor: unsigned 32bit divisor

Description

This is the most common 64bit divide and should be used if possible, as many 32bit archs can optimize this variant better than a full 64bit divide.

Return

dividend / divisor

s64div_s64(s64dividend,s32divisor)¶: signed 64bit divide with 32bit divisor

Parameters

s64dividend: signed 64bit dividend
s32divisor: signed 32bit divisor

Return

dividend / divisor

u64mul_u64_add_u64_div_u64(u64a,u64b,u64c,u64d)¶: unsigned 64bit multiply, add, and divide

Parameters

u64a: first unsigned 64bit multiplicand
u64b: second unsigned 64bit multiplicand
u64c: unsigned 64bit addend
u64d: unsigned 64bit divisor

Description

Multiply two 64bit values together to generate a 128bit product, add a third value, and then divide by a fourth. The generic code divides by 0 if d is zero and returns ~0 on overflow. Architecture-specific code may trap on zero or overflow.

Return

(a * b + c) / d

mul_u64_u64_div_u64¶

mul_u64_u64_div_u64(a,b,d)

unsigned 64bit multiply and divide

Parameters

a: first unsigned 64bit multiplicand
b: second unsigned 64bit multiplicand
d: unsigned 64bit divisor

Description

Multiply two 64bit values together to generate a 128bit product and then divide by a third value. The generic code divides by 0 if d is zero and returns ~0 on overflow. Architecture-specific code may trap on zero or overflow.

Return

a * b / d

mul_u64_u64_div_u64_roundup¶

mul_u64_u64_div_u64_roundup(a,b,d)

unsigned 64bit multiply and divide rounded up

Parameters

a: first unsigned 64bit multiplicand
b: second unsigned 64bit multiplicand
d: unsigned 64bit divisor

Description

Multiply two 64bit values together to generate a 128bit product and then divide and round up. The generic code divides by 0 if d is zero and returns ~0 on overflow. Architecture-specific code may trap on zero or overflow.

Return

(a * b + d - 1) / d
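The multiply-then-divide family can be pictured with a 128-bit intermediate. The sketch below assumes a compiler and target that provide `unsigned __int128` (GCC or Clang on 64-bit platforms) and uses hypothetical `demo_` names. It mirrors the documented "~0 on overflow" result but, unlike the text above, simply requires a nonzero divisor.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 demo_u128;	/* compiler extension, assumed available */

/* (a * b + c) / d with a 128-bit intermediate; d must be nonzero. */
static uint64_t demo_mul_add_div(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
	demo_u128 q = ((demo_u128)a * b + c) / d;

	/* Saturate to ~0 when the quotient does not fit in 64 bits. */
	return q > UINT64_MAX ? UINT64_MAX : (uint64_t)q;
}

/* a * b / d, rounded down. */
static uint64_t demo_mul_div(uint64_t a, uint64_t b, uint64_t d)
{
	return demo_mul_add_div(a, b, 0, d);
}

/* a * b / d, rounded up: add d - 1 before dividing. */
static uint64_t demo_mul_div_roundup(uint64_t a, uint64_t b, uint64_t d)
{
	return demo_mul_add_div(a, b, d - 1, d);
}

static void demo_mul_div_example(void)
{
	assert(demo_mul_div(10, 10, 3) == 33);
	assert(demo_mul_div_roundup(10, 10, 3) == 34);

	/* 2^40 * 2^40 overflows 64 bits, but the quotient 2^60 still fits. */
	assert(demo_mul_div(1ULL << 40, 1ULL << 40, 1ULL << 20) == 1ULL << 60);

	/* The quotient itself does not fit: the sketch returns ~0. */
	assert(demo_mul_div(UINT64_MAX, UINT64_MAX, 1) == UINT64_MAX);
}
```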

DIV64_U64_ROUND_UP¶

DIV64_U64_ROUND_UP(ll,d)

unsigned 64bit divide with 64bit divisor rounded up

Parameters

ll: unsigned 64bit dividend
d: unsigned 64bit divisor

Description

Divide unsigned 64bit dividend by unsigned 64bit divisor and round up.

Return

dividend / divisor rounded up

DIV_U64_ROUND_UP¶

DIV_U64_ROUND_UP(ll,d)

unsigned 64bit divide with 32bit divisor rounded up

Parameters

ll: unsigned 64bit dividend
d: unsigned 32bit divisor

Description

Divide unsigned 64bit dividend by unsigned 32bit divisor and round up.

Return

dividend / divisor rounded up

DIV64_U64_ROUND_CLOSEST¶

DIV64_U64_ROUND_CLOSEST(dividend,divisor)

unsigned 64bit divide with 64bit divisor rounded to nearest integer

Parameters

dividend: unsigned 64bit dividend
divisor: unsigned 64bit divisor

Description

Divide unsigned 64bit dividend by unsigned 64bit divisor and round to closest integer.

Return

dividend / divisor rounded to nearest integer

DIV_U64_ROUND_CLOSEST¶

DIV_U64_ROUND_CLOSEST(dividend,divisor)

unsigned 64bit divide with 32bit divisor rounded to nearest integer

Parameters

dividend: unsigned 64bit dividend
divisor: unsigned 32bit divisor

Description

Divide unsigned 64bit dividend by unsigned 32bit divisor and round to closest integer.

Return

dividend / divisor rounded to nearest integer
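For the unsigned variants, the two rounding modes differ only in the bias added before the division. A minimal userspace sketch with hypothetical `demo_` names follows; it assumes a nonzero divisor and that adding the bias does not wrap around 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Round up: bias the dividend by divisor - 1 before dividing. */
static uint64_t demo_div_round_up(uint64_t ll, uint64_t d)
{
	return (ll + d - 1) / d;
}

/* Round to nearest: bias the dividend by half the divisor. */
static uint64_t demo_div_round_closest(uint64_t dividend, uint64_t divisor)
{
	return (dividend + divisor / 2) / divisor;
}

static void demo_div_round_example(void)
{
	assert(demo_div_round_up(10, 3) == 4);		/* 3.33 rounds up to 4 */
	assert(demo_div_round_up(9, 3) == 3);		/* exact, no change */
	assert(demo_div_round_closest(10, 3) == 3);	/* 3.33 rounds to 3 */
	assert(demo_div_round_closest(11, 3) == 4);	/* 3.67 rounds to 4 */
	assert(demo_div_round_closest(7, 2) == 4);	/* 3.5 rounds up */
}
```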

DIV_S64_ROUND_CLOSEST¶

DIV_S64_ROUND_CLOSEST(dividend,divisor)

signed 64bit divide with 32bit divisor rounded to nearest integer

Parameters

dividend: signed 64bit dividend
divisor: signed 32bit divisor

Description

Divide signed 64bit dividend by signed 32bit divisor and round to closest integer.

Return

dividend / divisor rounded to nearest integer

u64roundup_u64(u64x,u32y)¶: Round up a 64bit value to the next specified 32bit multiple

Parameters

u64x: the value to round up
u32y: 32bit multiple to round up to

Description

Rounds x to the next multiple of y. For 32bit x values, see roundup and the faster round_up() for powers of 2.

Return

rounded up value.

unsignedlonggcd(unsignedlonga,unsignedlongb)¶: calculate and return the greatest common divisor of 2 unsigned longs

Parameters

unsignedlonga: first value
unsignedlongb: second value

UUID/GUID¶

voidgenerate_random_uuid(unsignedcharuuid[16])¶: generate a random UUID

Parameters

unsignedcharuuid[16]: where to put the generated UUID

Description

Random UUID interface

Used to create a Boot ID or a filesystem UUID/GUID, but can be useful for other kernel drivers.

booluuid_is_valid(constchar*uuid)¶: checks if a UUID string is valid

Parameters

constchar*uuid: UUID string to check

Description

It checks whether the UUID string follows the format:

xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

where x is a hex digit.

Return

true if input is valid UUID string.
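The format check can be sketched as a single pass over the 36 characters of the textual form. `demo_uuid_is_valid()` and `DEMO_UUID_STRING_LEN` are hypothetical userspace names; the sketch examines only the first 36 characters and says nothing about what follows them, which is an assumption of this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

#define DEMO_UUID_STRING_LEN 36	/* 32 hex digits plus 4 hyphens */

/* Check hyphens at offsets 8, 13, 18 and 23, hex digits everywhere else. */
static bool demo_uuid_is_valid(const char *uuid)
{
	for (unsigned int i = 0; i < DEMO_UUID_STRING_LEN; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (uuid[i] != '-')
				return false;
		} else if (!isxdigit((unsigned char)uuid[i])) {
			/* Also stops at an early NUL, which is not a hex digit. */
			return false;
		}
	}
	return true;
}

static void demo_uuid_example(void)
{
	assert(demo_uuid_is_valid("123e4567-e89b-12d3-a456-426614174000"));
	assert(!demo_uuid_is_valid("123e4567-e89b-12d3-a456-42661417400g"));
	assert(!demo_uuid_is_valid("123e4567e89b-12d3-a456-4266141740000"));
	assert(!demo_uuid_is_valid("not-a-uuid"));
}
```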

Kernel IPC facilities¶

IPC utilities¶

intipc_init(void)¶: initialise ipc subsystem

Parameters

void: no arguments

Description

The various sysv ipc resources (semaphores, messages and shared memory) are initialised.

A callback routine is registered into the memory hotplug notifier chain: since msgmni scales to lowmem this callback routine will be called upon successful memory add / remove to recompute msgmni.

voidipc_init_ids(structipc_ids*ids)¶: initialise ipc identifiers

Parameters

structipc_ids*ids: ipc identifier set

Description

Set up the sequence range to use for the ipc identifier range (limited below ipc_mni) then initialise the keys hashtable and ids idr.

voidipc_init_proc_interface(constchar*path,constchar*header,intids,int(*show)(structseq_file*,void*))¶: create a proc interface for sysipc types using a seq_file interface.

Parameters

constchar*path: Path in procfs
constchar*header: Banner to be printed at the beginning of the file.
intids: ipc id table to iterate.
int(*show)(structseq_file*,void*): show routine.

structkern_ipc_perm*ipc_findkey(structipc_ids*ids,key_tkey)¶: find a key in an ipc identifier set

Parameters

structipc_ids*ids: ipc identifier set
key_tkey: key to find

Description

Returns the locked pointer to the ipc structure if found or NULL otherwise. If key is found ipc points to the owning ipc structure

Called with writer ipc_ids.rwsem held.

intipc_addid(structipc_ids*ids,structkern_ipc_perm*new,intlimit)¶: add an ipc identifier

Parameters

structipc_ids*ids: ipc identifier set
structkern_ipc_perm*new: new ipc permission set
intlimit: limit for the number of used ids

Description

Add an entry ‘new’ to the ipc ids idr. The permissions object is initialised and the first free entry is set up and the index assigned is returned. The ‘new’ entry is returned in a locked state on success.

On failure the entry is not locked and a negative err-code is returned. The caller must use ipc_rcu_putref() to free the identifier.

Called with writer ipc_ids.rwsem held.

intipcget_new(structipc_namespace*ns,structipc_ids*ids,conststructipc_ops*ops,structipc_params*params)¶: create a new ipc object

Parameters

structipc_namespace*ns: ipc namespace
structipc_ids*ids: ipc identifier set
conststructipc_ops*ops: the actual creation routine to call
structipc_params*params: its parameters

Description

This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is IPC_PRIVATE.

intipc_check_perms(structipc_namespace*ns,structkern_ipc_perm*ipcp,conststructipc_ops*ops,structipc_params*params)¶: check security and permissions for an ipc object

Parameters

structipc_namespace*ns: ipc namespace
structkern_ipc_perm*ipcp: ipc permission set
conststructipc_ops*ops: the actual security routine to call
structipc_params*params: its parameters

Description

This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is not IPC_PRIVATE and that key already exists in the ds IDR.

On success, the ipc id is returned.

It is called with ipc_ids.rwsem and ipcp->lock held.

intipcget_public(structipc_namespace*ns,structipc_ids*ids,conststructipc_ops*ops,structipc_params*params)¶: get an ipc object or create a new one

Parameters

structipc_namespace*ns: ipc namespace
structipc_ids*ids: ipc identifier set
conststructipc_ops*ops: the actual creation routine to call
structipc_params*params: its parameters

Description

This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is not IPC_PRIVATE. It adds a new entry if the key is not found and does some permission / security checks if the key is found.

On success, the ipc id is returned.

voidipc_kht_remove(structipc_ids*ids,structkern_ipc_perm*ipcp)¶: remove an ipc from the key hashtable

Parameters

structipc_ids*ids: ipc identifier set
structkern_ipc_perm*ipcp: ipc perm structure containing the key to remove

Description

ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on the exit.

intipc_search_maxidx(structipc_ids*ids,intlimit)¶: search for the highest assigned index

Parameters

structipc_ids*ids: ipc identifier set
intlimit: known upper limit for highest assigned index

Description

The function determines the highest assigned index in ids. It is intended to be called when ids->max_idx needs to be updated. Updating ids->max_idx is necessary when the current highest index ipc object is deleted. If no ipc object is allocated, then -1 is returned.

ipc_ids.rwsem needs to be held by the caller.

voidipc_rmid(structipc_ids*ids,structkern_ipc_perm*ipcp)¶: remove an ipc identifier

Parameters

structipc_ids*ids: ipc identifier set
structkern_ipc_perm*ipcp: ipc perm structure containing the identifier to remove

Description

ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on the exit.

voidipc_set_key_private(structipc_ids*ids,structkern_ipc_perm*ipcp)¶: switch the key of an existing ipc to IPC_PRIVATE

Parameters

structipc_ids*ids: ipc identifier set
structkern_ipc_perm*ipcp: ipc perm structure containing the key to modify

Description

ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on the exit.

intipcperms(structipc_namespace*ns,structkern_ipc_perm*ipcp,shortflag)¶: check ipc permissions

Parameters

structipc_namespace*ns: ipc namespace
structkern_ipc_perm*ipcp: ipc permission set
shortflag: desired permission set

Description

Check user, group, other permissions for access to ipc resources. Returns 0 if allowed.

flag will most probably be 0 or S_...UGO from <linux/stat.h>

voidkernel_to_ipc64_perm(structkern_ipc_perm*in,structipc64_perm*out)¶: convert kernel ipc permissions to user

Parameters

structkern_ipc_perm*in: kernel permissions
structipc64_perm*out: new style ipc permissions

Description

Turn the kernel object in into a set of permissions descriptions for returning to userspace (out).

voidipc64_perm_to_ipc_perm(structipc64_perm*in,structipc_perm*out)¶: convert new ipc permissions to old

Parameters

structipc64_perm*in: new style ipc permissions
structipc_perm*out: old style ipc permissions

Description

Turn the new style permissions object in into a compatibility object and store it into the out pointer.

structkern_ipc_perm*ipc_obtain_object_idr(structipc_ids*ids,intid)¶: Look for an id in the ipc ids idr and return associated ipc object.

Parameters

structipc_ids*ids: ipc identifier set
intid: ipc id to look for

Description

Call inside the RCU critical section. The ipc object is not locked on exit.

structkern_ipc_perm*ipc_obtain_object_check(structipc_ids*ids,intid)¶: Similar to ipc_obtain_object_idr() but also checks the ipc object sequence number.

Parameters

structipc_ids*ids: ipc identifier set
intid: ipc id to look for

Description

Call inside the RCU critical section. The ipc object is not locked on exit.

intipcget(structipc_namespace*ns,structipc_ids*ids,conststructipc_ops*ops,structipc_params*params)¶: Common sys_*get() code

Parameters

structipc_namespace*ns: namespace
structipc_ids*ids: ipc identifier set
conststructipc_ops*ops: operations to be called on ipc object creation, permission checks and further checks
structipc_params*params: the parameters needed by the previous operations.

Description

Common routine called by sys_msgget(), sys_semget() and sys_shmget().

int ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out)¶: update the permissions of an ipc object

Parameters

struct ipc64_perm *in: the permission given as input.
struct kern_ipc_perm *out: the permission of the ipc to set.

struct kern_ipc_perm *ipcctl_obtain_check(struct ipc_namespace *ns, struct ipc_ids *ids, int id, int cmd, struct ipc64_perm *perm, int extra_perm)¶: retrieve an ipc object and check permissions

Parameters

struct ipc_namespace *ns: ipc namespace
struct ipc_ids *ids: the table of ids where to look for the ipc
int id: the id of the ipc to retrieve
int cmd: the cmd to check
struct ipc64_perm *perm: the permission to set
int extra_perm: one extra permission parameter used by msq

Description

This function performs common audit and permission checks for some IPC_XXX commands and is called from semctl_down, shmctl_down and msgctl_down.

It:

retrieves the ipc object with the given id in the given table.
performs some audit and permission checks, depending on the given cmd
returns a pointer to the ipc object or, otherwise, the corresponding error.

Call holding both the rwsem and the rcu read lock.

int ipc_parse_version(int *cmd)¶: ipc call version

Parameters

int *cmd: pointer to command

Description

Return IPC_64 for new style IPC and IPC_OLD for old style IPC. The cmd value is turned from an encoding command and version into just the command code.
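A minimal userspace sketch of this decoding follows, assuming the version is carried as a single flag bit in the command word. The `DEMO_` constants and the function name are stand-ins chosen for illustration; the flag value 0x0100 is an assumption rather than something stated in this text.

```c
#define DEMO_IPC_OLD 0        /* old style call */
#define DEMO_IPC_64  0x0100   /* assumed version flag inside the command word */

/*
 * Report which style the caller used and strip the version flag so that
 * *cmd holds only the command code afterwards.
 */
int demo_parse_version(int *cmd)
{
    if (*cmd & DEMO_IPC_64) {
        *cmd ^= DEMO_IPC_64;   /* clear the flag, keep the command */
        return DEMO_IPC_64;
    }
    return DEMO_IPC_OLD;
}
```

Calling it a second time on the same variable reports the old style, because the flag has already been removed.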

struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t *pos)¶: Find and lock the ipc structure based on seq pos

Parameters

struct ipc_ids *ids: ipc identifier set
loff_t *pos: expected position

Description

The function finds an ipc structure, based on the sequence file position pos. If there is no ipc structure at position pos, then the successor is selected. If a structure is found, then it is locked (both rcu_read_lock() and ipc_lock_object()) and pos is set to the position needed to locate the found ipc structure. If nothing is found (i.e. EOF), pos is not modified.

The function returns the found ipc structure, or NULL at EOF.

FIFO Buffer¶

kfifo interface¶

DECLARE_KFIFO_PTR¶

DECLARE_KFIFO_PTR(fifo,type)

macro to declare a fifo pointer object

Parameters

fifo: name of the declared fifo
type: type of the fifo elements

DECLARE_KFIFO¶

DECLARE_KFIFO(fifo,type,size)

macro to declare a fifo object

Parameters

fifo: name of the declared fifo
type: type of the fifo elements
size: the number of elements in the fifo, this must be a power of 2

INIT_KFIFO¶

INIT_KFIFO(fifo)

Initialize a fifo declared by DECLARE_KFIFO

Parameters

fifo: name of the declared fifo datatype

DEFINE_KFIFO¶

DEFINE_KFIFO(fifo,type,size)

macro to define and initialize a fifo

Parameters

fifo: name of the declared fifo datatype
type: type of the fifo elements
size: the number of elements in the fifo, this must be a power of 2

Note

the macro can be used for global and local fifo data type variables.

kfifo_initialized¶

kfifo_initialized(fifo)

Check if the fifo is initialized

Parameters

fifo: address of the fifo to check

Description

Return true if fifo is initialized, otherwise false. Assumes the fifo was 0 before.

kfifo_esize¶

kfifo_esize(fifo)

returns the size of the element managed by the fifo

Parameters

fifo: address of the fifo to be used

kfifo_recsize¶

kfifo_recsize(fifo)

returns the size of the record length field

Parameters

fifo: address of the fifo to be used

kfifo_size¶

kfifo_size(fifo)

returns the size of the fifo in elements

Parameters

fifo: address of the fifo to be used

kfifo_reset¶

kfifo_reset(fifo)

removes the entire fifo content

Parameters

fifo: address of the fifo to be used

Note

Usage of kfifo_reset() is dangerous. It should only be called when the fifo is exclusively locked or when it is ensured that no other thread is accessing the fifo.

kfifo_reset_out¶

kfifo_reset_out(fifo)

skip fifo content

Parameters

fifo: address of the fifo to be used

Note

The usage of kfifo_reset_out() is safe as long as it is only called from the reader thread and there is only one concurrent reader. Otherwise it is dangerous and must be handled in the same way as kfifo_reset().

kfifo_len¶

kfifo_len(fifo)

returns the number of used elements in the fifo

Parameters

fifo: address of the fifo to be used

kfifo_is_empty¶

kfifo_is_empty(fifo)

returns true if the fifo is empty

Parameters

fifo: address of the fifo to be used

kfifo_is_empty_spinlocked¶

kfifo_is_empty_spinlocked(fifo,lock)

returns true if the fifo is empty using a spinlock for locking

Parameters

fifo: address of the fifo to be used
lock: spinlock to be used for locking

kfifo_is_empty_spinlocked_noirqsave¶

kfifo_is_empty_spinlocked_noirqsave(fifo,lock)

returns true if the fifo is empty using a spinlock for locking, doesn’t disable interrupts

Parameters

fifo: address of the fifo to be used
lock: spinlock to be used for locking

kfifo_is_full¶

kfifo_is_full(fifo)

returns true if the fifo is full

Parameters

fifo: address of the fifo to be used

kfifo_avail¶

kfifo_avail(fifo)

returns the number of unused elements in the fifo

Parameters

fifo: address of the fifo to be used

kfifo_skip_count¶

kfifo_skip_count(fifo,count)

skip output data

Parameters

fifo: address of the fifo to be used
count: count of data to skip

kfifo_skip¶

kfifo_skip(fifo)

skip output data

Parameters

fifo: address of the fifo to be used

kfifo_peek_len¶

kfifo_peek_len(fifo)

gets the size of the next fifo record

Parameters

fifo: address of the fifo to be used

Description

This function returns the size of the next fifo record in number of bytes.

kfifo_alloc¶

kfifo_alloc(fifo,size,gfp_mask)

dynamically allocates a new fifo buffer

Parameters

fifo: pointer to the fifo
size: the number of elements in the fifo, this must be a power of 2
gfp_mask: get_free_pages mask, passed to kmalloc()

Description

This macro dynamically allocates a new fifo buffer.

The number of elements will be rounded up to a power of 2. The fifo will be released with kfifo_free(). Returns 0 if no error, otherwise an error code.

kfifo_alloc_node¶

kfifo_alloc_node(fifo,size,gfp_mask,node)

dynamically allocates a new fifo buffer on a NUMA node

Parameters

fifo: pointer to the fifo
size: the number of elements in the fifo, this must be a power of 2
gfp_mask: get_free_pages mask, passed to kmalloc()
node: NUMA node to allocate memory on

Description

This macro dynamically allocates a new fifo buffer with NUMA node awareness.

The number of elements will be rounded up to a power of 2. The fifo will be released with kfifo_free(). Returns 0 if no error, otherwise an error code.

kfifo_free¶

kfifo_free(fifo)

frees the fifo

Parameters

fifo: the fifo to be freed

kfifo_init¶

kfifo_init(fifo,buffer,size)

initialize a fifo using a preallocated buffer

Parameters

fifo: the fifo to assign the buffer
buffer: the preallocated buffer to be used
size: the size of the internal buffer, this has to be a power of 2

Description

This macro initializes a fifo using a preallocated buffer.

The number of elements will be rounded up to a power of 2. Returns 0 if no error, otherwise an error code.

kfifo_put¶

kfifo_put(fifo,val)

put data into the fifo

Parameters

fifo: address of the fifo to be used
val: the data to be added

Description

This macro copies the given value into the fifo. It returns 0 if the fifo was full. Otherwise it returns the number of processed elements.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.

kfifo_get¶

kfifo_get(fifo,val)

get data from the fifo

Parameters

fifo: address of the fifo to be used
val: address where to store the data

Description

This macro reads the data from the fifo. It returns 0 if the fifo was empty. Otherwise it returns the number of processed elements.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
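The return convention of kfifo_put() and kfifo_get(), and the reason the size must be a power of 2, can be illustrated with a small userspace ring buffer. This is a sketch rather than the kfifo implementation: all `demo_` names are hypothetical, the element type is fixed to int, and the code is single-threaded, so it omits the memory ordering that a lock-free single-reader/single-writer fifo needs.

```c
#include <assert.h>

#define DEMO_FIFO_SIZE 4u   /* number of elements, must be a power of 2 */

static_assert((DEMO_FIFO_SIZE & (DEMO_FIFO_SIZE - 1)) == 0,
              "fifo size must be a power of 2");

struct demo_fifo {
    unsigned int in;    /* free-running counter, advanced only by the writer */
    unsigned int out;   /* free-running counter, advanced only by the reader */
    int data[DEMO_FIFO_SIZE];
};

void demo_fifo_init(struct demo_fifo *f)
{
    f->in = f->out = 0;
}

/* Number of used elements; unsigned wrap-around keeps this correct. */
unsigned int demo_fifo_len(const struct demo_fifo *f)
{
    return f->in - f->out;
}

/* Returns 0 if the fifo was full, otherwise 1 (one element stored). */
unsigned int demo_fifo_put(struct demo_fifo *f, int val)
{
    if (demo_fifo_len(f) == DEMO_FIFO_SIZE)
        return 0;
    /* A power-of-2 size lets a mask replace the modulo operation. */
    f->data[f->in & (DEMO_FIFO_SIZE - 1)] = val;
    f->in++;
    return 1;
}

/* Returns 0 if the fifo was empty, otherwise 1 (one element read). */
unsigned int demo_fifo_get(struct demo_fifo *f, int *val)
{
    if (demo_fifo_len(f) == 0)
        return 0;
    *val = f->data[f->out & (DEMO_FIFO_SIZE - 1)];
    f->out++;
    return 1;
}
```

Because the writer only modifies `in` and the reader only modifies `out`, neither side overwrites state owned by the other, which is the property the locking note relies on.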

kfifo_peek¶

kfifo_peek(fifo,val)

get data from the fifo without removing

Parameters

fifo: address of the fifo to be used
val: address where to store the data

Description

This reads the data from the fifo without removing it from the fifo. It returns 0 if the fifo was empty. Otherwise it returns the number of processed elements.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.

kfifo_in¶

kfifo_in(fifo,buf,n)

put data into the fifo

Parameters

fifo: address of the fifo to be used
buf: the data to be added
n: number of elements to be added

Description

This macro copies the given buffer into the fifo and returns the number of copied elements.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.

kfifo_in_spinlocked¶

kfifo_in_spinlocked(fifo,buf,n,lock)

put data into the fifo using a spinlock for locking

Parameters

fifo: address of the fifo to be used
buf: the data to be added
n: number of elements to be added
lock: pointer to the spinlock to use for locking

Description

This macro copies the given values buffer into the fifo and returns the number of copied elements.

kfifo_in_spinlocked_noirqsave¶

kfifo_in_spinlocked_noirqsave(fifo,buf,n,lock)

put data into fifo using a spinlock for locking, don’t disable interrupts

Parameters

fifo: address of the fifo to be used
buf: the data to be added
n: number of elements to be added
lock: pointer to the spinlock to use for locking

Description

This is a variant of kfifo_in_spinlocked() but uses spin_lock/unlock() for locking and doesn’t disable interrupts.

kfifo_out¶

kfifo_out(fifo,buf,n)

get data from the fifo

Parameters

fifo: address of the fifo to be used
buf: pointer to the storage buffer
n: max. number of elements to get

Description

This macro gets some data from the fifo and returns the number of elements copied.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.

kfifo_out_spinlocked¶

kfifo_out_spinlocked(fifo,buf,n,lock)

get data from the fifo using a spinlock for locking

Parameters

fifo: address of the fifo to be used
buf: pointer to the storage buffer
n: max. number of elements to get
lock: pointer to the spinlock to use for locking

Description

This macro gets the data from the fifo and returns the number of elements copied.

kfifo_out_spinlocked_noirqsave¶

kfifo_out_spinlocked_noirqsave(fifo,buf,n,lock)

get data from the fifo using a spinlock for locking, don’t disable interrupts

Parameters

fifo: address of the fifo to be used
buf: pointer to the storage buffer
n: max. number of elements to get
lock: pointer to the spinlock to use for locking

Description

This is a variant of kfifo_out_spinlocked() which uses spin_lock/unlock() for locking and doesn’t disable interrupts.

kfifo_from_user¶

kfifo_from_user(fifo,from,len,copied)

puts some data from user space into the fifo

Parameters

fifo: address of the fifo to be used
from: pointer to the data to be added
len: the length of the data to be added
copied: pointer to output variable to store the number of copied bytes

Description

This macro copies at most len bytes from the from buffer into the fifo, depending on the available space, and returns -EFAULT/0.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.

kfifo_to_user¶

kfifo_to_user(fifo,to,len,copied)

copies data from the fifo into user space

Parameters

fifo: address of the fifo to be used
to: where the data must be copied
len: the size of the destination buffer
copied: pointer to output variable to store the number of copied bytes

Description

This macro copies at most len bytes from the fifo into the to buffer and returns -EFAULT/0.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.

kfifo_dma_in_prepare_mapped¶

kfifo_dma_in_prepare_mapped(fifo,sgl,nents,len,dma)

setup a scatterlist for DMA input

Parameters

fifo: address of the fifo to be used
sgl: pointer to the scatterlist array
nents: number of entries in the scatterlist array
len: number of elements to transfer
dma: mapped dma address to fill into sgl

Description

This macro fills a scatterlist for DMA input. It returns the number of entries in the scatterlist array.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_dma_in_finish¶

kfifo_dma_in_finish(fifo,len)

finish a DMA IN operation

Parameters

fifo: address of the fifo to be used
len: number of bytes received

Description

This macro finishes a DMA IN operation. The in counter will be updated by the len parameter. No error checking will be done.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_dma_out_prepare_mapped¶

kfifo_dma_out_prepare_mapped(fifo,sgl,nents,len,dma)

setup a scatterlist for DMA output

Parameters

fifo: address of the fifo to be used
sgl: pointer to the scatterlist array
nents: number of entries in the scatterlist array
len: number of elements to transfer
dma: mapped dma address to fill into sgl

Description

This macro fills a scatterlist for DMA output with at most len bytes to transfer. It returns the number of entries in the scatterlist array. A zero means there is no space available and the scatterlist is not filled.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_dma_out_finish¶

kfifo_dma_out_finish(fifo,len)

finish a DMA OUT operation

Parameters

fifo: address of the fifo to be used
len: number of bytes transferred

Description

This macro finishes a DMA OUT operation. The out counter will be updated by the len parameter. No error checking will be done.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_out_peek¶

kfifo_out_peek(fifo,buf,n)

gets some data from the fifo

Parameters

fifo: address of the fifo to be used
buf: pointer to the storage buffer
n: max. number of elements to get

Description

This macro gets the data from the fifo and returns the number of elements copied. The data is not removed from the fifo.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.

kfifo_out_linear¶

kfifo_out_linear(fifo,tail,n)

gets a tail of/offset to available data

Parameters

fifo: address of the fifo to be used
tail: pointer to an unsigned int to store the value of tail
n: max. number of elements to point at

Description

This macro obtains the offset (tail) to the available data in the fifo buffer and returns the number of elements available. It returns the available count up to the end of the data or the end of the buffer, whichever comes first, so that it can be used for linear data processing (like memcpy() of (fifo->data + tail) with the count returned).

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
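The "end of data or end of buffer" rule can be written out as a short calculation. The function below is a userspace sketch under the assumption of free-running in/out counters and a power-of-2 buffer size; the name `demo_out_linear` and its parameter list are hypothetical and only mirror the behaviour described above.

```c
#include <assert.h>

/*
 * Report the offset of the oldest element in *tail and return how many
 * elements can be read from that offset without wrapping, capped at n.
 */
unsigned int demo_out_linear(unsigned int in, unsigned int out,
                             unsigned int size, unsigned int *tail,
                             unsigned int n)
{
    unsigned int used, off, to_end, count;

    assert(size != 0 && (size & (size - 1)) == 0);   /* power of 2 */

    used = in - out;          /* elements currently stored */
    off = out & (size - 1);   /* offset of the oldest element */
    to_end = size - off;      /* elements before the buffer wraps */

    count = used < to_end ? used : to_end;
    if (count > n)
        count = n;

    *tail = off;
    return count;
}
```

With a size of 8, in = 10 and out = 6, four elements are stored but only two are contiguous (offsets 6 and 7), so a caller processes those two first and asks again for the remainder after consuming them.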

kfifo_out_linear_ptr¶

kfifo_out_linear_ptr(fifo,ptr,n)

gets a pointer to the available data

Parameters

fifo: address of the fifo to be used
ptr: pointer to data to store the pointer to tail
n: max. number of elements to point at

Description

Similarly to kfifo_out_linear(), this macro obtains the pointer to the available data in the fifo buffer and returns the number of elements available. It returns the available count up to the end of the available data or the end of the buffer, whichever comes first, so that it can be used for linear data processing (like memcpy() of ptr with the count returned).

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.

relay interface support¶

Relay interface support is designed to provide an efficient mechanism for tools and facilities to relay large amounts of data from kernel space to user space.

relay interface¶

int relay_buf_full(struct rchan_buf *buf)¶: boolean, is the channel buffer full?

Parameters

struct rchan_buf *buf: channel buffer

Description

Returns 1 if the buffer is full, 0 otherwise.

void relay_reset(struct rchan *chan)¶: reset the channel

Parameters

struct rchan *chan: the channel

Description

This has the effect of erasing all data from all channel buffers and restarting the channel in its initial state. The buffers are not freed, so any mappings are still in effect.
NOTE. Care should be taken that the channel isn’t actually being used by anything when this call is made.

struct rchan *relay_open(const char *base_filename, struct dentry *parent, size_t subbuf_size, size_t n_subbufs, const struct rchan_callbacks *cb, void *private_data)¶: create a new relay channel

Parameters

const char *base_filename: base name of files to create
struct dentry *parent: dentry of parent directory, NULL for root directory or buffer
size_t subbuf_size: size of sub-buffers
size_t n_subbufs: number of sub-buffers
const struct rchan_callbacks *cb: client callback functions
void *private_data: user-defined data

Description

Returns channel pointer if successful, NULL otherwise.
Creates a channel buffer for each cpu using the sizes and attributes specified. The created channel buffer files will be named base_filename0...base_filenameN-1. File permissions will be S_IRUSR.
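The per-cpu naming scheme can be reproduced in a few lines of userspace C, for instance by a reader that needs to open every buffer file. The helper name `demo_relay_filename` is hypothetical; only the "base name followed by cpu number" pattern comes from the description above.

```c
#include <stdio.h>

/*
 * Write the buffer file name for one cpu into out: the base name followed
 * directly by the cpu number. Returns what snprintf() returns, i.e. the
 * length the full name needs (excluding the terminating null byte).
 */
int demo_relay_filename(char *out, size_t outlen, const char *base_filename,
                        unsigned int cpu)
{
    return snprintf(out, outlen, "%s%u", base_filename, cpu);
}
```

For a base name of "trace" on a four-cpu system this yields trace0 through trace3.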

size_t relay_switch_subbuf(struct rchan_buf *buf, size_t length)¶: switch to a new sub-buffer

Parameters

struct rchan_buf *buf: channel buffer
size_t length: size of current event

Description

Returns either the length passed in or 0 if full.
Performs sub-buffer-switch tasks such as invoking callbacks, updating padding counts, waking up readers, etc.

void relay_subbufs_consumed(struct rchan *chan, unsigned int cpu, size_t subbufs_consumed)¶: update the buffer’s sub-buffers-consumed count

Parameters

struct rchan *chan: the channel
unsigned int cpu: the cpu associated with the channel buffer to update
size_t subbufs_consumed: number of sub-buffers to add to current buf’s count

Description

Adds to the channel buffer’s consumed sub-buffer count. subbufs_consumed should be the number of sub-buffers newly consumed, not the total consumed.
NOTE. Kernel clients don’t need to call this function if the channel mode is ‘overwrite’.

void relay_close(struct rchan *chan)¶: close the channel

Parameters

struct rchan *chan: the channel

Description

Closes all channel buffers and frees the channel.

void relay_flush(struct rchan *chan)¶: flush the channel

Parameters

struct rchan *chan: the channel

Description

Flushes all channel buffers, i.e. forces buffer switch.

int relay_mmap_prepare_buf(struct rchan_buf *buf, struct vm_area_desc *desc)¶: mmap channel buffer to process address space

Parameters

struct rchan_buf *buf: the relay channel buffer
struct vm_area_desc *desc: describing what to map

Description

Returns 0 if ok, negative on error.
Caller should already have grabbed mmap_lock.

void *relay_alloc_buf(struct rchan_buf *buf, size_t *size)¶: allocate a channel buffer

Parameters

struct rchan_buf *buf: the buffer struct
size_t *size: total size of the buffer

Description

Returns a pointer to the resulting buffer, NULL if unsuccessful. The passed in size will get page aligned, if it isn’t already.

struct rchan_buf *relay_create_buf(struct rchan *chan)¶: allocate and initialize a channel buffer

Parameters

struct rchan *chan: the relay channel

Description

Returns channel buffer if successful, NULL otherwise.

void relay_destroy_channel(struct kref *kref)¶: free the channel struct

Parameters

struct kref *kref: target kernel reference that contains the relay channel

Description

Should only be called from kref_put().

void relay_destroy_buf(struct rchan_buf *buf)¶: destroy an rchan_buf struct and associated buffer

Parameters

struct rchan_buf *buf: the buffer struct

void relay_remove_buf(struct kref *kref)¶: remove a channel buffer

Parameters

struct kref *kref: target kernel reference that contains the relay buffer

Description

Removes the file from the filesystem, which also frees the rchan_buf_struct and the channel buffer. Should only be called from kref_put().

int relay_buf_empty(struct rchan_buf *buf)¶: boolean, is the channel buffer empty?

Parameters

struct rchan_buf *buf: channel buffer

Description

Returns 1 if the buffer is empty, 0 otherwise.

void wakeup_readers(struct irq_work *work)¶: wake up readers waiting on a channel

Parameters

struct irq_work *work: contains the channel buffer

Description

This is the function used to defer reader waking.

void __relay_reset(struct rchan_buf *buf, unsigned int init)¶: reset a channel buffer

Parameters

struct rchan_buf *buf: the channel buffer
unsigned int init: 1 if this is a first-time initialization

Description

See relay_reset() for description of effect.

void relay_close_buf(struct rchan_buf *buf)¶: close a channel buffer

Parameters

struct rchan_buf *buf: channel buffer

Description

Marks the buffer finalized and restores the default callbacks. The channel buffer and channel buffer data structure are then freed automatically when the last reference is given up.

size_t relay_stats(struct rchan *chan, int flags)¶: get channel buffer statistics

Parameters

struct rchan *chan: the channel
int flags: select particular information to get

Description

Returns the count of the field that the caller specifies.

int relay_file_open(struct inode *inode, struct file *filp)¶: open file op for relay files

Parameters

struct inode *inode: the inode
struct file *filp: the file

Description

Increments the channel buffer refcount.

int relay_file_mmap_prepare(struct vm_area_desc *desc)¶: mmap file op for relay files

Parameters

struct vm_area_desc *desc: describing what to map

Description

Calls upon relay_mmap_prepare_buf() to map the file into user space.

__poll_t relay_file_poll(struct file *filp, poll_table *wait)¶: poll file op for relay files

Parameters

struct file *filp: the file
poll_table *wait: poll table

Description

Poll implementation.

int relay_file_release(struct inode *inode, struct file *filp)¶: release file op for relay files

Parameters

struct inode *inode: the inode
struct file *filp: the file

Description

Decrements the channel refcount, as the filesystem is no longer using it.

size_t relay_file_read_subbuf_avail(size_t read_pos, struct rchan_buf *buf)¶: return bytes available in sub-buffer

Parameters

size_t read_pos: file read position
struct rchan_buf *buf: relay channel buffer

size_t relay_file_read_start_pos(struct rchan_buf *buf)¶: find the first available byte to read

Parameters

struct rchan_buf *buf: relay channel buffer

Description

If the read_pos is in the middle of padding, return the position of the first actually available byte, otherwise return the original value.

size_t relay_file_read_end_pos(struct rchan_buf *buf, size_t read_pos, size_t count)¶: return the new read position

Parameters

struct rchan_buf *buf: relay channel buffer
size_t read_pos: file read position
size_t count: number of bytes to be read

Module Support¶

Kernel module auto-loading¶

int __request_module(bool wait, const char *fmt, ...)¶: try to load a kernel module

Parameters

bool wait: wait (or not) for the operation to complete
const char *fmt: printf style format string for the name of the module
...: arguments as specified in the format string

Description

Load a module using the user mode module loader. The function returns zero on success or a negative errno code or positive exit code from “modprobe” on failure. Note that a successful module load does not mean the module did not then unload and exit on an error of its own. Callers must check that the service they requested is now available, not blindly invoke it.

If module auto-loading support is disabled then this function simply returns -ENOENT.
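The three kinds of return value described above can be told apart by sign alone. The following is a plain C sketch of that classification; the enum and function names are hypothetical and exist only to make the convention explicit.

```c
enum demo_modload_result {
    DEMO_MODLOAD_OK,         /* 0: the loader reported success */
    DEMO_MODLOAD_ERRNO,      /* negative errno code */
    DEMO_MODLOAD_EXIT_CODE   /* positive exit code from "modprobe" */
};

/* Classify a return value that follows the convention described above. */
enum demo_modload_result demo_classify_modload(int ret)
{
    if (ret == 0)
        return DEMO_MODLOAD_OK;
    return ret < 0 ? DEMO_MODLOAD_ERRNO : DEMO_MODLOAD_EXIT_CODE;
}
```

Even in the success case a caller still has to confirm that the requested service is actually available, since the module may have unloaded again.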

Module debugging¶

Enabling CONFIG_MODULE_STATS enables module debugging statistics which are useful to monitor and root cause memory pressure issues with module loading. These statistics are useful to allow us to improve production workloads.

The current module debugging statistics supported help keep track of module loading failures to enable improvements either for kernel module auto-loading usage (request_module()) or interactions with userspace. Statistics are provided to track all possible failures in the finit_module() path and memory wasted in this process space. Each of the failure counters is associated with a type of module loading failure which is known to incur a certain amount of memory allocation loss. In the worst case loading a module will fail after a 3 step memory allocation process:

a) memory allocated with kernel_read_file_from_fd()
b) module decompression processes the file read from kernel_read_file_from_fd(), and vmap() is used to map the decompressed module to a new local buffer which represents a copy of the decompressed module passed from userspace. The buffer from kernel_read_file_from_fd() is freed right away.
c) layout_and_allocate() allocates space for the final resting place where we would keep the module if it were to be processed successfully.

If a failure occurs after these three different allocations only one counter will be incremented with the sum of the allocated bytes freed during this failure. Likewise, if module loading failed only after step b) a separate counter is used and incremented for the bytes freed and not used during both of those allocations.

Virtual memory space can be limited, for example on x86 virtual memory size defaults to 128 MiB. We should strive to limit and avoid wasting virtual memory allocations when possible. These module debugging statistics help to evaluate how much memory is being wasted on bootup due to module loading failures.

All counters are designed to be incremental. Atomic counters are used so as to remain simple and avoid delays and deadlocks.

dup_failed_modules - tracks duplicate failed modules¶

Linked list of modules which failed to be loaded because an already existing module with the same name was already being processed or already loaded. The finit_module() system call incurs heavy virtual memory allocations. In the worst case an finit_module() system call can end up allocating virtual memory 3 times:

a) kernel_read_file_from_fd() call uses vmalloc()
b) optional module decompression uses vmap()
c) layout_and_allocate() can use vzalloc() or an arch specific variation of vmalloc to deal with ELF sections requiring special permissions

In practice on a typical boot today most finit_module() calls fail due to the module with the same name already being loaded or about to be processed. All virtual memory allocated to these failed modules will be freed with no functional use.

To help with this the dup_failed_modules allows us to track modules which failed to load due to the fact that a module was already loaded or being processed. There are only two points at which we can fail such calls; we list them below along with the number of virtual memory allocation calls:

a) FAIL_DUP_MOD_BECOMING: at the end of early_mod_check() before layout_and_allocate().
- with module decompression: 2 virtual memory allocation calls
- without module decompression: 1 virtual memory allocation call
b) FAIL_DUP_MOD_LOAD: after layout_and_allocate() on add_unformed_module()
- with module decompression: 3 virtual memory allocation calls
- without module decompression: 2 virtual memory allocation calls

We should strive to get this list to be as small as possible. If this list is not empty it is a reflection of possible work or optimizations possible either in-kernel or in userspace.
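The allocation counts listed for the two failure points follow from which of the three allocation steps have already run. A small C sketch makes that arithmetic explicit; the enum and function names are hypothetical, and only the resulting counts come from the text above.

```c
enum demo_fail_point {
    DEMO_FAIL_DUP_MOD_BECOMING,   /* at the end of early_mod_check() */
    DEMO_FAIL_DUP_MOD_LOAD        /* in add_unformed_module() */
};

/*
 * Number of virtual memory allocation calls already made when a duplicate
 * module is rejected at the given point.
 */
int demo_wasted_alloc_calls(enum demo_fail_point where, int decompressed)
{
    int calls = 1;                        /* kernel_read_file_from_fd() */

    if (decompressed)
        calls++;                          /* module decompression */
    if (where == DEMO_FAIL_DUP_MOD_LOAD)
        calls++;                          /* layout_and_allocate() */
    return calls;
}
```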

module statistics debugfs counters¶

The total amount of wasted virtual memory allocation space during module loading can be computed by adding the total from the summation:

invalid_kread_bytes + invalid_decompress_bytes + invalid_becoming_bytes + invalid_mod_bytes

The following debugfs counters are available to inspect module loading failures:

total_mod_size: total bytes ever used by all modules we’ve dealt with on this system
total_text_size: total bytes of the .text and .init.text ELF section sizes we’ve dealt with on this system
invalid_kread_bytes: bytes allocated and then freed on failures which happen due to the initial kernel_read_file_from_fd(). kernel_read_file_from_fd() uses vmalloc(). These should typically not happen unless your system is under memory pressure.
invalid_decompress_bytes: number of bytes allocated and freed due to memory allocations in the module decompression path that use vmap(). These typically should not happen unless your system is under memory pressure.
invalid_becoming_bytes: total number of bytes allocated and freed used to read the kernel module userspace wants us to read before we promote it to be processed to be added to our modules linked list. These failures can happen if we had a check in between a successful kernel_read_file_from_fd() call and right before we allocate our private memory for the module which would be kept if the module is successfully loaded. The most common reason for this failure is when userspace is racing to load a module which it does not yet see loaded. The first module to succeed in add_unformed_module() will add a module to our modules list and subsequent loads of modules with the same name will error out at the end of early_mod_check(). The check for module_patient_check_exists() at the end of early_mod_check() prevents duplicate allocations on layout_and_allocate() for modules already being processed. These duplicate failed modules are non-fatal, however they typically are indicative of userspace not seeing a module in userspace loaded yet and unnecessarily trying to load a module before the kernel even has a chance to begin to process prior requests. Although duplicate failures can be non-fatal, we should try to reduce vmalloc() pressure proactively, so ideally after boot this will be as close to 0 as possible. If module decompression was used we also add to this counter the cost of the initial kernel_read_file_from_fd() of the compressed module. If module decompression was not used the value represents the total allocated and freed bytes in kernel_read_file_from_fd() calls for these types of failures. These failures can occur because:
module_sig_check() - module signature checks
elf_validity_cache_copy() - some ELF validation issue
early_mod_check():
blacklisting
failed to rewrite section headers
version magic
live patch requirements didn’t check out
the module was detected as being already present
invalid_mod_bytes: these are the total number of bytes allocated and freed due to failures after we did all the sanity checks of the module which userspace passed to us and after our first check that the module is unique. A module can still fail to load if we detect the module is loaded after we allocate space for it with layout_and_allocate(); we do this check right before processing the module as live and running its initialization routines. Note that if you have a failure of this type it also means the respective kernel_read_file_from_fd() memory space was also freed and not used, and so we increment this counter with twice the size of the module. Additionally if you used module decompression the size of the compressed module is also added to this counter.
modcount: how many modules we’ve loaded in our kernel life time
failed_kreads: how many modules failed due to failed kernel_read_file_from_fd()
failed_decompress: how many failed module decompression attempts we’ve had. These really should not happen unless your compression / decompression might be broken.
failed_becoming: how many modules failed after we kernel_read_file_from_fd() it and before we allocate memory for it with layout_and_allocate(). This counter is never incremented if you manage to validate the module and call layout_and_allocate() for it.
failed_load_modules: how many modules failed once we’ve allocated our private space for our module using layout_and_allocate(). These failures should hopefully mostly be dealt with already. Races in theory could still exist here, but it would just mean the kernel had started processing two threads concurrently up to early_mod_check() and one thread won. These failures are good signs the kernel or userspace is doing something seriously stupid or that could be improved. We should strive to fix these, but it is perhaps not easy to fix them. A recent example is the module requests incurred for frequency modules: a separate module request was being issued for each CPU on a system.
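The wasted-memory total given at the start of this section is a plain sum of the four invalid_*_bytes counters. A userspace sketch, assuming the values have already been read from debugfs into a struct (the struct and function names are hypothetical, and reading the files is left out):

```c
/* Hypothetical snapshot of the four byte counters described above. */
struct demo_mod_stats {
    unsigned long long invalid_kread_bytes;
    unsigned long long invalid_decompress_bytes;
    unsigned long long invalid_becoming_bytes;
    unsigned long long invalid_mod_bytes;
};

/* Total virtual memory allocated and then freed without being used. */
unsigned long long demo_total_wasted_bytes(const struct demo_mod_stats *s)
{
    return s->invalid_kread_bytes + s->invalid_decompress_bytes +
           s->invalid_becoming_bytes + s->invalid_mod_bytes;
}
```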

Inter Module support

Refer to the files in kernel/module/ for more information.

Hardware Interfaces

DMA Channels

int request_dma(unsigned int dmanr, const char *device_id): request and reserve a system DMA channel

Parameters

unsigned int dmanr: DMA channel number
const char *device_id: reserving device ID string, used in /proc/dma

void free_dma(unsigned int dmanr): free a reserved system DMA channel

Parameters

unsigned int dmanr: DMA channel number

Resources Management

struct resource *request_resource_conflict(struct resource *root, struct resource *new): request and reserve an I/O or memory resource

Parameters

struct resource *root: root resource descriptor
struct resource *new: resource descriptor desired by caller

Description

Returns 0 for success, conflict resource on error.

int find_next_iomem_res(resource_size_t start, resource_size_t end, unsigned long flags, unsigned long desc, struct resource *res): Finds the lowest iomem resource that covers part of [start..end].

Parameters

resource_size_t start: start address of the resource searched for
resource_size_t end: end address of same resource
unsigned long flags: flags which the resource must have
unsigned long desc: descriptor the resource must have
struct resource *res: return ptr, if resource found

Description

If a resource is found, returns 0 and *res is overwritten with the part of the resource that’s within [start..end]; if none is found, returns -ENODEV. Returns -EINVAL for invalid parameters.

The caller must specify start, end, flags, and desc (which may be IORES_DESC_NONE).
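
The clipping behaviour described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct range_model and find_next_model() are hypothetical names, the table is a flat array sorted by start address rather than the kernel's resource tree, and the desc check is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

/* Hypothetical userspace stand-in for struct resource: an inclusive range. */
struct range_model {
    uint64_t start;
    uint64_t end;
    unsigned long flags;
};

/*
 * Scan a table sorted by start address and return the lowest entry that
 * overlaps [start..end] and carries all requested flags, clipped to the
 * search window.
 */
static int find_next_model(const struct range_model *tbl, size_t n,
                           uint64_t start, uint64_t end,
                           unsigned long flags, struct range_model *res)
{
    if (!res || start >= end)
        return -EINVAL;

    for (size_t i = 0; i < n; i++) {
        const struct range_model *p = &tbl[i];

        if (p->start > end)     /* sorted: nothing further can overlap */
            break;
        if (p->end < start)     /* entirely below the window */
            continue;
        if ((p->flags & flags) != flags)
            continue;

        /* Keep only the part of the entry that lies inside the window. */
        res->start = p->start > start ? p->start : start;
        res->end = p->end < end ? p->end : end;
        res->flags = p->flags;
        return 0;
    }
    return -ENODEV;
}
```

A caller of this model gets back the overlap only, so a window that starts in the middle of an entry yields a result that starts at the window's start.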

int reallocate_resource(struct resource *root, struct resource *old, resource_size_t newsize, struct resource_constraint *constraint): allocate a slot in the resource tree given range & alignment. The resource will be relocated if the new size cannot be reallocated in the current location.

Parameters

struct resource *root: root resource descriptor
struct resource *old: resource descriptor desired by caller
resource_size_t newsize: new size of the resource descriptor
struct resource_constraint *constraint: the memory range and alignment constraints to be met.

struct resource *lookup_resource(struct resource *root, resource_size_t start): find an existing resource by a resource start address

Parameters

struct resource *root: root resource descriptor
resource_size_t start: resource start address

Description

Returns a pointer to the resource if found, NULL otherwise

struct resource *insert_resource_conflict(struct resource *parent, struct resource *new): Inserts resource in the resource tree

Parameters

struct resource *parent: parent of the new resource
struct resource *new: new resource to insert

Description

Returns 0 on success, conflict resource if the resource can’t be inserted.

This function is equivalent to request_resource_conflict when no conflict happens. If a conflict happens, and the conflicting resources entirely fit within the range of the new resource, then the new resource is inserted and the conflicting resources become children of the new resource.

This function is intended for producers of resources, such as FW modules and bus drivers.

resource_size_t resource_alignment(struct resource *res): calculate resource’s alignment

Parameters

struct resource *res: resource pointer

Description

Returns alignment on success, 0 (invalid alignment) on failure.

void release_mem_region_adjustable(resource_size_t start, resource_size_t size): release a previously reserved memory region

Parameters

resource_size_t start: resource start address
resource_size_t size: resource region size

Description

This interface is intended for memory hot-delete. The requested region is released from a currently busy memory resource. The requested region must either match exactly or fit into a single busy resource entry. In the latter case, the remaining resource is adjusted accordingly.

Note

Additional release conditions, such as overlapping region, can be supported after they are confirmed as valid cases.
When a busy memory resource gets split into two entries, its children are reassigned to the correct parent based on their range. If a child memory resource overlaps with more than one parent, enhance the logic as needed.
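
The "match exactly or fit into a single busy entry" rule leads to four outcomes, which the following userspace sketch models. It is an illustration only: struct busy_model and release_adjustable_model() are hypothetical names, the return value is a convenience of this sketch (the documented function returns void), and child reassignment is not modelled.

```c
#include <stdint.h>
#include <errno.h>

/* Hypothetical inclusive address range standing in for a busy resource. */
struct busy_model {
    uint64_t start;
    uint64_t end;
};

/*
 * Release [start, start + size - 1] from *res.
 * Returns the number of entries that remain (0, 1 or 2), or -EINVAL if
 * the request does not fit inside *res. When 2 is returned, *tail holds
 * the upper remainder.
 */
static int release_adjustable_model(struct busy_model *res,
                                    struct busy_model *tail,
                                    uint64_t start, uint64_t size)
{
    uint64_t end;

    if (size == 0)
        return -EINVAL;
    end = start + size - 1;
    if (end < start || start < res->start || end > res->end)
        return -EINVAL;

    if (start == res->start && end == res->end)
        return 0;                   /* exact match: entry disappears */

    if (start == res->start) {      /* released from the bottom */
        res->start = end + 1;
        return 1;
    }
    if (end == res->end) {          /* released from the top */
        res->end = start - 1;
        return 1;
    }

    /* Released from the middle: the entry is split in two. */
    tail->start = end + 1;
    tail->end = res->end;
    res->end = start - 1;
    return 2;
}
```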

void merge_system_ram_resource(struct resource *res): mark the System RAM resource mergeable and try to merge it with adjacent, mergeable resources

Parameters

struct resource *res: resource descriptor

Description

This interface is intended for memory hotplug, whereby lots of contiguous system ram resources are added (e.g., via add_memory*()) by a driver, and the actual resource boundaries are not of interest (e.g., it might be relevant for DIMMs). Only resources that are marked mergeable, that have the same parent, and that don’t have any children are considered. All mergeable resources must be immutable during the request.

Note

The caller has to make sure that no pointers to resources that are marked mergeable are used anymore after this call - the resource might be freed and the pointer might be stale!
release_mem_region_adjustable() will split on demand on memory hotunplug

int request_resource(struct resource *root, struct resource *new): request and reserve an I/O or memory resource

Parameters

struct resource *root: root resource descriptor
struct resource *new: resource descriptor desired by caller

Description

Returns 0 for success, negative error code on error.

int release_resource(struct resource *old): release a previously reserved resource

Parameters

struct resource *old: resource pointer

int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)): Walks through iomem resources and calls func() with matching resource ranges.

Parameters

unsigned long desc: I/O resource descriptor. Use IORES_DESC_NONE to skip desc check.
unsigned long flags: I/O resource flags
u64 start: start addr
u64 end: end addr
void *arg: function argument for the callback func
int (*func)(struct resource *, void *): callback function that is called for each qualifying resource area

Description

All the memory ranges which overlap start, end and also match flags and desc are valid candidates.

NOTE

For a new descriptor search, define a new IORES_DESC in <linux/ioport.h> and set it in ‘desc’ of a target resource entry.

int region_intersects(resource_size_t start, size_t size, unsigned long flags, unsigned long desc): determine intersection of region with known resources

Parameters

resource_size_t start: region start address
size_t size: size of region
unsigned long flags: flags of resource (in iomem_resource)
unsigned long desc: descriptor of resource (in iomem_resource) or IORES_DESC_NONE

Description

Check if the specified region partially overlaps or fully eclipses a resource identified by flags and desc (optional with IORES_DESC_NONE). Return REGION_DISJOINT if the region does not overlap flags/desc, return REGION_MIXED if the region overlaps flags/desc and another resource, and return REGION_INTERSECTS if the region overlaps flags/desc and no other defined resource. Note that REGION_INTERSECTS is also returned in the case when the specified region overlaps RAM and undefined memory holes.

region_intersects() is used by memory remapping functions to ensure the user is not remapping RAM and is a vast speed up over walking through the resource table page by page.
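
The three-way classification can be modelled in userspace as follows. This is a sketch, not the kernel implementation: the MODEL_* enumerators, struct res_model and classify_region() are hypothetical, the resource table is a flat array, and only flags are compared (desc is omitted for brevity).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes mirroring the three documented outcomes. */
enum region_model { MODEL_INTERSECTS, MODEL_DISJOINT, MODEL_MIXED };

/* Hypothetical stand-in for a resource entry: inclusive range plus flags. */
struct res_model {
    uint64_t start;
    uint64_t end;
    unsigned long flags;
};

static enum region_model classify_region(const struct res_model *tbl, size_t n,
                                         uint64_t start, uint64_t size,
                                         unsigned long flags)
{
    uint64_t end = start + size - 1;
    int match = 0, other = 0;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].end < start || tbl[i].start > end)
            continue;               /* no overlap with the region */
        if ((tbl[i].flags & flags) == flags)
            match++;                /* overlapping entry of the wanted type */
        else
            other++;                /* overlapping entry of another type */
    }

    if (match == 0)
        return MODEL_DISJOINT;
    /* Holes between entries are not counted as "other" resources. */
    return other ? MODEL_MIXED : MODEL_INTERSECTS;
}
```

Because holes are ignored, a region that covers a matching entry plus unclaimed space still classifies as intersecting, which matches the note about RAM and undefined memory holes.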

int find_resource_space(struct resource *root, struct resource *new, resource_size_t size, struct resource_constraint *constraint): Find empty space in the resource tree

Parameters

struct resource *root: Root resource descriptor
struct resource *new: Resource descriptor awaiting an empty resource space
resource_size_t size: The minimum size of the empty space
struct resource_constraint *constraint: The range and alignment constraints to be met

Description

Finds an empty space under root in the resource tree satisfying range and alignment constraints.

Return

0 - if successful, new members start, end, and flags are altered.
-EBUSY - if no empty space was found.
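
A first-fit search over the gaps between existing children is one way to picture this. The sketch below is a userspace model under stated assumptions: struct span_model, align_up() and find_space_model() are hypothetical, children are given as a sorted, non-overlapping array, the constraint is reduced to a power-of-two alignment, and addresses are assumed small enough that adding 1 or the alignment cannot overflow. The -EINVAL return for bad arguments is a choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

struct span_model {
    uint64_t start;
    uint64_t end;   /* inclusive */
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * First-fit search for size bytes inside root, skipping the sorted,
 * non-overlapping children. On success fills *out and returns 0.
 */
static int find_space_model(const struct span_model *root,
                            const struct span_model *children, size_t n,
                            uint64_t size, uint64_t align,
                            struct span_model *out)
{
    uint64_t gap_start = root->start;

    if (size == 0 || align == 0 || (align & (align - 1)))
        return -EINVAL;

    for (size_t i = 0; i <= n; i++) {
        /* Exclusive upper bound of the current gap. */
        uint64_t limit = (i < n) ? children[i].start : root->end + 1;
        uint64_t cand = align_up(gap_start, align);

        if (cand < limit && limit - cand >= size) {
            out->start = cand;
            out->end = cand + size - 1;
            return 0;
        }
        if (i < n)
            gap_start = children[i].end + 1;
    }
    return -EBUSY;
}
```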

int allocate_resource(struct resource *root, struct resource *new, resource_size_t size, resource_size_t min, resource_size_t max, resource_size_t align, resource_alignf alignf, void *alignf_data): allocate empty slot in the resource tree given range & alignment. The resource will be reallocated with a new size if it was already allocated

Parameters

struct resource *root: root resource descriptor
struct resource *new: resource descriptor desired by caller
resource_size_t size: requested resource region size
resource_size_t min: minimum boundary to allocate
resource_size_t max: maximum boundary to allocate
resource_size_t align: alignment requested, in bytes
resource_alignf alignf: alignment function, optional, called if not NULL
void *alignf_data: arbitrary data to pass to the alignf function

int insert_resource(struct resource *parent, struct resource *new): Inserts a resource in the resource tree

Parameters

struct resource *parent: parent of the new resource
struct resource *new: new resource to insert

Description

Returns 0 on success, -EBUSY if the resource can’t be inserted.

This function is intended for producers of resources, such as FW modules and bus drivers.

void insert_resource_expand_to_fit(struct resource *root, struct resource *new): Insert a resource into the resource tree

Parameters

struct resource *root: root resource descriptor
struct resource *new: new resource to insert

Description

Insert a resource into the resource tree, possibly expanding it in order to make it encompass any conflicting resources.

int remove_resource(struct resource *old): Remove a resource in the resource tree

Parameters

struct resource *old: resource to remove

Description

Returns 0 on success, -EINVAL if the resource is not valid.

This function removes a resource previously inserted by insert_resource() or insert_resource_conflict(), and moves the children (if any) up to where they were before. insert_resource() and insert_resource_conflict() insert a new resource, and move any conflicting resources down to the children of the new resource.

insert_resource(), insert_resource_conflict() and remove_resource() are intended for producers of resources, such as FW modules and bus drivers.

int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size): modify a resource’s start and size

Parameters

struct resource *res: resource to modify
resource_size_t start: new start value
resource_size_t size: new size

Description

Given an existing resource, change its start and size to match the arguments. Returns 0 on success, -EBUSY if it can’t fit. Existing children of the resource are assumed to be immutable.
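
One reading of "can't fit" with immutable children is that the new range must stay inside the parent and must still cover every existing child. The userspace sketch below models just those two checks. It is an assumption-laden illustration: struct extent_model and adjust_model() are hypothetical, sibling overlap is not checked, and the -EINVAL return for a zero size is a choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

struct extent_model {
    uint64_t start;
    uint64_t end;   /* inclusive */
};

/*
 * Try to move/resize *res to [start, start + size - 1].
 * The children array is left untouched; *res is only updated on success.
 */
static int adjust_model(struct extent_model *res,
                        const struct extent_model *parent,
                        const struct extent_model *children, size_t n,
                        uint64_t start, uint64_t size)
{
    uint64_t end;

    if (size == 0)
        return -EINVAL;
    end = start + size - 1;

    /* The new range must lie inside the parent. */
    if (end < start || start < parent->start || end > parent->end)
        return -EBUSY;

    /* Every existing child must still be covered by the new range. */
    for (size_t i = 0; i < n; i++) {
        if (children[i].start < start || children[i].end > end)
            return -EBUSY;
    }

    res->start = start;
    res->end = end;
    return 0;
}
```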

struct resource *__request_region(struct resource *parent, resource_size_t start, resource_size_t n, const char *name, int flags): create a new busy resource region

Parameters

struct resource *parent: parent resource descriptor
resource_size_t start: resource start address
resource_size_t n: resource region size
const char *name: reserving caller’s ID string
int flags: IO resource flags

void __release_region(struct resource *parent, resource_size_t start, resource_size_t n): release a previously reserved resource region

Parameters

struct resource *parent: parent resource descriptor
resource_size_t start: resource start address
resource_size_t n: resource region size

Description

The described resource region must match a currently busy region.

int devm_request_resource(struct device *dev, struct resource *root, struct resource *new): request and reserve an I/O or memory resource

Parameters

struct device *dev: device for which to request the resource
struct resource *root: root of the resource tree from which to request the resource
struct resource *new: descriptor of the resource to request

Description

This is a device-managed version of request_resource(). There is usually no need to release resources requested by this function explicitly since that will be taken care of when the device is unbound from its driver. If for some reason the resource needs to be released explicitly, because of ordering issues for example, drivers must call devm_release_resource() rather than the regular release_resource().

When a conflict is detected between any existing resources and the newly requested resource, an error message will be printed.

Returns 0 on success or a negative error code on failure.

void devm_release_resource(struct device *dev, struct resource *new): release a previously requested resource

Parameters

struct device *dev: device for which to release the resource
struct resource *new: descriptor of the resource to release

Description

Releases a resource previously requested using devm_request_resource().

struct resource *devm_request_free_mem_region(struct device *dev, struct resource *base, unsigned long size): find free region for device private memory

Parameters

struct device *dev: device struct to bind the resource to
struct resource *base: resource tree to look in
unsigned long size: size in bytes of the device memory to add

Description

This function tries to find an empty range of physical address big enough to contain the new resource, so that it can later be hotplugged as ZONE_DEVICE memory, which in turn allocates struct pages.

struct resource *alloc_free_mem_region(struct resource *base, unsigned long size, unsigned long align, const char *name): find a free region relative to base

Parameters

struct resource *base: resource that will parent the new resource
unsigned long size: size in bytes of memory to allocate from base
unsigned long align: alignment requirements for the allocation
const char *name: resource name

Description

Buses like CXL, that can dynamically instantiate new memory regions, need a method to allocate physical address space for those regions. Allocate and insert a new resource to cover a free range in the span of base, that is, a range not claimed by any descendant of base.

MTRR Handling

int arch_phys_wc_add(unsigned long base, unsigned long size): add a WC MTRR and handle errors if PAT is unavailable

Parameters

unsigned long base: Physical base address
unsigned long size: Size of region

Description

If PAT is available, this does nothing. If PAT is unavailable, it attempts to add a WC MTRR covering size bytes starting at base and logs an error if this fails.

The caller should provide a power of two size on an equivalent power of two boundary.

Drivers must store the return value to pass to mtrr_del_wc_if_needed, but drivers should not try to interpret that return value.
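
The "power of two size on an equivalent power of two boundary" requirement can be checked with simple bit arithmetic. The helpers below are a hypothetical userspace sketch (is_pow2() and wc_range_ok() are not kernel functions) of what a caller might verify before asking for a write-combining range.

```c
#include <stdbool.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(unsigned long x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * True if size is a power of two and base is aligned to that same
 * power of two, i.e. base is a multiple of size.
 */
static bool wc_range_ok(unsigned long base, unsigned long size)
{
    return is_pow2(size) && (base & (size - 1)) == 0;
}
```

For example, a 16 MiB region at 0xd0000000 passes this check, while the same size at 0xd0800000 does not because the base is only 8 MiB aligned.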

Security Framework

int lsm_file_alloc(struct file *file): allocate a composite file blob

Parameters

struct file *file: the file that needs a blob

Description

Allocate the file blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_blob_alloc(void **dest, size_t size, gfp_t gfp): allocate a composite blob

Parameters

void **dest: the destination for the blob
size_t size: the size of the blob
gfp_t gfp: allocation type

Description

Allocate a blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_cred_alloc(struct cred *cred, gfp_t gfp): allocate a composite cred blob

Parameters

struct cred *cred: the cred that needs a blob
gfp_t gfp: allocation type

Description

Allocate the cred blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_inode_alloc(struct inode *inode, gfp_t gfp): allocate a composite inode blob

Parameters

struct inode *inode: the inode that needs a blob
gfp_t gfp: allocation flags

Description

Allocate the inode blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_task_alloc(struct task_struct *task): allocate a composite task blob

Parameters

struct task_struct *task: the task that needs a blob

Description

Allocate the task blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_ipc_alloc(struct kern_ipc_perm *kip): allocate a composite ipc blob

Parameters

struct kern_ipc_perm *kip: the ipc that needs a blob

Description

Allocate the ipc blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_key_alloc(struct key *key): allocate a composite key blob

Parameters

struct key *key: the key that needs a blob

Description

Allocate the key blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_msg_msg_alloc(struct msg_msg *mp): allocate a composite msg_msg blob

Parameters

struct msg_msg *mp: the msg_msg that needs a blob

Description

Allocate the msg_msg blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_bdev_alloc(struct block_device *bdev): allocate a composite block_device blob

Parameters

struct block_device *bdev: the block_device that needs a blob

Description

Allocate the block_device blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_bpf_map_alloc(struct bpf_map *map): allocate a composite bpf_map blob

Parameters

struct bpf_map *map: the bpf_map that needs a blob

Description

Allocate the bpf_map blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_bpf_prog_alloc(struct bpf_prog *prog): allocate a composite bpf_prog blob

Parameters

struct bpf_prog *prog: the bpf_prog that needs a blob

Description

Allocate the bpf_prog blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_bpf_token_alloc(struct bpf_token *token): allocate a composite bpf_token blob

Parameters

struct bpf_token *token: the bpf_token that needs a blob

Description

Allocate the bpf_token blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_superblock_alloc(struct super_block *sb): allocate a composite superblock blob

Parameters

struct super_block *sb: the superblock that needs a blob

Description

Allocate the superblock blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

int lsm_fill_user_ctx(struct lsm_ctx __user *uctx, u32 *uctx_len, void *val, size_t val_len, u64 id, u64 flags): Fill a user space lsm_ctx structure

Parameters

struct lsm_ctx __user *uctx: a userspace LSM context to be filled
u32 *uctx_len: available uctx size (input), used uctx size (output)
void *val: the new LSM context value
size_t val_len: the size of the new LSM context value
u64 id: LSM id
u64 flags: LSM defined flags

Description

Fill all of the fields in a userspace lsm_ctx structure. If uctx is NULL, simply calculate the required size to output via uctx_len and return success.

Returns 0 on success, -E2BIG if userspace buffer is not large enough, -EFAULT on a copyout error, -ENOMEM if memory can’t be allocated.
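
The size-query and -E2BIG behaviour can be pictured with a userspace model. Everything here is an assumption made for illustration: struct ctx_hdr_model is a hypothetical fixed header (not the real lsm_ctx layout), fill_ctx_model() is not a kernel function, the record is padded to pointer alignment as a choice of this sketch, and a plain memcpy() stands in for the copy to userspace.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>

/* Hypothetical fixed header; the value bytes follow it in the buffer. */
struct ctx_hdr_model {
    uint64_t id;
    uint64_t flags;
    uint64_t len;      /* total record size, padding included */
    uint64_t ctx_len;  /* size of the value that follows */
};

static int fill_ctx_model(void *buf, uint32_t *buf_len,
                          const void *val, size_t val_len,
                          uint64_t id, uint64_t flags)
{
    struct ctx_hdr_model hdr;
    size_t need = sizeof(hdr) + val_len;

    /* Round up so that consecutive records stay pointer-aligned. */
    need = (need + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

    if (!buf) {                 /* size query only */
        *buf_len = (uint32_t)need;
        return 0;
    }
    if (need > *buf_len)
        return -E2BIG;

    hdr.id = id;
    hdr.flags = flags;
    hdr.len = need;
    hdr.ctx_len = val_len;

    memset(buf, 0, need);       /* zero the padding as well */
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy((unsigned char *)buf + sizeof(hdr), val, val_len);

    *buf_len = (uint32_t)need;  /* report the size actually used */
    return 0;
}
```

A caller of this model would typically pass NULL first to learn the required size, allocate a buffer of that size, and then call again.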

int security_binder_set_context_mgr(const struct cred *mgr): Check if becoming binder ctx mgr is ok

Parameters

const struct cred *mgr: task credentials of current binder process

Description

Check whether mgr is allowed to be the binder context manager.

Return

Return 0 if permission is granted.

int security_binder_transaction(const struct cred *from, const struct cred *to): Check if a binder transaction is allowed

Parameters

const struct cred *from: sending process
const struct cred *to: receiving process

Description

Check whether from is allowed to invoke a binder transaction call to to.

Return

Returns 0 if permission is granted.

int security_binder_transfer_binder(const struct cred *from, const struct cred *to): Check if a binder transfer is allowed

Parameters

const struct cred *from: sending process
const struct cred *to: receiving process

Description

Check whether from is allowed to transfer a binder reference to to.

Return

Returns 0 if permission is granted.

int security_binder_transfer_file(const struct cred *from, const struct cred *to, const struct file *file): Check if a binder file xfer is allowed

Parameters

const struct cred *from: sending process
const struct cred *to: receiving process
const struct file *file: file being transferred

Description

Check whether from is allowed to transfer file to to.

Return

Returns 0 if permission is granted.

int security_ptrace_access_check(struct task_struct *child, unsigned int mode): Check if tracing is allowed

Parameters

struct task_struct *child: target process
unsigned int mode: PTRACE_MODE flags

Description

Check permission before allowing the current process to trace the child process. Security modules may also want to perform a process tracing check during an execve in the bprm_set_creds hook of binprm_security_ops if the process is being traced and its security attributes would be changed by the execve.

Return

Returns 0 if permission is granted.

int security_ptrace_traceme(struct task_struct *parent): Check if tracing is allowed

Parameters

struct task_struct *parent: tracing process

Description

Check that the parent process has sufficient permission to trace the current process before allowing the current process to present itself to the parent process for tracing.

Return

Returns 0 if permission is granted.

int security_capget(const struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted): Get the capability sets for a process

Parameters

const struct task_struct *target: target process
kernel_cap_t *effective: effective capability set
kernel_cap_t *inheritable: inheritable capability set
kernel_cap_t *permitted: permitted capability set

Description

Get the effective, inheritable, and permitted capability sets for the target process. The hook may also perform permission checking to determine if the current process is allowed to see the capability sets of the target process.

Return

Returns 0 if the capability sets were successfully obtained.

int security_capset(struct cred *new, const struct cred *old, const kernel_cap_t *effective, const kernel_cap_t *inheritable, const kernel_cap_t *permitted): Set the capability sets for a process

Parameters

struct cred *new: new credentials for the target process
const struct cred *old: current credentials of the target process
const kernel_cap_t *effective: effective capability set
const kernel_cap_t *inheritable: inheritable capability set
const kernel_cap_t *permitted: permitted capability set

Description

Set the effective, inheritable, and permitted capability sets for the current process.

Return

Returns 0 and updates new if permission is granted.

int security_capable(const struct cred *cred, struct user_namespace *ns, int cap, unsigned int opts): Check if a process has the necessary capability

Parameters

const struct cred *cred: credentials to examine
struct user_namespace *ns: user namespace
int cap: capability requested
unsigned int opts: capability check options

Description

Check whether the process has the cap capability in the indicated credentials. cap contains the capability <include/linux/capability.h>. opts contains options for the capable check <include/linux/security.h>.

Return

Returns 0 if the capability is granted.

int security_quotactl(int cmds, int type, int id, const struct super_block *sb): Check if a quotactl() syscall is allowed for this fs

Parameters

int cmds: commands
int type: type
int id: id
const struct super_block *sb: filesystem

Description

Check whether the quotactl syscall is allowed for this sb.

Return

Returns 0 if permission is granted.

int security_quota_on(struct dentry *dentry): Check if QUOTAON is allowed for a dentry

Parameters

struct dentry *dentry: dentry

Description

Check whether QUOTAON is allowed for dentry.

Return

Returns 0 if permission is granted.

int security_syslog(int type): Check if accessing the kernel message ring is allowed

Parameters

int type: SYSLOG_ACTION_* type

Description

Check permission before accessing the kernel message ring or changing logging to the console. See the syslog(2) manual page for an explanation of the type values.

Return

Return 0 if permission is granted.

int security_settime64(const struct timespec64 *ts, const struct timezone *tz): Check if changing the system time is allowed

Parameters

const struct timespec64 *ts: new time
const struct timezone *tz: timezone

Description

Check permission to change the system time. struct timespec64 is defined in <include/linux/time64.h> and timezone is defined in <include/linux/time.h>.

Return

Returns 0 if permission is granted.

int security_vm_enough_memory_mm(struct mm_struct *mm, long pages): Check if allocating a new mem map is allowed

Parameters

struct mm_struct *mm: mm struct
long pages: number of pages

Description

Check permissions for allocating a new virtual mapping. If all LSMs return a positive value, __vm_enough_memory() will be called with cap_sys_admin set. If at least one LSM returns 0 or negative, __vm_enough_memory() will be called with cap_sys_admin cleared.

Return

Returns 0 if permission is granted by the LSM infrastructure to the caller.
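
The "all positive, otherwise cleared" rule for cap_sys_admin can be written as a small reduction over the per-LSM results. This is a userspace sketch only: cap_sys_admin_model() is a hypothetical name and the array of return values stands in for calling each registered hook.

```c
#include <stddef.h>

/*
 * rcs[] holds the value each hypothetical LSM hook returned.
 * Returns 1 if every hook returned a positive value, otherwise 0.
 * With no hooks at all the flag stays set.
 */
static int cap_sys_admin_model(const int *rcs, size_t n)
{
    int cap_sys_admin = 1;

    for (size_t i = 0; i < n; i++) {
        if (rcs[i] <= 0) {
            /* A single zero or negative answer clears the flag. */
            cap_sys_admin = 0;
            break;
        }
    }
    return cap_sys_admin;
}
```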

int security_bprm_creds_for_exec(struct linux_binprm *bprm): Prepare the credentials for exec()

Parameters

struct linux_binprm *bprm: binary program information

Description

If the setup in prepare_exec_creds did not set up bprm->cred->security properly for executing bprm->file, update the LSM’s portion of bprm->cred->security to be what commit_creds needs to install for the new program. This hook may also optionally check permissions (e.g. for transitions between security domains). The hook must set bprm->secureexec to 1 if AT_SECURE should be set to request libc enable secure mode. bprm contains the linux_binprm structure.

If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is set. The result must be the same as without this flag even if the execution will never really happen and bprm will always be dropped.

This hook must not change current->cred, only bprm->cred.

Return

Returns 0 if the hook is successful and permission is granted.

int security_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file): Update linux_binprm creds based on file

Parameters

struct linux_binprm *bprm: binary program information
const struct file *file: associated file

Description

If file is setpcap, suid, sgid or otherwise marked to change privilege upon exec, update bprm->cred to reflect that change. This is called after finding the binary that will be executed without an interpreter. This ensures that the credentials will not be derived from a script that the binary will need to reopen, which when reopened may end up being a completely different file. This hook may also optionally check permissions (e.g. for transitions between security domains). The hook must set bprm->secureexec to 1 if AT_SECURE should be set to request libc enable secure mode. The hook must add to bprm->per_clear any personality flags that should be cleared from current->personality. bprm contains the linux_binprm structure.

Return

Returns 0 if the hook is successful and permission is granted.

int security_bprm_check(struct linux_binprm *bprm): Mediate binary handler search

Parameters

struct linux_binprm *bprm: binary program information

Description

This hook mediates the point when a search for a binary handler will begin. It allows a check against the bprm->cred->security value which was set in the preceding creds_for_exec call. The argv list and envp list are reliably available in bprm. This hook may be called multiple times during a single execve. bprm contains the linux_binprm structure.

Return

Returns 0 if the hook is successful and permission is granted.

void security_bprm_committing_creds(const struct linux_binprm *bprm): Install creds for a process during exec()

Parameters

const struct linux_binprm *bprm: binary program information

Description

Prepare to install the new security attributes of a process being transformed by an execve operation, based on the old credentials pointed to by current->cred and the information set in bprm->cred by the bprm_creds_for_exec hook. bprm points to the linux_binprm structure. This hook is a good place to perform state changes on the process such as closing open file descriptors to which access will no longer be granted when the attributes are changed. This is called immediately before commit_creds().

void security_bprm_committed_creds(const struct linux_binprm *bprm): Tidy up after cred install during exec()

Parameters

const struct linux_binprm *bprm: binary program information

Description

Tidy up after the installation of the new security attributes of a process being transformed by an execve operation. The new credentials have, by this point, been set to current->cred. bprm points to the linux_binprm structure. This hook is a good place to perform state changes on the process such as clearing out non-inheritable signal state. This is called immediately after commit_creds().

int security_fs_context_submount(struct fs_context *fc, struct super_block *reference): Initialise fc->security

Parameters

struct fs_context *fc: new filesystem context
struct super_block *reference: dentry reference for submount/remount

Description

Fill out the ->security field for a new fs_context.

Return

Returns 0 on success or negative error code on failure.

int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc): Duplicate a fs_context LSM blob

Parameters

struct fs_context *fc: destination filesystem context
struct fs_context *src_fc: source filesystem context

Description

Allocate and attach a security structure to fc->security. This pointer is initialised to NULL by the caller. fc indicates the new filesystem context. src_fc indicates the original filesystem context.

Return

Returns 0 on success or a negative error code on failure.

int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *param): Configure a filesystem context

Parameters

struct fs_context *fc: filesystem context
struct fs_parameter *param: filesystem parameter

Description

Userspace provided a parameter to configure a superblock. The LSM can consume the parameter or return it to the caller for use elsewhere.

Return

If the parameter is used by the LSM, 0 is returned; if it is returned to the caller, -ENOPARAM is returned; otherwise a negative error code is returned.

int security_sb_alloc(struct super_block *sb): Allocate a super_block LSM blob

Parameters

struct super_block *sb: filesystem superblock

Description

Allocate and attach a security structure to the sb->s_security field. The s_security field is initialized to NULL when the structure is allocated. sb contains the super_block structure to be modified.

Return

Returns 0 if operation was successful.

void security_sb_delete(struct super_block *sb): Release super_block LSM associated objects

Parameters

struct super_block *sb: filesystem superblock

Description

Release objects tied to a superblock (e.g. inodes). sb contains the super_block structure being released.

void security_sb_free(struct super_block *sb): Free a super_block LSM blob

Parameters

struct super_block *sb: filesystem superblock

Description

Deallocate and clear the sb->s_security field. sb contains the super_block structure to be modified.

int security_sb_kern_mount(const struct super_block *sb): Check if a kernel mount is allowed

Parameters

const struct super_block *sb: filesystem superblock

Description

Mount this sb if allowed by permissions.

Return

Returns 0 if permission is granted.

int security_sb_show_options(struct seq_file *m, struct super_block *sb): Output the mount options for a superblock

Parameters

struct seq_file *m: output file
struct super_block *sb: filesystem superblock

Description

Show (print on m) mount options for this sb.

Return

Returns 0 on success, negative values on failure.

int security_sb_statfs(struct dentry *dentry): Check if accessing fs stats is allowed

Parameters

struct dentry *dentry: superblock handle

Description

Check permission before obtaining filesystem statistics for the mnt mountpoint. dentry is a handle on the superblock for the filesystem.

Return

Returns 0 if permission is granted.

int security_sb_mount(const char *dev_name, const struct path *path, const char *type, unsigned long flags, void *data): Check permission for mounting a filesystem

Parameters

const char *dev_name: filesystem backing device
const struct path *path: mount point
const char *type: filesystem type
unsigned long flags: mount flags
void *data: filesystem specific data

Description

Check permission before an object specified by dev_name is mounted on the mount point named by path. For an ordinary mount, dev_name identifies a device if the file system type requires a device. For a remount (flags & MS_REMOUNT), dev_name is irrelevant. For a loopback/bind mount (flags & MS_BIND), dev_name identifies the pathname of the object being mounted.

Return

Returns 0 if permission is granted.
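
The flag-dependent meaning of dev_name described for security_sb_mount() can be sketched in userspace as follows. This is a minimal illustration, not kernel code: the enum dev_name_role and the helper classify_dev_name() are hypothetical names, and checking MS_REMOUNT before MS_BIND is an assumption of this sketch. Only MS_REMOUNT and MS_BIND come from the text; here they are taken from the userspace header <sys/mount.h>.

```c
#include <sys/mount.h>	/* MS_REMOUNT, MS_BIND (userspace definitions) */

/* Hypothetical labels for what dev_name means to a security module. */
enum dev_name_role {
	DEV_NAME_DEVICE,	/* ordinary mount: backing device, if the fs needs one */
	DEV_NAME_IRRELEVANT,	/* remount: dev_name carries no information */
	DEV_NAME_SOURCE_PATH	/* bind mount: pathname of the object being mounted */
};

/* Map mount flags to the role dev_name plays, following the description. */
enum dev_name_role classify_dev_name(unsigned long flags)
{
	if (flags & MS_REMOUNT)		/* assumption: remount takes precedence */
		return DEV_NAME_IRRELEVANT;
	if (flags & MS_BIND)
		return DEV_NAME_SOURCE_PATH;
	return DEV_NAME_DEVICE;
}
```

A hook implementation could use such a classification to decide whether dev_name is worth resolving at all before applying policy.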

intsecurity_sb_umount(structvfsmount*mnt,intflags)¶: Check permission for unmounting a filesystem

Parameters

structvfsmount*mnt: mounted filesystem
intflags: unmount flags

Description

Check permission before the mnt file system is unmounted.

Return

Returns 0 if permission is granted.

intsecurity_sb_pivotroot(conststructpath*old_path,conststructpath*new_path)¶: Check permissions for pivoting the rootfs

Parameters

conststructpath*old_path: new location for current rootfs
conststructpath*new_path: location of the new rootfs

Description

Check permission before pivoting the root filesystem.

Return

Returns 0 if permission is granted.

intsecurity_move_mount(conststructpath*from_path,conststructpath*to_path)¶: Check permissions for moving a mount

Parameters

conststructpath*from_path: source mount point
conststructpath*to_path: destination mount point

Description

Check permission before a mount is moved.

Return

Returns 0 if permission is granted.

intsecurity_path_notify(conststructpath*path,u64mask,unsignedintobj_type)¶: Check if setting a watch is allowed

Parameters

conststructpath*path: file path
u64mask: event mask
unsignedintobj_type: file path type

Description

Check permissions before setting a watch on events as defined by mask, on an object at path, whose type is defined by obj_type.

Return

Returns 0 if permission is granted.

intsecurity_inode_alloc(structinode*inode,gfp_tgfp)¶: Allocate an inode LSM blob

Parameters

structinode*inode: the inode
gfp_tgfp: allocation flags

Description

Allocate and attach a security structure to inode->i_security. The i_security field is initialized to NULL when the inode structure is allocated.

Return

Return 0 if operation was successful.

voidsecurity_inode_free(structinode*inode)¶: Free an inode’s LSM blob

Parameters

structinode*inode: the inode

Description

Release any LSM resources associated with inode, although due to the inode’s RCU protections it is possible that the resources will not be fully released until after the current RCU grace period has elapsed.

It is important for LSMs to note that despite being present in a call to security_inode_free(), inode may still be referenced in a VFS path walk and calls to security_inode_permission() may be made during, or after, a call to security_inode_free(). For this reason the inode->i_security field is released via a call_rcu() callback and any LSMs which need to retain inode state for use in security_inode_permission() should only release that state in the inode_free_security_rcu() LSM hook callback.

intsecurity_inode_init_security_anon(structinode*inode,conststructqstr*name,conststructinode*context_inode)¶: Initialize an anonymous inode

Parameters

structinode*inode: the inode
conststructqstr*name: the anonymous inode class
conststructinode*context_inode: an optional related inode

Description

Set up the incore security field for the new anonymous inode and return whether the inode creation is permitted by the security module or not.

Return

Returns 0 on success, -EACCES if the security module denies the creation of this inode, or another -errno upon other errors.

voidsecurity_path_post_mknod(structmnt_idmap*idmap,structdentry*dentry)¶: Update inode security after reg file creation

Parameters

structmnt_idmap*idmap: idmap of the mount
structdentry*dentry: new file

Description

Update inode security field after a regular file has been created.

intsecurity_path_rmdir(conststructpath*dir,structdentry*dentry)¶: Check if removing a directory is allowed

Parameters

conststructpath*dir: parent directory
structdentry*dentry: directory to remove

Description

Check the permission to remove a directory.

Return

Returns 0 if permission is granted.

intsecurity_path_symlink(conststructpath*dir,structdentry*dentry,constchar*old_name)¶: Check if creating a symbolic link is allowed

Parameters

conststructpath*dir: parent directory
structdentry*dentry: symbolic link
constchar*old_name: file pathname

Description

Check the permission to create a symbolic link to a file.

Return

Returns 0 if permission is granted.

intsecurity_path_link(structdentry*old_dentry,conststructpath*new_dir,structdentry*new_dentry)¶: Check if creating a hard link is allowed

Parameters

structdentry*old_dentry: existing file
conststructpath*new_dir: new parent directory
structdentry*new_dentry: new link

Description

Check permission before creating a new hard link to a file.

Return

Returns 0 if permission is granted.

intsecurity_path_truncate(conststructpath*path)¶: Check if truncating a file is allowed

Parameters

conststructpath*path: file

Description

Check permission before truncating the file indicated by path. Note that truncation permissions may also be checked based on already opened files, using the security_file_truncate() hook.

Return

Returns 0 if permission is granted.

intsecurity_path_chmod(conststructpath*path,umode_tmode)¶: Check if changing the file’s mode is allowed

Parameters

conststructpath*path: file
umode_tmode: new mode

Description

Check for permission to change the mode of the file path. The new mode is specified in mode, which is a bitmask of constants from <include/uapi/linux/stat.h>.

Return

Returns 0 if permission is granted.

intsecurity_path_chown(conststructpath*path,kuid_tuid,kgid_tgid)¶: Check if changing the file’s owner/group is allowed

Parameters

conststructpath*path: file
kuid_tuid: file owner
kgid_tgid: file group

Description

Check for permission to change owner/group of a file or directory.

Return

Returns 0 if permission is granted.

intsecurity_path_chroot(conststructpath*path)¶: Check if changing the root directory is allowed

Parameters

conststructpath*path: directory

Description

Check for permission to change root directory.

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_create_tmpfile(structmnt_idmap*idmap,structinode*inode)¶: Update inode security of new tmpfile

Parameters

structmnt_idmap*idmap: idmap of the mount
structinode*inode: inode of the new tmpfile

Description

Update inode security data after a tmpfile has been created.

intsecurity_inode_link(structdentry*old_dentry,structinode*dir,structdentry*new_dentry)¶: Check if creating a hard link is allowed

Parameters

structdentry*old_dentry: existing file
structinode*dir: new parent directory
structdentry*new_dentry: new link

Description

Check permission before creating a new hard link to a file.

Return

Returns 0 if permission is granted.

intsecurity_inode_unlink(structinode*dir,structdentry*dentry)¶: Check if removing a hard link is allowed

Parameters

structinode*dir: parent directory
structdentry*dentry: file

Description

Check the permission to remove a hard link to a file.

Return

Returns 0 if permission is granted.

intsecurity_inode_symlink(structinode*dir,structdentry*dentry,constchar*old_name)¶: Check if creating a symbolic link is allowed

Parameters

structinode*dir: parent directory
structdentry*dentry: symbolic link
constchar*old_name: existing filename

Description

Check the permission to create a symbolic link to a file.

Return

Returns 0 if permission is granted.

intsecurity_inode_rmdir(structinode*dir,structdentry*dentry)¶: Check if removing a directory is allowed

Parameters

structinode*dir: parent directory
structdentry*dentry: directory to be removed

Description

Check the permission to remove a directory.

Return

Returns 0 if permission is granted.

intsecurity_inode_mknod(structinode*dir,structdentry*dentry,umode_tmode,dev_tdev)¶: Check if creating a special file is allowed

Parameters

structinode*dir: parent directory
structdentry*dentry: new file
umode_tmode: new file mode
dev_tdev: device number

Description

Check permissions when creating a special file (or a socket or a fifo file created via the mknod system call). Note that if the mknod operation is being done for a regular file, then the create hook will be called and not this hook.

Return

Returns 0 if permission is granted.

intsecurity_inode_rename(structinode*old_dir,structdentry*old_dentry,structinode*new_dir,structdentry*new_dentry,unsignedintflags)¶: Check if renaming a file is allowed

Parameters

structinode*old_dir: parent directory of the old file
structdentry*old_dentry: the old file
structinode*new_dir: parent directory of the new file
structdentry*new_dentry: the new file
unsignedintflags: flags

Description

Check for permission to rename a file or directory.

Return

Returns 0 if permission is granted.

intsecurity_inode_readlink(structdentry*dentry)¶: Check if reading a symbolic link is allowed

Parameters

structdentry*dentry: link

Description

Check the permission to read the symbolic link.

Return

Returns 0 if permission is granted.

intsecurity_inode_follow_link(structdentry*dentry,structinode*inode,boolrcu)¶: Check if following a symbolic link is allowed

Parameters

structdentry*dentry: link dentry
structinode*inode: link inode
boolrcu: true if in RCU-walk mode

Description

Check permission to follow a symbolic link when looking up a pathname. If rcu is true, inode is not stable.

Return

Returns 0 if permission is granted.

intsecurity_inode_permission(structinode*inode,intmask)¶: Check if accessing an inode is allowed

Parameters

structinode*inode: inode
intmask: access mask

Description

Check permission before accessing an inode. This hook is called by the existing Linux permission function, so a security module can use it to provide additional checking for existing Linux permission checks. Notice that this hook is called when a file is opened (as well as many other operations), whereas the file_security_ops permission hook is called when the actual read/write operations are performed.

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_setattr(structmnt_idmap*idmap,structdentry*dentry,intia_valid)¶: Update the inode after a setattr operation

Parameters

structmnt_idmap*idmap: idmap of the mount
structdentry*dentry: file
intia_valid: file attributes set

Description

Update inode security field after successfully setting file attributes.

intsecurity_inode_getattr(conststructpath*path)¶: Check if getting file attributes is allowed

Parameters

conststructpath*path: file

Description

Check permission before obtaining file attributes.

Return

Returns 0 if permission is granted.

intsecurity_inode_setxattr(structmnt_idmap*idmap,structdentry*dentry,constchar*name,constvoid*value,size_tsize,intflags)¶: Check if setting file xattrs is allowed

Parameters

structmnt_idmap*idmap: idmap of the mount
structdentry*dentry: file
constchar*name: xattr name
constvoid*value: xattr value
size_tsize: size of xattr value
intflags: flags

Description

This hook performs the desired permission checks before setting the extended attributes (xattrs) on dentry. It is important to note that we have some additional logic before the main LSM implementation calls to detect if we need to perform an additional capability check at the LSM layer.

Normally we enforce a capability check prior to executing the various LSM hook implementations, but if an LSM wants to avoid this capability check, it can register an ‘inode_xattr_skipcap’ hook and return a value of 1 for xattrs that should bypass the capability check, leaving the LSM fully responsible for enforcing the access control for the specific xattr. If all of the enabled LSMs refrain from registering an ‘inode_xattr_skipcap’ hook, or return 0 (the default return value), the capability check is still performed.

Return

Returns 0 if permission is granted.
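
The decision described above for the ‘inode_xattr_skipcap’ hook can be modelled in userspace as a small predicate. This is only a sketch under stated assumptions: capability_check_needed() is a hypothetical helper, and the array stands in for the values that each registered hook would return for one xattr name. It is not the kernel's implementation.

```c
#include <stddef.h>

/*
 * skipcap_results[i] models the value returned by the i-th LSM's
 * inode_xattr_skipcap hook for one xattr name: 1 means the LSM takes
 * over access control, 0 is the default. n is 0 when no LSM registers
 * the hook at all.
 */
int capability_check_needed(const int *skipcap_results, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (skipcap_results[i] != 0)
			return 0;	/* an LSM opted out of the capability check */
	}
	return 1;			/* nobody opted out: perform the check */
}
```

With no registered hooks (n == 0), or with every hook returning 0, the predicate reports that the capability check is still required, which matches the behaviour the description spells out.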

intsecurity_inode_set_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name,structposix_acl*kacl)¶: Check if setting posix acls is allowed

Parameters

structmnt_idmap*idmap: idmap of the mount
structdentry*dentry: file
constchar*acl_name: acl name
structposix_acl*kacl: acl struct

Description

Check permission before setting posix acls. The posix acls in kacl are identified by acl_name.

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_set_acl(structdentry*dentry,constchar*acl_name,structposix_acl*kacl)¶: Update inode security from posix acls set

Parameters

structdentry*dentry: file
constchar*acl_name: acl name
structposix_acl*kacl: acl struct

Description

Update inode security data after successfully setting posix acls on dentry. The posix acls in kacl are identified by acl_name.

intsecurity_inode_get_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name)¶: Check if reading posix acls is allowed

Parameters

structmnt_idmap*idmap: idmap of the mount
structdentry*dentry: file
constchar*acl_name: acl name

Description

Check permission before getting posix acls. The posix acls are identified by acl_name.

Return

Returns 0 if permission is granted.

intsecurity_inode_remove_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name)¶: Check if removing a posix acl is allowed

Parameters

structmnt_idmap*idmap: idmap of the mount
structdentry*dentry: file
constchar*acl_name: acl name

Description

Check permission before removing posix acls. The posix acls are identified by acl_name.

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_remove_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name)¶: Update inode security after rm posix acls

Parameters

structmnt_idmap*idmap: idmap of the mount
structdentry*dentry: file
constchar*acl_name: acl name

Description

Update inode security data after successfully removing posix acls on dentry in idmap. The posix acls are identified by acl_name.

voidsecurity_inode_post_setxattr(structdentry*dentry,constchar*name,constvoid*value,size_tsize,intflags)¶: Update the inode after a setxattr operation

Parameters

structdentry*dentry: file
constchar*name: xattr name
constvoid*value: xattr value
size_tsize: xattr value size
intflags: flags

Description

Update inode security field after successful setxattr operation.

intsecurity_inode_getxattr(structdentry*dentry,constchar*name)¶: Check if xattr access is allowed

Parameters

structdentry*dentry: file
constchar*name: xattr name

Description

Check permission before obtaining the extended attributes identified by name for dentry.

Return

Returns 0 if permission is granted.

intsecurity_inode_listxattr(structdentry*dentry)¶: Check if listing xattrs is allowed

Parameters

structdentry*dentry: file

Description

Check permission before obtaining the list of extended attribute names for dentry.

Return

Returns 0 if permission is granted.

intsecurity_inode_removexattr(structmnt_idmap*idmap,structdentry*dentry,constchar*name)¶: Check if removing an xattr is allowed

Parameters

structmnt_idmap*idmap: idmap of the mount
structdentry*dentry: file
constchar*name: xattr name

Description

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_removexattr(structdentry*dentry,constchar*name)¶: Update the inode after a removexattr op

Parameters

structdentry*dentry: file
constchar*name: xattr name

Description

Update the inode after a successful removexattr operation.

intsecurity_inode_file_setattr(structdentry*dentry,structfile_kattr*fa)¶: check if setting fsxattr is allowed

Parameters

structdentry*dentry: file to set filesystem extended attributes on
structfile_kattr*fa: extended attributes to set on the inode

Description

Called when the file_setattr() syscall or the FS_IOC_FSSETXATTR ioctl() is invoked on the inode.

Return

Returns 0 if permission is granted.

intsecurity_inode_file_getattr(structdentry*dentry,structfile_kattr*fa)¶: check if retrieving fsxattr is allowed

Parameters

structdentry*dentry: file to retrieve filesystem extended attributes from
structfile_kattr*fa: extended attributes to get

Description

Called when the file_getattr() syscall or the FS_IOC_FSGETXATTR ioctl() is invoked on the inode.

Return

Returns 0 if permission is granted.

intsecurity_inode_need_killpriv(structdentry*dentry)¶: Check if security_inode_killpriv() is required

Parameters

structdentry*dentry: associated dentry

Description

Called when an inode has been changed to determine if security_inode_killpriv() should be called.

Return

Returns <0 on error to abort the inode change operation, 0 if security_inode_killpriv() does not need to be called, and >0 if security_inode_killpriv() does need to be called.
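
The three-way return convention of security_inode_need_killpriv() can be illustrated with a small userspace sketch of how a caller might interpret the result. The enum killpriv_action and the helper killpriv_action_for() are hypothetical names introduced for illustration only.

```c
/* Hypothetical labels for what a caller does with the hook's result. */
enum killpriv_action {
	KILLPRIV_ABORT,	/* rc < 0: abort the inode change operation */
	KILLPRIV_SKIP,	/* rc == 0: security_inode_killpriv() is not needed */
	KILLPRIV_CALL	/* rc > 0: security_inode_killpriv() must be called */
};

/* Translate the return value of the need_killpriv check into an action. */
enum killpriv_action killpriv_action_for(int need_killpriv_rc)
{
	if (need_killpriv_rc < 0)
		return KILLPRIV_ABORT;
	if (need_killpriv_rc == 0)
		return KILLPRIV_SKIP;
	return KILLPRIV_CALL;
}
```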

intsecurity_inode_killpriv(structmnt_idmap*idmap,structdentry*dentry)¶: The setuid bit is removed, update LSM state

Parameters

structmnt_idmap*idmap: idmap of the mount
structdentry*dentry: associated dentry

Description

The dentry’s setuid bit is being removed. Remove similar security labels. Called with the dentry->d_inode->i_mutex held.

Return

Returns 0 on success. If an error is returned, then the operation causing the setuid bit removal fails.

intsecurity_inode_getsecurity(structmnt_idmap*idmap,structinode*inode,constchar*name,void**buffer,boolalloc)¶: Get the xattr security label of an inode

Parameters

structmnt_idmap*idmap: idmap of the mount
structinode*inode: inode
constchar*name: xattr name
void**buffer: security label buffer
boolalloc: allocation flag

Description

Retrieve a copy of the extended attribute representation of the security label associated with name for inode via buffer. Note that name is the remainder of the attribute name after the security prefix has been removed. alloc is used to specify if the call should return a value via the buffer or just the value length.

Return

Returns size of buffer on success.

intsecurity_inode_setsecurity(structinode*inode,constchar*name,constvoid*value,size_tsize,intflags)¶: Set the xattr security label of an inode

Parameters

structinode*inode: inode
constchar*name: xattr name
constvoid*value: security label
size_tsize: length of security label
intflags: flags

Description

Set the security label associated with name for inode from the extended attribute value value. size indicates the size of the value in bytes. flags may be XATTR_CREATE, XATTR_REPLACE, or 0. Note that name is the remainder of the attribute name after the security. prefix has been removed.

Return

Returns 0 on success.

voidsecurity_inode_getlsmprop(structinode*inode,structlsm_prop*prop)¶: Get an inode’s LSM data

Parameters

structinode*inode: inode
structlsm_prop*prop: lsm specific information to return

Description

Get the lsm specific information associated with the inode.

intsecurity_kernfs_init_security(structkernfs_node*kn_dir,structkernfs_node*kn)¶: Init LSM context for a kernfs node

Parameters

structkernfs_node*kn_dir: parent kernfs node
structkernfs_node*kn: the kernfs node to initialize

Description

Initialize the security context of a newly created kernfs node based on its own and its parent’s attributes.

Return

Returns 0 if permission is granted.

intsecurity_file_permission(structfile*file,intmask)¶: Check file permissions

Parameters

structfile*file: file
intmask: requested permissions

Description

Check file permissions before accessing an open file. This hook is called by various operations that read or write files. A security module can use this hook to perform additional checking on these operations, e.g. to revalidate permissions on use to support privilege bracketing or policy changes. Notice that this hook is used when the actual read/write operations are performed, whereas the inode_security_ops hook is called when a file is opened (as well as many other operations). Although this hook can be used to revalidate permissions for various system call operations that read or write files, it does not address the revalidation of permissions for memory-mapped files. Security modules must handle this separately if they need such revalidation.

Return

Returns 0 if permission is granted.

intsecurity_file_alloc(structfile*file)¶: Allocate and init a file’s LSM blob

Parameters

structfile*file: the file

Description

Allocate and attach a security structure to the file->f_security field. The security field is initialized to NULL when the structure is first created.

Return

Return 0 if the hook is successful and permission is granted.

voidsecurity_file_release(structfile*file)¶: Perform actions before releasing the file ref

Parameters

structfile*file: the file

Description

Perform actions before releasing the last reference to a file.

voidsecurity_file_free(structfile*file)¶: Free a file’s LSM blob

Parameters

structfile*file: the file

Description

Deallocate and free any security structures stored in file->f_security.

intsecurity_mmap_file(structfile*file,unsignedlongprot,unsignedlongflags)¶: Check if mmap’ing a file is allowed

Parameters

structfile*file: file
unsignedlongprot: protection applied by the kernel
unsignedlongflags: flags

Description

Check permissions for a mmap operation. The file may be NULL, e.g. if mapping anonymous memory.

Return

Returns 0 if permission is granted.

intsecurity_mmap_addr(unsignedlongaddr)¶: Check if mmap’ing an address is allowed

Parameters

unsignedlongaddr: address

Description

Check permissions for a mmap operation at addr.

Return

Returns 0 if permission is granted.

intsecurity_file_mprotect(structvm_area_struct*vma,unsignedlongreqprot,unsignedlongprot)¶: Check if changing memory protections is allowed

Parameters

structvm_area_struct*vma: memory region
unsignedlongreqprot: application requested protection
unsignedlongprot: protection applied by the kernel

Description

Check permissions before changing memory access permissions.

Return

Returns 0 if permission is granted.

intsecurity_file_lock(structfile*file,unsignedintcmd)¶: Check if a file lock is allowed

Parameters

structfile*file: file
unsignedintcmd: lock operation (e.g. F_RDLCK, F_WRLCK)

Description

Check permission before performing file locking operations. Note the hook mediates both flock and fcntl style locks.

Return

Returns 0 if permission is granted.

intsecurity_file_fcntl(structfile*file,unsignedintcmd,unsignedlongarg)¶: Check if fcntl() op is allowed

Parameters

structfile*file: file
unsignedintcmd: fcntl command
unsignedlongarg: command argument

Description

Check permission before allowing the file operation specified by cmd to be performed on the file file. Note that arg sometimes represents a user space pointer; in other cases, it may be a simple integer value. When arg represents a user space pointer, it should never be used by the security module.

Return

Returns 0 if permission is granted.

voidsecurity_file_set_fowner(structfile*file)¶: Set the file owner info in the LSM blob

Parameters

structfile*file: the file

Description

Save owner security information (typically from current->security) in file->f_security for later use by the send_sigiotask hook.

This hook is called with file->f_owner.lock held.

Return

Returns 0 on success.

intsecurity_file_send_sigiotask(structtask_struct*tsk,structfown_struct*fown,intsig)¶: Check if sending SIGIO/SIGURG is allowed

Parameters

structtask_struct*tsk: target task
structfown_struct*fown: signal sender
intsig: signal to be sent, SIGIO is sent if 0

Description

Check permission for the file owner fown to send SIGIO or SIGURG to the process tsk. Note that this hook is sometimes called from interrupt context. Note that the fown_struct, fown, is never outside the context of a struct file, so the file structure (and associated security information) can always be obtained: container_of(fown, struct file, f_owner).

Return

Returns 0 if permission is granted.
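
The container_of() expression mentioned for security_file_send_sigiotask() recovers the enclosing structure from a pointer to one of its members. The userspace sketch below shows the pointer arithmetic with stand-in types: demo_container_of, struct demo_file, struct demo_fown and demo_file_from_fown() are hypothetical names, and the sketch assumes, as the description does, that the owner structure is embedded in the file structure rather than referenced through a pointer.

```c
#include <stddef.h>	/* offsetof */

/* Userspace stand-in for the kernel's container_of() macro. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_fown {
	int signum;
};

struct demo_file {
	int security_label;		/* stands in for the file's LSM state */
	struct demo_fown f_owner;	/* embedded member, not a pointer */
};

/* Recover the enclosing demo_file from a pointer to its f_owner member. */
struct demo_file *demo_file_from_fown(struct demo_fown *fown)
{
	return demo_container_of(fown, struct demo_file, f_owner);
}
```

The technique is only valid when the member really lives inside the outer structure; applied to a separately allocated object it would compute a bogus address.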

intsecurity_file_receive(structfile*file)¶: Check if receiving a file via IPC is allowed

Parameters

structfile*file: file being received

Description

This hook allows security modules to control the ability of a process to receive an open file descriptor via socket IPC.

Return

Returns 0 if permission is granted.

intsecurity_file_open(structfile*file)¶: Save open() time state for later use by the LSM

Parameters

structfile*file

Description

Save open-time permission checking state for later use upon file_permission, and recheck access if anything has changed since inode_permission.

We can check if a file is opened for execution (e.g. execve(2) call), either directly or indirectly (e.g. ELF’s ld.so) by checking file->f_flags & __FMODE_EXEC.

Return

Returns 0 if permission is granted.
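
The execution check described for security_file_open() is a plain bit test on the file flags. The userspace sketch below shows the shape of that test. DEMO_FMODE_EXEC and opened_for_exec() are hypothetical names, and the bit value is a placeholder chosen for illustration; kernel code should use __FMODE_EXEC from the kernel headers.

```c
/* Placeholder bit for illustration only; not taken from kernel headers. */
#define DEMO_FMODE_EXEC 0x20u

/* Return 1 if the flags say the file was opened for execution, else 0. */
int opened_for_exec(unsigned int f_flags)
{
	return (f_flags & DEMO_FMODE_EXEC) != 0;
}
```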

intsecurity_file_truncate(structfile*file)¶: Check if truncating a file is allowed

Parameters

structfile*file: file

Description

Check permission before truncating a file, i.e. using ftruncate. Note that truncation permission may also be checked based on the path, using the path_truncate hook.

Return

Returns 0 if permission is granted.

intsecurity_task_alloc(structtask_struct*task,u64clone_flags)¶: Allocate a task’s LSM blob

Parameters

structtask_struct*task: the task
u64clone_flags: flags indicating what is being shared

Description

Handle allocation of task-related resources.

Return

Returns a zero on success, negative values on failure.

voidsecurity_task_free(structtask_struct*task)¶: Free a task’s LSM blob and related resources

Parameters

structtask_struct*task: task

Description

Handle release of task-related resources. Note that this can be called from interrupt context.

intsecurity_cred_alloc_blank(structcred*cred,gfp_tgfp)¶: Allocate the min memory to allow cred_transfer

Parameters

structcred*cred: credentials
gfp_tgfp: gfp flags

Description

Only allocate sufficient memory and attach to cred such that cred_transfer() will not get ENOMEM.

Return

Returns 0 on success, negative values on failure.

voidsecurity_cred_free(structcred*cred)¶: Free the cred’s LSM blob and associated resources

Parameters

structcred*cred: credentials

Description

Deallocate and clear the cred->security field in a set of credentials.

intsecurity_prepare_creds(structcred*new,conststructcred*old,gfp_tgfp)¶: Prepare a new set of credentials

Parameters

structcred*new: new credentials
conststructcred*old: original credentials
gfp_tgfp: gfp flags

Description

Prepare a new set of credentials by copying the data from the old set.

Return

Returns 0 on success, negative values on failure.

voidsecurity_transfer_creds(structcred*new,conststructcred*old)¶: Transfer creds

Parameters

structcred*new: target credentials
conststructcred*old: original credentials

Description

Transfer data from original creds to new creds.

intsecurity_kernel_act_as(structcred*new,u32secid)¶: Set the kernel credentials to act as secid

Parameters

structcred*new: credentials
u32secid: secid

Description

Set the credentials for a kernel service to act as (subjective context). The current task must be the one that nominated secid.

Return

Returns 0 if successful.

intsecurity_kernel_create_files_as(structcred*new,structinode*inode)¶: Set file creation context using an inode

Parameters

structcred*new: target credentials
structinode*inode: reference inode

Description

Set the file creation context in a set of credentials to be the same as the objective context of the specified inode. The current task must be the one that nominated inode.

Return

Returns 0 if successful.

intsecurity_kernel_module_request(char*kmod_name)¶: Check if loading a module is allowed

Parameters

char*kmod_name: module name

Description

Check the ability to trigger the kernel to automatically upcall to userspace so that userspace loads a kernel module with the given name.

Return

Returns 0 if successful.

intsecurity_task_fix_setuid(structcred*new,conststructcred*old,intflags)¶: Update LSM with new user id attributes

Parameters

structcred*new: updated credentials
conststructcred*old: credentials being replaced
intflags: LSM_SETID_* flag values

Description

Update the module’s state after setting one or more of the user identity attributes of the current process. The flags parameter indicates which of the set*uid system calls invoked this hook. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.

Return

Returns 0 on success.

intsecurity_task_fix_setgid(structcred*new,conststructcred*old,intflags)¶: Update LSM with new group id attributes

Parameters

structcred*new: updated credentials
conststructcred*old: credentials being replaced
intflags: LSM_SETID_* flag value

Description

Update the module’s state after setting one or more of the group identity attributes of the current process. The flags parameter indicates which of the set*gid system calls invoked this hook. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.

Return

Returns 0 on success.

intsecurity_task_fix_setgroups(structcred*new,conststructcred*old)¶: Update LSM with new supplementary groups

Parameters

structcred*new: updated credentials
conststructcred*old: credentials being replaced

Description

Update the module’s state after setting the supplementary group identity attributes of the current process. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.

Return

Returns 0 on success.

intsecurity_task_setpgid(structtask_struct*p,pid_tpgid)¶: Check if setting the pgid is allowed

Parameters

structtask_struct*p: task being modified
pid_tpgid: new pgid

Description

Check permission before setting the process group identifier of the process p to pgid.

Return

Returns 0 if permission is granted.

intsecurity_task_getpgid(structtask_struct*p)¶: Check if getting the pgid is allowed

Parameters

structtask_struct*p: task

Description

Check permission before getting the process group identifier of the process p.

Return

Returns 0 if permission is granted.

intsecurity_task_getsid(structtask_struct*p)¶: Check if getting the session id is allowed

Parameters

structtask_struct*p: task

Description

Check permission before getting the session identifier of the process p.

Return

Returns 0 if permission is granted.

intsecurity_task_setnice(structtask_struct*p,intnice)¶: Check if setting a task’s nice value is allowed

Parameters

structtask_struct*p: target task
intnice: nice value

Description

Check permission before setting the nice value of p to nice.

Return

Returns 0 if permission is granted.

intsecurity_task_setioprio(structtask_struct*p,intioprio)¶: Check if setting a task’s ioprio is allowed

Parameters

structtask_struct*p: target task
intioprio: ioprio value

Description

Check permission before setting the ioprio value of p to ioprio.

Return

Returns 0 if permission is granted.

intsecurity_task_getioprio(structtask_struct*p)¶: Check if getting a task’s ioprio is allowed

Parameters

structtask_struct*p: task

Description

Check permission before getting the ioprio value of p.

Return

Returns 0 if permission is granted.

intsecurity_task_prlimit(conststructcred*cred,conststructcred*tcred,unsignedintflags)¶: Check if get/setting resources limits is allowed

Parameters

conststructcred*cred: current task credentials
conststructcred*tcred: target task credentials
unsignedintflags: LSM_PRLIMIT_* flag bits indicating a get/set/both

Description

Check permission before getting and/or setting the resource limits of another task.

Return

Returns 0 if permission is granted.

intsecurity_task_setrlimit(structtask_struct*p,unsignedintresource,structrlimit*new_rlim)¶: Check if setting a new rlimit value is allowed

Parameters

structtask_struct*p: target task’s group leader
unsignedintresource: resource whose limit is being set
structrlimit*new_rlim: new resource limit

Description

Check permission before setting the resource limits of process p for resource to new_rlim. The old resource limit values can be examined by dereferencing (p->signal->rlim + resource).

Return

Returns 0 if permission is granted.
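
The expression (p->signal->rlim + resource) in the description of security_task_setrlimit() indexes a per-process array of limits. The userspace sketch below models that lookup and one comparison a policy might make. struct demo_signal, demo_old_rlimit() and demo_raises_hard_limit() are hypothetical names, and the "deny raising the hard limit" style of check is an assumption used only for illustration; struct rlimit and RLIM_NLIMITS come from the userspace header <sys/resource.h>.

```c
#include <stddef.h>
#include <sys/resource.h>	/* struct rlimit, RLIM_NLIMITS */

/* Stand-in for the part of the signal structure that holds the limits. */
struct demo_signal {
	struct rlimit rlim[RLIM_NLIMITS];
};

/* Equivalent of dereferencing (p->signal->rlim + resource), bounds checked. */
const struct rlimit *demo_old_rlimit(const struct demo_signal *sig,
				     unsigned int resource)
{
	if (resource >= RLIM_NLIMITS)
		return NULL;
	return sig->rlim + resource;
}

/* Return 1 if new_rlim would raise the hard limit above the old value. */
int demo_raises_hard_limit(const struct demo_signal *sig,
			   unsigned int resource,
			   const struct rlimit *new_rlim)
{
	const struct rlimit *old = demo_old_rlimit(sig, resource);

	return old != NULL && new_rlim->rlim_max > old->rlim_max;
}
```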

intsecurity_task_setscheduler(structtask_struct*p)¶: Check if setting sched policy/param is allowed

Parameters

structtask_struct*p: target task

Description

Check permission before setting scheduling policy and/or parameters of process p.

Return

Returns 0 if permission is granted.

int security_task_getscheduler(struct task_struct *p): Check if getting scheduling info is allowed

Parameters

struct task_struct *p: target task

Description

Check permission before obtaining scheduling information for process p.

Return

Returns 0 if permission is granted.

int security_task_movememory(struct task_struct *p): Check if moving memory is allowed

Parameters

struct task_struct *p: task

Description

Check permission before moving memory owned by process p.

Return

Returns 0 if permission is granted.

int security_task_kill(struct task_struct *p, struct kernel_siginfo *info, int sig, const struct cred *cred): Check if sending a signal is allowed

Parameters

struct task_struct *p: target process
struct kernel_siginfo *info: signal information
int sig: signal value
const struct cred *cred: credentials of the signal sender, NULL if current

Description

Check permission before sending signal sig to p. info can be NULL, the constant 1, or a pointer to a kernel_siginfo structure. If info is 1 or SI_FROMKERNEL(info) is true, then the signal should be viewed as coming from the kernel and should typically be permitted. SIGIO signals are handled separately by the send_sigiotask hook in file_security_ops.

Return

Returns 0 if permission is granted.
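
The three possible forms of info can be sketched in user space as follows. The struct, the two macros and the policy (deny user-originated SIGKILL) are hypothetical stand-ins chosen for the example. In particular, treating a positive si_code as "from the kernel" is an assumption about what SI_FROMKERNEL() tests, and the handling of a NULL info is a choice of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kernel_siginfo; only si_code matters. */
struct model_siginfo { int si_code; };

/* Assumed to mirror the special "constant 1" value of info. */
#define MODEL_SEND_SIG_PRIV ((struct model_siginfo *)1)
/* Assumed to mirror SI_FROMKERNEL(): positive si_code marks a kernel origin. */
#define MODEL_SI_FROMKERNEL(info) ((info)->si_code > 0)

/* Returns 0 if the signal may be sent, -EPERM otherwise. */
int model_task_kill(const struct model_siginfo *info, int sig)
{
    assert(sig >= 0); /* signal numbers are non-negative */

    /* The constant must be tested before info is ever dereferenced. */
    if (info == MODEL_SEND_SIG_PRIV)
        return 0;
    if (info != NULL && MODEL_SI_FROMKERNEL(info))
        return 0;

    /* Hypothetical policy for everything else: deny SIGKILL only. */
    return sig == SIGKILL ? -EPERM : 0;
}
```

The ordering of the checks is the point of the example: a hook that dereferences info first would fault on the constant 1.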

int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5): Check if a prctl op is allowed

Parameters

int option: operation
unsigned long arg2: argument
unsigned long arg3: argument
unsigned long arg4: argument
unsigned long arg5: argument

Description

Check permission before performing a process control operation on the current process.

Return

Returns -ENOSYS if no-one wanted to handle this op; any other value causes prctl() to return immediately with that value.
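
The return convention can be modelled in user space with a list of handlers. This is a sketch under assumptions: the hook type, the two example handlers and the way results from several modules are combined are invented for illustration, and only the -ENOSYS contract itself comes from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook type: one handler per security module. */
typedef int (*model_prctl_hook)(int option, unsigned long arg2);

/*
 * Walk the hooks in order. -ENOSYS means "not handled by this module";
 * a nonzero result from a module that did handle the op ends the walk.
 */
int model_task_prctl(const model_prctl_hook *hooks, size_t n,
                     int option, unsigned long arg2)
{
    int rc = -ENOSYS;
    size_t i;

    assert(hooks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int thisrc = hooks[i](option, arg2);

        if (thisrc != -ENOSYS) {
            rc = thisrc;
            if (thisrc != 0)
                break;
        }
    }
    return rc;
}

/* Hypothetical module that ignores every op. */
int model_hook_ignore(int option, unsigned long arg2)
{
    (void)option;
    (void)arg2;
    return -ENOSYS;
}

/* Hypothetical module that handles only option 42 and answers 7. */
int model_hook_handle_42(int option, unsigned long arg2)
{
    (void)arg2;
    return option == 42 ? 7 : -ENOSYS;
}
```

A caller that receives -ENOSYS carries on with its own handling of the option; any other value is handed straight back to the prctl() caller.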

void security_task_to_inode(struct task_struct *p, struct inode *inode): Set the security attributes of a task’s inode

Parameters

struct task_struct *p: task
struct inode *inode: inode

Description

Set the security attributes for an inode based on an associated task’s security attributes, e.g. for /proc/pid inodes.

int security_create_user_ns(const struct cred *cred): Check if creating a new userns is allowed

Parameters

const struct cred *cred: prepared creds

Description

Check permission prior to creating a new user namespace.

Return

Returns 0 if successful, otherwise a negative error code.

int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag): Check if sysv ipc access is allowed

Parameters

struct kern_ipc_perm *ipcp: ipc permission structure
short flag: requested permissions

Description

Check permissions for access to IPC.

Return

Returns 0 if permission is granted.

void security_ipc_getlsmprop(struct kern_ipc_perm *ipcp, struct lsm_prop *prop): Get the sysv ipc object LSM data

Parameters

struct kern_ipc_perm *ipcp: ipc permission structure
struct lsm_prop *prop: pointer to lsm information

Description

Get the lsm information associated with the ipc object.

int security_msg_msg_alloc(struct msg_msg *msg): Allocate a sysv ipc message LSM blob

Parameters

struct msg_msg *msg: message structure

Description

Allocate and attach a security structure to the msg->security field. The security field is initialized to NULL when the structure is first created.

Return

Return 0 if operation was successful and permission is granted.

void security_msg_msg_free(struct msg_msg *msg): Free a sysv ipc message LSM blob

Parameters

struct msg_msg *msg: message structure

Description

Deallocate the security structure for this message.

int security_msg_queue_alloc(struct kern_ipc_perm *msq): Allocate a sysv ipc msg queue LSM blob

Parameters

struct kern_ipc_perm *msq: sysv ipc permission structure

Description

Allocate and attach a security structure to msq. The security field is initialized to NULL when the structure is first created.

Return

Returns 0 if operation was successful and permission is granted.

void security_msg_queue_free(struct kern_ipc_perm *msq): Free a sysv ipc msg queue LSM blob

Parameters

struct kern_ipc_perm *msq: sysv ipc permission structure

Description

Deallocate the security field msq->security for the message queue.

int security_msg_queue_associate(struct kern_ipc_perm *msq, int msqflg): Check if a msg queue operation is allowed

Parameters

struct kern_ipc_perm *msq: sysv ipc permission structure
int msqflg: operation flags

Description

Check permission when a message queue is requested through the msgget system call. This hook is only called when returning the message queue identifier for an existing message queue, not when a new message queue is created.

Return

Return 0 if permission is granted.

int security_msg_queue_msgctl(struct kern_ipc_perm *msq, int cmd): Check if a msg queue operation is allowed

Parameters

struct kern_ipc_perm *msq: sysv ipc permission structure
int cmd: operation

Description

Check permission when a message control operation specified by cmd is to be performed on the message queue with permissions in msq.

Return

Returns 0 if permission is granted.

int security_msg_queue_msgsnd(struct kern_ipc_perm *msq, struct msg_msg *msg, int msqflg): Check if sending a sysv ipc message is allowed

Parameters

struct kern_ipc_perm *msq: sysv ipc permission structure
struct msg_msg *msg: message
int msqflg: operation flags

Description

Check permission before a message, msg, is enqueued on the message queue with permissions specified in msq.

Return

Returns 0 if permission is granted.

int security_msg_queue_msgrcv(struct kern_ipc_perm *msq, struct msg_msg *msg, struct task_struct *target, long type, int mode): Check if receiving a sysv ipc msg is allowed

Parameters

struct kern_ipc_perm *msq: sysv ipc permission structure
struct msg_msg *msg: message
struct task_struct *target: target task
long type: type of message requested
int mode: operation flags

Description

Check permission before a message, msg, is removed from the message queue. The target task structure contains a pointer to the process that will be receiving the message (not equal to the current process when inline receives are being performed).

Return

Returns 0 if permission is granted.

int security_shm_alloc(struct kern_ipc_perm *shp): Allocate a sysv shm LSM blob

Parameters

struct kern_ipc_perm *shp: sysv ipc permission structure

Description

Allocate and attach a security structure to the shp security field. The security field is initialized to NULL when the structure is first created.

Return

Returns 0 if operation was successful and permission is granted.

void security_shm_free(struct kern_ipc_perm *shp): Free a sysv shm LSM blob

Parameters

struct kern_ipc_perm *shp: sysv ipc permission structure

Description

Deallocate the security structure shp->security for the memory segment.

int security_shm_associate(struct kern_ipc_perm *shp, int shmflg): Check if a sysv shm operation is allowed

Parameters

struct kern_ipc_perm *shp: sysv ipc permission structure
int shmflg: operation flags

Description

Check permission when a shared memory region is requested through the shmget system call. This hook is only called when returning the shared memory region identifier for an existing region, not when a new shared memory region is created.

Return

Returns 0 if permission is granted.

int security_shm_shmctl(struct kern_ipc_perm *shp, int cmd): Check if a sysv shm operation is allowed

Parameters

struct kern_ipc_perm *shp: sysv ipc permission structure
int cmd: operation

Description

Check permission when a shared memory control operation specified by cmd is to be performed on the shared memory region with permissions in shp.

Return

Return 0 if permission is granted.

int security_shm_shmat(struct kern_ipc_perm *shp, char __user *shmaddr, int shmflg): Check if a sysv shm attach operation is allowed

Parameters

struct kern_ipc_perm *shp: sysv ipc permission structure
char __user *shmaddr: address of memory region to attach
int shmflg: operation flags

Description

Check permissions prior to allowing the shmat system call to attach the shared memory segment with permissions shp to the data segment of the calling process. The attaching address is specified by shmaddr.

Return

Returns 0 if permission is granted.

int security_sem_alloc(struct kern_ipc_perm *sma): Allocate a sysv semaphore LSM blob

Parameters

struct kern_ipc_perm *sma: sysv ipc permission structure

Description

Allocate and attach a security structure to the sma security field. The security field is initialized to NULL when the structure is first created.

Return

Returns 0 if operation was successful and permission is granted.

void security_sem_free(struct kern_ipc_perm *sma): Free a sysv semaphore LSM blob

Parameters

struct kern_ipc_perm *sma: sysv ipc permission structure

Description

Deallocate the security structure sma->security for the semaphore.

int security_sem_associate(struct kern_ipc_perm *sma, int semflg): Check if a sysv semaphore operation is allowed

Parameters

struct kern_ipc_perm *sma: sysv ipc permission structure
int semflg: operation flags

Description

Check permission when a semaphore is requested through the semget system call. This hook is only called when returning the semaphore identifier for an existing semaphore, not when a new one must be created.

Return

Returns 0 if permission is granted.

int security_sem_semctl(struct kern_ipc_perm *sma, int cmd): Check if a sysv semaphore operation is allowed

Parameters

struct kern_ipc_perm *sma: sysv ipc permission structure
int cmd: operation

Description

Check permission when a semaphore operation specified by cmd is to be performed on the semaphore.

Return

Returns 0 if permission is granted.

int security_sem_semop(struct kern_ipc_perm *sma, struct sembuf *sops, unsigned nsops, int alter): Check if a sysv semaphore operation is allowed

Parameters

struct kern_ipc_perm *sma: sysv ipc permission structure
struct sembuf *sops: operations to perform
unsigned nsops: number of operations
int alter: flag indicating changes will be made

Description

Check permissions before performing operations on members of the semaphore set. If the alter flag is nonzero, the semaphore set may be modified.

Return

Returns 0 if permission is granted.

int security_getselfattr(unsigned int attr, struct lsm_ctx __user *uctx, u32 __user *size, u32 flags): Read an LSM attribute of the current process.

Parameters

unsigned int attr: which attribute to return
struct lsm_ctx __user *uctx: the user-space destination for the information, or NULL
u32 __user *size: pointer to the size of space available to receive the data
u32 flags: special handling options. LSM_FLAG_SINGLE indicates that only attributes associated with the LSM identified in the passed ctx be reported.

Description

A NULL value for uctx can be used to get both the number of attributes and the size of the data.

Returns the number of attributes found on success, negative value on error. size is reset to the total size of the data. If size is insufficient to contain the data, -E2BIG is returned.
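
The size negotiation described above follows a familiar two-call pattern: query with a NULL destination, then call again with a buffer of the reported size. The following user-space model is a sketch only. model_getselfattr(), the single fixed attribute and the plain char buffer (in place of struct lsm_ctx) are hypothetical simplifications.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute payload standing in for the LSM context data. */
const char model_attr[] = "model-label";

/*
 * buf == NULL only reports the size needed; otherwise the data is copied
 * when *size is large enough. *size is always reset to the total size of
 * the data. Returns the number of attributes (1 here) or -E2BIG.
 */
int model_getselfattr(char *buf, uint32_t *size)
{
    uint32_t avail;
    uint32_t total = (uint32_t)sizeof(model_attr);

    assert(size != NULL);
    avail = *size;
    *size = total;

    if (buf == NULL)
        return 1;
    if (avail < total)
        return -E2BIG;

    memcpy(buf, model_attr, total);
    return 1;
}
```

Because size is updated even on failure in this model, a caller that receives -E2BIG can retry immediately with a buffer of the reported size.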

int security_setselfattr(unsigned int attr, struct lsm_ctx __user *uctx, u32 size, u32 flags): Set an LSM attribute on the current process.

Parameters

unsigned int attr: which attribute to set
struct lsm_ctx __user *uctx: the user-space source for the information
u32 size: the size of the data
u32 flags: reserved for future use, must be 0

Description

Set an LSM attribute for the current process. The LSM, attribute and new value are included in uctx.

Returns 0 on success, -EINVAL if the input is inconsistent, -EFAULT if the user buffer is inaccessible, -E2BIG if size is too big, or an LSM specific failure.

int security_getprocattr(struct task_struct *p, int lsmid, const char *name, char **value): Read an attribute for a task

Parameters

struct task_struct *p: the task
int lsmid: LSM identification
const char *name: attribute name
char **value: attribute value

Description

Read attribute name for task p and store it into value if allowed.

Return

Returns the length of value on success, a negative value otherwise.

int security_setprocattr(int lsmid, const char *name, void *value, size_t size): Set an attribute for a task

Parameters

int lsmid: LSM identification
const char *name: attribute name
void *value: attribute value
size_t size: attribute value size

Description

Write (set) the current task’s attribute name to value, of size size, if allowed.

Return

Returns bytes written on success, a negative value otherwise.

int security_post_notification(const struct cred *w_cred, const struct cred *cred, struct watch_notification *n): Check if a watch notification can be posted

Parameters

const struct cred *w_cred: credentials of the task that set the watch
const struct cred *cred: credentials of the task which triggered the watch
struct watch_notification *n: the notification

Description

Check to see if a watch notification can be posted to a particular queue.

Return

Returns 0 if permission is granted.

int security_watch_key(struct key *key): Check if a task is allowed to watch for key events

Parameters

struct key *key: the key to watch

Description

Check to see if a process is allowed to watch for event notifications from a key or keyring.

Return

Returns 0 if permission is granted.

int security_netlink_send(struct sock *sk, struct sk_buff *skb): Save info and check if netlink sending is allowed

Parameters

struct sock *sk: sending socket
struct sk_buff *skb: netlink message

Description

Save security information for a netlink message so that permission checking can be performed when the message is processed. The security information can be saved using the eff_cap field of the netlink_skb_parms structure. This hook may also be used to provide fine-grained control over message transmission.

Return

Returns 0 if the information was successfully saved and the message is allowed to be transmitted.

int security_socket_create(int family, int type, int protocol, int kern): Check if creating a new socket is allowed

Parameters

int family: protocol family
int type: communications type
int protocol: requested protocol
int kern: set to 1 if a kernel socket is requested

Description

Check permissions prior to creating a new socket.

Return

Returns 0 if permission is granted.

int security_socket_post_create(struct socket *sock, int family, int type, int protocol, int kern): Initialize a newly created socket

Parameters

struct socket *sock: socket
int family: protocol family
int type: communications type
int protocol: requested protocol
int kern: set to 1 if a kernel socket is requested

Description

This hook allows a module to update or allocate a per-socket security structure. Note that the security field was not added directly to the socket structure, but rather, the socket security information is stored in the associated inode. Typically, the inode alloc_security hook will allocate and attach security information to SOCK_INODE(sock)->i_security. This hook may be used to update the SOCK_INODE(sock)->i_security field with additional information that wasn’t available when the inode was allocated.

Return

Returns 0 if permission is granted.

int security_socket_bind(struct socket *sock, struct sockaddr *address, int addrlen): Check if a socket bind operation is allowed

Parameters

struct socket *sock: socket
struct sockaddr *address: requested bind address
int addrlen: length of address

Description

Check permission before socket protocol layer bind operation is performed and the socket sock is bound to the address specified in the address parameter.

Return

Returns 0 if permission is granted.

int security_socket_connect(struct socket *sock, struct sockaddr *address, int addrlen): Check if a socket connect operation is allowed

Parameters

struct socket *sock: socket
struct sockaddr *address: address of remote connection point
int addrlen: length of address

Description

Check permission before socket protocol layer connect operation attempts to connect socket sock to a remote address, address.

Return

Returns 0 if permission is granted.

int security_socket_listen(struct socket *sock, int backlog): Check if a socket is allowed to listen

Parameters

struct socket *sock: socket
int backlog: connection queue size

Description

Check permission before socket protocol layer listen operation.

Return

Returns 0 if permission is granted.

int security_socket_accept(struct socket *sock, struct socket *newsock): Check if a socket is allowed to accept connections

Parameters

struct socket *sock: listening socket
struct socket *newsock: newly created connection socket

Description

Check permission before accepting a new connection. Note that the new socket, newsock, has been created and some information copied to it, but the accept operation has not actually been performed.

Return

Returns 0 if permission is granted.

int security_socket_sendmsg(struct socket *sock, struct msghdr *msg, int size): Check if sending a message is allowed

Parameters

struct socket *sock: sending socket
struct msghdr *msg: message to send
int size: size of message

Description

Check permission before transmitting a message to another socket.

Return

Returns 0 if permission is granted.

int security_socket_recvmsg(struct socket *sock, struct msghdr *msg, int size, int flags): Check if receiving a message is allowed

Parameters

struct socket *sock: receiving socket
struct msghdr *msg: message to receive
int size: size of message
int flags: operational flags

Description

Check permission before receiving a message from a socket.

Return

Returns 0 if permission is granted.

int security_socket_getsockname(struct socket *sock): Check if reading the socket addr is allowed

Parameters

struct socket *sock: socket

Description

Check permission before reading the local address (name) of the socket object.

Return

Returns 0 if permission is granted.

int security_socket_getpeername(struct socket *sock): Check if reading the peer’s addr is allowed

Parameters

struct socket *sock: socket

Description

Check permission before reading the remote address (name) of a socket object.

Return

Returns 0 if permission is granted.

int security_socket_getsockopt(struct socket *sock, int level, int optname): Check if reading a socket option is allowed

Parameters

struct socket *sock: socket
int level: option’s protocol level
int optname: option name

Description

Check permissions before retrieving the options associated with socket sock.

Return

Returns 0 if permission is granted.

int security_socket_setsockopt(struct socket *sock, int level, int optname): Check if setting a socket option is allowed

Parameters

struct socket *sock: socket
int level: option’s protocol level
int optname: option name

Description

Check permissions before setting the options associated with socket sock.

Return

Returns 0 if permission is granted.

int security_socket_shutdown(struct socket *sock, int how): Checks if shutting down the socket is allowed

Parameters

struct socket *sock: socket
int how: flag indicating how sends and receives are handled

Description

Checks permission before all or part of a connection on the socket sock is shut down.

Return

Returns 0 if permission is granted.

int security_socket_getpeersec_stream(struct socket *sock, sockptr_t optval, sockptr_t optlen, unsigned int len): Get the remote peer label

Parameters

struct socket *sock: socket
sockptr_t optval: destination buffer
sockptr_t optlen: size of peer label copied into the buffer
unsigned int len: maximum size of the destination buffer

Description

This hook allows the security module to provide peer socket security state for unix or connected tcp sockets to userspace via getsockopt SO_GETPEERSEC. For tcp sockets this can be meaningful if the socket is associated with an ipsec SA.

Return

Returns 0 if all is well, otherwise, typical getsockopt return values.

int lsm_sock_alloc(struct sock *sock, gfp_t gfp): allocate a composite sock blob

Parameters

struct sock *sock: the sock that needs a blob
gfp_t gfp: allocation mode

Description

Allocate the sock blob for all the modules.

Returns 0, or -ENOMEM if memory can’t be allocated.

int security_sk_alloc(struct sock *sk, int family, gfp_t priority): Allocate and initialize a sock’s LSM blob

Parameters

struct sock *sk: sock
int family: protocol family
gfp_t priority: gfp flags

Description

Allocate and attach a security structure to the sk->sk_security field, which is used to copy security attributes between local stream sockets.

Return

Returns 0 on success, error on failure.

void security_sk_free(struct sock *sk): Free the sock’s LSM blob

Parameters

struct sock *sk: sock

Description

Deallocate security structure.

void security_inet_csk_clone(struct sock *newsk, const struct request_sock *req): Set new sock LSM state based on request_sock

Parameters

struct sock *newsk: new sock
const struct request_sock *req: connection request_sock

Description

Set the LSM state of newsk using the LSM state from req.

int security_mptcp_add_subflow(struct sock *sk, struct sock *ssk): Inherit the LSM label from the MPTCP socket

Parameters

struct sock *sk: the owning MPTCP socket
struct sock *ssk: the new subflow

Description

Update the labeling for the given MPTCP subflow, to match the one of the owning MPTCP socket. This hook has to be called after the socket creation and initialization via the security_socket_create() and security_socket_post_create() LSM hooks.

Return

Returns 0 on success or a negative error code on failure.

int security_xfrm_policy_clone(struct xfrm_sec_ctx *old_ctx, struct xfrm_sec_ctx **new_ctxp): Clone xfrm policy LSM state

Parameters

struct xfrm_sec_ctx *old_ctx: xfrm security context
struct xfrm_sec_ctx **new_ctxp: target xfrm security context

Description

Allocate a security structure in new_ctxp that contains the information from the old_ctx structure.

Return

Return 0 if operation was successful.

int security_xfrm_policy_delete(struct xfrm_sec_ctx *ctx): Check if deleting a xfrm policy is allowed

Parameters

struct xfrm_sec_ctx *ctx: xfrm security context

Description

Authorize deletion of a SPD entry.

Return

Returns 0 if permission is granted.

int security_xfrm_state_alloc_acquire(struct xfrm_state *x, struct xfrm_sec_ctx *polsec, u32 secid): Allocate a xfrm state LSM blob

Parameters

struct xfrm_state *x: xfrm state being added to the SAD
struct xfrm_sec_ctx *polsec: associated policy’s security context
u32 secid: secid from the flow

Description

Allocate a security structure to the x->security field; the security field is initialized to NULL when the xfrm_state is allocated. Set the context to correspond to secid.

Return

Returns 0 if operation was successful.

void security_xfrm_state_free(struct xfrm_state *x): Free a xfrm state

Parameters

struct xfrm_state *x: xfrm state

Description

Deallocate x->security.

int security_xfrm_policy_lookup(struct xfrm_sec_ctx *ctx, u32 fl_secid): Check if using a xfrm policy is allowed

Parameters

struct xfrm_sec_ctx *ctx: target xfrm security context
u32 fl_secid: flow secid used to authorize access

Description

Check permission when a flow selects a xfrm_policy for processing XFRMs on a packet. The hook is called when selecting either a per-socket policy or a generic xfrm policy.

Return

Return 0 if permission is granted, -ESRCH otherwise, or -errno on other errors.

int security_xfrm_state_pol_flow_match(struct xfrm_state *x, struct xfrm_policy *xp, const struct flowi_common *flic): Check for a xfrm match

Parameters

struct xfrm_state *x: xfrm state to match
struct xfrm_policy *xp: xfrm policy to check for a match
const struct flowi_common *flic: flow to check for a match.

Description

Check xp and flic for a match with x.

Return

Returns 1 if there is a match.

int security_xfrm_decode_session(struct sk_buff *skb, u32 *secid): Determine the xfrm secid for a packet

Parameters

struct sk_buff *skb: xfrm packet
u32 *secid: secid

Description

Decode the packet in skb and return the security label in secid.

Return

Return 0 if all xfrms used have the same secid.

int security_key_alloc(struct key *key, const struct cred *cred, unsigned long flags): Allocate and initialize a kernel key LSM blob

Parameters

struct key *key: key
const struct cred *cred: credentials
unsigned long flags: allocation flags

Description

Permit allocation of a key and assign security data. Note that key does not have a serial number assigned at this point.

Return

Return 0 if permission is granted, a negative error code otherwise.

void security_key_free(struct key *key): Free a kernel key LSM blob

Parameters

struct key *key: key

Description

Notification of destruction; free security data.

int security_key_permission(key_ref_t key_ref, const struct cred *cred, enum key_need_perm need_perm): Check if a kernel key operation is allowed

Parameters

key_ref_t key_ref: key reference
const struct cred *cred: credentials of actor requesting access
enum key_need_perm need_perm: requested permissions

Description

See whether a specific operational right is granted to a process on a key.

Return

Return 0 if permission is granted, a negative error code otherwise.

int security_key_getsecurity(struct key *key, char **buffer): Get the key’s security label

Parameters

struct key *key: key
char **buffer: security label buffer

Description

Get a textual representation of the security context attached to a key for the purposes of honouring KEYCTL_GETSECURITY. This function allocates the storage for the NUL-terminated string and the caller should free it.

Return

Returns the length of buffer (including terminating NUL) or a negative value if an error occurs. May also return 0 (and a NULL buffer pointer) if there is no security label assigned to the key.
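
The ownership and return-value rules above can be modelled in user space as follows. This is a sketch under assumptions: struct model_key and model_key_getsecurity() are hypothetical, and malloc()/free() stand in for whatever allocator the kernel side uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key with an optional textual label. */
struct model_key { const char *label; };

/*
 * Returns the length of *buffer including the terminating NUL, 0 with
 * *buffer == NULL when the key has no label, or -ENOMEM if the copy
 * cannot be allocated. The caller frees *buffer.
 */
int model_key_getsecurity(const struct model_key *key, char **buffer)
{
    size_t len;

    assert(key != NULL && buffer != NULL);
    *buffer = NULL;
    if (key->label == NULL)
        return 0;

    len = strlen(key->label) + 1; /* count the terminating NUL */
    *buffer = malloc(len);
    if (*buffer == NULL)
        return -ENOMEM;

    memcpy(*buffer, key->label, len);
    return (int)len;
}
```

A caller has three cases to tell apart: a positive length with an allocated string to free, zero with a NULL pointer and nothing to free, and a negative error.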

void security_key_post_create_or_update(struct key *keyring, struct key *key, const void *payload, size_t payload_len, unsigned long flags, bool create): Notification of key create or update

Parameters

struct key *keyring: keyring to which the key is linked
struct key *key: created or updated key
const void *payload: data used to instantiate or update the key
size_t payload_len: length of payload
unsigned long flags: key flags
bool create: flag indicating whether the key was created or updated

Description

Notify the caller of a key creation or update.

int security_audit_rule_init(u32 field, u32 op, char *rulestr, void **lsmrule, gfp_t gfp): Allocate and init an LSM audit rule struct

Parameters

u32 field: audit action
u32 op: rule operator
char *rulestr: rule context
void **lsmrule: receive buffer for audit rule struct
gfp_t gfp: GFP flag used for kmalloc

Description

Allocate and initialize an LSM audit rule structure.

Return

Return 0 if lsmrule has been successfully set, -EINVAL in case of an invalid rule.

int security_audit_rule_known(struct audit_krule *krule): Check if an audit rule contains LSM fields

Parameters

struct audit_krule *krule: audit rule

Description

Specifies whether the given krule contains any fields related to the current LSM.

Return

Returns 1 in case of relation found, 0 otherwise.

void security_audit_rule_free(void *lsmrule): Free an LSM audit rule struct

Parameters

void *lsmrule: audit rule struct

Description

Deallocate the LSM audit rule structure previously allocated by audit_rule_init().

int security_audit_rule_match(struct lsm_prop *prop, u32 field, u32 op, void *lsmrule): Check if a label matches an audit rule

Parameters

struct lsm_prop *prop: security label
u32 field: LSM audit field
u32 op: matching operator
void *lsmrule: audit rule

Description

Determine if the security label in prop matches a rule previously approved by security_audit_rule_known().

Return

Returns 1 if the label matches the rule, 0 if it does not, -ERRNO on failure.

int security_bpf(int cmd, union bpf_attr *attr, unsigned int size, bool kernel): Check if the bpf syscall operation is allowed

Parameters

int cmd: command
union bpf_attr *attr: bpf attribute
unsigned int size: size
bool kernel: whether or not call originated from kernel

Description

Do an initial check for all bpf syscalls after the attribute is copied into the kernel. Each security module can implement its own rules to check the specific cmd it needs.

Return

Returns 0 if permission is granted.

int security_bpf_map(struct bpf_map *map, fmode_t fmode): Check if access to a bpf map is allowed

Parameters

struct bpf_map *map: bpf map
fmode_t fmode: mode

Description

Do a check when the kernel generates and returns a file descriptor for eBPF maps.

Return

Returns 0 if permission is granted.

int security_bpf_prog(struct bpf_prog *prog): Check if access to a bpf program is allowed

Parameters

struct bpf_prog *prog: bpf program

Description

Do a check when the kernel generates and returns a file descriptor for eBPF programs.

Return

Returns 0 if permission is granted.

int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr, struct bpf_token *token, bool kernel): Check if BPF map creation is allowed

Parameters

struct bpf_map *map: BPF map object
union bpf_attr *attr: BPF syscall attributes used to create BPF map
struct bpf_token *token: BPF token used to grant user access
bool kernel: whether or not call originated from kernel

Description

Do a check when the kernel creates a new BPF map. This is also the point where the LSM blob is allocated for LSMs that need one.

Return

Returns 0 on success, error on failure.

int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr, struct bpf_token *token, bool kernel): Check if loading of BPF program is allowed

Parameters

struct bpf_prog *prog: BPF program object
union bpf_attr *attr: BPF syscall attributes used to create BPF program
struct bpf_token *token: BPF token used to grant user access to BPF subsystem
bool kernel: whether or not call originated from kernel

Description

Perform an access control check when the kernel loads a BPF program and allocates the associated BPF program object. This hook is also responsible for allocating any required LSM state for the BPF program.

Return

Returns 0 on success, error on failure.

int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr, const struct path *path): Check if creating a BPF token is allowed

Parameters

struct bpf_token *token: BPF token object
union bpf_attr *attr: BPF syscall attributes used to create BPF token
const struct path *path: path pointing to BPF FS mount point from which BPF token is created

Description

Do a check when the kernel instantiates a new BPF token object from a BPF FS instance. This is also the point where an LSM blob can be allocated for LSMs.

Return

Returns 0 on success, error on failure.

int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd): Check if BPF token is allowed to delegate requested BPF syscall command

Parameters

const struct bpf_token *token: BPF token object
enum bpf_cmd cmd: BPF syscall command requested to be delegated by BPF token

Description

Do a check when the kernel decides whether the provided BPF token should allow delegation of the requested BPF syscall command.

Return

Returns 0 on success, error on failure.

int security_bpf_token_capable(const struct bpf_token *token, int cap): Check if BPF token is allowed to delegate requested BPF-related capability

Parameters

const struct bpf_token *token: BPF token object
int cap: capabilities requested to be delegated by BPF token

Description

Do a check when the kernel decides whether the provided BPF token should allow delegation of the requested BPF-related capabilities.

Return

Returns 0 on success, error on failure.

void security_bpf_map_free(struct bpf_map *map): Free a bpf map’s LSM blob

Parameters

struct bpf_map *map: bpf map

Description

Clean up the security information stored inside bpf map.

void security_bpf_prog_free(struct bpf_prog *prog): Free a BPF program’s LSM blob

Parameters

struct bpf_prog *prog: BPF program struct

Description

Clean up the security information stored inside BPF program.

void security_bpf_token_free(struct bpf_token *token): Free a BPF token’s LSM blob

Parameters

struct bpf_token *token: BPF token struct

Description

Clean up the security information stored inside BPF token.

int security_perf_event_open(int type): Check if a perf event open is allowed

Parameters

int type: type of event

Description

Check whether the given type of perf_event_open syscall is allowed.

Return

Returns 0 if permission is granted.

int security_perf_event_alloc(struct perf_event *event): Allocate a perf event LSM blob

Parameters

struct perf_event *event: perf event

Description

Allocate and save perf_event security info.

Return

Returns 0 on success, error on failure.

void security_perf_event_free(struct perf_event *event): Free a perf event LSM blob

Parameters

struct perf_event *event: perf event

Description

Release (free) perf_event security info.

int security_perf_event_read(struct perf_event *event): Check if reading a perf event label is allowed

Parameters

struct perf_event *event: perf event

Description

Read perf_event security info if allowed.

Return

Returns 0 if permission is granted.

int security_perf_event_write(struct perf_event *event)¶: Check if writing a perf event label is allowed

Parameters

struct perf_event *event: perf event

Description

Write perf_event security info if allowed.

Return

Returns 0 if permission is granted.

int security_uring_override_creds(const struct cred *new)¶: Check if overriding creds is allowed

Parameters

const struct cred *new: new credentials

Description

Check if the current task, executing an io_uring operation, is allowed to override its credentials with new.

Return

Returns 0 if permission is granted.

int security_uring_sqpoll(void)¶: Check if IORING_SETUP_SQPOLL is allowed

Parameters

void: no arguments

Description

Check whether the current task is allowed to spawn an io_uring polling thread (IORING_SETUP_SQPOLL).

Return

Returns 0 if permission is granted.

int security_uring_cmd(struct io_uring_cmd *ioucmd)¶: Check if an io_uring passthrough command is allowed

Parameters

struct io_uring_cmd *ioucmd: command

Description

Check whether the file_operations uring_cmd is allowed to run.

Return

Returns 0 if permission is granted.

int security_uring_allowed(void)¶: Check if io_uring_setup() is allowed

Parameters

void: no arguments

Description

Check whether the current task is allowed to call io_uring_setup().

Return

Returns 0 if permission is granted.

void security_initramfs_populated(void)¶: Notify LSMs that initramfs has been loaded

Parameters

void: no arguments

Description

Tells the LSMs the initramfs has been unpacked into the rootfs.

struct dentry *securityfs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops)¶: create a file in the securityfs filesystem

Parameters

const char *name: a pointer to a string containing the name of the file to create.
umode_t mode: the permission that the file should have
struct dentry *parent: a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the securityfs filesystem.
void *data: a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open() call.
const struct file_operations *fops: a pointer to a struct file_operations that should be used for this file.

Description

This function creates a file in securityfs with the given name.

This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, the function will return the error value (via ERR_PTR).

If securityfs is not enabled in the kernel, the value -ENODEV is returned.
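The ERR_PTR convention mentioned above carries a negative errno value inside the returned pointer. The following is a minimal userspace sketch of that encoding, not the kernel's own definitions: the names demo_err_ptr(), demo_is_err() and demo_ptr_err() and the DEMO_MAX_ERRNO bound are assumptions made for illustration. In kernel code the corresponding helpers are ERR_PTR(), IS_ERR() and PTR_ERR().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bound for this sketch: errno values occupy the top 4095 addresses. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *demo_err_ptr(long error)
{
	assert(error < 0 && error >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Return nonzero if the pointer value lies in the reserved error range. */
int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

A caller following this pattern checks the returned dentry for an error before using it, and only hands a valid dentry to securityfs_remove() later.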

struct dentry *securityfs_create_dir(const char *name, struct dentry *parent)¶: create a directory in the securityfs filesystem

Parameters

const char *name: a pointer to a string containing the name of the directory to create.
struct dentry *parent: a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the directory will be created in the root of the securityfs filesystem.

Description

This function creates a directory in securityfs with the given name.

If securityfs is not enabled in the kernel, the value -ENODEV is returned.

struct dentry *securityfs_create_symlink(const char *name, struct dentry *parent, const char *target, const struct inode_operations *iops)¶: create a symlink in the securityfs filesystem

Parameters

const char *name: a pointer to a string containing the name of the symlink to create.
struct dentry *parent: a pointer to the parent dentry for the symlink. This should be a directory dentry if set. If this parameter is NULL, then the symlink will be created in the root of the securityfs filesystem.
const char *target: a pointer to a string containing the name of the symlink’s target. If this parameter is NULL, then the iops parameter needs to be set up to handle .readlink and .get_link inode_operations.
const struct inode_operations *iops: a pointer to the struct inode_operations to use for the symlink. If this parameter is NULL, then the default simple_symlink_inode operations will be used.

Description

This function creates a symlink in securityfs with the given name.

If securityfs is not enabled in the kernel, the value -ENODEV is returned.

void securityfs_remove(struct dentry *dentry)¶: removes a file or directory from the securityfs filesystem

Parameters

struct dentry *dentry: a pointer to the dentry of the file or directory to be removed.

Description

This function removes a file or directory in securityfs that was previously created with a call to another securityfs function (like securityfs_create_file() or variants thereof).

This function is required to be called in order for the file to be removed. No automatic cleanup of files will happen when a module is removed; you are responsible here.

AV: when applied to a directory it will take all children out; there is no need to call it for descendants if an ancestor is being removed.

Audit Interfaces¶

struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask, int type)¶: obtain an audit buffer

Parameters

struct audit_context *ctx: audit_context (may be NULL)
gfp_t gfp_mask: type of allocation
int type: audit message type

Description

Returns audit_buffer pointer on success or NULL on error.

Obtain an audit buffer. This routine does locking to obtain the audit buffer, but then no locking is required for calls to audit_log_*format. If the task (ctx) is a task that is currently in a syscall, then the syscall is marked as auditable and an audit record will be written at syscall exit. If there is no associated task, then task context (ctx) should be NULL.

void audit_log_format(struct audit_buffer *ab, const char *fmt, ...)¶: format a message into the audit buffer.

Parameters

struct audit_buffer *ab: audit_buffer
const char *fmt: format string
...: optional parameters matching fmt string

Description

All the work is done in audit_log_vformat.

int audit_log_subj_ctx(struct audit_buffer *ab, struct lsm_prop *prop)¶: Add LSM subject information

Parameters

struct audit_buffer *ab: audit_buffer
struct lsm_prop *prop: LSM subject properties.

Description

Add a subj= field and, if necessary, an AUDIT_MAC_TASK_CONTEXTS record.

void audit_log_end(struct audit_buffer *ab)¶: end one audit record

Parameters

struct audit_buffer *ab: the audit_buffer

Description

We cannot do a netlink send inside an irq context because it blocks (the last arg, flags, is not set to MSG_DONTWAIT), so the audit buffer is placed on a queue and a kthread is scheduled to remove buffers from the queue outside the irq context. May be called in any context.

void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type, const char *fmt, ...)¶: Log an audit record

Parameters

struct audit_context *ctx: audit context
gfp_t gfp_mask: type of allocation
int type: audit message type
const char *fmt: format string to use
...: variable parameters matching the format string

Description

This is a convenience function that calls audit_log_start, audit_log_vformat, and audit_log_end. It may be called in any context.
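The start, format, end sequence that the convenience function wraps can be modelled in userspace as follows. This is only a sketch of the calling pattern under stated assumptions: struct demo_buffer and every demo_* function are hypothetical stand-ins, the record is copied to a caller-supplied string instead of being queued for netlink, and none of the real locking, context handling or gfp semantics are represented.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct audit_buffer: a fixed-size text record. */
struct demo_buffer {
	char text[256];
	size_t len;
};

/* Obtain a buffer; returns NULL on allocation failure. */
struct demo_buffer *demo_log_start(int type)
{
	struct demo_buffer *ab = calloc(1, sizeof(*ab));

	if (!ab)
		return NULL;
	ab->len = (size_t)snprintf(ab->text, sizeof(ab->text), "type=%d ", type);
	return ab;
}

/* Append formatted text; a NULL or full buffer is left untouched. */
void demo_log_vformat(struct demo_buffer *ab, const char *fmt, va_list args)
{
	int n;

	if (!ab || ab->len >= sizeof(ab->text) - 1)
		return;
	n = vsnprintf(ab->text + ab->len, sizeof(ab->text) - ab->len, fmt, args);
	if (n < 0)
		return;
	ab->len += (size_t)n;
	if (ab->len > sizeof(ab->text) - 1)
		ab->len = sizeof(ab->text) - 1;	/* output was truncated */
}

/* Finish the record: copy it to out (the "send" step) and free the buffer. */
void demo_log_end(struct demo_buffer *ab, char *out, size_t outsz)
{
	assert(out && outsz > 0);
	if (!ab)
		return;
	snprintf(out, outsz, "%s", ab->text);
	free(ab);
}

/* Convenience wrapper: start, format and end in one call. */
void demo_log(char *out, size_t outsz, int type, const char *fmt, ...)
{
	struct demo_buffer *ab = demo_log_start(type);
	va_list args;

	if (!ab)
		return;
	va_start(args, fmt);
	demo_log_vformat(ab, fmt, args);
	va_end(args);
	demo_log_end(ab, out, outsz);
}
```

Splitting the work into three steps lets a caller append several formatted fragments to one record before ending it, while the wrapper covers the common single-message case.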

int __audit_filter_op(struct task_struct *tsk, struct audit_context *ctx, struct list_head *list, struct audit_names *name, unsigned long op)¶: common filter helper for operations (syscall/uring/etc)

Parameters

struct task_struct *tsk: associated task
struct audit_context *ctx: audit context
struct list_head *list: audit filter list
struct audit_names *name: audit_name (can be NULL)
unsigned long op: current syscall/uring_op

Description

Run the audit filters specified in list against tsk using ctx, name, and op, as necessary; the caller is responsible for ensuring that the call is made while the RCU read lock is held. The name parameter can be NULL, but all others must be specified. Returns 1/true if the filter finds a match, 0/false if none are found.

void audit_filter_uring(struct task_struct *tsk, struct audit_context *ctx)¶: apply filters to an io_uring operation

Parameters

struct task_struct *tsk: associated task
struct audit_context *ctx: audit context

void audit_reset_context(struct audit_context *ctx)¶: reset an audit_context structure

Parameters

struct audit_context *ctx: the audit_context to reset

Description

All fields in the audit_context will be reset to an initial state, all references held by fields will be dropped, and private memory will be released. When this function returns the audit_context will be suitable for reuse, so long as the passed context is not NULL or a dummy context.

int audit_alloc(struct task_struct *tsk)¶: allocate an audit context block for a task

Parameters

struct task_struct *tsk: task

Description

Filter on the task information and allocate a per-task audit context if necessary. Doing so turns on system call auditing for the specified task. This is called from copy_process, so no lock is needed.

void audit_log_uring(struct audit_context *ctx)¶: generate an AUDIT_URINGOP record

Parameters

struct audit_context *ctx: the audit context

void __audit_free(struct task_struct *tsk)¶: free a per-task audit context

Parameters

struct task_struct *tsk: task whose audit context block to free

Description

Called from copy_process, do_exit, and the io_uring code.

void audit_return_fixup(struct audit_context *ctx, int success, long code)¶: fixup the return codes in the audit_context

Parameters

struct audit_context *ctx: the audit_context
int success: true/false value to indicate if the operation succeeded or not
long code: operation return code

Description

We need to fix up the return code in the audit logs if the actual return codes are later going to be fixed by the arch specific signal handlers.

void __audit_uring_entry(u8 op)¶: prepare the kernel task’s audit context for io_uring

Parameters

u8 op: the io_uring opcode

Description

This is similar to audit_syscall_entry() but is intended for use by io_uring operations. This function should only ever be called from audit_uring_entry() as we rely on the audit context checking present in that function.

void __audit_uring_exit(int success, long code)¶: wrap up the kernel task’s audit context after io_uring

Parameters

int success: true/false value to indicate if the operation succeeded or not
long code: operation return code

Description

This is similar to audit_syscall_exit() but is intended for use by io_uring operations. This function should only ever be called from audit_uring_exit() as we rely on the audit context checking present in that function.

void __audit_syscall_entry(int major, unsigned long a1, unsigned long a2, unsigned long a3, unsigned long a4)¶: fill in an audit record at syscall entry

Parameters

int major: major syscall type (function)
unsigned long a1: additional syscall register 1
unsigned long a2: additional syscall register 2
unsigned long a3: additional syscall register 3
unsigned long a4: additional syscall register 4

Description

Fill in audit context at syscall entry. This only happens if the audit context was created when the task was created and the state or filters demand the audit context be built. If the state from the per-task filter or from the per-syscall filter is AUDIT_STATE_RECORD, then the record will be written at syscall exit time (otherwise, it will only be written if another part of the kernel requests that it be written).

void __audit_syscall_exit(int success, long return_code)¶: deallocate audit context after a system call

Parameters

int success: success value of the syscall
long return_code: return value of the syscall

Description

Tear down after system call. If the audit context has been marked as auditable (either because of the AUDIT_STATE_RECORD state from filtering, or because some other part of the kernel wrote an audit message), then write out the syscall information. In all cases, free the names stored from getname().

struct filename *__audit_reusename(__user const char *uptr)¶: fill out filename with info from existing entry

Parameters

const __user char *uptr: userland ptr to pathname

Description

Search the audit_names list for the current audit context. If there is an existing entry with a matching “uptr” then return the filename associated with that audit_name. If not, return NULL.

void __audit_getname(struct filename *name)¶: add a name to the list

Parameters

struct filename *name: name to add

Description

Add a name to the list of audit names for this context. Called from fs/namei.c:getname().

void __audit_inode(struct filename *name, const struct dentry *dentry, unsigned int flags)¶: store the inode and device from a lookup

Parameters

struct filename *name: name being audited
const struct dentry *dentry: dentry being audited
unsigned int flags: attributes for this particular entry

int auditsc_get_stamp(struct audit_context *ctx, struct audit_stamp *stamp)¶: get local copies of audit_context values

Parameters

struct audit_context *ctx: audit_context for the task
struct audit_stamp *stamp: timestamp to record

Description

Also sets the context as auditable.

void __audit_mq_open(int oflag, umode_t mode, struct mq_attr *attr)¶: record audit data for a POSIX MQ open

Parameters

int oflag: open flag
umode_t mode: mode bits
struct mq_attr *attr: queue attributes

void __audit_mq_sendrecv(mqd_t mqdes, size_t msg_len, unsigned int msg_prio, const struct timespec64 *abs_timeout)¶: record audit data for a POSIX MQ timed send/receive

Parameters

mqd_t mqdes: MQ descriptor
size_t msg_len: Message length
unsigned int msg_prio: Message priority
const struct timespec64 *abs_timeout: Message timeout in absolute time

void __audit_mq_notify(mqd_t mqdes, const struct sigevent *notification)¶: record audit data for a POSIX MQ notify

Parameters

mqd_t mqdes: MQ descriptor
const struct sigevent *notification: Notification event

void __audit_mq_getsetattr(mqd_t mqdes, struct mq_attr *mqstat)¶: record audit data for a POSIX MQ get/set attribute

Parameters

mqd_t mqdes: MQ descriptor
struct mq_attr *mqstat: MQ flags

void __audit_ipc_obj(struct kern_ipc_perm *ipcp)¶: record audit data for ipc object

Parameters

struct kern_ipc_perm *ipcp: ipc permissions

void __audit_ipc_set_perm(unsigned long qbytes, uid_t uid, gid_t gid, umode_t mode)¶: record audit data for new ipc permissions

Parameters

unsigned long qbytes: msgq bytes
uid_t uid: msgq user id
gid_t gid: msgq group id
umode_t mode: msgq mode (permissions)

Description

Called only after audit_ipc_obj().

int __audit_socketcall(int nargs, unsigned long *args)¶: record audit data for sys_socketcall

Parameters

int nargs: number of args, which should not be more than AUDITSC_ARGS.
unsigned long *args: args array

void __audit_fd_pair(int fd1, int fd2)¶: record audit data for pipe and socketpair

Parameters

int fd1: the first file descriptor
int fd2: the second file descriptor

int __audit_sockaddr(int len, void *a)¶: record audit data for sys_bind, sys_connect, sys_sendto

Parameters

int len: data length in user space
void *a: data address in kernel space

Description

Returns 0 on success or when the audit context is NULL, or a value < 0 on error.

int audit_signal_info_syscall(struct task_struct *t)¶: record signal info for syscalls

Parameters

struct task_struct *t: task being signaled

Description

If the audit subsystem is being terminated, record the task (pid) and uid that is doing that.

int __audit_log_bprm_fcaps(struct linux_binprm *bprm, const struct cred *new, const struct cred *old)¶: store information about a loading bprm and relevant fcaps

Parameters

struct linux_binprm *bprm: pointer to the bprm being processed
const struct cred *new: the proposed new credentials
const struct cred *old: the old credentials

Description

Simply check if the proc already has the caps given by the file and, if not, store the priv escalation info for later auditing at the end of the syscall.

-Eric

void __audit_log_capset(const struct cred *new, const struct cred *old)¶: store information about the arguments to the capset syscall

Parameters

const struct cred *new: the new credentials
const struct cred *old: the old (current) credentials

Description

Record the arguments userspace sent to sys_capset for later printing by the audit system if applicable.

void audit_core_dumps(long signr)¶: record information about processes that end abnormally

Parameters

long signr: signal value

Description

If a process ends with a core dump, something fishy is going on and we should record the event for investigation.

void audit_seccomp(unsigned long syscall, long signr, int code)¶: record information about a seccomp action

Parameters

unsigned long syscall: syscall number
long signr: signal value
int code: the seccomp action

Description

Record the information associated with a seccomp action. Event filtering for seccomp actions that are not to be logged is done in seccomp_log(). Therefore, this function forces auditing independent of the audit_enabled and dummy context state because seccomp actions should be logged even when audit is not in use.

int audit_rule_change(int type, int seq, void *data, size_t datasz)¶: apply all rules to the specified message type

Parameters

int type: audit message type
int seq: netlink audit message sequence (serial) number
void *data: payload data
size_t datasz: size of payload data

int audit_list_rules_send(struct sk_buff *request_skb, int seq)¶: list the audit rules

Parameters

struct sk_buff *request_skb: skb of request we are replying to (used to target the reply)
int seq: netlink audit message sequence (serial) number

int parent_len(const char *path)¶: find the length of the parent portion of a pathname

Parameters

const char *path: pathname of which to determine length

int audit_compare_dname_path(const struct qstr *dname, const char *path, int parentlen)¶: compare given dentry name with last component in given path. Return of 0 indicates a match.

Parameters

const struct qstr *dname: dentry name that we’re comparing
const char *path: full pathname that we’re comparing
int parentlen: length of the parent if known. Passing in AUDIT_NAME_FULL here indicates that we must compute this value.
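The two path helpers above can be illustrated with a small userspace sketch. It is written from the descriptions only and is not the kernel implementation: the demo_* names are hypothetical, the dentry name is passed as a plain C string rather than a struct qstr, DEMO_NAME_FULL is an assumed sentinel standing in for AUDIT_NAME_FULL, and a path with trailing slashes is not treated specially in the comparison.

```c
#include <assert.h>
#include <string.h>

/* Assumed sentinel for this sketch, standing in for AUDIT_NAME_FULL. */
#define DEMO_NAME_FULL (-1)

/* Length of the parent portion of path, including the separating slash. */
int demo_parent_len(const char *path)
{
	size_t len = strlen(path);
	const char *p;

	if (len == 0)
		return 0;

	/* Skip trailing slashes, then walk back to the previous slash. */
	p = path + len - 1;
	while (*p == '/' && p > path)
		p--;
	while (*p != '/' && p > path)
		p--;

	/* Include the slash itself in the parent portion, if one was found. */
	if (*p == '/')
		p++;
	return (int)(p - path);
}

/* Return 0 if name equals the last component of path, nonzero otherwise. */
int demo_compare_dname_path(const char *name, const char *path, int parentlen)
{
	size_t dlen = strlen(name);
	size_t pathlen = strlen(path);

	assert(parentlen == DEMO_NAME_FULL ||
	       (parentlen >= 0 && (size_t)parentlen <= pathlen));

	/* Compute the parent length only when the caller does not know it. */
	if (parentlen == DEMO_NAME_FULL)
		parentlen = demo_parent_len(path);
	if (pathlen - (size_t)parentlen != dlen)
		return 1;
	return strncmp(path + parentlen, name, dlen);
}
```

Accepting a precomputed parent length lets a caller that compares many names against the same path avoid rescanning it each time.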

Accounting Framework¶

long sys_acct(const char __user *name)¶: enable/disable process accounting

Parameters

const char __user *name: file name for accounting records or NULL to shut down accounting

Description

sys_acct() is the only system call needed to implement process accounting. It takes the name of the file where accounting records should be written. If the filename is NULL, accounting will be shut down.

Return

0 for success or negative errno values for failure.

void acct_collect(long exitcode, int group_dead)¶: collect accounting information into pacct_struct

Parameters

long exitcode: task exit code
int group_dead: not 0, if this thread is the last one in the process.

void acct_process(void)¶: handles process accounting for an exiting task

Parameters

void: no arguments

Block Devices¶

void bio_advance(struct bio *bio, unsigned int nbytes)¶: increment/complete a bio by some number of bytes

Parameters

struct bio *bio: bio to advance
unsigned int nbytes: number of bytes to complete

Description

This updates bi_sector, bi_size and bi_idx; if the number of bytes to complete doesn’t align with a bvec boundary, then bv_len and bv_offset will be updated on the last bvec as well.

bio will then represent the remaining, uncompleted portion of the io.
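The bookkeeping described above can be sketched in userspace as an iterator walking an array of segments. This is an illustrative model only: struct demo_vec, struct demo_iter and demo_advance() are hypothetical names, a 512-byte sector is assumed, and partial progress is recorded in the iterator's done field rather than by modifying the segment's length and offset.

```c
#include <assert.h>

/* Hypothetical stand-in for a bio segment. */
struct demo_vec {
	unsigned int len;		/* bytes in this segment */
};

/* Hypothetical stand-in for the bio's iteration state. */
struct demo_iter {
	unsigned long long sector;	/* next device sector (512-byte units) */
	unsigned int size;		/* bytes remaining in the whole I/O */
	unsigned int idx;		/* index of the current segment */
	unsigned int done;		/* bytes already completed in that segment */
};

/* Advance the iterator by nbytes, crossing segment boundaries as needed. */
void demo_advance(const struct demo_vec *vecs, struct demo_iter *it,
		  unsigned int nbytes)
{
	assert(nbytes <= it->size);

	it->sector += nbytes >> 9;
	it->size -= nbytes;

	/* Count from the start of the current segment. */
	nbytes += it->done;
	while (nbytes && nbytes >= vecs[it->idx].len) {
		nbytes -= vecs[it->idx].len;
		it->idx++;
	}
	it->done = nbytes;	/* partial progress inside the current segment */
}
```

For three 4096-byte segments, advancing by 6144 bytes leaves the iterator in the second segment with 2048 bytes of it completed, which is the case where the advance does not land on a segment boundary.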

struct folio_iter¶: State for iterating all folios in a bio.

Definition:

struct folio_iter {
    struct folio *folio;
    size_t offset;
    size_t length;
};

Members

folio: The current folio we’re iterating. NULL after the last folio.
offset: The byte offset within the current folio.
length: The number of bytes in this iteration (will not cross folio boundary).

bio_for_each_folio_all¶

bio_for_each_folio_all(fi, bio)

Iterate over each folio in a bio.

Parameters

fi: struct folio_iter which is updated for each folio.
bio: struct bio to iterate over.

struct bio *bio_next_split(struct bio *bio, int sectors, gfp_t gfp, struct bio_set *bs)¶: get next sectors from a bio, splitting if necessary

Parameters

struct bio *bio: bio to split
int sectors: number of sectors to split from the front of bio
gfp_t gfp: gfp mask
struct bio_set *bs: bio set to allocate from

Return

A bio representing the next sectors of bio. If the bio is smaller than sectors, returns the original bio unchanged.

unsigned int bio_add_max_vecs(void *kaddr, unsigned int len)¶: number of bio_vecs needed to add data to a bio

Parameters

void *kaddr: kernel virtual address to add
unsigned int len: length in bytes to add

Description

Calculate how many bio_vecs need to be allocated to add the kernel virtual address range in [kaddr:len] in the worst case.
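One way to reason about the worst case is to count the pages that the address range touches, on the assumption that each page may need its own bio_vec when the pages are not physically contiguous. The sketch below shows only that arithmetic; demo_pages_spanned() and DEMO_PAGE_SIZE are hypothetical names, a 4096-byte page is assumed, and the function is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch; kernel code would use PAGE_SIZE. */
#define DEMO_PAGE_SIZE 4096u

/* Number of pages touched by the byte range [addr, addr + len). */
unsigned int demo_pages_spanned(uintptr_t addr, unsigned int len)
{
	unsigned int offset = (unsigned int)(addr & (DEMO_PAGE_SIZE - 1));
	unsigned long long end = (unsigned long long)offset + len;

	assert(len > 0);
	/* Round (offset + len) up to a whole number of pages. */
	return (unsigned int)((end + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE);
}
```

The offset within the first page matters: a 4096-byte buffer that starts one byte into a page touches two pages, not one.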

bool bio_is_zone_append(struct bio *bio)¶: is this a zone append bio?

Parameters

struct bio *bio: bio to check

Description

Check if bio is a zone append operation. Core block layer code and end_io handlers must use this instead of an open coded REQ_OP_ZONE_APPEND check because the block layer can rewrite REQ_OP_ZONE_APPEND to REQ_OP_WRITE if it is not natively supported.

void blk_queue_flag_set(unsigned int flag, struct request_queue *q)¶: atomically set a queue flag

Parameters

unsigned int flag: flag to be set
struct request_queue *q: request queue

void blk_queue_flag_clear(unsigned int flag, struct request_queue *q)¶: atomically clear a queue flag

Parameters

unsigned int flag: flag to be cleared
struct request_queue *q: request queue

const char *blk_op_str(enum req_op op)¶: Return string XXX in the REQ_OP_XXX.

Parameters

enum req_op op: REQ_OP_XXX.

Description

Centralized block layer function to convert REQ_OP_XXX into string format. Useful for debugging and tracing a bio or request. For an invalid REQ_OP_XXX it returns the string “UNKNOWN”.
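A lookup of this kind is commonly built as a name table indexed by the operation code, with a fallback for codes that have no entry. The sketch below illustrates that pattern in userspace under stated assumptions: enum demo_op, demo_op_name and demo_op_str() are hypothetical, and the real REQ_OP_* values and the kernel's table are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation codes; the real REQ_OP_* values are not used here. */
enum demo_op {
	DEMO_OP_READ,
	DEMO_OP_WRITE,
	DEMO_OP_FLUSH,
	DEMO_OP_LAST,	/* number of table slots, not a real operation */
};

/* Name table indexed by operation code; unlisted slots stay NULL. */
static const char *const demo_op_name[DEMO_OP_LAST] = {
	[DEMO_OP_READ]  = "READ",
	[DEMO_OP_WRITE] = "WRITE",
	/* DEMO_OP_FLUSH deliberately left without a name */
};

/* Map an operation code to its name, or "UNKNOWN" if it has none. */
const char *demo_op_str(unsigned int op)
{
	const char *name = "UNKNOWN";

	if (op < DEMO_OP_LAST && demo_op_name[op])
		name = demo_op_name[op];
	return name;
}
```

Because the function never returns NULL, its result can be passed straight to a format string in a trace or debug message.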

void blk_sync_queue(struct request_queue *q)¶: cancel any pending callbacks on a queue

Parameters

struct request_queue *q: the queue

Description

The block layer may perform asynchronous callback activity on a queue, such as calling the unplug function after a timeout. A block device may call blk_sync_queue to ensure that any such activity is cancelled, thus allowing it to release resources that the callbacks might use. The caller must already have made sure that its ->submit_bio will not re-add plugging prior to calling this function.

This function does not cancel any asynchronous activity arising out of elevator or throttling code. That would require elevator_exit() and blkcg_exit_queue() to be called with queue lock initialized.

void blk_set_pm_only(struct request_queue *q)¶: increment pm_only counter

Parameters

struct request_queue *q: request queue pointer

void blk_put_queue(struct request_queue *q)¶: decrement the request_queue refcount

Parameters

struct request_queue *q: the request_queue structure to decrement the refcount for

Description

Decrements the refcount of the request_queue and frees it when the refcount reaches 0.

bool blk_get_queue(struct request_queue *q)¶: increment the request_queue refcount

Parameters

struct request_queue *q: the request_queue structure to increment the refcount for

Description

Increment the refcount of the request_queue kobject.

Context

Any context.

void submit_bio_noacct(struct bio *bio)¶: re-submit a bio to the block device layer for I/O

Parameters

struct bio *bio: The bio describing the location in memory and on the device.

Description

This is a version of submit_bio() that shall only be used for I/O that is resubmitted to lower level drivers by stacking block drivers. All file systems and other upper level users of the block layer should use submit_bio() instead.

void submit_bio(struct bio *bio)¶: submit a bio to the block device layer for I/O

Parameters

struct bio *bio: The struct bio which describes the I/O

Description

submit_bio() is used to submit I/O requests to block devices. It is passed a fully set up struct bio that describes the I/O that needs to be done. The bio will be sent to the device described by the bi_bdev field.

The success/failure status of the request, along with notification of completion, is delivered asynchronously through the ->bi_end_io() callback in bio. The bio must NOT be touched by the caller until ->bi_end_io() has been called.

int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)¶: poll for BIO completions

Parameters

struct bio *bio: bio to poll for
struct io_comp_batch *iob: batches of IO
unsigned int flags: BLK_POLL_* flags that control the behavior

Description

Poll for completions on the queue associated with the bio. Returns the number of completed entries found.

Note

The caller must either be the context that submitted bio, or be in an RCU critical section to prevent freeing of bio.

unsigned long bio_start_io_acct(struct bio *bio)¶: start I/O accounting for bio based drivers

Parameters

struct bio *bio: bio to start account for

Description

Returns the start time that should be passed back to bio_end_io_acct().

int blk_lld_busy(struct request_queue *q)¶: Check if underlying low-level drivers of a device are busy

Parameters

struct request_queue *q: the queue of the device being checked

Description

Check if underlying low-level drivers of a device are busy. If the drivers want to export their busy state, they must first set their own exporting function using blk_queue_lld_busy().

Basically, this function is used only by request stacking drivers to stop dispatching requests to underlying devices when underlying devices are busy. This behavior helps more I/O merging on the queue of the request stacking driver and prevents I/O throughput regression on burst I/O load.

Return

0 - Not busy (The request stacking driver should dispatch request)
1 - Busy (The request stacking driver should stop dispatching request)

void blk_start_plug(struct blk_plug *plug)¶: initialize blk_plug and track it inside the task_struct

Parameters

struct blk_plug *plug: The struct blk_plug that needs to be initialized

Description

blk_start_plug() indicates to the block layer an intent by the caller to submit multiple I/O requests in a batch. The block layer may use this hint to defer submitting I/Os from the caller until blk_finish_plug() is called. However, the block layer may choose to submit requests before a call to blk_finish_plug() if the number of queued I/Os exceeds BLK_MAX_REQUEST_COUNT, or if the size of the I/O is larger than BLK_PLUG_FLUSH_SIZE. The queued I/Os may also be submitted early if the task schedules (see below).

Tracking blk_plug inside the task_struct will help with auto-flushing the pending I/O should the task end up blocking between blk_start_plug() and blk_finish_plug(). This is important from a performance perspective, but also ensures that we don’t deadlock. For instance, if the task is blocking for a memory allocation, memory reclaim could end up wanting to free a page belonging to that request that is currently residing in our private plug. By flushing the pending I/O when the process goes to sleep, we avoid this kind of deadlock.

void blk_finish_plug(struct blk_plug *plug)¶: mark the end of a batch of submitted I/O

Parameters

struct blk_plug *plug: The struct blk_plug passed to blk_start_plug()

Description

Indicate that a batch of I/O submissions is complete. This function must be paired with an initial call to blk_start_plug(). The intent is to allow the block layer to optimize I/O submission. See the documentation for blk_start_plug() for more information.
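The pairing of the two plug calls, together with the early flush when too many requests are queued, can be modelled in userspace as below. This is a sketch of the batching idea only: struct demo_plug and the demo_* functions are hypothetical, DEMO_MAX_REQUEST_COUNT is an assumed limit that stands in for BLK_MAX_REQUEST_COUNT, the exact threshold comparison is an assumption, and the size-based and schedule-based flushes are not modelled.

```c
#include <assert.h>

/* Assumed batch limit for this sketch; stands in for BLK_MAX_REQUEST_COUNT. */
#define DEMO_MAX_REQUEST_COUNT 4

/* Hypothetical stand-in for struct blk_plug. */
struct demo_plug {
	unsigned int pending;	/* requests held back in the plug */
	unsigned int issued;	/* requests handed on to the "device" */
	unsigned int flushes;	/* number of times the plug was flushed */
};

/* Begin a batch with an empty plug. */
void demo_start_plug(struct demo_plug *plug)
{
	plug->pending = 0;
	plug->issued = 0;
	plug->flushes = 0;
}

/* Hand every pending request to the device in one go. */
void demo_flush_plug(struct demo_plug *plug)
{
	if (!plug->pending)
		return;
	plug->issued += plug->pending;
	plug->pending = 0;
	plug->flushes++;
}

/* Queue one request; flush early once the assumed batch limit is reached. */
void demo_submit(struct demo_plug *plug)
{
	plug->pending++;
	if (plug->pending >= DEMO_MAX_REQUEST_COUNT)
		demo_flush_plug(plug);
}

/* End of the batch: whatever is still pending is issued now. */
void demo_finish_plug(struct demo_plug *plug)
{
	demo_flush_plug(plug);
	assert(plug->pending == 0);
}
```

Submitting five requests with the limit set to four produces one early flush of four requests, and the finish call issues the remaining one.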

int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)¶: try to increase q->q_usage_counter

Parameters

struct request_queue *q: request queue pointer
blk_mq_req_flags_t flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PM

int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, struct rq_map_data *map_data, const struct iov_iter *iter, gfp_t gfp_mask)¶: map user data to a request, for passthrough requests

Parameters

struct request_queue *q: request queue where request should be inserted
struct request *rq: request to map data to
struct rq_map_data *map_data: pointer to the rq_map_data holding pages (if necessary)
const struct iov_iter *iter: iovec iterator
gfp_t gfp_mask: memory allocation flags

Description

Data will be mapped directly for zero copy I/O, if possible. Otherwise a kernel bounce buffer is used.

A matching blk_rq_unmap_user() must be issued at the end of I/O, while still in process context.

int blk_rq_unmap_user(struct bio *bio)¶: unmap a request with user data

Parameters

struct bio *bio: start of bio list

Description

Unmap a rq previously mapped by blk_rq_map_user(). The caller must supply the original rq->bio from the blk_rq_map_user() return, since the I/O completion may have changed rq->bio.

int blk_rq_map_kern(struct request *rq, void *kbuf, unsigned int len, gfp_t gfp_mask)¶: map kernel data to a request, for passthrough requests

Parameters

struct request *rq: request to fill
void *kbuf: the kernel buffer
unsigned int len: length of user data
gfp_t gfp_mask: memory allocation flags

Description

Data will be mapped directly if possible. Otherwise a bounce buffer is used. Can be called multiple times to append multiple buffers.

int blk_register_queue(struct gendisk *disk)¶: register a block layer queue with sysfs

Parameters

struct gendisk *disk: Disk of which the request queue should be registered with sysfs.

void blk_unregister_queue(struct gendisk *disk)¶: counterpart of blk_register_queue()

Parameters

struct gendisk *disk: Disk of which the request queue should be unregistered from sysfs.

Note

The caller is responsible for guaranteeing that this function is called after blk_register_queue() has finished.

void blk_set_stacking_limits(struct queue_limits *lim)¶: set default limits for stacking devices

Parameters

struct queue_limits *lim: the queue_limits structure to reset

Description

Prepare queue limits for applying limits from underlying devices using blk_stack_limits().

int queue_limits_commit_update(struct request_queue *q, struct queue_limits *lim)¶: commit an atomic update of queue limits

Parameters

struct request_queue *q: queue to update
struct queue_limits *lim: limits to apply

Description

Apply the limits in lim that were obtained from queue_limits_start_update() and updated by the caller to q. The caller must have frozen the queue or ensure that there are no outstanding I/Os by other means.

Returns 0 if successful, else a negative error code.

int queue_limits_commit_update_frozen(struct request_queue *q, struct queue_limits *lim)¶: commit an atomic update of queue limits

Parameters

struct request_queue *q: queue to update
struct queue_limits *lim: limits to apply

Description

Apply the limits in lim that were obtained from queue_limits_start_update() and updated with the new values by the caller to q. Freezes the queue before the update and unfreezes it after.

Returns 0 if successful, else a negative error code.

int queue_limits_set(struct request_queue *q, struct queue_limits *lim)¶: apply queue limits to queue

Parameters

struct request_queue *q: queue to update
struct queue_limits *lim: limits to apply

Description

Apply the limits in lim that were freshly initialized to q. To update existing limits use queue_limits_start_update() and queue_limits_commit_update() instead.

Returns 0 if successful, else a negative error code.

int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, sector_t start)¶: adjust queue_limits for stacked devices

Parameters

struct queue_limits *t: the stacking driver limits (top device)
struct queue_limits *b: the underlying queue limits (bottom, component device)
sector_t start: first data sector within component device

Description

This function is used by stacking drivers like MD and DM to ensure that all component devices have compatible block sizes and alignments. The stacking driver must provide a queue_limits struct (top) and then iteratively call the stacking function for all component (bottom) devices. The stacking function will attempt to combine the values and ensure proper alignment.

Returns 0 if the top and bottom queue_limits are compatible. The top device’s block sizes and alignment offsets may be adjusted to ensure alignment with the bottom device. If no compatible sizes and alignments exist, -1 is returned and the resulting top queue_limits will have the misaligned flag set to indicate that the alignment_offset is undefined.
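The iterative combining step can be pictured with a reduced userspace model. Everything here is an assumption made for illustration: struct demo_limits holds only three hypothetical fields, the combining rules (smallest nonzero transfer limit, largest block sizes) and the compatibility check are simplified guesses at the general idea, and alignment offsets and the misaligned flag are not modelled.

```c
#include <assert.h>

/* Hypothetical subset of queue limits used by this sketch. */
struct demo_limits {
	unsigned int max_sectors;		/* 0 means "no limit set yet" */
	unsigned int logical_block_size;	/* bytes */
	unsigned int physical_block_size;	/* bytes */
};

static unsigned int demo_min_not_zero(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

static unsigned int demo_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Fold the bottom device's limits into the top device's limits.
 * Returns 0 if the combined sizes are compatible, -1 otherwise.
 */
int demo_stack_limits(struct demo_limits *t, const struct demo_limits *b)
{
	assert(t->logical_block_size && b->logical_block_size);

	/* A transfer must fit the most restrictive component. */
	t->max_sectors = demo_min_not_zero(t->max_sectors, b->max_sectors);

	/* Block sizes must satisfy the coarsest component. */
	t->logical_block_size = demo_max(t->logical_block_size,
					 b->logical_block_size);
	t->physical_block_size = demo_max(t->physical_block_size,
					  b->physical_block_size);

	/* Assumed rule: physical size must be a multiple of logical size. */
	if (t->physical_block_size % t->logical_block_size)
		return -1;
	return 0;
}
```

A stacking driver following this pattern starts from default top limits and calls the function once per component device, so the top limits end up no more permissive than any component.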

voidqueue_limits_stack_bdev(structqueue_limits*t,structblock_device*bdev,sector_toffset,constchar*pfx)¶: adjust queue_limits for stacked devices

Parameters

structqueue_limits*t: the stacking driver limits (top device)
structblock_device*bdev: the underlying block device (bottom)
sector_toffset: offset to beginning of data within component device
constchar*pfx: prefix to use for warnings logged

Description

This function is used by stacking drivers like MD and DM to ensure that all component devices have compatible block sizes and alignments. The stacking driver must provide a queue_limits struct (top) and then iteratively call the stacking function for all component (bottom) devices. The stacking function will attempt to combine the values and ensure proper alignment.

boolqueue_limits_stack_integrity(structqueue_limits*t,structqueue_limits*b)¶: stack integrity profile

Parameters

structqueue_limits*t: target queue limits
structqueue_limits*b: base queue limits

Description

Check if the integrity profile in b can be stacked into the target t. Stacking is possible if either:

t does not have any integrity information stacked into it yet
the integrity profile in b is identical to the one in t

If b can be stacked into t, return true. Else return false and clear the integrity information in t.

voidblk_set_queue_depth(structrequest_queue*q,unsignedintdepth)¶: tell the block layer about the device queue depth

Parameters

structrequest_queue*q: the request queue for the device
unsignedintdepth: queue depth

intblkdev_issue_flush(structblock_device*bdev)¶: queue a flush

Parameters

structblock_device*bdev: blockdev to issue flush for

Description

Issue a flush for the block device in question.

intblkdev_issue_discard(structblock_device*bdev,sector_tsector,sector_tnr_sects,gfp_tgfp_mask)¶: queue a discard

Parameters

structblock_device*bdev: blockdev to issue discard for
sector_tsector: start sector
sector_tnr_sects: number of sectors to discard
gfp_tgfp_mask: memory allocation flags (for bio_alloc)

Description

Issue a discard request for the sectors in question.

int__blkdev_issue_zeroout(structblock_device*bdev,sector_tsector,sector_tnr_sects,gfp_tgfp_mask,structbio**biop,unsignedflags)¶: generate a number of zero-filled write bios

Parameters

structblock_device*bdev: blockdev to issue
sector_tsector: start sector
sector_tnr_sects: number of sectors to write
gfp_tgfp_mask: memory allocation flags (for bio_alloc)
structbio**biop: pointer to anchor bio
unsignedflags: controls detailed behavior

Description

Zero-fill a block range, either using hardware offload or by explicitly writing zeroes to the device.

If a device is using logical block provisioning, the underlying space will not be released if flags contains BLKDEV_ZERO_NOUNMAP.

If flags contains BLKDEV_ZERO_NOFALLBACK, the function will return -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.

intblkdev_issue_zeroout(structblock_device*bdev,sector_tsector,sector_tnr_sects,gfp_tgfp_mask,unsignedflags)¶: zero-fill a block range

Parameters

structblock_device*bdev: blockdev to write
sector_tsector: start sector
sector_tnr_sects: number of sectors to write
gfp_tgfp_mask: memory allocation flags (for bio_alloc)
unsignedflags: controls detailed behavior

Description

Zero-fill a block range, either using hardware offload or by explicitly writing zeroes to the device. See __blkdev_issue_zeroout() for the valid values for flags.

intblk_trace_ioctl(structblock_device*bdev,unsignedcmd,char__user*arg)¶: handle the ioctls associated with tracing

Parameters

structblock_device*bdev: the block device
unsignedcmd: the ioctl cmd
char__user*arg: the argument data, if any

voidblk_trace_shutdown(structrequest_queue*q)¶: stop and cleanup trace structures

Parameters

structrequest_queue*q: the request queue associated with the device

voidblk_add_trace_rq(structrequest*rq,blk_status_terror,unsignedintnr_bytes,u64what,u64cgid)¶: Add a trace for a request oriented action

Parameters

structrequest*rq: the source request
blk_status_terror: return status to log
unsignedintnr_bytes: number of completed bytes
u64what: the action
u64cgid: the cgroup info

Description

Records an action against a request. Will log the bio offset + size.

voidblk_add_trace_bio(structrequest_queue*q,structbio*bio,u64what,interror)¶: Add a trace for a bio oriented action

Parameters

structrequest_queue*q: queue the io is for
structbio*bio: the source bio
u64what: the action
interror: error, if any

Description

Records an action against a bio. Will log the bio offset + size.

voidblk_add_trace_bio_remap(void*ignore,structbio*bio,dev_tdev,sector_tfrom)¶: Add a trace for a bio-remap operation

Parameters

void*ignore: trace callback data parameter (not used)
structbio*bio: the source bio
dev_tdev: source device
sector_tfrom: source sector

Description

Called after a bio is remapped to a different device and/or sector.

voidblk_add_trace_rq_remap(void*ignore,structrequest*rq,dev_tdev,sector_tfrom)¶: Add a trace for a request-remap operation

Parameters

void*ignore: trace callback data parameter (not used)
structrequest*rq: the source request
dev_tdev: target device
sector_tfrom: source sector

Description

Device mapper remaps requests to other devices. Add a trace for that action.

voiddisk_release(structdevice*dev)¶: releases all allocated resources of the gendisk

Parameters

structdevice*dev: the device representing this disk

Description

This function releases all allocated resources of the gendisk.

Drivers which used device_add_disk() have a gendisk with a request_queue assigned. Since the request_queue sits on top of the gendisk for these drivers we also call blk_put_queue() for them, and we expect the request_queue refcount to reach 0 at this point, and so the request_queue will also be freed prior to the disk.

Context

can sleep

unsignedintbdev_count_inflight(structblock_device*part)¶: get the number of inflight IOs for a block device.

Parameters

structblock_device*part: the block device.

Description

Inflight here means started IO accounting, from bdev_start_io_acct() for a bio-based block device, and from blk_account_io_start() for an rq-based block device.

int__register_blkdev(unsignedintmajor,constchar*name,void(*probe)(dev_tdevt))¶: register a new block device

Parameters

unsignedintmajor: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If major = 0, try to allocate any unused major number.
constchar*name: the name of the new block device as a zero terminated string
void(*probe)(dev_tdevt): pre-devtmpfs / pre-udev callback used to create disks when their pre-created device node is accessed. When a probe call uses add_disk() and it fails, the driver must clean up resources. This interface may soon be removed.

Description

The name must be unique within the system.

The return value depends on the major input parameter:

if a major device number was requested in range [1..BLKDEV_MAJOR_MAX-1] then the function returns zero on success, or a negative error code
if any unused major number was requested with major = 0 parameter then the return value is the allocated major number in range [1..BLKDEV_MAJOR_MAX-1] or a negative error code otherwise

See Linux allocated devices (4.x+ version) for the list of allocated major numbers.

Use register_blkdev instead for any new code.
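
The two return-value conventions are easy to mix up, so here is a small user-space model of them. It is a sketch under stated assumptions, not kernel code: model_register_blkdev(), model_majors and MODEL_MAJOR_MAX are hypothetical names, the table size of 512 and the highest-number-first search order are choices of the model, and the probe callback is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAJOR_MAX 512	/* stands in for BLKDEV_MAJOR_MAX (assumed value) */

/* One slot per major number; NULL means the major is unused. */
static const char *model_majors[MODEL_MAJOR_MAX];

/*
 * major > 0:  claim exactly that major; returns 0 on success.
 * major == 0: claim any unused major; returns the allocated number.
 * Both cases return a negative error code on failure.
 */
static int model_register_blkdev(unsigned int major, const char *name)
{
	unsigned int i;

	assert(name != NULL);

	if (major >= MODEL_MAJOR_MAX)
		return -EINVAL;

	if (major == 0) {
		/* Search order is a model choice: highest number first. */
		for (i = MODEL_MAJOR_MAX - 1; i > 0; i--) {
			if (!model_majors[i]) {
				model_majors[i] = name;
				return (int)i;
			}
		}
		return -EBUSY;
	}

	if (model_majors[major])
		return -EBUSY;
	model_majors[major] = name;
	return 0;
}
```

In both modes a caller only needs to test for a negative result to detect failure. With major = 0 a positive result is the major number to remember for later unregistration.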

intadd_disk_fwnode(structdevice*parent,structgendisk*disk,conststructattribute_group**groups,structfwnode_handle*fwnode)¶: add disk information to kernel list with fwnode

Parameters

structdevice*parent: parent device for the disk
structgendisk*disk: per-device partitioning information
conststructattribute_group**groups: Additional per-device sysfs groups
structfwnode_handle*fwnode: attached disk fwnode

Description

This function registers the partitioning information in disk with the kernel. Also attach a fwnode to the disk device.

intdevice_add_disk(structdevice*parent,structgendisk*disk,conststructattribute_group**groups)¶: add disk information to kernel list

Parameters

structdevice*parent: parent device for the disk
structgendisk*disk: per-device partitioning information
conststructattribute_group**groups: Additional per-device sysfs groups

Description

This function registers the partitioning information in disk with the kernel.

voidblk_mark_disk_dead(structgendisk*disk)¶: mark a disk as dead

Parameters

structgendisk*disk: disk to mark as dead

Description

Mark a disk as dead (e.g. surprise removed) and don’t accept any new I/O to this disk.

voiddel_gendisk(structgendisk*disk)¶: remove the gendisk

Parameters

structgendisk*disk: the struct gendisk to remove

Description

Removes the gendisk and all its associated resources. This deletes the partitions associated with the gendisk, and unregisters the associated request_queue.

This is the counter to the respective device_add_disk() call.

The final removal of the struct gendisk happens when its refcount reaches 0 with put_disk(), which should be called after del_gendisk(), if device_add_disk() was used.

Some drivers depend on the release of the gendisk being synchronous, so it should not be deferred.

Context

can sleep

voidinvalidate_disk(structgendisk*disk)¶: invalidate the disk

Parameters

structgendisk*disk: the struct gendisk to invalidate

Description

A helper to invalidate the disk. It will clean the disk’s associated buffer/page caches and reset its internal states so that the disk can be reused by the drivers.

Context

can sleep

voidput_disk(structgendisk*disk)¶: decrements the gendisk refcount

Parameters

structgendisk*disk: the struct gendisk to decrement the refcount for

Description

This decrements the refcount for the struct gendisk. When this reaches 0, disk_release() will be called.

Note

For a blk-mq disk, put_disk must be called before freeing the tag_set when handling probe errors (that is, before add_disk() is called).

Context

Any context, but the last reference must not be dropped from atomic context.

voidset_disk_ro(structgendisk*disk,boolread_only)¶: set a gendisk read-only

Parameters

structgendisk*disk: gendisk to operate on
boolread_only: true to set the disk read-only, false to set the disk read/write

Description

This function is used to indicate whether a given disk device should have its read-only flag set. set_disk_ro() is typically used by device drivers to indicate whether the underlying physical device is write-protected.

intbdev_validate_blocksize(structblock_device*bdev,intblock_size)¶: check that this block size is acceptable

Parameters

structblock_device*bdev: blockdevice to check
intblock_size: block size to check

Description

For block device users that do not use buffer heads or the block device page cache, make sure that this block size can be used with the device.

Return

On success zero is returned, negative error code on failure.

intbdev_freeze(structblock_device*bdev)¶: lock a filesystem and force it into a consistent state

Parameters

structblock_device*bdev: blockdevice to lock

Description

If a superblock is found on this device, we take the s_umount semaphore on it to make sure nobody unmounts until the snapshot creation is done. The reference counter (bd_fsfreeze_count) guarantees that only the last unfreeze process can actually unfreeze the frozen filesystem when multiple freeze requests arrive simultaneously. It counts up in bdev_freeze() and counts down in bdev_thaw(). When it becomes 0, bdev_thaw() will actually unfreeze the filesystem.

Return

On success zero is returned, negative error code on failure.

intbdev_thaw(structblock_device*bdev)¶: unlock filesystem

Parameters

structblock_device*bdev: blockdevice to unlock

Description

Unlocks the filesystem and marks it writeable again after bdev_freeze().

Return

On success zero is returned, negative error code on failure.
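
The counting rule shared by bdev_freeze() and bdev_thaw() can be illustrated with a user-space model. This is only a sketch: struct model_bdev, model_freeze() and model_thaw() are hypothetical names, no locking or filesystem interaction is modelled, and returning -EINVAL for an unbalanced thaw is a choice made for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a block device's freeze state. */
struct model_bdev {
	int fsfreeze_count;	/* plays the role of bd_fsfreeze_count */
	bool frozen;		/* whether the filesystem is currently frozen */
};

/* Every caller bumps the counter; only the first one really freezes. */
static int model_freeze(struct model_bdev *bdev)
{
	assert(bdev);

	if (bdev->fsfreeze_count++ == 0)
		bdev->frozen = true;
	return 0;
}

/* Every caller drops the counter; only the last one really unfreezes. */
static int model_thaw(struct model_bdev *bdev)
{
	assert(bdev);

	if (bdev->fsfreeze_count == 0)
		return -EINVAL;	/* model choice: thaw without a freeze */

	if (--bdev->fsfreeze_count == 0)
		bdev->frozen = false;
	return 0;
}
```

With two overlapping freeze requests, the first thaw leaves the device frozen and the second one releases it, which is the guarantee the reference counter exists to provide.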

intbd_prepare_to_claim(structblock_device*bdev,void*holder,conststructblk_holder_ops*hops)¶: claim a block device

Parameters

structblock_device*bdev: block device of interest
void*holder: holder trying to claim bdev
conststructblk_holder_ops*hops: holder ops.

Description

Claim bdev. This function fails if bdev is already claimed by another holder and waits if another claiming is in progress. On successful return, the caller has ownership of bd_claiming and bd_holder[s].

Return

0 if bdev can be claimed, -EBUSY otherwise.

voidbd_abort_claiming(structblock_device*bdev,void*holder)¶: abort claiming of a block device

Parameters

structblock_device*bdev: block device of interest
void*holder: holder that has claimed bdev

Description

Abort claiming of a block device when the exclusive open failed. This can also be used when exclusive open is not actually desired and we just needed to block other exclusive openers for a while.

voidbdev_fput(structfile*bdev_file)¶: yield claim to the block device and put the file

Parameters

structfile*bdev_file: open block device

Description

Yield claim on the block device and put the file. Ensure that the block device can be reclaimed before the file is closed, which is a deferred operation.

intlookup_bdev(constchar*pathname,dev_t*dev)¶: Look up a struct block_device by name.

Parameters

constchar*pathname: Name of the block device in the filesystem.
dev_t*dev: Pointer to the block device’s dev_t, if found.

Description

Look up the block device’s dev_t at pathname in the current namespace if possible and return it in dev.

Context

May sleep.

Return

0 if succeeded, negative errno otherwise.

voidbdev_mark_dead(structblock_device*bdev,boolsurprise)¶: mark a block device as dead

Parameters

structblock_device*bdev: block device to operate on
boolsurprise: indicate a surprise removal

Description

Tell the file system that this device or media is dead. If surprise is set to true the device or media is already gone, if not we are preparing for an orderly removal.

This calls into the file system, which then typically syncs out all dirty data and writes back inodes and then invalidates any cached data in the inodes on the file system. In addition we also invalidate the block device mapping.

Char devices¶

intregister_chrdev_region(dev_tfrom,unsignedcount,constchar*name)¶: register a range of device numbers

Parameters

dev_tfrom: the first in the desired range of device numbers; must include the major number.
unsignedcount: the number of consecutive device numbers required
constchar*name: the name of the device or driver.

Description

Return value is zero on success, a negative error code on failure.
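
Because the from argument must carry the major number, it helps to see how a device number combines major and minor. The user-space sketch below assumes the in-kernel layout of 20 minor bits below the major bits. The model_ names are hypothetical stand-ins for what the kernel's MKDEV(), MAJOR() and MINOR() macros do, and masking the minor in model_mkdev() is a defensive choice of the model.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MINORBITS	20
#define MODEL_MINORMASK	((1U << MODEL_MINORBITS) - 1)

typedef uint32_t model_dev_t;	/* stand-in for the kernel's dev_t */

/* Pack a major and a minor number into one device number. */
static model_dev_t model_mkdev(unsigned int major, unsigned int minor)
{
	assert(major < (1U << (32 - MODEL_MINORBITS)));

	return ((model_dev_t)major << MODEL_MINORBITS) |
	       (minor & MODEL_MINORMASK);
}

/* Extract the major number from a device number. */
static unsigned int model_major(model_dev_t dev)
{
	return dev >> MODEL_MINORBITS;
}

/* Extract the minor number from a device number. */
static unsigned int model_minor(model_dev_t dev)
{
	return dev & MODEL_MINORMASK;
}
```

A range of count consecutive device numbers starting at from then covers from up to from + count - 1, which for small counts stays within the same major.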

intalloc_chrdev_region(dev_t*dev,unsignedbaseminor,unsignedcount,constchar*name)¶: register a range of char device numbers

Parameters

dev_t*dev: output parameter for first assigned number
unsignedbaseminor: first of the requested range of minor numbers
unsignedcount: the number of minor numbers required
constchar*name: the name of the associated device or driver

Description

Allocates a range of char device numbers. The major number will be chosen dynamically, and returned (along with the first minor number) in dev. Returns zero or a negative error code.

int__register_chrdev(unsignedintmajor,unsignedintbaseminor,unsignedintcount,constchar*name,conststructfile_operations*fops)¶: create and register a cdev occupying a range of minors

Parameters

unsignedintmajor: major device number or 0 for dynamic allocation
unsignedintbaseminor: first of the requested range of minor numbers
unsignedintcount: the number of minor numbers required
constchar*name: name of this range of devices
conststructfile_operations*fops: file operations associated with these devices

Description

If major == 0 this function will dynamically allocate a major and return its number.

If major > 0 this function will attempt to reserve a device with the given major number and will return zero on success.

Returns a negative errno on failure.

The name of this device has nothing to do with the name of the device in /dev. It only helps to keep track of the different owners of devices. If your module has only one type of device, it is OK to use e.g. the name of the module here.

voidunregister_chrdev_region(dev_tfrom,unsignedcount)¶: unregister a range of device numbers

Parameters

dev_tfrom: the first in the range of numbers to unregister
unsignedcount: the number of device numbers to unregister

Description

This function will unregister a range of count device numbers, starting with from. The caller should normally be the one who allocated those numbers in the first place...

void__unregister_chrdev(unsignedintmajor,unsignedintbaseminor,unsignedintcount,constchar*name)¶: unregister and destroy a cdev

Parameters

unsignedintmajor: major device number
unsignedintbaseminor: first of the range of minor numbers
unsignedintcount: the number of minor numbers this cdev is occupying
constchar*name: name of this range of devices

Description

Unregister and destroy the cdev occupying the region described by major, baseminor and count. This function undoes what __register_chrdev() did.

intcdev_add(structcdev*p,dev_tdev,unsignedcount)¶: add a char device to the system

Parameters

structcdev*p: the cdev structure for the device
dev_tdev: the first device number for which this device is responsible
unsignedcount: the number of consecutive minor numbers corresponding to this device

Description

cdev_add() adds the device represented by p to the system, making it live immediately. A negative error code is returned on failure.

voidcdev_set_parent(structcdev*p,structkobject*kobj)¶: set the parent kobject for a char device

Parameters

structcdev*p: the cdev structure
structkobject*kobj: the kobject to take a reference to

Description

cdev_set_parent() sets a parent kobject which will be referenced appropriately so the parent is not freed before the cdev. This should be called before cdev_add.

intcdev_device_add(structcdev*cdev,structdevice*dev)¶: add a char device and its corresponding struct device, linking

Parameters

structcdev*cdev: the cdev structure
structdevice*dev: the device structure

Description

cdev_device_add() adds the char device represented by cdev to the system, just as cdev_add does. It then adds dev to the system using device_add. The dev_t for the char device will be taken from the struct device, which needs to be initialized first. This helper function correctly takes a reference to the parent device so the parent will not get released until all references to the cdev are released.

This helper uses dev->devt for the device number. If it is not set it will not add the cdev and it will be equivalent to device_add.

This function should be used whenever the struct cdev and the struct device are members of the same structure whose lifetime is managed by the struct device.

NOTE

Callers must assume that userspace was able to open the cdev and can call cdev fops callbacks at any time, even if this function fails.

voidcdev_device_del(structcdev*cdev,structdevice*dev)¶: inverse of cdev_device_add

Parameters

structcdev*cdev: the cdev structure
structdevice*dev: the device structure

Description

cdev_device_del() is a helper function to call cdev_del and device_del. It should be used whenever cdev_device_add is used.

If dev->devt is not set it will not remove the cdev and will be equivalent to device_del.

NOTE

This guarantees that associated sysfs callbacks are not running or runnable, however any cdevs already open will remain and their fops will still be callable even after this function returns.

voidcdev_del(structcdev*p)¶: remove a cdev from the system

Parameters

structcdev*p: the cdev structure to be removed

Description

cdev_del() removes p from the system, possibly freeing the structure itself.

NOTE

This guarantees that the cdev device will no longer be able to be opened, however any cdevs already open will remain and their fops will still be callable even after cdev_del returns.

structcdev*cdev_alloc(void)¶: allocate a cdev structure

Parameters

void: no arguments

Description

Allocates and returns a cdev structure, or NULL on failure.

voidcdev_init(structcdev*cdev,conststructfile_operations*fops)¶: initialize a cdev structure

Parameters

structcdev*cdev: the structure to initialize
conststructfile_operations*fops: the file_operations for this device

Description

Initializes cdev, remembering fops, making it ready to add to the system with cdev_add().

Clock Framework¶

The clock framework defines programming interfaces to support software management of the system clock tree. This framework is widely used with System-On-Chip (SOC) platforms to support power management and various devices which may need custom clock rates. Note that these “clocks” don’t relate to timekeeping or real time clocks (RTCs), each of which has a separate framework. These struct clk instances may be used to manage for example a 96 MHz signal that is used to shift bits into and out of peripherals or busses, or otherwise trigger synchronous state machine transitions in system hardware.

Power management is supported by explicit software clock gating: unused clocks are disabled, so the system doesn’t waste power changing the state of transistors that aren’t in active use. On some systems this may be backed by hardware clock gating, where clocks are gated without being disabled in software. Sections of chips that are powered but not clocked may be able to retain their last state. This low power state is often called a retention mode. This mode still incurs leakage currents, especially with finer circuit geometries, but for CMOS circuits power is mostly used by clocked state changes.

Power-aware drivers only enable their clocks when the device they manage is in active use. Also, system sleep states often differ according to which clock domains are active: while a “standby” state may allow wakeup from several active domains, a “mem” (suspend-to-RAM) state may require a more wholesale shutdown of clocks derived from higher speed PLLs and oscillators, limiting the number of possible wakeup event sources. A driver’s suspend method may need to be aware of system-specific clock constraints on the target sleep state.

Some platforms support programmable clock generators. These can be used by external chips of various kinds, such as other CPUs, multimedia codecs, and devices with strict requirements for interface clocking.

structclk_notifier¶: associate a clk with a notifier

Definition:

struct clk_notifier {
    struct clk                      *clk;
    struct srcu_notifier_head       notifier_head;
    struct list_head                node;
};

Members

clk: struct clk * to associate the notifier with
notifier_head: an srcu_notifier_head for this clk
node: linked list pointers

Description

A list of struct clk_notifier is maintained by the notifier code. An entry is created whenever code registers the first notifier on a particular clk. Future notifiers on that clk are added to the notifier_head.

structclk_notifier_data¶: rate data to pass to the notifier callback

Definition:

struct clk_notifier_data {
    struct clk              *clk;
    unsigned long           old_rate;
    unsigned long           new_rate;
};

Members

clk: struct clk * being changed
old_rate: previous rate of this clk
new_rate: new rate of this clk

Description

For a pre-notifier, old_rate is the clk’s rate before this rate change, and new_rate is what the rate will be in the future. For a post-notifier, old_rate and new_rate are both set to the clk’s current rate (this was done to optimize the implementation).

structclk_bulk_data¶: Data used for bulk clk operations.

Definition:

struct clk_bulk_data {
    const char              *id;
    struct clk              *clk;
};

Members

id: clock consumer ID
clk: struct clk * to store the associated clock

Description

The CLK APIs provide a series of clk_bulk_() API calls as a convenience to consumers which require multiple clks. This structure is used to manage data for these calls.

intclk_notifier_register(structclk*clk,structnotifier_block*nb)¶: register a clock rate-change notifier callback

Parameters

structclk*clk: clock whose rate we are interested in
structnotifier_block*nb: notifier block with callback function pointer

Description

ProTip: debugging across notifier chains can be frustrating. Make sure that your notifier callback function prints a nice big warning in case of failure.

intclk_notifier_unregister(structclk*clk,structnotifier_block*nb)¶: unregister a clock rate-change notifier callback

Parameters

structclk*clk: clock whose rate we are no longer interested in
structnotifier_block*nb: notifier block which will be unregistered

intdevm_clk_notifier_register(structdevice*dev,structclk*clk,structnotifier_block*nb)¶: register a managed rate-change notifier callback

Parameters

structdevice*dev: device for clock “consumer”
structclk*clk: clock whose rate we are interested in
structnotifier_block*nb: notifier block with callback function pointer

Description

Returns 0 on success, a negative error code otherwise.

longclk_get_accuracy(structclk*clk)¶: obtain the clock accuracy in ppb (parts per billion) for a clock source.

Parameters

structclk*clk: clock source

Description

This gets the clock source accuracy expressed in ppb. A perfect clock returns 0.

intclk_set_phase(structclk*clk,intdegrees)¶: adjust the phase shift of a clock signal

Parameters

structclk*clk: clock signal source
intdegrees: number of degrees the signal is shifted

Description

Shifts the phase of a clock signal by the specified degrees. Returns 0 on success, a negative error code otherwise.

intclk_get_phase(structclk*clk)¶: return the phase shift of a clock signal

Parameters

structclk*clk: clock signal source

Description

Returns the phase shift of a clock node in degrees, otherwise returns a negative error code.

intclk_set_duty_cycle(structclk*clk,unsignedintnum,unsignedintden)¶: adjust the duty cycle ratio of a clock signal

Parameters

structclk*clk: clock signal source
unsignedintnum: numerator of the duty cycle ratio to be applied
unsignedintden: denominator of the duty cycle ratio to be applied

Description

Adjust the duty cycle of a clock signal by the specified ratio. Returns 0 on success, a negative error code otherwise.

intclk_get_scaled_duty_cycle(structclk*clk,unsignedintscale)¶: return the duty cycle ratio of a clock signal

Parameters

structclk*clk: clock signal source
unsignedintscale: scaling factor to be applied to represent the ratio as an integer

Description

Returns the duty cycle ratio multiplied by the scale provided, otherwise returns a negative error code.
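
The scaling turns a num/den ratio into a single integer, for example a percentage when scale is 100. The user-space sketch below shows the arithmetic only. model_scaled_duty_cycle() is a hypothetical name, and rounding down as well as rejecting den == 0 or num > den with -EINVAL are assumptions of the model, not documented behaviour of clk_get_scaled_duty_cycle().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Return num/den multiplied by scale, rounded down.
 * The product is computed in 64 bits so that it cannot overflow.
 */
static int64_t model_scaled_duty_cycle(unsigned int num, unsigned int den,
				       unsigned int scale)
{
	if (den == 0 || num > den)
		return -EINVAL;	/* model choice for an invalid ratio */

	return (int64_t)(((uint64_t)num * scale) / den);
}

/* Small self-check showing typical calls. */
static void model_duty_cycle_demo(void)
{
	assert(model_scaled_duty_cycle(1, 2, 100) == 50);	/* 50 percent */
	assert(model_scaled_duty_cycle(3, 4, 100) == 75);	/* 75 percent */
}
```

A larger scale keeps more precision: a 1/3 duty cycle reads as 33 with scale 100 but as 333 with scale 1000.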

boolclk_is_match(conststructclk*p,conststructclk*q)¶: check if two clk’s point to the same hardware clock

Parameters

conststructclk*p: clk compared against q
conststructclk*q: clk compared against p

Description

Returns true if the two struct clk pointers both point to the same hardware clock node. Put differently, returns true if p and q share the same struct clk_core object.

Returns false otherwise. Note that two NULL clks are treated as matching.
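
The matching rule, including the NULL case, can be written down as a short user-space model. The types model_clk and model_clk_core and the function model_clk_is_match() are hypothetical stand-ins. The sketch only encodes the rule stated above and does not handle error pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: one core per hardware clock, one clk per consumer. */
struct model_clk_core {
	unsigned long rate;
};

struct model_clk {
	struct model_clk_core *core;
};

static bool model_clk_is_match(const struct model_clk *p,
			       const struct model_clk *q)
{
	/* Identical handles match, and so do two NULL handles. */
	if (p == q)
		return true;

	/* Distinct handles match only if they share the same core. */
	if (p && q && p->core == q->core)
		return true;

	return false;
}

/* Small self-check showing the three interesting cases. */
static void model_clk_is_match_demo(void)
{
	struct model_clk_core pll = { 96000000UL };
	struct model_clk a = { &pll };
	struct model_clk b = { &pll };

	assert(model_clk_is_match(&a, &b));	/* same hardware clock */
	assert(model_clk_is_match(NULL, NULL));	/* two NULL clks match */
	assert(!model_clk_is_match(&a, NULL));	/* NULL vs. real clk */
}
```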

intclk_rate_exclusive_get(structclk*clk)¶: get exclusivity over the rate control of a producer

Parameters

structclk*clk: clock source

Description

This function allows drivers to get exclusive control over the rate of a provider. It prevents any other consumer from executing, even indirectly, an operation which could alter the rate of the provider or cause glitches.

If exclusivity is claimed more than once on a clock, even by the same driver, the rate effectively gets locked as exclusivity can’t be preempted.

Must not be called from within atomic context.

Returns success (0) or negative errno.

intdevm_clk_rate_exclusive_get(structdevice*dev,structclk*clk)¶: devm variant of clk_rate_exclusive_get

Parameters

structdevice*dev: device the exclusivity is bound to
structclk*clk: clock source

Description

Calls clk_rate_exclusive_get() on clk and registers a devm cleanup handler on dev to call clk_rate_exclusive_put().

Must not be called from within atomic context.

voidclk_rate_exclusive_put(structclk*clk)¶: release exclusivity over the rate control of a producer

Parameters

structclk*clk: clock source

Description

This function allows a driver to release the exclusivity it previously got from clk_rate_exclusive_get().

The caller must balance the number of clk_rate_exclusive_get() and clk_rate_exclusive_put() calls.

Must not be called from within atomic context.

intclk_prepare(structclk*clk)¶: prepare a clock source

Parameters

structclk*clk: clock source

Description

This prepares the clock source for use.

Must not be called from within atomic context.

boolclk_is_enabled_when_prepared(structclk*clk)¶: indicate if preparing a clock also enables it.

Parameters

structclk*clk: clock source

Description

Returns true if clk_prepare() implicitly enables the clock, effectively making clk_enable()/clk_disable() no-ops, false otherwise.

This is of interest mainly to the power management code where actually disabling the clock also requires unpreparing it to have any material effect.

Regardless of the value returned here, the caller must always invoke clk_enable() or clk_prepare_enable() and counterparts for usage counts to be right.

voidclk_unprepare(structclk*clk)¶: undo preparation of a clock source

Parameters

structclk*clk: clock source

Description

This undoes a previously prepared clock. The caller must balance the number of prepare and unprepare calls.

Must not be called from within atomic context.

structclk*clk_get(structdevice*dev,constchar*id)¶: lookup and obtain a reference to a clock producer.

Parameters

structdevice*dev: device for clock “consumer”
constchar*id: clock consumer ID

Description

Returns a struct clk corresponding to the clock producer, or a valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (In other words, id may be identical strings, but clk_get may return different clock producers depending on dev.)

Drivers must assume that the clock source is not enabled.

clk_get should not be called from within interrupt context.

intclk_bulk_get(structdevice*dev,intnum_clks,structclk_bulk_data*clks)¶: lookup and obtain a number of references to clock producer.

Parameters

structdevice*dev: device for clock “consumer”
intnum_clks: the number of clk_bulk_data
structclk_bulk_data*clks: the clk_bulk_data table of consumer

Description

This helper function allows drivers to get several clk consumers in one operation. If any of the clks cannot be acquired then any clks that were obtained will be freed before returning to the caller.

Returns 0 if all clocks specified in the clk_bulk_data table are obtained successfully, or a negative errno otherwise. The implementation uses dev and clk_bulk_data.id to determine the clock consumer, and thereby the clock producer. The clock returned is stored in each clk_bulk_data.clk field.

Drivers must assume that the clock source is not enabled.

clk_bulk_get should not be called from within interrupt context.
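
The all-or-nothing behaviour described for clk_bulk_get() follows a common acquire-with-rollback pattern, sketched here in user space. Everything in the block is hypothetical (model_clk, model_bulk_data, model_get(), model_put(), model_bulk_get() and the two clock names). The device argument is left out, and returning -ENOENT for an unknown id is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical clock with a reference count. */
struct model_clk {
	const char *name;
	int refs;
};

/* Hypothetical counterpart of struct clk_bulk_data. */
struct model_bulk_data {
	const char *id;
	struct model_clk *clk;
};

/* The clocks this model "platform" provides. */
static struct model_clk model_clocks[] = {
	{ "bus", 0 },
	{ "core", 0 },
};

/* Look up a clock by id and take a reference, or return NULL. */
static struct model_clk *model_get(const char *id)
{
	size_t i;

	for (i = 0; i < sizeof(model_clocks) / sizeof(model_clocks[0]); i++) {
		if (strcmp(model_clocks[i].name, id) == 0) {
			model_clocks[i].refs++;
			return &model_clocks[i];
		}
	}
	return NULL;
}

/* Drop a reference taken by model_get(). */
static void model_put(struct model_clk *clk)
{
	if (clk)
		clk->refs--;
}

/* Get every clock in the table, or none of them. */
static int model_bulk_get(int num_clks, struct model_bulk_data *clks)
{
	int i;

	assert(clks != NULL || num_clks == 0);

	for (i = 0; i < num_clks; i++)
		clks[i].clk = NULL;

	for (i = 0; i < num_clks; i++) {
		clks[i].clk = model_get(clks[i].id);
		if (!clks[i].clk)
			goto err;
	}
	return 0;

err:
	/* Roll back: release everything acquired before the failure. */
	while (--i >= 0) {
		model_put(clks[i].clk);
		clks[i].clk = NULL;
	}
	return -ENOENT;
}
```

After a failed call no reference is left behind, so the caller can simply propagate the error without any cleanup of its own.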

intclk_bulk_get_all(structdevice*dev,structclk_bulk_data**clks)¶: lookup and obtain all available references to clock producer.

Parameters

structdevice*dev: device for clock “consumer”
structclk_bulk_data**clks: pointer to the clk_bulk_data table of consumer

Description

This helper function allows drivers to get all clk consumers in one operation. If any of the clks cannot be acquired then any clks that were obtained will be freed before returning to the caller.

Returns a positive value for the number of clocks obtained while the clock references are stored in the clk_bulk_data table in the clks field. Returns 0 if there are none and a negative value if something failed.

Drivers must assume that the clock source is not enabled.

clk_bulk_get should not be called from within interrupt context.

intclk_bulk_get_optional(structdevice*dev,intnum_clks,structclk_bulk_data*clks)¶: lookup and obtain a number of references to clock producer

Parameters

structdevice*dev: device for clock “consumer”
intnum_clks: the number of clk_bulk_data
structclk_bulk_data*clks: the clk_bulk_data table of consumer

Description

Behaves the same as clk_bulk_get() except where there is no clock producer. In this case, instead of returning -ENOENT, the function returns 0 and NULL for a clk for which a clock producer could not be determined.

int devm_clk_bulk_get(struct device *dev, int num_clks, struct clk_bulk_data *clks)¶: managed get multiple clk consumers

Parameters

struct device *dev: device for clock “consumer”
int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: the clk_bulk_data table of consumer

Description

Return 0 on success, an errno on failure.

This helper function allows drivers to get several clk consumers in one operation with management. The clks will automatically be freed when the device is unbound.

int devm_clk_bulk_get_optional(struct device *dev, int num_clks, struct clk_bulk_data *clks)¶: managed get multiple optional consumer clocks

Parameters

struct device *dev: device for clock “consumer”
int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: pointer to the clk_bulk_data table of consumer

Description

Behaves the same as devm_clk_bulk_get() except where there is no clock producer. In this case, instead of returning -ENOENT, the function returns NULL for the given clk. It is assumed all clocks in clk_bulk_data are optional.

Returns 0 if all clocks specified in the clk_bulk_data table are obtained successfully or for any clk there was no clk provider available, otherwise returns valid IS_ERR() condition containing errno. The implementation uses dev and clk_bulk_data.id to determine the clock consumer, and thereby the clock producer. The clock returned is stored in each clk_bulk_data.clk field.

Drivers must assume that the clock source is not enabled.

devm_clk_bulk_get_optional should not be called from within interrupt context.

int devm_clk_bulk_get_all(struct device *dev, struct clk_bulk_data **clks)¶: managed get multiple clk consumers

Parameters

struct device *dev: device for clock “consumer”
struct clk_bulk_data **clks: pointer to the clk_bulk_data table of consumer

Description

This helper function allows drivers to get several clk consumers in one operation with management. The clks will automatically be freed when the device is unbound.

int devm_clk_bulk_get_all_enabled(struct device *dev, struct clk_bulk_data **clks)¶: Get and enable all clocks of the consumer (managed)

Parameters

struct device *dev: device for clock “consumer”
struct clk_bulk_data **clks: pointer to the clk_bulk_data table of consumer

Description

This helper function allows drivers to get all clocks of the consumer and enables them in one operation with management. The clks will automatically be disabled and freed when the device is unbound.

struct clk *devm_clk_get(struct device *dev, const char *id)¶: lookup and obtain a managed reference to a clock producer.

Parameters

struct device *dev: device for clock “consumer”
const char *id: clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.)

Description

Drivers must assume that the clock source is neither prepared nor enabled.

The clock will automatically be freed when the device is unbound from the bus.

struct clk *devm_clk_get_prepared(struct device *dev, const char *id)¶: devm_clk_get() + clk_prepare()

Parameters

struct device *dev: device for clock “consumer”
const char *id: clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno.

Description

The returned clk (if valid) is prepared. Drivers must however assume that the clock is not enabled.

The clock will automatically be unprepared and freed when the device is unbound from the bus.

struct clk *devm_clk_get_enabled(struct device *dev, const char *id)¶: devm_clk_get() + clk_prepare_enable()

Parameters

struct device *dev: device for clock “consumer”
const char *id: clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno.

Description

The returned clk (if valid) is prepared and enabled.

The clock will automatically be disabled, unprepared and freed when the device is unbound from the bus.
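The managed cleanup described above can be pictured as a per-device list of release actions that runs when the device is unbound. The following userspace sketch is a model only: struct model_device, struct model_clk and every model_* function are hypothetical names invented for this illustration, and the real device-resource machinery is considerably more involved.

```c
#include <stddef.h>

#define MODEL_MAX_ACTIONS 8

/* Hypothetical clock state: each flag is 1 while the state is held. */
struct model_clk {
	int held;
	int prepared;
	int enabled;
};

/* Hypothetical device that remembers what to undo at unbind time. */
struct model_device {
	void (*release[MODEL_MAX_ACTIONS])(struct model_clk *clk);
	struct model_clk *arg[MODEL_MAX_ACTIONS];
	size_t count;
};

/* Undo in the reverse order of acquisition. */
static void model_release_enabled(struct model_clk *clk)
{
	clk->enabled = 0;	/* stands in for clk_disable() */
	clk->prepared = 0;	/* stands in for clk_unprepare() */
	clk->held = 0;		/* stands in for clk_put() */
}

/* Get, prepare and enable the clock, and register the matching cleanup. */
int model_devm_clk_get_enabled(struct model_device *dev, struct model_clk *clk)
{
	if (dev->count == MODEL_MAX_ACTIONS)
		return -1;

	clk->held = 1;
	clk->prepared = 1;
	clk->enabled = 1;

	dev->release[dev->count] = model_release_enabled;
	dev->arg[dev->count] = clk;
	dev->count++;
	return 0;
}

/* Run every registered cleanup, most recent first. */
void model_device_unbind(struct model_device *dev)
{
	while (dev->count > 0) {
		dev->count--;
		dev->release[dev->count](dev->arg[dev->count]);
	}
}
```

The point of the pattern is that the driver's error and remove paths need no explicit clock teardown in this model; the unbind step performs it.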

struct clk *devm_clk_get_optional(struct device *dev, const char *id)¶: lookup and obtain a managed reference to an optional clock producer.

Parameters

struct device *dev: device for clock “consumer”
const char *id: clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL which serves as a dummy clk. That’s the only difference compared to devm_clk_get().

Description

Drivers must assume that the clock source is neither prepared nor enabled.

The clock will automatically be freed when the device is unbound from the bus.

struct clk *devm_clk_get_optional_prepared(struct device *dev, const char *id)¶: devm_clk_get_optional() + clk_prepare()

Parameters

struct device *dev: device for clock “consumer”
const char *id: clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL which serves as a dummy clk. That’s the only difference compared to devm_clk_get_prepared().

Description

The returned clk (if valid) is prepared. Drivers must however assume that the clock is not enabled.

The clock will automatically be unprepared and freed when the device is unbound from the bus.

struct clk *devm_clk_get_optional_enabled(struct device *dev, const char *id)¶: devm_clk_get_optional() + clk_prepare_enable()

Parameters

struct device *dev: device for clock “consumer”
const char *id: clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL which serves as a dummy clk. That’s the only difference compared to devm_clk_get_enabled().

Description

The returned clk (if valid) is prepared and enabled.

The clock will automatically be disabled, unprepared and freed when the device is unbound from the bus.

struct clk *devm_clk_get_optional_enabled_with_rate(struct device *dev, const char *id, unsigned long rate)¶: devm_clk_get_optional() + clk_set_rate() + clk_prepare_enable()

Parameters

struct device *dev: device for clock “consumer”
const char *id: clock consumer ID
unsigned long rate: new clock rate

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL which serves as a dummy clk. That’s the only difference compared to devm_clk_get_enabled().

Description

The returned clk (if valid) is prepared and enabled, and its rate has been set.

The clock will automatically be disabled, unprepared and freed when the device is unbound from the bus.

struct clk *devm_get_clk_from_child(struct device *dev, struct device_node *np, const char *con_id)¶: lookup and obtain a managed reference to a clock producer from child node.

Parameters

struct device *dev: device for clock “consumer”
struct device_node *np: pointer to clock consumer node
const char *con_id: clock consumer ID

Description

This function parses the clocks, and uses them to look up the struct clk from the registered list of clock providers by using np and con_id.

The clock will automatically be freed when the device is unbound from the bus.

int clk_enable(struct clk *clk)¶: inform the system when the clock source should be running.

Parameters

struct clk *clk: clock source

Description

If the clock cannot be enabled/disabled, this should return success.

May be called from atomic contexts.

Returns success (0) or negative errno.

int clk_bulk_enable(int num_clks, const struct clk_bulk_data *clks)¶: inform the system when the set of clks should be running.

Parameters

int num_clks: the number of clk_bulk_data
const struct clk_bulk_data *clks: the clk_bulk_data table of consumer

Description

May be called from atomic contexts.

Returns success (0) or negative errno.

void clk_disable(struct clk *clk)¶: inform the system when the clock source is no longer required.

Parameters

struct clk *clk: clock source

Description

Inform the system that a clock source is no longer required by a driver and may be shut down.

May be called from atomic contexts.

Implementation detail: if the clock source is shared between multiple drivers, clk_enable() calls must be balanced by the same number of clk_disable() calls for the clock source to be disabled.
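The balancing rule in the implementation detail above amounts to reference counting on the enable state. A minimal userspace sketch follows, assuming a hypothetical struct model_clk whose gate_open flag stands in for the hardware gate; none of these names exist in the kernel, and the real framework also handles locking and parent clocks.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a shared, gateable clock. */
struct model_clk {
	unsigned int enable_count;	/* outstanding enable calls */
	bool gate_open;			/* stands in for the hardware gate */
};

int model_clk_enable(struct model_clk *clk)
{
	if (clk->enable_count++ == 0)
		clk->gate_open = true;	/* first user turns the clock on */
	return 0;
}

void model_clk_disable(struct model_clk *clk)
{
	if (clk->enable_count == 0)
		return;			/* unbalanced call: ignored in this model */
	if (--clk->enable_count == 0)
		clk->gate_open = false;	/* last user turns the clock off */
}
```

With two drivers sharing the clock in this model, the gate stays open after the first disable and only closes after the second.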

void clk_bulk_disable(int num_clks, const struct clk_bulk_data *clks)¶: inform the system when the set of clks is no longer required.

Parameters

int num_clks: the number of clk_bulk_data
const struct clk_bulk_data *clks: the clk_bulk_data table of consumer

Description

Inform the system that a set of clks is no longer required by a driver and may be shut down.

May be called from atomic contexts.

Implementation detail: if the set of clks is shared between multiple drivers, clk_bulk_enable() calls must be balanced by the same number of clk_bulk_disable() calls for the clock source to be disabled.

unsigned long clk_get_rate(struct clk *clk)¶: obtain the current clock rate (in Hz) for a clock source. This is only valid once the clock source has been enabled.

Parameters

struct clk *clk: clock source

void clk_put(struct clk *clk)¶: “free” the clock source

Parameters

struct clk *clk: clock source

Note

drivers must ensure that all clk_enable calls made on this clock source are balanced by clk_disable calls prior to calling this function.

clk_put should not be called from within interrupt context.

void clk_bulk_put(int num_clks, struct clk_bulk_data *clks)¶: “free” the clock source

Parameters

int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: the clk_bulk_data table of consumer

Note

drivers must ensure that all clk_bulk_enable calls made on this clock source are balanced by clk_bulk_disable calls prior to calling this function.

clk_bulk_put should not be called from within interrupt context.

void clk_bulk_put_all(int num_clks, struct clk_bulk_data *clks)¶: “free” all the clock sources

Parameters

int num_clks: the number of clk_bulk_data
struct clk_bulk_data *clks: the clk_bulk_data table of consumer

Note

drivers must ensure that all clk_bulk_enable calls made on this clock source are balanced by clk_bulk_disable calls prior to calling this function.

clk_bulk_put_all should not be called from within interrupt context.

void devm_clk_put(struct device *dev, struct clk *clk)¶: “free” a managed clock source

Parameters

struct device *dev: device used to acquire the clock
struct clk *clk: clock source acquired with devm_clk_get()

Note

drivers must ensure that all clk_enable calls made on this clock source are balanced by clk_disable calls prior to calling this function.

devm_clk_put should not be called from within interrupt context.

long clk_round_rate(struct clk *clk, unsigned long rate)¶: adjust a rate to the exact rate a clock can provide

Parameters

struct clk *clk: clock source
unsigned long rate: desired clock rate in Hz

Description

This answers the question “if I were to pass rate to clk_set_rate(), what clock rate would I end up with?” without changing the hardware in any way. In other words:

rate = clk_round_rate(clk, r);

and:

clk_set_rate(clk, r);
rate = clk_get_rate(clk);

are equivalent except the former does not modify the clock hardware in any way.

Returns rounded clock rate in Hz, or negative errno.
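The equivalence stated above can be checked against a small userspace model. The sketch assumes a hypothetical clock made of a fixed 48 MHz parent and an integer divider of 1 to 16 that rounds the requested rate down; MODEL_PARENT_HZ, MODEL_MAX_DIV, struct model_clk and the model_* functions are all invented for this illustration. Real clock drivers choose their own rounding policy.

```c
#include <errno.h>

#define MODEL_PARENT_HZ 48000000UL	/* assumed fixed parent rate */
#define MODEL_MAX_DIV   16UL		/* assumed largest divider */

/* Hypothetical divider clock: output = MODEL_PARENT_HZ / div. */
struct model_clk {
	unsigned long div;
};

/* Smallest divider whose output does not exceed the request, or 0. */
static unsigned long model_pick_div(unsigned long rate)
{
	unsigned long div;

	if (rate == 0)
		return 0;

	div = MODEL_PARENT_HZ / rate;
	if (MODEL_PARENT_HZ % rate)
		div++;			/* round the divider up */
	if (div > MODEL_MAX_DIV)
		return 0;		/* the clock cannot run that slowly */
	return div;
}

/* Answers "what would I get?" without touching clk->div. */
long model_clk_round_rate(const struct model_clk *clk, unsigned long rate)
{
	unsigned long div = model_pick_div(rate);

	(void)clk;			/* state is deliberately not modified */
	return div ? (long)(MODEL_PARENT_HZ / div) : -EINVAL;
}

int model_clk_set_rate(struct model_clk *clk, unsigned long rate)
{
	unsigned long div = model_pick_div(rate);

	if (!div)
		return -EINVAL;
	clk->div = div;			/* the only place state changes */
	return 0;
}

unsigned long model_clk_get_rate(const struct model_clk *clk)
{
	return MODEL_PARENT_HZ / clk->div;
}
```

In this model a request for 10 MHz selects divider 5, so both the round-only path and the set-then-get path report 9.6 MHz, and only the latter changes the divider.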

int clk_set_rate(struct clk *clk, unsigned long rate)¶: set the clock rate for a clock source

Parameters

struct clk *clk: clock source
unsigned long rate: desired clock rate in Hz

Description

Updating the rate starts at the top-most affected clock and then walks the tree down to the bottom-most clock that needs updating.

Returns success (0) or negative errno.

int clk_set_rate_exclusive(struct clk *clk, unsigned long rate)¶: set the clock rate and claim exclusivity over clock source

Parameters

struct clk *clk: clock source
unsigned long rate: desired clock rate in Hz

Description

This helper function allows drivers to atomically set the rate of a producer and claim exclusivity over the rate control of the producer.

It is essentially a combination of clk_set_rate() and clk_rate_exclusive_get(). Caller must balance this call with a call to clk_rate_exclusive_put().

Returns success (0) or negative errno.

bool clk_has_parent(const struct clk *clk, const struct clk *parent)¶: check if a clock is a possible parent for another

Parameters

const struct clk *clk: clock source
const struct clk *parent: parent clock source

Description

This function can be used in drivers that need to check that a clock can be the parent of another without actually changing the parent.

Returns true if parent is a possible parent for clk, false otherwise.

int clk_set_rate_range(struct clk *clk, unsigned long min, unsigned long max)¶: set a rate range for a clock source

Parameters

struct clk *clk: clock source
unsigned long min: desired minimum clock rate in Hz, inclusive
unsigned long max: desired maximum clock rate in Hz, inclusive

Description

Returns success (0) or negative errno.

int clk_set_min_rate(struct clk *clk, unsigned long rate)¶: set a minimum clock rate for a clock source

Parameters

struct clk *clk: clock source
unsigned long rate: desired minimum clock rate in Hz, inclusive

Description

Returns success (0) or negative errno.

int clk_set_max_rate(struct clk *clk, unsigned long rate)¶: set a maximum clock rate for a clock source

Parameters

struct clk *clk: clock source
unsigned long rate: desired maximum clock rate in Hz, inclusive

Description

Returns success (0) or negative errno.

int clk_set_parent(struct clk *clk, struct clk *parent)¶: set the parent clock source for this clock

Parameters

struct clk *clk: clock source
struct clk *parent: parent clock source

Description

Returns success (0) or negative errno.

struct clk *clk_get_parent(struct clk *clk)¶: get the parent clock source for this clock

Parameters

struct clk *clk: clock source

Description

Returns struct clk corresponding to parent clock source, or valid IS_ERR() condition containing errno.

struct clk *clk_get_sys(const char *dev_id, const char *con_id)¶: get a clock based upon the device name

Parameters

const char *dev_id: device name
const char *con_id: connection ID

Description

Returns a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev_id and con_id to determine the clock consumer, and thereby the clock producer. In contrast to clk_get() this function takes the device name instead of the device itself for identification.

Drivers must assume that the clock source is not enabled.

clk_get_sys should not be called from within interrupt context.

int clk_save_context(void)¶: save clock context for poweroff

Parameters

void: no arguments

Description

Saves the context of the clock registers for power states in which the contents of the registers will be lost. Occurs deep within the suspend code so locking is not necessary.

void clk_restore_context(void)¶: restore clock context after poweroff

Parameters

void: no arguments

Description

This occurs with all clocks enabled. Occurs deep within the resume code so locking is not necessary.

int clk_drop_range(struct clk *clk)¶: Reset any range set on that clock

Parameters

struct clk *clk: clock source

Description

Returns success (0) or negative errno.

struct clk *clk_get_optional(struct device *dev, const char *id)¶: lookup and obtain a reference to an optional clock producer.

Parameters

struct device *dev: device for clock “consumer”
const char *id: clock consumer ID

Description

Behaves the same as clk_get() except where there is no clock producer. In this case, instead of returning -ENOENT, the function returns NULL.
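The difference between the mandatory and the optional lookup can be sketched in userspace. The code below is a model under stated assumptions: model_err_ptr(), model_ptr_err() and model_is_err() imitate the kernel's error-pointer convention (errno values encoded in the top 4095 pointer values), and model_clk_get() knows a single hypothetical clock called "uart". None of these names are kernel API.

```c
#include <errno.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace stand-in (hypothetical): not the kernel's struct clk. */
struct clk {
	const char *name;
};

/* Encode a negative errno in a pointer value. */
void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno from an error pointer. */
long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True for the top MODEL_MAX_ERRNO pointer values; NULL is not an error. */
int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static struct clk model_uart_clk = { "uart" };

/* Mandatory lookup: a missing producer is reported as -ENOENT. */
struct clk *model_clk_get(const char *id)
{
	if (strcmp(id, "uart") == 0)
		return &model_uart_clk;
	return model_err_ptr(-ENOENT);
}

/* Optional lookup: only -ENOENT is turned into NULL. */
struct clk *model_clk_get_optional(const char *id)
{
	struct clk *clk = model_clk_get(id);

	if (model_is_err(clk) && model_ptr_err(clk) == -ENOENT)
		return NULL;
	return clk;
}
```

A caller of the optional variant in this model still has to check for error pointers, because failures other than a missing producer are passed through unchanged.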

Synchronization Primitives¶

Read-Copy Update (RCU)¶

bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned long oldstate2)¶: Are two old-state values identical?

Parameters

unsigned long oldstate1: First old-state value.
unsigned long oldstate2: Second old-state value.

Description

The two old-state values must have been obtained from either get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or get_completed_synchronize_rcu(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.

bool rcu_trace_implies_rcu_gp(void)¶: does an RCU Tasks Trace grace period imply an RCU grace period?

Parameters

void: no arguments

Description

As an accident of implementation, an RCU Tasks Trace grace period also acts as an RCU grace period. However, this could change at any time. Code relying on this accident must call this function to verify that this accident is still happening.

You have been warned!

cond_resched_tasks_rcu_qs¶

cond_resched_tasks_rcu_qs()

Report potential quiescent states to RCU

Description

This macro resembles cond_resched(), except that it is defined to report potential quiescent states to RCU-tasks even if the cond_resched() machinery were to be shut off, as some advocate for PREEMPTION kernels.

rcu_softirq_qs_periodic¶

rcu_softirq_qs_periodic(old_ts)

Report RCU and RCU-Tasks quiescent states

Parameters

old_ts: jiffies at start of processing.

Description

This helper is for long-running softirq handlers, such as NAPI threads in networking. The caller should initialize the variable passed in as old_ts at the beginning of the softirq handler. When invoked frequently, this macro will invoke rcu_softirq_qs() every 100 milliseconds thereafter, which will provide both RCU and RCU-Tasks quiescent states. Note that this macro modifies its old_ts argument.

Because regions of code that have disabled softirq act as RCU read-side critical sections, this macro should be invoked with softirq (and preemption) enabled.

The macro is not needed when CONFIG_PREEMPT_RT is defined. RT kernels would have more chance to invoke schedule() calls and provide necessary quiescent states. By contrast, calling only cond_resched() won’t achieve the same effect because cond_resched() does not provide RCU-Tasks quiescent states.
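The time-gated behaviour described above can be modelled in userspace. This sketch assumes the check is a plain elapsed-time comparison against a tick counter; model_jiffies, MODEL_HZ, model_qs_reports and the model_* functions are hypothetical stand-ins, the function takes a pointer only because C functions cannot modify an argument the way the macro does, and the preemption handling of the real macro is left out.

```c
#define MODEL_HZ 1000UL		/* assumed tick rate: 1000 ticks per second */

unsigned long model_jiffies;	/* stand-in for the kernel tick counter */
unsigned int model_qs_reports;	/* counts modelled quiescent-state reports */

/* Wrap-safe "a is after b" comparison on tick counts. */
static int model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Report a quiescent state at most once per 100 ms window. */
void model_softirq_qs_periodic(unsigned long *old_ts)
{
	if (model_time_after(model_jiffies, *old_ts + MODEL_HZ / 10)) {
		model_qs_reports++;		/* where rcu_softirq_qs() would run */
		*old_ts = model_jiffies;	/* restart the 100 ms window */
	}
}
```

A long-running handler in this model would record the start time once and then call model_softirq_qs_periodic() on every loop iteration; most calls do nothing, and only the first call after each 100 ms window reports.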

RCU_LOCKDEP_WARN¶

RCU_LOCKDEP_WARN(c, s)

emit lockdep splat if specified condition is met

Parameters

c: condition to check
s: informative message

Description

This checks debug_lockdep_rcu_enabled() before checking (c) to prevent early boot splats due to lockdep not yet being initialized, and rechecks it after checking (c) to prevent false-positive splats due to races with lockdep being disabled. See commit 3066820034b5dd (“rcu: Reject RCU_LOCKDEP_WARN() false positives”) for more detail.

lockdep_assert_in_rcu_read_lock¶

lockdep_assert_in_rcu_read_lock()

WARN if not protected by rcu_read_lock()

Description

Splats if lockdep is enabled and there is no rcu_read_lock() in effect.

lockdep_assert_in_rcu_read_lock_bh¶

lockdep_assert_in_rcu_read_lock_bh()

WARN if not protected by rcu_read_lock_bh()

Description

Splats if lockdep is enabled and there is no rcu_read_lock_bh() in effect. Note that local_bh_disable() and friends do not suffice here; instead, an actual rcu_read_lock_bh() is required.

lockdep_assert_in_rcu_read_lock_sched¶

lockdep_assert_in_rcu_read_lock_sched()

WARN if not protected by rcu_read_lock_sched()

Description

Splats if lockdep is enabled and there is no rcu_read_lock_sched() in effect. Note that preempt_disable() and friends do not suffice here; instead, an actual rcu_read_lock_sched() is required.

lockdep_assert_in_rcu_reader¶

lockdep_assert_in_rcu_reader()

WARN if not within some type of RCU reader

Description

Splats if lockdep is enabled and there is no RCU reader of any type in effect. Note that regions of code protected by things like preempt_disable(), local_bh_disable(), and local_irq_disable() all qualify as RCU readers.

Note that this will never trigger in PREEMPT_NONE or PREEMPT_VOLUNTARY kernels that are not also built with PREEMPT_COUNT. But if you have lockdep enabled, you might as well also enable PREEMPT_COUNT.

unrcu_pointer¶

unrcu_pointer(p)

mark a pointer as not being RCU protected

Parameters

p: pointer needing to lose its __rcu property

Description

Converts p from an __rcu pointer to a __kernel pointer. This allows an __rcu pointer to be used with xchg() and friends.

RCU_INITIALIZER¶

RCU_INITIALIZER(v)

statically initialize an RCU-protected global variable

Parameters

v: The value to statically initialize with.

rcu_assign_pointer¶

rcu_assign_pointer(p, v)

assign to RCU-protected pointer

Parameters

p: pointer to assign to
v: value to assign (publish)

Description

Assigns the specified value to the specified RCU-protected pointer, ensuring that any concurrent RCU readers will see any prior initialization.

Inserts memory barriers on architectures that require them (which is most of them), and also prevents the compiler from reordering the code that initializes the structure after the pointer assignment. More importantly, this call documents which pointers will be dereferenced by RCU read-side code.

In some special cases, you may use RCU_INIT_POINTER() instead of rcu_assign_pointer(). RCU_INIT_POINTER() is a bit faster due to the fact that it does not constrain either the CPU or the compiler. That said, using RCU_INIT_POINTER() when you should have used rcu_assign_pointer() is a very bad thing that results in impossible-to-diagnose memory corruption. So please be careful. See the RCU_INIT_POINTER() comment header for details.

Note that rcu_assign_pointer() evaluates each of its arguments only once, appearances notwithstanding. One of the “extra” evaluations is in typeof() and the other visible only to sparse (__CHECKER__), neither of which actually execute the argument. As with most cpp macros, this execute-arguments-only-once property is important, so please be careful when making changes to rcu_assign_pointer() and the other macros that it invokes.
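The publish ordering described for rcu_assign_pointer() has a rough userspace analogue in C11 atomics. The sketch below is a model only: struct config, model_gp, model_publish() and model_read() are hypothetical names, the release store stands in for the publication, and the acquire load is a deliberately stronger substitute for the dependency ordering that RCU readers rely on. It does not model grace periods or reclamation.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical structure that readers reach through a shared pointer. */
struct config {
	int threshold;
	int mode;
};

/* Stand-in for an RCU-protected global pointer; starts out NULL. */
static _Atomic(struct config *) model_gp;

/*
 * Initialize first, publish last. The release store keeps the two
 * field writes from being reordered after the pointer assignment.
 */
void model_publish(struct config *cfg, int threshold, int mode)
{
	cfg->threshold = threshold;
	cfg->mode = mode;
	atomic_store_explicit(&model_gp, cfg, memory_order_release);
}

/*
 * Fetch the pointer for dereferencing. A reader that sees the new
 * pointer is guaranteed to see the initialized fields as well.
 */
const struct config *model_read(void)
{
	return atomic_load_explicit(&model_gp, memory_order_acquire);
}
```

A reader in this model either sees NULL or a fully initialized structure, never a pointer to a half-written one.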

rcu_replace_pointer¶

rcu_replace_pointer(rcu_ptr, ptr, c)

replace an RCU pointer, returning its old value

Parameters

rcu_ptr: RCU pointer, whose old value is returned
ptr: regular pointer
c: the lockdep conditions under which the dereference will take place

Description

Perform a replacement, where rcu_ptr is an RCU-annotated pointer and c is the lockdep argument that is passed to the rcu_dereference_protected() call used to read that pointer. The old value of rcu_ptr is returned, and rcu_ptr is set to ptr.
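The read-old-then-publish-new sequence can be sketched with C11 atomics in userspace. This is a model under assumptions: struct item, model_slot and model_replace_pointer() are hypothetical, the caller is assumed to hold whatever update-side lock the lockdep condition would describe (which is why a relaxed load of the old value is enough here), and no lockdep checking is modelled.

```c
#include <stdatomic.h>
#include <stddef.h>

struct item {
	int value;
};

/* Stand-in for an RCU-annotated pointer; starts out NULL. */
static _Atomic(struct item *) model_slot;

/*
 * Read the old value under the (assumed) update-side lock, publish
 * the new value with release ordering, and hand the old value back
 * so that the caller can dispose of it after a grace period.
 */
struct item *model_replace_pointer(struct item *new_item)
{
	struct item *old;

	old = atomic_load_explicit(&model_slot, memory_order_relaxed);
	atomic_store_explicit(&model_slot, new_item, memory_order_release);
	return old;
}
```

In real RCU code the returned old pointer must not be freed immediately; readers that fetched it earlier may still be using it.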

rcu_access_pointer¶

rcu_access_pointer(p)

fetch RCU pointer with no dereferencing

Parameters

p: The pointer to read

Description

Return the value of the specified RCU-protected pointer, but omit the lockdep checks for being in an RCU read-side critical section. This is useful when the value of this pointer is accessed, but the pointer is not dereferenced, for example, when testing an RCU-protected pointer against NULL. Although rcu_access_pointer() may also be used in cases where update-side locks prevent the value of the pointer from changing, you should instead use rcu_dereference_protected() for this use case. Within an RCU read-side critical section, there is little reason to use rcu_access_pointer().

It is usually best to test the rcu_access_pointer() return value directly in order to avoid accidental dereferences being introduced by later inattentive changes. In other words, assigning the rcu_access_pointer() return value to a local variable results in an accident waiting to happen.

It is also permissible to use rcu_access_pointer() when read-side access to the pointer was removed at least one grace period ago, as is the case in the context of the RCU callback that is freeing up the data, or after a synchronize_rcu() returns. This can be useful when tearing down multi-linked structures after a grace period has elapsed. However, rcu_dereference_protected() is normally preferred for this use case.

rcu_dereference_check¶

rcu_dereference_check(p, c)

rcu_dereference with debug checking

Parameters

p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place

Description

Do an rcu_dereference(), but check that the conditions under which the dereference will take place are correct. Typically the conditions indicate the various locking conditions that should be held at that point. The check should return true if the conditions are satisfied. An implicit check for being in an RCU read-side critical section (rcu_read_lock()) is included.

For example:

bar = rcu_dereference_check(foo->bar, lockdep_is_held(&foo->lock));

could be used to indicate to lockdep that foo->bar may only be dereferenced if either rcu_read_lock() is held, or that the lock required to replace the bar struct at foo->bar is held.

Note that the list of conditions may also include indications of when a lock need not be held, for example during initialisation or destruction of the target struct:

bar = rcu_dereference_check(foo->bar, lockdep_is_held(&foo->lock) ||
                            atomic_read(&foo->usage) == 0);

Inserts memory barriers on architectures that require them (currently only the Alpha), prevents the compiler from refetching (and from merging fetches), and, more importantly, documents exactly which pointers are protected by RCU and checks that the pointer is annotated as __rcu.

rcu_dereference_bh_check¶

rcu_dereference_bh_check(p, c)

rcu_dereference_bh with debug checking

Parameters

p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place

Description

This is the RCU-bh counterpart to rcu_dereference_check(). However, please note that starting in v5.0 kernels, vanilla RCU grace periods wait for local_bh_disable() regions of code in addition to regions of code demarked by rcu_read_lock() and rcu_read_unlock(). This means that synchronize_rcu(), call_rcu(), and friends all take not only rcu_read_lock() but also rcu_read_lock_bh() into account.

rcu_dereference_sched_check¶

rcu_dereference_sched_check(p, c)

rcu_dereference_sched with debug checking

Parameters

p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place

Description

This is the RCU-sched counterpart to rcu_dereference_check(). However, please note that starting in v5.0 kernels, vanilla RCU grace periods wait for preempt_disable() regions of code in addition to regions of code demarked by rcu_read_lock() and rcu_read_unlock(). This means that synchronize_rcu(), call_rcu(), and friends all take not only rcu_read_lock() but also rcu_read_lock_sched() into account.

rcu_dereference_all_check¶

rcu_dereference_all_check(p, c)

rcu_dereference_all with debug checking

Parameters

p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place

Description

This is similar to rcu_dereference_check(), but allows protection by all forms of vanilla RCU readers, including preemption-disabled, bh-disabled, and interrupt-disabled regions of code. Note that “vanilla RCU” excludes SRCU and the various Tasks RCU flavors. Please note that this macro should not be backported to any Linux-kernel version preceding v5.0 due to changes in synchronize_rcu() semantics prior to that version.

rcu_dereference_protected¶

rcu_dereference_protected(p, c)

fetch RCU pointer when updates prevented

Parameters

p: The pointer to read, prior to dereferencing
c: The conditions under which the dereference will take place

Description

Return the value of the specified RCU-protected pointer, but omit the READ_ONCE(). This is useful in cases where update-side locks prevent the value of the pointer from changing. Please note that this primitive does not prevent the compiler from repeating this reference or combining it with other references, so it should not be used without protection of appropriate locks.

This function is only for update-side use. Using this function when protected only by rcu_read_lock() will result in infrequent but very ugly failures.

rcu_dereference¶

rcu_dereference(p)

fetch RCU-protected pointer for dereferencing

Parameters

p: The pointer to read, prior to dereferencing

Description

This is a simple wrapper around rcu_dereference_check().

rcu_dereference_bh¶

rcu_dereference_bh(p)

fetch an RCU-bh-protected pointer for dereferencing

Parameters

p: The pointer to read, prior to dereferencing

Description

Makes rcu_dereference_check() do the dirty work.

rcu_dereference_sched¶

rcu_dereference_sched(p)

fetch RCU-sched-protected pointer for dereferencing

Parameters

p: The pointer to read, prior to dereferencing

Description

Makes rcu_dereference_check() do the dirty work.

rcu_dereference_all¶

rcu_dereference_all(p)

fetch RCU-all-protected pointer for dereferencing

Parameters

p: The pointer to read, prior to dereferencing

Description

Makes rcu_dereference_check() do the dirty work.

rcu_pointer_handoff¶

rcu_pointer_handoff(p)

Hand off a pointer from RCU to other mechanism

Parameters

p: The pointer to hand off

Description

This is simply an identity function, but it documents where a pointer is handed off from RCU to some other synchronization mechanism, for example, reference counting or locking. In C11, it would map to kill_dependency(). It could be used as follows:

rcu_read_lock();
p = rcu_dereference(gp);
long_lived = is_long_lived(p);
if (long_lived) {
        if (!atomic_inc_not_zero(p->refcnt))
                long_lived = false;
        else
                p = rcu_pointer_handoff(p);
}
rcu_read_unlock();
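The hand-off in the snippet above depends on an increment that refuses to revive an object whose reference count has already dropped to zero. A userspace sketch of that step with C11 atomics follows; struct obj and model_inc_not_zero() are hypothetical names for this illustration and only approximate what the kernel primitive does.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_int refcnt;
};

/* Take a reference unless the count is already zero. */
bool model_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;	/* the object is already on its way out */
}
```

Once the increment succeeds, the reference count rather than RCU keeps the object alive, which is the point at which the pointer is considered handed off.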

void rcu_read_lock(void)¶: mark the beginning of an RCU read-side critical section

Parameters

void: no arguments

Description

When synchronize_rcu() is invoked on one CPU while other CPUs are within RCU read-side critical sections, then the synchronize_rcu() is guaranteed to block until after all the other CPUs exit their critical sections. Similarly, if call_rcu() is invoked on one CPU while other CPUs are within RCU read-side critical sections, invocation of the corresponding RCU callback is deferred until after all the other CPUs exit their critical sections.

Both synchronize_rcu() and call_rcu() also wait for regions of code with preemption disabled, including regions of code with interrupts or softirqs disabled.

Note, however, that RCU callbacks are permitted to run concurrently with new RCU read-side critical sections. One way that this can happen is via the following sequence of events: (1) CPU 0 enters an RCU read-side critical section, (2) CPU 1 invokes call_rcu() to register an RCU callback, (3) CPU 0 exits the RCU read-side critical section, (4) CPU 2 enters an RCU read-side critical section, (5) the RCU callback is invoked. This is legal, because the RCU read-side critical section that was running concurrently with the call_rcu() (and which therefore might be referencing something that the corresponding RCU callback would free up) has completed before the corresponding RCU callback is invoked.

RCU read-side critical sections may be nested. Any deferred actions will be deferred until the outermost RCU read-side critical section completes.

You can avoid reading and understanding the next paragraph by following this rule: don’t put anything in an rcu_read_lock() RCU read-side critical section that would block in a !PREEMPTION kernel. But if you want the full story, read on!

In non-preemptible RCU implementations (pure TREE_RCU and TINY_RCU), it is illegal to block while in an RCU read-side critical section. In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION kernel builds, RCU read-side critical sections may be preempted, but explicit blocking is illegal. Finally, in preemptible RCU implementations in real-time (with -rt patchset) kernel builds, RCU read-side critical sections may be preempted and they may also block, but only when acquiring spinlocks that are subject to priority inheritance.

void rcu_read_unlock(void)¶: marks the end of an RCU read-side critical section.

Parameters

void: no arguments

Description

In almost all situations, rcu_read_unlock() is immune from deadlock. This deadlock immunity also extends to the scheduler’s runqueue and priority-inheritance spinlocks, courtesy of the quiescent-state deferral that is carried out when rcu_read_unlock() is invoked with interrupts disabled.

See rcu_read_lock() for more information.

voidrcu_read_lock_bh(void)¶: mark the beginning of an RCU-bh critical section

Parameters

void: no arguments

Description

This is equivalent to rcu_read_lock(), but also disables softirqs. Note that anything else that disables softirqs can also serve as an RCU read-side critical section. However, please note that this equivalence applies only to v5.0 and later. Before v5.0, rcu_read_lock() and rcu_read_lock_bh() were unrelated.

Note that rcu_read_lock_bh() and the matching rcu_read_unlock_bh() must occur in the same context. For example, it is illegal to invoke rcu_read_unlock_bh() from one task if the matching rcu_read_lock_bh() was invoked from some other task.

voidrcu_read_unlock_bh(void)¶: marks the end of a softirq-only RCU critical section

Parameters

void: no arguments

Description

See rcu_read_lock_bh() for more information.

voidrcu_read_lock_sched(void)¶: mark the beginning of an RCU-sched critical section

Parameters

void: no arguments

Description

This is equivalent to rcu_read_lock(), but also disables preemption. Read-side critical sections can also be introduced by anything else that disables preemption, including local_irq_disable() and friends. However, please note that the equivalence to rcu_read_lock() applies only to v5.0 and later. Before v5.0, rcu_read_lock() and rcu_read_lock_sched() were unrelated.

Note that rcu_read_lock_sched() and the matching rcu_read_unlock_sched() must occur in the same context. For example, it is illegal to invoke rcu_read_unlock_sched() from process context if the matching rcu_read_lock_sched() was invoked from an NMI handler.

voidrcu_read_unlock_sched(void)¶: marks the end of an RCU-sched critical section

Parameters

void: no arguments

Description

See rcu_read_lock_sched() for more information.

RCU_INIT_POINTER¶

RCU_INIT_POINTER(p,v)

initialize an RCU protected pointer

Parameters

p: The pointer to be initialized.
v: The value to initialize the pointer to.

Description

Initialize an RCU-protected pointer in special cases where readers do not need ordering constraints on the CPU or the compiler. These special cases are:

1. This use of RCU_INIT_POINTER() is NULLing out the pointer, or
2. The caller has taken whatever steps are required to prevent RCU readers from concurrently accessing this pointer, or
3. The referenced data structure has already been exposed to readers either at compile time or via rcu_assign_pointer(), and
   a. You have not made any reader-visible changes to this structure since then, or
   b. It is OK for readers accessing this structure from its new location to see the old state of the structure. (For example, the changes were to statistical counters or to other state where exact synchronization is not required.)

Failure to follow these rules governing use of RCU_INIT_POINTER() will result in impossible-to-diagnose memory corruption. That is, the structures will look OK in crash dumps, but any concurrent RCU readers might see pre-initialized values of the referenced data structure. So please be very careful how you use RCU_INIT_POINTER()!!!

If you are creating an RCU-protected linked structure that is accessed by a single external-to-structure RCU-protected pointer, then you may use RCU_INIT_POINTER() to initialize the internal RCU-protected pointers, but you must use rcu_assign_pointer() to initialize the external-to-structure pointer after you have completely initialized the reader-accessible portions of the linked structure.

Note that unlike rcu_assign_pointer(), RCU_INIT_POINTER() provides no ordering guarantees for either the CPU or the compiler.
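The difference between the two initializers can be sketched in userspace with C11 atomics. This is a model under stated assumptions, not the kernel implementation: the `MODEL_` macros and `model_` names are hypothetical, a relaxed store stands in for RCU_INIT_POINTER(), a release store stands in for rcu_assign_pointer(), and an acquire load is used as a conservative stand-in for the reader-side dereference. The internal pointers of a private structure are set without ordering, and only the single external pointer is published with ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace model only: these macros are NOT the kernel definitions. */
#define MODEL_INIT_POINTER(p, v) \
    atomic_store_explicit(&(p), (v), memory_order_relaxed)  /* no ordering */
#define MODEL_ASSIGN_POINTER(p, v) \
    atomic_store_explicit(&(p), (v), memory_order_release)  /* publish */
#define MODEL_DEREFERENCE(p) \
    atomic_load_explicit(&(p), memory_order_acquire)

struct model_node {
    int value;
    struct model_node *_Atomic next;  /* internal protected pointer */
};

static struct model_node *_Atomic model_head;  /* external-to-structure pointer */

/* Build a two-node list privately, then publish it with one release store. */
static void model_publish_list(struct model_node *a, struct model_node *b)
{
    a->value = 1;
    b->value = 2;
    MODEL_INIT_POINTER(b->next, NULL);    /* NULLing out: no ordering needed */
    MODEL_INIT_POINTER(a->next, b);       /* readers cannot reach a yet */
    MODEL_ASSIGN_POINTER(model_head, a);  /* orders all earlier stores first */
}

/* Reader side: sum the values reachable from the external pointer. */
static int model_sum_list(void)
{
    int sum = 0;
    struct model_node *n;

    for (n = MODEL_DEREFERENCE(model_head); n; n = MODEL_DEREFERENCE(n->next))
        sum += n->value;
    return sum;
}

/* Publish a list and return what a reader then observes. */
static int model_list_demo(void)
{
    static struct model_node a, b;

    model_publish_list(&a, &b);
    return model_sum_list();
}
```

The design point is that the release store on `model_head` is the only store readers can race with, so it alone needs ordering. The stores to `next` happen while the nodes are still unreachable.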

RCU_POINTER_INITIALIZER¶

RCU_POINTER_INITIALIZER(p,v)

statically initialize an RCU protected pointer

Parameters

p: The pointer to be initialized.
v: The value to initialize the pointer to.

Description

GCC-style initialization for an RCU-protected pointer in a structure field.

kfree_rcu¶

kfree_rcu(ptr,rhf)

kfree an object after a grace period.

Parameters

ptr: pointer to kfree for double-argument invocations.
rhf: the name of the struct rcu_head within the type of ptr.

Description

Many RCU callback functions just call kfree() on the base structure. These functions are trivial, but their size adds up, and furthermore when they are used in a kernel module, that module must invoke the high-latency rcu_barrier() function at module-unload time.

The kfree_rcu() function handles this issue. In order to have a universal callback function handling different offsets of rcu_head, the callback needs to determine the starting address of the freed object, which can be a large kmalloc or vmalloc allocation. To allow simply aligning the pointer down to a page boundary for those, only offsets up to 4095 bytes can be accommodated. If the offset is larger than 4095 bytes, a compile-time error will be generated in kvfree_rcu_arg_2(). If this error is triggered, you can either fall back to use of call_rcu() or rearrange the structure to position the rcu_head structure into the first 4096 bytes.

The object to be freed can be allocated either by kmalloc() or kmem_cache_alloc().

Note that the allowable offset might decrease in the future.

The BUILD_BUG_ON check must not involve any function calls, hence the checks are done in macros here.
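The offset limit and the align-down step can be illustrated in userspace. The sketch below is not the kernel's kvfree_rcu_arg_2() macro: the `MODEL_`/`model_` names, the stand-in head structure, and the 4096-byte page size are all assumptions made for illustration. It shows a compile-time offset check done in a macro, and the recovery of a page-aligned object's start from the address of its embedded head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for this sketch */

struct model_rcu_head {  /* stand-in for struct rcu_head */
    void *next;
    void (*func)(struct model_rcu_head *);
};

struct model_obj {
    char payload[128];
    struct model_rcu_head rh;
};

/* Compile-time check, done in a macro so that no function call is involved. */
#define MODEL_CHECK_RCU_OFFSET(type, member) \
    _Static_assert(offsetof(type, member) < MODEL_PAGE_SIZE, \
                   "rcu_head must sit within the first 4096 bytes")

MODEL_CHECK_RCU_OFFSET(struct model_obj, rh);

/* Round an address down to the start of its (assumed) page. */
static uintptr_t model_page_align_down(uintptr_t addr)
{
    return addr & ~((uintptr_t)MODEL_PAGE_SIZE - 1);
}

/*
 * For a page-aligned allocation, recover its start from the address of an
 * embedded head whose offset is below one page.
 */
static uintptr_t model_object_start(uintptr_t head_addr)
{
    return model_page_align_down(head_addr);
}
```

Moving `rh` behind a payload of 4096 bytes or more would make the `_Static_assert` fail at build time, which mirrors the compile-time error described above.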

kfree_rcu_mightsleep¶

kfree_rcu_mightsleep(ptr)

kfree an object after a grace period.

Parameters

ptr: pointer to kfree for single-argument invocations.

Description

For the head-less variant, only one argument is passed: a pointer to the object that is to be freed after a grace period. The invocation therefore takes the form

kfree_rcu_mightsleep(ptr);

where ptr is the pointer to be freed by kvfree().

Please note that the head-less way of freeing may be used only from a context that satisfies the might_sleep() annotation. Otherwise, please switch to the two-argument kfree_rcu() and embed the rcu_head structure within the type of ptr.

voidrcu_head_init(structrcu_head*rhp)¶: Initialize rcu_head for rcu_head_after_call_rcu()

Parameters

structrcu_head*rhp: The rcu_head structure to initialize.

Description

If you intend to invoke rcu_head_after_call_rcu() to test whether a given rcu_head structure has already been passed to call_rcu(), then you must also invoke this rcu_head_init() function on it just after allocating that structure. Calls to this function must not race with calls to call_rcu(), rcu_head_after_call_rcu(), or callback invocation.

boolrcu_head_after_call_rcu(structrcu_head*rhp,rcu_callback_tf)¶: Has this rcu_head been passed to call_rcu()?

Parameters

structrcu_head*rhp: The rcu_head structure to test.
rcu_callback_tf: The function passed to call_rcu() along with rhp.

Description

Returns true if rhp has been passed to call_rcu() with f, and false if rhp has not been passed to call_rcu() since rcu_head_init() was invoked on it. Emits a warning in any other case, including the case where rhp has already been invoked after a grace period. Calls to this function must not race with callback invocation. One way to avoid such races is to enclose the call to rcu_head_after_call_rcu() in an RCU read-side critical section that includes a read-side fetch of the pointer to the structure containing rhp.

voidrcu_softirq_qs(void)¶: Provide a set of RCU quiescent states in softirq processing

Parameters

void: no arguments

Description

Mark a quiescent state for RCU, Tasks RCU, and Tasks Trace RCU. This is a special-purpose function to be used in the softirq infrastructure and perhaps the occasional long-running softirq handler.

Note that from RCU’s viewpoint, a call to rcu_softirq_qs() is equivalent to momentarily completely enabling preemption. For example, given this code:

local_bh_disable();
do_something();
rcu_softirq_qs();  // A
do_something_else();
local_bh_enable();  // B

A call to synchronize_rcu() that began concurrently with the call to do_something() would be guaranteed to wait only until execution reached statement A. Without that rcu_softirq_qs(), that same synchronize_rcu() would instead be guaranteed to wait until execution reached statement B.

boolrcu_watching_snap_stopped_since(structrcu_data*rdp,intsnap)¶: Has RCU stopped watching a given CPU since the specified snap?

Parameters

structrcu_data*rdp: The rcu_data corresponding to the CPU for which to check EQS.
intsnap: rcu_watching snapshot taken when the CPU wasn’t in an EQS.

Description

Returns true if the CPU corresponding to rdp has spent some time in an extended quiescent state since snap. Note that this doesn’t check whether it is still in an EQS, just that it went through one since snap.

This is meant to be used in a loop waiting for a CPU to go through an EQS.

intrcu_is_cpu_rrupt_from_idle(void)¶: see if ‘interrupted’ from idle

Parameters

void: no arguments

Description

If the current CPU is idle and running at a first-level (not nested) interrupt, or directly from idle, return true.

The caller must have at least disabled IRQs.

voidrcu_irq_exit_check_preempt(void)¶: Validate that scheduling is possible

Parameters

void: no arguments

void__rcu_irq_enter_check_tick(void)¶: Enable scheduler tick on CPU if RCU needs it.

Parameters

void: no arguments

Description

The scheduler tick is not normally enabled when CPUs enter the kernel from nohz_full userspace execution. After all, nohz_full userspace execution is an RCU quiescent state and the time executing in the kernel is quite short. Except of course when it isn’t. And it is not hard to cause a large system to spend tens of seconds or even minutes looping in the kernel, which can cause a number of problems, including RCU CPU stall warnings.

Therefore, if a nohz_full CPU fails to report a quiescent state in a timely manner, the RCU grace-period kthread sets that CPU’s ->rcu_urgent_qs flag with the expectation that the next interrupt or exception will invoke this function, which will turn on the scheduler tick, which will enable RCU to detect that CPU’s quiescent states, for example, due to cond_resched() calls in CONFIG_PREEMPT=n kernels. The tick will be disabled once a quiescent state is reported for this CPU.

Of course, in carefully tuned systems, there might never be an interrupt or exception. In that case, the RCU grace-period kthread will eventually cause one to happen. However, in less carefully controlled environments, this function allows RCU to get what it needs without creating otherwise useless interruptions.

notraceboolrcu_is_watching(void)¶: RCU read-side critical sections permitted on current CPU?

Parameters

void: no arguments

Description

Return true if RCU is watching the running CPU and false otherwise. A true return means that this CPU can safely enter RCU read-side critical sections.

Although calls to rcu_is_watching() from most parts of the kernel will return true, there are important exceptions. For example, if the current CPU is deep within its idle loop, in kernel entry/exit code, or offline, rcu_is_watching() will return false.

This function is marked notrace because it can be called by the internal functions of ftrace, and marking it notrace avoids unnecessary recursion.

voidrcu_set_gpwrap_lag(unsignedlonglag_gps)¶: Set RCU GP sequence overflow lag value.

Parameters

unsignedlonglag_gps: Set overflow lag to this many grace periods’ worth of counters, which is used by rcutorture to quickly force a gpwrap situation. lag_gps = 0 means we reset it back to the boot-time value.

voidcall_rcu_hurry(structrcu_head*head,rcu_callback_tfunc)¶: Queue RCU callback for invocation after grace period, and flush all lazy callbacks (including the new one) to the main ->cblist while doing so.

Parameters

structrcu_head*head: structure to be used for queueing the RCU updates.
rcu_callback_tfunc: actual callback function to be invoked after the grace period

Description

The callback function will be invoked some time after a full grace period elapses, in other words after all pre-existing RCU read-side critical sections have completed.

Use this API instead of call_rcu() if you don’t want the callback to be delayed for very long periods of time, which can happen on systems without memory pressure and on systems which are lightly loaded or mostly idle. This function will cause callbacks to be invoked sooner rather than later at the expense of extra power. Other than that, this function is identical to, and reuses, call_rcu()’s logic. Refer to call_rcu() for more details about memory ordering and other functionality.

voidcall_rcu(structrcu_head*head,rcu_callback_tfunc)¶: Queue an RCU callback for invocation after a grace period. By default the callbacks are ‘lazy’ and are kept hidden from the main ->cblist to prevent starting of grace periods too soon. If you desire grace periods to start very soon, use call_rcu_hurry().

Parameters

structrcu_head*head: structure to be used for queueing the RCU updates.
rcu_callback_tfunc: actual callback function to be invoked after the grace period

Description

The callback function will be invoked some time after a full grace period elapses, in other words after all pre-existing RCU read-side critical sections have completed. However, the callback function might well execute concurrently with RCU read-side critical sections that started after call_rcu() was invoked.

It is perfectly legal to repost an RCU callback, potentially with a different callback function, from within its callback function. The specified function will be invoked after another full grace period has elapsed. This use case is similar in form to the common practice of reposting a timer from within its own handler.

RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. In addition, but only in v5.0 and later, regions of code across which interrupts, preemption, or softirqs have been disabled also serve as RCU read-side critical sections. This includes hardware interrupt handlers, softirq handlers, and NMI handlers.

Note that all CPUs must agree that the grace period extended beyond all pre-existing RCU read-side critical sections. On systems with more than one CPU, this means that when “func()” is invoked, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU read-side critical section whose beginning preceded the call to call_rcu(). It also means that each CPU executing an RCU read-side critical section that continues beyond the start of “func()” must have executed a memory barrier after the call_rcu() but before the beginning of that RCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.

Furthermore, if CPU A invoked call_rcu() and CPU B invoked the resulting RCU callback function “func()”, then both CPU A and CPU B are guaranteed to execute a full memory barrier during the time interval between the call to call_rcu() and the invocation of “func()”, even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).

Implementation of these memory-ordering guarantees is described here: A Tour Through TREE_RCU’s Grace-Period Memory Ordering.

Specific to call_rcu() (as opposed to the other call_rcu*() functions), in kernels built with CONFIG_RCU_LAZY=y, call_rcu() might delay for many seconds before starting the grace period needed by the corresponding callback. This delay can significantly improve energy efficiency on low-utilization battery-powered devices. To avoid this delay in latency-sensitive kernel code, use call_rcu_hurry().
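The callback mechanics described for call_rcu() can be modeled in userspace. The sketch below is not the kernel implementation: all `model_` names are hypothetical, the "grace period" is simply a function the caller invokes, and there is no concurrency. It shows a callback receiving the embedded head, recovering the enclosing object from it, and being invoked only once the modeled grace period ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; the kernel's types and call_rcu() differ. */
struct model_rcu_head {
    struct model_rcu_head *next;
    void (*func)(struct model_rcu_head *head);
};

#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct model_item {
    int key;
    struct model_rcu_head rh;  /* embedded head, as in RCU-protected objects */
};

static struct model_rcu_head *model_pending;  /* queued callbacks */
static int model_freed_count;

/* Queue a callback; nothing is invoked until the modeled grace period ends. */
static void model_call_rcu(struct model_rcu_head *head,
                           void (*func)(struct model_rcu_head *head))
{
    head->func = func;
    head->next = model_pending;
    model_pending = head;
}

/* Pretend that every pre-existing reader has finished: run the callbacks. */
static void model_grace_period_end(void)
{
    struct model_rcu_head *list = model_pending;

    model_pending = NULL;  /* callbacks reposted now wait for the next round */
    while (list) {
        struct model_rcu_head *next = list->next;  /* func may free list */

        list->func(list);
        list = next;
    }
}

/* Callback: recover the enclosing object from its head and free it. */
static void model_item_free_cb(struct model_rcu_head *head)
{
    struct model_item *item = model_container_of(head, struct model_item, rh);

    free(item);
    model_freed_count++;
}

/* Returns (items freed before the grace period) * 10 + (items freed after). */
static int model_call_rcu_demo(void)
{
    int before;

    for (int i = 0; i < 2; i++) {
        struct model_item *item = malloc(sizeof(*item));

        if (!item)
            return -1;
        item->key = i;
        model_call_rcu(&item->rh, model_item_free_cb);
    }
    before = model_freed_count;
    model_grace_period_end();
    return before * 10 + model_freed_count;
}
```

Because the pending list is detached before any callback runs, a callback that reposts itself in this model is invoked only at the next modeled grace period, which matches the reposting behavior described above.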

voidsynchronize_rcu(void)¶: wait until a grace period has elapsed.

Parameters

void: no arguments

Description

Control will return to the caller some time after a full grace period has elapsed, in other words after all currently executing RCU read-side critical sections have completed. Note, however, that upon return from synchronize_rcu(), the caller might well be executing concurrently with new RCU read-side critical sections that began while synchronize_rcu() was waiting.

Note that this guarantee implies further memory-ordering guarantees. On systems with more than one CPU, when synchronize_rcu() returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU read-side critical section whose beginning preceded the call to synchronize_rcu(). In addition, each CPU having an RCU read-side critical section that extends beyond the return from synchronize_rcu() is guaranteed to have executed a full memory barrier after the beginning of synchronize_rcu() and before the beginning of that RCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.

Furthermore, if CPU A invoked synchronize_rcu(), which returned to its caller on CPU B, then both CPU A and CPU B are guaranteed to have executed a full memory barrier during the execution of synchronize_rcu(), even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).

Implementation of these memory-ordering guarantees is described here: A Tour Through TREE_RCU’s Grace-Period Memory Ordering.

voidget_completed_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶: Return a full pre-completed polled state cookie

Parameters

structrcu_gp_oldstate*rgosp: Place to put state cookie

Description

Stores into rgosp a value that will always be treated by functions like poll_state_synchronize_rcu_full() as a cookie whose grace period has already completed.

unsignedlongget_state_synchronize_rcu(void)¶: Snapshot current RCU state

Parameters

void: no arguments

Description

Returns a cookie that is used by a later call to cond_synchronize_rcu() or poll_state_synchronize_rcu() to determine whether or not a full grace period has elapsed in the meantime.

voidget_state_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶: Snapshot RCU state, both normal and expedited

Parameters

structrcu_gp_oldstate*rgosp: location to place combined normal/expedited grace-period state

Description

Places the normal and expedited grace-period states in rgosp. This state value can be passed to a later call to cond_synchronize_rcu_full() or poll_state_synchronize_rcu_full() to determine whether or not a grace period (whether normal or expedited) has elapsed in the meantime. The rcu_gp_oldstate structure takes up twice the memory of an unsigned long, but is guaranteed to see all grace periods. In contrast, the combined state occupies less memory, but can sometimes fail to take grace periods into account.

This does not guarantee that the needed grace period will actually start.

unsignedlongstart_poll_synchronize_rcu(void)¶: Snapshot and start RCU grace period

Parameters

void: no arguments

Description

Returns a cookie that is used by a later call to cond_synchronize_rcu() or poll_state_synchronize_rcu() to determine whether or not a full grace period has elapsed in the meantime. If the needed grace period is not already slated to start, notifies RCU core of the need for that grace period.

voidstart_poll_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶: Take a full snapshot and start RCU grace period

Parameters

structrcu_gp_oldstate*rgosp: Place to put snapshot of grace-period state

Description

Places the normal and expedited grace-period states in *rgosp. This state value can be passed to a later call to cond_synchronize_rcu_full() or poll_state_synchronize_rcu_full() to determine whether or not a grace period (whether normal or expedited) has elapsed in the meantime. If the needed grace period is not already slated to start, notifies RCU core of the need for that grace period.

boolpoll_state_synchronize_rcu(unsignedlongoldstate)¶: Has the specified RCU grace period completed?

Parameters

unsignedlongoldstate: value from get_state_synchronize_rcu() or start_poll_synchronize_rcu()

Description

If a full RCU grace period has elapsed since the earlier call from which oldstate was obtained, return true, otherwise return false. If false is returned, it is the caller’s responsibility to invoke this function later on until it does return true. Alternatively, the caller can explicitly wait for a grace period, for example, by passing oldstate to either cond_synchronize_rcu() or cond_synchronize_rcu_expedited() on the one hand or by directly invoking either synchronize_rcu() or synchronize_rcu_expedited() on the other.

Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than a billion grace periods (and way more on a 64-bit system!). Those needing to keep old state values for very long time periods (many hours even on 32-bit systems) should check them occasionally and either refresh them or set a flag indicating that the grace period has completed. Alternatively, they can use get_completed_synchronize_rcu() to get a guaranteed-completed grace-period state.

In addition, because oldstate compresses the grace-period state for both normal and expedited grace periods into a single unsigned long, it can miss a grace period when synchronize_rcu() runs concurrently with synchronize_rcu_expedited(). If this is unacceptable, please instead use the _full() variant of these polling APIs.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided oldstate, and that returned at the end of this function.
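The counter-wrap remarks above can be made concrete with a simplified userspace model of a polled cookie. This is a sketch under stated assumptions: the `model_` names and the single free-running counter are hypothetical, and the kernel's real grace-period sequence encoding is more involved. The comparison macro follows the usual wrap-tolerant idiom for unsigned counters, in which the result is correct as long as the two values are within half the counter range of each other.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-tolerant "a >= b" for free-running unsigned long counters. */
#define MODEL_ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

static unsigned long model_gp_seq;  /* modeled count of completed grace periods */

/* Snapshot: the cookie names the grace period that must complete. */
static unsigned long model_get_state(void)
{
    return model_gp_seq + 1;
}

/* Has the grace period named by the cookie completed? */
static bool model_poll_state(unsigned long cookie)
{
    return MODEL_ULONG_CMP_GE(model_gp_seq, cookie);
}

static void model_complete_gp(void)
{
    model_gp_seq++;
}

/* Encodes three poll results as decimal digits (hundreds, tens, ones). */
static int model_cookie_demo(void)
{
    unsigned long cookie;
    int result = 0;

    model_gp_seq = ULONG_MAX;                      /* counter about to wrap */
    cookie = model_get_state();                    /* cookie wraps to 0 */
    result += model_poll_state(cookie) ? 100 : 0;  /* not yet complete */
    model_complete_gp();
    result += model_poll_state(cookie) ? 10 : 0;   /* complete despite the wrap */
    model_gp_seq = cookie + ULONG_MAX / 2 + 1;     /* cookie held far too long */
    result += model_poll_state(cookie) ? 1 : 0;    /* stale cookie is misjudged */
    return result;
}
```

The last step shows why very old cookies should be refreshed or replaced by a completed-state value: once the counter has advanced by more than half its range, the model reports a long-finished grace period as incomplete.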

boolpoll_state_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶: Has the specified RCU grace period completed?

Parameters

structrcu_gp_oldstate*rgosp: value from get_state_synchronize_rcu_full() or start_poll_synchronize_rcu_full()

Description

If a full RCU grace period has elapsed since the earlier call from which rgosp was obtained, return true, otherwise return false. If false is returned, it is the caller’s responsibility to invoke this function later on until it does return true. Alternatively, the caller can explicitly wait for a grace period, for example, by passing rgosp to cond_synchronize_rcu_full() or by directly invoking synchronize_rcu().

Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than a billion grace periods (and way more on a 64-bit system!). Those needing to keep rcu_gp_oldstate values for very long time periods (many hours even on 32-bit systems) should check them occasionally and either refresh them or set a flag indicating that the grace period has completed. Alternatively, they can use get_completed_synchronize_rcu_full() to get a guaranteed-completed grace-period state.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided rgosp, and that returned at the end of this function. And this guarantee requires that the root rcu_node structure’s ->gp_seq field be checked instead of that of the rcu_state structure. The problem is that the just-ending grace-period’s callbacks can be invoked between the time that the root rcu_node structure’s ->gp_seq field is updated and the time that the rcu_state structure’s ->gp_seq field is updated. Therefore, if a single synchronize_rcu() is to cause a subsequent poll_state_synchronize_rcu_full() to return true, then the root rcu_node structure is the one that needs to be polled.

voidcond_synchronize_rcu(unsignedlongoldstate)¶: Conditionally wait for an RCU grace period

Parameters

unsignedlongoldstate: value from get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited()

Description

If a full RCU grace period has elapsed since the earlier call to get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just return. Otherwise, invoke synchronize_rcu() to wait for a full grace period.

Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for a couple of additional grace periods should be just fine.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided oldstate and that returned at the end of this function.

voidcond_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶: Conditionally wait for an RCU grace period

Parameters

structrcu_gp_oldstate*rgosp: value from get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full()

Description

If a full RCU grace period has elapsed since the call to get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full() from which rgosp was obtained, just return. Otherwise, invoke synchronize_rcu() to wait for a full grace period.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided rgosp and that returned at the end of this function.

voidrcu_barrier(void)¶: Wait until all in-flight call_rcu() callbacks complete.

Parameters

void: no arguments

Description

Note that this primitive does not necessarily wait for an RCU grace period to complete. For example, if there are no RCU callbacks queued anywhere in the system, then rcu_barrier() is within its rights to return immediately, without waiting for anything, much less an RCU grace period. In fact, rcu_barrier() will normally not result in any RCU grace periods beyond those that were already destined to be executed.

In kernels built with CONFIG_RCU_LAZY=y, this function also hurries all pending lazy RCU callbacks.

voidrcu_barrier_throttled(void)¶: Do rcu_barrier(), but limit to one per second

Parameters

void: no arguments

Description

This can be thought of as guard rails around rcu_barrier() that permit unrestricted userspace use, at least assuming the hardware’s try_cmpxchg() is robust. There will be at most one call per second to rcu_barrier() system-wide from use of this function, which means that callers might needlessly wait a second or three.

This is intended for use by test suites to avoid OOM by flushing RCU callbacks from the previous test before starting the next. See the rcutree.do_rcu_barrier module parameter for more information.

Why not simply make rcu_barrier() more scalable? That might be the eventual endpoint, but let’s keep it simple for the time being. Note that the module parameter infrastructure serializes calls to a given .set() function, but should concurrent .set() invocation ever be possible, we are ready!
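The role of a compare-and-exchange in this kind of throttling can be sketched in userspace with C11 atomics. This is a model under stated assumptions, not the kernel code: the `model_` names, the millisecond tick unit, and the caller-supplied clock are all hypothetical, and a real throttled caller would typically wait and retry rather than give up. It shows only the claim step: a caller may proceed if enough time has passed and it wins the race to update the shared timestamp.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_MIN_INTERVAL 1000UL  /* assumed: one second, in millisecond ticks */

static _Atomic unsigned long model_last_claim;  /* time of last granted call */

/*
 * Try to claim the right to run the expensive operation at time "now".
 * Of several concurrent callers seeing the same old timestamp, only one wins.
 */
static bool model_throttle_try_claim(unsigned long now)
{
    unsigned long old = atomic_load(&model_last_claim);

    if (now - old < MODEL_MIN_INTERVAL)
        return false;  /* too soon after the previous claim */
    /* Fails if another caller updated the timestamp first. */
    return atomic_compare_exchange_strong(&model_last_claim, &old, now);
}

/* Returns how many of three attempts were granted. */
static int model_throttle_demo(void)
{
    int granted = 0;

    granted += model_throttle_try_claim(5000);  /* first call: granted */
    granted += model_throttle_try_claim(5500);  /* 500 ticks later: refused */
    granted += model_throttle_try_claim(6000);  /* a full interval later: granted */
    return granted;
}
```

Using a compare-and-exchange rather than a plain store means that the limit holds even if callers are not serialized by some outer lock.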

voidsynchronize_rcu_expedited(void)¶: Brute-force RCU grace period

Parameters

void: no arguments

Description

Wait for an RCU grace period, but expedite it. The basic idea is to IPI all non-idle non-nohz online CPUs. The IPI handler checks whether the CPU is in an RCU critical section, and if so, it sets a flag that causes the outermost rcu_read_unlock() to report the quiescent state for RCU-preempt or asks the scheduler for help for RCU-sched. On the other hand, if the CPU is not in an RCU read-side critical section, the IPI handler reports the quiescent state immediately.

Although this is a great improvement over previous expedited implementations, it is still unfriendly to real-time workloads, and is thus not recommended for any sort of common-case code. In fact, if you are using synchronize_rcu_expedited() in a loop, please restructure your code to batch your updates, and then use a single synchronize_rcu() instead.

This has the same semantics as (but is more brutal than) synchronize_rcu().

unsignedlongstart_poll_synchronize_rcu_expedited(void)¶: Snapshot current RCU state and start expedited grace period

Parameters

void: no arguments

Description

Returns a cookie to pass to a call to cond_synchronize_rcu(), cond_synchronize_rcu_expedited(), or poll_state_synchronize_rcu(), allowing them to determine whether or not any sort of grace period has elapsed in the meantime. If the needed expedited grace period is not already slated to start, initiates that grace period.

voidstart_poll_synchronize_rcu_expedited_full(structrcu_gp_oldstate*rgosp)¶: Take a full snapshot and start expedited grace period

Parameters

structrcu_gp_oldstate*rgosp: Place to put snapshot of grace-period state

Description

Places the normal and expedited grace-period states in rgosp. This state value can be passed to a later call to cond_synchronize_rcu_full() or poll_state_synchronize_rcu_full() to determine whether or not a grace period (whether normal or expedited) has elapsed in the meantime. If the needed expedited grace period is not already slated to start, initiates that grace period.

voidcond_synchronize_rcu_expedited(unsignedlongoldstate)¶: Conditionally wait for an expedited RCU grace period

Parameters

unsignedlongoldstate: value from get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited()

Description

If any type of full RCU grace period has elapsed since the earlier call to get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited(), just return. Otherwise, invoke synchronize_rcu_expedited() to wait for a full grace period.

voidcond_synchronize_rcu_expedited_full(structrcu_gp_oldstate*rgosp)¶: Conditionally wait for an expedited RCU grace period

Parameters

structrcu_gp_oldstate*rgosp: value from get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full()

Description

boolrcu_read_lock_held_common(bool*ret)¶: might we be in RCU-sched read-side critical section?

Parameters

bool*ret: Best guess answer if lockdep cannot be relied on

Description

Returns true if lockdep must be ignored, in which case *ret contains the best guess described below. Otherwise returns false, in which case *ret tells the caller nothing and the caller should instead consult lockdep.

If CONFIG_DEBUG_LOCK_ALLOC is selected, set *ret to nonzero iff in an RCU-sched read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side critical section unless it can prove otherwise. Note that disabling of preemption (including disabling irqs) counts as an RCU-sched read-side critical section. This is useful for debug checks in functions that require that they be called within an RCU-sched read-side critical section.

Checks debug_lockdep_rcu_enabled() to prevent false positives during boot and while lockdep is disabled.

Note that if the CPU is in the idle loop from an RCU point of view (that is, in the section between ct_idle_enter() and ct_idle_exit()), then rcu_read_lock_held() sets *ret to false even if the CPU did an rcu_read_lock(). The reason for this is that RCU ignores CPUs that are in such a section, considering these as in extended quiescent state, so such a CPU is effectively never in an RCU read-side critical section regardless of what RCU primitives it invokes. This state of affairs is required: we need to keep an RCU-free window in idle where the CPU may possibly enter into low power mode. This way we can report an extended quiescent state to other CPUs that started a grace period. Otherwise we would delay any grace period as long as we run in the idle task.

Similarly, we avoid claiming an RCU read lock held if the current CPU is offline.

voidrcu_async_hurry(void)¶: Make future async RCU callbacks not lazy.

Parameters

void: no arguments

Description

After a call to this function, future calls to call_rcu() will be processed in a timely fashion.

voidrcu_async_relax(void)¶: Make future async RCU callbacks lazy.

Parameters

void: no arguments

Description

After a call to this function, future calls to call_rcu() will be processed in a lazy fashion.

voidrcu_expedite_gp(void)¶: Expedite future RCU grace periods

Parameters

void: no arguments

Description

After a call to this function, future calls to synchronize_rcu() and friends act as if the corresponding synchronize_rcu_expedited() function had instead been called.

voidrcu_unexpedite_gp(void)¶: Cancel prior rcu_expedite_gp() invocation

Parameters

void: no arguments

Description

Undo a prior call to rcu_expedite_gp(). If all prior calls to rcu_expedite_gp() are undone by a subsequent call to rcu_unexpedite_gp(), and if the rcu_expedited sysfs/boot parameter is not set, then all subsequent calls to synchronize_rcu() and friends will return to their normal non-expedited behavior.
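The pairing rule for rcu_expedite_gp() and rcu_unexpedite_gp() behaves like a nesting count combined with a global override. The userspace sketch below illustrates that reading. It is an assumption-laden model, not the kernel implementation: the `model_` names are hypothetical, and `model_expedited_param` merely stands in for the rcu_expedited sysfs/boot parameter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int model_expedite_nesting;  /* outstanding expedite requests */
static bool model_expedited_param;         /* stand-in for the boot/sysfs knob */

static void model_expedite_gp(void)
{
    atomic_fetch_add(&model_expedite_nesting, 1);
}

static void model_unexpedite_gp(void)
{
    atomic_fetch_sub(&model_expedite_nesting, 1);
}

/* Should a normal grace-period request be treated as expedited? */
static bool model_gp_is_expedited(void)
{
    return model_expedited_param ||
           atomic_load(&model_expedite_nesting) > 0;
}

/* Encodes two observations as decimal digits (tens, ones). */
static int model_expedite_demo(void)
{
    int seen = 0;

    model_expedite_gp();
    model_expedite_gp();
    model_unexpedite_gp();
    seen += model_gp_is_expedited() ? 10 : 0;  /* one request still outstanding */
    model_unexpedite_gp();
    seen += model_gp_is_expedited() ? 1 : 0;   /* all undone: back to normal */
    return seen;
}
```

In this model, normal behavior resumes only after every expedite request has been matched by an unexpedite call and the override is clear.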

intnotracercu_read_lock_held(void)¶: might we be in RCU read-side critical section?

Parameters

void: no arguments

Description

If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU read-side critical section unless it can prove otherwise. This is useful for debug checks in functions that require that they be called within an RCU read-side critical section.

Checks debug_lockdep_rcu_enabled() to prevent false positives during boot and while lockdep is disabled.

Note that rcu_read_lock() and the matching rcu_read_unlock() must occur in the same context. For example, it is illegal to invoke rcu_read_unlock() in process context if the matching rcu_read_lock() was invoked from within an irq handler.

Note that rcu_read_lock() is disallowed if the CPU is either idle or offline from an RCU perspective, so check for those as well.

intnotracercu_read_lock_bh_held(void)¶: might we be in RCU-bh read-side critical section?

Parameters

void: no arguments

Description

Check for bottom half being disabled, which covers both the CONFIG_PROVE_RCU=y and CONFIG_PROVE_RCU=n cases. Note that if someone uses rcu_read_lock_bh(), but then later enables BH, lockdep (if enabled) will show the situation. This is useful for debug checks in functions that require that they be called within an RCU read-side critical section.

Check debug_lockdep_rcu_enabled() to prevent false positives during boot.

Note that rcu_read_lock_bh() is disallowed if the CPU is either idle or offline from an RCU perspective, so check for those as well.

voidwakeme_after_rcu(structrcu_head*head)¶: Callback function to awaken a task after grace period

Parameters

structrcu_head*head: Pointer to rcu_head member within rcu_synchronize structure

Description

Awaken the corresponding task now that a grace period has elapsed.

voidinit_rcu_head_on_stack(structrcu_head*head)¶: initialize on-stack rcu_head for debugobjects

Parameters

structrcu_head*head: pointer to rcu_head structure to be initialized

Description

This function informs debugobjects of a new rcu_head structure that has been allocated as an auto variable on the stack. This function is not required for rcu_head structures that are statically defined or that are dynamically allocated on the heap. This function has no effect for !CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.

voiddestroy_rcu_head_on_stack(structrcu_head*head)¶: destroy on-stack rcu_head for debugobjects

Parameters

structrcu_head*head: pointer to rcu_head structure to be destroyed

Description

This function informs debugobjects that an on-stack rcu_head structure is about to go out of scope. As with init_rcu_head_on_stack(), this function is not required for rcu_head structures that are statically defined or that are dynamically allocated on the heap. Also as with init_rcu_head_on_stack(), this function has no effect for !CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.

unsignedlongget_completed_synchronize_rcu(void)¶: Return a pre-completed polled state cookie

Parameters

void: no arguments

Description

Returns a value that will always be treated by functions like poll_state_synchronize_rcu() as a cookie whose grace period has already completed.

unsignedlongget_completed_synchronize_srcu(void)¶: Return a pre-completed polled state cookie

Parameters

void: no arguments

Description

Returns a value that poll_state_synchronize_srcu() will always treat as a cookie whose grace period has already completed.

boolsame_state_synchronize_srcu(unsignedlongoldstate1,unsignedlongoldstate2)¶: Are two old-state values identical?

Parameters

unsignedlongoldstate1: First old-state value.
unsignedlongoldstate2: Second old-state value.

Description

The two old-state values must have been obtained from either get_state_synchronize_srcu(), start_poll_synchronize_srcu(), or get_completed_synchronize_srcu(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.

intsrcu_read_lock_held(conststructsrcu_struct*ssp)¶: might we be in SRCU read-side critical section?

Parameters

conststructsrcu_struct*ssp: The srcu_struct structure to check

Description

If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an SRCU read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an SRCU read-side critical section unless it can prove otherwise.

Checks debug_lockdep_rcu_enabled() to prevent false positives during boot and while lockdep is disabled.

Note that SRCU is based on its own state machine and does not rely on normal RCU, so it can be called from a CPU that is in the idle loop from an RCU point of view or that is offline.

srcu_dereference_check¶

srcu_dereference_check(p,ssp,c)

fetch SRCU-protected pointer for later dereferencing

Parameters

p: the pointer to fetch and protect for later dereferencing
ssp: pointer to the srcu_struct, which is used to check that we really are in an SRCU read-side critical section.
c: condition to check for update-side use

Description

If PROVE_RCU is enabled, invoking this outside of an RCU read-side critical section will result in an RCU-lockdep splat, unless c evaluates to 1. The c argument will normally be a logical expression containing lockdep_is_held() calls.

srcu_dereference¶

srcu_dereference(p,ssp)

fetch SRCU-protected pointer for later dereferencing

Parameters

p: the pointer to fetch and protect for later dereferencing
ssp: pointer to the srcu_struct, which is used to check that we really are in an SRCU read-side critical section.

Description

Make srcu_dereference_check() do the dirty work. If PROVE_RCU is enabled, invoking this outside of an RCU read-side critical section will result in an RCU-lockdep splat.

srcu_dereference_notrace¶

srcu_dereference_notrace(p,ssp)

no tracing and no lockdep calls from here

Parameters

p: the pointer to fetch and protect for later dereferencing
ssp: pointer to the srcu_struct, which is used to check that we really are in an SRCU read-side critical section.

intsrcu_read_lock(structsrcu_struct*ssp)¶: register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to register the new reader.

Description

Enter an SRCU read-side critical section. Note that SRCU read-side critical sections may be nested. However, it is illegal to call anything that waits on an SRCU grace period for the same srcu_struct, whether directly or indirectly. Please note that one way to indirectly wait on an SRCU grace period is to acquire a mutex that is held elsewhere while calling synchronize_srcu() or synchronize_srcu_expedited().

The return value from srcu_read_lock() is guaranteed to be non-negative. This value must be passed unaltered to the matching srcu_read_unlock(). Note that srcu_read_lock() and the matching srcu_read_unlock() must occur in the same context. For example, it is illegal to invoke srcu_read_unlock() in an irq handler if the matching srcu_read_lock() was invoked in process context. Or, for that matter, to invoke srcu_read_unlock() from one task and the matching srcu_read_lock() from another.

structsrcu_ctr__percpu*srcu_read_lock_fast(structsrcu_struct*ssp)¶: register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to register the new reader.

Description

Enter an SRCU read-side critical section, but for a light-weight smp_mb()-free reader. See srcu_read_lock() for more information. This function is NMI-safe, in a manner similar to srcu_read_lock_nmisafe().

For srcu_read_lock_fast() to be used on an srcu_struct structure, that structure must have been defined using either DEFINE_SRCU_FAST() or DEFINE_STATIC_SRCU_FAST() on the one hand or initialized with init_srcu_struct_fast() on the other. Such an srcu_struct structure cannot be passed to any non-fast variant of srcu_read_{,un}lock() or srcu_{down,up}_read(). In kernels built with CONFIG_PROVE_RCU=y, __srcu_check_read_flavor() will complain bitterly if you ignore this restriction.

Grace-period auto-expediting is disabled for SRCU-fast srcu_struct structures because SRCU-fast expedited grace periods invoke synchronize_rcu_expedited(), IPIs and all. If you need expedited SRCU-fast grace periods, use synchronize_srcu_expedited().

The srcu_read_lock_fast() function can be invoked only from those contexts where RCU is watching, that is, from contexts where it would be legal to invoke rcu_read_lock(). Otherwise, lockdep will complain.

structsrcu_ctr__percpu*srcu_read_lock_fast_updown(structsrcu_struct*ssp)¶: register a new reader for an SRCU-fast-updown structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to register the new reader.

Description

Enter an SRCU read-side critical section, but for a light-weight smp_mb()-free reader. See srcu_read_lock() for more information. This function is compatible with srcu_down_read_fast(), but is not NMI-safe.

For srcu_read_lock_fast_updown() to be used on an srcu_struct structure, that structure must have been defined using either DEFINE_SRCU_FAST_UPDOWN() or DEFINE_STATIC_SRCU_FAST_UPDOWN() on the one hand or initialized with init_srcu_struct_fast_updown() on the other. Such an srcu_struct structure cannot be passed to any non-fast-updown variant of srcu_read_{,un}lock() or srcu_{down,up}_read(). In kernels built with CONFIG_PROVE_RCU=y, __srcu_check_read_flavor() will complain bitterly if you ignore this restriction.

Grace-period auto-expediting is disabled for SRCU-fast-updown srcu_struct structures because SRCU-fast-updown expedited grace periods invoke synchronize_rcu_expedited(), IPIs and all. If you need expedited SRCU-fast-updown grace periods, use synchronize_srcu_expedited().

The srcu_read_lock_fast_updown() function can be invoked only from those contexts where RCU is watching, that is, from contexts where it would be legal to invoke rcu_read_lock(). Otherwise, lockdep will complain.

structsrcu_ctr__percpu*srcu_down_read_fast(structsrcu_struct*ssp)¶: register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to register the new reader.

Description

Enter a semaphore-like SRCU read-side critical section, but for a light-weight smp_mb()-free reader. See srcu_read_lock_fast() and srcu_down_read() for more information.

The same srcu_struct may be used concurrently by srcu_down_read_fast() and srcu_read_lock_fast(). However, the same definition/initialization requirements called out for srcu_read_lock_fast() apply.

intsrcu_read_lock_nmisafe(structsrcu_struct*ssp)¶: register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to register the new reader.

Description

Enter an SRCU read-side critical section, but in an NMI-safe manner. See srcu_read_lock() for more information.

If srcu_read_lock_nmisafe() is ever used on an srcu_struct structure, then none of the other flavors may be used, whether before, during, or after.

intsrcu_down_read(structsrcu_struct*ssp)¶: register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to register the new reader.

Description

Enter a semaphore-like SRCU read-side critical section. Note that SRCU read-side critical sections may be nested. However, it is illegal to call anything that waits on an SRCU grace period for the same srcu_struct, whether directly or indirectly. Please note that one way to indirectly wait on an SRCU grace period is to acquire a mutex that is held elsewhere while calling synchronize_srcu() or synchronize_srcu_expedited(). But if you want lockdep to help you keep this stuff straight, you should instead use srcu_read_lock().

The semaphore-like nature of srcu_down_read() means that the matching srcu_up_read() can be invoked from some other context, for example, from some other task or from an irq handler. However, neither srcu_down_read() nor srcu_up_read() may be invoked from an NMI handler.

Calls to srcu_down_read() may be nested, similar to the manner in which calls to down_read() may be nested. The same srcu_struct may be used concurrently by srcu_down_read() and srcu_read_lock().

voidsrcu_read_unlock(structsrcu_struct*ssp,intidx)¶: unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to unregister the old reader.
intidx: return value from corresponding srcu_read_lock().

Description

Exit an SRCU read-side critical section.

voidsrcu_read_unlock_fast(structsrcu_struct*ssp,structsrcu_ctr__percpu*scp)¶: unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to unregister the old reader.
structsrcu_ctr__percpu*scp: return value from corresponding srcu_read_lock_fast().

Description

Exit a light-weight SRCU read-side critical section.

voidsrcu_read_unlock_fast_updown(structsrcu_struct*ssp,structsrcu_ctr__percpu*scp)¶: unregister an old reader from an SRCU-fast-updown structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to unregister the old reader.
structsrcu_ctr__percpu*scp: return value from corresponding srcu_read_lock_fast_updown().

Description

Exit an SRCU-fast-updown read-side critical section.

voidsrcu_up_read_fast(structsrcu_struct*ssp,structsrcu_ctr__percpu*scp)¶: unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to unregister the old reader.
structsrcu_ctr__percpu*scp: return value from corresponding srcu_read_lock_fast().

Description

Exit an SRCU read-side critical section, but not necessarily from the same context as the matching srcu_down_read_fast().

voidsrcu_read_unlock_nmisafe(structsrcu_struct*ssp,intidx)¶: unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to unregister the old reader.
intidx: return value from corresponding srcu_read_lock_nmisafe().

Description

Exit an SRCU read-side critical section, but in an NMI-safe manner.

voidsrcu_up_read(structsrcu_struct*ssp,intidx)¶: unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp: srcu_struct in which to unregister the old reader.
intidx: return value from corresponding srcu_read_lock().

Description

Exit an SRCU read-side critical section, but not necessarily from the same context as the matching srcu_down_read().

voidsmp_mb__after_srcu_read_unlock(void)¶: ensure full ordering after srcu_read_unlock

Parameters

void: no arguments

Description

Converts the preceding srcu_read_unlock into a two-way memory barrier.

Call this after srcu_read_unlock, to guarantee that all memory operations that occur after smp_mb__after_srcu_read_unlock will appear to happen after the preceding srcu_read_unlock.

voidsmp_mb__after_srcu_read_lock(void)¶: ensure full ordering after srcu_read_lock

Parameters

void: no arguments

Description

Converts the preceding srcu_read_lock into a two-way memory barrier.

Call this after srcu_read_lock, to guarantee that all memory operations that occur after smp_mb__after_srcu_read_lock will appear to happen after the preceding srcu_read_lock.

intinit_srcu_struct(structsrcu_struct*ssp)¶: initialize a sleep-RCU structure

Parameters

structsrcu_struct*ssp: structure to initialize.

Description

Use this in place of DEFINE_SRCU() and DEFINE_STATIC_SRCU() for non-static srcu_struct structures that are to be passed to srcu_read_lock(), srcu_read_lock_nmisafe(), and friends. It is necessary to invoke this on a given srcu_struct before passing that srcu_struct to any other function. Each srcu_struct represents a separate domain of SRCU protection.

intinit_srcu_struct_fast(structsrcu_struct*ssp)¶: initialize a fast-reader sleep-RCU structure

Parameters

structsrcu_struct*ssp: structure to initialize.

Description

Use this in place of DEFINE_SRCU_FAST() and DEFINE_STATIC_SRCU_FAST() for non-static srcu_struct structures that are to be passed to srcu_read_lock_fast() and friends. It is necessary to invoke this on a given srcu_struct before passing that srcu_struct to any other function. Each srcu_struct represents a separate domain of SRCU protection.

intinit_srcu_struct_fast_updown(structsrcu_struct*ssp)¶: initialize a fast-reader up/down sleep-RCU structure

Parameters

structsrcu_struct*ssp: structure to initialize.

Description

Use this function in place of DEFINE_SRCU_FAST_UPDOWN() and DEFINE_STATIC_SRCU_FAST_UPDOWN() for non-static srcu_struct structures that are to be passed to srcu_read_lock_fast_updown(), srcu_down_read_fast(), and friends. It is necessary to invoke this on a given srcu_struct before passing that srcu_struct to any other function. Each srcu_struct represents a separate domain of SRCU protection.

boolsrcu_readers_active(structsrcu_struct*ssp)¶: returns true if there are readers, and false otherwise

Parameters

structsrcu_struct*ssp: which srcu_struct to count active readers (holding srcu_read_lock).

Description

Note that this is not an atomic primitive, and can therefore suffer severe errors when invoked on an active srcu_struct. That said, it can be useful as an error check at cleanup time.

voidcleanup_srcu_struct(structsrcu_struct*ssp)¶: deconstruct a sleep-RCU structure

Parameters

structsrcu_struct*ssp: structure to clean up.

Description

You must invoke this after you are finished using a given srcu_struct that was initialized via init_srcu_struct(); otherwise, you leak memory.

voidcall_srcu(structsrcu_struct*ssp,structrcu_head*rhp,rcu_callback_tfunc)¶: Queue a callback for invocation after an SRCU grace period

Parameters

structsrcu_struct*ssp: srcu_struct in which to queue the callback
structrcu_head*rhp: structure to be used for queueing the SRCU callback.
rcu_callback_tfunc: function to be invoked after the SRCU grace period

Description

The callback function will be invoked some time after a full SRCU grace period elapses, in other words after all pre-existing SRCU read-side critical sections have completed. However, the callback function might well execute concurrently with other SRCU read-side critical sections that started after call_srcu() was invoked. SRCU read-side critical sections are delimited by srcu_read_lock() and srcu_read_unlock(), and may be nested.

The callback will be invoked from process context, but with bh disabled. The callback function must therefore be fast and must not block.

See the description of call_rcu() for more detailed information on memory ordering guarantees.

voidsynchronize_srcu_expedited(structsrcu_struct*ssp)¶: Brute-force SRCU grace period

Parameters

structsrcu_struct*ssp: srcu_struct with which to synchronize.

Description

Wait for an SRCU grace period to elapse, but be more aggressive about spinning rather than blocking when waiting.

Note that synchronize_srcu_expedited() has the same deadlock and memory-ordering properties as does synchronize_srcu().

voidsynchronize_srcu(structsrcu_struct*ssp)¶: wait for prior SRCU read-side critical-section completion

Parameters

structsrcu_struct*ssp: srcu_struct with which to synchronize.

Description

Wait for the counts of both indexes to drain to zero. To avoid possible starvation of synchronize_srcu(), it first waits for the count of the index !(ssp->srcu_ctrp - ssp->sda->srcu_ctrs[0]) to drain to zero, then flips ->srcu_ctrp and waits for the count of the other index.

Can block; must be called from process context.

Note that it is illegal to call synchronize_srcu() from the corresponding SRCU read-side critical section; doing so will result in deadlock. However, it is perfectly legal to call synchronize_srcu() on one srcu_struct from some other srcu_struct’s read-side critical section, as long as the resulting graph of srcu_structs is acyclic.

There are memory-ordering constraints implied by synchronize_srcu(). On systems with more than one CPU, when synchronize_srcu() returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last corresponding SRCU read-side critical section whose beginning preceded the call to synchronize_srcu(). In addition, each CPU having an SRCU read-side critical section that extends beyond the return from synchronize_srcu() is guaranteed to have executed a full memory barrier after the beginning of synchronize_srcu() and before the beginning of that SRCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.

Furthermore, if CPU A invoked synchronize_srcu(), which returned to its caller on CPU B, then both CPU A and CPU B are guaranteed to have executed a full memory barrier during the execution of synchronize_srcu(). This guarantee applies even if CPU A and CPU B are the same CPU, but again only if the system has more than one CPU.

Of course, these memory-ordering guarantees apply only when synchronize_srcu(), srcu_read_lock(), and srcu_read_unlock() are passed the same srcu_struct structure.

Implementation of these memory-ordering guarantees is similar to that of synchronize_rcu().

If SRCU is likely idle as determined by srcu_should_expedite(), expedite the first request. This semantic was provided by Classic SRCU, and is relied upon by its users, so TREE SRCU must also provide it. Note that detecting idleness is heuristic and subject to both false positives and negatives.
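The drain, flip, drain sequence described at the start of this entry can be illustrated with a deliberately simplified, single-threaded userspace model. Everything here is hypothetical (the struct, its fields, and the toy_ function names), and the model leaves out per-CPU counters, memory barriers, and blocking. It is meant only to show why a reader's index must be handed back on unlock and why readers that arrive after the flip do not hold up the grace period.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model; all names are hypothetical. */
struct toy_srcu {
	long count[2];  /* readers currently inside, per index */
	int cur;        /* index that newly arriving readers use */
};

/* Reader entry: count ourselves on the current index and return it. */
static int toy_read_lock(struct toy_srcu *s)
{
	int idx = s->cur;

	s->count[idx]++;
	return idx;  /* must be passed unaltered to toy_read_unlock() */
}

/* Reader exit: uncount on the index that was returned at entry. */
static void toy_read_unlock(struct toy_srcu *s, int idx)
{
	assert(idx == 0 || idx == 1);
	assert(s->count[idx] > 0);
	s->count[idx]--;
}

/* Updater step: true once no reader remains on index idx. */
static bool toy_index_drained(const struct toy_srcu *s, int idx)
{
	return s->count[idx] == 0;
}

/* Updater step: send new readers to the other index; returns the old one. */
static int toy_flip(struct toy_srcu *s)
{
	int old = s->cur;

	s->cur = !old;
	return old;
}
```

A model grace period first checks toy_index_drained() on the index not in use, then calls toy_flip(), then waits until toy_index_drained() holds for the index that toy_flip() returned. A reader that enters after the flip is counted on the new index and therefore does not delay that final wait.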

unsignedlongget_state_synchronize_srcu(structsrcu_struct*ssp)¶: Provide an end-of-grace-period cookie

Parameters

structsrcu_struct*ssp: srcu_struct to provide cookie for.

Description

This function returns a cookie that can be passed to poll_state_synchronize_srcu(), which will return true if a full grace period has elapsed in the meantime. It is the caller’s responsibility to make sure that grace period happens, for example, by invoking call_srcu() after return from get_state_synchronize_srcu().

unsignedlongstart_poll_synchronize_srcu(structsrcu_struct*ssp)¶: Provide cookie and start grace period

Parameters

structsrcu_struct*ssp: srcu_struct to provide cookie for.

Description

This function returns a cookie that can be passed to poll_state_synchronize_srcu(), which will return true if a full grace period has elapsed in the meantime. Unlike get_state_synchronize_srcu(), this function also ensures that any needed SRCU grace period will be started. This convenience does come at a cost in terms of CPU overhead.
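The difference between taking a cookie and taking a cookie while also requesting a grace period can be sketched with a small userspace model. All names below are hypothetical, the cookie encoding is an assumption made for illustration rather than the kernel's, and counter wrap is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of polled grace periods; all names are hypothetical. */
struct toy_gp {
	unsigned long completed;  /* grace periods that have fully ended */
	bool in_progress;         /* is a grace period running right now? */
	unsigned long needed;     /* keep running until completed >= needed */
};

/* Cookie only: names the grace period to wait for, requests nothing. */
static unsigned long toy_get_state(const struct toy_gp *g)
{
	/*
	 * A grace period that is already running may have started before
	 * the caller's update, so it cannot be counted as a full one.
	 */
	return g->completed + (g->in_progress ? 2 : 1);
}

/* Cookie plus a request that the needed grace period actually runs. */
static unsigned long toy_start_poll(struct toy_gp *g)
{
	unsigned long cookie = toy_get_state(g);

	if (g->needed < cookie)
		g->needed = cookie;
	return cookie;
}

/* Has the grace period named by cookie ended? (Ignores wrap.) */
static bool toy_poll_state(const struct toy_gp *g, unsigned long cookie)
{
	return g->completed >= cookie;
}

/* Advance the model by one event; returns false if there is nothing to do. */
static bool toy_step(struct toy_gp *g)
{
	if (g->in_progress) {
		g->in_progress = false;
		g->completed++;
		return true;
	}
	if (g->completed < g->needed) {
		g->in_progress = true;
		return true;
	}
	return false;
}
```

In this model, a cookie from toy_get_state() alone never becomes ready, because nothing asks for a grace period. Once toy_start_poll() has been called, repeated toy_step() calls run the needed grace period and both cookies report completion.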

boolpoll_state_synchronize_srcu(structsrcu_struct*ssp,unsignedlongcookie)¶: Has cookie’s grace period ended?

Parameters

structsrcu_struct*ssp: srcu_struct to provide cookie for.
unsignedlongcookie: Return value from get_state_synchronize_srcu() or start_poll_synchronize_srcu().

Description

This function takes the cookie that was returned from either get_state_synchronize_srcu() or start_poll_synchronize_srcu(), and returns true if an SRCU grace period elapsed since the time that the cookie was created.

Because cookies are finite in size, wrapping/overflow is possible. This is more pronounced on 32-bit systems where cookies are 32 bits, where in theory wrapping could happen in about 14 hours assuming 25-microsecond expedited SRCU grace periods. However, a more likely overflow lower bound is on the order of 24 days in the case of one-millisecond SRCU grace periods. Of course, wrapping in a 64-bit system requires geologic timespans, as in more than seven million years even for expedited SRCU grace periods.

Wrapping/overflow is much more of an issue for CONFIG_SMP=n systems that also have CONFIG_PREEMPTION=n, which selects Tiny SRCU. This uses a 16-bit cookie, which rcutorture routinely wraps in a matter of a few minutes. If this proves to be a problem, this counter will be expanded to the same size as for Tree SRCU.
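The wrap times quoted above can be reproduced with a short calculation. One assumption is needed, and it is inferred from the quoted figures rather than stated in the text: a cookie is taken to become ambiguous after half of the counter range, that is, after 2^(bits-1) grace periods. The helper name is hypothetical.

```c
#include <assert.h>

/*
 * Seconds until a cookie of the given width could wrap, assuming
 * (hypothetically) that half of the counter range, 2^(bits-1) grace
 * periods, is the limit and that every grace period lasts gp_seconds.
 */
static double wrap_seconds(unsigned int bits, double gp_seconds)
{
	double grace_periods = 1.0;
	unsigned int i;

	assert(bits >= 1);
	for (i = 0; i < bits - 1; i++)
		grace_periods *= 2.0;  /* builds 2^(bits-1) */
	return grace_periods * gp_seconds;
}
```

Under that assumption, wrap_seconds(32, 25e-6) / 3600 is a little under 15 hours, wrap_seconds(32, 1e-3) / 86400 is a little under 25 days, and wrap_seconds(64, 25e-6) divided by the number of seconds in a year is a little over seven million years, which matches the orders of magnitude given in the description.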

voidsrcu_barrier(structsrcu_struct*ssp)¶: Wait until all in-flight call_srcu() callbacks complete.

Parameters

structsrcu_struct*ssp: srcu_struct on which to wait for in-flight callbacks.

voidsrcu_expedite_current(structsrcu_struct*ssp)¶: Expedite the current SRCU grace period

Parameters

structsrcu_struct*ssp: srcu_struct to expedite.

Description

Cause the current SRCU grace period to become expedited. The grace period following the current one might also be expedited. If there is no current grace period, one might be created. If the current grace period is currently sleeping, that sleep will complete before expediting will take effect.

unsignedlongsrcu_batches_completed(structsrcu_struct*ssp)¶: return batches completed.

Parameters

structsrcu_struct*ssp: srcu_struct on which to report batch completion.

Description

Report the number of batches, correlated with, but not necessarily precisely the same as, the number of grace periods that have elapsed.

voidhlist_bl_del_rcu(structhlist_bl_node*n)¶: deletes entry from hash list without re-initialization

Parameters

structhlist_bl_node*n: the element to delete from the hash list.

Note

hlist_bl_unhashed() on the entry does not return true after this; the entry is in an undefined state. It is useful for RCU-based lockfree traversal.

In particular, it means that we cannot poison the forward pointers that may still be used for walking the hash list.

voidhlist_bl_add_head_rcu(structhlist_bl_node*n,structhlist_bl_head*h)¶

Parameters

structhlist_bl_node*n: the element to add to the hash list.
structhlist_bl_head*h: the list to add to.

Description

Adds the specified element to the specified hlist_bl, while permitting racing traversals.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_bl_add_head_rcu() or hlist_bl_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_bl_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock().

hlist_bl_for_each_entry_rcu¶

hlist_bl_for_each_entry_rcu(tpos,pos,head,member)

iterate over rcu list of given type

Parameters

tpos: the type * to use as a loop cursor.
pos: the struct hlist_bl_node to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_bl_node within the struct.

list_for_each_rcu¶

list_for_each_rcu(pos,head)

Iterate over a list in an RCU-safe fashion

Parameters

pos: the struct list_head to use as a loop cursor.
head: the head for your list.

list_tail_rcu¶

list_tail_rcu(head)

returns the prev pointer of the head of the list

Parameters

head: the head of the list

Note

This should only be used with the list header, and even then only if list_del() and similar primitives are not also used on the list header.

voidlist_add_rcu(structlist_head*new,structlist_head*head)¶: add a new entry to rcu-protected list

Parameters

structlist_head*new: new entry to be added
structlist_head*head: list head to add it after

Description

Insert a new entry after the specified head. This is good for implementing stacks.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_add_rcu() or list_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu().
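Why an insertion can safely race with readers is easiest to see in a model: the new entry is fully initialized first and only then made reachable with a single publishing store. The sketch below is a userspace approximation using C11 atomics; the type and function names are hypothetical, and the release/acquire pair merely stands in for whatever primitives the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace model with hypothetical names; not the kernel's list_head. */
struct toy_node {
	_Atomic(struct toy_node *) next;
	struct toy_node *prev;
	int value;
};

static void toy_list_init(struct toy_node *head)
{
	atomic_init(&head->next, head);
	head->prev = head;
}

/* Insert new_node right after head, giving stack (LIFO) order. */
static void toy_list_add(struct toy_node *new_node, struct toy_node *head)
{
	struct toy_node *first =
		atomic_load_explicit(&head->next, memory_order_relaxed);

	assert(new_node != head);
	/* Fully initialize the new node before any reader can reach it. */
	atomic_init(&new_node->next, first);
	new_node->prev = head;
	/* Publish: the release store orders the initialization before it. */
	atomic_store_explicit(&head->next, new_node, memory_order_release);
	first->prev = new_node;
}

/* Reader-side forward walk that sums the values of all entries. */
static int toy_list_sum(struct toy_node *head)
{
	struct toy_node *pos;
	int sum = 0;

	for (pos = atomic_load_explicit(&head->next, memory_order_acquire);
	     pos != head;
	     pos = atomic_load_explicit(&pos->next, memory_order_acquire))
		sum += pos->value;
	return sum;
}
```

A reader running toy_list_sum() concurrently with one toy_list_add() sees either the old list or the list with the complete new node, never a half-initialized node. Two concurrent callers of toy_list_add() would still corrupt the list, which is why updaters need their own mutual exclusion.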

voidlist_add_tail_rcu(structlist_head*new,structlist_head*head)¶: add a new entry to rcu-protected list

Parameters

structlist_head*new: new entry to be added
structlist_head*head: list head to add it before

Description

Insert a new entry before the specified head. This is useful for implementing queues.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_add_tail_rcu() or list_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu().

voidlist_del_rcu(structlist_head*entry)¶: deletes entry from list without re-initialization

Parameters

structlist_head*entry: the element to delete from the list.

Note

list_empty() on the entry does not return true after this; the entry is in an undefined state. It is useful for RCU-based lockfree traversal.

In particular, it means that we cannot poison the forward pointers that may still be used for walking the list.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_del_rcu() or list_add_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu().

Note that the caller is not permitted to immediately free the newly deleted entry. Instead, either synchronize_rcu() or call_rcu() must be used to defer freeing until an RCU grace period has elapsed.
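The reason the forward pointer cannot be poisoned is that a reader may be standing on the entry at the moment it is unlinked. A small userspace model makes this concrete. The names are hypothetical, plain stores replace any annotated accesses, and the code is single-threaded, so it shows only the pointer states, not the concurrency.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model with hypothetical names; single-threaded only. */
struct toy_node {
	struct toy_node *next, *prev;
};

/* Stand-in for a poison value: an address no live list ever links to. */
static struct toy_node toy_poison;

static void toy_list_init(struct toy_node *head)
{
	head->next = head;
	head->prev = head;
}

static void toy_list_add_tail(struct toy_node *n, struct toy_node *head)
{
	n->next = head;
	n->prev = head->prev;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink entry while leaving its forward pointer usable. */
static void toy_list_del(struct toy_node *entry)
{
	assert(entry->prev != &toy_poison);  /* not already deleted */
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	/* entry->next is left intact so a reader on entry can advance. */
	entry->prev = &toy_poison;           /* only the back pointer is poisoned */
}
```

After toy_list_del() on the middle entry of a three-entry list, new traversals skip the entry, while a reader that already holds a pointer to it can still follow its next pointer to the rest of the list. Freeing the entry at that point would break such a reader, which is the motivation for deferring the free until a grace period has elapsed.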

voidlist_bidir_del_rcu(structlist_head*entry)¶: deletes entry from list without re-initialization

Parameters

structlist_head*entry: the element to delete from the list.

Description

In contrast to list_del_rcu(), this function doesn’t poison the prev pointer, thus allowing backwards traversal via list_bidir_prev_rcu().

Note

list_empty() on the entry does not return true after this because the entry is in a special undefined state that permits RCU-based lockfree reverse traversal. In particular, this means that we cannot poison the forward and backwards pointers that may still be used for walking the list.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_bidir_del_rcu() or list_add_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu().

Note that list_del_rcu() and list_bidir_del_rcu() must not be used on the same list.

Note that the caller is not permitted to immediately free the newly deleted entry. Instead, either synchronize_rcu() or call_rcu() must be used to defer freeing until an RCU grace period has elapsed.

voidhlist_del_init_rcu(structhlist_node*n)¶: deletes entry from hash list with re-initialization

Parameters

structhlist_node*n: the element to delete from the hash list.

Note

hlist_unhashed() on the node returns true after this. It is useful for RCU-based read-side lockfree traversal if the writer side must know if the list entry is still hashed or already unhashed.

In particular, it means that we cannot poison the forward pointers that may still be used for walking the hash list, and we can only zero the pprev pointer so that hlist_unhashed() will return true after this.

voidlist_replace_rcu(structlist_head*old,structlist_head*new)¶: replace old entry by new one

Parameters

structlist_head*old: the element to be replaced
structlist_head*new: the new element to insert

Description

The old entry will be replaced with the new entry atomically from the perspective of concurrent readers. It is the caller’s responsibility to synchronize with concurrent updaters, if any.

Note

old should not be empty.

void__list_splice_init_rcu(structlist_head*list,structlist_head*prev,structlist_head*next,void(*sync)(void))¶: join an RCU-protected list into an existing list.

Parameters

structlist_head*list: the RCU-protected list to splice
structlist_head*prev: points to the last element of the existing list
structlist_head*next: points to the first element of the existing list
void(*sync)(void): synchronize_rcu, synchronize_rcu_expedited, ...

Description

The list pointed to by prev and next can be RCU-read traversed concurrently with this function.

Note that this function blocks.

Important note: the caller must take whatever action is necessary to prevent any other updates to the existing list. In principle, it is possible to modify the list as soon as sync() begins execution. If this sort of thing becomes necessary, an alternative version based on call_rcu() could be created. But only if really needed, as there is no shortage of RCU API members.

void list_splice_init_rcu(struct list_head *list, struct list_head *head, void (*sync)(void)): splice an RCU-protected list into an existing list, designed for stacks.

Parameters

struct list_head *list: the RCU-protected list to splice
struct list_head *head: the place in the existing list to splice the first list into
void (*sync)(void): synchronize_rcu, synchronize_rcu_expedited, ...

void list_splice_tail_init_rcu(struct list_head *list, struct list_head *head, void (*sync)(void)): splice an RCU-protected list into an existing list, designed for queues.

Parameters

struct list_head *list: the RCU-protected list to splice
struct list_head *head: the place in the existing list to splice the first list into
void (*sync)(void): synchronize_rcu, synchronize_rcu_expedited, ...

list_entry_rcu

list_entry_rcu(ptr, type, member)

get the struct for this entry

Parameters

ptr: the struct list_head pointer.
type: the type of the struct this is embedded in.
member: the name of the list_head within the struct.

Description

This primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as it’s guarded by rcu_read_lock().
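The "get the struct for this entry" step can be sketched in userspace as pointer arithmetic from the embedded list node back to its enclosing structure. The names model_list_head, model_item, model_list_entry and model_value_of are hypothetical; the kernel macro must additionally load the pointer in a way that is safe against concurrent updates, which this single-threaded sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct list_head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Hypothetical element type that embeds a list node. */
struct model_item {
	int value;
	struct model_list_head link;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
#define model_list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Read the payload of the item that owns the given list node. */
static int model_value_of(struct model_list_head *node)
{
	return model_list_entry(node, struct model_item, link)->value;
}
```

Here ptr is the address of the link member, type is struct model_item, and member is link, mirroring the three parameters documented above.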

list_first_or_null_rcu

list_first_or_null_rcu(ptr, type, member)

get the first element from a list

Parameters

ptr: the list head to take the element from.
type: the type of the struct this is embedded in.
member: the name of the list_head within the struct.

Description

Note that if the list is empty, it returns NULL.

This primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as it’s guarded by rcu_read_lock().

list_next_or_null_rcu

list_next_or_null_rcu(head, ptr, type, member)

get the next element from a list

Parameters

head: the head for the list.
ptr: the list head to take the next element from.
type: the type of the struct this is embedded in.
member: the name of the list_head within the struct.

Description

Note that if the ptr is at the end of the list, NULL is returned.

This primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as it’s guarded by rcu_read_lock().
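The NULL-returning behaviour of the first-or-null and next-or-null primitives follows from the list being circular: the head is its own neighbour when the list is empty, and the last element points back at the head. A minimal single-threaded sketch, assuming hypothetical names (model_list_head, model_init, model_add_tail, model_first_or_null, model_next_or_null) and omitting all RCU protection:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a circular doubly linked list. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void model_init(struct model_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Append n just before the head, i.e. at the tail. */
static void model_add_tail(struct model_list_head *n,
			   struct model_list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* First node, or NULL when the head points back at itself. */
static struct model_list_head *
model_first_or_null(struct model_list_head *head)
{
	return head->next != head ? head->next : NULL;
}

/* Node after ptr, or NULL when ptr is the last node before the head. */
static struct model_list_head *
model_next_or_null(struct model_list_head *head, struct model_list_head *ptr)
{
	return ptr->next != head ? ptr->next : NULL;
}
```

The model returns node pointers rather than enclosing structures; the documented macros combine this check with the entry lookup described by their type and member parameters.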

list_for_each_entry_rcu

list_for_each_entry_rcu(pos, head, member, cond...)

iterate over rcu list of given type

Parameters

pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.
cond...: optional lockdep expression if called from non-RCU protection.

Description

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as the traversal is guarded by rcu_read_lock().

list_for_each_entry_srcu

list_for_each_entry_srcu(pos, head, member, cond)

iterate over rcu list of given type

Parameters

pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.
cond: lockdep expression for the lock required to traverse the list.

Description

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as the traversal is guarded by srcu_read_lock(). The lockdep expression srcu_read_lock_held() can be passed as the cond argument from the read side.

list_entry_lockless

list_entry_lockless(ptr, type, member)

get the struct for this entry

Parameters

ptr: the struct list_head pointer.
type: the type of the struct this is embedded in.
member: the name of the list_head within the struct.

Description

This primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu(), but requires some implicit RCU read-side guarding. One example is running within a special exception-time environment where preemption is disabled and where lockdep cannot be invoked. Another example is when items are added to the list, but never deleted.

list_for_each_entry_lockless

list_for_each_entry_lockless(pos, head, member)

iterate over rcu list of given type

Parameters

pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.

list_for_each_entry_continue_rcu

list_for_each_entry_continue_rcu(pos, head, member)

continue iteration over list of given type

Parameters

pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.

Description

Continue to iterate over list of given type, continuing after the current position, which must have been in the list when the RCU read lock was taken. This would typically require either that you obtained the node from a previous walk of the list in the same RCU read-side critical section, or that you held some sort of non-RCU reference (such as a reference count) to keep the node alive and in the list.

This iterator is similar to list_for_each_entry_from_rcu() except this starts after the given position and that one starts at the given position.

list_for_each_entry_from_rcu

list_for_each_entry_from_rcu(pos, head, member)

iterate over a list from current point

Parameters

pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the list_head within the struct.

Description

Iterate over the tail of a list starting from a given position, which must have been in the list when the RCU read lock was taken. This would typically require either that you obtained the node from a previous walk of the list in the same RCU read-side critical section, or that you held some sort of non-RCU reference (such as a reference count) to keep the node alive and in the list.

This iterator is similar to list_for_each_entry_continue_rcu() except this starts from the given position and that one starts from the position after the given position.
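The difference between the "from" and "continue" flavours is only where iteration begins. The following userspace sketch shows this on a plain NULL-terminated list; model_node, model_sum_from and model_sum_continue are hypothetical names, and no RCU protection is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked node; NULL terminates the list. */
struct model_node {
	int value;
	struct model_node *next;
};

/* "from" flavour: visit pos itself, then everything after it. */
static int model_sum_from(const struct model_node *pos)
{
	int sum = 0;

	for (; pos; pos = pos->next)
		sum += pos->value;
	return sum;
}

/* "continue" flavour: skip pos and start with its successor. */
static int model_sum_continue(const struct model_node *pos)
{
	int sum = 0;

	for (pos = pos->next; pos; pos = pos->next)
		sum += pos->value;
	return sum;
}
```

For a list holding 1, 2, 3 and a cursor on the node holding 2, the "from" walk covers 2 and 3, while the "continue" walk covers only 3.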

void hlist_del_rcu(struct hlist_node *n): deletes entry from hash list without re-initialization

Parameters

struct hlist_node *n: the element to delete from the hash list.

Note

hlist_unhashed() on the entry does not return true after this; the entry is in an undefined state. It is useful for RCU-based lock-free traversal.

In particular, it means that we cannot poison the forward pointers that may still be used for walking the hash list.

void hlist_replace_rcu(struct hlist_node *old, struct hlist_node *new): replace old entry by new one

Parameters

struct hlist_node *old: the element to be replaced
struct hlist_node *new: the new element to insert

Description

The old entry will be replaced with the new entry atomically from the perspective of concurrent readers. It is the caller’s responsibility to synchronize with concurrent updaters, if any.

void hlists_swap_heads_rcu(struct hlist_head *left, struct hlist_head *right): swap the lists the hlist heads point to

Parameters

struct hlist_head *left: The hlist head on the left
struct hlist_head *right: The hlist head on the right

Description

The lists start out as:
    [left ][node1 ... ]
    [right][node2 ... ]
The lists end up as:
    [left ][node2 ... ]
    [right][node1 ... ]
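The before/after picture can be reproduced with a small userspace model. This is a sketch only: model_hnode, model_hhead and model_swap_heads are hypothetical names, plain assignments stand in for the RCU-aware pointer publication a real implementation needs, and the NULL checks are an addition made here so that the model also tolerates empty lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct hlist_node / hlist_head. */
struct model_hnode {
	struct model_hnode *next;
	struct model_hnode **pprev;
};

struct model_hhead {
	struct model_hnode *first;
};

/* Exchange the chains hanging off two heads (no barriers, no RCU). */
static void model_swap_heads(struct model_hhead *left,
			     struct model_hhead *right)
{
	struct model_hnode *node1 = left->first;
	struct model_hnode *node2 = right->first;

	left->first = node2;
	right->first = node1;

	/* Each first node's pprev must point back at its new head. */
	if (node2)
		node2->pprev = &left->first;
	if (node1)
		node1->pprev = &right->first;
}
```

Only the two head pointers and the pprev fields of the two first nodes change; the rest of each chain is untouched.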

void hlist_add_head_rcu(struct hlist_node *n, struct hlist_head *h)

Parameters

struct hlist_node *n: the element to add to the hash list.
struct hlist_head *h: the list to add to.

Description

Adds the specified element to the specified hlist, while permitting racing traversals.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu() or hlist_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock().

void hlist_add_tail_rcu(struct hlist_node *n, struct hlist_head *h)

Parameters

struct hlist_node *n: the element to add to the hash list.
struct hlist_head *h: the list to add to.

Description

Adds the specified element to the specified hlist, while permitting racing traversals.

void hlist_add_before_rcu(struct hlist_node *n, struct hlist_node *next)

Parameters

struct hlist_node *n: the new element to add to the hash list.
struct hlist_node *next: the existing element to add the new element before.

Description

Adds the specified element to the specified hlist before the specified node while permitting racing traversals.

void hlist_add_behind_rcu(struct hlist_node *n, struct hlist_node *prev)

Parameters

struct hlist_node *n: the new element to add to the hash list.
struct hlist_node *prev: the existing element to add the new element after.

Description

Adds the specified element to the specified hlist after the specified node while permitting racing traversals.

hlist_for_each_entry_rcu

hlist_for_each_entry_rcu(pos, head, member, cond...)

iterate over rcu list of given type

Parameters

pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_node within the struct.
cond...: optional lockdep expression if called from non-RCU protection.

Description

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().

hlist_for_each_entry_srcu

hlist_for_each_entry_srcu(pos, head, member, cond)

iterate over rcu list of given type

Parameters

pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_node within the struct.
cond: lockdep expression for the lock required to traverse the list.

Description

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by srcu_read_lock(). The lockdep expression srcu_read_lock_held() can be passed as the cond argument from the read side.

hlist_for_each_entry_rcu_notrace

hlist_for_each_entry_rcu_notrace(pos, head, member)

iterate over rcu list of given type (for tracing)

Parameters

pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_node within the struct.

Description

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().

This is the same as hlist_for_each_entry_rcu() except that it does not do any RCU debugging or tracing.

hlist_for_each_entry_rcu_bh

hlist_for_each_entry_rcu_bh(pos, head, member)

iterate over rcu list of given type

Parameters

pos: the type * to use as a loop cursor.
head: the head for your list.
member: the name of the hlist_node within the struct.

Description

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().

hlist_for_each_entry_continue_rcu

hlist_for_each_entry_continue_rcu(pos, member)

iterate over a hlist continuing after current point

Parameters

pos: the type * to use as a loop cursor.
member: the name of the hlist_node within the struct.

hlist_for_each_entry_continue_rcu_bh

hlist_for_each_entry_continue_rcu_bh(pos, member)

iterate over a hlist continuing after current point

Parameters

pos: the type * to use as a loop cursor.
member: the name of the hlist_node within the struct.

hlist_for_each_entry_from_rcu

hlist_for_each_entry_from_rcu(pos, member)

iterate over a hlist continuing from current point

Parameters

pos: the type * to use as a loop cursor.
member: the name of the hlist_node within the struct.

void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n): deletes entry from hash list with re-initialization

Parameters

struct hlist_nulls_node *n: the element to delete from the hash list.

Note

hlist_nulls_unhashed() on the node returns true after this. It is useful for RCU-based lock-free read traversal if the writer side must know whether the list entry is still hashed or already unhashed.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().

hlist_nulls_first_rcu

hlist_nulls_first_rcu(head)

returns the first element of the hash list.

Parameters

head: the head of the list.

hlist_nulls_next_rcu

hlist_nulls_next_rcu(node)

returns the element of the list after node.

Parameters

node: element of the list.

hlist_nulls_pprev_rcu

hlist_nulls_pprev_rcu(node)

returns the dereferenced pprev of node.

Parameters

node: element of the list.

void hlist_nulls_del_rcu(struct hlist_nulls_node *n): deletes entry from hash list without re-initialization

Parameters

struct hlist_nulls_node *n: the element to delete from the hash list.

Note

hlist_nulls_unhashed() on the entry does not return true after this; the entry is in an undefined state. It is useful for RCU-based lock-free traversal.

In particular, it means that we cannot poison the forward pointers that may still be used for walking the hash list.

void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n, struct hlist_nulls_head *h)

Parameters

struct hlist_nulls_node *n: the element to add to the hash list.
struct hlist_nulls_head *h: the list to add to.

Description

Adds the specified element to the specified hlist_nulls, while permitting racing traversals.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock().

void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n, struct hlist_nulls_head *h)

Parameters

struct hlist_nulls_node *n: the element to add to the hash list.
struct hlist_nulls_head *h: the list to add to.

Description

Adds the specified element to the specified hlist_nulls, while permitting racing traversals.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu(), used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock().

void hlist_nulls_replace_rcu(struct hlist_nulls_node *old, struct hlist_nulls_node *new): replace an old entry by a new one

Parameters

struct hlist_nulls_node *old: the element to be replaced
struct hlist_nulls_node *new: the new element to insert

Description

Replace the old entry with the new one in an RCU-protected hlist_nulls, while permitting racing traversals.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().

void hlist_nulls_replace_init_rcu(struct hlist_nulls_node *old, struct hlist_nulls_node *new): replace an old entry by a new one and initialize the old

Parameters

struct hlist_nulls_node *old: the element to be replaced
struct hlist_nulls_node *new: the new element to insert

Description

Replace the old entry with the new one in an RCU-protected hlist_nulls, while permitting racing traversals, and reinitialize the old entry.

Note

old must be hashed.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().

hlist_nulls_for_each_entry_rcu

hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)

iterate over rcu list of given type

Parameters

tpos: the type * to use as a loop cursor.
pos: the struct hlist_nulls_node to use as a loop cursor.
head: the head of the list.
member: the name of the hlist_nulls_node within the struct.

Description

The barrier() is needed to make sure the compiler doesn’t cache the first element [1], as this loop can be restarted [2].

[1] Documentation/memory-barriers.txt around line 1533
[2] Using RCU hlist_nulls to protect list and objects, around line 146

hlist_nulls_for_each_entry_safe

hlist_nulls_for_each_entry_safe(tpos, pos, head, member)

iterate over list of given type safe against removal of list entry

Parameters

tpos: the type * to use as a loop cursor.
pos: the struct hlist_nulls_node to use as a loop cursor.
head: the head of the list.
member: the name of the hlist_nulls_node within the struct.

bool rcu_sync_is_idle(struct rcu_sync *rsp): Are readers permitted to use their fastpaths?

Parameters

struct rcu_sync *rsp: Pointer to rcu_sync structure to use for synchronization

Description

Returns true if readers are permitted to use their fastpaths. Must be invoked within some flavor of RCU read-side critical section.

void rcu_sync_init(struct rcu_sync *rsp): Initialize an rcu_sync structure

Parameters

struct rcu_sync *rsp: Pointer to rcu_sync structure to be initialized

void rcu_sync_func(struct rcu_head *rhp): Callback function managing reader access to fastpath

Parameters

struct rcu_head *rhp: Pointer to rcu_head in rcu_sync structure to use for synchronization

Description

This function is passed to call_rcu() by rcu_sync_enter() and rcu_sync_exit(), so that it is invoked after a grace period following that invocation of enter/exit.

If it is called by rcu_sync_enter(), it signals that all the readers were switched onto the slow path.

If it is called by rcu_sync_exit(), it takes action based on events that have taken place in the meantime, so that closely spaced rcu_sync_enter() and rcu_sync_exit() pairs need not wait for a grace period.

If another rcu_sync_enter() is invoked before the grace period ended, reset state to allow the next rcu_sync_exit() to let the readers back onto their fastpaths (after a grace period). If both another rcu_sync_enter() and its matching rcu_sync_exit() are invoked before the grace period ended, re-invoke call_rcu() on behalf of that rcu_sync_exit(). Otherwise, set all state back to idle so that readers can again use their fastpaths.
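The three outcomes described above can be written down as a small decision function. This is a simplified userspace model rather than the kernel implementation: the state names, the model_rcu_sync structure, its gp_count and callback_queued fields, and model_sync_func() are all assumptions made for illustration, and locking and wakeups are omitted.

```c
#include <assert.h>

/* State names are assumptions made for this sketch. */
enum model_gp_state {
	MODEL_GP_IDLE,   /* readers may use their fastpaths */
	MODEL_GP_ENTER,  /* enter was called, grace period still pending */
	MODEL_GP_PASSED, /* the grace period after enter has elapsed */
	MODEL_GP_EXIT,   /* exit was called, waiting for a grace period */
	MODEL_GP_REPLAY  /* an enter/exit pair arrived during that wait */
};

struct model_rcu_sync {
	enum model_gp_state state;
	int gp_count;        /* updaters currently between enter and exit */
	int callback_queued; /* stands in for a pending call_rcu() */
};

/* Models the callback that runs once a grace period has elapsed. */
static void model_sync_func(struct model_rcu_sync *rsp)
{
	rsp->callback_queued = 0;

	if (rsp->gp_count > 0) {
		/* An updater is active: readers stay on the slow path. */
		rsp->state = MODEL_GP_PASSED;
	} else if (rsp->state == MODEL_GP_REPLAY) {
		/* A full enter/exit pair intervened: wait once more. */
		rsp->state = MODEL_GP_EXIT;
		rsp->callback_queued = 1;
	} else {
		/* Nothing intervened: readers may use fastpaths again. */
		rsp->state = MODEL_GP_IDLE;
	}
}
```

The order of the checks matters in this model: an active updater takes priority over a replay request, and only when neither applies does the state return to idle.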

void rcu_sync_enter(struct rcu_sync *rsp): Force readers onto slowpath

Parameters

struct rcu_sync *rsp: Pointer to rcu_sync structure to use for synchronization

Description

This function is used by updaters who need readers to make use of a slowpath during the update. After this function returns, all subsequent calls to rcu_sync_is_idle() will return false, which tells readers to stay off their fastpaths. A later call to rcu_sync_exit() re-enables reader fastpaths.

When called in isolation, rcu_sync_enter() must wait for a grace period. However, closely spaced calls to rcu_sync_enter() can optimize away the grace-period wait via a state machine implemented by rcu_sync_enter(), rcu_sync_exit(), and rcu_sync_func().

void rcu_sync_exit(struct rcu_sync *rsp): Allow readers back onto fast path after grace period

Parameters

struct rcu_sync *rsp: Pointer to rcu_sync structure to use for synchronization

Description

This function is used by updaters who have completed, and can therefore now allow readers to make use of their fastpaths after a grace period has elapsed. After this grace period has completed, all subsequent calls to rcu_sync_is_idle() will return true, which tells readers that they can once again use their fastpaths.

void rcu_sync_dtor(struct rcu_sync *rsp): Clean up an rcu_sync structure

Parameters

struct rcu_sync *rsp: Pointer to rcu_sync structure to be cleaned up

struct rcu_tasks_percpu: Per-CPU component of definition for a Tasks-RCU-like mechanism.

Definition:

struct rcu_tasks_percpu {
    struct rcu_segcblist cblist;
    raw_spinlock_t lock;
    unsigned long rtp_jiffies;
    unsigned long rtp_n_lock_retries;
    struct timer_list lazy_timer;
    unsigned int urgent_gp;
    struct work_struct rtp_work;
    struct irq_work rtp_irq_work;
    struct rcu_head barrier_q_head;
    struct list_head rtp_blkd_tasks;
    struct list_head rtp_exit_list;
    int cpu;
    int index;
    struct rcu_tasks *rtpp;
};

Members

cblist: Callback list.
lock: Lock protecting per-CPU callback list.
rtp_jiffies: Jiffies counter value for statistics.
rtp_n_lock_retries: Rough lock-contention statistic.
lazy_timer: Timer to unlazify callbacks.
urgent_gp: Number of additional non-lazy grace periods.
rtp_work: Work queue for invoking callbacks.
rtp_irq_work: IRQ work queue for deferred wakeups.
barrier_q_head: RCU callback for barrier operation.
rtp_blkd_tasks: List of tasks blocked as readers.
rtp_exit_list: List of tasks in the latter portion of do_exit().
cpu: CPU number corresponding to this entry.
index: Index of this CPU in rtpcp_array of the rcu_tasks structure.
rtpp: Pointer to the rcu_tasks structure.

struct rcu_tasks: Definition for a Tasks-RCU-like mechanism.

Definition:

struct rcu_tasks {
    struct rcuwait cbs_wait;
    raw_spinlock_t cbs_gbl_lock;
    struct mutex tasks_gp_mutex;
    int gp_state;
    int gp_sleep;
    int init_fract;
    unsigned long gp_jiffies;
    unsigned long gp_start;
    unsigned long tasks_gp_seq;
    unsigned long n_ipis;
    unsigned long n_ipis_fails;
    struct task_struct *kthread_ptr;
    unsigned long lazy_jiffies;
    rcu_tasks_gp_func_t gp_func;
    pregp_func_t pregp_func;
    pertask_func_t pertask_func;
    postscan_func_t postscan_func;
    holdouts_func_t holdouts_func;
    postgp_func_t postgp_func;
    call_rcu_func_t call_func;
    unsigned int wait_state;
    struct rcu_tasks_percpu __percpu *rtpcpu;
    struct rcu_tasks_percpu **rtpcp_array;
    int percpu_enqueue_shift;
    int percpu_enqueue_lim;
    int percpu_dequeue_lim;
    unsigned long percpu_dequeue_gpseq;
    struct mutex barrier_q_mutex;
    atomic_t barrier_q_count;
    struct completion barrier_q_completion;
    unsigned long barrier_q_seq;
    unsigned long barrier_q_start;
    char *name;
    char *kname;
};

Members

cbs_wait: RCU wait allowing a new callback to get kthread’s attention.
cbs_gbl_lock: Lock protecting callback list.
tasks_gp_mutex: Mutex protecting grace period, needed during mid-boot dead zone.
gp_state: Grace period’s most recent state transition (debugging).
gp_sleep: Per-grace-period sleep to prevent CPU-bound looping.
init_fract: Initial backoff sleep interval.
gp_jiffies: Time of last gp_state transition.
gp_start: Most recent grace-period start in jiffies.
tasks_gp_seq: Number of grace periods completed since boot in upper bits.
n_ipis: Number of IPIs sent to encourage grace periods to end.
n_ipis_fails: Number of IPI-send failures.
kthread_ptr: This flavor’s grace-period/callback-invocation kthread.
lazy_jiffies: Number of jiffies to allow callbacks to be lazy.
gp_func: This flavor’s grace-period-wait function.
pregp_func: This flavor’s pre-grace-period function (optional).
pertask_func: This flavor’s per-task scan function (optional).
postscan_func: This flavor’s post-task scan function (optional).
holdouts_func: This flavor’s holdout-list scan function (optional).
postgp_func: This flavor’s post-grace-period function (optional).
call_func: This flavor’s call_rcu()-equivalent function.
wait_state: Task state for synchronous grace-period waits (default TASK_UNINTERRUPTIBLE).
rtpcpu: This flavor’s rcu_tasks_percpu structure.
rtpcp_array: Array of pointers to rcu_tasks_percpu structure of CPUs in cpu_possible_mask.
percpu_enqueue_shift: Shift down CPU ID this much when enqueuing callbacks.
percpu_enqueue_lim: Number of per-CPU callback queues in use for enqueuing.
percpu_dequeue_lim: Number of per-CPU callback queues in use for dequeuing.
percpu_dequeue_gpseq: RCU grace-period number to propagate enqueue limit to dequeuers.
barrier_q_mutex: Serialize barrier operations.
barrier_q_count: Number of queues being waited on.
barrier_q_completion: Barrier wait/wakeup mechanism.
barrier_q_seq: Sequence number for barrier operations.
barrier_q_start: Most recent barrier start in jiffies.
name: This flavor’s textual name.
kname: This flavor’s kthread name.

void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func): Queue an RCU callback for invocation after a task-based grace period

Parameters

struct rcu_head *rhp: structure to be used for queueing the RCU updates.
rcu_callback_t func: actual callback function to be invoked after the grace period

Description

The callback function will be invoked some time after a full grace period elapses, in other words after all currently executing RCU read-side critical sections have completed. call_rcu_tasks() assumes that the read-side critical sections end at a voluntary context switch (not a preemption!), cond_resched_tasks_rcu_qs(), entry into idle, or transition to usermode execution. As such, there are no read-side primitives analogous to rcu_read_lock() and rcu_read_unlock() because this primitive is intended to determine that all tasks have passed through a safe state, not so much for data-structure synchronization.

See the description of call_rcu() for more detailed information on memory ordering guarantees.

void synchronize_rcu_tasks(void): wait until an rcu-tasks grace period has elapsed.

Parameters

void: no arguments

Description

Control will return to the caller some time after a full rcu-tasks grace period has elapsed, in other words after all currently executing rcu-tasks read-side critical sections have elapsed. These read-side critical sections are delimited by calls to schedule(), cond_resched_tasks_rcu_qs(), idle execution, userspace execution, calls to synchronize_rcu_tasks(), and (in theory, anyway) cond_resched().

This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks() function is not (yet) intended for heavy use from multiple CPUs.

See the description of synchronize_rcu() for more detailed information on memory ordering guarantees.

void rcu_barrier_tasks(void): Wait for in-flight call_rcu_tasks() callbacks.

Parameters

void: no arguments

Description

Although the current implementation is guaranteed to wait, it is not obligated to, for example, if there are no pending callbacks.

void synchronize_rcu_tasks_rude(void): wait for a rude rcu-tasks grace period

Parameters

void: no arguments

Description

Control will return to the caller some time after a rude rcu-tasks grace period has elapsed, in other words after all currently executing rcu-tasks read-side critical sections have elapsed. These read-side critical sections are delimited by calls to schedule(), cond_resched_tasks_rcu_qs(), userspace execution (which is a schedulable context), and (in theory, anyway) cond_resched().

This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks_rude() function is not (yet) intended for heavy use from multiple CPUs.

See the description of synchronize_rcu() for more detailed information on memory ordering guarantees.

void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func): Queue a callback for invocation after a trace task-based grace period

Parameters

struct rcu_head *rhp: structure to be used for queueing the RCU updates.
rcu_callback_t func: actual callback function to be invoked after the grace period

Description

The callback function will be invoked some time after a trace rcu-tasks grace period elapses, in other words after all currently executing trace rcu-tasks read-side critical sections have completed. These read-side critical sections are delimited by calls to rcu_read_lock_trace() and rcu_read_unlock_trace().

See the description of call_rcu() for more detailed information on memory ordering guarantees.

void synchronize_rcu_tasks_trace(void): wait for a trace rcu-tasks grace period

Parameters

void: no arguments

Description

Control will return to the caller some time after a trace rcu-tasks grace period has elapsed, in other words after all currently executing trace rcu-tasks read-side critical sections have elapsed. These read-side critical sections are delimited by calls to rcu_read_lock_trace() and rcu_read_unlock_trace().

This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks_trace() function is not (yet) intended for heavy use from multiple CPUs.

See the description of synchronize_rcu() for more detailed information on memory ordering guarantees.

void rcu_barrier_tasks_trace(void): Wait for in-flight call_rcu_tasks_trace() callbacks.

Parameters

void: no arguments

Description

Although the current implementation is guaranteed to wait, it is not obligated to, for example, if there are no pending callbacks.

void rcu_cpu_stall_reset(void): restart stall-warning timeout for current grace period

Parameters

void: no arguments

Description

To perform the reset request from the caller, disable stall detection until 3 fqs loops have passed. This is required to ensure a fresh jiffies is loaded. It should be safe to do from the fqs loop as enough timer interrupts and context switches should have passed.

The caller must disable hard irqs.

int rcu_stall_chain_notifier_register(struct notifier_block *n): Add an RCU CPU stall notifier

Parameters

struct notifier_block *n: Entry to add.

Description

Adds an RCU CPU stall notifier to an atomic notifier chain. The action passed to a notifier will be RCU_STALL_NOTIFY_NORM or friends. The data will be the duration of the stalled grace period, in jiffies, coerced to a void* pointer.

Returns 0 on success, -EEXIST on error.
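Because the stall duration travels through the notifier's data argument as a pointer-sized value rather than as a pointer to storage, a notifier has to cast it back instead of dereferencing it. A minimal userspace sketch of that round trip, with the hypothetical helpers model_pack_duration() and model_unpack_duration():

```c
#include <assert.h>
#include <stdint.h>

/* Carry a duration in jiffies inside a void * cookie (no allocation). */
static void *model_pack_duration(unsigned long jiffies)
{
	return (void *)(uintptr_t)jiffies;
}

/* Recover the duration; the cookie must never be dereferenced. */
static unsigned long model_unpack_duration(void *data)
{
	return (unsigned long)(uintptr_t)data;
}
```

The sketch assumes that unsigned long and a pointer have the same width, which is the convention the kernel relies on when it passes integers this way.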

int rcu_stall_chain_notifier_unregister(struct notifier_block *n): Remove an RCU CPU stall notifier

Parameters

struct notifier_block *n: Entry to remove.

Description

Removes an RCU CPU stall notifier from an atomic notifier chain.

Returns zero on success, -ENOENT on failure.

void rcu_read_lock_trace(void): mark beginning of RCU-trace read-side critical section

Parameters

void: no arguments

Description

When synchronize_rcu_tasks_trace() is invoked by one task, then that task is guaranteed to block until all other tasks exit their read-side critical sections. Similarly, if call_rcu_tasks_trace() is invoked on one task while other tasks are within RCU read-side critical sections, invocation of the corresponding RCU callback is deferred until after all the other tasks exit their critical sections.

For more details, please see the documentation for rcu_read_lock().

void rcu_read_unlock_trace(void): mark end of RCU-trace read-side critical section

Parameters

void: no arguments

Description

Pairs with a preceding call to rcu_read_lock_trace(), and nesting is allowed. Invoking rcu_read_unlock_trace() when there is no matching rcu_read_lock_trace() is verboten, and will result in lockdep complaints.

For more details, please see the documentation for rcu_read_unlock().

synchronize_rcu_mult¶

synchronize_rcu_mult(...)

Wait concurrently for multiple grace periods

Parameters

...: List ofcall_rcu() functions for different grace periods to wait on

Description

This macro waits concurrently for multiple types of RCU grace periods. For example, synchronize_rcu_mult(call_rcu, call_rcu_tasks) would wait on concurrent RCU and RCU-tasks grace periods. Waiting on a given SRCU domain requires you to write a wrapper function for that SRCU domain’s call_srcu() function, with this wrapper supplying the pointer to the corresponding srcu_struct.

Note that call_rcu_hurry() should be used instead of call_rcu() because in kernels built with CONFIG_RCU_LAZY=y the delay between the invocation of call_rcu() and that of the corresponding RCU callback can be multiple seconds.

The first argument tells Tiny RCU’s _wait_rcu_gp() not to bother waiting for RCU. The reason is that anywhere synchronize_rcu_mult() can be called is automatically already a full grace period.
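The SRCU wrapper described above can be sketched as follows. To keep the sketch buildable outside a kernel tree, the kernel types and call_srcu() are replaced by userspace stand-ins, which are assumptions made for illustration. `my_srcu`, `call_my_srcu`, and `noop_cb` are hypothetical names. In kernel code the domain would typically be defined with DEFINE_STATIC_SRCU() and the real call_srcu() would be used.

```c
#include <stddef.h>

/* ---- Userspace stand-ins for kernel declarations (assumptions) ---- */
struct rcu_head {
	void (*func)(struct rcu_head *head);
};
typedef void (*rcu_callback_t)(struct rcu_head *head);

struct srcu_struct {
	const char *name;
};

static struct srcu_struct *last_srcu_domain;	/* records what the stub saw */

/* Stub with the same parameter order as the kernel's call_srcu(). */
static void call_srcu(struct srcu_struct *ssp, struct rcu_head *head,
		      rcu_callback_t func)
{
	last_srcu_domain = ssp;
	head->func = func;	/* the stub only queues; it never invokes */
}

/* ---- The wrapper pattern itself ---- */
static struct srcu_struct my_srcu = { "my_srcu" };	/* hypothetical domain */

/*
 * Wrapper with the (head, func) shape of call_rcu(): it hides the
 * srcu_struct pointer so that the SRCU domain can be named in a list of
 * call_rcu()-style functions.
 */
static void call_my_srcu(struct rcu_head *head, rcu_callback_t func)
{
	call_srcu(&my_srcu, head, func);
}

/* Trivial callback used only to exercise the wrapper. */
static void noop_cb(struct rcu_head *head)
{
	(void)head;
}

/*
 * In kernel code the wrapper would then be listed next to other flavors:
 *
 *	synchronize_rcu_mult(call_rcu_hurry, call_my_srcu);
 */
```

One wrapper is needed per SRCU domain, because each wrapper hard-codes the srcu_struct it forwards to.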

void rcuref_init(rcuref_t *ref, unsigned int cnt)¶: Initialize a rcuref reference count with the given reference count

Parameters

rcuref_t *ref: Pointer to the reference count
unsigned int cnt: The initial reference count, typically ‘1’

unsigned int rcuref_read(rcuref_t *ref)¶: Read the number of held reference counts of a rcuref

Parameters

rcuref_t *ref: Pointer to the reference count

Return

The number of held references (0 ... N). The value 0 does not indicate that it is safe to schedule the object, protected by this reference counter, for deconstruction. If you want to know if the reference counter has been marked DEAD (as signaled by rcuref_put()) please use rcuref_is_dead().

bool rcuref_is_dead(rcuref_t *ref)¶: Check if the rcuref has been already marked dead

Parameters

rcuref_t *ref: Pointer to the reference count

Return

True if the object has been marked DEAD. This signals that a previous invocation of rcuref_put() returned true on this reference counter, meaning the protected object can safely be scheduled for deconstruction. Otherwise, returns false.

bool rcuref_get(rcuref_t *ref)¶: Acquire one reference on a rcuref reference count

Parameters

rcuref_t *ref: Pointer to the reference count

Description

Similar to atomic_inc_not_zero() but saturates at RCUREF_MAXREF.

Provides no memory ordering; it is assumed the caller has guaranteed the object memory to be stable (RCU, etc.). It does provide a control dependency and thereby orders future stores. See documentation in lib/rcuref.c.

Return

True if a reference was successfully acquired. False if the attempt to acquire a reference failed, which happens when the last reference has already been put.

bool rcuref_put_rcusafe(rcuref_t *ref)¶

Release one reference for a rcuref reference count RCU safe

Parameters

rcuref_t *ref: Pointer to the reference count

Description

Provides release memory ordering, such that prior loads and stores are done before, and provides an acquire ordering on success such that free() must come after.

Can be invoked from contexts which guarantee that no grace period can happen which would free the object concurrently if the decrement drops the last reference and the slowpath races against a concurrent get() and put() pair. rcu_read_lock()’ed and atomic contexts qualify.

Return

True if this was the last reference with no future references possible. This signals the caller that it can safely release the object which is protected by the reference counter.

False if there are still active references or the put() raced with a concurrent get()/put() pair. Caller is not allowed to release the protected object.

bool rcuref_put(rcuref_t *ref)¶

Release one reference for a rcuref reference count

Parameters

rcuref_t *ref: Pointer to the reference count

Description

Can be invoked from any context.

Provides release memory ordering, such that prior loads and stores are done before, and provides an acquire ordering on success such that free() must come after.

Return

True if this was the last reference with no future references possible. This signals the caller that it can safely schedule the object, which is protected by the reference counter, for deconstruction.

False if there are still active references or the put() raced with a concurrent get()/put() pair. Caller is not allowed to deconstruct the protected object.
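The return-value contract of the rcuref functions can be illustrated with a deliberately simplified, single-threaded model. This is not how lib/rcuref.c works: it has no atomics, no saturation, and none of the races mentioned above. `struct model_rcuref` and the `model_rcuref_*` names are hypothetical, and the model assumes an initial count of at least 1.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of the documented contract. */
struct model_rcuref {
	unsigned int cnt;	/* number of held references */
	bool dead;		/* set once the last reference was put */
};

static void model_rcuref_init(struct model_rcuref *ref, unsigned int cnt)
{
	ref->cnt = cnt;		/* model assumes cnt >= 1 */
	ref->dead = false;
}

static unsigned int model_rcuref_read(const struct model_rcuref *ref)
{
	return ref->dead ? 0 : ref->cnt;
}

static bool model_rcuref_is_dead(const struct model_rcuref *ref)
{
	return ref->dead;
}

/* Fails once the last reference has been put. */
static bool model_rcuref_get(struct model_rcuref *ref)
{
	if (ref->dead)
		return false;
	ref->cnt++;
	return true;
}

/* Returns true only for the put that drops the last reference. */
static bool model_rcuref_put(struct model_rcuref *ref)
{
	if (ref->dead || ref->cnt == 0)
		return false;	/* unbalanced put: the model just refuses */
	if (--ref->cnt == 0) {
		ref->dead = true;
		return true;	/* caller may now schedule deconstruction */
	}
	return false;
}
```

In kernel code, a true return from rcuref_put() marks the one place where the caller may schedule the object for deconstruction. Every false return means the object must be left alone.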

bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1, struct rcu_gp_oldstate *rgosp2)¶: Are two old-state values identical?

Parameters

struct rcu_gp_oldstate *rgosp1: First old-state value.
struct rcu_gp_oldstate *rgosp2: Second old-state value.

Description

The two old-state values must have been obtained from either get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or get_completed_synchronize_rcu_full(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.

Note that equality is judged on a bitwise basis, so that an rcu_gp_oldstate structure with an already-completed state in one field will compare not-equal to a structure with an already-completed state in the other field. After all, the rcu_gp_oldstate structure is opaque so how did such a situation come to pass in the first place?
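The bitwise-equality caveat can be made concrete with a small model. The real rcu_gp_oldstate structure is opaque, so the two-field layout, the field names, and the MODEL_COMPLETED marker below are all assumptions made purely for illustration.

```c
#include <stdbool.h>

#define MODEL_COMPLETED 1UL	/* hypothetical "already completed" marker */

/* Hypothetical stand-in for the opaque old-state structure. */
struct model_gp_oldstate {
	unsigned long norm;	/* assumed field: one grace-period state */
	unsigned long exp;	/* assumed field: another grace-period state */
};

/* Field-by-field comparison with no interpretation of the values. */
static bool model_same_state(const struct model_gp_oldstate *a,
			     const struct model_gp_oldstate *b)
{
	return a->norm == b->norm && a->exp == b->exp;
}
```

With this model, `{ MODEL_COMPLETED, 8 }` and `{ 8, MODEL_COMPLETED }` both contain a completed state yet compare as different, while a value always compares equal to a copy of itself. That is why the comparison is only meaningful for values obtained from the listed functions.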

orig	tmp	dst
0	0	40
1	1	41
9	9	95
10	0	40[1]
1 3 5 7	1 3 5 7	41 43 48 61
0 1 2 3 4	0 1 2 3 4	40 41 42 43 45
0 9 18 27	0 9 8 7	40 61 74 95
0 10 20 30	0	40
0 11 22 33	0 1 2 3	40 41 42 43
0 12 24 36	0 2 4 6	40 42 45 53
78 102 211	1 2 8	41 42 74[1]

