The Linux Kernel API¶
List Management Functions¶
- voidINIT_LIST_HEAD(structlist_head*list)¶
Initialize a list_head structure
Parameters
structlist_head*list
list_head structure to be initialized.
Description
Initializes the list_head to point to itself. If it is a list header, the result is an empty list.
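Illustrative sketch (not part of the kernel-doc; struct my_node and my_node_alloc are hypothetical) showing both ways to obtain an empty list: LIST_HEAD() for a static definition and INIT_LIST_HEAD() for a head embedded in a dynamically allocated object:

    #include <linux/list.h>
    #include <linux/slab.h>

    /* hypothetical example structure with an embedded list_head */
    struct my_node {
        int value;
        struct list_head link;
    };

    static LIST_HEAD(my_list);          /* statically defined, already empty */

    static struct my_node *my_node_alloc(int value)
    {
        struct my_node *n = kmalloc(sizeof(*n), GFP_KERNEL);

        if (n) {
            n->value = value;
            INIT_LIST_HEAD(&n->link);   /* node link points to itself until added */
        }
        return n;
    }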
- voidlist_add(structlist_head*new,structlist_head*head)¶
add a new entry
Parameters
structlist_head*new
new entry to be added
structlist_head*head
list head to add it after
Description
Insert a new entry after the specified head. This is good for implementing stacks.
- voidlist_add_tail(structlist_head*new,structlist_head*head)¶
add a new entry
Parameters
structlist_head*new
new entry to be added
structlist_head*head
list head to add it before
Description
Insert a new entry before the specified head. This is useful for implementing queues.
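Illustrative sketch (using the hypothetical struct my_node from above): list_add() builds LIFO (stack) order, list_add_tail() builds FIFO (queue) order:

    static LIST_HEAD(stack);
    static LIST_HEAD(queue);

    static void add_examples(struct my_node *a, struct my_node *b,
                             struct my_node *c, struct my_node *d)
    {
        list_add(&a->link, &stack);         /* stack: a */
        list_add(&b->link, &stack);         /* stack: b, a (LIFO) */

        list_add_tail(&c->link, &queue);    /* queue: c */
        list_add_tail(&d->link, &queue);    /* queue: c, d (FIFO) */
    }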
- voidlist_del(structlist_head*entry)¶
deletes entry from list.
Parameters
structlist_head*entry
the element to delete from the list.
Note
list_empty() on entry does not return true after this, the entry is in an undefined state.
- voidlist_replace(structlist_head*old,structlist_head*new)¶
replace old entry by new one
Parameters
structlist_head*old
the element to be replaced
structlist_head*new
the new element to insert
Description
If old was empty, it will be overwritten.
- voidlist_replace_init(structlist_head*old,structlist_head*new)¶
replace old entry by new one and initialize the old one
Parameters
structlist_head*old
the element to be replaced
structlist_head*new
the new element to insert
Description
If old was empty, it will be overwritten.
- voidlist_swap(structlist_head*entry1,structlist_head*entry2)¶
replace entry1 with entry2 and re-add entry1 at entry2’s position
Parameters
structlist_head*entry1
the location to place entry2
structlist_head*entry2
the location to place entry1
- voidlist_del_init(structlist_head*entry)¶
deletes entry from list and reinitialize it.
Parameters
structlist_head*entry
the element to delete from the list.
- voidlist_move(structlist_head*list,structlist_head*head)¶
delete from one list and add as another’s head
Parameters
structlist_head*list
the entry to move
structlist_head*head
the head that will precede our entry
- voidlist_move_tail(structlist_head*list,structlist_head*head)¶
delete from one list and add as another’s tail
Parameters
structlist_head*list
the entry to move
structlist_head*head
the head that will follow our entry
- voidlist_bulk_move_tail(structlist_head*head,structlist_head*first,structlist_head*last)¶
move a subsection of a list to its tail
Parameters
structlist_head*head
the head that will follow our entry
structlist_head*first
first entry to move
structlist_head*last
last entry to move, can be the same as first
Description
Move all entries between first and including last before head. All three entries must belong to the same linked list.
- intlist_is_first(conststructlist_head*list,conststructlist_head*head)¶
tests whether list is the first entry in list head
Parameters
conststructlist_head*list
the entry to test
conststructlist_head*head
the head of the list
- intlist_is_last(conststructlist_head*list,conststructlist_head*head)¶
tests whether list is the last entry in list head
Parameters
conststructlist_head*list
the entry to test
conststructlist_head*head
the head of the list
- intlist_is_head(conststructlist_head*list,conststructlist_head*head)¶
tests whether list is the list head
Parameters
conststructlist_head*list
the entry to test
conststructlist_head*head
the head of the list
- intlist_empty(conststructlist_head*head)¶
tests whether a list is empty
Parameters
conststructlist_head*head
the list to test.
- voidlist_del_init_careful(structlist_head*entry)¶
deletes entry from list and reinitialize it.
Parameters
structlist_head*entry
the element to delete from the list.
Description
This is the same as list_del_init(), except designed to be used together with list_empty_careful() in a way to guarantee ordering of other memory operations.
Any memory operations done before a list_del_init_careful() are guaranteed to be visible after a list_empty_careful() test.
- intlist_empty_careful(conststructlist_head*head)¶
tests whether a list is empty and not being modified
Parameters
conststructlist_head*head
the list to test
Description
tests whether a list is empty _and_ checks that no other CPU might be in the process of modifying either member (next or prev)
NOTE
using list_empty_careful() without synchronization can only be safe if the only activity that can happen to the list entry is list_del_init(). Eg. it cannot be used if another CPU could re-list_add() it.
- voidlist_rotate_left(structlist_head*head)¶
rotate the list to the left
Parameters
structlist_head*head
the head of the list
- voidlist_rotate_to_front(structlist_head*list,structlist_head*head)¶
Rotate list to specific item.
Parameters
structlist_head*list
The desired new front of the list.
structlist_head*head
The head of the list.
Description
Rotates list so thatlist becomes the new front of the list.
- intlist_is_singular(conststructlist_head*head)¶
tests whether a list has just one entry.
Parameters
conststructlist_head*head
the list to test.
- voidlist_cut_position(structlist_head*list,structlist_head*head,structlist_head*entry)¶
cut a list into two
Parameters
structlist_head*list
a new list to add all removed entries
structlist_head*head
a list with entries
structlist_head*entry
an entry within head, could be the head itself and if so we won't cut the list
Description
This helper moves the initial part of head, up to and including entry, from head to list. You should pass on entry an element you know is on head. list should be an empty list or a list you do not care about losing its data.
- voidlist_cut_before(structlist_head*list,structlist_head*head,structlist_head*entry)¶
cut a list into two, before given entry
Parameters
structlist_head*list
a new list to add all removed entries
structlist_head*head
a list with entries
structlist_head*entry
an entry within head, could be the head itself
Description
This helper moves the initial part of head, up to but excluding entry, from head to list. You should pass in entry an element you know is on head. list should be an empty list or a list you do not care about losing its data. If entry == head, all entries on head are moved to list.
- voidlist_splice(conststructlist_head*list,structlist_head*head)¶
join two lists, this is designed for stacks
Parameters
conststructlist_head*list
the new list to add.
structlist_head*head
the place to add it in the first list.
- voidlist_splice_tail(structlist_head*list,structlist_head*head)¶
join two lists, each list being a queue
Parameters
structlist_head*list
the new list to add.
structlist_head*head
the place to add it in the first list.
- voidlist_splice_init(structlist_head*list,structlist_head*head)¶
join two lists and reinitialise the emptied list.
Parameters
structlist_head*list
the new list to add.
structlist_head*head
the place to add it in the first list.
Description
The list atlist is reinitialised
- voidlist_splice_tail_init(structlist_head*list,structlist_head*head)¶
join two lists and reinitialise the emptied list
Parameters
structlist_head*list
the new list to add.
structlist_head*head
the place to add it in the first list.
Description
Each of the lists is a queue.The list atlist is reinitialised
- list_entry¶
list_entry(ptr,type,member)
get the struct for this entry
Parameters
ptr
the struct list_head pointer.
type
the type of the struct this is embedded in.
member
the name of the list_head within the struct.
- list_first_entry¶
list_first_entry(ptr,type,member)
get the first element from a list
Parameters
ptr
the list head to take the element from.
type
the type of the struct this is embedded in.
member
the name of the list_head within the struct.
Description
Note, that list is expected to be not empty.
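Illustrative sketch (hypothetical struct my_node and peek_first_value): list_first_entry() maps a list_head back to its containing structure, so emptiness must be checked first:

    static int peek_first_value(struct list_head *head)
    {
        struct my_node *n;

        if (list_empty(head))
            return -ENOENT;     /* list_first_entry() requires a non-empty list */

        n = list_first_entry(head, struct my_node, link);
        return n->value;
    }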
- list_last_entry¶
list_last_entry(ptr,type,member)
get the last element from a list
Parameters
ptr
the list head to take the element from.
type
the type of the struct this is embedded in.
member
the name of the list_head within the struct.
Description
Note, that list is expected to be not empty.
- list_first_entry_or_null¶
list_first_entry_or_null(ptr,type,member)
get the first element from a list
Parameters
ptr
the list head to take the element from.
type
the type of the struct this is embedded in.
member
the name of the list_head within the struct.
Description
Note that if the list is empty, it returns NULL.
- list_next_entry¶
list_next_entry(pos,member)
get the next element in list
Parameters
pos
the type * to cursor
member
the name of the list_head within the struct.
- list_next_entry_circular¶
list_next_entry_circular(pos,head,member)
get the next element in list
Parameters
pos
the type * to cursor.
head
the list head to take the element from.
member
the name of the list_head within the struct.
Description
Wraparound if pos is the last element (return the first element).Note, that list is expected to be not empty.
- list_prev_entry¶
list_prev_entry(pos,member)
get the prev element in list
Parameters
pos
the type * to cursor
member
the name of the list_head within the struct.
- list_prev_entry_circular¶
list_prev_entry_circular(pos,head,member)
get the prev element in list
Parameters
pos
the type * to cursor.
head
the list head to take the element from.
member
the name of the list_head within the struct.
Description
Wraparound if pos is the first element (return the last element).Note, that list is expected to be not empty.
- list_for_each¶
list_for_each(pos,head)
iterate over a list
Parameters
pos
the struct list_head to use as a loop cursor.
head
the head for your list.
- list_for_each_rcu¶
list_for_each_rcu(pos,head)
Iterate over a list in an RCU-safe fashion
Parameters
pos
the struct list_head to use as a loop cursor.
head
the head for your list.
- list_for_each_continue¶
list_for_each_continue(pos,head)
continue iteration over a list
Parameters
pos
the struct list_head to use as a loop cursor.
head
the head for your list.
Description
Continue to iterate over a list, continuing after the current position.
- list_for_each_prev¶
list_for_each_prev(pos,head)
iterate over a list backwards
Parameters
pos
the struct list_head to use as a loop cursor.
head
the head for your list.
- list_for_each_safe¶
list_for_each_safe(pos,n,head)
iterate over a list safe against removal of list entry
Parameters
pos
the struct list_head to use as a loop cursor.
n
another struct list_head to use as temporary storage
head
the head for your list.
- list_for_each_prev_safe¶
list_for_each_prev_safe(pos,n,head)
iterate over a list backwards safe against removal of list entry
Parameters
pos
the struct list_head to use as a loop cursor.
n
another struct list_head to use as temporary storage
head
the head for your list.
- size_tlist_count_nodes(structlist_head*head)¶
count nodes in the list
Parameters
structlist_head*head
the head for your list.
- list_entry_is_head¶
list_entry_is_head(pos,head,member)
test if the entry points to the head of the list
Parameters
pos
the type * to cursor
head
the head for your list.
member
the name of the list_head within the struct.
- list_for_each_entry¶
list_for_each_entry(pos,head,member)
iterate over list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_head within the struct.
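Illustrative sketch (hypothetical struct my_node and sum_values): a read-only traversal; entries must not be removed inside this loop (use list_for_each_entry_safe for that):

    static int sum_values(struct list_head *head)
    {
        struct my_node *pos;
        int sum = 0;

        list_for_each_entry(pos, head, link)
            sum += pos->value;

        return sum;
    }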
- list_for_each_entry_reverse¶
list_for_each_entry_reverse(pos,head,member)
iterate backwards over list of given type.
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_head within the struct.
- list_prepare_entry¶
list_prepare_entry(pos,head,member)
prepare a pos entry for use in
list_for_each_entry_continue()
Parameters
pos
the type * to use as a start point
head
the head of the list
member
the name of the list_head within the struct.
Description
Prepares a pos entry for use as a start point in list_for_each_entry_continue().
- list_for_each_entry_continue¶
list_for_each_entry_continue(pos,head,member)
continue iteration over list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_head within the struct.
Description
Continue to iterate over list of given type, continuing afterthe current position.
- list_for_each_entry_continue_reverse¶
list_for_each_entry_continue_reverse(pos,head,member)
iterate backwards from the given point
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_head within the struct.
Description
Start to iterate over list of given type backwards, continuing afterthe current position.
- list_for_each_entry_from¶
list_for_each_entry_from(pos,head,member)
iterate over list of given type from the current point
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_head within the struct.
Description
Iterate over list of given type, continuing from current position.
- list_for_each_entry_from_reverse¶
list_for_each_entry_from_reverse(pos,head,member)
iterate backwards over list of given type from the current point
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_head within the struct.
Description
Iterate backwards over list of given type, continuing from current position.
- list_for_each_entry_safe¶
list_for_each_entry_safe(pos,n,head,member)
iterate over list of given type safe against removal of list entry
Parameters
pos
the type * to use as a loop cursor.
n
another type * to use as temporary storage
head
the head for your list.
member
the name of the list_head within the struct.
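Illustrative sketch (hypothetical struct my_node and free_all): the extra cursor n caches the next entry, so pos can be deleted and freed while iterating:

    static void free_all(struct list_head *head)
    {
        struct my_node *pos, *n;

        list_for_each_entry_safe(pos, n, head, link) {
            list_del(&pos->link);
            kfree(pos);
        }
    }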
- list_for_each_entry_safe_continue¶
list_for_each_entry_safe_continue(pos,n,head,member)
continue list iteration safe against removal
Parameters
pos
the type * to use as a loop cursor.
n
another type * to use as temporary storage
head
the head for your list.
member
the name of the list_head within the struct.
Description
Iterate over list of given type, continuing after current point,safe against removal of list entry.
- list_for_each_entry_safe_from¶
list_for_each_entry_safe_from(pos,n,head,member)
iterate over list from current point safe against removal
Parameters
pos
the type * to use as a loop cursor.
n
another type * to use as temporary storage
head
the head for your list.
member
the name of the list_head within the struct.
Description
Iterate over list of given type from current point, safe againstremoval of list entry.
- list_for_each_entry_safe_reverse¶
list_for_each_entry_safe_reverse(pos,n,head,member)
iterate backwards over list safe against removal
Parameters
pos
the type * to use as a loop cursor.
n
another type * to use as temporary storage
head
the head for your list.
member
the name of the list_head within the struct.
Description
Iterate backwards over list of given type, safe against removalof list entry.
- list_safe_reset_next¶
list_safe_reset_next(pos,n,member)
reset a stale list_for_each_entry_safe loop
Parameters
pos
the loop cursor used in the list_for_each_entry_safe loop
n
temporary storage used in list_for_each_entry_safe
member
the name of the list_head within the struct.
Description
list_safe_reset_next is not safe to use in general if the list may be modified concurrently (eg. the lock is dropped in the loop body). An exception to this is if the cursor element (pos) is pinned in the list, and list_safe_reset_next is called after re-taking the lock and before completing the current iteration of the loop body.
- inthlist_unhashed(conststructhlist_node*h)¶
Has node been removed from list and reinitialized?
Parameters
conststructhlist_node*h
Node to be checked
Description
Note that not all removal functions will leave a node in unhashed state. For example, hlist_nulls_del_init_rcu() does leave the node in unhashed state, but hlist_nulls_del() does not.
- inthlist_unhashed_lockless(conststructhlist_node*h)¶
Version of hlist_unhashed for lockless use
Parameters
conststructhlist_node*h
Node to be checked
Description
This variant of hlist_unhashed() must be used in lockless contexts to avoid potential load-tearing. The READ_ONCE() is paired with the various WRITE_ONCE() in hlist helpers that are defined below.
- inthlist_empty(conststructhlist_head*h)¶
Is the specified hlist_head structure an empty hlist?
Parameters
conststructhlist_head*h
Structure to check.
- voidhlist_del(structhlist_node*n)¶
Delete the specified hlist_node from its list
Parameters
structhlist_node*n
Node to delete.
Description
Note that this function leaves the node in hashed state. Use hlist_del_init() or similar instead to unhash n.
- voidhlist_del_init(structhlist_node*n)¶
Delete the specified hlist_node from its list and initialize
Parameters
structhlist_node*n
Node to delete.
Description
Note that this function leaves the node in unhashed state.
- voidhlist_add_head(structhlist_node*n,structhlist_head*h)¶
add a new entry at the beginning of the hlist
Parameters
structhlist_node*n
new entry to be added
structhlist_head*h
hlist head to add it after
Description
Insert a new entry after the specified head. This is good for implementing stacks.
- voidhlist_add_before(structhlist_node*n,structhlist_node*next)¶
add a new entry before the one specified
Parameters
structhlist_node*n
new entry to be added
structhlist_node*next
hlist node to add it before, which must be non-NULL
- voidhlist_add_behind(structhlist_node*n,structhlist_node*prev)¶
add a new entry after the one specified
Parameters
structhlist_node*n
new entry to be added
structhlist_node*prev
hlist node to add it after, which must be non-NULL
- voidhlist_add_fake(structhlist_node*n)¶
create a fake hlist consisting of a single headless node
Parameters
structhlist_node*n
Node to make a fake list out of
Description
This makes n appear to be its own predecessor on a headless hlist. The point of this is to allow things like hlist_del() to work correctly in cases where there is no list.
- boolhlist_fake(structhlist_node*h)¶
Is this node a fake hlist?
Parameters
structhlist_node*h
Node to check for being a self-referential fake hlist.
- boolhlist_is_singular_node(structhlist_node*n,structhlist_head*h)¶
is node the only element of the specified hlist?
Parameters
structhlist_node*n
Node to check for singularity.
structhlist_head*h
Header for potentially singular list.
Description
Check whether the node is the only node of the head withoutaccessing head, thus avoiding unnecessary cache misses.
- voidhlist_move_list(structhlist_head*old,structhlist_head*new)¶
Move an hlist
Parameters
structhlist_head*old
hlist_head for old list.
structhlist_head*new
hlist_head for new list.
Description
Move a list from one list head to another. Fixup the pprevreference of the first entry if it exists.
- voidhlist_splice_init(structhlist_head*from,structhlist_node*last,structhlist_head*to)¶
move all entries from one list to another
Parameters
structhlist_head*from
hlist_head from which entries will be moved
structhlist_node*last
last entry on thefrom list
structhlist_head*to
hlist_head to which entries will be moved
Description
to can be empty,from must contain at leastlast.
- hlist_for_each_entry¶
hlist_for_each_entry(pos,head,member)
iterate over list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the hlist_node within the struct.
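Illustrative sketch (struct my_obj, my_table and my_lookup are hypothetical): a minimal open-hashing lookup over hlist buckets:

    struct my_obj {
        unsigned int key;
        struct hlist_node node;
    };

    #define MY_TABLE_SIZE 16
    static struct hlist_head my_table[MY_TABLE_SIZE];

    static struct my_obj *my_lookup(unsigned int key)
    {
        struct my_obj *obj;
        struct hlist_head *bucket = &my_table[key % MY_TABLE_SIZE];

        hlist_for_each_entry(obj, bucket, node)
            if (obj->key == key)
                return obj;
        return NULL;
    }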
- hlist_for_each_entry_continue¶
hlist_for_each_entry_continue(pos,member)
iterate over a hlist continuing after current point
Parameters
pos
the type * to use as a loop cursor.
member
the name of the hlist_node within the struct.
- hlist_for_each_entry_from¶
hlist_for_each_entry_from(pos,member)
iterate over a hlist continuing from current point
Parameters
pos
the type * to use as a loop cursor.
member
the name of the hlist_node within the struct.
- hlist_for_each_entry_safe¶
hlist_for_each_entry_safe(pos,n,head,member)
iterate over list of given type safe against removal of list entry
Parameters
pos
the type * to use as a loop cursor.
n
a struct hlist_node to use as temporary storage
head
the head for your list.
member
the name of the hlist_node within the struct.
- size_thlist_count_nodes(structhlist_head*head)¶
count nodes in the hlist
Parameters
structhlist_head*head
the head for your hlist.
Basic C Library Functions¶
When writing drivers, you cannot in general use routines which are from the C Library. Some of the functions have been found generally useful and they are listed below. The behaviour of these functions may vary slightly from those defined by ANSI, and these deviations are noted in the text.
String Conversions¶
- unsignedlonglongsimple_strtoull(constchar*cp,char**endp,unsignedintbase)¶
convert a string to an unsigned long long
Parameters
constchar*cp
The start of the string
char**endp
A pointer to the end of the parsed string will be placed here
unsignedintbase
The number base to use
Description
This function has caveats. Please use kstrtoull instead.
- unsignedlongsimple_strtoul(constchar*cp,char**endp,unsignedintbase)¶
convert a string to an unsigned long
Parameters
constchar*cp
The start of the string
char**endp
A pointer to the end of the parsed string will be placed here
unsignedintbase
The number base to use
Description
This function has caveats. Please use kstrtoul instead.
- longsimple_strtol(constchar*cp,char**endp,unsignedintbase)¶
convert a string to a signed long
Parameters
constchar*cp
The start of the string
char**endp
A pointer to the end of the parsed string will be placed here
unsignedintbase
The number base to use
Description
This function has caveats. Please use kstrtol instead.
- longlongsimple_strtoll(constchar*cp,char**endp,unsignedintbase)¶
convert a string to a signed long long
Parameters
constchar*cp
The start of the string
char**endp
A pointer to the end of the parsed string will be placed here
unsignedintbase
The number base to use
Description
This function has caveats. Please use kstrtoll instead.
- intvsnprintf(char*buf,size_tsize,constchar*fmt_str,va_listargs)¶
Format a string and place it in a buffer
Parameters
char*buf
The buffer to place the result into
size_tsize
The size of the buffer, including the trailing null space
constchar*fmt_str
The format string to use
va_listargs
Arguments for the format string
Description
This function generally follows C99 vsnprintf, but has some extensions and a few limitations:
- ``%n`` is unsupported
- ``%p*`` is handled by pointer()
See pointer() or How to get printk format specifiers right for a more extensive description.
Please update the documentation in both places when making changes
The return value is the number of characters which would be generated for the given input, excluding the trailing '\0', as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing '\0'), use vscnprintf(). If the return is greater than or equal to size, the resulting string is truncated.
If you're not already dealing with a va_list consider using snprintf().
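Illustrative sketch (my_format is hypothetical): the typical pattern for building a new printf-like helper on top of vsnprintf():

    static __printf(3, 4) int my_format(char *buf, size_t size, const char *fmt, ...)
    {
        va_list args;
        int len;

        va_start(args, fmt);
        len = vsnprintf(buf, size, fmt, args);
        va_end(args);

        return len;     /* would-be length; >= size means truncation */
    }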
- intvscnprintf(char*buf,size_tsize,constchar*fmt,va_listargs)¶
Format a string and place it in a buffer
Parameters
char*buf
The buffer to place the result into
size_tsize
The size of the buffer, including the trailing null space
constchar*fmt
The format string to use
va_listargs
Arguments for the format string
Description
The return value is the number of characters which have been written into the buf not including the trailing '\0'. If size is == 0 the function returns 0.
If you're not already dealing with a va_list consider using scnprintf().
See the vsnprintf() documentation for format string extensions over C99.
- intsnprintf(char*buf,size_tsize,constchar*fmt,...)¶
Format a string and place it in a buffer
Parameters
char*buf
The buffer to place the result into
size_tsize
The size of the buffer, including the trailing null space
constchar*fmt
The format string to use
...
Arguments for the format string
Description
The return value is the number of characters which would be generated for the given input, excluding the trailing null, as per ISO C99. If the return is greater than or equal to size, the resulting string is truncated.
See the vsnprintf() documentation for format string extensions over C99.
- intscnprintf(char*buf,size_tsize,constchar*fmt,...)¶
Format a string and place it in a buffer
Parameters
char*buf
The buffer to place the result into
size_tsize
The size of the buffer, including the trailing null space
constchar*fmt
The format string to use
...
Arguments for the format string
Description
The return value is the number of characters written into buf not including the trailing '\0'. If size is == 0 the function returns 0.
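Illustrative sketch (fill_report is hypothetical) of why scnprintf() is the right choice when appending: its return value never exceeds the space actually available, so the offset cannot run past the buffer:

    static void fill_report(char *buf, size_t size, int a, int b)
    {
        size_t off = 0;

        off += scnprintf(buf + off, size - off, "a=%d ", a);
        off += scnprintf(buf + off, size - off, "b=%d\n", b);
        /* with snprintf() the would-be lengths could push off beyond size */
    }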
- intvsprintf(char*buf,constchar*fmt,va_listargs)¶
Format a string and place it in a buffer
Parameters
char*buf
The buffer to place the result into
constchar*fmt
The format string to use
va_listargs
Arguments for the format string
Description
The function returns the number of characters written into buf. Use vsnprintf() or vscnprintf() in order to avoid buffer overflows.
If you're not already dealing with a va_list consider using sprintf().
See the vsnprintf() documentation for format string extensions over C99.
- intsprintf(char*buf,constchar*fmt,...)¶
Format a string and place it in a buffer
Parameters
char*buf
The buffer to place the result into
constchar*fmt
The format string to use
...
Arguments for the format string
Description
The function returns the number of characters written into buf. Use snprintf() or scnprintf() in order to avoid buffer overflows.
See the vsnprintf() documentation for format string extensions over C99.
- intvbin_printf(u32*bin_buf,size_tsize,constchar*fmt_str,va_listargs)¶
Parse a format string and place args’ binary value in a buffer
Parameters
u32*bin_buf
The buffer to place args’ binary value
size_tsize
The size of the buffer(by words(32bits), not characters)
constchar*fmt_str
The format string to use
va_listargs
Arguments for the format string
Description
The format follows C99 vsnprintf, except %n is ignored, and its argument is skipped.
The return value is the number of words (32 bits) which would be generated for the given input.
NOTE
If the return value is greater than size, the resulting bin_buf is NOT valid for bstr_printf().
- intbstr_printf(char*buf,size_tsize,constchar*fmt_str,constu32*bin_buf)¶
Format a string from binary arguments and place it in a buffer
Parameters
char*buf
The buffer to place the result into
size_tsize
The size of the buffer, including the trailing null space
constchar*fmt_str
The format string to use
constu32*bin_buf
Binary arguments for the format string
Description
This function is like C99 vsnprintf, but the difference is that vsnprintf gets arguments from the stack, and bstr_printf gets arguments from bin_buf, which is a binary buffer generated by vbin_printf.
The format follows C99 vsnprintf, but has some extensions: see vsnprintf comment for details.
The return value is the number of characters which would be generated for the given input, excluding the trailing '\0', as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing '\0'), use vscnprintf(). If the return is greater than or equal to size, the resulting string is truncated.
- intvsscanf(constchar*buf,constchar*fmt,va_listargs)¶
Unformat a buffer into a list of arguments
Parameters
constchar*buf
input buffer
constchar*fmt
format of buffer
va_listargs
arguments
- intsscanf(constchar*buf,constchar*fmt,...)¶
Unformat a buffer into a list of arguments
Parameters
constchar*buf
input buffer
constchar*fmt
formatting of buffer
...
resulting arguments
- intkstrtoul(constchar*s,unsignedintbase,unsignedlong*res)¶
convert a string to an unsigned long
Parameters
constchar*s
The start of the string. The string must be null-terminated, and may alsoinclude a single newline before its terminating null. The first charactermay also be a plus sign, but not a minus sign.
unsignedintbase
The number base to use. The maximum supported base is 16. If base isgiven as 0, then the base of the string is automatically detected with theconventional semantics - If it begins with 0x the number will be parsed as ahexadecimal (case insensitive), if it otherwise begins with 0, it will beparsed as an octal number. Otherwise it will be parsed as a decimal.
unsignedlong*res
Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoul(). Return code must be checked.
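Illustrative sketch (a hypothetical sysfs store() callback) showing the checked-conversion pattern:

    static ssize_t threshold_store(struct device *dev,
                                   struct device_attribute *attr,
                                   const char *buf, size_t count)
    {
        unsigned long val;
        int ret;

        ret = kstrtoul(buf, 0, &val);   /* base 0 auto-detects 0x/0 prefixes */
        if (ret)
            return ret;                 /* -EINVAL or -ERANGE */

        /* ... use val ... */
        return count;
    }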
- intkstrtol(constchar*s,unsignedintbase,long*res)¶
convert a string to a long
Parameters
constchar*s
The start of the string. The string must be null-terminated, and may alsoinclude a single newline before its terminating null. The first charactermay also be a plus sign or a minus sign.
unsignedintbase
The number base to use. The maximum supported base is 16. If base isgiven as 0, then the base of the string is automatically detected with theconventional semantics - If it begins with 0x the number will be parsed as ahexadecimal (case insensitive), if it otherwise begins with 0, it will beparsed as an octal number. Otherwise it will be parsed as a decimal.
long*res
Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.Preferred oversimple_strtol()
. Return code must be checked.
- intkstrtoull(constchar*s,unsignedintbase,unsignedlonglong*res)¶
convert a string to an unsigned long long
Parameters
constchar*s
The start of the string. The string must be null-terminated, and may alsoinclude a single newline before its terminating null. The first charactermay also be a plus sign, but not a minus sign.
unsignedintbase
The number base to use. The maximum supported base is 16. If base isgiven as 0, then the base of the string is automatically detected with theconventional semantics - If it begins with 0x the number will be parsed as ahexadecimal (case insensitive), if it otherwise begins with 0, it will beparsed as an octal number. Otherwise it will be parsed as a decimal.
unsignedlonglong*res
Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.Preferred oversimple_strtoull()
. Return code must be checked.
- intkstrtoll(constchar*s,unsignedintbase,longlong*res)¶
convert a string to a long long
Parameters
constchar*s
The start of the string. The string must be null-terminated, and may alsoinclude a single newline before its terminating null. The first charactermay also be a plus sign or a minus sign.
unsignedintbase
The number base to use. The maximum supported base is 16. If base isgiven as 0, then the base of the string is automatically detected with theconventional semantics - If it begins with 0x the number will be parsed as ahexadecimal (case insensitive), if it otherwise begins with 0, it will beparsed as an octal number. Otherwise it will be parsed as a decimal.
longlong*res
Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.Preferred oversimple_strtoll()
. Return code must be checked.
- intkstrtouint(constchar*s,unsignedintbase,unsignedint*res)¶
convert a string to an unsigned int
Parameters
constchar*s
The start of the string. The string must be null-terminated, and may alsoinclude a single newline before its terminating null. The first charactermay also be a plus sign, but not a minus sign.
unsignedintbase
The number base to use. The maximum supported base is 16. If base isgiven as 0, then the base of the string is automatically detected with theconventional semantics - If it begins with 0x the number will be parsed as ahexadecimal (case insensitive), if it otherwise begins with 0, it will beparsed as an octal number. Otherwise it will be parsed as a decimal.
unsignedint*res
Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.Preferred oversimple_strtoul()
. Return code must be checked.
- intkstrtoint(constchar*s,unsignedintbase,int*res)¶
convert a string to an int
Parameters
constchar*s
The start of the string. The string must be null-terminated, and may alsoinclude a single newline before its terminating null. The first charactermay also be a plus sign or a minus sign.
unsignedintbase
The number base to use. The maximum supported base is 16. If base isgiven as 0, then the base of the string is automatically detected with theconventional semantics - If it begins with 0x the number will be parsed as ahexadecimal (case insensitive), if it otherwise begins with 0, it will beparsed as an octal number. Otherwise it will be parsed as a decimal.
int*res
Where to write the result of the conversion on success.
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.Preferred oversimple_strtol()
. Return code must be checked.
- intkstrtobool(constchar*s,bool*res)¶
convert common user inputs into boolean values
Parameters
constchar*s
input string
bool*res
result
Description
This routine returns 0 iff the first character is one of 'YyTt1NnFf0', or [oO][NnFf] for "on" and "off". Otherwise it will return -EINVAL. Value pointed to by res is updated upon finding a match.
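Illustrative sketch (set_enabled is hypothetical):

    static int set_enabled(const char *buf)
    {
        bool enable;
        int ret;

        ret = kstrtobool(buf, &enable);  /* accepts "y", "1", "on", "n", "0", "off", ... */
        if (ret)
            return ret;                  /* -EINVAL: unrecognized input, enable untouched */

        /* ... act on enable ... */
        return 0;
    }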
- intstring_get_size(u64size,u64blk_size,constenumstring_size_unitsunits,char*buf,intlen)¶
get the size in the specified units
Parameters
u64size
The size to be converted in blocks
u64blk_size
Size of the block (use 1 for size in bytes)
constenumstring_size_unitsunits
Units to use (powers of 1000 or 1024), whether to include space separator
char*buf
buffer to format to
intlen
length of buffer
Description
This function returns a string formatted to 3 significant figures giving the size in the required units. buf should have room for at least 9 bytes and will always be zero terminated.
Return value: number of characters of output that would have been written (which may be greater than len, if output was truncated).
- intparse_int_array_user(constchar__user*from,size_tcount,int**array)¶
Split string into a sequence of integers
Parameters
constchar__user*from
The user space buffer to read from
size_tcount
The maximum number of bytes to read
int**array
Returned pointer to sequence of integers
Description
On success array is allocated and initialized with a sequence of integers extracted from the from plus an additional element that begins the sequence and specifies the integers count.
Caller takes responsibility for freeing array when it is no longer needed.
- intstring_unescape(char*src,char*dst,size_tsize,unsignedintflags)¶
unquote characters in the given string
Parameters
char*src
source buffer (escaped)
char*dst
destination buffer (unescaped)
size_tsize
size of the destination buffer (0 to unlimit)
unsignedintflags
combination of the flags.
Description
The function unquotes characters in the given string.
Because the size of the output will be the same as or less than the size of the input, the transformation may be performed in place.
Caller must provide valid source and destination pointers. Be aware that the destination buffer will always be NULL-terminated. Source string must be NULL-terminated as well. The supported flags are:
UNESCAPE_SPACE:
    '\f' - form feed
    '\n' - new line
    '\r' - carriage return
    '\t' - horizontal tab
    '\v' - vertical tab
UNESCAPE_OCTAL:
    '\NNN' - byte with octal value NNN (1 to 3 digits)
UNESCAPE_HEX:
    '\xHH' - byte with hexadecimal value HH (1 to 2 digits)
UNESCAPE_SPECIAL:
    '\"' - double quote
    '\\' - backslash
    '\a' - alert (BEL)
    '\e' - escape
UNESCAPE_ANY:
    all previous together
Return
The amount of the characters processed to the destination buffer excluding trailing '\0' is returned.
- intstring_escape_mem(constchar*src,size_tisz,char*dst,size_tosz,unsignedintflags,constchar*only)¶
quote characters in the given memory buffer
Parameters
constchar*src
source buffer (unescaped)
size_tisz
source buffer size
char*dst
destination buffer (escaped)
size_tosz
destination buffer size
unsignedintflags
combination of the flags
constchar*only
NULL-terminated string containing characters used to limit the selected escape class. If characters are included in only that would not normally be escaped by the classes selected in flags, they will be copied to dst unescaped.
Description
The process of escaping byte buffer includes several parts. They are applied in the following sequence.
1. The character is not matched to the one from only string and thus must go as-is to the output.
2. The character is matched to the printable and ASCII classes, if asked, and in case of match it passes through to the output.
3. The character is matched to the printable or ASCII class, if asked, and in case of match it passes through to the output.
4. The character is checked if it falls into the class given by flags. ESCAPE_OCTAL and ESCAPE_HEX are going last since they cover any character. Note that they actually can't go together, otherwise ESCAPE_HEX will be ignored.
Caller must provide valid source and destination pointers. Be aware that the destination buffer will not be NULL-terminated, thus the caller has to append it if needed. The supported flags are:
%ESCAPE_SPACE: (special white space, not space itself)
    '\f' - form feed
    '\n' - new line
    '\r' - carriage return
    '\t' - horizontal tab
    '\v' - vertical tab
%ESCAPE_SPECIAL:
    '\"' - double quote
    '\\' - backslash
    '\a' - alert (BEL)
    '\e' - escape
%ESCAPE_NULL:
    '\0' - null
%ESCAPE_OCTAL:
    '\NNN' - byte with octal value NNN (3 digits)
%ESCAPE_ANY:
    all previous together
%ESCAPE_NP:
    escape only non-printable characters, checked by isprint()
%ESCAPE_ANY_NP:
    all previous together
%ESCAPE_HEX:
    '\xHH' - byte with hexadecimal value HH (2 digits)
%ESCAPE_NA:
    escape only non-ascii characters, checked by isascii()
%ESCAPE_NAP:
    escape only non-printable or non-ascii characters
%ESCAPE_APPEND:
    append characters from @only to be escaped by the given classes
ESCAPE_APPEND would help to pass additional characters to the escaped, when one of ESCAPE_NP, ESCAPE_NA, or ESCAPE_NAP is provided.
One notable caveat, the ESCAPE_NAP, ESCAPE_NP and ESCAPE_NA have the higher priority than the rest of the flags (ESCAPE_NAP is the highest). It doesn't make much sense to use either of them without ESCAPE_OCTAL or ESCAPE_HEX, because they cover most of the other character classes. ESCAPE_NAP can utilize ESCAPE_SPACE or ESCAPE_SPECIAL in addition to the above.
Return
The total size of the escaped output that would be generated for the given input and flags. To check whether the output was truncated, compare the return value to osz. There is room left in dst for a '\0' terminator if and only if ret < osz.
- char**kasprintf_strarray(gfp_tgfp,constchar*prefix,size_tn)¶
allocate and fill array of sequential strings
Parameters
gfp_tgfp
flags for the slab allocator
constchar*prefix
prefix to be used
size_tn
amount of lines to be allocated and filled
Description
Allocates and fills n strings using pattern "%s-%zu", where prefix is provided by caller. The caller is responsible to free them with kfree_strarray() after use.
Returns array of strings or NULL when memory can’t be allocated.
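Illustrative sketch (make_led_names is hypothetical): allocating names "led-0" .. "led-3" and releasing them:

    static int make_led_names(void)
    {
        char **names = kasprintf_strarray(GFP_KERNEL, "led", 4);

        if (!names)
            return -ENOMEM;

        /* ... use names[0] .. names[3] ... */

        kfree_strarray(names, 4);
        return 0;
    }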
- voidkfree_strarray(char**array,size_tn)¶
free a number of dynamically allocated strings contained in an array and the array itself
Parameters
char**array
Dynamically allocated array of strings to free.
size_tn
Number of strings (starting from the beginning of the array) to free.
Description
Passing a non-NULL array and n == 0 as well as NULL array are valid use-cases. If array is NULL, the function does nothing.
- char*skip_spaces(constchar*str)¶
Removes leading whitespace fromstr.
Parameters
constchar*str
The string to be stripped.
Description
Returns a pointer to the first non-whitespace character instr.
- char*strim(char*s)¶
Removes leading and trailing whitespace froms.
Parameters
char*s
The string to be stripped.
Description
Note that the first trailing whitespace is replaced with a NUL-terminator in the given string s. Returns a pointer to the first non-whitespace character in s.
- boolsysfs_streq(constchar*s1,constchar*s2)¶
return true if strings are equal, modulo trailing newline
Parameters
constchar*s1
one string
constchar*s2
another string
Description
This routine returns true iff two strings are equal, treating both NUL and newline-then-NUL as equivalent string terminations. It's geared for use with sysfs input strings, which generally terminate with newlines but are compared against values without newlines.
- intmatch_string(constchar*const*array,size_tn,constchar*string)¶
matches given string in an array
Parameters
constchar*const*array
array of strings
size_tn
number of strings in the array or -1 for NULL terminated arrays
constchar*string
string to match with
Description
This routine will look for a string in an array of strings up to the n-th element in the array or until the first NULL element.
Historically the value of -1 for n, was used to search in arrays that are NULL terminated. However, the function does not make a distinction when finishing the search: either n elements have been compared OR the first NULL element was found.
Return
index of a string in the array if matches, or -EINVAL otherwise.
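Illustrative sketch (mode_names and parse_mode are hypothetical):

    static const char * const mode_names[] = { "off", "slow", "fast" };

    static int parse_mode(const char *s)
    {
        /* returns 0, 1 or 2 on a match, -EINVAL otherwise */
        return match_string(mode_names, ARRAY_SIZE(mode_names), s);
    }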
- int__sysfs_match_string(constchar*const*array,size_tn,constchar*str)¶
matches given string in an array
Parameters
constchar*const*array
array of strings
size_tn
number of strings in the array or -1 for NULL terminated arrays
constchar*str
string to match with
Description
Returns index of str in the array or -EINVAL, just like match_string(). Uses sysfs_streq instead of strcmp for matching.
This routine will look for a string in an array of strings up to the n-th element in the array or until the first NULL element.
Historically the value of -1 for n, was used to search in arrays that are NULL terminated. However, the function does not make a distinction when finishing the search: either n elements have been compared OR the first NULL element was found.
- char*strreplace(char*str,charold,charnew)¶
Replace all occurrences of character in string.
Parameters
char*str
The string to operate on.
charold
The character being replaced.
charnew
The characterold is replaced with.
Description
Replaces each old character with a new one in the given string str.
Return
pointer to the stringstr itself.
- voidmemcpy_and_pad(void*dest,size_tdest_len,constvoid*src,size_tcount,intpad)¶
Copy one buffer to another with padding
Parameters
void*dest
Where to copy to
size_tdest_len
The destination buffer size
constvoid*src
Where to copy from
size_tcount
The number of bytes to copy
intpad
Character to use for padding if space is left in destination.
String Manipulation¶
- unsafe_memcpy¶
unsafe_memcpy(dst,src,bytes,justification)
memcpy implementation with no FORTIFY bounds checking
Parameters
dst
Destination memory address to write to
src
Source memory address to read from
bytes
How many bytes to write todst fromsrc
justification
Free-form text or comment describing why the use is needed
Description
This should be used for corner cases where the compiler cannot do theright thing, or during transitions between APIs, etc. It should be usedvery rarely, and includes a place for justification detailing where boundschecking has happened, and why existing solutions cannot be employed.
- char*strncpy(char*constp,constchar*q,__kernel_size_tsize)¶
Copy a string to memory with non-guaranteed NUL padding
Parameters
char*constp
pointer to destination of copy
constchar*q
pointer to NUL-terminated source string to copy
__kernel_size_tsize
bytes to write atp
Description
If strlen(q) >= size, the copy of q will stop after size bytes, and p will NOT be NUL-terminated.
If strlen(q) < size, following the copy of q, trailing NUL bytes will be written to p until size total bytes have been written.
Do not use this function. While FORTIFY_SOURCE tries to avoid over-reads of q, it cannot defend against writing unterminated results to p. Using strncpy() remains ambiguous and fragile. Instead, please choose an alternative, so that the expectation of p's contents is unambiguous:
| p needs to be:     | padded to size | not padded |
|--------------------|----------------|------------|
| NUL-terminated     | strscpy_pad()  | strscpy()  |
| not NUL-terminated | strtomem_pad() | strtomem() |
Note strscpy*()'s differing return values for detecting truncation, and strtomem*()'s expectation that the destination is marked with __nonstring when it is a character array.
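Illustrative sketch (struct my_dev and set_name are hypothetical): replacing a strncpy() call when the destination must be a NUL-terminated string and no padding is needed:

    struct my_dev {
        char name[32];
    };

    static int set_name(struct my_dev *d, const char *name)
    {
        if (strscpy(d->name, name, sizeof(d->name)) < 0)
            return -E2BIG;      /* source was truncated */
        return 0;
    }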
- __kernel_size_tstrnlen(constchar*constp,__kernel_size_tmaxlen)¶
Return bounded count of characters in a NUL-terminated string
Parameters
constchar*constp
pointer to NUL-terminated string to count.
__kernel_size_tmaxlen
maximum number of characters to count.
Description
Returns number of characters inp (NOT including the final NUL), ormaxlen, if no NUL has been found up to there.
- strlen¶
strlen(p)
Return count of characters in a NUL-terminated string
Parameters
p
pointer to NUL-terminated string to count.
Description
Do not use this function unless the string length is known at compile-time. When p is unterminated, this function may crash or return unexpected counts that could lead to memory content exposures. Prefer strnlen().
Returns number of characters in p (NOT including the final NUL).
- size_tstrlcat(char*constp,constchar*constq,size_tavail)¶
Append a string to an existing string
Parameters
char *const p
pointer to NUL-terminated string to append to
const char *const q
pointer to NUL-terminated string to append from
size_t avail
Maximum bytes available in p
Description
Appends NUL-terminated string q after the NUL-terminated string at p, but will not write beyond avail bytes total, potentially truncating the copy from q. p will stay NUL-terminated only if a NUL already existed within the avail bytes of p. If so, the resulting number of bytes copied from q will be at most "avail - strlen(p) - 1".
Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the sizes of p and q are known to the compiler. Prefer building the string with formatting, via scnprintf(), seq_buf, or similar.
Returns total bytes that _would_ have been contained by p regardless of truncation, similar to snprintf(). If return value is >= avail, the string has been truncated.
- char*strcat(char*constp,constchar*q)¶
Append a string to an existing string
Parameters
char*constp
pointer to NUL-terminated string to append to
constchar*q
pointer to NUL-terminated source string to append from
Description
Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the destination buffer size is known to the compiler. Prefer building the string with formatting, via scnprintf() or similar. At the very least, use strncat().
Returns p.
- char*strncat(char*constp,constchar*constq,__kernel_size_tcount)¶
Append a string to an existing string
Parameters
char*constp
pointer to NUL-terminated string to append to
constchar*constq
pointer to source string to append from
__kernel_size_tcount
Maximum bytes to read fromq
Description
Appends at most count bytes from q (stopping at the first NUL byte) after the NUL-terminated string at p. p will be NUL-terminated.
Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the sizes of p and q are known to the compiler. Prefer building the string with formatting, via scnprintf() or similar.
Returns p.
- char*strcpy(char*constp,constchar*constq)¶
Copy a string into another string buffer
Parameters
char*constp
pointer to destination of copy
constchar*constq
pointer to NUL-terminated source string to copy
Description
Do not use this function. While FORTIFY_SOURCE tries to avoid overflows, this is only possible when the sizes of q and p are known to the compiler. Prefer strscpy(), though note its different return values for detecting truncation.
Returns p.
- intstrncasecmp(constchar*s1,constchar*s2,size_tlen)¶
Case insensitive, length-limited string comparison
Parameters
constchar*s1
One string
constchar*s2
The other string
size_tlen
the maximum number of characters to compare
- char*stpcpy(char*__restrict__dest,constchar*__restrict__src)¶
copy a string from src to dest returning a pointer to the new end of dest, including src's NUL-terminator. May overrun dest.
Parameters
char*__restrict__dest
pointer to end of string being copied into. Must be large enoughto receive copy.
constchar*__restrict__src
pointer to the beginning of string being copied from. Must not overlapdest.
Description
stpcpy differs from strcpy in a key way: the return value is a pointer to the new NUL-terminating character in dest. (For strcpy, the return value is a pointer to the start of dest). This interface is considered unsafe as it doesn't perform bounds checking of the inputs. As such it's not recommended for usage. Instead, its definition is provided in case the compiler lowers other libcalls to stpcpy.
- intstrcmp(constchar*cs,constchar*ct)¶
Compare two strings
Parameters
constchar*cs
One string
constchar*ct
Another string
- intstrncmp(constchar*cs,constchar*ct,size_tcount)¶
Compare two length-limited strings
Parameters
constchar*cs
One string
constchar*ct
Another string
size_tcount
The maximum number of bytes to compare
- char*strchr(constchar*s,intc)¶
Find the first occurrence of a character in a string
Parameters
constchar*s
The string to be searched
intc
The character to search for
Description
Note that the NUL-terminator is considered part of the string, and can be searched for.
- char*strchrnul(constchar*s,intc)¶
Find and return a character in a string, or end of string
Parameters
constchar*s
The string to be searched
intc
The character to search for
Description
Returns pointer to first occurrence of ‘c’ in s. If c is not found, thenreturn a pointer to the null byte at the end of s.
- char*strrchr(constchar*s,intc)¶
Find the last occurrence of a character in a string
Parameters
constchar*s
The string to be searched
intc
The character to search for
- char*strnchr(constchar*s,size_tcount,intc)¶
Find a character in a length limited string
Parameters
constchar*s
The string to be searched
size_tcount
The number of characters to be searched
intc
The character to search for
Description
Note that the NUL-terminator is considered part of the string, and can be searched for.
- size_tstrspn(constchar*s,constchar*accept)¶
Calculate the length of the initial substring ofs which only contain letters inaccept
Parameters
constchar*s
The string to be searched
constchar*accept
The string to search for
- size_tstrcspn(constchar*s,constchar*reject)¶
Calculate the length of the initial substring ofs which does not contain letters inreject
Parameters
constchar*s
The string to be searched
constchar*reject
The string to avoid
- char*strpbrk(constchar*cs,constchar*ct)¶
Find the first occurrence of a set of characters
Parameters
constchar*cs
The string to be searched
constchar*ct
The characters to search for
- char*strsep(char**s,constchar*ct)¶
Split a string into tokens
Parameters
char**s
The string to be searched
constchar*ct
The characters to search for
Description
strsep() updates s to point after the token, ready for the next call.
It returns empty tokens, too, behaving exactly like the libc function of that name. In fact, it was stolen from glibc2 and de-fancy-fied. Same semantics, slimmer shape. ;)
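Illustrative sketch (parse_csv is hypothetical): splitting a writable, comma-separated string in place:

    static void parse_csv(char *s)
    {
        char *tok;

        while ((tok = strsep(&s, ",")) != NULL) {
            if (!*tok)
                continue;       /* empty token between adjacent commas */
            /* ... handle tok ... */
        }
    }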
- void*memset(void*s,intc,size_tcount)¶
Fill a region of memory with the given value
Parameters
void*s
Pointer to the start of the area.
intc
The byte to fill the area with
size_tcount
The size of the area.
Description
Do not use memset() to access IO space, use memset_io() instead.
- void*memset16(uint16_t*s,uint16_tv,size_tcount)¶
Fill a memory area with a uint16_t
Parameters
uint16_t*s
Pointer to the start of the area.
uint16_tv
The value to fill the area with
size_tcount
The number of values to store
Description
Differs from memset() in that it fills with a uint16_t instead of a byte. Remember that count is the number of uint16_ts to store, not the number of bytes.
- void*memset32(uint32_t*s,uint32_tv,size_tcount)¶
Fill a memory area with a uint32_t
Parameters
uint32_t*s
Pointer to the start of the area.
uint32_tv
The value to fill the area with
size_tcount
The number of values to store
Description
Differs from memset() in that it fills with a uint32_t instead of a byte. Remember that count is the number of uint32_ts to store, not the number of bytes.
- void*memset64(uint64_t*s,uint64_tv,size_tcount)¶
Fill a memory area with a uint64_t
Parameters
uint64_t*s
Pointer to the start of the area.
uint64_tv
The value to fill the area with
size_tcount
The number of values to store
Description
Differs from memset() in that it fills with a uint64_t instead of a byte. Remember that count is the number of uint64_ts to store, not the number of bytes.
- void*memcpy(void*dest,constvoid*src,size_tcount)¶
Copy one area of memory to another
Parameters
void*dest
Where to copy to
constvoid*src
Where to copy from
size_tcount
The size of the area.
Description
You should not use this function to access IO space, use memcpy_toio()or memcpy_fromio() instead.
- void*memmove(void*dest,constvoid*src,size_tcount)¶
Copy one area of memory to another
Parameters
void*dest
Where to copy to
constvoid*src
Where to copy from
size_tcount
The size of the area.
Description
Unlike memcpy(), memmove() copes with overlapping areas.
- __visibleintmemcmp(constvoid*cs,constvoid*ct,size_tcount)¶
Compare two areas of memory
Parameters
constvoid*cs
One area of memory
constvoid*ct
Another area of memory
size_tcount
The size of the area.
- intbcmp(constvoid*a,constvoid*b,size_tlen)¶
returns 0 if and only if the buffers have identical contents.
Parameters
constvoid*a
pointer to first buffer.
constvoid*b
pointer to second buffer.
size_tlen
size of buffers.
Description
The sign or magnitude of a non-zero return value has no particular meaning, and architectures may implement their own more efficient bcmp(). So while this particular implementation is a simple (tail) call to memcmp, do not rely on anything but whether the return value is zero or non-zero.
- void*memscan(void*addr,intc,size_tsize)¶
Find a character in an area of memory.
Parameters
void*addr
The memory area
intc
The byte to search for
size_tsize
The size of the area.
Description
returns the address of the first occurrence ofc, or 1 byte pastthe area ifc is not found
- char*strstr(constchar*s1,constchar*s2)¶
Find the first substring in a NUL terminated string
Parameters
constchar*s1
The string to be searched
constchar*s2
The string to search for
- char*strnstr(constchar*s1,constchar*s2,size_tlen)¶
Find the first substring in a length-limited string
Parameters
constchar*s1
The string to be searched
constchar*s2
The string to search for
size_tlen
the maximum number of characters to search
- void*memchr(constvoid*s,intc,size_tn)¶
Find a character in an area of memory.
Parameters
constvoid*s
The memory area
intc
The byte to search for
size_tn
The size of the area.
Description
returns the address of the first occurrence of c, or NULL if c is not found
- void*memchr_inv(constvoid*start,intc,size_tbytes)¶
Find an unmatching character in an area of memory.
Parameters
constvoid*start
The memory area
intc
Find a character other than c
size_tbytes
The size of the area.
Description
returns the address of the first character other than c, or NULL if the whole buffer contains just c.
- void*memdup_array_user(constvoid__user*src,size_tn,size_tsize)¶
duplicate array from user space
Parameters
constvoid__user*src
source address in user space
size_tn
number of array members to copy
size_tsize
size of one array member
Return
an ERR_PTR() on failure. Result is physically contiguous, to be freed by kfree().
- void*vmemdup_array_user(constvoid__user*src,size_tn,size_tsize)¶
duplicate array from user space
Parameters
constvoid__user*src
source address in user space
size_tn
number of array members to copy
size_tsize
size of one array member
Return
an ERR_PTR() on failure. Result may be not physically contiguous. Use kvfree() to free.
- strscpy¶
strscpy(dst,src,...)
Copy a C-string into a sized buffer
Parameters
dst
Where to copy the string to
src
Where to copy the string from
...
Size of destination buffer (optional)
Description
Copy the source string src, or as much of it as fits, into the destination dst buffer. The behavior is undefined if the string buffers overlap. The destination dst buffer is always NUL terminated, unless it's zero-sized.
The size argument ... is only required when dst is not an array, or when the copy needs to be smaller than sizeof(dst).
Preferred to strncpy() since it always returns a valid string, and doesn't unnecessarily force the tail of the destination buffer to be zero padded. If padding is desired please use strscpy_pad().
Returns the number of characters copied in dst (not including the trailing NUL) or -E2BIG if size is 0 or the copy from src was truncated.
- strscpy_pad¶
strscpy_pad(dst,src,...)
Copy a C-string into a sized buffer
Parameters
dst
Where to copy the string to
src
Where to copy the string from
...
Size of destination buffer
Description
Copy the string, or as much of it as fits, into the dest buffer. The behavior is undefined if the string buffers overlap. The destination buffer is always NUL terminated, unless it's zero-sized.
If the source string is shorter than the destination buffer, the remaining bytes in the buffer will be filled with NUL bytes.
For a full explanation of why you may want to consider using the 'strscpy' functions please see the function docstring for strscpy().
Return
The number of characters copied (not including the trailing NULs) or -E2BIG if count is 0 or src was truncated.
- boolmem_is_zero(constvoid*s,size_tn)¶
Check if an area of memory is all 0’s.
Parameters
constvoid*s
The memory area
size_tn
The size of the area
Return
True if the area of memory is all 0’s.
- sysfs_match_string¶
sysfs_match_string(_a,_s)
matches given string in an array
Parameters
_a
array of strings
_s
string to match with
Description
Helper for __sysfs_match_string(). Calculates the size of _a automatically.
- boolstrstarts(constchar*str,constchar*prefix)¶
does str start with prefix?
Parameters
constchar*str
string to examine
constchar*prefix
prefix to look for.
- voidmemzero_explicit(void*s,size_tcount)¶
Fill a region of memory (e.g. sensitive keying data) with 0s.
Parameters
void*s
Pointer to the start of the area.
size_tcount
The size of the area.
Note
usually using memset() is just fine (!), but in cases where clearing out _local_ data at the end of a scope is necessary, memzero_explicit() should be used instead in order to prevent the compiler from optimising away zeroing.
Description
memzero_explicit() doesn't need an arch-specific version as it just invokes the one of memset() implicitly.
- constchar*kbasename(constchar*path)¶
return the last part of a pathname.
Parameters
constchar*path
path to extract the filename from.
- strtomem_pad¶
strtomem_pad(dest,src,pad)
Copy NUL-terminated string to non-NUL-terminated buffer
Parameters
dest
Pointer of destination character array (marked as __nonstring)
src
Pointer to NUL-terminated string
pad
Padding character to fill any remaining bytes of dest after copy
Description
This is a replacement for strncpy() uses where the destination is not a NUL-terminated string, but with bounds checking on the source size, and an explicit padding character. If padding is not required, use strtomem().
Note that the size of dest is not an argument, as the length of dest must be discoverable by the compiler.
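An illustrative sketch (the structure and field names are hypothetical); the destination is a fixed-width, non-NUL-terminated field:

struct frame {
        char tag[8] __nonstring;
};

struct frame f;

strtomem_pad(f.tag, "OK", ' ');   /* copies "OK", pads the remaining bytes with spaces */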
- strtomem¶
strtomem(dest,src)
Copy NUL-terminated string to non-NUL-terminated buffer
Parameters
dest
Pointer of destination character array (marked as __nonstring)
src
Pointer to NUL-terminated string
Description
This is a replacement for strncpy() uses where the destination is not a NUL-terminated string, but with bounds checking on the source size, and without trailing padding. If padding is required, use strtomem_pad().
Note that the size of dest is not an argument, as the length of dest must be discoverable by the compiler.
- memtostr¶
memtostr(dest,src)
Copy a possibly non-NUL-term string to a NUL-term string
Parameters
dest
Pointer to destination NUL-terminated string
src
Pointer to character array (likely marked as __nonstring)
Description
This is a replacement for strncpy() uses where the source is not a NUL-terminated string.
Note that the sizes of dest and src must be known at compile-time.
- memtostr_pad¶
memtostr_pad(dest,src)
Copy a possibly non-NUL-term string to a NUL-term string with NUL padding in the destination
Parameters
dest
Pointer to destination NUL-terminated string
src
Pointer to character array (likely marked as __nonstring)
Description
This is a replacement for strncpy() uses where the source is not a NUL-terminated string.
Note that the sizes of dest and src must be known at compile-time.
- memset_after¶
memset_after(obj,v,member)
Set a value after a struct member to the end of a struct
Parameters
obj
Address of target struct instance
v
Byte value to repeatedly write
member
after which struct member to start writing bytes
Description
This is good for clearing padding following the given member.
- memset_startat¶
memset_startat(obj,v,member)
Set a value starting at a member to the end of a struct
Parameters
obj
Address of target struct instance
v
Byte value to repeatedly write
member
struct member to start writing at
Description
Note that if there is padding between the prior member and the target member, memset_after() should be used to clear the prior padding.
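A sketch showing both helpers (the structure and field names are hypothetical), e.g. to clear everything that follows a given member before handing the struct to user space:

struct reply {
        u32 id;
        u8  flags;
        u64 payload;
};

struct reply r;

r.id = 42;
memset_after(&r, 0, id);          /* zero flags, padding and payload */
memset_startat(&r, 0, payload);   /* or: zero from payload to the end */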
- size_tstr_has_prefix(constchar*str,constchar*prefix)¶
Test if a string has a given prefix
Parameters
constchar*str
The string to test
constchar*prefix
The string to see if str starts with
Description
- A common way to test a prefix of a string is to do:
strncmp(str, prefix, sizeof(prefix) - 1)
But this can lead to bugs due to typos, or if prefix is a pointer and not a constant. Instead use str_has_prefix().
Return
strlen(prefix) if str starts with prefix
0 if str does not start with prefix
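A minimal sketch of the intended use (arg and handle_mode() are hypothetical): the returned length lets you skip the prefix in one step.

size_t len;

len = str_has_prefix(arg, "mode=");
if (len)
        handle_mode(arg + len);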
- char*kstrdup(constchar*s,gfp_tgfp)¶
allocate space for and copy an existing string
Parameters
constchar*s
the string to duplicate
gfp_tgfp
the GFP mask used in the kmalloc() call when allocating memory
Return
newly allocated copy of s or NULL in case of error
- constchar*kstrdup_const(constchar*s,gfp_tgfp)¶
conditionally duplicate an existing const string
Parameters
constchar*s
the string to duplicate
gfp_tgfp
the GFP mask used in the kmalloc() call when allocating memory
Note
Strings allocated by kstrdup_const should be freed by kfree_const and must not be passed to krealloc().
Return
source string if it is in the .rodata section, otherwise fallback to kstrdup.
- char*kstrndup(constchar*s,size_tmax,gfp_tgfp)¶
allocate space for and copy an existing string
Parameters
constchar*s
the string to duplicate
size_tmax
read at most max chars from s
gfp_tgfp
the GFP mask used in the kmalloc() call when allocating memory
Note
Use kmemdup_nul() instead if the size is known exactly.
Return
newly allocated copy of s or NULL in case of error
- void*kmemdup(constvoid*src,size_tlen,gfp_tgfp)¶
duplicate region of memory
Parameters
constvoid*src
memory region to duplicate
size_tlen
memory region length
gfp_tgfp
GFP mask to use
Return
newly allocated copy of src or NULL in case of error. Result is physically contiguous; use kfree() to free.
- char*kmemdup_nul(constchar*s,size_tlen,gfp_tgfp)¶
Create a NUL-terminated string from unterminated data
Parameters
constchar*s
The data to stringify
size_tlen
The size of the data
gfp_tgfp
the GFP mask used in the kmalloc() call when allocating memory
Return
newly allocated copy of s with NUL-termination or NULL in case of error
- void*memdup_user(constvoid__user*src,size_tlen)¶
duplicate memory region from user space
Parameters
constvoid__user*src
source address in user space
size_tlen
number of bytes to copy
Return
an ERR_PTR() on failure. Result is physically contiguous, to be freed by kfree().
- void*vmemdup_user(constvoid__user*src,size_tlen)¶
duplicate memory region from user space
Parameters
constvoid__user*src
source address in user space
size_tlen
number of bytes to copy
Return
an ERR_PTR() on failure. Result may not be physically contiguous. Use kvfree() to free.
- char*strndup_user(constchar__user*s,longn)¶
duplicate an existing string from user space
Parameters
constchar__user*s
The string to duplicate
longn
Maximum number of bytes to copy, including the trailing NUL.
Return
newly allocated copy of s or an ERR_PTR() in case of error
- void*memdup_user_nul(constvoid__user*src,size_tlen)¶
duplicate memory region from user space and NUL-terminate
Parameters
constvoid__user*src
source address in user space
size_tlen
number of bytes to copy
Return
an ERR_PTR() on failure.
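These user-space duplication helpers all follow the same error-pointer convention. A minimal caller sketch using memdup_user() (user_ptr and len stand for a hypothetical ioctl argument):

void *buf;

buf = memdup_user(user_ptr, len);
if (IS_ERR(buf))
        return PTR_ERR(buf);

/* ... use buf ... */
kfree(buf);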
Basic Kernel Library Functions¶
The Linux kernel provides more basic utility functions.
Bit Operations¶
- voidset_bit(longnr,volatileunsignedlong*addr)¶
Atomically set a bit in memory
Parameters
longnr
the bit to set
volatileunsignedlong*addr
the address to start counting from
Description
This is a relaxed atomic operation (no implied memory barriers).
Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.
- voidclear_bit(longnr,volatileunsignedlong*addr)¶
Clears a bit in memory
Parameters
longnr
Bit to clear
volatileunsignedlong*addr
Address to start counting from
Description
This is a relaxed atomic operation (no implied memory barriers).
- voidchange_bit(longnr,volatileunsignedlong*addr)¶
Toggle a bit in memory
Parameters
longnr
Bit to change
volatileunsignedlong*addr
Address to start counting from
Description
This is a relaxed atomic operation (no implied memory barriers).
Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.
- booltest_and_set_bit(longnr,volatileunsignedlong*addr)¶
Set a bit and return its old value
Parameters
longnr
Bit to set
volatileunsignedlong*addr
Address to count from
Description
This is an atomic fully-ordered operation (implied full memory barrier).
- booltest_and_clear_bit(longnr,volatileunsignedlong*addr)¶
Clear a bit and return its old value
Parameters
longnr
Bit to clear
volatileunsignedlong*addr
Address to count from
Description
This is an atomic fully-ordered operation (implied full memory barrier).
- booltest_and_change_bit(longnr,volatileunsignedlong*addr)¶
Change a bit and return its old value
Parameters
longnr
Bit to change
volatileunsignedlong*addr
Address to count from
Description
This is an atomic fully-ordered operation (implied full memory barrier).
- void___set_bit(unsignedlongnr,volatileunsignedlong*addr)¶
Set a bit in memory
Parameters
unsignedlongnr
the bit to set
volatileunsignedlong*addr
the address to start counting from
Description
Unlike set_bit(), this function is non-atomic. If it is called on the same region of memory concurrently, the effect may be that only one operation succeeds.
- void___clear_bit(unsignedlongnr,volatileunsignedlong*addr)¶
Clears a bit in memory
Parameters
unsignedlongnr
the bit to clear
volatileunsignedlong*addr
the address to start counting from
Description
Unlike clear_bit(), this function is non-atomic. If it is called on the same region of memory concurrently, the effect may be that only one operation succeeds.
- void___change_bit(unsignedlongnr,volatileunsignedlong*addr)¶
Toggle a bit in memory
Parameters
unsignedlongnr
the bit to change
volatileunsignedlong*addr
the address to start counting from
Description
Unlike change_bit(), this function is non-atomic. If it is called on the same region of memory concurrently, the effect may be that only one operation succeeds.
- bool___test_and_set_bit(unsignedlongnr,volatileunsignedlong*addr)¶
Set a bit and return its old value
Parameters
unsignedlongnr
Bit to set
volatileunsignedlong*addr
Address to count from
Description
This operation is non-atomic. If two instances of this operation race, onecan appear to succeed but actually fail.
- bool___test_and_clear_bit(unsignedlongnr,volatileunsignedlong*addr)¶
Clear a bit and return its old value
Parameters
unsignedlongnr
Bit to clear
volatileunsignedlong*addr
Address to count from
Description
This operation is non-atomic. If two instances of this operation race, onecan appear to succeed but actually fail.
- bool___test_and_change_bit(unsignedlongnr,volatileunsignedlong*addr)¶
Change a bit and return its old value
Parameters
unsignedlongnr
Bit to change
volatileunsignedlong*addr
Address to count from
Description
This operation is non-atomic. If two instances of this operation race, onecan appear to succeed but actually fail.
- bool_test_bit(unsignedlongnr,volatileconstunsignedlong*addr)¶
Determine whether a bit is set
Parameters
unsignedlongnr
bit number to test
constvolatileunsignedlong*addr
Address to start counting from
- bool_test_bit_acquire(unsignedlongnr,volatileconstunsignedlong*addr)¶
Determine, with acquire semantics, whether a bit is set
Parameters
unsignedlongnr
bit number to test
constvolatileunsignedlong*addr
Address to start counting from
- voidclear_bit_unlock(longnr,volatileunsignedlong*addr)¶
Clear a bit in memory, for unlock
Parameters
longnr
the bit to set
volatileunsignedlong*addr
the address to start counting from
Description
This operation is atomic and provides release barrier semantics.
- void__clear_bit_unlock(longnr,volatileunsignedlong*addr)¶
Clears a bit in memory
Parameters
longnr
Bit to clear
volatileunsignedlong*addr
Address to start counting from
Description
This is a non-atomic operation but implies a release barrier before thememory operation. It can be used for an unlock if no other CPUs canconcurrently modify other bits in the word.
- booltest_and_set_bit_lock(longnr,volatileunsignedlong*addr)¶
Set a bit and return its old value, for lock
Parameters
longnr
Bit to set
volatileunsignedlong*addr
Address to count from
Description
This operation is atomic and provides acquire barrier semantics if the returned value is 0. It can be used to implement bit locks.
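A sketch of a simple bit lock built from this primitive and clear_bit_unlock() (the bit number and flag word are hypothetical):

#define OBJ_LOCK_BIT 0
unsigned long obj_flags;

while (test_and_set_bit_lock(OBJ_LOCK_BIT, &obj_flags))
        cpu_relax();                        /* spin until the holder unlocks */

/* ... critical section ... */

clear_bit_unlock(OBJ_LOCK_BIT, &obj_flags);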
- boolxor_unlock_is_negative_byte(unsignedlongmask,volatileunsignedlong*addr)¶
XOR a single byte in memory and test if it is negative, for unlock.
Parameters
unsignedlongmask
Change the bits which are set in this mask.
volatileunsignedlong*addr
The address of the word containing the byte to change.
Description
Changes some of bits 0-6 in the word pointed to by addr. This operation is atomic and provides release barrier semantics. Used to optimise some folio operations which are commonly paired with an unlock or end of writeback. Bit 7 is used as PG_waiters to indicate whether anybody is waiting for the unlock.
Return
Whether the top bit of the byte is set.
Bitmap Operations¶
bitmaps provide an array of bits, implemented using an array of unsigned longs. The number of valid bits in a given bitmap does _not_ need to be an exact multiple of BITS_PER_LONG.
The possible unused bits in the last, partially used word of a bitmap are 'don't care'. The implementation makes no particular effort to keep them zero. It ensures that their value will not affect the results of any operation. The bitmap operations that return Boolean (bitmap_empty, for example) or scalar (bitmap_weight, for example) results carefully filter out these unused bits from impacting their results.
The byte ordering of bitmaps is more natural on little-endian architectures. See the big-endian headers include/asm-ppc64/bitops.h and include/asm-s390/bitops.h for the best explanations of this ordering.
The DECLARE_BITMAP(name,bits) macro, in linux/types.h, can be used to declare an array named 'name' of just enough unsigned longs to contain all bit positions from 0 to 'bits' - 1.
The available bitmap operations and their rough meaning in the case that the bitmap is a single unsigned long are thus:
The generated code is more efficient when nbits is known at compile-time and at most BITS_PER_LONG.
bitmap_zero(dst, nbits)                       *dst = 0UL
bitmap_fill(dst, nbits)                       *dst = ~0UL
bitmap_copy(dst, src, nbits)                  *dst = *src
bitmap_and(dst, src1, src2, nbits)            *dst = *src1 & *src2
bitmap_or(dst, src1, src2, nbits)             *dst = *src1 | *src2
bitmap_xor(dst, src1, src2, nbits)            *dst = *src1 ^ *src2
bitmap_andnot(dst, src1, src2, nbits)         *dst = *src1 & ~(*src2)
bitmap_complement(dst, src, nbits)            *dst = ~(*src)
bitmap_equal(src1, src2, nbits)               Are *src1 and *src2 equal?
bitmap_intersects(src1, src2, nbits)          Do *src1 and *src2 overlap?
bitmap_subset(src1, src2, nbits)              Is *src1 a subset of *src2?
bitmap_empty(src, nbits)                      Are all bits zero in *src?
bitmap_full(src, nbits)                       Are all bits set in *src?
bitmap_weight(src, nbits)                     Hamming Weight: number set bits
bitmap_weight_and(src1, src2, nbits)          Hamming Weight of and'ed bitmap
bitmap_weight_andnot(src1, src2, nbits)       Hamming Weight of andnot'ed bitmap
bitmap_set(dst, pos, nbits)                   Set specified bit area
bitmap_clear(dst, pos, nbits)                 Clear specified bit area
bitmap_find_next_zero_area(buf, len, pos, n, mask)             Find bit free area
bitmap_find_next_zero_area_off(buf, len, pos, n, mask, mask_off)  as above
bitmap_shift_right(dst, src, n, nbits)        *dst = *src >> n
bitmap_shift_left(dst, src, n, nbits)         *dst = *src << n
bitmap_cut(dst, src, first, n, nbits)         Cut n bits from first, copy rest
bitmap_replace(dst, old, new, mask, nbits)    *dst = (*old & ~(*mask)) | (*new & *mask)
bitmap_scatter(dst, src, mask, nbits)         *dst = map(dense, sparse)(src)
bitmap_gather(dst, src, mask, nbits)          *dst = map(sparse, dense)(src)
bitmap_remap(dst, src, old, new, nbits)       *dst = map(old, new)(src)
bitmap_bitremap(oldbit, old, new, nbits)      newbit = map(old, new)(oldbit)
bitmap_onto(dst, orig, relmap, nbits)         *dst = orig relative to relmap
bitmap_fold(dst, orig, sz, nbits)             dst bits = orig bits mod sz
bitmap_parse(buf, buflen, dst, nbits)         Parse bitmap dst from kernel buf
bitmap_parse_user(ubuf, ulen, dst, nbits)     Parse bitmap dst from user buf
bitmap_parselist(buf, dst, nbits)             Parse bitmap dst from kernel buf
bitmap_parselist_user(buf, dst, nbits)        Parse bitmap dst from user buf
bitmap_find_free_region(bitmap, bits, order)  Find and allocate bit region
bitmap_release_region(bitmap, pos, order)     Free specified bit region
bitmap_allocate_region(bitmap, pos, order)    Allocate specified bit region
bitmap_from_arr32(dst, buf, nbits)            Copy nbits from u32[] buf to dst
bitmap_from_arr64(dst, buf, nbits)            Copy nbits from u64[] buf to dst
bitmap_to_arr32(buf, src, nbits)              Copy nbits from buf to u32[] dst
bitmap_to_arr64(buf, src, nbits)              Copy nbits from buf to u64[] dst
bitmap_get_value8(map, start)                 Get 8bit value from map at start
bitmap_set_value8(map, value, start)          Set 8bit value to map at start
bitmap_read(map, start, nbits)                Read an nbits-sized value from map at start
bitmap_write(map, value, start, nbits)        Write an nbits-sized value to map at start
Note, bitmap_zero() and bitmap_fill() operate over the region of unsigned longs, that is, bits behind the bitmap up to the unsigned long boundary will be zeroed or filled as well. Consider using bitmap_clear() or bitmap_set() to make explicit zeroing or filling respectively.
Also the following operations in asm/bitops.h apply to bitmaps.:
set_bit(bit, addr)                   *addr |= bit
clear_bit(bit, addr)                 *addr &= ~bit
change_bit(bit, addr)                *addr ^= bit
test_bit(bit, addr)                  Is bit set in *addr?
test_and_set_bit(bit, addr)          Set bit and return old value
test_and_clear_bit(bit, addr)        Clear bit and return old value
test_and_change_bit(bit, addr)       Change bit and return old value
find_first_zero_bit(addr, nbits)     Position first zero bit in *addr
find_first_bit(addr, nbits)          Position first set bit in *addr
find_next_zero_bit(addr, nbits, bit) Position next zero bit in *addr >= bit
find_next_bit(addr, nbits, bit)      Position next set bit in *addr >= bit
find_next_and_bit(addr1, addr2, nbits, bit)  Same as find_next_bit, but in (*addr1 & *addr2)
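A short sketch combining DECLARE_BITMAP with a few of the operations above (the bitmap name and bit numbers are illustrative):

DECLARE_BITMAP(features, 130);

bitmap_zero(features, 130);
bitmap_set(features, 0, 4);      /* set bits 0..3 */
set_bit(64, features);           /* atomic single-bit set */

if (test_bit(2, features))
        pr_info("%u feature bits set\n", bitmap_weight(features, 130));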
- void__bitmap_shift_right(unsignedlong*dst,constunsignedlong*src,unsignedshift,unsignednbits)¶
logical right shift of the bits in a bitmap
Parameters
unsignedlong*dst
destination bitmap
constunsignedlong*src
source bitmap
unsignedshift
shift by this many bits
unsignednbits
bitmap size, in bits
Description
Shifting right (dividing) means moving bits in the MS -> LS bit direction. Zeros are fed into the vacated MS positions and the LS bits shifted off the bottom are lost.
- void__bitmap_shift_left(unsignedlong*dst,constunsignedlong*src,unsignedintshift,unsignedintnbits)¶
logical left shift of the bits in a bitmap
Parameters
unsignedlong*dst
destination bitmap
constunsignedlong*src
source bitmap
unsignedintshift
shift by this many bits
unsignedintnbits
bitmap size, in bits
Description
Shifting left (multiplying) means moving bits in the LS -> MS direction. Zeros are fed into the vacated LS bit positions and those MS bits shifted off the top are lost.
- voidbitmap_cut(unsignedlong*dst,constunsignedlong*src,unsignedintfirst,unsignedintcut,unsignedintnbits)¶
remove bit region from bitmap and right shift remaining bits
Parameters
unsignedlong*dst
destination bitmap, might overlap with src
constunsignedlong*src
source bitmap
unsignedintfirst
start bit of region to be removed
unsignedintcut
number of bits to remove
unsignedintnbits
bitmap size, in bits
Description
Set the n-th bit of dst iff the n-th bit of src is set and n is less than first, or the m-th bit of src is set for any m such that first <= n < nbits, and m = n + cut.
In pictures, example for a big-endian 32-bit architecture:
The src bitmap is:

31                                   63
|                                    |
10000000 11000001 11110010 00010101  10000000 11000001 01110010 00010101
                |  |              |                                     |
               16  14             0                                    32

if cut is 3, and first is 14, bits 14-16 in src are cut and dst is:

31                                   63
|                                    |
10110000 00011000 00110010 00010101  00010000 00011000 00101110 01000010
                   |              |                                     |
                   14 (bit 17     0                                    32
                        from @src)
Note that dst and src might overlap partially or entirely.
This is implemented in the obvious way, with a shift and carry step for each moved bit. Optimisation is left as an exercise for the compiler.
- unsignedlongbitmap_find_next_zero_area_off(unsignedlong*map,unsignedlongsize,unsignedlongstart,unsignedintnr,unsignedlongalign_mask,unsignedlongalign_offset)¶
find a contiguous aligned zero area
Parameters
unsignedlong*map
The address to base the search on
unsignedlongsize
The bitmap size in bits
unsignedlongstart
The bitnumber to start searching at
unsignedintnr
The number of zeroed bits we’re looking for
unsignedlongalign_mask
Alignment mask for zero area
unsignedlongalign_offset
Alignment offset for zero area.
Description
The align_mask should be one less than a power of 2; the effect is that the bit offset of all zero areas this function finds plus align_offset is a multiple of that power of 2.
- voidbitmap_remap(unsignedlong*dst,constunsignedlong*src,constunsignedlong*old,constunsignedlong*new,unsignedintnbits)¶
Apply map defined by a pair of bitmaps to another bitmap
Parameters
unsignedlong*dst
remapped result
constunsignedlong*src
subset to be remapped
constunsignedlong*old
defines domain of map
constunsignedlong*new
defines range of map
unsignedintnbits
number of bits in each of these bitmaps
Description
Let old and new define a mapping of bit positions, such that whatever position is held by the n-th set bit in old is mapped to the n-th set bit in new. In the more general case, allowing for the possibility that the weight 'w' of new is less than the weight of old, map the position of the n-th set bit in old to the position of the m-th set bit in new, where m == n % w.
If either of the old and new bitmaps are empty, or if src and dst point to the same location, then this routine copies src to dst.
The positions of unset bits in old are mapped to themselves (the identity map).
Apply the above specified mapping to src, placing the result in dst, clearing any bits previously set in dst.
For example, let's say that old has bits 4 through 7 set, and new has bits 12 through 15 set. This defines the mapping of bit position 4 to 12, 5 to 13, 6 to 14 and 7 to 15, and of all other bit positions unchanged. So if say src comes into this routine with bits 1, 5 and 7 set, then dst should leave with bits 1, 13 and 15 set.
- intbitmap_bitremap(intoldbit,constunsignedlong*old,constunsignedlong*new,intbits)¶
Apply map defined by a pair of bitmaps to a single bit
Parameters
intoldbit
bit position to be mapped
constunsignedlong*old
defines domain of map
constunsignedlong*new
defines range of map
intbits
number of bits in each of these bitmaps
Description
Let old and new define a mapping of bit positions, such that whatever position is held by the n-th set bit in old is mapped to the n-th set bit in new. In the more general case, allowing for the possibility that the weight 'w' of new is less than the weight of old, map the position of the n-th set bit in old to the position of the m-th set bit in new, where m == n % w.
The positions of unset bits in old are mapped to themselves (the identity map).
Apply the above specified mapping to bit position oldbit, returning the new bit position.
For example, let's say that old has bits 4 through 7 set, and new has bits 12 through 15 set. This defines the mapping of bit position 4 to 12, 5 to 13, 6 to 14 and 7 to 15, and of all other bit positions unchanged. So if say oldbit is 5, then this routine returns 13.
- voidbitmap_from_arr32(unsignedlong*bitmap,constu32*buf,unsignedintnbits)¶
copy the contents of u32 array of bits to bitmap
Parameters
unsignedlong*bitmap
array of unsigned longs, the destination bitmap
constu32*buf
array of u32 (in host byte order), the source bitmap
unsignedintnbits
number of bits in bitmap
- voidbitmap_to_arr32(u32*buf,constunsignedlong*bitmap,unsignedintnbits)¶
copy the contents of bitmap to a u32 array of bits
Parameters
u32*buf
array of u32 (in host byte order), the dest bitmap
constunsignedlong*bitmap
array of unsigned longs, the source bitmap
unsignedintnbits
number of bits in bitmap
- voidbitmap_from_arr64(unsignedlong*bitmap,constu64*buf,unsignedintnbits)¶
copy the contents of u64 array of bits to bitmap
Parameters
unsignedlong*bitmap
array of unsigned longs, the destination bitmap
constu64*buf
array of u64 (in host byte order), the source bitmap
unsignedintnbits
number of bits in bitmap
- voidbitmap_to_arr64(u64*buf,constunsignedlong*bitmap,unsignedintnbits)¶
copy the contents of bitmap to a u64 array of bits
Parameters
u64*buf
array of u64 (in host byte order), the dest bitmap
constunsignedlong*bitmap
array of unsigned longs, the source bitmap
unsignedintnbits
number of bits in bitmap
- intbitmap_pos_to_ord(constunsignedlong*buf,unsignedintpos,unsignedintnbits)¶
find ordinal of set bit at given position in bitmap
Parameters
constunsignedlong*buf
pointer to a bitmap
unsignedintpos
a bit position in buf (0 <= pos < nbits)
unsignedintnbits
number of valid bit positions in buf
Description
Map the bit at position pos in buf (of length nbits) to the ordinal of which set bit it is. If it is not set or if pos is not a valid bit position, map to -1.
If, for example, just bits 4 through 7 are set in buf, then pos values 4 through 7 will get mapped to 0 through 3, respectively, and other pos values will get mapped to -1. When pos value 7 gets mapped to (returns) ord value 3 in this example, that means that bit 7 is the 3rd (starting with 0th) set bit in buf.
The bit positions 0 through bits are valid positions in buf.
- voidbitmap_onto(unsignedlong*dst,constunsignedlong*orig,constunsignedlong*relmap,unsignedintbits)¶
translate one bitmap relative to another
Parameters
unsignedlong*dst
resulting translated bitmap
constunsignedlong*orig
original untranslated bitmap
constunsignedlong*relmap
bitmap relative to which translated
unsignedintbits
number of bits in each of these bitmaps
Description
Set the n-th bit of dst iff there exists some m such that the n-th bit of relmap is set, the m-th bit of orig is set, and the n-th bit of relmap is also the m-th _set_ bit of relmap. (If you understood the previous sentence the first time you read it, you're overqualified for your current job.)
In other words, orig is mapped onto (surjectively) dst, using the map { <n, m> | the n-th bit of relmap is the m-th set bit of relmap }.
Any set bits in orig above bit number W, where W is the weight of (number of set bits in) relmap, are mapped nowhere. In particular, if for all bits m set in orig, m >= W, then dst will end up empty. In situations where the possibility of such an empty result is not desired, one way to avoid it is to use the bitmap_fold() operator, below, to first fold the orig bitmap over itself so that all its set bits x are in the range 0 <= x < W. The bitmap_fold() operator does this by setting the bit (m % W) in dst, for each bit (m) set in orig.
- Example [1] for bitmap_onto():
Let's say relmap has bits 30-39 set, and orig has bits 1, 3, 5, 7, 9 and 11 set. Then on return from this routine, dst will have bits 31, 33, 35, 37 and 39 set.
When bit 0 is set in orig, it means turn on the bit in dst corresponding to whatever is the first bit (if any) that is turned on in relmap. Since bit 0 was off in the above example, we leave off that bit (bit 30) in dst.
When bit 1 is set in orig (as in the above example), it means turn on the bit in dst corresponding to whatever is the second bit that is turned on in relmap. The second bit in relmap that was turned on in the above example was bit 31, so we turned on bit 31 in dst.
Similarly, we turned on bits 33, 35, 37 and 39 in dst, because they were the 4th, 6th, 8th and 10th set bits set in relmap, and the 4th, 6th, 8th and 10th bits of orig (i.e. bits 3, 5, 7 and 9) were also set.
When bit 11 is set in orig, it means turn on the bit in dst corresponding to whatever is the twelfth bit that is turned on in relmap. In the above example, there were only ten bits turned on in relmap (30..39), so that bit 11 being set in orig had no effect on dst.
- Example [2] for bitmap_fold() + bitmap_onto():
Let's say relmap has these ten bits set:
40 41 42 43 45 48 53 61 74 95
(for the curious, that's 40 plus the first ten terms of the Fibonacci sequence.)
Further let's say we use the following code, invoking bitmap_fold() then bitmap_onto, as suggested above to avoid the possibility of an empty dst result:

unsigned long *tmp;  // a temporary bitmap's bits

bitmap_fold(tmp, orig, bitmap_weight(relmap, bits), bits);
bitmap_onto(dst, tmp, relmap, bits);
Then this table shows what various values of dst would be, for various orig's. I list the zero-based positions of each set bit. The tmp column shows the intermediate result, as computed by using bitmap_fold() to fold the orig bitmap modulo ten (the weight of relmap):
For these marked lines, if we hadn't first done bitmap_fold() into tmp, then the dst result would have been empty.
If either of orig or relmap is empty (no set bits), then dst will be returned empty.
If (as explained above) the only set bits in orig are in positions m where m >= W (where W is the weight of relmap), then dst will once again be returned empty.
All bits in dst not set by the above rule are cleared.
- voidbitmap_fold(unsignedlong*dst,constunsignedlong*orig,unsignedintsz,unsignedintnbits)¶
fold larger bitmap into smaller, modulo specified size
Parameters
unsignedlong*dst
resulting smaller bitmap
constunsignedlong*orig
original larger bitmap
unsignedintsz
specified size
unsignedintnbits
number of bits in each of these bitmaps
Description
For each bit oldbit in orig, set bit oldbit mod sz in dst. Clear all other bits in dst. See further the comment and Example [2] for bitmap_onto() for why and how to use this.
- unsignedlongbitmap_find_next_zero_area(unsignedlong*map,unsignedlongsize,unsignedlongstart,unsignedintnr,unsignedlongalign_mask)¶
find a contiguous aligned zero area
Parameters
unsignedlong*map
The address to base the search on
unsignedlongsize
The bitmap size in bits
unsignedlongstart
The bitnumber to start searching at
unsignedintnr
The number of zeroed bits we’re looking for
unsignedlongalign_mask
Alignment mask for zero area
Description
The align_mask should be one less than a power of 2; the effect is that the bit offset of all zero areas this function finds is a multiple of that power of 2. An align_mask of 0 means no alignment is required.
- boolbitmap_or_equal(constunsignedlong*src1,constunsignedlong*src2,constunsignedlong*src3,unsignedintnbits)¶
Check whether the or of two bitmaps is equal to a third
Parameters
constunsignedlong*src1
Pointer to bitmap 1
constunsignedlong*src2
Pointer to bitmap 2 will be or’ed with bitmap 1
constunsignedlong*src3
Pointer to bitmap 3. Compare to the result of *src1 | *src2
unsignedintnbits
number of bits in each of these bitmaps
Return
True if (*src1 | *src2) == *src3, false otherwise
- voidbitmap_scatter(unsignedlong*dst,constunsignedlong*src,constunsignedlong*mask,unsignedintnbits)¶
Scatter a bitmap according to the given mask
Parameters
unsignedlong*dst
scattered bitmap
constunsignedlong*src
gathered bitmap
constunsignedlong*mask
mask representing bits to assign to in the scattered bitmap
unsignedintnbits
number of bits in each of these bitmaps
Description
Scatters bitmap with sequential bits according to the given mask.

Or in binary form

src                   mask                  dst
0000000001011010      0001001100010011      0000001100000010

(Bits 0, 1, 2, 3, 4, 5 are copied to the bits 0, 1, 4, 8, 9, 12)

A more 'visual' description of the operation:

src:  0000000001011010
                ||||||
         +------+|||||
         |  +----+||||
         |  |+----+|||
         |  ||   +-+||
         |  ||   |  ||
mask: ...v..vv...v..vv
      ...0..11...0..10
dst:  0000001100000010
A relationship exists between bitmap_scatter() and bitmap_gather(). See bitmap_gather() for the bitmap gather detailed operations. TL;DR: bitmap_gather() can be seen as the 'reverse' bitmap_scatter() operation.
Example
If src bitmap = 0x005a, with mask = 0x1313, dst will be 0x0302.
- voidbitmap_gather(unsignedlong*dst,constunsignedlong*src,constunsignedlong*mask,unsignedintnbits)¶
Gather a bitmap according to given mask
Parameters
unsignedlong*dst
gathered bitmap
constunsignedlong*src
scattered bitmap
constunsignedlong*mask
mask representing bits to extract from in the scattered bitmap
unsignedintnbits
number of bits in each of these bitmaps
Description
Gathers bitmap with sparse bits according to the given mask.

Or in binary form

src                   mask                  dst
0000001100000010      0001001100010011      0000000000011010

(Bits 0, 1, 4, 8, 9, 12 are copied to the bits 0, 1, 2, 3, 4, 5)

A more 'visual' description of the operation:

mask: ...v..vv...v..vv
src:  0000001100000010
         ^  ^^   ^   0
         |  ||   |  10
         |  ||   > 010
         |  |+--> 1010
         |  +--> 11010
         +----> 011010
dst:  0000000000011010
A relationship exists between bitmap_gather() and bitmap_scatter(). See bitmap_scatter() for the bitmap scatter detailed operations. TL;DR: bitmap_scatter() can be seen as the 'reverse' bitmap_gather() operation.
Suppose scattered is computed using bitmap_scatter(scattered, src, mask, n). The operation bitmap_gather(result, scattered, mask, n) leads to a result equal or equivalent to src.
The result can be 'equivalent' because bitmap_scatter() and bitmap_gather() are not bijective. The result and src values are equivalent in the sense that a call to bitmap_scatter(res, src, mask, n) and a call to bitmap_scatter(res, result, mask, n) will lead to the same res value.
Example
If src bitmap = 0x0302, with mask = 0x1313, dst will be 0x001a.
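A sketch tying the two examples above together; scattering and then gathering with the same mask recovers the dense value (the bitmap names are illustrative):

DECLARE_BITMAP(src, 64);
DECLARE_BITMAP(mask, 64);
DECLARE_BITMAP(scattered, 64);
DECLARE_BITMAP(result, 64);

bitmap_from_u64(src, 0x005a);
bitmap_from_u64(mask, 0x1313);

bitmap_scatter(scattered, src, mask, 64);    /* scattered == 0x0302 */
bitmap_gather(result, scattered, mask, 64);  /* result    == 0x005a */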
- voidbitmap_release_region(unsignedlong*bitmap,unsignedintpos,intorder)¶
release allocated bitmap region
Parameters
unsignedlong*bitmap
array of unsigned longs corresponding to the bitmap
unsignedintpos
beginning of bit region to release
intorder
region size (log base 2 of number of bits) to release
Description
This is the complement to __bitmap_find_free_region() and releases the found region (by clearing it in the bitmap).
- intbitmap_allocate_region(unsignedlong*bitmap,unsignedintpos,intorder)¶
allocate bitmap region
Parameters
unsignedlong*bitmap
array of unsigned longs corresponding to the bitmap
unsignedintpos
beginning of bit region to allocate
intorder
region size (log base 2 of number of bits) to allocate
Description
Allocate (set bits in) a specified region of a bitmap.
Return
0 on success, or -EBUSY if the specified region wasn't free (not all bits were zero).
- intbitmap_find_free_region(unsignedlong*bitmap,unsignedintbits,intorder)¶
find a contiguous aligned mem region
Parameters
unsignedlong*bitmap
array of unsigned longs corresponding to the bitmap
unsignedintbits
number of bits in the bitmap
intorder
region size (log base 2 of number of bits) to find
Description
Find a region of free (zero) bits in a bitmap of bits bits and allocate them (set them to one). Only consider regions of length a power (order) of two, aligned to that power of two, which makes the search algorithm much faster.
Return
the bit offset in bitmap of the allocated region, or -errno on failure.
- BITMAP_FROM_U64¶
BITMAP_FROM_U64(n)
Represent u64 value in the format suitable for bitmap.
Parameters
n
u64 value
Description
Linux bitmaps are internally arrays of unsigned longs, i.e. 32-bit integers in a 32-bit environment, and 64-bit integers in a 64-bit one.
There are four combinations of endianness and length of the word in Linux ABIs: LE64, BE64, LE32 and BE32.
On 64-bit kernels 64-bit LE and BE numbers are naturally ordered in bitmaps and therefore don't require any special handling.
On 32-bit kernels the 32-bit LE ABI orders the lo word of a 64-bit number in memory prior to the hi word, and the 32-bit BE ABI orders the hi word prior to the lo word. The bitmap, on the other hand, is represented as an array of 32-bit words and the position of bit N may therefore be calculated as: word #(N/32) and bit #(N % 32) in that word. For example, bit #42 is located at the 10th position of the 2nd word. This matches the 32-bit LE ABI, and we can simply let the compiler store 64-bit values in memory as it usually does. But for BE we need to swap hi and lo words manually.
With all that, the macro BITMAP_FROM_U64() does explicit reordering of the hi and lo parts of a u64. For LE32 it does nothing, and for a BE environment it swaps the hi and lo words, as is expected by bitmap.
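A minimal sketch of the intended use, static initialisation of a bitmap from a u64 constant (the array name and value are illustrative):

static const unsigned long default_mask[] = {
        BITMAP_FROM_U64(0x0000000100000002ULL),
};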
- voidbitmap_from_u64(unsignedlong*dst,u64mask)¶
Check and swap words within u64.
Parameters
unsignedlong*dst
destination bitmap
u64mask
source bitmap
Description
In a 32-bit Big Endian kernel, when using (u32 *)(&val)[*] to read a u64 mask, we will get the wrong word. That is, (u32 *)(&val)[0] gets the upper 32 bits, but we expect the lower 32 bits of the u64.
- unsignedlongbitmap_read(constunsignedlong*map,unsignedlongstart,unsignedlongnbits)¶
read a value of n-bits from the memory region
Parameters
constunsignedlong*map
address to the bitmap memory region
unsignedlongstart
bit offset of the n-bit value
unsignedlongnbits
size of value in bits, nonzero, up to BITS_PER_LONG
Return
value of nbits bits located at the start bit offset within the map memory region. For nbits = 0 and nbits > BITS_PER_LONG the return value is undefined.
- voidbitmap_write(unsignedlong*map,unsignedlongvalue,unsignedlongstart,unsignedlongnbits)¶
write n-bit value within a memory region
Parameters
unsignedlong*map
address to the bitmap memory region
unsignedlongvalue
value to write, clamped to nbits
unsignedlongstart
bit offset of the n-bit value
unsignedlongnbits
size of value in bits, nonzero, up to BITS_PER_LONG.
Description
bitmap_write() behaves as-if implemented as nbits calls of __assign_bit(), i.e. bits beyond nbits are ignored:

for (bit = 0; bit < nbits; bit++)
        __assign_bit(start + bit, bitmap, val & BIT(bit));

For nbits == 0 and nbits > BITS_PER_LONG no writes are performed.
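A sketch of a round trip through these two helpers (the bitmap name, offset and width are illustrative):

DECLARE_BITMAP(map, 64);
unsigned long v;

bitmap_zero(map, 64);
bitmap_write(map, 0x15, 10, 5);   /* value clamped to 5 bits */
v = bitmap_read(map, 10, 5);      /* v == 0x15 */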
Command-line Parsing¶
- intget_option(char**str,int*pint)¶
Parse integer from an option string
Parameters
char**str
option string
int*pint
(optional output) integer value parsed from str
Read an int from an option string; if available, accept a subsequent comma as well.
When pint is NULL the function can be used as a validator of the current option in the string.
Return values:
0 - no int in string
1 - int found, no subsequent comma
2 - int found including a subsequent comma
3 - hyphen found to denote a range
A leading hyphen without an integer is treated as the no-integer case, but we consume it for the sake of simplification.
- char*get_options(constchar*str,intnints,int*ints)¶
Parse a string into a list of integers
Parameters
constchar*str
String to be parsed
intnints
size of integer array
int*ints
integer array (must have room for at least one element)
This function parses a string containing a comma-separated list of integers, a hyphen-separated range of _positive_ integers, or a combination of both. The parse halts when the array is full, or when no more numbers can be retrieved from the string.
When nints is 0, the function just validates the given str and returns the amount of parseable integers as described below.
Return
The first element is filled by the number of collected integers in the range. The rest is what was parsed from the str.
Return value is the character in the string which caused the parse to end (typically a null terminator, if str is completely parseable).
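An illustrative sketch (the input string is hypothetical); note that ranges are expanded into individual integers and the count lands in the first array element:

int ints[8];    /* room for a count plus up to 7 values */
char *rest;

rest = get_options("1,2,5-7", ARRAY_SIZE(ints), ints);
/* ints[0] == 5; ints[1..5] == 1, 2, 5, 6, 7; *rest == '\0' */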
- unsignedlonglongmemparse(constchar*ptr,char**retptr)¶
parse a string with mem suffixes into a number
Parameters
constchar*ptr
Where parse begins
char**retptr
(output) Optional pointer to next char after parse completes
Parses a string into a number. The number stored at ptr is potentially suffixed with K, M, G, T, P, E.
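A minimal sketch (the input string is illustrative):

char *after;
unsigned long long bytes;

bytes = memparse("64K", &after);   /* bytes == 65536, *after == '\0' */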
Error Pointers¶
- IS_ERR_VALUE¶
IS_ERR_VALUE(x)
Detect an error pointer.
Parameters
x
The pointer to check.
Description
Like IS_ERR(), but does not generate a compiler warning if the result is unused.
- void*ERR_PTR(longerror)¶
Create an error pointer.
Parameters
longerror
A negative error code.
Description
Encodes error into a pointer value. Users should consider the result opaque and not assume anything about how the error is encoded.
Return
A pointer with error encoded within its value.
- longPTR_ERR(__forceconstvoid*ptr)¶
Extract the error code from an error pointer.
Parameters
__forceconstvoid*ptr
An error pointer.
Return
The error code within ptr.
- boolIS_ERR(__forceconstvoid*ptr)¶
Detect an error pointer.
Parameters
__forceconstvoid*ptr
The pointer to check.
Return
true if ptr is an error pointer, false otherwise.
- boolIS_ERR_OR_NULL(__forceconstvoid*ptr)¶
Detect an error pointer or a null pointer.
Parameters
__forceconstvoid*ptr
The pointer to check.
Description
Like IS_ERR(), but also returns true for a null pointer.
- void*ERR_CAST(__forceconstvoid*ptr)¶
Explicitly cast an error-valued pointer to another pointer type
Parameters
__forceconstvoid*ptr
The pointer to cast.
Description
Explicitly cast an error-valued pointer to another pointer type in such a way as to make it clear that's what's going on.
- intPTR_ERR_OR_ZERO(__forceconstvoid*ptr)¶
Extract the error code from a pointer if it has one.
Parameters
__forceconstvoid*ptr
A potential error pointer.
Description
Convenience function that can be used inside a function that returns an error code to propagate errors received as error pointers. For example, return PTR_ERR_OR_ZERO(ptr);
replaces:

if (IS_ERR(ptr))
        return PTR_ERR(ptr);
else
        return 0;
Return
The error code within ptr if it is an error pointer; 0 otherwise.
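A sketch of the error-pointer convention as a whole (struct foo and foo_create() are hypothetical names used only for illustration):

struct foo {
        size_t len;
        char data[];
};

static struct foo *foo_create(size_t len)
{
        struct foo *p = kzalloc(sizeof(*p) + len, GFP_KERNEL);

        if (!p)
                return ERR_PTR(-ENOMEM);
        p->len = len;
        return p;
}

/* Caller side: */
struct foo *f = foo_create(32);

if (IS_ERR(f))
        return PTR_ERR(f);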
Sorting¶
- voidsort_r(void*base,size_tnum,size_tsize,cmp_r_func_tcmp_func,swap_r_func_tswap_func,constvoid*priv)¶
sort an array of elements
Parameters
void*base
pointer to data to sort
size_tnum
number of elements
size_tsize
size of each element
cmp_r_func_tcmp_func
pointer to comparison function
swap_r_func_tswap_func
pointer to swap function or NULL
constvoid*priv
third argument passed to comparison function
Description
This function does a heapsort on the given array. You may provide a swap_func function if you need to do something more than a memory copy (e.g. fix up pointers or auxiliary data), but the built-in swap avoids a slow retpoline and so is significantly faster.
The comparison function must adhere to specific mathematical properties to ensure correct and stable sorting:
- Antisymmetry: cmp_func(a, b) must return the opposite sign of cmp_func(b, a).
- Transitivity: if cmp_func(a, b) <= 0 and cmp_func(b, c) <= 0, then cmp_func(a, c) <= 0.
Sorting time is O(n log n) both on average and worst-case. While quicksort is slightly faster on average, it suffers from exploitable O(n*n) worst-case behavior and extra memory requirements that make it less suitable for kernel use.
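A minimal usage sketch, ascending sort of an int array with the priv argument unused (the array and function names are illustrative):

static int cmp_int(const void *a, const void *b, const void *priv)
{
        int x = *(const int *)a, y = *(const int *)b;

        return x < y ? -1 : x > y;
}

int vals[] = { 3, 1, 2 };

sort_r(vals, ARRAY_SIZE(vals), sizeof(vals[0]), cmp_int, NULL, NULL);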
- voidsort_r_nonatomic(void*base,size_tnum,size_tsize,cmp_r_func_tcmp_func,swap_r_func_tswap_func,constvoid*priv)¶
sort an array of elements, with cond_resched
Parameters
void*base
pointer to data to sort
size_tnum
number of elements
size_tsize
size of each element
cmp_r_func_tcmp_func
pointer to comparison function
swap_r_func_tswap_func
pointer to swap function or NULL
constvoid*priv
third argument passed to comparison function
Description
Same as sort_r, but preferred for larger arrays as it does a periodic cond_resched().
- voidlist_sort(void*priv,structlist_head*head,list_cmp_func_tcmp)¶
sort a list
Parameters
void*priv
private data, opaque to list_sort(), passed to cmp
structlist_head*head
the list to sort
list_cmp_func_tcmp
the elements comparison function
Description
The comparison function cmp must return > 0 if a should sort after b ("a > b" if you want an ascending sort), and <= 0 if a should sort before b or their original order should be preserved. It is always called with the element that came first in the input in a, and list_sort is a stable sort, so it is not necessary to distinguish the a < b and a == b cases.
The comparison function must adhere to specific mathematical properties to ensure correct and stable sorting:
- Antisymmetry: cmp(a, b) must return the opposite sign of cmp(b, a).
- Transitivity: if cmp(a, b) <= 0 and cmp(b, c) <= 0, then cmp(a, c) <= 0.
This is compatible with two styles of cmp function:
- The traditional style which returns <0 / =0 / >0, or
- Returning a boolean 0/1.
The latter offers a chance to save a few cycles in the comparison (which is used by e.g. plug_ctx_cmp() in block/blk-mq.c).
A good way to write a multi-word comparison is:
if (a->high != b->high)
        return a->high > b->high;
if (a->middle != b->middle)
        return a->middle > b->middle;
return a->low > b->low;
This mergesort is as eager as possible while always performing at least 2:1 balanced merges. Given two pending sublists of size 2^k, they are merged to a size-2^(k+1) list as soon as we have 2^k following elements.
Thus, it will avoid cache thrashing as long as 3*2^k elements can fit into the cache. Not quite as good as a fully-eager bottom-up mergesort, but it does use 0.2*n fewer comparisons, so is faster in the common case that everything fits into L1.
The merging is controlled by "count", the number of elements in the pending lists. This is beautifully simple code, but rather subtle.
Each time we increment "count", we set one bit (bit k) and clear bits k-1 .. 0. Each time this happens (except the very first time for each bit, when count increments to 2^k), we merge two lists of size 2^k into one list of size 2^(k+1).
This merge happens exactly when the count reaches an odd multiple of 2^k, which is when we have 2^k elements pending in smaller lists, so it's safe to merge away two lists of size 2^k.
After this happens twice, we have created two lists of size 2^(k+1), which will be merged into a list of size 2^(k+2) before we create a third list of size 2^(k+1), so there are never more than two pending.
The number of pending lists of size 2^k is determined by the state of bit k of "count" plus two extra pieces of information:
The state of bit k-1 (when k == 0, consider bit -1 always set), and
Whether the higher-order bits are zero or non-zero (i.e. is count >= 2^(k+1)).
There are six states we distinguish. "x" represents some arbitrary bits, and "y" represents some arbitrary non-zero bits:
0:  00x: 0 pending of size 2^k; x pending of sizes < 2^k
1:  01x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
2: x10x: 0 pending of size 2^k; 2^k + x pending of sizes < 2^k
3: x11x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
4: y00x: 1 pending of size 2^k; 2^k + x pending of sizes < 2^k
5: y01x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
(merge and loop back to state 2)
We gain lists of size 2^k in the 2->3 and 4->5 transitions (because bit k-1 is set while the more significant bits are non-zero) and merge them away in the 5->2 transition. Note in particular that just before the 5->2 transition, all lower-order bits are 11 (state 3), so there is one list of each smaller size.
When we reach the end of the input, we merge all the pending lists, from smallest to largest. If you work through cases 2 to 5 above, you can see that the number of elements we merge with a list of size 2^k varies from 2^(k-1) (cases 3 and 5 when x == 0) to 2^(k+1) - 1 (second merge of case 5 when x == 2^(k-1) - 1).
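A minimal caller sketch for list_sort() (the node structure and list name are hypothetical), sorting ascending by key with a boolean-style comparison:

struct item {
        struct list_head node;
        int key;
};

static int item_cmp(void *priv, const struct list_head *a,
                    const struct list_head *b)
{
        const struct item *ia = list_entry(a, struct item, node);
        const struct item *ib = list_entry(b, struct item, node);

        return ia->key > ib->key;   /* non-zero means a sorts after b */
}

LIST_HEAD(items);
/* ... items added elsewhere ... */
list_sort(NULL, &items, item_cmp);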
Text Searching¶
INTRODUCTION
The textsearch infrastructure provides text searching facilities for both linear and non-linear data. Individual search algorithms are implemented in modules and chosen by the user.
ARCHITECTURE
[ASCII architecture diagram: the user calls into the Core API (prepare(), find()/next(), destroy()), which drives the algorithm module (init(), find(), destroy()); the user supplies get_next_block() and finish() callbacks.]
(1) User configures a search by calling textsearch_prepare() specifying the search parameters such as the pattern and algorithm name.
(2) Core requests the algorithm to allocate and initialize a search configuration according to the specified parameters.
(3) User starts the search(es) by calling textsearch_find() or textsearch_next() to fetch subsequent occurrences. A state variable is provided to the algorithm to store persistent variables.
(4) Core eventually resets the search offset and forwards the find() request to the algorithm.
(5) Algorithm calls get_next_block() provided by the user continuously to fetch the data to be searched in block by block.
(6) Algorithm invokes finish() after the last call to get_next_block to clean up any leftovers from get_next_block. (Optional)
(7) User destroys the configuration by calling textsearch_destroy().
(8) Core notifies the algorithm to destroy algorithm specific allocations. (Optional)
USAGE
Before a search can be performed, a configuration must be created by calling textsearch_prepare(), specifying the searching algorithm, the pattern to look for and flags. As a flag, you can set TS_IGNORECASE to perform case-insensitive matching, but this might slow down the performance of the algorithm, so use it at your own risk. The returned configuration may then be used an arbitrary number of times, and even in parallel, as long as a separate struct ts_state variable is provided to every instance. The actual search is performed by either calling textsearch_find_continuous() for linear data or by providing an own get_next_block() implementation and calling textsearch_find(). Both functions return the position of the first occurrence of the pattern or UINT_MAX if no match was found. Subsequent occurrences can be found by calling textsearch_next() regardless of the linearity of the data. Once you're done using a configuration it must be given back via textsearch_destroy().
EXAMPLE:
int pos;
struct ts_config *conf;
struct ts_state state;
const char *pattern = "chicken";
const char *example = "We dance the funky chicken";

conf = textsearch_prepare("kmp", pattern, strlen(pattern),
                          GFP_KERNEL, TS_AUTOLOAD);
if (IS_ERR(conf)) {
        err = PTR_ERR(conf);
        goto errout;
}

pos = textsearch_find_continuous(conf, &state, example, strlen(example));
if (pos != UINT_MAX)
        panic("Oh my god, dancing chickens at %d\n", pos);

textsearch_destroy(conf);
- inttextsearch_register(structts_ops*ops)¶
register a textsearch module
Parameters
structts_ops*ops
operations lookup table
Description
This function must be called by textsearch modules to announce their presence. The specified ops must have name set to a unique identifier and the callbacks find(), init(), get_pattern(), and get_pattern_len() must be implemented.
Returns 0 or -EEXIST if another module has already registered with the same name.
- inttextsearch_unregister(structts_ops*ops)¶
unregister a textsearch module
Parameters
structts_ops*ops
operations lookup table
Description
This function must be called by textsearch modules to announce their disappearance, for example when the module gets unloaded. The ops parameter must be the same as the one used during registration.
Returns 0 on success or -ENOENT if no matching textsearch registration was found.
- unsignedinttextsearch_find_continuous(structts_config*conf,structts_state*state,constvoid*data,unsignedintlen)¶
search a pattern in continuous/linear data
Parameters
structts_config*conf
search configuration
structts_state*state
search state
constvoid*data
data to search in
unsignedintlen
length of data
Description
A simplified version of textsearch_find() for continuous/linear data. Call textsearch_next() to retrieve subsequent matches.
Returns the position of the first occurrence of the pattern or UINT_MAX if no occurrence was found.
- structts_config*textsearch_prepare(constchar*algo,constvoid*pattern,unsignedintlen,gfp_tgfp_mask,intflags)¶
Prepare a search
Parameters
constchar*algo
name of search algorithm
constvoid*pattern
pattern data
unsignedintlen
length of pattern
gfp_tgfp_mask
allocation mask
intflags
search flags
Description
Looks up the search algorithm module and creates a new textsearch configuration for the specified pattern.
Returns a new textsearch configuration according to the specified parameters or an ERR_PTR(). If a zero length pattern is passed, this function returns EINVAL.
Note
The format of the pattern may not be compatible between the various search algorithms.
- voidtextsearch_destroy(structts_config*conf)¶
destroy a search configuration
Parameters
structts_config*conf
search configuration
Description
Releases all references of the configuration and freesup the memory.
- unsignedinttextsearch_next(structts_config*conf,structts_state*state)¶
continue searching for a pattern
Parameters
structts_config*conf
search configuration
structts_state*state
search state
Description
Continues a search looking for more occurrences of the pattern. textsearch_find() must be called to find the first occurrence in order to reset the state.
Returns the position of the next occurrence of the pattern or UINT_MAX if no match was found.
- unsignedinttextsearch_find(structts_config*conf,structts_state*state)¶
start searching for a pattern
Parameters
structts_config*conf
search configuration
structts_state*state
search state
Description
Returns the position of the first occurrence of the pattern or UINT_MAX if no match was found.
- void*textsearch_get_pattern(structts_config*conf)¶
return head of the pattern
Parameters
structts_config*conf
search configuration
- unsignedinttextsearch_get_pattern_len(structts_config*conf)¶
return length of the pattern
Parameters
structts_config*conf
search configuration
CRC and Math Functions in Linux¶
Arithmetic Overflow Checking¶
- check_add_overflow¶
check_add_overflow(a,b,d)
Calculate addition with overflow checking
Parameters
a
first addend
b
second addend
d
pointer to store sum
Description
Returns true on wrap-around, false otherwise.
*d holds the results of the attempted addition, regardless of whether wrap-around occurred.
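A sketch of the typical pattern (header_len and payload_len are hypothetical size_t values of the same type as total):

size_t total;

if (check_add_overflow(header_len, payload_len, &total))
        return -EOVERFLOW;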
- wrapping_add¶
wrapping_add(type,a,b)
Intentionally perform a wrapping addition
Parameters
type
type for result of calculation
a
first addend
b
second addend
Description
Return the potentially wrapped-around addition without tripping any wrap-around sanitizers that may be enabled.
- wrapping_assign_add¶
wrapping_assign_add(var,offset)
Intentionally perform a wrapping increment assignment
Parameters
var
variable to be incremented
offset
amount to add
Description
Increments var by offset with wrap-around. Returns the resulting value of var. Will not trip any wrap-around sanitizers.
Returns the new value of var.
- check_sub_overflow¶
check_sub_overflow(a,b,d)
Calculate subtraction with overflow checking
Parameters
a
minuend; value to subtract from
b
subtrahend; value to subtract froma
d
pointer to store difference
Description
Returns true on wrap-around, false otherwise.
*d holds the results of the attempted subtraction, regardless of whether wrap-around occurred.
- wrapping_sub¶
wrapping_sub(type,a,b)
Intentionally perform a wrapping subtraction
Parameters
type
type for result of calculation
a
minuend; value to subtract from
b
subtrahend; value to subtract froma
Description
Return the potentially wrapped-around subtraction without tripping any wrap-around sanitizers that may be enabled.
- wrapping_assign_sub¶
wrapping_assign_sub(var,offset)
Intentionally perform a wrapping decrement assign
Parameters
var
variable to be decremented
offset
amount to subtract
Description
Decrements var by offset with wrap-around. Returns the resulting value of var. Will not trip any wrap-around sanitizers.
Returns the new value of var.
- check_mul_overflow¶
check_mul_overflow(a,b,d)
Calculate multiplication with overflow checking
Parameters
a
first factor
b
second factor
d
pointer to store product
Description
Returns true on wrap-around, false otherwise.
*d holds the results of the attempted multiplication, regardless of whether wrap-around occurred.
- wrapping_mul¶
wrapping_mul(type,a,b)
Intentionally perform a wrapping multiplication
Parameters
type
type for result of calculation
a
first factor
b
second factor
Description
Return the potentially wrapped-around multiplication without tripping any wrap-around sanitizers that may be enabled.
- check_shl_overflow¶
check_shl_overflow(a,s,d)
Calculate a left-shifted value and check overflow
Parameters
a
Value to be shifted
s
How many bits left to shift
d
Pointer to where to store the result
Description
Computes *d = (a << s)
Returns true if '*d' cannot hold the result or when 'a << s' doesn't make sense. Example conditions:
'a << s' causes bits to be lost when stored in *d.
's' is garbage (e.g. negative) or so large that the result of 'a << s' is guaranteed to be 0.
'a' is negative.
'a << s' sets the sign bit, if any, in '*d'.
'*d' will hold the results of the attempted shift, but is not considered "safe for use" if true is returned.
- overflows_type¶
overflows_type(n,T)
helper for checking the overflows between value, variables, or data type
Parameters
n
source constant value or variable to be checked
T
destination variable or data type proposed to store n
Description
Compares the n expression for whether or not it can safely fit in the storage of the type in T. n and T can have different types. If n is a constant expression, this will also resolve to a constant expression.
Return
true if overflow can occur, false otherwise.
- castable_to_type¶
castable_to_type(n,T)
like __same_type(), but also allows for casted literals
Parameters
n
variable or constant value
T
variable or data type
Description
Unlike the __same_type() macro, this allows a constant value as the first argument. If this value would not overflow into an assignment of the second argument’s type, it returns true. Otherwise, this falls back to __same_type().
- size_tsize_mul(size_tfactor1,size_tfactor2)¶
Calculate size_t multiplication with saturation at SIZE_MAX
Parameters
size_tfactor1
first factor
size_tfactor2
second factor
Return
calculate factor1 * factor2, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. The lvalue must be size_t to avoid implicit type conversion.
- size_tsize_add(size_taddend1,size_taddend2)¶
Calculate size_t addition with saturation at SIZE_MAX
Parameters
size_taddend1
first addend
size_taddend2
second addend
Return
calculate addend1 + addend2, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. The lvalue must be size_t to avoid implicit type conversion.
- size_tsize_sub(size_tminuend,size_tsubtrahend)¶
Calculate size_t subtraction with saturation at SIZE_MAX
Parameters
size_tminuend
value to subtract from
size_tsubtrahend
value to subtract fromminuend
Return
calculate minuend - subtrahend, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. For composition with the size_add() and size_mul() helpers, neither argument may be SIZE_MAX (or the result will be forced to SIZE_MAX). The lvalue must be size_t to avoid implicit type conversion.
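For illustration only, a sketch of composing the saturating size helpers when sizing an allocation; the function name and layout are hypothetical, but the key point is that a saturated SIZE_MAX makes the allocator fail cleanly instead of allocating a short buffer.

    #include <linux/overflow.h>
    #include <linux/slab.h>

    /* Illustrative: a header followed by n fixed-size records. */
    static void *sample_alloc_records(size_t hdr, size_t n, size_t rec_size)
    {
        size_t bytes = size_add(hdr, size_mul(n, rec_size));

        /* On overflow, bytes is SIZE_MAX and kmalloc() returns NULL. */
        return kmalloc(bytes, GFP_KERNEL);
    }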
- array_size¶
array_size(a,b)
Calculate size of 2-dimensional array.
Parameters
a
dimension one
b
dimension two
Description
Calculates size of 2-dimensional array: a * b.
Return
number of bytes needed to represent the array or SIZE_MAX onoverflow.
- array3_size¶
array3_size(a,b,c)
Calculate size of 3-dimensional array.
Parameters
a
dimension one
b
dimension two
c
dimension three
Description
Calculates size of 3-dimensional array: a * b * c.
Return
number of bytes needed to represent the array or SIZE_MAX onoverflow.
- flex_array_size¶
flex_array_size(p,member,count)
Calculate size of a flexible array member within an enclosing structure.
Parameters
p
Pointer to the structure.
member
Name of the flexible array member.
count
Number of elements in the array.
Description
Calculates size of a flexible array of count number of member elements, at the end of structure p.
Return
number of bytes needed or SIZE_MAX on overflow.
- struct_size¶
struct_size(p,member,count)
Calculate size of structure with trailing flexible array.
Parameters
p
Pointer to the structure.
member
Name of the array member.
count
Number of elements in the array.
Description
Calculates size of memory needed for structure of p followed by an array of count number of member elements.
Return
number of bytes needed or SIZE_MAX on overflow.
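For illustration only, a sketch of allocating a structure with a trailing flexible array member using struct_size(); the structure and field names are hypothetical.

    #include <linux/overflow.h>
    #include <linux/slab.h>
    #include <linux/types.h>

    struct sample_obj {
        size_t count;
        u32 data[];     /* flexible array member */
    };

    static struct sample_obj *sample_obj_alloc(size_t count)
    {
        struct sample_obj *obj;

        /* struct_size() saturates at SIZE_MAX on overflow, so kzalloc() fails cleanly. */
        obj = kzalloc(struct_size(obj, data, count), GFP_KERNEL);
        if (obj)
            obj->count = count;
        return obj;
    }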
- struct_size_t¶
struct_size_t(type,member,count)
Calculate size of structure with trailing flexible array
Parameters
type
structure type name.
member
Name of the array member.
count
Number of elements in the array.
Description
Calculates size of memory needed for structure type followed by an array of count number of member elements. Prefer using struct_size() when possible instead, to keep calculations associated with a specific instance variable of type type.
Return
number of bytes needed or SIZE_MAX on overflow.
- __DEFINE_FLEX¶
__DEFINE_FLEX(type,name,member,count,trailer...)
helper macro for DEFINE_FLEX() family. Enables caller macro to pass arbitrary trailing expressions
Parameters
type
structure type name, including “struct” keyword.
name
Name for a variable to define.
member
Name of the array member.
count
Number of elements in the array; must be compile-time const.
trailer...
Trailing expressions for attributes and/or initializers.
- _DEFINE_FLEX¶
_DEFINE_FLEX(type,name,member,count,initializer...)
helper macro for DEFINE_FLEX() family. Enables caller macro to pass (different) initializer.
Parameters
type
structure type name, including “struct” keyword.
name
Name for a variable to define.
member
Name of the array member.
count
Number of elements in the array; must be compile-time const.
initializer...
Initializer expression (e.g., pass = { } at minimum).
- DEFINE_RAW_FLEX¶
DEFINE_RAW_FLEX(type,name,member,count)
Define an on-stack instance of structure with a trailing flexible array member, when it does not have a __counted_by annotation.
Parameters
type
structure type name, including “struct” keyword.
name
Name for a variable to define.
member
Name of the array member.
count
Number of elements in the array; must be compile-time const.
Description
Define a zeroed, on-stack, instance of type structure with a trailing flexible array member.
Use __struct_size(name) to get compile-time size of it afterwards.
Use __member_size(name->member) to get compile-time size of name members.
Use STACK_FLEX_ARRAY_SIZE(name, member) to get compile-time number of elements in array member.
- DEFINE_FLEX¶
DEFINE_FLEX(TYPE,NAME,MEMBER,COUNTER,COUNT)
Define an on-stack instance of structure with a trailing flexible array member.
Parameters
TYPE
structure type name, including “struct” keyword.
NAME
Name for a variable to define.
MEMBER
Name of the array member.
COUNTER
Name of the __counted_by member.
COUNT
Number of elements in the array; must be compile-time const.
Description
Define a zeroed, on-stack, instance of TYPE structure with a trailing flexible array member.
Use __struct_size(NAME) to get compile-time size of it afterwards.
Use __member_size(NAME->member) to get compile-time size of NAME members.
Use STACK_FLEX_ARRAY_SIZE(name, member) to get compile-time number of elements in array member.
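For illustration only, a sketch of declaring an on-stack flexible-array structure with DEFINE_FLEX(); the structure, its fields, and the element count are hypothetical.

    #include <linux/overflow.h>
    #include <linux/types.h>

    struct sample_cmd {
        u8 opcode;
        u8 len;                         /* counts the entries in payload[] */
        u8 payload[] __counted_by(len);
    };

    static void sample_send_cmd(void)
    {
        /* On-stack, zeroed instance with room for 4 payload bytes; len is set to 4. */
        DEFINE_FLEX(struct sample_cmd, cmd, payload, len, 4);

        cmd->opcode = 0x12;
        /* __struct_size(cmd) yields the compile-time size of the whole object. */
    }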
- STACK_FLEX_ARRAY_SIZE¶
STACK_FLEX_ARRAY_SIZE(name,array)
helper macro for DEFINE_FLEX() family. Returns the number of elements in array.
Parameters
name
Name for a variable defined in DEFINE_RAW_FLEX()/DEFINE_FLEX().
array
Name of the array member.
CRC Functions¶
- uint8_tcrc4(uint8_tc,uint64_tx,intbits)¶
calculate the 4-bit crc of a value.
Parameters
uint8_tc
starting crc4
uint64_tx
value to checksum
intbits
number of bits inx to checksum
Description
Returns the crc4 value of x, using polynomial 0b10111.
The x value is treated as left-aligned, and bits above bits are ignored in the crc calculations.
- u8crc7_be(u8crc,constu8*buffer,size_tlen)¶
update the CRC7 for the data buffer
Parameters
u8crc
previous CRC7 value
constu8*buffer
data pointer
size_tlen
number of bytes in the buffer
Context
any
Description
Returns the updated CRC7 value. The CRC7 is left-aligned in the byte (the lsbit is always 0), as that makes the computation easier, and all callers want it in that form.
- voidcrc8_populate_msb(u8table[CRC8_TABLE_SIZE],u8polynomial)¶
fill crc table for given polynomial in reverse bit order.
Parameters
u8table[CRC8_TABLE_SIZE]
table to be filled.
u8polynomial
polynomial for which table is to be filled.
- voidcrc8_populate_lsb(u8table[CRC8_TABLE_SIZE],u8polynomial)¶
fill crc table for given polynomial in regular bit order.
Parameters
u8table[CRC8_TABLE_SIZE]
table to be filled.
u8polynomial
polynomial for which table is to be filled.
- u8crc8(constu8table[CRC8_TABLE_SIZE],constu8*pdata,size_tnbytes,u8crc)¶
calculate a crc8 over the given input data.
Parameters
constu8table[CRC8_TABLE_SIZE]
crc table used for calculation.
constu8*pdata
pointer to data buffer.
size_tnbytes
number of bytes in data buffer.
u8crc
previous returned crc8 value.
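For illustration only, a sketch of the usual crc8 pattern: populate a lookup table for a chosen polynomial, then run the data through crc8(). The polynomial value and function name are hypothetical, and in real code the table would normally be filled once at init/probe time rather than per call.

    #include <linux/crc8.h>
    #include <linux/types.h>

    #define SAMPLE_CRC8_POLY	0x07	/* illustrative polynomial */

    DECLARE_CRC8_TABLE(sample_crc8_table);

    static u8 sample_checksum(const u8 *buf, size_t len)
    {
        /* Fill the lookup table (msb-first) for the chosen polynomial. */
        crc8_populate_msb(sample_crc8_table, SAMPLE_CRC8_POLY);

        return crc8(sample_crc8_table, buf, len, CRC8_INIT_VALUE);
    }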
- u16crc16(u16crc,constu8*p,size_tlen)¶
compute the CRC-16 for the data buffer
Parameters
u16crc
previous CRC value
constu8*p
data pointer
size_tlen
number of bytes in the buffer
Description
Returns the updated CRC value.
- u32crc32_generic_shift(u32crc,size_tlen,u32polynomial)¶
Append len 0 bytes to crc, in logarithmic time
Parameters
u32crc
The original little-endian CRC (i.e. lsbit is x^31 coefficient)
size_tlen
The number of bytes. crc is multiplied by x^(8*len)
u32polynomial
The modulus used to reduce the result to 32 bits.
Description
It’s possible to parallelize CRC computations by computing a CRC over separate ranges of a buffer, then summing them. This shifts the given CRC by 8*len bits (i.e. produces the same effect as appending len bytes of zero to the data), in time proportional to log(len).
- u16crc_ccitt(u16crc,u8const*buffer,size_tlen)¶
recompute the CRC (CRC-CCITT variant) for the data buffer
Parameters
u16crc
previous CRC value
u8const*buffer
data pointer
size_tlen
number of bytes in the buffer
- u16crc_itu_t(u16crc,constu8*buffer,size_tlen)¶
Compute the CRC-ITU-T for the data buffer
Parameters
u16crc
previous CRC value
constu8*buffer
data pointer
size_tlen
number of bytes in the buffer
Description
Returns the updated CRC value
Base 2 log and power Functions¶
- boolis_power_of_2(unsignedlongn)¶
check if a value is a power of two
Parameters
unsignedlongn
the value to check
Description
Determine whether some value is a power of two, where zero is not considered a power of two.
Return
true ifn is a power of 2, otherwise false.
- unsignedlong__roundup_pow_of_two(unsignedlongn)¶
round up to nearest power of two
Parameters
unsignedlongn
value to round up
- unsignedlong__rounddown_pow_of_two(unsignedlongn)¶
round down to nearest power of two
Parameters
unsignedlongn
value to round down
- const_ilog2¶
const_ilog2(n)
log base 2 of 32-bit or a 64-bit constant unsigned value
Parameters
n
parameter
Description
Use this where sparse expects a true constant expression, e.g. for array indices.
- ilog2¶
ilog2(n)
log base 2 of 32-bit or a 64-bit unsigned value
Parameters
n
parameter
Description
constant-capable log of base 2 calculation - this can be used to initialise global variables from constant data, hence the massive ternary operator construction
selects the appropriately-sized optimised version depending on sizeof(n)
- roundup_pow_of_two¶
roundup_pow_of_two(n)
round the given value up to nearest power of two
Parameters
n
parameter
Description
round the given value up to the nearest power of two
- the result is undefined when n == 0
- this can be used to initialise global variables from constant data
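For illustration only, a sketch of a common pairing of roundup_pow_of_two() and ilog2() when sizing a table; the function name is hypothetical, and the caller is assumed to pass a non-zero count (the result is undefined for 0).

    #include <linux/log2.h>
    #include <linux/printk.h>

    static unsigned long sample_pick_table_size(unsigned long nr_entries)
    {
        /* Round up to a power of two so indexing can use a mask instead of modulo. */
        unsigned long size = roundup_pow_of_two(nr_entries);

        pr_debug("table size %lu (%d index bits)\n", size, ilog2(size));
        return size;
    }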
- rounddown_pow_of_two¶
rounddown_pow_of_two(n)
round the given value down to nearest power of two
Parameters
n
parameter
Description
round the given value down to the nearest power of two
- the result is undefined when n == 0
- this can be used to initialise global variables from constant data
- order_base_2¶
order_base_2(n)
calculate the (rounded up) base 2 order of the argument
Parameters
n
parameter
Description
- The first few values calculated by this routine:
ob2(0) = 0, ob2(1) = 0, ob2(2) = 1, ob2(3) = 2, ob2(4) = 2, ob2(5) = 3 ... and so on.
- bits_per¶
bits_per(n)
calculate the number of bits required for the argument
Parameters
n
parameter
Description
This is constant-capable and can be used for compile time initializations, e.g bitfields.
The first few values calculated by this routine:
bf(0) = 1, bf(1) = 1, bf(2) = 2, bf(3) = 2, bf(4) = 3 ... and so on.
Integer log and power Functions¶
- unsignedintintlog2(u32value)¶
computes log2 of a value; the result is shifted left by 24 bits
Parameters
u32value
The value (must be != 0)
Description
to use rational values you can use the following method:
intlog2(value) = intlog2(value * 2^x) - x * 2^24
Some use case examples:
intlog2(8) will give 3 << 24 = 3 * 2^24
intlog2(9) will give 3 << 24 + ... = 3.16... * 2^24
intlog2(1.5) = intlog2(3) - 2^24 = 0.584... * 2^24
Return
log2(value) * 2^24
- unsignedintintlog10(u32value)¶
computes log10 of a value; the result is shifted left by 24 bits
Parameters
u32value
The value (must be != 0)
Description
to use rational values you can use the following method:
intlog10(value) = intlog10(value * 10^x) - x * 2^24
A use case example:
intlog10(1000) will give 3 << 24 = 3 * 2^24
due to the implementation intlog10(1000) might not be exactly 3 * 2^24
look at intlog2 for similar examples
Return
log10(value) * 2^24
- u64int_pow(u64base,unsignedintexp)¶
computes the exponentiation of the given base and exponent
Parameters
u64base
base which will be raised to the given power
unsignedintexp
power to be raised to
Description
Computes: pow(base, exp), i.e.base raised to theexp power
- unsignedlongint_sqrt(unsignedlongx)¶
computes the integer square root
Parameters
unsignedlongx
integer of which to calculate the sqrt
Description
Computes: floor(sqrt(x))
- u32int_sqrt64(u64x)¶
strongly typed int_sqrt function when minimum 64 bit input is expected.
Parameters
u64x
64bit integer of which to calculate the sqrt
Division Functions¶
- do_div¶
do_div(n,base)
returns 2 values: calculate remainder and update new dividend
Parameters
n
uint64_t dividend (will be updated)
base
uint32_t divisor
Description
Summary:
uint32_t remainder = n % base;
n = n / base;
Return
(uint32_t)remainder
NOTE
macro parameter n is evaluated multiple times, beware of side effects!
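For illustration only, a sketch of do_div() used on a 32-bit-safe path; the function name is hypothetical. Note how the dividend is updated in place and the remainder is returned.

    #include <linux/math64.h>
    #include <linux/time64.h>
    #include <linux/types.h>

    /* Illustrative: split nanoseconds into milliseconds and a remainder. */
    static u64 sample_ns_to_ms(u64 ns, u32 *rem_ns)
    {
        /* do_div() performs ns /= NSEC_PER_MSEC and returns the 32bit remainder. */
        *rem_ns = do_div(ns, NSEC_PER_MSEC);
        return ns;      /* quotient */
    }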
- u64div_u64_rem(u64dividend,u32divisor,u32*remainder)¶
unsigned 64bit divide with 32bit divisor with remainder
Parameters
u64dividend
unsigned 64bit dividend
u32divisor
unsigned 32bit divisor
u32*remainder
pointer to unsigned 32bit remainder
Return
sets *remainder, then returns dividend / divisor
Description
This is commonly provided by 32bit archs to provide an optimized 64bit divide.
- s64div_s64_rem(s64dividend,s32divisor,s32*remainder)¶
signed 64bit divide with 32bit divisor with remainder
Parameters
s64dividend
signed 64bit dividend
s32divisor
signed 32bit divisor
s32*remainder
pointer to signed 32bit remainder
Return
sets *remainder, then returns dividend / divisor
- u64div64_u64_rem(u64dividend,u64divisor,u64*remainder)¶
unsigned 64bit divide with 64bit divisor and remainder
Parameters
u64dividend
unsigned 64bit dividend
u64divisor
unsigned 64bit divisor
u64*remainder
pointer to unsigned 64bit remainder
Return
sets *remainder, then returns dividend / divisor
- u64div64_u64(u64dividend,u64divisor)¶
unsigned 64bit divide with 64bit divisor
Parameters
u64dividend
unsigned 64bit dividend
u64divisor
unsigned 64bit divisor
Return
dividend / divisor
- s64div64_s64(s64dividend,s64divisor)¶
signed 64bit divide with 64bit divisor
Parameters
s64dividend
signed 64bit dividend
s64divisor
signed 64bit divisor
Return
dividend / divisor
- u64div_u64(u64dividend,u32divisor)¶
unsigned 64bit divide with 32bit divisor
Parameters
u64dividend
unsigned 64bit dividend
u32divisor
unsigned 32bit divisor
Description
This is the most common 64bit divide and should be used if possible, as many 32bit archs can optimize this variant better than a full 64bit divide.
Return
dividend / divisor
- s64div_s64(s64dividend,s32divisor)¶
signed 64bit divide with 32bit divisor
Parameters
s64dividend
signed 64bit dividend
s32divisor
signed 32bit divisor
Return
dividend / divisor
- DIV64_U64_ROUND_UP¶
DIV64_U64_ROUND_UP(ll,d)
unsigned 64bit divide with 64bit divisor rounded up
Parameters
ll
unsigned 64bit dividend
d
unsigned 64bit divisor
Description
Divide unsigned 64bit dividend by unsigned 64bit divisor and round up.
Return
dividend / divisor rounded up
- DIV_U64_ROUND_UP¶
DIV_U64_ROUND_UP(ll,d)
unsigned 64bit divide with 32bit divisor rounded up
Parameters
ll
unsigned 64bit dividend
d
unsigned 32bit divisor
Description
Divide unsigned 64bit dividend by unsigned 32bit divisor and round up.
Return
dividend / divisor rounded up
- DIV64_U64_ROUND_CLOSEST¶
DIV64_U64_ROUND_CLOSEST(dividend,divisor)
unsigned 64bit divide with 64bit divisor rounded to nearest integer
Parameters
dividend
unsigned 64bit dividend
divisor
unsigned 64bit divisor
Description
Divide unsigned 64bit dividend by unsigned 64bit divisor and round to closest integer.
Return
dividend / divisor rounded to nearest integer
- DIV_U64_ROUND_CLOSEST¶
DIV_U64_ROUND_CLOSEST(dividend,divisor)
unsigned 64bit divide with 32bit divisor rounded to nearest integer
Parameters
dividend
unsigned 64bit dividend
divisor
unsigned 32bit divisor
Description
Divide unsigned 64bit dividend by unsigned 32bit divisor and round to closest integer.
Return
dividend / divisor rounded to nearest integer
- DIV_S64_ROUND_CLOSEST¶
DIV_S64_ROUND_CLOSEST(dividend,divisor)
signed 64bit divide with 32bit divisor rounded to nearest integer
Parameters
dividend
signed 64bit dividend
divisor
signed 32bit divisor
Description
Divide signed 64bit dividend by signed 32bit divisor and round to closest integer.
Return
dividend / divisor rounded to nearest integer
- u64roundup_u64(u64x,u32y)¶
Round up a 64bit value to the next specified 32bit multiple
Parameters
u64x
the value to round up
u32y
32bit multiple to round up to
Description
Rounds x to the next multiple of y. For 32bit x values, see roundup and the faster round_up() for powers of 2.
Return
rounded up value.
- unsignedlonggcd(unsignedlonga,unsignedlongb)¶
calculate and return the greatest common divisor of 2 unsigned longs
Parameters
unsignedlonga
first value
unsignedlongb
second value
UUID/GUID¶
- voidgenerate_random_uuid(unsignedcharuuid[16])¶
generate a random UUID
Parameters
unsignedcharuuid[16]
where to put the generated UUID
Description
Random UUID interface
Used to create a Boot ID or a filesystem UUID/GUID, but can be useful for other kernel drivers.
- booluuid_is_valid(constchar*uuid)¶
checks if a UUID string is valid
Parameters
constchar*uuid
UUID string to check
Description
- It checks whether the UUID string follows the format:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
where x is a hex digit.
Return
true if input is valid UUID string.
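For illustration only, a sketch combining generate_random_uuid() and uuid_is_valid(); the string literal and function name are hypothetical.

    #include <linux/uuid.h>
    #include <linux/printk.h>

    static void sample_uuid_demo(void)
    {
        unsigned char uuid[16];
        const char *str = "12345678-1234-1234-1234-123456789abc";      /* illustrative */

        generate_random_uuid(uuid);             /* fill uuid[] with a random UUID */

        if (uuid_is_valid(str))
            pr_info("%s is a well-formed UUID string\n", str);
    }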
Kernel IPC facilities¶
IPC utilities¶
- intipc_init(void)¶
initialise ipc subsystem
Parameters
void
no arguments
Description
The various sysv ipc resources (semaphores, messages and sharedmemory) are initialised.
A callback routine is registered into the memory hotplug notifier chain: since msgmni scales to lowmem this callback routine will be called upon successful memory add / remove to recompute msgmni.
- voidipc_init_ids(structipc_ids*ids)¶
initialise ipc identifiers
Parameters
structipc_ids*ids
ipc identifier set
Description
Set up the sequence range to use for the ipc identifier range (limited below ipc_mni) then initialise the keys hashtable and ids idr.
- voidipc_init_proc_interface(constchar*path,constchar*header,intids,int(*show)(structseq_file*,void*))¶
create a proc interface for sysipc types using a seq_file interface.
Parameters
constchar*path
Path in procfs
constchar*header
Banner to be printed at the beginning of the file.
intids
ipc id table to iterate.
int(*show)(structseq_file*,void*)
show routine.
- structkern_ipc_perm*ipc_findkey(structipc_ids*ids,key_tkey)¶
find a key in an ipc identifier set
Parameters
structipc_ids*ids
ipc identifier set
key_tkey
key to find
Description
Returns the locked pointer to the ipc structure if found or NULL otherwise. If the key is found, ipc points to the owning ipc structure.
Called with writer ipc_ids.rwsem held.
- intipc_addid(structipc_ids*ids,structkern_ipc_perm*new,intlimit)¶
add an ipc identifier
Parameters
structipc_ids*ids
ipc identifier set
structkern_ipc_perm*new
new ipc permission set
intlimit
limit for the number of used ids
Description
Add an entry ‘new’ to the ipc ids idr. The permissions object is initialised and the first free entry is set up and the index assigned is returned. The ‘new’ entry is returned in a locked state on success.
On failure the entry is not locked and a negative err-code is returned. The caller must use ipc_rcu_putref() to free the identifier.
Called with writer ipc_ids.rwsem held.
- intipcget_new(structipc_namespace*ns,structipc_ids*ids,conststructipc_ops*ops,structipc_params*params)¶
create a new ipc object
Parameters
structipc_namespace*ns
ipc namespace
structipc_ids*ids
ipc identifier set
conststructipc_ops*ops
the actual creation routine to call
structipc_params*params
its parameters
Description
This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is IPC_PRIVATE.
- intipc_check_perms(structipc_namespace*ns,structkern_ipc_perm*ipcp,conststructipc_ops*ops,structipc_params*params)¶
check security and permissions for an ipc object
Parameters
structipc_namespace*ns
ipc namespace
structkern_ipc_perm*ipcp
ipc permission set
conststructipc_ops*ops
the actual security routine to call
structipc_params*params
its parameters
Description
This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is not IPC_PRIVATE and that key already exists in the ds IDR.
On success, the ipc id is returned.
It is called with ipc_ids.rwsem and ipcp->lock held.
- intipcget_public(structipc_namespace*ns,structipc_ids*ids,conststructipc_ops*ops,structipc_params*params)¶
get an ipc object or create a new one
Parameters
structipc_namespace*ns
ipc namespace
structipc_ids*ids
ipc identifier set
conststructipc_ops*ops
the actual creation routine to call
structipc_params*params
its parameters
Description
This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is not IPC_PRIVATE. It adds a new entry if the key is not found and does some permission / security checks if the key is found.
On success, the ipc id is returned.
- voidipc_kht_remove(structipc_ids*ids,structkern_ipc_perm*ipcp)¶
remove an ipc from the key hashtable
Parameters
structipc_ids*ids
ipc identifier set
structkern_ipc_perm*ipcp
ipc perm structure containing the key to remove
Description
ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on exit.
- intipc_search_maxidx(structipc_ids*ids,intlimit)¶
search for the highest assigned index
Parameters
structipc_ids*ids
ipc identifier set
intlimit
known upper limit for highest assigned index
Description
The function determines the highest assigned index in ids. It is intended to be called when ids->max_idx needs to be updated. Updating ids->max_idx is necessary when the current highest index ipc object is deleted. If no ipc object is allocated, then -1 is returned.
ipc_ids.rwsem needs to be held by the caller.
- voidipc_rmid(structipc_ids*ids,structkern_ipc_perm*ipcp)¶
remove an ipc identifier
Parameters
structipc_ids*ids
ipc identifier set
structkern_ipc_perm*ipcp
ipc perm structure containing the identifier to remove
Description
ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on exit.
- voidipc_set_key_private(structipc_ids*ids,structkern_ipc_perm*ipcp)¶
switch the key of an existing ipc to IPC_PRIVATE
Parameters
structipc_ids*ids
ipc identifier set
structkern_ipc_perm*ipcp
ipc perm structure containing the key to modify
Description
ipc_ids.rwsem (as a writer) and the spinlock for this ID are held before this function is called, and remain locked on exit.
- intipcperms(structipc_namespace*ns,structkern_ipc_perm*ipcp,shortflag)¶
check ipc permissions
Parameters
structipc_namespace*ns
ipc namespace
structkern_ipc_perm*ipcp
ipc permission set
shortflag
desired permission set
Description
Check user, group, other permissions for access to ipc resources. Return 0 if allowed.
flag will most probably be 0 or S_...UGO from <linux/stat.h>
- voidkernel_to_ipc64_perm(structkern_ipc_perm*in,structipc64_perm*out)¶
convert kernel ipc permissions to user
Parameters
structkern_ipc_perm*in
kernel permissions
structipc64_perm*out
new style ipc permissions
Description
Turn the kernel object in into a set of permissions descriptions for returning to userspace (out).
- voidipc64_perm_to_ipc_perm(structipc64_perm*in,structipc_perm*out)¶
convert new ipc permissions to old
Parameters
structipc64_perm*in
new style ipc permissions
structipc_perm*out
old style ipc permissions
Description
Turn the new style permissions object in into a compatibility object and store it into the out pointer.
- structkern_ipc_perm*ipc_obtain_object_idr(structipc_ids*ids,intid)¶
Look for an id in the ipc ids idr and return associated ipc object.
Parameters
structipc_ids*ids
ipc identifier set
intid
ipc id to look for
Description
Call inside the RCU critical section. The ipc object is not locked on exit.
- structkern_ipc_perm*ipc_obtain_object_check(structipc_ids*ids,intid)¶
Similar to ipc_obtain_object_idr() but also checks the ipc object sequence number.
Parameters
structipc_ids*ids
ipc identifier set
intid
ipc id to look for
Description
Call inside the RCU critical section. The ipc object is not locked on exit.
- intipcget(structipc_namespace*ns,structipc_ids*ids,conststructipc_ops*ops,structipc_params*params)¶
Common sys_*get() code
Parameters
structipc_namespace*ns
namespace
structipc_ids*ids
ipc identifier set
conststructipc_ops*ops
operations to be called on ipc object creation, permission checks and further checks
structipc_params*params
the parameters needed by the previous operations.
Description
Common routine called by sys_msgget(), sys_semget() and sys_shmget().
- intipc_update_perm(structipc64_perm*in,structkern_ipc_perm*out)¶
update the permissions of an ipc object
Parameters
structipc64_perm*in
the permission given as input.
structkern_ipc_perm*out
the permission of the ipc to set.
- structkern_ipc_perm*ipcctl_obtain_check(structipc_namespace*ns,structipc_ids*ids,intid,intcmd,structipc64_perm*perm,intextra_perm)¶
retrieve an ipc object and check permissions
Parameters
structipc_namespace*ns
ipc namespace
structipc_ids*ids
the table of ids where to look for the ipc
intid
the id of the ipc to retrieve
intcmd
the cmd to check
structipc64_perm*perm
the permission to set
intextra_perm
one extra permission parameter used by msq
Description
This function does some common audit and permissions check for some IPC_XXX cmd and is called from semctl_down, shmctl_down and msgctl_down.
- It:
retrieves the ipc object with the given id in the given table.
performs some audit and permission check, depending on the given cmd
returns a pointer to the ipc object or otherwise, the corresponding error.
Call holding both the rwsem and the rcu read lock.
- intipc_parse_version(int*cmd)¶
ipc call version
Parameters
int*cmd
pointer to command
Description
Return IPC_64 for new style IPC and IPC_OLD for old style IPC. The cmd value is turned from an encoding command and version into just the command code.
- structkern_ipc_perm*sysvipc_find_ipc(structipc_ids*ids,loff_t*pos)¶
Find and lock the ipc structure based on seq pos
Parameters
structipc_ids*ids
ipc identifier set
loff_t*pos
expected position
Description
The function finds an ipc structure, based on the sequence file position pos. If there is no ipc structure at position pos, then the successor is selected. If a structure is found, then it is locked (both rcu_read_lock() and ipc_lock_object()) and pos is set to the position needed to locate the found ipc structure. If nothing is found (i.e. EOF), pos is not modified.
The function returns the found ipc structure, or NULL at EOF.
FIFO Buffer¶
kfifo interface¶
- DECLARE_KFIFO_PTR¶
DECLARE_KFIFO_PTR(fifo,type)
macro to declare a fifo pointer object
Parameters
fifo
name of the declared fifo
type
type of the fifo elements
- DECLARE_KFIFO¶
DECLARE_KFIFO(fifo,type,size)
macro to declare a fifo object
Parameters
fifo
name of the declared fifo
type
type of the fifo elements
size
the number of elements in the fifo, this must be a power of 2
- INIT_KFIFO¶
INIT_KFIFO(fifo)
Initialize a fifo declared by DECLARE_KFIFO
Parameters
fifo
name of the declared fifo datatype
- DEFINE_KFIFO¶
DEFINE_KFIFO(fifo,type,size)
macro to define and initialize a fifo
Parameters
fifo
name of the declared fifo datatype
type
type of the fifo elements
size
the number of elements in the fifo, this must be a power of 2
Note
the macro can be used for global and local fifo data type variables.
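For illustration only, a sketch of a statically defined fifo together with kfifo_put()/kfifo_get(); the fifo name, element type, and values are hypothetical.

    #include <linux/kfifo.h>
    #include <linux/printk.h>

    /* A fifo of 8 ints, defined and initialized at compile time. */
    static DEFINE_KFIFO(sample_fifo, int, 8);

    static void sample_fifo_usage(void)
    {
        int v;

        /* kfifo_put() returns 0 if the fifo was full */
        if (!kfifo_put(&sample_fifo, 42))
            return;

        /* kfifo_get() returns 0 if the fifo was empty */
        if (kfifo_get(&sample_fifo, &v))
            pr_info("got %d\n", v);
    }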
- kfifo_initialized¶
kfifo_initialized(fifo)
Check if the fifo is initialized
Parameters
fifo
address of the fifo to check
Description
Return true if fifo is initialized, otherwise false. Assumes the fifo was 0 before.
- kfifo_esize¶
kfifo_esize(fifo)
returns the size of the element managed by the fifo
Parameters
fifo
address of the fifo to be used
- kfifo_recsize¶
kfifo_recsize(fifo)
returns the size of the record length field
Parameters
fifo
address of the fifo to be used
- kfifo_size¶
kfifo_size(fifo)
returns the size of the fifo in elements
Parameters
fifo
address of the fifo to be used
- kfifo_reset¶
kfifo_reset(fifo)
removes the entire fifo content
Parameters
fifo
address of the fifo to be used
Note
usage of kfifo_reset() is dangerous. It should only be called when the fifo is exclusively locked or when it is ensured that no other thread is accessing the fifo.
- kfifo_reset_out¶
kfifo_reset_out(fifo)
skip fifo content
Parameters
fifo
address of the fifo to be used
Note
The usage of kfifo_reset_out() is safe as long as it is only called from the reader thread and there is only one concurrent reader. Otherwise it is dangerous and must be handled in the same way as kfifo_reset().
- kfifo_len¶
kfifo_len(fifo)
returns the number of used elements in the fifo
Parameters
fifo
address of the fifo to be used
- kfifo_is_empty¶
kfifo_is_empty(fifo)
returns true if the fifo is empty
Parameters
fifo
address of the fifo to be used
- kfifo_is_empty_spinlocked¶
kfifo_is_empty_spinlocked(fifo,lock)
returns true if the fifo is empty using a spinlock for locking
Parameters
fifo
address of the fifo to be used
lock
spinlock to be used for locking
- kfifo_is_empty_spinlocked_noirqsave¶
kfifo_is_empty_spinlocked_noirqsave(fifo,lock)
returns true if the fifo is empty using a spinlock for locking, doesn’t disable interrupts
Parameters
fifo
address of the fifo to be used
lock
spinlock to be used for locking
- kfifo_is_full¶
kfifo_is_full(fifo)
returns true if the fifo is full
Parameters
fifo
address of the fifo to be used
- kfifo_avail¶
kfifo_avail(fifo)
returns the number of unused elements in the fifo
Parameters
fifo
address of the fifo to be used
- kfifo_skip_count¶
kfifo_skip_count(fifo,count)
skip output data
Parameters
fifo
address of the fifo to be used
count
count of data to skip
- kfifo_skip¶
kfifo_skip(fifo)
skip output data
Parameters
fifo
address of the fifo to be used
- kfifo_peek_len¶
kfifo_peek_len(fifo)
gets the size of the next fifo record
Parameters
fifo
address of the fifo to be used
Description
This function returns the size of the next fifo record in number of bytes.
- kfifo_alloc¶
kfifo_alloc(fifo,size,gfp_mask)
dynamically allocates a new fifo buffer
Parameters
fifo
pointer to the fifo
size
the number of elements in the fifo, this must be a power of 2
gfp_mask
get_free_pages mask, passed to kmalloc()
Description
This macro dynamically allocates a new fifo buffer.
The number of elements will be rounded-up to a power of 2. The fifo will be released with kfifo_free(). Return 0 if no error, otherwise an error code.
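For illustration only, a sketch of the dynamic allocation counterpart with kfifo_alloc()/kfifo_free(); the fifo object and sizes are hypothetical.

    #include <linux/kfifo.h>
    #include <linux/slab.h>

    static struct kfifo sample_dyn_fifo;

    static int sample_fifo_init(void)
    {
        /* 128-element byte fifo; the size is rounded up to a power of 2 if needed. */
        return kfifo_alloc(&sample_dyn_fifo, 128, GFP_KERNEL);
    }

    static void sample_fifo_exit(void)
    {
        kfifo_free(&sample_dyn_fifo);
    }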
- kfifo_free¶
kfifo_free(fifo)
frees the fifo
Parameters
fifo
the fifo to be freed
- kfifo_init¶
kfifo_init(fifo,buffer,size)
initialize a fifo using a preallocated buffer
Parameters
fifo
the fifo to assign the buffer
buffer
the preallocated buffer to be used
size
the size of the internal buffer, this has to be a power of 2
Description
This macro initializes a fifo using a preallocated buffer.
The number of elements will be rounded-up to a power of 2. Return 0 if no error, otherwise an error code.
- kfifo_put¶
kfifo_put(fifo,val)
put data into the fifo
Parameters
fifo
address of the fifo to be used
val
the data to be added
Description
This macro copies the given value into the fifo. It returns 0 if the fifo was full. Otherwise it returns the number of processed elements.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
- kfifo_get¶
kfifo_get(fifo,val)
get data from the fifo
Parameters
fifo
address of the fifo to be used
val
address where to store the data
Description
This macro reads the data from the fifo. It returns 0 if the fifo was empty. Otherwise it returns the number of processed elements.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
- kfifo_peek¶
kfifo_peek(fifo,val)
get data from the fifo without removing
Parameters
fifo
address of the fifo to be used
val
address where to store the data
Description
This reads the data from the fifo without removing it from the fifo. It returns 0 if the fifo was empty. Otherwise it returns the number of processed elements.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
- kfifo_in¶
kfifo_in(fifo,buf,n)
put data into the fifo
Parameters
fifo
address of the fifo to be used
buf
the data to be added
n
number of elements to be added
Description
This macro copies the given buffer into the fifo and returns the number of copied elements.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
- kfifo_in_spinlocked¶
kfifo_in_spinlocked(fifo,buf,n,lock)
put data into the fifo using a spinlock for locking
Parameters
fifo
address of the fifo to be used
buf
the data to be added
n
number of elements to be added
lock
pointer to the spinlock to use for locking
Description
This macro copies the given values buffer into the fifo and returns the number of copied elements.
- kfifo_in_spinlocked_noirqsave¶
kfifo_in_spinlocked_noirqsave(fifo,buf,n,lock)
put data into fifo using a spinlock for locking, don’t disable interrupts
Parameters
fifo
address of the fifo to be used
buf
the data to be added
n
number of elements to be added
lock
pointer to the spinlock to use for locking
Description
This is a variant of kfifo_in_spinlocked() but uses spin_lock/unlock() for locking and doesn’t disable interrupts.
- kfifo_out¶
kfifo_out(fifo,buf,n)
get data from the fifo
Parameters
fifo
address of the fifo to be used
buf
pointer to the storage buffer
n
max. number of elements to get
Description
This macro gets some data from the fifo and returns the number of elements copied.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
- kfifo_out_spinlocked¶
kfifo_out_spinlocked(fifo,buf,n,lock)
get data from the fifo using a spinlock for locking
Parameters
fifo
address of the fifo to be used
buf
pointer to the storage buffer
n
max. number of elements to get
lock
pointer to the spinlock to use for locking
Description
This macro gets the data from the fifo and returns the number of elements copied.
- kfifo_out_spinlocked_noirqsave¶
kfifo_out_spinlocked_noirqsave(fifo,buf,n,lock)
get data from the fifo using a spinlock for locking, don’t disable interrupts
Parameters
fifo
address of the fifo to be used
buf
pointer to the storage buffer
n
max. number of elements to get
lock
pointer to the spinlock to use for locking
Description
This is a variant of kfifo_out_spinlocked() which uses spin_lock/unlock() for locking and doesn’t disable interrupts.
- kfifo_from_user¶
kfifo_from_user(fifo,from,len,copied)
puts some data from user space into the fifo
Parameters
fifo
address of the fifo to be used
from
pointer to the data to be added
len
the length of the data to be added
copied
pointer to output variable to store the number of copied bytes
Description
This macro copies at most len bytes from from into the fifo, depending on the available space, and returns -EFAULT/0.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
- kfifo_to_user¶
kfifo_to_user(fifo,to,len,copied)
copies data from the fifo into user space
Parameters
fifo
address of the fifo to be used
to
where the data must be copied
len
the size of the destination buffer
copied
pointer to output variable to store the number of copied bytes
Description
This macro copies at most len bytes from the fifo into the to buffer and returns -EFAULT/0.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
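For illustration only, a sketch of a character-device read() handler built on kfifo_to_user(); the fifo and handler names are hypothetical, and the fifo is assumed to be initialized elsewhere.

    #include <linux/kfifo.h>
    #include <linux/fs.h>

    static struct kfifo sample_rx_fifo;     /* assumed initialized elsewhere */

    static ssize_t sample_read(struct file *file, char __user *buf,
                               size_t count, loff_t *ppos)
    {
        unsigned int copied;
        int ret;

        /* Copies at most count bytes to user space; returns -EFAULT or 0. */
        ret = kfifo_to_user(&sample_rx_fifo, buf, count, &copied);

        return ret ? ret : copied;
    }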
- kfifo_dma_in_prepare_mapped¶
kfifo_dma_in_prepare_mapped(fifo,sgl,nents,len,dma)
setup a scatterlist for DMA input
Parameters
fifo
address of the fifo to be used
sgl
pointer to the scatterlist array
nents
number of entries in the scatterlist array
len
number of elements to transfer
dma
mapped dma address to fill intosgl
Description
This macro fills a scatterlist for DMA input. It returns the number of entries in the scatterlist array.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_dma_in_finish¶
kfifo_dma_in_finish(fifo,len)
finish a DMA IN operation
Parameters
fifo
address of the fifo to be used
len
number of bytes received
Description
This macro finishes a DMA IN operation. The in counter will be updated by the len parameter. No error checking will be done.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_dma_out_prepare_mapped¶
kfifo_dma_out_prepare_mapped(fifo,sgl,nents,len,dma)
setup a scatterlist for DMA output
Parameters
fifo
address of the fifo to be used
sgl
pointer to the scatterlist array
nents
number of entries in the scatterlist array
len
number of elements to transfer
dma
mapped dma address to fill intosgl
Description
This macro fills a scatterlist for DMA output of at most len bytes to transfer. It returns the number of entries in the scatterlist array. A zero means there is no space available and the scatterlist is not filled.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_dma_out_finish¶
kfifo_dma_out_finish(fifo,len)
finish a DMA OUT operation
Parameters
fifo
address of the fifo to be used
len
number of bytes transferred
Description
This macro finishes a DMA OUT operation. The out counter will be updated by the len parameter. No error checking will be done.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
- kfifo_out_peek¶
kfifo_out_peek(fifo,buf,n)
gets some data from the fifo
Parameters
fifo
address of the fifo to be used
buf
pointer to the storage buffer
n
max. number of elements to get
Description
This macro gets the data from the fifo and returns the number of elements copied. The data is not removed from the fifo.
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
- kfifo_out_linear¶
kfifo_out_linear(fifo,tail,n)
gets a tail of/offset to available data
Parameters
fifo
address of the fifo to be used
tail
pointer to an unsigned int to store the value of tail
n
max. number of elements to point at
Description
This macro obtains the offset (tail) to the available data in the fifo buffer and returns the number of elements available. It returns the available count till the end of data or till the end of the buffer, so it can be used for linear data processing (like memcpy() of (fifo->data + tail) with the count returned).
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
- kfifo_out_linear_ptr¶
kfifo_out_linear_ptr(fifo,ptr,n)
gets a pointer to the available data
Parameters
fifo
address of the fifo to be used
ptr
pointer to data to store the pointer to tail
n
max. number of elements to point at
Description
Similarly to kfifo_out_linear(), this macro obtains the pointer to the available data in the fifo buffer and returns the number of elements available. It returns the available count till the end of available data or till the end of the buffer, so it can be used for linear data processing (like memcpy() of ptr with the count returned).
Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use this macro.
relay interface support¶
Relay interface support is designed to provide an efficient mechanism for tools and facilities to relay large amounts of data from kernel space to user space.
relay interface¶
- intrelay_buf_full(structrchan_buf*buf)¶
boolean, is the channel buffer full?
Parameters
structrchan_buf*buf
channel buffer
Returns 1 if the buffer is full, 0 otherwise.
- voidrelay_reset(structrchan*chan)¶
reset the channel
Parameters
structrchan*chan
the channel
This has the effect of erasing all data from all channel buffers and restarting the channel in its initial state. The buffers are not freed, so any mappings are still in effect.
NOTE. Care should be taken that the channel isn’t actually being used by anything when this call is made.
- structrchan*relay_open(constchar*base_filename,structdentry*parent,size_tsubbuf_size,size_tn_subbufs,conststructrchan_callbacks*cb,void*private_data)¶
create a new relay channel
Parameters
constchar*base_filename
base name of files to create
structdentry*parent
dentry of parent directory, NULL for root directory or buffer
size_tsubbuf_size
size of sub-buffers
size_tn_subbufs
number of sub-buffers
conststructrchan_callbacks*cb
client callback functions
void*private_data
user-defined data
Returns channel pointer if successful, NULL otherwise. Creates a channel buffer for each cpu using the sizes and attributes specified. The created channel buffer files will be named base_filename0...base_filenameN-1. File permissions will be S_IRUSR.
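For illustration only, a sketch of a relay client: the mandatory create_buf_file/remove_buf_file callbacks backed by debugfs, followed by relay_open(). The names, sub-buffer geometry, and debugfs placement are hypothetical.

    #include <linux/relay.h>
    #include <linux/debugfs.h>
    #include <linux/errno.h>

    static struct dentry *sample_create_buf_file(const char *filename,
                                                 struct dentry *parent,
                                                 umode_t mode,
                                                 struct rchan_buf *buf,
                                                 int *is_global)
    {
        /* Expose each per-cpu buffer as a debugfs file using relay_file_operations. */
        return debugfs_create_file(filename, mode, parent, buf,
                                   &relay_file_operations);
    }

    static int sample_remove_buf_file(struct dentry *dentry)
    {
        debugfs_remove(dentry);
        return 0;
    }

    static const struct rchan_callbacks sample_relay_cb = {
        .create_buf_file = sample_create_buf_file,
        .remove_buf_file = sample_remove_buf_file,
    };

    static struct rchan *sample_chan;

    static int sample_relay_init(struct dentry *debugfs_dir)
    {
        /* 8 sub-buffers of 64 KiB each; files named sample0..sampleN-1. */
        sample_chan = relay_open("sample", debugfs_dir, 65536, 8,
                                 &sample_relay_cb, NULL);
        return sample_chan ? 0 : -ENOMEM;
    }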
- size_trelay_switch_subbuf(structrchan_buf*buf,size_tlength)¶
switch to a new sub-buffer
Parameters
structrchan_buf*buf
channel buffer
size_tlength
size of current event
Returns either the length passed in or 0 if full.
Performs sub-buffer-switch tasks such as invoking callbacks, updating padding counts, waking up readers, etc.
- voidrelay_subbufs_consumed(structrchan*chan,unsignedintcpu,size_tsubbufs_consumed)¶
update the buffer’s sub-buffers-consumed count
Parameters
structrchan*chan
the channel
unsignedintcpu
the cpu associated with the channel buffer to update
size_tsubbufs_consumed
number of sub-buffers to add to current buf’s count
Adds to the channel buffer’s consumed sub-buffer count. subbufs_consumed should be the number of sub-buffers newly consumed, not the total consumed.
NOTE. Kernel clients don’t need to call this function if the channelmode is ‘overwrite’.
- voidrelay_close(structrchan*chan)¶
close the channel
Parameters
structrchan*chan
the channel
Closes all channel buffers and frees the channel.
- voidrelay_flush(structrchan*chan)¶
close the channel
Parameters
structrchan*chan
the channel
Flushes all channel buffers, i.e. forces buffer switch.
- intrelay_mmap_buf(structrchan_buf*buf,structvm_area_struct*vma)¶
mmap channel buffer to process address space
Parameters
structrchan_buf*buf
relay channel buffer
structvm_area_struct*vma
vm_area_struct describing memory to be mapped
Returns 0 if ok, negative on error
Caller should already have grabbed mmap_lock.
- void*relay_alloc_buf(structrchan_buf*buf,size_t*size)¶
allocate a channel buffer
Parameters
structrchan_buf*buf
the buffer struct
size_t*size
total size of the buffer
Returns a pointer to the resulting buffer, NULL if unsuccessful. The passed in size will get page aligned, if it isn’t already.
- structrchan_buf*relay_create_buf(structrchan*chan)¶
allocate and initialize a channel buffer
Parameters
structrchan*chan
the relay channel
Returns channel buffer if successful, NULL otherwise.
- voidrelay_destroy_channel(structkref*kref)¶
free the channel struct
Parameters
structkref*kref
target kernel reference that contains the relay channel
Should only be called from kref_put().
- voidrelay_destroy_buf(structrchan_buf*buf)¶
destroy an rchan_buf struct and associated buffer
Parameters
structrchan_buf*buf
the buffer struct
- voidrelay_remove_buf(structkref*kref)¶
remove a channel buffer
Parameters
structkref*kref
target kernel reference that contains the relay buffer
Removes the file from the filesystem, which also frees the rchan_buf_struct and the channel buffer. Should only be called from kref_put().
- intrelay_buf_empty(structrchan_buf*buf)¶
boolean, is the channel buffer empty?
Parameters
structrchan_buf*buf
channel buffer
Returns 1 if the buffer is empty, 0 otherwise.
- voidwakeup_readers(structirq_work*work)¶
wake up readers waiting on a channel
Parameters
structirq_work*work
contains the channel buffer
This is the function used to defer reader waking
- void__relay_reset(structrchan_buf*buf,unsignedintinit)¶
reset a channel buffer
Parameters
structrchan_buf*buf
the channel buffer
unsignedintinit
1 if this is a first-time initialization
See relay_reset() for description of effect.
- voidrelay_close_buf(structrchan_buf*buf)¶
close a channel buffer
Parameters
structrchan_buf*buf
channel buffer
Marks the buffer finalized and restores the default callbacks. The channel buffer and channel buffer data structure are then freed automatically when the last reference is given up.
- intrelay_file_open(structinode*inode,structfile*filp)¶
open file op for relay files
Parameters
structinode*inode
the inode
structfile*filp
the file
Increments the channel buffer refcount.
- intrelay_file_mmap(structfile*filp,structvm_area_struct*vma)¶
mmap file op for relay files
Parameters
structfile*filp
the file
structvm_area_struct*vma
the vma describing what to map
Calls upon relay_mmap_buf() to map the file into user space.
- __poll_trelay_file_poll(structfile*filp,poll_table*wait)¶
poll file op for relay files
Parameters
structfile*filp
the file
poll_table*wait
poll table
Poll implementation.
- intrelay_file_release(structinode*inode,structfile*filp)¶
release file op for relay files
Parameters
structinode*inode
the inode
structfile*filp
the file
Decrements the channel refcount, as the filesystem is no longer using it.
- size_trelay_file_read_subbuf_avail(size_tread_pos,structrchan_buf*buf)¶
return bytes available in sub-buffer
Parameters
size_tread_pos
file read position
structrchan_buf*buf
relay channel buffer
- size_trelay_file_read_start_pos(structrchan_buf*buf)¶
find the first available byte to read
Parameters
structrchan_buf*buf
relay channel buffer
If the read_pos is in the middle of padding, return the position of the first actually available byte, otherwise return the original value.
- size_trelay_file_read_end_pos(structrchan_buf*buf,size_tread_pos,size_tcount)¶
return the new read position
Parameters
structrchan_buf*buf
relay channel buffer
size_tread_pos
file read position
size_tcount
number of bytes to be read
Module Support¶
Kernel module auto-loading¶
- int__request_module(boolwait,constchar*fmt,...)¶
try to load a kernel module
Parameters
boolwait
wait (or not) for the operation to complete
constchar*fmt
printf style format string for the name of the module
...
arguments as specified in the format string
Description
Load a module using the user mode module loader. The function returns zero on success or a negative errno code or positive exit code from “modprobe” on failure. Note that a successful module load does not mean the module did not then unload and exit on an error of its own. Callers must check that the service they requested is now available, not blindly invoke it.
If module auto-loading support is disabled then this function simply returns -ENOENT.
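For illustration only, a sketch of requesting a module by a formatted alias and treating a positive modprobe exit code as a failure; the alias string and function name are hypothetical.

    #include <linux/kmod.h>
    #include <linux/errno.h>

    static int sample_load_handler(int proto)
    {
        /* request_module() is the wait-for-completion form of __request_module(). */
        int ret = request_module("sample-proto-%d", proto);    /* hypothetical alias */

        if (ret)
            return ret < 0 ? ret : -ENOENT;    /* positive means modprobe failed */

        /* A successful return does not guarantee the module is still loaded;
         * re-check that the service it provides is actually available. */
        return 0;
    }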
Module debugging¶
Enabling CONFIG_MODULE_STATS enables module debugging statistics which are useful to monitor and root cause memory pressure issues with module loading. These statistics are useful to allow us to improve production workloads.
The current module debugging statistics supported help keep track of module loading failures to enable improvements either for kernel module auto-loading usage (request_module()) or interactions with userspace. Statistics are provided to track all possible failures in the finit_module() path and memory wasted in this process space. Each of the failure counters is associated to a type of module loading failure which is known to incur a certain amount of memory allocation loss. In the worst case loading a module will fail after a 3 step memory allocation process:
a) memory allocated with kernel_read_file_from_fd()
b) module decompression processes the file read from kernel_read_file_from_fd(), and vmap() is used to map the decompressed module to a new local buffer which represents a copy of the decompressed module passed from userspace. The buffer from kernel_read_file_from_fd() is freed right away.
c) layout_and_allocate() allocates space for the final resting place where we would keep the module if it were to be processed successfully.
If a failure occurs after these three different allocations only one counter will be incremented with the summation of the allocated bytes freed incurred during this failure. Likewise, if module loading failed only after step b) a separate counter is used and incremented for the bytes freed and not used during both of those allocations.
Virtual memory space can be limited, for example on x86 virtual memory size defaults to 128 MiB. We should strive to limit and avoid wasting virtual memory allocations when possible. These module debugging statistics help to evaluate how much memory is being wasted on bootup due to module loading failures.
All counters are designed to be incremental. Atomic counters are used so as to remain simple and avoid delays and deadlocks.
dup_failed_modules - tracks duplicate failed modules¶
Linked list of modules which failed to be loaded because an already existing module with the same name was already being processed or already loaded. The finit_module() system call incurs heavy virtual memory allocations. In the worst case an finit_module() system call can end up allocating virtual memory 3 times:
kernel_read_file_from_fd() call uses vmalloc()
optional module decompression uses vmap()
layout_and_allocate() can use vzalloc() or an arch specific variation of vmalloc to deal with ELF sections requiring special permissions
In practice on a typical boot today most finit_module() calls fail due to the module with the same name already being loaded or about to be processed. All virtual memory allocated to these failed modules will be freed with no functional use.
To help with this the dup_failed_modules allows us to track modules which failed to load due to the fact that a module was already loaded or being processed. There are only two points at which we can fail such calls, we list them below along with the number of virtual memory allocation calls:
FAIL_DUP_MOD_BECOMING: at the end of early_mod_check() before layout_and_allocate().
- with module decompression: 2 virtual memory allocation calls
- without module decompression: 1 virtual memory allocation call
FAIL_DUP_MOD_LOAD: after layout_and_allocate() on add_unformed_module()
- with module decompression: 3 virtual memory allocation calls
- without module decompression: 2 virtual memory allocation calls
We should strive to get this list to be as small as possible. If this list is not empty it is a reflection of possible work or optimizations possible either in-kernel or in userspace.
module statistics debugfs counters¶
The total amount of wasted virtual memory allocation space during module loading can be computed by adding the total from the summation:
invalid_kread_bytes + invalid_decompress_bytes + invalid_becoming_bytes + invalid_mod_bytes
The following debugfs counters are available to inspect module loadingfailures:
total_mod_size: total bytes ever used by all modules we’ve dealt with on this system
total_text_size: total bytes of the .text and .init.text ELF section sizes we’ve dealt with on this system
invalid_kread_bytes: bytes allocated and then freed on failures which happen due to the initial kernel_read_file_from_fd(). kernel_read_file_from_fd() uses vmalloc(). These should typically not happen unless your system is under memory pressure.
invalid_decompress_bytes: number of bytes allocated and freed due to memory allocations in the module decompression path that use vmap(). These typically should not happen unless your system is under memory pressure.
invalid_becoming_bytes: total number of bytes allocated and freed used to read the kernel module userspace wants us to read before we promote it to be processed to be added to our modules linked list. These failures can happen if we had a check in between a successful kernel_read_file_from_fd() call and right before we allocate our private memory for the module which would be kept if the module is successfully loaded. The most common reason for this failure is when userspace is racing to load a module which it does not yet see loaded. The first module to succeed in add_unformed_module() will add a module to our modules list and subsequent loads of modules with the same name will error out at the end of early_mod_check(). The check for module_patient_check_exists() at the end of early_mod_check() prevents duplicate allocations on layout_and_allocate() for modules already being processed. These duplicate failed modules are non-fatal, however they typically are indicative of userspace not seeing a module in userspace loaded yet and unnecessarily trying to load a module before the kernel even has a chance to begin to process prior requests. Although duplicate failures can be non-fatal, we should try to reduce vmalloc() pressure proactively, so ideally after boot this will be as close to 0 as possible. If module decompression was used we also add to this counter the cost of the initial kernel_read_file_from_fd() of the compressed module. If module decompression was not used the value represents the total allocated and freed bytes in kernel_read_file_from_fd() calls for these types of failures. These failures can occur because:
module_sig_check() - module signature checks
elf_validity_cache_copy() - some ELF validation issue
early_mod_check():
blacklisting
failed to rewrite section headers
version magic
live patch requirements didn’t check out
the module was detected as being already present
invalid_mod_bytes: these are the total number of bytes allocated and freed due to failures after we did all the sanity checks of the module which userspace passed to us and after our first check that the module is unique. A module can still fail to load if we detect the module is loaded after we allocate space for it with layout_and_allocate(); we do this check right before processing the module as live and running its initialization routines. Note that if you have a failure of this type it also means the respective kernel_read_file_from_fd() memory space was also freed and not used, and so we increment this counter with twice the size of the module. Additionally if you used module decompression the size of the compressed module is also added to this counter.
modcount: how many modules we’ve loaded in our kernel life time
failed_kreads: how many modules failed due to failed kernel_read_file_from_fd()
failed_decompress: how many failed module decompression attempts we’ve had. These really should not happen unless your compression / decompression is broken.
failed_becoming: how many modules failed after we kernel_read_file_from_fd() it and before we allocate memory for it with layout_and_allocate(). This counter is never incremented if you manage to validate the module and call layout_and_allocate() for it.
failed_load_modules: how many modules failed once we’ve allocated our private space for our module using layout_and_allocate(). These failures should hopefully mostly be dealt with already. Races in theory could still exist here, but it would just mean the kernel had started processing two threads concurrently up to early_mod_check() and one thread won. These failures are good signs the kernel or userspace is doing something seriously stupid or that could be improved. We should strive to fix these, but it is perhaps not easy to fix them. A recent example is the module requests incurred for frequency modules, a separate module request was being issued for each CPU on a system.
Inter Module support¶
Refer to the files in kernel/module/ for more information.
Hardware Interfaces¶
DMA Channels¶
- intrequest_dma(unsignedintdmanr,constchar*device_id)¶
request and reserve a system DMA channel
Parameters
unsignedintdmanr
DMA channel number
constchar*device_id
reserving device ID string, used in /proc/dma
- voidfree_dma(unsignedintdmanr)¶
free a reserved system DMA channel
Parameters
unsignedintdmanr
DMA channel number
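For illustration only, a sketch of reserving and releasing a legacy DMA channel; the channel number and device string are hypothetical.

    #include <asm/dma.h>
    #include <linux/errno.h>

    static int sample_grab_dma(unsigned int chan)
    {
        /* request_dma() returns non-zero if the channel is invalid or busy. */
        if (request_dma(chan, "sample-driver"))
            return -EBUSY;

        /* ... program the DMA controller here ... */

        free_dma(chan);         /* release the channel when done */
        return 0;
    }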
Resources Management¶
- structresource*request_resource_conflict(structresource*root,structresource*new)¶
request and reserve an I/O or memory resource
Parameters
structresource*root
root resource descriptor
structresource*new
resource descriptor desired by caller
Description
Returns 0 for success, conflict resource on error.
- intfind_next_iomem_res(resource_size_tstart,resource_size_tend,unsignedlongflags,unsignedlongdesc,structresource*res)¶
Finds the lowest iomem resource that covers part of [start..end].
Parameters
resource_size_tstart
start address of the resource searched for
resource_size_tend
end address of same resource
unsignedlongflags
flags which the resource must have
unsignedlongdesc
descriptor the resource must have
structresource*res
return ptr, if resource found
Description
If a resource is found, returns 0 and *res is overwritten with the part of the resource that’s within [start..end]; if none is found, returns -ENODEV. Returns -EINVAL for invalid parameters.
The caller must specify start, end, flags, and desc (which may be IORES_DESC_NONE).
- intreallocate_resource(structresource*root,structresource*old,resource_size_tnewsize,structresource_constraint*constraint)¶
allocate a slot in the resource tree given range & alignment. The resource will be relocated if the new size cannot be reallocated in the current location.
Parameters
structresource*root
root resource descriptor
structresource*old
resource descriptor desired by caller
resource_size_tnewsize
new size of the resource descriptor
structresource_constraint*constraint
the memory range and alignment constraints to be met.
- structresource*lookup_resource(structresource*root,resource_size_tstart)¶
find an existing resource by a resource start address
Parameters
structresource*root
root resource descriptor
resource_size_tstart
resource start address
Description
Returns a pointer to the resource if found, NULL otherwise
- structresource*insert_resource_conflict(structresource*parent,structresource*new)¶
Inserts resource in the resource tree
Parameters
structresource*parent
parent of the new resource
structresource*new
new resource to insert
Description
Returns 0 on success, conflict resource if the resource can’t be inserted.
This function is equivalent to request_resource_conflict when no conflict happens. If a conflict happens, and the conflicting resources entirely fit within the range of the new resource, then the new resource is inserted and the conflicting resources become children of the new resource.
This function is intended for producers of resources, such as FW modules and bus drivers.
- resource_size_tresource_alignment(structresource*res)¶
calculate resource’s alignment
Parameters
structresource*res
resource pointer
Description
Returns alignment on success, 0 (invalid alignment) on failure.
- voidrelease_mem_region_adjustable(resource_size_tstart,resource_size_tsize)¶
release a previously reserved memory region
Parameters
resource_size_tstart
resource start address
resource_size_tsize
resource region size
Description
This interface is intended for memory hot-delete. The requested region is released from a currently busy memory resource. The requested region must either match exactly or fit into a single busy resource entry. In the latter case, the remaining resource is adjusted accordingly. Existing children of the busy memory resource must be immutable in the request.
Note
Additional release conditions, such as overlapping regions, can be supported after they are confirmed as valid cases.
When a busy memory resource gets split into two entries, the code assumes that all children remain in the lower address entry for simplicity. Enhance this logic when necessary.
- voidmerge_system_ram_resource(structresource*res)¶
mark the System RAM resource mergeable and try to merge it with adjacent, mergeable resources
Parameters
structresource*res
resource descriptor
Description
This interface is intended for memory hotplug, whereby lots of contiguous system ram resources are added (e.g., via add_memory*()) by a driver, and the actual resource boundaries are not of interest (e.g., it might be relevant for DIMMs). Only resources that are marked mergeable, that have the same parent, and that don’t have any children are considered. All mergeable resources must be immutable during the request.
Note
The caller has to make sure that no pointers to resources that are marked mergeable are used anymore after this call - the resource might be freed and the pointer might be stale!
release_mem_region_adjustable() will split on demand on memory hotunplug.
- intrequest_resource(structresource*root,structresource*new)¶
request and reserve an I/O or memory resource
Parameters
structresource*root
root resource descriptor
structresource*new
resource descriptor desired by caller
Description
Returns 0 for success, negative error code on error.
- intrelease_resource(structresource*old)¶
release a previously reserved resource
Parameters
structresource*old
resource pointer
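A minimal sketch of the request/release pairing (the address range and the name are made up for illustration):

    #include <linux/ioport.h>

    static struct resource example_res = {
            .name  = "example-mmio",
            .start = 0xfed40000,
            .end   = 0xfed40fff,
            .flags = IORESOURCE_MEM,
    };

    static int example_claim(void)
    {
            int ret;

            ret = request_resource(&iomem_resource, &example_res);
            if (ret)
                    return ret;     /* range conflicts with an existing resource */

            /* ... use the region ... */

            release_resource(&example_res);
            return 0;
    }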
- intwalk_iomem_res_desc(unsignedlongdesc,unsignedlongflags,u64start,u64end,void*arg,int(*func)(structresource*,void*))¶
Walks through iomem resources and calls func() with matching resource ranges.
Parameters
unsignedlongdesc
I/O resource descriptor. Use IORES_DESC_NONE to skip the desc check.
unsignedlongflags
I/O resource flags
u64start
start addr
u64end
end addr
void*arg
function argument for the callbackfunc
int(*func)(structresource*,void*)
callback function that is called for each qualifying resource area
Description
All the memory ranges which overlap [start, end] and also match flags and desc are valid candidates.
NOTE
For a new descriptor search, define a new IORES_DESC in <linux/ioport.h> and set it in ‘desc’ of a target resource entry.
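A sketch of a caller, assuming the same flag combination the kernel’s walk_system_ram_res() helper uses internally; the function names here are illustrative only.

    #include <linux/ioport.h>

    /* Callback invoked once per matching resource range. */
    static int example_count_cb(struct resource *res, void *arg)
    {
            unsigned int *count = arg;

            (*count)++;
            return 0;                       /* returning non-zero stops the walk */
    }

    static unsigned int example_count_ram_ranges(void)
    {
            unsigned int count = 0;

            /* Count busy System RAM entries in the first 4 GiB. */
            walk_iomem_res_desc(IORES_DESC_NONE,
                                IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
                                0, 0xffffffffULL, &count, example_count_cb);
            return count;
    }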
- intregion_intersects(resource_size_tstart,size_tsize,unsignedlongflags,unsignedlongdesc)¶
determine intersection of region with known resources
Parameters
resource_size_tstart
region start address
size_tsize
size of region
unsignedlongflags
flags of resource (in iomem_resource)
unsignedlongdesc
descriptor of resource (in iomem_resource) or IORES_DESC_NONE
Description
Check if the specified region partially overlaps or fully eclipses a resource identified by flags and desc (optional with IORES_DESC_NONE). Return REGION_DISJOINT if the region does not overlap flags/desc, return REGION_MIXED if the region overlaps flags/desc and another resource, and return REGION_INTERSECTS if the region overlaps flags/desc and no other defined resource. Note that REGION_INTERSECTS is also returned in the case when the specified region overlaps RAM and undefined memory holes.
region_intersects() is used by memory remapping functions to ensure the user is not remapping RAM and is a vast speed up over walking through the resource table page by page.
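For example, a remapping helper might refuse ranges that touch System RAM; a sketch (the helper name is hypothetical):

    #include <linux/ioport.h>
    #include <linux/types.h>

    /* Return true only if [start, start + size) touches no System RAM. */
    static bool example_range_is_safe(resource_size_t start, size_t size)
    {
            return region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
                                     IORES_DESC_NONE) == REGION_DISJOINT;
    }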
- intfind_resource_space(structresource*root,structresource*new,resource_size_tsize,structresource_constraint*constraint)¶
Find empty space in the resource tree
Parameters
structresource*root
Root resource descriptor
structresource*new
Resource descriptor awaiting an empty resource space
resource_size_tsize
The minimum size of the empty space
structresource_constraint*constraint
The range and alignment constraints to be met
Description
Finds an empty space under root in the resource tree satisfying range and alignment constraints.
Return
0 - if successful; new members start, end, and flags are altered.
-EBUSY - if no empty space was found.
- intallocate_resource(structresource*root,structresource*new,resource_size_tsize,resource_size_tmin,resource_size_tmax,resource_size_talign,resource_alignfalignf,void*alignf_data)¶
allocate empty slot in the resource tree given range & alignment. The resource will be reallocated with a new size if it was already allocated
Parameters
structresource*root
root resource descriptor
structresource*new
resource descriptor desired by caller
resource_size_tsize
requested resource region size
resource_size_tmin
minimum boundary to allocate
resource_size_tmax
maximum boundary to allocate
resource_size_talign
alignment requested, in bytes
resource_alignfalignf
alignment function, optional, called if not NULL
void*alignf_data
arbitrary data to pass to thealignf function
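A minimal sketch of carving an aligned window out of a parent resource (the window size, alignment, and names are illustrative assumptions):

    #include <linux/ioport.h>
    #include <linux/sizes.h>

    static struct resource example_window = {
            .name  = "example-window",
            .flags = IORESOURCE_MEM,
    };

    static int example_carve_window(struct resource *parent)
    {
            /* 64 KiB, aligned to 64 KiB, anywhere inside the parent's span. */
            return allocate_resource(parent, &example_window, SZ_64K,
                                     parent->start, parent->end,
                                     SZ_64K, NULL, NULL);
    }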
- intinsert_resource(structresource*parent,structresource*new)¶
Inserts a resource in the resource tree
Parameters
structresource*parent
parent of the new resource
structresource*new
new resource to insert
Description
Returns 0 on success, -EBUSY if the resource can’t be inserted.
This function is intended for producers of resources, such as FW modules and bus drivers.
- voidinsert_resource_expand_to_fit(structresource*root,structresource*new)¶
Insert a resource into the resource tree
Parameters
structresource*root
root resource descriptor
structresource*new
new resource to insert
Description
Insert a resource into the resource tree, possibly expanding it in order to make it encompass any conflicting resources.
- intremove_resource(structresource*old)¶
Remove a resource in the resource tree
Parameters
structresource*old
resource to remove
Description
Returns 0 on success, -EINVAL if the resource is not valid.
This function removes a resource previously inserted by insert_resource() or insert_resource_conflict(), and moves the children (if any) up to where they were before. insert_resource() and insert_resource_conflict() insert a new resource, and move any conflicting resources down to the children of the new resource.
insert_resource(), insert_resource_conflict() and remove_resource() are intended for producers of resources, such as FW modules and bus drivers.
- intadjust_resource(structresource*res,resource_size_tstart,resource_size_tsize)¶
modify a resource’s start and size
Parameters
structresource*res
resource to modify
resource_size_tstart
new start value
resource_size_tsize
new size
Description
Given an existing resource, change its start and size to match the arguments. Returns 0 on success, -EBUSY if it can’t fit. Existing children of the resource are assumed to be immutable.
- structresource*__request_region(structresource*parent,resource_size_tstart,resource_size_tn,constchar*name,intflags)¶
create a new busy resource region
Parameters
structresource*parent
parent resource descriptor
resource_size_tstart
resource start address
resource_size_tn
resource region size
constchar*name
reserving caller’s ID string
intflags
IO resource flags
- void__release_region(structresource*parent,resource_size_tstart,resource_size_tn)¶
release a previously reserved resource region
Parameters
structresource*parent
parent resource descriptor
resource_size_tstart
resource start address
resource_size_tn
resource region size
Description
The described resource region must match a currently busy region.
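In practice drivers usually reach __request_region() and __release_region() through the request_mem_region()/release_mem_region() wrappers; a sketch (driver name is illustrative):

    #include <linux/ioport.h>

    static int example_mark_busy(resource_size_t start, resource_size_t len)
    {
            /* Mark the MMIO range busy under iomem_resource. */
            if (!request_mem_region(start, len, "example-driver"))
                    return -EBUSY;          /* someone else already owns it */

            /* ... ioremap() and use the registers ... */

            release_mem_region(start, len); /* drop the busy region again */
            return 0;
    }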
- intdevm_request_resource(structdevice*dev,structresource*root,structresource*new)¶
request and reserve an I/O or memory resource
Parameters
structdevice*dev
device for which to request the resource
structresource*root
root of the resource tree from which to request the resource
structresource*new
descriptor of the resource to request
Description
This is a device-managed version of request_resource(). There is usually no need to release resources requested by this function explicitly since that will be taken care of when the device is unbound from its driver. If for some reason the resource needs to be released explicitly, because of ordering issues for example, drivers must call devm_release_resource() rather than the regular release_resource().
When a conflict is detected between any existing resources and the newly requested resource, an error message will be printed.
Returns 0 on success or a negative error code on failure.
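A sketch of a probe path using the device-managed variant (addresses and names are illustrative):

    #include <linux/device.h>
    #include <linux/ioport.h>

    static struct resource example_devres = {
            .name  = "example-dev-mmio",
            .start = 0xd0000000,
            .end   = 0xd0000fff,
            .flags = IORESOURCE_MEM,
    };

    static int example_probe(struct device *dev)
    {
            /* Released automatically when the device is unbound. */
            return devm_request_resource(dev, &iomem_resource, &example_devres);
    }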
- voiddevm_release_resource(structdevice*dev,structresource*new)¶
release a previously requested resource
Parameters
structdevice*dev
device for which to release the resource
structresource*new
descriptor of the resource to release
Description
Releases a resource previously requested using devm_request_resource().
- structresource*devm_request_free_mem_region(structdevice*dev,structresource*base,unsignedlongsize)¶
find free region for device private memory
Parameters
structdevice*dev
device struct to bind the resource to
structresource*base
resource tree to look in
unsignedlongsize
size in bytes of the device memory to add
Description
This function tries to find an empty range of physical address big enough to contain the new resource, so that it can later be hotplugged as ZONE_DEVICE memory, which in turn allocates struct pages.
- structresource*alloc_free_mem_region(structresource*base,unsignedlongsize,unsignedlongalign,constchar*name)¶
find a free region relative tobase
Parameters
structresource*base
resource that will parent the new resource
unsignedlongsize
size in bytes of memory to allocate frombase
unsignedlongalign
alignment requirements for the allocation
constchar*name
resource name
Description
Buses like CXL, that can dynamically instantiate new memory regions, need a method to allocate physical address space for those regions. Allocate and insert a new resource to cover a free range in the span of base that is not claimed by any descendant of base.
MTRR Handling¶
- intarch_phys_wc_add(unsignedlongbase,unsignedlongsize)¶
add a WC MTRR and handle errors if PAT is unavailable
Parameters
unsignedlongbase
Physical base address
unsignedlongsize
Size of region
Description
If PAT is available, this does nothing. If PAT is unavailable, it attempts to add a WC MTRR covering size bytes starting at base and logs an error if this fails.
The caller should provide a power of two size on an equivalent power of two boundary.
Drivers must store the return value to pass to mtrr_del_wc_if_needed, but drivers should not try to interpret that return value.
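A sketch of the usual framebuffer-style pairing, assuming arch_phys_wc_del() as the counterpart for the stored cookie (the helper names are illustrative):

    #include <linux/io.h>

    static int example_wc_cookie;

    static void __iomem *example_map_fb(unsigned long base, unsigned long size)
    {
            /* No-op when PAT is available; otherwise tries to add a WC MTRR. */
            example_wc_cookie = arch_phys_wc_add(base, size);
            return ioremap_wc(base, size);
    }

    static void example_unmap_fb(void __iomem *fb)
    {
            iounmap(fb);
            arch_phys_wc_del(example_wc_cookie);   /* pass the cookie back untouched */
    }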
Security Framework¶
- intsecurity_init(void)¶
initializes the security framework
Parameters
void
no arguments
Description
This should be called early in the kernel initialization sequence.
- voidsecurity_add_hooks(structsecurity_hook_list*hooks,intcount,conststructlsm_id*lsmid)¶
Add a module’s hooks to the hook lists.
Parameters
structsecurity_hook_list*hooks
the hooks to add
intcount
the number of hooks to add
conststructlsm_id*lsmid
the identification information for the security module
Description
Each LSM has to register its hooks with the infrastructure.
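A minimal registration sketch, assuming a kernel where security_add_hooks() takes a struct lsm_id; every name below is illustrative. A real LSM would also declare itself with DEFINE_LSM() so its init function runs at the right point during boot; that part is omitted here.

    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/lsm_hooks.h>

    static int example_file_open(struct file *file)
    {
            return 0;                       /* allow everything */
    }

    static const struct lsm_id example_lsmid = {
            .name = "example",
            .id   = 0,      /* illustrative; a real LSM uses its reserved LSM_ID_* value */
    };

    static struct security_hook_list example_hooks[] = {
            LSM_HOOK_INIT(file_open, example_file_open),
    };

    static int __init example_lsm_init(void)
    {
            security_add_hooks(example_hooks, ARRAY_SIZE(example_hooks),
                               &example_lsmid);
            return 0;
    }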
- intlsm_blob_alloc(void**dest,size_tsize,gfp_tgfp)¶
allocate a composite blob
Parameters
void**dest
the destination for the blob
size_tsize
the size of the blob
gfp_tgfp
allocation type
Description
Allocate a blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
Parameters
structcred*cred
the cred that needs a blob
gfp_tgfp
allocation type
Description
Allocate the cred blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
Parameters
structcred*cred
the cred that needs a blob
Description
Allocate the cred blob for all the modules
Parameters
structfile*file
the file that needs a blob
Description
Allocate the file blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
Parameters
structinode*inode
the inode that needs a blob
gfp_tgfp
allocation flags
Description
Allocate the inode blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- intlsm_task_alloc(structtask_struct*task)¶
allocate a composite task blob
Parameters
structtask_struct*task
the task that needs a blob
Description
Allocate the task blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- intlsm_ipc_alloc(structkern_ipc_perm*kip)¶
allocate a composite ipc blob
Parameters
structkern_ipc_perm*kip
the ipc that needs a blob
Description
Allocate the ipc blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
Parameters
structkey*key
the key that needs a blob
Description
Allocate the key blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- intlsm_msg_msg_alloc(structmsg_msg*mp)¶
allocate a composite msg_msg blob
Parameters
structmsg_msg*mp
the msg_msg that needs a blob
Description
Allocate the ipc blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- intlsm_bdev_alloc(structblock_device*bdev)¶
allocate a composite block_device blob
Parameters
structblock_device*bdev
the block_device that needs a blob
Description
Allocate the block_device blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- voidlsm_early_task(structtask_struct*task)¶
during initialization allocate a composite task blob
Parameters
structtask_struct*task
the task that needs a blob
Description
Allocate the task blob for all the modules
- intlsm_superblock_alloc(structsuper_block*sb)¶
allocate a composite superblock blob
Parameters
structsuper_block*sb
the superblock that needs a blob
Description
Allocate the superblock blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- intlsm_fill_user_ctx(structlsm_ctx__user*uctx,u32*uctx_len,void*val,size_tval_len,u64id,u64flags)¶
Fill a user space lsm_ctx structure
Parameters
structlsm_ctx__user*uctx
a userspace LSM context to be filled
u32*uctx_len
available uctx size (input), used uctx size (output)
void*val
the new LSM context value
size_tval_len
the size of the new LSM context value
u64id
LSM id
u64flags
LSM defined flags
Description
Fill all of the fields in a userspace lsm_ctx structure. If uctx is NULL, simply calculate the required size to output via uctx_len and return success.
Returns 0 on success, -E2BIG if userspace buffer is not large enough,-EFAULT on a copyout error, -ENOMEM if memory can’t be allocated.
- intsecurity_binder_set_context_mgr(conststructcred*mgr)¶
Check if becoming binder ctx mgr is ok
Parameters
conststructcred*mgr
task credentials of current binder process
Description
Check whethermgr is allowed to be the binder context manager.
Return
Return 0 if permission is granted.
- intsecurity_binder_transaction(conststructcred*from,conststructcred*to)¶
Check if a binder transaction is allowed
Parameters
conststructcred*from
sending process
conststructcred*to
receiving process
Description
Check whetherfrom is allowed to invoke a binder transaction call toto.
Return
Returns 0 if permission is granted.
- intsecurity_binder_transfer_binder(conststructcred*from,conststructcred*to)¶
Check if a binder transfer is allowed
Parameters
conststructcred*from
sending process
conststructcred*to
receiving process
Description
Check whetherfrom is allowed to transfer a binder reference toto.
Return
Returns 0 if permission is granted.
- intsecurity_binder_transfer_file(conststructcred*from,conststructcred*to,conststructfile*file)¶
Check if a binder file xfer is allowed
Parameters
conststructcred*from
sending process
conststructcred*to
receiving process
conststructfile*file
file being transferred
Description
Check whetherfrom is allowed to transferfile toto.
Return
Returns 0 if permission is granted.
- intsecurity_ptrace_access_check(structtask_struct*child,unsignedintmode)¶
Check if tracing is allowed
Parameters
structtask_struct*child
target process
unsignedintmode
PTRACE_MODE flags
Description
Check permission before allowing the current process to trace the child process. Security modules may also want to perform a process tracing check during an execve in the set_security or apply_creds hooks of binprm_security_ops if the process is being traced and its security attributes would be changed by the execve.
Return
Returns 0 if permission is granted.
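The corresponding LSM-side hook is ptrace_access_check; a sketch of an implementation (the policy itself is purely illustrative):

    #include <linux/errno.h>
    #include <linux/ptrace.h>
    #include <linux/sched.h>

    /* Hypothetical hook: deny PTRACE_MODE_ATTACH on no_new_privs tasks. */
    static int example_ptrace_access_check(struct task_struct *child,
                                           unsigned int mode)
    {
            if ((mode & PTRACE_MODE_ATTACH) && task_no_new_privs(child))
                    return -EPERM;          /* illustrative policy only */
            return 0;                       /* 0 grants permission */
    }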
- intsecurity_ptrace_traceme(structtask_struct*parent)¶
Check if tracing is allowed
Parameters
structtask_struct*parent
tracing process
Description
Check that the parent process has sufficient permission to trace the current process before allowing the current process to present itself to the parent process for tracing.
Return
Returns 0 if permission is granted.
- intsecurity_capget(conststructtask_struct*target,kernel_cap_t*effective,kernel_cap_t*inheritable,kernel_cap_t*permitted)¶
Get the capability sets for a process
Parameters
conststructtask_struct*target
target process
kernel_cap_t*effective
effective capability set
kernel_cap_t*inheritable
inheritable capability set
kernel_cap_t*permitted
permitted capability set
Description
Get the effective, inheritable, and permitted capability sets for the target process. The hook may also perform permission checking to determine if the current process is allowed to see the capability sets of the target process.
Return
Returns 0 if the capability sets were successfully obtained.
- intsecurity_capset(structcred*new,conststructcred*old,constkernel_cap_t*effective,constkernel_cap_t*inheritable,constkernel_cap_t*permitted)¶
Set the capability sets for a process
Parameters
structcred*new
new credentials for the target process
conststructcred*old
current credentials of the target process
constkernel_cap_t*effective
effective capability set
constkernel_cap_t*inheritable
inheritable capability set
constkernel_cap_t*permitted
permitted capability set
Description
Set theeffective,inheritable, andpermitted capability sets for thecurrent process.
Return
Returns 0 and updatenew if permission is granted.
- intsecurity_capable(conststructcred*cred,structuser_namespace*ns,intcap,unsignedintopts)¶
Check if a process has the necessary capability
Parameters
conststructcred*cred
credentials to examine
structuser_namespace*ns
user namespace
intcap
capability requested
unsignedintopts
capability check options
Description
Check whether the tsk process has the cap capability in the indicated credentials. cap contains the capability <include/linux/capability.h>. opts contains options for the capable check <include/linux/security.h>.
Return
Returns 0 if the capability is granted.
- intsecurity_quotactl(intcmds,inttype,intid,conststructsuper_block*sb)¶
Check if a quotactl() syscall is allowed for this fs
Parameters
intcmds
commands
inttype
type
intid
id
conststructsuper_block*sb
filesystem
Description
Check whether the quotactl syscall is allowed for thissb.
Return
Returns 0 if permission is granted.
Parameters
structdentry*dentry
dentry
Description
Check whether QUOTAON is allowed fordentry.
Return
Returns 0 if permission is granted.
- intsecurity_syslog(inttype)¶
Check if accessing the kernel message ring is allowed
Parameters
inttype
SYSLOG_ACTION_* type
Description
Check permission before accessing the kernel message ring or changing logging to the console. See the syslog(2) manual page for an explanation of the type values.
Return
Return 0 if permission is granted.
- intsecurity_settime64(conststructtimespec64*ts,conststructtimezone*tz)¶
Check if changing the system time is allowed
Parameters
conststructtimespec64*ts
new time
conststructtimezone*tz
timezone
Description
Check permission to change the system time. struct timespec64 is defined in <include/linux/time64.h> and timezone is defined in <include/linux/time.h>.
Return
Returns 0 if permission is granted.
- intsecurity_vm_enough_memory_mm(structmm_struct*mm,longpages)¶
Check if allocating a new mem map is allowed
Parameters
structmm_struct*mm
mm struct
longpages
number of pages
Description
Check permissions for allocating a new virtual mapping. If all LSMs return a positive value, __vm_enough_memory() will be called with cap_sys_admin set. If at least one LSM returns 0 or negative, __vm_enough_memory() will be called with cap_sys_admin cleared.
Return
Returns 0 if permission is granted by the LSM infrastructure to the caller.
- intsecurity_bprm_creds_for_exec(structlinux_binprm*bprm)¶
Prepare the credentials for exec()
Parameters
structlinux_binprm*bprm
binary program information
Description
If the setup in prepare_exec_creds did not setup bprm->cred->security properly for executing bprm->file, update the LSM’s portion of bprm->cred->security to be what commit_creds needs to install for the new program. This hook may also optionally check permissions (e.g. for transitions between security domains). The hook must set bprm->secureexec to 1 if AT_SECURE should be set to request libc enable secure mode. bprm contains the linux_binprm structure.
If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is set. The result must be the same as without this flag even if the execution will never really happen and bprm will always be dropped.
This hook must not change current->cred, onlybprm->cred.
Return
Returns 0 if the hook is successful and permission is granted.
- intsecurity_bprm_creds_from_file(structlinux_binprm*bprm,conststructfile*file)¶
Update linux_binprm creds based on file
Parameters
structlinux_binprm*bprm
binary program information
conststructfile*file
associated file
Description
If file is setpcap, suid, sgid or otherwise marked to change privilege upon exec, update bprm->cred to reflect that change. This is called after finding the binary that will be executed without an interpreter. This ensures that the credentials will not be derived from a script that the binary will need to reopen, which when reopened may end up being a completely different file. This hook may also optionally check permissions (e.g. for transitions between security domains). The hook must set bprm->secureexec to 1 if AT_SECURE should be set to request libc enable secure mode. The hook must add to bprm->per_clear any personality flags that should be cleared from current->personality. bprm contains the linux_binprm structure.
Return
Returns 0 if the hook is successful and permission is granted.
- intsecurity_bprm_check(structlinux_binprm*bprm)¶
Mediate binary handler search
Parameters
structlinux_binprm*bprm
binary program information
Description
This hook mediates the point when a search for a binary handler will begin. It allows a check against the bprm->cred->security value which was set in the preceding creds_for_exec call. The argv list and envp list are reliably available in bprm. This hook may be called multiple times during a single execve. bprm contains the linux_binprm structure.
Return
Returns 0 if the hook is successful and permission is granted.
- voidsecurity_bprm_committing_creds(conststructlinux_binprm*bprm)¶
Install creds for a process during exec()
Parameters
conststructlinux_binprm*bprm
binary program information
Description
Prepare to install the new security attributes of a process being transformed by an execve operation, based on the old credentials pointed to by current->cred and the information set in bprm->cred by the bprm_creds_for_exec hook. bprm points to the linux_binprm structure. This hook is a good place to perform state changes on the process such as closing open file descriptors to which access will no longer be granted when the attributes are changed. This is called immediately before commit_creds().
- voidsecurity_bprm_committed_creds(conststructlinux_binprm*bprm)¶
Tidy up after cred install during exec()
Parameters
conststructlinux_binprm*bprm
binary program information
Description
Tidy up after the installation of the new security attributes of a process being transformed by an execve operation. The new credentials have, by this point, been set to current->cred. bprm points to the linux_binprm structure. This hook is a good place to perform state changes on the process such as clearing out non-inheritable signal state. This is called immediately after commit_creds().
- intsecurity_fs_context_submount(structfs_context*fc,structsuper_block*reference)¶
Initialise fc->security
Parameters
structfs_context*fc
new filesystem context
structsuper_block*reference
dentry reference for submount/remount
Description
Fill out the ->security field for a new fs_context.
Return
Returns 0 on success or negative error code on failure.
- intsecurity_fs_context_dup(structfs_context*fc,structfs_context*src_fc)¶
Duplicate a fs_context LSM blob
Parameters
structfs_context*fc
destination filesystem context
structfs_context*src_fc
source filesystem context
Description
Allocate and attach a security structure to sc->security. This pointer is initialised to NULL by the caller. fc indicates the new filesystem context. src_fc indicates the original filesystem context.
Return
Returns 0 on success or a negative error code on failure.
- intsecurity_fs_context_parse_param(structfs_context*fc,structfs_parameter*param)¶
Configure a filesystem context
Parameters
structfs_context*fc
filesystem context
structfs_parameter*param
filesystem parameter
Description
Userspace provided a parameter to configure a superblock. The LSM can consume the parameter or return it to the caller for use elsewhere.
Return
If the parameter is used by the LSM it should return 0; if it is returned to the caller, -ENOPARAM is returned; otherwise a negative error code is returned.
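A sketch of an LSM-side fs_context_parse_param implementation that consumes one hypothetical mount parameter and hands everything else back:

    #include <linux/errno.h>
    #include <linux/fs_context.h>
    #include <linux/string.h>

    static int example_fs_context_parse_param(struct fs_context *fc,
                                               struct fs_parameter *param)
    {
            /* "example_opt" is an illustrative, LSM-private mount option. */
            if (strcmp(param->key, "example_opt") != 0)
                    return -ENOPARAM;       /* not ours; let the filesystem see it */

            /* ... record param->string in the LSM's part of fc->security ... */
            return 0;                        /* consumed by the LSM */
    }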
- intsecurity_sb_alloc(structsuper_block*sb)¶
Allocate a super_block LSM blob
Parameters
structsuper_block*sb
filesystem superblock
Description
Allocate and attach a security structure to the sb->s_security field. The s_security field is initialized to NULL when the structure is allocated. sb contains the super_block structure to be modified.
Return
Returns 0 if operation was successful.
- voidsecurity_sb_delete(structsuper_block*sb)¶
Release super_block LSM associated objects
Parameters
structsuper_block*sb
filesystem superblock
Description
Release objects tied to a superblock (e.g. inodes). sb contains the super_block structure being released.
- voidsecurity_sb_free(structsuper_block*sb)¶
Free a super_block LSM blob
Parameters
structsuper_block*sb
filesystem superblock
Description
Deallocate and clear the sb->s_security field. sb contains the super_block structure to be modified.
- intsecurity_sb_kern_mount(conststructsuper_block*sb)¶
Check if a kernel mount is allowed
Parameters
conststructsuper_block*sb
filesystem superblock
Description
Mount thissb if allowed by permissions.
Return
Returns 0 if permission is granted.
- intsecurity_sb_show_options(structseq_file*m,structsuper_block*sb)¶
Output the mount options for a superblock
Parameters
structseq_file*m
output file
structsuper_block*sb
filesystem superblock
Description
Show (print onm) mount options for thissb.
Return
Returns 0 on success, negative values on failure.
Parameters
structdentry*dentry
superblock handle
Description
Check permission before obtaining filesystem statistics for the mnt mountpoint. dentry is a handle on the superblock for the filesystem.
Return
Returns 0 if permission is granted.
- intsecurity_sb_mount(constchar*dev_name,conststructpath*path,constchar*type,unsignedlongflags,void*data)¶
Check permission for mounting a filesystem
Parameters
constchar*dev_name
filesystem backing device
conststructpath*path
mount point
constchar*type
filesystem type
unsignedlongflags
mount flags
void*data
filesystem specific data
Description
Check permission before an object specified by dev_name is mounted on the mount point named by nd. For an ordinary mount, dev_name identifies a device if the file system type requires a device. For a remount (flags & MS_REMOUNT), dev_name is irrelevant. For a loopback/bind mount (flags & MS_BIND), dev_name identifies the pathname of the object being mounted.
Return
Returns 0 if permission is granted.
- intsecurity_sb_umount(structvfsmount*mnt,intflags)¶
Check permission for unmounting a filesystem
Parameters
structvfsmount*mnt
mounted filesystem
intflags
unmount flags
Description
Check permission before themnt file system is unmounted.
Return
Returns 0 if permission is granted.
- intsecurity_sb_pivotroot(conststructpath*old_path,conststructpath*new_path)¶
Check permissions for pivoting the rootfs
Parameters
conststructpath*old_path
new location for current rootfs
conststructpath*new_path
location of the new rootfs
Description
Check permission before pivoting the root filesystem.
Return
Returns 0 if permission is granted.
- intsecurity_move_mount(conststructpath*from_path,conststructpath*to_path)¶
Check permissions for moving a mount
Parameters
conststructpath*from_path
source mount point
conststructpath*to_path
destination mount point
Description
Check permission before a mount is moved.
Return
Returns 0 if permission is granted.
- intsecurity_path_notify(conststructpath*path,u64mask,unsignedintobj_type)¶
Check if setting a watch is allowed
Parameters
conststructpath*path
file path
u64mask
event mask
unsignedintobj_type
file path type
Description
Check permissions before setting a watch on events as defined bymask, onan object atpath, whose type is defined byobj_type.
Return
Returns 0 if permission is granted.
Parameters
structinode*inode
the inode
gfp_tgfp
allocation flags
Description
Allocate and attach a security structure to inode->i_security. The i_security field is initialized to NULL when the inode structure is allocated.
Return
Return 0 if operation was successful.
Parameters
structinode*inode
the inode
Description
Release any LSM resources associated with inode, although due to the inode’s RCU protections it is possible that the resources will not be fully released until after the current RCU grace period has elapsed.
It is important for LSMs to note that despite being present in a call to security_inode_free(), inode may still be referenced in a VFS path walk and calls to security_inode_permission() may be made during, or after, a call to security_inode_free(). For this reason the inode->i_security field is released via a call_rcu() callback and any LSMs which need to retain inode state for use in security_inode_permission() should only release that state in the inode_free_security_rcu() LSM hook callback.
- intsecurity_inode_init_security_anon(structinode*inode,conststructqstr*name,conststructinode*context_inode)¶
Initialize an anonymous inode
Parameters
structinode*inode
the inode
conststructqstr*name
the anonymous inode class
conststructinode*context_inode
an optional related inode
Description
Set up the incore security field for the new anonymous inode and return whether the inode creation is permitted by the security module or not.
Return
Returns 0 on success, -EACCES if the security module denies the creation of this inode, or another -errno upon other errors.
- voidsecurity_path_post_mknod(structmnt_idmap*idmap,structdentry*dentry)¶
Update inode security after reg file creation
Parameters
structmnt_idmap*idmap
idmap of the mount
structdentry*dentry
new file
Description
Update inode security field after a regular file has been created.
- intsecurity_path_rmdir(conststructpath*dir,structdentry*dentry)¶
Check if removing a directory is allowed
Parameters
conststructpath*dir
parent directory
structdentry*dentry
directory to remove
Description
Check the permission to remove a directory.
Return
Returns 0 if permission is granted.
- intsecurity_path_symlink(conststructpath*dir,structdentry*dentry,constchar*old_name)¶
Check if creating a symbolic link is allowed
Parameters
conststructpath*dir
parent directory
structdentry*dentry
symbolic link
constchar*old_name
file pathname
Description
Check the permission to create a symbolic link to a file.
Return
Returns 0 if permission is granted.
- intsecurity_path_link(structdentry*old_dentry,conststructpath*new_dir,structdentry*new_dentry)¶
Check if creating a hard link is allowed
Parameters
structdentry*old_dentry
existing file
conststructpath*new_dir
new parent directory
structdentry*new_dentry
new link
Description
Check permission before creating a new hard link to a file.
Return
Returns 0 if permission is granted.
Parameters
conststructpath*path
file
Description
Check permission before truncating the file indicated by path. Note that truncation permissions may also be checked based on already opened files, using the security_file_truncate() hook.
Return
Returns 0 if permission is granted.
- intsecurity_path_chmod(conststructpath*path,umode_tmode)¶
Check if changing the file’s mode is allowed
Parameters
conststructpath*path
file
umode_tmode
new mode
Description
Check for permission to change a mode of the file path. The new mode is specified in mode which is a bitmask of constants from <include/uapi/linux/stat.h>.
Return
Returns 0 if permission is granted.
- intsecurity_path_chown(conststructpath*path,kuid_tuid,kgid_tgid)¶
Check if changing the file’s owner/group is allowed
Parameters
conststructpath*path
file
kuid_tuid
file owner
kgid_tgid
file group
Description
Check for permission to change owner/group of a file or directory.
Return
Returns 0 if permission is granted.
Parameters
conststructpath*path
directory
Description
Check for permission to change root directory.
Return
Returns 0 if permission is granted.
- voidsecurity_inode_post_create_tmpfile(structmnt_idmap*idmap,structinode*inode)¶
Update inode security of new tmpfile
Parameters
structmnt_idmap*idmap
idmap of the mount
structinode*inode
inode of the new tmpfile
Description
Update inode security data after a tmpfile has been created.
- intsecurity_inode_link(structdentry*old_dentry,structinode*dir,structdentry*new_dentry)¶
Check if creating a hard link is allowed
Parameters
structdentry*old_dentry
existing file
structinode*dir
new parent directory
structdentry*new_dentry
new link
Description
Check permission before creating a new hard link to a file.
Return
Returns 0 if permission is granted.
- intsecurity_inode_unlink(structinode*dir,structdentry*dentry)¶
Check if removing a hard link is allowed
Parameters
structinode*dir
parent directory
structdentry*dentry
file
Description
Check the permission to remove a hard link to a file.
Return
Returns 0 if permission is granted.
- intsecurity_inode_symlink(structinode*dir,structdentry*dentry,constchar*old_name)¶
Check if creating a symbolic link is allowed
Parameters
structinode*dir
parent directory
structdentry*dentry
symbolic link
constchar*old_name
existing filename
Description
Check the permission to create a symbolic link to a file.
Return
Returns 0 if permission is granted.
- intsecurity_inode_rmdir(structinode*dir,structdentry*dentry)¶
Check if removing a directory is allowed
Parameters
structinode*dir
parent directory
structdentry*dentry
directory to be removed
Description
Check the permission to remove a directory.
Return
Returns 0 if permission is granted.
- intsecurity_inode_mknod(structinode*dir,structdentry*dentry,umode_tmode,dev_tdev)¶
Check if creating a special file is allowed
Parameters
structinode*dir
parent directory
structdentry*dentry
new file
umode_tmode
new file mode
dev_tdev
device number
Description
Check permissions when creating a special file (or a socket or a fifo file created via the mknod system call). Note that if the mknod operation is being done for a regular file, then the create hook will be called and not this hook.
Return
Returns 0 if permission is granted.
- intsecurity_inode_rename(structinode*old_dir,structdentry*old_dentry,structinode*new_dir,structdentry*new_dentry,unsignedintflags)¶
Check if renaming a file is allowed
Parameters
structinode*old_dir
parent directory of the old file
structdentry*old_dentry
the old file
structinode*new_dir
parent directory of the new file
structdentry*new_dentry
the new file
unsignedintflags
flags
Description
Check for permission to rename a file or directory.
Return
Returns 0 if permission is granted.
Parameters
structdentry*dentry
link
Description
Check the permission to read the symbolic link.
Return
Returns 0 if permission is granted.
- intsecurity_inode_follow_link(structdentry*dentry,structinode*inode,boolrcu)¶
Check if following a symbolic link is allowed
Parameters
structdentry*dentry
link dentry
structinode*inode
link inode
boolrcu
true if in RCU-walk mode
Description
Check permission to follow a symbolic link when looking up a pathname. Ifrcu is true,inode is not stable.
Return
Returns 0 if permission is granted.
Parameters
structinode*inode
inode
intmask
access mask
Description
Check permission before accessing an inode. This hook is called by the existing Linux permission function, so a security module can use it to provide additional checking for existing Linux permission checks. Notice that this hook is called when a file is opened (as well as many other operations), whereas the file_security_ops permission hook is called when the actual read/write operations are performed.
Return
Returns 0 if permission is granted.
- voidsecurity_inode_post_setattr(structmnt_idmap*idmap,structdentry*dentry,intia_valid)¶
Update the inode after a setattr operation
Parameters
structmnt_idmap*idmap
idmap of the mount
structdentry*dentry
file
intia_valid
file attributes set
Description
Update inode security field after successful setting file attributes.
Parameters
conststructpath*path
file
Description
Check permission before obtaining file attributes.
Return
Returns 0 if permission is granted.
- intsecurity_inode_setxattr(structmnt_idmap*idmap,structdentry*dentry,constchar*name,constvoid*value,size_tsize,intflags)¶
Check if setting file xattrs is allowed
Parameters
structmnt_idmap*idmap
idmap of the mount
structdentry*dentry
file
constchar*name
xattr name
constvoid*value
xattr value
size_tsize
size of xattr value
intflags
flags
Description
This hook performs the desired permission checks before setting the extended attributes (xattrs) on dentry. It is important to note that we have some additional logic before the main LSM implementation calls to detect if we need to perform an additional capability check at the LSM layer.
Normally we enforce a capability check prior to executing the various LSM hook implementations, but if a LSM wants to avoid this capability check, it can register a ‘inode_xattr_skipcap’ hook and return a value of 1 for xattrs that it wants to avoid the capability check, leaving the LSM fully responsible for enforcing the access control for the specific xattr. If all of the enabled LSMs refrain from registering a ‘inode_xattr_skipcap’ hook, or return a 0 (the default return value), the capability check is still performed. If no ‘inode_xattr_skipcap’ hooks are registered the capability check is performed.
Return
Returns 0 if permission is granted.
- intsecurity_inode_set_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name,structposix_acl*kacl)¶
Check if setting posix acls is allowed
Parameters
structmnt_idmap*idmap
idmap of the mount
structdentry*dentry
file
constchar*acl_name
acl name
structposix_acl*kacl
acl struct
Description
Check permission before setting posix acls; the posix acls in kacl are identified by acl_name.
Return
Returns 0 if permission is granted.
- voidsecurity_inode_post_set_acl(structdentry*dentry,constchar*acl_name,structposix_acl*kacl)¶
Update inode security from posix acls set
Parameters
structdentry*dentry
file
constchar*acl_name
acl name
structposix_acl*kacl
acl struct
Description
Update inode security data after successfully setting posix acls on dentry. The posix acls in kacl are identified by acl_name.
- intsecurity_inode_get_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name)¶
Check if reading posix acls is allowed
Parameters
structmnt_idmap*idmap
idmap of the mount
structdentry*dentry
file
constchar*acl_name
acl name
Description
Check permission before getting posix acls; the posix acls are identified by acl_name.
Return
Returns 0 if permission is granted.
- intsecurity_inode_remove_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name)¶
Check if removing a posix acl is allowed
Parameters
structmnt_idmap*idmap
idmap of the mount
structdentry*dentry
file
constchar*acl_name
acl name
Description
Check permission before removing posix acls; the posix acls are identified by acl_name.
Return
Returns 0 if permission is granted.
- voidsecurity_inode_post_remove_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name)¶
Update inode security after rm posix acls
Parameters
structmnt_idmap*idmap
idmap of the mount
structdentry*dentry
file
constchar*acl_name
acl name
Description
Update inode security data after successfully removing posix acls on dentry in idmap. The posix acls are identified by acl_name.
- voidsecurity_inode_post_setxattr(structdentry*dentry,constchar*name,constvoid*value,size_tsize,intflags)¶
Update the inode after a setxattr operation
Parameters
structdentry*dentry
file
constchar*name
xattr name
constvoid*value
xattr value
size_tsize
xattr value size
intflags
flags
Description
Update inode security field after successful setxattr operation.
Parameters
structdentry*dentry
file
constchar*name
xattr name
Description
Check permission before obtaining the extended attributes identified byname fordentry.
Return
Returns 0 if permission is granted.
Parameters
structdentry*dentry
file
Description
Check permission before obtaining the list of extended attribute names fordentry.
Return
Returns 0 if permission is granted.
- intsecurity_inode_removexattr(structmnt_idmap*idmap,structdentry*dentry,constchar*name)¶
Check if removing an xattr is allowed
Parameters
structmnt_idmap*idmap
idmap of the mount
structdentry*dentry
file
constchar*name
xattr name
Description
This hook performs the desired permission checks before setting the extended attributes (xattrs) on dentry. It is important to note that we have some additional logic before the main LSM implementation calls to detect if we need to perform an additional capability check at the LSM layer.
Normally we enforce a capability check prior to executing the various LSM hook implementations, but if a LSM wants to avoid this capability check, it can register a ‘inode_xattr_skipcap’ hook and return a value of 1 for xattrs that it wants to avoid the capability check, leaving the LSM fully responsible for enforcing the access control for the specific xattr. If all of the enabled LSMs refrain from registering a ‘inode_xattr_skipcap’ hook, or return a 0 (the default return value), the capability check is still performed. If no ‘inode_xattr_skipcap’ hooks are registered the capability check is performed.
Return
Returns 0 if permission is granted.
- voidsecurity_inode_post_removexattr(structdentry*dentry,constchar*name)¶
Update the inode after a removexattr op
Parameters
structdentry*dentry
file
constchar*name
xattr name
Description
Update the inode after a successful removexattr operation.
- intsecurity_inode_need_killpriv(structdentry*dentry)¶
Check if
security_inode_killpriv()
required
Parameters
structdentry*dentry
associated dentry
Description
Called when an inode has been changed to determine if security_inode_killpriv() should be called.
Return
Return <0 on error to abort the inode change operation, return 0 if security_inode_killpriv() does not need to be called, return >0 if security_inode_killpriv() does need to be called.
- intsecurity_inode_killpriv(structmnt_idmap*idmap,structdentry*dentry)¶
The setuid bit is removed, update LSM state
Parameters
structmnt_idmap*idmap
idmap of the mount
structdentry*dentry
associated dentry
Description
The dentry’s setuid bit is being removed. Remove similar security labels. Called with the dentry->d_inode->i_mutex held.
Return
Return 0 on success. If an error is returned, the operation causing setuid bit removal fails.
- intsecurity_inode_getsecurity(structmnt_idmap*idmap,structinode*inode,constchar*name,void**buffer,boolalloc)¶
Get the xattr security label of an inode
Parameters
structmnt_idmap*idmap
idmap of the mount
structinode*inode
inode
constchar*name
xattr name
void**buffer
security label buffer
boolalloc
allocation flag
Description
Retrieve a copy of the extended attribute representation of the security label associated with name for inode via buffer. Note that name is the remainder of the attribute name after the security prefix has been removed. alloc is used to specify if the call should return a value via the buffer or just the value length.
Return
Returns size of buffer on success.
- intsecurity_inode_setsecurity(structinode*inode,constchar*name,constvoid*value,size_tsize,intflags)¶
Set the xattr security label of an inode
Parameters
structinode*inode
inode
constchar*name
xattr name
constvoid*value
security label
size_tsize
length of security label
intflags
flags
Description
Set the security label associated with name for inode from the extended attribute value value. size indicates the size of the value in bytes. flags may be XATTR_CREATE, XATTR_REPLACE, or 0. Note that name is the remainder of the attribute name after the “security.” prefix has been removed.
Return
Returns 0 on success.
Parameters
structinode*inode
inode
structlsm_prop*prop
lsm specific information to return
Description
Get the lsm specific information associated with the node.
- intsecurity_kernfs_init_security(structkernfs_node*kn_dir,structkernfs_node*kn)¶
Init LSM context for a kernfs node
Parameters
structkernfs_node*kn_dir
parent kernfs node
structkernfs_node*kn
the kernfs node to initialize
Description
Initialize the security context of a newly created kernfs node based on its own and its parent’s attributes.
Return
Returns 0 if permission is granted.
Parameters
structfile*file
file
intmask
requested permissions
Description
Check file permissions before accessing an open file. This hook is called by various operations that read or write files. A security module can use this hook to perform additional checking on these operations, e.g. to revalidate permissions on use to support privilege bracketing or policy changes. Notice that this hook is used when the actual read/write operations are performed, whereas the inode_security_ops hook is called when a file is opened (as well as many other operations). Although this hook can be used to revalidate permissions for various system call operations that read or write files, it does not address the revalidation of permissions for memory-mapped files. Security modules must handle this separately if they need such revalidation.
Return
Returns 0 if permission is granted.
Parameters
structfile*file
the file
Description
Allocate and attach a security structure to the file->f_security field. The security field is initialized to NULL when the structure is first created.
Return
Return 0 if the hook is successful and permission is granted.
Parameters
structfile*file
the file
Description
Perform actions before releasing the last reference to a file.
Parameters
structfile*file
the file
Description
Deallocate and free any security structures stored in file->f_security.
- intsecurity_mmap_file(structfile*file,unsignedlongprot,unsignedlongflags)¶
Check if mmap’ing a file is allowed
Parameters
structfile*file
file
unsignedlongprot
protection applied by the kernel
unsignedlongflags
flags
Description
Check permissions for a mmap operation. Thefile may be NULL, e.g. ifmapping anonymous memory.
Return
Returns 0 if permission is granted.
- intsecurity_mmap_addr(unsignedlongaddr)¶
Check if mmap’ing an address is allowed
Parameters
unsignedlongaddr
address
Description
Check permissions for a mmap operation ataddr.
Return
Returns 0 if permission is granted.
- intsecurity_file_mprotect(structvm_area_struct*vma,unsignedlongreqprot,unsignedlongprot)¶
Check if changing memory protections is allowed
Parameters
structvm_area_struct*vma
memory region
unsignedlongreqprot
application requested protection
unsignedlongprot
protection applied by the kernel
Description
Check permissions before changing memory access permissions.
Return
Returns 0 if permission is granted.
Parameters
structfile*file
file
unsignedintcmd
lock operation (e.g. F_RDLCK, F_WRLCK)
Description
Check permission before performing file locking operations. Note the hook mediates both flock and fcntl style locks.
Return
Returns 0 if permission is granted.
- intsecurity_file_fcntl(structfile*file,unsignedintcmd,unsignedlongarg)¶
Check if fcntl() op is allowed
Parameters
structfile*file
file
unsignedintcmd
fcntl command
unsignedlongarg
command argument
Description
Check permission before allowing the file operation specified by cmd from being performed on the file file. Note that arg sometimes represents a user space pointer; in other cases, it may be a simple integer value. When arg represents a user space pointer, it should never be used by the security module.
Return
Returns 0 if permission is granted.
Parameters
structfile*file
the file
Description
Save owner security information (typically from current->security) in file->f_security for later use by the send_sigiotask hook.
This hook is called with file->f_owner.lock held.
Return
Returns 0 on success.
- intsecurity_file_send_sigiotask(structtask_struct*tsk,structfown_struct*fown,intsig)¶
Check if sending SIGIO/SIGURG is allowed
Parameters
structtask_struct*tsk
target task
structfown_struct*fown
signal sender
intsig
signal to be sent, SIGIO is sent if 0
Description
Check permission for the file owner fown to send SIGIO or SIGURG to the process tsk. Note that this hook is sometimes called from interrupt. Note that the fown_struct, fown, is never outside the context of a struct file, so the file structure (and associated security information) can always be obtained: container_of(fown, struct file, f_owner).
Return
Returns 0 if permission is granted.
Parameters
structfile*file
file being received
Description
This hook allows security modules to control the ability of a process toreceive an open file descriptor via socket IPC.
Return
Returns 0 if permission is granted.
Parameters
structfile*file
Description
Save open-time permission checking state for later use upon file_permission, and recheck access if anything has changed since inode_permission.
We can check if a file is opened for execution (e.g. execve(2) call), either directly or indirectly (e.g. ELF’s ld.so), by checking file->f_flags & __FMODE_EXEC.
Return
Returns 0 if permission is granted.
Parameters
structfile*file
file
Description
Check permission before truncating a file, i.e. using ftruncate. Note that truncation permission may also be checked based on the path, using the path_truncate hook.
Return
Returns 0 if permission is granted.
- intsecurity_task_alloc(structtask_struct*task,unsignedlongclone_flags)¶
Allocate a task’s LSM blob
Parameters
structtask_struct*task
the task
unsignedlongclone_flags
flags indicating what is being shared
Description
Handle allocation of task-related resources.
Return
Returns a zero on success, negative values on failure.
- voidsecurity_task_free(structtask_struct*task)¶
Free a task’s LSM blob and related resources
Parameters
structtask_struct*task
task
Description
Handle release of task-related resources. Note that this can be called from interrupt context.
- intsecurity_cred_alloc_blank(structcred*cred,gfp_tgfp)¶
Allocate the min memory to allow cred_transfer
Parameters
structcred*cred
credentials
gfp_tgfp
gfp flags
Description
Only allocate sufficient memory and attach to cred such that cred_transfer() will not get ENOMEM.
Return
Returns 0 on success, negative values on failure.
Parameters
structcred*cred
credentials
Description
Deallocate and clear the cred->security field in a set of credentials.
- intsecurity_prepare_creds(structcred*new,conststructcred*old,gfp_tgfp)¶
Prepare a new set of credentials
Parameters
structcred*new
new credentials
conststructcred*old
original credentials
gfp_tgfp
gfp flags
Description
Prepare a new set of credentials by copying the data from the old set.
Return
Returns 0 on success, negative values on failure.
- voidsecurity_transfer_creds(structcred*new,conststructcred*old)¶
Transfer creds
Parameters
structcred*new
target credentials
conststructcred*old
original credentials
Description
Transfer data from original creds to new creds.
- intsecurity_kernel_act_as(structcred*new,u32secid)¶
Set the kernel credentials to act as secid
Parameters
structcred*new
credentials
u32secid
secid
Description
Set the credentials for a kernel service to act as (subjective context). The current task must be the one that nominated secid.
Return
Returns 0 if successful.
- intsecurity_kernel_create_files_as(structcred*new,structinode*inode)¶
Set file creation context using an inode
Parameters
structcred*new
target credentials
structinode*inode
reference inode
Description
Set the file creation context in a set of credentials to be the same as the objective context of the specified inode. The current task must be the one that nominated inode.
Return
Returns 0 if successful.
- intsecurity_kernel_module_request(char*kmod_name)¶
Check if loading a module is allowed
Parameters
char*kmod_name
module name
Description
Ability to trigger the kernel to automatically upcall to userspace for userspace to load a kernel module with the given name.
Return
Returns 0 if successful.
- intsecurity_task_fix_setuid(structcred*new,conststructcred*old,intflags)¶
Update LSM with new user id attributes
Parameters
structcred*new
updated credentials
conststructcred*old
credentials being replaced
intflags
LSM_SETID_* flag values
Description
Update the module’s state after setting one or more of the user identity attributes of the current process. The flags parameter indicates which of the set*uid system calls invoked this hook. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.
Return
Returns 0 on success.
- intsecurity_task_fix_setgid(structcred*new,conststructcred*old,intflags)¶
Update LSM with new group id attributes
Parameters
structcred*new
updated credentials
conststructcred*old
credentials being replaced
intflags
LSM_SETID_* flag value
Description
Update the module’s state after setting one or more of the group identity attributes of the current process. The flags parameter indicates which of the set*gid system calls invoked this hook. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.
Return
Returns 0 on success.
- intsecurity_task_fix_setgroups(structcred*new,conststructcred*old)¶
Update LSM with new supplementary groups
Parameters
structcred*new
updated credentials
conststructcred*old
credentials being replaced
Description
Update the module’s state after setting the supplementary group identity attributes of the current process. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.
Return
Returns 0 on success.
- intsecurity_task_setpgid(structtask_struct*p,pid_tpgid)¶
Check if setting the pgid is allowed
Parameters
structtask_struct*p
task being modified
pid_tpgid
new pgid
Description
Check permission before setting the process group identifier of the processp topgid.
Return
Returns 0 if permission is granted.
- intsecurity_task_getpgid(structtask_struct*p)¶
Check if getting the pgid is allowed
Parameters
structtask_struct*p
task
Description
Check permission before getting the process group identifier of the processp.
Return
Returns 0 if permission is granted.
- intsecurity_task_getsid(structtask_struct*p)¶
Check if getting the session id is allowed
Parameters
structtask_struct*p
task
Description
Check permission before getting the session identifier of the processp.
Return
Returns 0 if permission is granted.
- intsecurity_task_setnice(structtask_struct*p,intnice)¶
Check if setting a task’s nice value is allowed
Parameters
structtask_struct*p
target task
intnice
nice value
Description
Check permission before setting the nice value ofp tonice.
Return
Returns 0 if permission is granted.
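To make the shape of these task hooks concrete, the sketch below shows how a hypothetical LSM could supply its own task_setnice handler and register it with LSM_HOOK_INIT(); returning 0 grants permission and a negative errno denies it. All example_* names and the policy itself are invented, and the registration call shown (security_add_hooks()) has changed signature across kernel versions, so treat this as an illustrative sketch rather than a drop-in module.

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/lsm_hooks.h>
#include <linux/sched.h>
#include <linux/errno.h>

/* Hypothetical policy: never allow lowering another task's nice value. */
static int example_task_setnice(struct task_struct *p, int nice)
{
        if (p != current && nice < task_nice(p))
                return -EPERM;
        return 0;               /* 0 grants permission */
}

static struct security_hook_list example_hooks[] = {
        LSM_HOOK_INIT(task_setnice, example_task_setnice),
};

static int __init example_lsm_init(void)
{
        /* The third argument differs between kernel versions (LSM name
         * string vs. struct lsm_id pointer); adjust to the target kernel. */
        security_add_hooks(example_hooks, ARRAY_SIZE(example_hooks), "example");
        return 0;
}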
- intsecurity_task_setioprio(structtask_struct*p,intioprio)¶
Check if setting a task’s ioprio is allowed
Parameters
structtask_struct*p
target task
intioprio
ioprio value
Description
Check permission before setting the ioprio value ofp toioprio.
Return
Returns 0 if permission is granted.
- intsecurity_task_getioprio(structtask_struct*p)¶
Check if getting a task’s ioprio is allowed
Parameters
structtask_struct*p
task
Description
Check permission before getting the ioprio value ofp.
Return
Returns 0 if permission is granted.
- intsecurity_task_prlimit(conststructcred*cred,conststructcred*tcred,unsignedintflags)¶
Check if get/setting resources limits is allowed
Parameters
conststructcred*cred
current task credentials
conststructcred*tcred
target task credentials
unsignedintflags
LSM_PRLIMIT_* flag bits indicating a get/set/both
Description
Check permission before getting and/or setting the resource limits ofanother task.
Return
Returns 0 if permission is granted.
- intsecurity_task_setrlimit(structtask_struct*p,unsignedintresource,structrlimit*new_rlim)¶
Check if setting a new rlimit value is allowed
Parameters
structtask_struct*p
target task’s group leader
unsignedintresource
resource whose limit is being set
structrlimit*new_rlim
new resource limit
Description
Check permission before setting the resource limits of processp forresource tonew_rlim. The old resource limit values can be examined bydereferencing (p->signal->rlim + resource).
Return
Returns 0 if permission is granted.
- intsecurity_task_setscheduler(structtask_struct*p)¶
Check if setting sched policy/param is allowed
Parameters
structtask_struct*p
target task
Description
Check permission before setting scheduling policy and/or parameters ofprocessp.
Return
Returns 0 if permission is granted.
- intsecurity_task_getscheduler(structtask_struct*p)¶
Check if getting scheduling info is allowed
Parameters
structtask_struct*p
target task
Description
Check permission before obtaining scheduling information for processp.
Return
Returns 0 if permission is granted.
- intsecurity_task_movememory(structtask_struct*p)¶
Check if moving memory is allowed
Parameters
structtask_struct*p
task
Description
Check permission before moving memory owned by processp.
Return
Returns 0 if permission is granted.
- intsecurity_task_kill(structtask_struct*p,structkernel_siginfo*info,intsig,conststructcred*cred)¶
Check if sending a signal is allowed
Parameters
structtask_struct*p
target process
structkernel_siginfo*info
signal information
intsig
signal value
conststructcred*cred
credentials of the signal sender, NULL ifcurrent
Description
Check permission before sending signalsig top.info can be NULL, theconstant 1, or a pointer to a kernel_siginfo structure. Ifinfo is 1 orSI_FROMKERNEL(info) is true, then the signal should be viewed as coming fromthe kernel and should typically be permitted. SIGIO signals are handledseparately by the send_sigiotask hook in file_security_ops.
Return
Returns 0 if permission is granted.
- intsecurity_task_prctl(intoption,unsignedlongarg2,unsignedlongarg3,unsignedlongarg4,unsignedlongarg5)¶
Check if a prctl op is allowed
Parameters
intoption
operation
unsignedlongarg2
argument
unsignedlongarg3
argument
unsignedlongarg4
argument
unsignedlongarg5
argument
Description
Check permission before performing a process control operation on thecurrent process.
Return
- Return -ENOSYS if no-one wanted to handle this op, any other value
to cause prctl() to return immediately with that value.
- voidsecurity_task_to_inode(structtask_struct*p,structinode*inode)¶
Set the security attributes of a task’s inode
Parameters
structtask_struct*p
task
structinode*inode
inode
Description
Set the security attributes for an inode based on an associated task’ssecurity attributes, e.g. for /proc/pid inodes.
Parameters
conststructcred*cred
prepared creds
Description
Check permission prior to creating a new user namespace.
Return
Returns 0 if successful, otherwise < 0 error code.
- intsecurity_ipc_permission(structkern_ipc_perm*ipcp,shortflag)¶
Check if sysv ipc access is allowed
Parameters
structkern_ipc_perm*ipcp
ipc permission structure
shortflag
requested permissions
Description
Check permissions for access to IPC.
Return
Returns 0 if permission is granted.
- voidsecurity_ipc_getlsmprop(structkern_ipc_perm*ipcp,structlsm_prop*prop)¶
Get the sysv ipc object LSM data
Parameters
structkern_ipc_perm*ipcp
ipc permission structure
structlsm_prop*prop
pointer to lsm information
Description
Get the lsm information associated with the ipc object.
- intsecurity_msg_msg_alloc(structmsg_msg*msg)¶
Allocate a sysv ipc message LSM blob
Parameters
structmsg_msg*msg
message structure
Description
Allocate and attach a security structure to the msg->security field. Thesecurity field is initialized to NULL when the structure is first created.
Return
Return 0 if operation was successful and permission is granted.
- voidsecurity_msg_msg_free(structmsg_msg*msg)¶
Free a sysv ipc message LSM blob
Parameters
structmsg_msg*msg
message structure
Description
Deallocate the security structure for this message.
- intsecurity_msg_queue_alloc(structkern_ipc_perm*msq)¶
Allocate a sysv ipc msg queue LSM blob
Parameters
structkern_ipc_perm*msq
sysv ipc permission structure
Description
Allocate and attach a security structure tomsg. The security field isinitialized to NULL when the structure is first created.
Return
Returns 0 if operation was successful and permission is granted.
- voidsecurity_msg_queue_free(structkern_ipc_perm*msq)¶
Free a sysv ipc msg queue LSM blob
Parameters
structkern_ipc_perm*msq
sysv ipc permission structure
Description
Deallocate security fieldperm->security for the message queue.
- intsecurity_msg_queue_associate(structkern_ipc_perm*msq,intmsqflg)¶
Check if a msg queue operation is allowed
Parameters
structkern_ipc_perm*msq
sysv ipc permission structure
intmsqflg
operation flags
Description
Check permission when a message queue is requested through the msgget systemcall. This hook is only called when returning the message queue identifierfor an existing message queue, not when a new message queue is created.
Return
Return 0 if permission is granted.
- intsecurity_msg_queue_msgctl(structkern_ipc_perm*msq,intcmd)¶
Check if a msg queue operation is allowed
Parameters
structkern_ipc_perm*msq
sysv ipc permission structure
intcmd
operation
Description
Check permission when a message control operation specified bycmd is to beperformed on the message queue with permissions.
Return
Returns 0 if permission is granted.
- intsecurity_msg_queue_msgsnd(structkern_ipc_perm*msq,structmsg_msg*msg,intmsqflg)¶
Check if sending a sysv ipc message is allowed
Parameters
structkern_ipc_perm*msq
sysv ipc permission structure
structmsg_msg*msg
message
intmsqflg
operation flags
Description
Check permission before a message,msg, is enqueued on the message queuewith permissions specified inmsq.
Return
Returns 0 if permission is granted.
- intsecurity_msg_queue_msgrcv(structkern_ipc_perm*msq,structmsg_msg*msg,structtask_struct*target,longtype,intmode)¶
Check if receiving a sysv ipc msg is allowed
Parameters
structkern_ipc_perm*msq
sysv ipc permission structure
structmsg_msg*msg
message
structtask_struct*target
target task
longtype
type of message requested
intmode
operation flags
Description
Check permission before a message,msg, is removed from the message queue.Thetarget task structure contains a pointer to the process that will bereceiving the message (not equal to the current process when inline receivesare being performed).
Return
Returns 0 if permission is granted.
- intsecurity_shm_alloc(structkern_ipc_perm*shp)¶
Allocate a sysv shm LSM blob
Parameters
structkern_ipc_perm*shp
sysv ipc permission structure
Description
Allocate and attach a security structure to theshp security field. Thesecurity field is initialized to NULL when the structure is first created.
Return
Returns 0 if operation was successful and permission is granted.
- voidsecurity_shm_free(structkern_ipc_perm*shp)¶
Free a sysv shm LSM blob
Parameters
structkern_ipc_perm*shp
sysv ipc permission structure
Description
Deallocate the security structureperm->security for the memory segment.
- intsecurity_shm_associate(structkern_ipc_perm*shp,intshmflg)¶
Check if a sysv shm operation is allowed
Parameters
structkern_ipc_perm*shp
sysv ipc permission structure
intshmflg
operation flags
Description
Check permission when a shared memory region is requested through the shmgetsystem call. This hook is only called when returning the shared memoryregion identifier for an existing region, not when a new shared memoryregion is created.
Return
Returns 0 if permission is granted.
- intsecurity_shm_shmctl(structkern_ipc_perm*shp,intcmd)¶
Check if a sysv shm operation is allowed
Parameters
structkern_ipc_perm*shp
sysv ipc permission structure
intcmd
operation
Description
Check permission when a shared memory control operation specified bycmd isto be performed on the shared memory region with permissions inshp.
Return
Return 0 if permission is granted.
- intsecurity_shm_shmat(structkern_ipc_perm*shp,char__user*shmaddr,intshmflg)¶
Check if a sysv shm attach operation is allowed
Parameters
structkern_ipc_perm*shp
sysv ipc permission structure
char__user*shmaddr
address of memory region to attach
intshmflg
operation flags
Description
Check permissions prior to allowing the shmat system call to attach theshared memory segment with permissionsshp to the data segment of thecalling process. The attaching address is specified byshmaddr.
Return
Returns 0 if permission is granted.
- intsecurity_sem_alloc(structkern_ipc_perm*sma)¶
Allocate a sysv semaphore LSM blob
Parameters
structkern_ipc_perm*sma
sysv ipc permission structure
Description
Allocate and attach a security structure to thesma security field. Thesecurity field is initialized to NULL when the structure is first created.
Return
Returns 0 if operation was successful and permission is granted.
- voidsecurity_sem_free(structkern_ipc_perm*sma)¶
Free a sysv semaphore LSM blob
Parameters
structkern_ipc_perm*sma
sysv ipc permission structure
Description
Deallocate security structuresma->security for the semaphore.
- intsecurity_sem_associate(structkern_ipc_perm*sma,intsemflg)¶
Check if a sysv semaphore operation is allowed
Parameters
structkern_ipc_perm*sma
sysv ipc permission structure
intsemflg
operation flags
Description
Check permission when a semaphore is requested through the semget systemcall. This hook is only called when returning the semaphore identifier foran existing semaphore, not when a new one must be created.
Return
Returns 0 if permission is granted.
- intsecurity_sem_semctl(structkern_ipc_perm*sma,intcmd)¶
Check if a sysv semaphore operation is allowed
Parameters
structkern_ipc_perm*sma
sysv ipc permission structure
intcmd
operation
Description
Check permission when a semaphore operation specified bycmd is to beperformed on the semaphore.
Return
Returns 0 if permission is granted.
- intsecurity_sem_semop(structkern_ipc_perm*sma,structsembuf*sops,unsignednsops,intalter)¶
Check if a sysv semaphore operation is allowed
Parameters
structkern_ipc_perm*sma
sysv ipc permission structure
structsembuf*sops
operations to perform
unsignednsops
number of operations
intalter
flag indicating changes will be made
Description
Check permissions before performing operations on members of the semaphoreset. If thealter flag is nonzero, the semaphore set may be modified.
Return
Returns 0 if permission is granted.
- intsecurity_getselfattr(unsignedintattr,structlsm_ctx__user*uctx,u32__user*size,u32flags)¶
Read an LSM attribute of the current process.
Parameters
unsignedintattr
which attribute to return
structlsm_ctx__user*uctx
the user-space destination for the information, or NULL
u32__user*size
pointer to the size of space available to receive the data
u32flags
special handling options. LSM_FLAG_SINGLE indicates that onlyattributes associated with the LSM identified in the passedctx bereported.
Description
A NULL value foructx can be used to get both the number of attributesand the size of the data.
Returns the number of attributes found on success, negative valueon error.size is reset to the total size of the data.Ifsize is insufficient to contain the data -E2BIG is returned.
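From userspace this hook backs the lsm_get_self_attr(2) system call. As a rough sketch, and assuming recent kernel headers that provide __NR_lsm_get_self_attr and <linux/lsm.h>, a process can probe how many attributes it has and how much buffer space they need by passing a NULL ctx pointer:

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/types.h>
#include <linux/lsm.h>          /* LSM_ATTR_CURRENT, struct lsm_ctx */

int main(void)
{
        __u32 size = 0;

        /* With ctx == NULL the call reports the attribute count and
         * stores the required buffer size in 'size'. */
        long n = syscall(__NR_lsm_get_self_attr, LSM_ATTR_CURRENT,
                         NULL, &size, 0);
        if (n < 0) {
                perror("lsm_get_self_attr");
                return 1;
        }
        printf("%ld attribute(s), %u bytes needed\n", n, size);
        return 0;
}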
- intsecurity_setselfattr(unsignedintattr,structlsm_ctx__user*uctx,u32size,u32flags)¶
Set an LSM attribute on the current process.
Parameters
unsignedintattr
which attribute to set
structlsm_ctx__user*uctx
the user-space source for the information
u32size
the size of the data
u32flags
reserved for future use, must be 0
Description
Set an LSM attribute for the current process. The LSM, attributeand new value are included inuctx.
Returns 0 on success, -EINVAL if the input is inconsistent, -EFAULTif the user buffer is inaccessible, E2BIG if size is too big, or anLSM specific failure.
- intsecurity_getprocattr(structtask_struct*p,intlsmid,constchar*name,char**value)¶
Read an attribute for a task
Parameters
structtask_struct*p
the task
intlsmid
LSM identification
constchar*name
attribute name
char**value
attribute value
Description
Read attributename for taskp and store it intovalue if allowed.
Return
Returns the length ofvalue on success, a negative value otherwise.
- intsecurity_setprocattr(intlsmid,constchar*name,void*value,size_tsize)¶
Set an attribute for a task
Parameters
intlsmid
LSM identification
constchar*name
attribute name
void*value
attribute value
size_tsize
attribute value size
Description
Write (set) the current task’s attributename tovalue, sizesize ifallowed.
Return
Returns bytes written on success, a negative value otherwise.
- intsecurity_post_notification(conststructcred*w_cred,conststructcred*cred,structwatch_notification*n)¶
Check if a watch notification can be posted
Parameters
conststructcred*w_cred
credentials of the task that set the watch
conststructcred*cred
credentials of the task which triggered the watch
structwatch_notification*n
the notification
Description
Check to see if a watch notification can be posted to a particular queue.
Return
Returns 0 if permission is granted.
Parameters
structkey*key
the key to watch
Description
Check to see if a process is allowed to watch for event notifications froma key or keyring.
Return
Returns 0 if permission is granted.
- intsecurity_netlink_send(structsock*sk,structsk_buff*skb)¶
Save info and check if netlink sending is allowed
Parameters
structsock*sk
sending socket
structsk_buff*skb
netlink message
Description
Save security information for a netlink message so that permission checkingcan be performed when the message is processed. The security informationcan be saved using the eff_cap field of the netlink_skb_parms structure.Also may be used to provide fine grained control over message transmission.
Return
- Returns 0 if the information was successfully saved and message is
allowed to be transmitted.
- intsecurity_socket_create(intfamily,inttype,intprotocol,intkern)¶
Check if creating a new socket is allowed
Parameters
intfamily
protocol family
inttype
communications type
intprotocol
requested protocol
intkern
set to 1 if a kernel socket is requested
Description
Check permissions prior to creating a new socket.
Return
Returns 0 if permission is granted.
- intsecurity_socket_post_create(structsocket*sock,intfamily,inttype,intprotocol,intkern)¶
Initialize a newly created socket
Parameters
structsocket*sock
socket
intfamily
protocol family
inttype
communications type
intprotocol
requested protocol
intkern
set to 1 if a kernel socket is requested
Description
This hook allows a module to update or allocate a per-socket securitystructure. Note that the security field was not added directly to the socketstructure, but rather, the socket security information is stored in theassociated inode. Typically, the inode alloc_security hook will allocateand attach security information to SOCK_INODE(sock)->i_security. This hookmay be used to update the SOCK_INODE(sock)->i_security field with additionalinformation that wasn’t available when the inode was allocated.
Return
Returns 0 if permission is granted.
- intsecurity_socket_bind(structsocket*sock,structsockaddr*address,intaddrlen)¶
Check if a socket bind operation is allowed
Parameters
structsocket*sock
socket
structsockaddr*address
requested bind address
intaddrlen
length of address
Description
Check permission before socket protocol layer bind operation is performedand the socketsock is bound to the address specified in theaddressparameter.
Return
Returns 0 if permission is granted.
- intsecurity_socket_connect(structsocket*sock,structsockaddr*address,intaddrlen)¶
Check if a socket connect operation is allowed
Parameters
structsocket*sock
socket
structsockaddr*address
address of remote connection point
intaddrlen
length of address
Description
Check permission before socket protocol layer connect operation attempts toconnect socketsock to a remote address,address.
Return
Returns 0 if permission is granted.
Parameters
structsocket*sock
socket
intbacklog
connection queue size
Description
Check permission before socket protocol layer listen operation.
Return
Returns 0 if permission is granted.
- intsecurity_socket_accept(structsocket*sock,structsocket*newsock)¶
Check if a socket is allowed to accept connections
Parameters
structsocket*sock
listening socket
structsocket*newsock
newly created connection socket
Description
Check permission before accepting a new connection. Note that the newsocket,newsock, has been created and some information copied to it, butthe accept operation has not actually been performed.
Return
Returns 0 if permission is granted.
- intsecurity_socket_sendmsg(structsocket*sock,structmsghdr*msg,intsize)¶
Check if sending a message is allowed
Parameters
structsocket*sock
sending socket
structmsghdr*msg
message to send
intsize
size of message
Description
Check permission before transmitting a message to another socket.
Return
Returns 0 if permission is granted.
- intsecurity_socket_recvmsg(structsocket*sock,structmsghdr*msg,intsize,intflags)¶
Check if receiving a message is allowed
Parameters
structsocket*sock
receiving socket
structmsghdr*msg
message to receive
intsize
size of message
intflags
operational flags
Description
Check permission before receiving a message from a socket.
Return
Returns 0 if permission is granted.
Parameters
structsocket*sock
socket
Description
Check permission before reading the local address (name) of the socketobject.
Return
Returns 0 if permission is granted.
Parameters
structsocket*sock
socket
Description
Check permission before the remote address (name) of the socket object is retrieved.
Return
Returns 0 if permission is granted.
- intsecurity_socket_getsockopt(structsocket*sock,intlevel,intoptname)¶
Check if reading a socket option is allowed
Parameters
structsocket*sock
socket
intlevel
option’s protocol level
intoptname
option name
Description
Check permissions before retrieving the options associated with socketsock.
Return
Returns 0 if permission is granted.
- intsecurity_socket_setsockopt(structsocket*sock,intlevel,intoptname)¶
Check if setting a socket option is allowed
Parameters
structsocket*sock
socket
intlevel
option’s protocol level
intoptname
option name
Description
Check permissions before setting the options associated with socketsock.
Return
Returns 0 if permission is granted.
Parameters
structsocket*sock
socket
inthow
flag indicating how sends and receives are handled
Description
Checks permission before all or part of a connection on the socketsock isshut down.
Return
Returns 0 if permission is granted.
- intsecurity_socket_getpeersec_stream(structsocket*sock,sockptr_toptval,sockptr_toptlen,unsignedintlen)¶
Get the remote peer label
Parameters
structsocket*sock
socket
sockptr_toptval
destination buffer
sockptr_toptlen
size of peer label copied into the buffer
unsignedintlen
maximum size of the destination buffer
Description
This hook allows the security module to provide peer socket security statefor unix or connected tcp sockets to userspace via getsockopt SO_GETPEERSEC.For tcp sockets this can be meaningful if the socket is associated with anipsec SA.
Return
- Returns 0 if all is well, otherwise, typical getsockopt return
values.
Parameters
structsock*sock
the sock that needs a blob
gfp_tgfp
allocation mode
Description
Allocate the sock blob for all the modules
Returns 0, or -ENOMEM if memory can’t be allocated.
- intsecurity_sk_alloc(structsock*sk,intfamily,gfp_tpriority)¶
Allocate and initialize a sock’s LSM blob
Parameters
structsock*sk
sock
intfamily
protocol family
gfp_tpriority
gfp flags
Description
Allocate and attach a security structure to the sk->sk_security field, whichis used to copy security attributes between local stream sockets.
Return
Returns 0 on success, error on failure.
Parameters
structsock*sk
sock
Description
Deallocate security structure.
- voidsecurity_inet_csk_clone(structsock*newsk,conststructrequest_sock*req)¶
Set new sock LSM state based on request_sock
Parameters
structsock*newsk
new sock
conststructrequest_sock*req
connection request_sock
Description
Set the LSM state of the new sock, newsk, using the LSM state from req.
- intsecurity_mptcp_add_subflow(structsock*sk,structsock*ssk)¶
Inherit the LSM label from the MPTCP socket
Parameters
structsock*sk
the owning MPTCP socket
structsock*ssk
the new subflow
Description
Update the labeling for the given MPTCP subflow, to match the one of theowning MPTCP socket. This hook has to be called after the socket creation andinitialization via thesecurity_socket_create()
andsecurity_socket_post_create()
LSM hooks.
Return
Returns 0 on success or a negative error code on failure.
- intsecurity_xfrm_policy_clone(structxfrm_sec_ctx*old_ctx,structxfrm_sec_ctx**new_ctxp)¶
Clone xfrm policy LSM state
Parameters
structxfrm_sec_ctx*old_ctx
xfrm security context
structxfrm_sec_ctx**new_ctxp
target xfrm security context
Description
Allocate a security structure in new_ctxp that contains the information fromthe old_ctx structure.
Return
Return 0 if operation was successful.
- intsecurity_xfrm_policy_delete(structxfrm_sec_ctx*ctx)¶
Check if deleting a xfrm policy is allowed
Parameters
structxfrm_sec_ctx*ctx
xfrm security context
Description
Authorize deletion of a SPD entry.
Return
Returns 0 if permission is granted.
- intsecurity_xfrm_state_alloc_acquire(structxfrm_state*x,structxfrm_sec_ctx*polsec,u32secid)¶
Allocate a xfrm state LSM blob
Parameters
structxfrm_state*x
xfrm state being added to the SAD
structxfrm_sec_ctx*polsec
associated policy’s security context
u32secid
secid from the flow
Description
Allocate a security structure to the x->security field; the security fieldis initialized to NULL when the xfrm_state is allocated. Set the context tocorrespond to secid.
Return
Returns 0 if operation was successful.
- voidsecurity_xfrm_state_free(structxfrm_state*x)¶
Free a xfrm state
Parameters
structxfrm_state*x
xfrm state
Description
Deallocate x->security.
- intsecurity_xfrm_policy_lookup(structxfrm_sec_ctx*ctx,u32fl_secid)¶
Check if using a xfrm policy is allowed
Parameters
structxfrm_sec_ctx*ctx
target xfrm security context
u32fl_secid
flow secid used to authorize access
Description
Check permission when a flow selects a xfrm_policy for processing XFRMs on apacket. The hook is called when selecting either a per-socket policy or ageneric xfrm policy.
Return
- Return 0 if permission is granted, -ESRCH otherwise, or -errno on
other errors.
- intsecurity_xfrm_state_pol_flow_match(structxfrm_state*x,structxfrm_policy*xp,conststructflowi_common*flic)¶
Check for a xfrm match
Parameters
structxfrm_state*x
xfrm state to match
structxfrm_policy*xp
xfrm policy to check for a match
conststructflowi_common*flic
flow to check for a match.
Description
Checkxp andflic for a match withx.
Return
Returns 1 if there is a match.
Parameters
structsk_buff*skb
xfrm packet
u32*secid
secid
Description
Decode the packet inskb and return the security label insecid.
Return
Return 0 if all xfrms used have the same secid.
- intsecurity_key_alloc(structkey*key,conststructcred*cred,unsignedlongflags)¶
Allocate and initialize a kernel key LSM blob
Parameters
structkey*key
key
conststructcred*cred
credentials
unsignedlongflags
allocation flags
Description
Permit allocation of a key and assign security data. Note that key does nothave a serial number assigned at this point.
Return
Return 0 if permission is granted, -ve error otherwise.
Parameters
structkey*key
key
Description
Notification of destruction; free security data.
- intsecurity_key_permission(key_ref_tkey_ref,conststructcred*cred,enumkey_need_permneed_perm)¶
Check if a kernel key operation is allowed
Parameters
key_ref_tkey_ref
key reference
conststructcred*cred
credentials of actor requesting access
enumkey_need_permneed_perm
requested permissions
Description
See whether a specific operational right is granted to a process on a key.
Return
Return 0 if permission is granted, -ve error otherwise.
Parameters
structkey*key
key
char**buffer
security label buffer
Description
Get a textual representation of the security context attached to a key forthe purposes of honouring KEYCTL_GETSECURITY. This function allocates thestorage for the NUL-terminated string and the caller should free it.
Return
- Returns the length ofbuffer (including terminating NUL) or -ve if
an error occurs. May also return 0 (and a NULL buffer pointer) ifthere is no security label assigned to the key.
- voidsecurity_key_post_create_or_update(structkey*keyring,structkey*key,constvoid*payload,size_tpayload_len,unsignedlongflags,boolcreate)¶
Notification of key create or update
Parameters
structkey*keyring
keyring to which the key is linked to
structkey*key
created or updated key
constvoid*payload
data used to instantiate or update the key
size_tpayload_len
length of payload
unsignedlongflags
key flags
boolcreate
flag indicating whether the key was created or updated
Description
Notify the caller of a key creation or update.
- intsecurity_audit_rule_init(u32field,u32op,char*rulestr,void**lsmrule,gfp_tgfp)¶
Allocate and init an LSM audit rule struct
Parameters
u32field
audit action
u32op
rule operator
char*rulestr
rule context
void**lsmrule
receive buffer for audit rule struct
gfp_tgfp
GFP flag used for kmalloc
Description
Allocate and initialize an LSM audit rule structure.
Return
- Return 0 iflsmrule has been successfully set, -EINVAL in case of
an invalid rule.
- intsecurity_audit_rule_known(structaudit_krule*krule)¶
Check if an audit rule contains LSM fields
Parameters
structaudit_krule*krule
audit rule
Description
Specifies whether givenkrule contains any fields related to the currentLSM.
Return
Returns 1 in case of relation found, 0 otherwise.
- voidsecurity_audit_rule_free(void*lsmrule)¶
Free an LSM audit rule struct
Parameters
void*lsmrule
audit rule struct
Description
Deallocate the LSM audit rule structure previously allocated byaudit_rule_init().
- intsecurity_audit_rule_match(structlsm_prop*prop,u32field,u32op,void*lsmrule)¶
Check if a label matches an audit rule
Parameters
structlsm_prop*prop
security label
u32field
LSM audit field
u32op
matching operator
void*lsmrule
audit rule
Description
Determine if the given secid matches a rule previously approved by security_audit_rule_known().
Return
- Returns 1 if secid matches the rule, 0 if it does not, -ERRNO on
failure.
- intsecurity_bpf(intcmd,unionbpf_attr*attr,unsignedintsize,boolkernel)¶
Check if the bpf syscall operation is allowed
Parameters
intcmd
command
unionbpf_attr*attr
bpf attribute
unsignedintsize
size
boolkernel
whether or not call originated from kernel
Description
Do an initial check for all bpf syscalls after the attribute is copied into the kernel. The actual security module can implement its own rules to check the specific cmd it needs.
Return
Returns 0 if permission is granted.
- intsecurity_bpf_map(structbpf_map*map,fmode_tfmode)¶
Check if access to a bpf map is allowed
Parameters
structbpf_map*map
bpf map
fmode_tfmode
mode
Description
Do a check when the kernel generates and returns a file descriptor for eBPFmaps.
Return
Returns 0 if permission is granted.
- intsecurity_bpf_prog(structbpf_prog*prog)¶
Check if access to a bpf program is allowed
Parameters
structbpf_prog*prog
bpf program
Description
Do a check when the kernel generates and returns a file descriptor for eBPFprograms.
Return
Returns 0 if permission is granted.
- intsecurity_bpf_map_create(structbpf_map*map,unionbpf_attr*attr,structbpf_token*token,boolkernel)¶
Check if BPF map creation is allowed
Parameters
structbpf_map*map
BPF map object
unionbpf_attr*attr
BPF syscall attributes used to create BPF map
structbpf_token*token
BPF token used to grant user access
boolkernel
whether or not call originated from kernel
Description
Do a check when the kernel creates a new BPF map. This is also thepoint where LSM blob is allocated for LSMs that need them.
Return
Returns 0 on success, error on failure.
- intsecurity_bpf_prog_load(structbpf_prog*prog,unionbpf_attr*attr,structbpf_token*token,boolkernel)¶
Check if loading of BPF program is allowed
Parameters
structbpf_prog*prog
BPF program object
unionbpf_attr*attr
BPF syscall attributes used to create BPF program
structbpf_token*token
BPF token used to grant user access to BPF subsystem
boolkernel
whether or not call originated from kernel
Description
Perform an access control check when the kernel loads a BPF program andallocates associated BPF program object. This hook is also responsible forallocating any required LSM state for the BPF program.
Return
Returns 0 on success, error on failure.
- intsecurity_bpf_token_create(structbpf_token*token,unionbpf_attr*attr,conststructpath*path)¶
Check if creating of BPF token is allowed
Parameters
structbpf_token*token
BPF token object
unionbpf_attr*attr
BPF syscall attributes used to create BPF token
conststructpath*path
path pointing to BPF FS mount point from which BPF token is created
Description
Do a check when the kernel instantiates a new BPF token object from BPF FSinstance. This is also the point where LSM blob can be allocated for LSMs.
Return
Returns 0 on success, error on failure.
- intsecurity_bpf_token_cmd(conststructbpf_token*token,enumbpf_cmdcmd)¶
Check if BPF token is allowed to delegate requested BPF syscall command
Parameters
conststructbpf_token*token
BPF token object
enumbpf_cmdcmd
BPF syscall command requested to be delegated by BPF token
Description
Do a check when the kernel decides whether provided BPF token should allowdelegation of requested BPF syscall command.
Return
Returns 0 on success, error on failure.
- intsecurity_bpf_token_capable(conststructbpf_token*token,intcap)¶
Check if BPF token is allowed to delegate requested BPF-related capability
Parameters
conststructbpf_token*token
BPF token object
intcap
capabilities requested to be delegated by BPF token
Description
Do a check when the kernel decides whether provided BPF token should allowdelegation of requested BPF-related capabilities.
Return
Returns 0 on success, error on failure.
- voidsecurity_bpf_map_free(structbpf_map*map)¶
Free a bpf map’s LSM blob
Parameters
structbpf_map*map
bpf map
Description
Clean up the security information stored inside bpf map.
- voidsecurity_bpf_prog_free(structbpf_prog*prog)¶
Free a BPF program’s LSM blob
Parameters
structbpf_prog*prog
BPF program struct
Description
Clean up the security information stored inside BPF program.
- voidsecurity_bpf_token_free(structbpf_token*token)¶
Free a BPF token’s LSM blob
Parameters
structbpf_token*token
BPF token struct
Description
Clean up the security information stored inside BPF token.
- intsecurity_perf_event_open(inttype)¶
Check if a perf event open is allowed
Parameters
inttype
type of event
Description
Check whether thetype of perf_event_open syscall is allowed.
Return
Returns 0 if permission is granted.
- intsecurity_perf_event_alloc(structperf_event*event)¶
Allocate a perf event LSM blob
Parameters
structperf_event*event
perf event
Description
Allocate and save perf_event security info.
Return
Returns 0 on success, error on failure.
- voidsecurity_perf_event_free(structperf_event*event)¶
Free a perf event LSM blob
Parameters
structperf_event*event
perf event
Description
Release (free) perf_event security info.
- intsecurity_perf_event_read(structperf_event*event)¶
Check if reading a perf event label is allowed
Parameters
structperf_event*event
perf event
Description
Read perf_event security info if allowed.
Return
Returns 0 if permission is granted.
- intsecurity_perf_event_write(structperf_event*event)¶
Check if writing a perf event label is allowed
Parameters
structperf_event*event
perf event
Description
Write perf_event security info if allowed.
Return
Returns 0 if permission is granted.
- intsecurity_uring_override_creds(conststructcred*new)¶
Check if overriding creds is allowed
Parameters
conststructcred*new
new credentials
Description
Check if the current task, executing an io_uring operation, is allowed to override its credentials with new.
Return
Returns 0 if permission is granted.
- intsecurity_uring_sqpoll(void)¶
Check if IORING_SETUP_SQPOLL is allowed
Parameters
void
no arguments
Description
Check whether the current task is allowed to spawn an io_uring polling thread (IORING_SETUP_SQPOLL).
Return
Returns 0 if permission is granted.
- intsecurity_uring_cmd(structio_uring_cmd*ioucmd)¶
Check if a io_uring passthrough command is allowed
Parameters
structio_uring_cmd*ioucmd
command
Description
Check whether the file_operations uring_cmd is allowed to run.
Return
Returns 0 if permission is granted.
- intsecurity_uring_allowed(void)¶
Check if io_uring_setup() is allowed
Parameters
void
no arguments
Description
Check whether the current task is allowed to call io_uring_setup().
Return
Returns 0 if permission is granted.
- voidsecurity_initramfs_populated(void)¶
Notify LSMs that initramfs has been loaded
Parameters
void
no arguments
Description
Tells the LSMs the initramfs has been unpacked into the rootfs.
- structdentry*securityfs_create_file(constchar*name,umode_tmode,structdentry*parent,void*data,conststructfile_operations*fops)¶
create a file in the securityfs filesystem
Parameters
constchar*name
a pointer to a string containing the name of the file to create.
umode_tmode
the permission that the file should have
structdentry*parent
a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the securityfs filesystem.
void*data
a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open() call.
conststructfile_operations*fops
a pointer to a struct file_operations that should be used forthis file.
Description
This function creates a file in securityfs with the givenname.
This function returns a pointer to a dentry if it succeeds. Thispointer must be passed to thesecurityfs_remove()
function when the file isto be removed (no automatic cleanup happens if your module is unloaded,you are responsible here). If an error occurs, the function will returnthe error value (via ERR_PTR).
If securityfs is not enabled in the kernel, the value-ENODEV
isreturned.
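As an illustrative sketch of how securityfs_create_file() is typically combined with securityfs_create_dir() and securityfs_remove() (both documented below), a hypothetical module could expose a read-only status file like this (all example_* names are invented):

#include <linux/init.h>
#include <linux/security.h>
#include <linux/seq_file.h>
#include <linux/fs.h>
#include <linux/err.h>

static int example_status_show(struct seq_file *m, void *v)
{
        seq_puts(m, "enabled\n");
        return 0;
}
DEFINE_SHOW_ATTRIBUTE(example_status);

static struct dentry *example_dir, *example_file;

static int __init example_fs_init(void)
{
        example_dir = securityfs_create_dir("example", NULL);
        if (IS_ERR(example_dir))
                return PTR_ERR(example_dir);

        example_file = securityfs_create_file("status", 0444, example_dir,
                                              NULL, &example_status_fops);
        if (IS_ERR(example_file)) {
                securityfs_remove(example_dir);
                return PTR_ERR(example_file);
        }
        return 0;
}

static void __exit example_fs_exit(void)
{
        /* No automatic cleanup happens on unload; remove each dentry. */
        securityfs_remove(example_file);
        securityfs_remove(example_dir);
}

Note the explicit securityfs_remove() calls on exit: as stated above, the caller is responsible for tearing down what it created.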
- structdentry*securityfs_create_dir(constchar*name,structdentry*parent)¶
create a directory in the securityfs filesystem
Parameters
constchar*name
a pointer to a string containing the name of the directory tocreate.
structdentry*parent
a pointer to the parent dentry for this file. This should be adirectory dentry if set. If this parameter is
NULL
, then thedirectory will be created in the root of the securityfs filesystem.
Description
This function creates a directory in securityfs with the givenname.
This function returns a pointer to a dentry if it succeeds. Thispointer must be passed to thesecurityfs_remove()
function when the file isto be removed (no automatic cleanup happens if your module is unloaded,you are responsible here). If an error occurs, the function will returnthe error value (via ERR_PTR).
If securityfs is not enabled in the kernel, the value-ENODEV
isreturned.
- structdentry*securityfs_create_symlink(constchar*name,structdentry*parent,constchar*target,conststructinode_operations*iops)¶
create a symlink in the securityfs filesystem
Parameters
constchar*name
a pointer to a string containing the name of the symlink tocreate.
structdentry*parent
a pointer to the parent dentry for the symlink. This should be a directory dentry if set. If this parameter is NULL, then the symlink will be created in the root of the securityfs filesystem.
constchar*target
a pointer to a string containing the name of the symlink’s target. If this parameter is NULL, then the iops parameter needs to be set up to handle .readlink and .get_link inode_operations.
conststructinode_operations*iops
a pointer to the struct inode_operations to use for the symlink. If this parameter is NULL, then the default simple_symlink_inode_operations will be used.
Description
This function creates a symlink in securityfs with the givenname.
This function returns a pointer to a dentry if it succeeds. Thispointer must be passed to thesecurityfs_remove()
function when the file isto be removed (no automatic cleanup happens if your module is unloaded,you are responsible here). If an error occurs, the function will returnthe error value (via ERR_PTR).
If securityfs is not enabled in the kernel, the value-ENODEV
isreturned.
- voidsecurityfs_remove(structdentry*dentry)¶
removes a file or directory from the securityfs filesystem
Parameters
structdentry*dentry
a pointer to a the dentry of the file or directory to be removed.
Description
This function removes a file or directory in securityfs that was previouslycreated with a call to another securityfs function (likesecurityfs_create_file()
or variants thereof.)
This function is required to be called in order for the file to beremoved. No automatic cleanup of files will happen when a module isremoved; you are responsible here.
Parameters
structdentry*dentry
a pointer to a the dentry of the file or directory to be removed.
Description
This function recursively removes a file or directory in securityfs that waspreviously created with a call to another securityfs function (likesecurityfs_create_file()
or variants thereof.)
Audit Interfaces¶
- structaudit_buffer*audit_log_start(structaudit_context*ctx,gfp_tgfp_mask,inttype)¶
obtain an audit buffer
Parameters
structaudit_context*ctx
audit_context (may be NULL)
gfp_tgfp_mask
type of allocation
inttype
audit message type
Description
Returns audit_buffer pointer on success or NULL on error.
Obtain an audit buffer. This routine does locking to obtain theaudit buffer, but then no locking is required for calls toaudit_log_*format. If the task (ctx) is a task that is currently in asyscall, then the syscall is marked as auditable and an audit recordwill be written at syscall exit. If there is no associated task, thentask context (ctx) should be NULL.
- voidaudit_log_format(structaudit_buffer*ab,constchar*fmt,...)¶
format a message into the audit buffer.
Parameters
structaudit_buffer*ab
audit_buffer
constchar*fmt
format string
...
optional parameters matchingfmt string
Description
All the work is done in audit_log_vformat.
- voidaudit_log_end(structaudit_buffer*ab)¶
end one audit record
Parameters
structaudit_buffer*ab
the audit_buffer
Description
We can not do a netlink send inside an irq context because it blocks (lastarg, flags, is not set to MSG_DONTWAIT), so the audit buffer is placed on aqueue and a kthread is scheduled to remove them from the queue outside theirq context. May be called in any context.
- voidaudit_log(structaudit_context*ctx,gfp_tgfp_mask,inttype,constchar*fmt,...)¶
Log an audit record
Parameters
structaudit_context*ctx
audit context
gfp_tgfp_mask
type of allocation
inttype
audit message type
constchar*fmt
format string to use
...
variable parameters matching the format string
Description
This is a convenience function that calls audit_log_start,audit_log_vformat, and audit_log_end. It may be calledin any context.
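As a short illustration of how these three primitives fit together, a kernel-side caller might emit a one-off record roughly as follows (the AUDIT_KERNEL_OTHER message type and the field names are only examples):

#include <linux/audit.h>
#include <linux/gfp.h>

static void example_audit_event(const char *op, int res)
{
        struct audit_buffer *ab;

        /* NULL context: the record is not tied to a syscall in progress. */
        ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_KERNEL_OTHER);
        if (!ab)
                return;         /* auditing disabled or allocation failed */
        audit_log_format(ab, "op=%s res=%d", op, res);
        audit_log_end(ab);
}

The same record could also be produced in a single call with audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL_OTHER, "op=%s res=%d", op, res).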
- int__audit_filter_op(structtask_struct*tsk,structaudit_context*ctx,structlist_head*list,structaudit_names*name,unsignedlongop)¶
common filter helper for operations (syscall/uring/etc)
Parameters
structtask_struct*tsk
associated task
structaudit_context*ctx
audit context
structlist_head*list
audit filter list
structaudit_names*name
audit_name (can be NULL)
unsignedlongop
current syscall/uring_op
Description
Run the audit filters specified in list against tsk using ctx, name, and op, as necessary; the caller is responsible for ensuring that the call is made while the RCU read lock is held. The name parameter can be NULL, but all others must be specified. Returns 1/true if the filter finds a match, 0/false if none are found.
- voidaudit_filter_uring(structtask_struct*tsk,structaudit_context*ctx)¶
apply filters to an io_uring operation
Parameters
structtask_struct*tsk
associated task
structaudit_context*ctx
audit context
- voidaudit_reset_context(structaudit_context*ctx)¶
reset an audit_context structure
Parameters
structaudit_context*ctx
the audit_context to reset
Description
All fields in the audit_context will be reset to an initial state, allreferences held by fields will be dropped, and private memory will bereleased. When this function returns the audit_context will be suitablefor reuse, so long as the passed context is not NULL or a dummy context.
- intaudit_alloc(structtask_struct*tsk)¶
allocate an audit context block for a task
Parameters
structtask_struct*tsk
task
Description
Filter on the task information and allocate a per-task audit contextif necessary. Doing so turns on system call auditing for thespecified task. This is called from copy_process, so no lock isneeded.
- voidaudit_log_uring(structaudit_context*ctx)¶
generate an AUDIT_URINGOP record
Parameters
structaudit_context*ctx
the audit context
- void__audit_free(structtask_struct*tsk)¶
free a per-task audit context
Parameters
structtask_struct*tsk
task whose audit context block to free
Description
Called from copy_process, do_exit, and the io_uring code
- voidaudit_return_fixup(structaudit_context*ctx,intsuccess,longcode)¶
fixup the return codes in the audit_context
Parameters
structaudit_context*ctx
the audit_context
intsuccess
true/false value to indicate if the operation succeeded or not
longcode
operation return code
Description
We need to fixup the return code in the audit logs if the actual returncodes are later going to be fixed by the arch specific signal handlers.
- void__audit_uring_entry(u8op)¶
prepare the kernel task’s audit context for io_uring
Parameters
u8op
the io_uring opcode
Description
This is similar to audit_syscall_entry() but is intended for use by io_uringoperations. This function should only ever be called fromaudit_uring_entry() as we rely on the audit context checking present in thatfunction.
- void__audit_uring_exit(intsuccess,longcode)¶
wrap up the kernel task’s audit context after io_uring
Parameters
intsuccess
true/false value to indicate if the operation succeeded or not
longcode
operation return code
Description
This is similar to audit_syscall_exit() but is intended for use by io_uringoperations. This function should only ever be called fromaudit_uring_exit() as we rely on the audit context checking present in thatfunction.
- void__audit_syscall_entry(intmajor,unsignedlonga1,unsignedlonga2,unsignedlonga3,unsignedlonga4)¶
fill in an audit record at syscall entry
Parameters
intmajor
major syscall type (function)
unsignedlonga1
additional syscall register 1
unsignedlonga2
additional syscall register 2
unsignedlonga3
additional syscall register 3
unsignedlonga4
additional syscall register 4
Description
Fill in audit context at syscall entry. This only happens if theaudit context was created when the task was created and the state orfilters demand the audit context be built. If the state from theper-task filter or from the per-syscall filter is AUDIT_STATE_RECORD,then the record will be written at syscall exit time (otherwise, itwill only be written if another part of the kernel requests that itbe written).
- void__audit_syscall_exit(intsuccess,longreturn_code)¶
deallocate audit context after a system call
Parameters
intsuccess
success value of the syscall
longreturn_code
return value of the syscall
Description
Tear down after system call. If the audit context has been marked as auditable (either because of the AUDIT_STATE_RECORD state from filtering, or because some other part of the kernel wrote an audit message), then write out the syscall information. In all cases, free the names stored from getname().
- structfilename*__audit_reusename(__userconstchar*uptr)¶
fill out filename with info from existing entry
Parameters
const__userchar*uptr
userland ptr to pathname
Description
Search the audit_names list for the current audit context. If there is anexisting entry with a matching “uptr” then return the filenameassociated with that audit_name. If not, return NULL.
- void__audit_getname(structfilename*name)¶
add a name to the list
Parameters
structfilename*name
name to add
Description
Add a name to the list of audit names for this context.Called from fs/namei.c:getname().
- void__audit_inode(structfilename*name,conststructdentry*dentry,unsignedintflags)¶
store the inode and device from a lookup
Parameters
structfilename*name
name being audited
conststructdentry*dentry
dentry being audited
unsignedintflags
attributes for this particular entry
- intauditsc_get_stamp(structaudit_context*ctx,structtimespec64*t,unsignedint*serial)¶
get local copies of audit_context values
Parameters
structaudit_context*ctx
audit_context for the task
structtimespec64*t
timespec64 to store time recorded in the audit_context
unsignedint*serial
serial value that is recorded in the audit_context
Description
Also sets the context as auditable.
- void__audit_mq_open(intoflag,umode_tmode,structmq_attr*attr)¶
record audit data for a POSIX MQ open
Parameters
intoflag
open flag
umode_tmode
mode bits
structmq_attr*attr
queue attributes
- void__audit_mq_sendrecv(mqd_tmqdes,size_tmsg_len,unsignedintmsg_prio,conststructtimespec64*abs_timeout)¶
record audit data for a POSIX MQ timed send/receive
Parameters
mqd_tmqdes
MQ descriptor
size_tmsg_len
Message length
unsignedintmsg_prio
Message priority
conststructtimespec64*abs_timeout
Message timeout in absolute time
- void__audit_mq_notify(mqd_tmqdes,conststructsigevent*notification)¶
record audit data for a POSIX MQ notify
Parameters
mqd_tmqdes
MQ descriptor
conststructsigevent*notification
Notification event
- void__audit_mq_getsetattr(mqd_tmqdes,structmq_attr*mqstat)¶
record audit data for a POSIX MQ get/set attribute
Parameters
mqd_tmqdes
MQ descriptor
structmq_attr*mqstat
MQ flags
- void__audit_ipc_obj(structkern_ipc_perm*ipcp)¶
record audit data for ipc object
Parameters
structkern_ipc_perm*ipcp
ipc permissions
- void__audit_ipc_set_perm(unsignedlongqbytes,uid_tuid,gid_tgid,umode_tmode)¶
record audit data for new ipc permissions
Parameters
unsignedlongqbytes
msgq bytes
uid_tuid
msgq user id
gid_tgid
msgq group id
umode_tmode
msgq mode (permissions)
Description
Called only after audit_ipc_obj().
- int__audit_socketcall(intnargs,unsignedlong*args)¶
record audit data for sys_socketcall
Parameters
intnargs
number of args, which should not be more than AUDITSC_ARGS.
unsignedlong*args
args array
- void__audit_fd_pair(intfd1,intfd2)¶
record audit data for pipe and socketpair
Parameters
intfd1
the first file descriptor
intfd2
the second file descriptor
- int__audit_sockaddr(intlen,void*a)¶
record audit data for sys_bind, sys_connect, sys_sendto
Parameters
intlen
data length in user space
void*a
data address in kernel space
Description
Returns 0 for success or NULL context or < 0 on error.
- intaudit_signal_info_syscall(structtask_struct*t)¶
record signal info for syscalls
Parameters
structtask_struct*t
task being signaled
Description
If the audit subsystem is being terminated, record the task (pid)and uid that is doing that.
- int__audit_log_bprm_fcaps(structlinux_binprm*bprm,conststructcred*new,conststructcred*old)¶
store information about a loading bprm and relevant fcaps
Parameters
structlinux_binprm*bprm
pointer to the bprm being processed
conststructcred*new
the proposed new credentials
conststructcred*old
the old credentials
Description
Simply check if the proc already has the caps given by the file and if notstore the priv escalation info for later auditing at the end of the syscall
-Eric
- void__audit_log_capset(conststructcred*new,conststructcred*old)¶
store information about the arguments to the capset syscall
Parameters
conststructcred*new
the new credentials
conststructcred*old
the old (current) credentials
Description
Record the arguments userspace sent to sys_capset for later printing by theaudit system if applicable
- voidaudit_core_dumps(longsignr)¶
record information about processes that end abnormally
Parameters
longsignr
signal value
Description
If a process ends with a core dump, something fishy is going on and weshould record the event for investigation.
- voidaudit_seccomp(unsignedlongsyscall,longsignr,intcode)¶
record information about a seccomp action
Parameters
unsignedlongsyscall
syscall number
longsignr
signal value
intcode
the seccomp action
Description
Record the information associated with a seccomp action. Event filtering forseccomp actions that are not to be logged is done in seccomp_log().Therefore, this function forces auditing independent of the audit_enabledand dummy context state because seccomp actions should be logged even whenaudit is not in use.
- intaudit_rule_change(inttype,intseq,void*data,size_tdatasz)¶
apply all rules to the specified message type
Parameters
inttype
audit message type
intseq
netlink audit message sequence (serial) number
void*data
payload data
size_tdatasz
size of payload data
Parameters
structsk_buff*request_skb
skb of request we are replying to (used to target the reply)
intseq
netlink audit message sequence (serial) number
- intparent_len(constchar*path)¶
find the length of the parent portion of a pathname
Parameters
constchar*path
pathname of which to determine length
- intaudit_compare_dname_path(conststructqstr*dname,constchar*path,intparentlen)¶
compare given dentry name with last component in given path. Return of 0 indicates a match.
Parameters
conststructqstr*dname
dentry name that we’re comparing
constchar*path
full pathname that we’re comparing
intparentlen
length of the parent if known. Passing in AUDIT_NAME_FULLhere indicates that we must compute this value.
Accounting Framework¶
- longsys_acct(constchar__user*name)¶
enable/disable process accounting
Parameters
constchar__user*name
file name for accounting records or NULL to shutdown accounting
Description
sys_acct()
is the only system call needed to implement processaccounting. It takes the name of the file where accounting recordsshould be written. If the filename is NULL, accounting will beshutdown.
Return
0 for success or negative errno values for failure.
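The corresponding userspace interface is the acct(2) wrapper around this system call; a minimal sketch follows (the pathname is only an example and must name an existing regular file, and the caller needs CAP_SYS_PACCT):

#include <unistd.h>
#include <stdio.h>

int main(void)
{
        /* Start writing accounting records to an existing file. */
        if (acct("/var/log/account/pacct") != 0)
                perror("acct enable");

        /* ... accounting runs until disabled ... */

        /* Passing NULL turns process accounting off again. */
        if (acct(NULL) != 0)
                perror("acct disable");
        return 0;
}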
- voidacct_collect(longexitcode,intgroup_dead)¶
collect accounting information into pacct_struct
Parameters
longexitcode
task exit code
intgroup_dead
not 0, if this thread is the last one in the process.
- voidacct_process(void)¶
handles process accounting for an exiting task
Parameters
void
no arguments
Block Devices¶
Parameters
structbio*bio
bio to advance
unsignedintnbytes
number of bytes to complete
Description
This updates bi_sector, bi_size and bi_idx; if the number of bytes tocomplete doesn’t align with a bvec boundary, then bv_len and bv_offset willbe updated on the last bvec as well.
bio will then represent the remaining, uncompleted portion of the io.
- structfolio_iter¶
State for iterating all folios in a bio.
Definition:
struct folio_iter {
        struct folio *folio;
        size_t offset;
        size_t length;
};
Members
folio
The current folio we’re iterating. NULL after the last folio.
offset
The byte offset within the current folio.
length
The number of bytes in this iteration (will not cross folioboundary).
- bio_for_each_folio_all¶
bio_for_each_folio_all(fi,bio)
Iterate over each folio in a bio.
Parameters
fi
structfolio_iter which is updated for each folio.
bio
struct bio to iterate over.
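As a small illustration, a completion path could walk a bio folio by folio with this iterator; the helper name below is hypothetical:

#include <linux/bio.h>

/* Count how many bytes of payload a bio spans, folio by folio. */
static size_t example_bio_payload_bytes(struct bio *bio)
{
        struct folio_iter fi;
        size_t bytes = 0;

        bio_for_each_folio_all(fi, bio)
                bytes += fi.length;
        return bytes;
}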
- structbio*bio_next_split(structbio*bio,intsectors,gfp_tgfp,structbio_set*bs)¶
get nextsectors from a bio, splitting if necessary
Parameters
structbio*bio
bio to split
intsectors
number of sectors to split from the front ofbio
gfp_tgfp
gfp mask
structbio_set*bs
bio set to allocate from
Return
a bio representing the nextsectors ofbio - if the bio is smallerthansectors, returns the original bio unchanged.
- unsignedintbio_add_max_vecs(void*kaddr,unsignedintlen)¶
number of bio_vecs needed to add data to a bio
Parameters
void*kaddr
kernel virtual address to add
unsignedintlen
length in bytes to add
Description
Calculate how many bio_vecs need to be allocated to add the kernel virtual address range in [kaddr:len] in the worst case.
Parameters
structbio*bio
bio to check
Description
Check ifbio is a zone append operation. Core block layer code and end_iohandlers must use this instead of an open coded REQ_OP_ZONE_APPEND checkbecause the block layer can rewrite REQ_OP_ZONE_APPEND to REQ_OP_WRITE ifit is not natively supported.
- voidblk_queue_flag_set(unsignedintflag,structrequest_queue*q)¶
atomically set a queue flag
Parameters
unsignedintflag
flag to be set
structrequest_queue*q
request queue
- voidblk_queue_flag_clear(unsignedintflag,structrequest_queue*q)¶
atomically clear a queue flag
Parameters
unsignedintflag
flag to be cleared
structrequest_queue*q
request queue
- constchar*blk_op_str(enumreq_opop)¶
Return string XXX in the REQ_OP_XXX.
Parameters
enumreq_opop
REQ_OP_XXX.
Description
Centralize block layer function to convert REQ_OP_XXX intostring format. Useful in the debugging and tracing bio or request. Forinvalid REQ_OP_XXX it returns string “UNKNOWN”.
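For example, a driver’s debug path might format the operation of a request like this (an illustrative sketch; the message text is arbitrary):

#include <linux/blk-mq.h>
#include <linux/printk.h>

static void example_trace_rq(struct request *rq)
{
        /* req_op() extracts the REQ_OP_* value from rq->cmd_flags. */
        pr_debug("%s request, %u bytes\n",
                 blk_op_str(req_op(rq)), blk_rq_bytes(rq));
}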
- voidblk_sync_queue(structrequest_queue*q)¶
cancel any pending callbacks on a queue
Parameters
structrequest_queue*q
the queue
Description
The block layer may perform asynchronous callback activityon a queue, such as calling the unplug function after a timeout.A block device may call blk_sync_queue to ensure that anysuch activity is cancelled, thus allowing it to release resourcesthat the callbacks might use. The caller must already have made surethat its ->submit_bio will not re-add plugging prior to callingthis function.
This function does not cancel any asynchronous activity arisingout of elevator or throttling code. That would require elevator_exit()and blkcg_exit_queue() to be called with queue lock initialized.
- voidblk_set_pm_only(structrequest_queue*q)¶
increment pm_only counter
Parameters
structrequest_queue*q
request queue pointer
- voidblk_put_queue(structrequest_queue*q)¶
decrement the request_queue refcount
Parameters
structrequest_queue*q
the request_queue structure to decrement the refcount for
Description
Decrements the refcount of the request_queue and free it when the refcountreaches 0.
- boolblk_get_queue(structrequest_queue*q)¶
increment the request_queue refcount
Parameters
structrequest_queue*q
the request_queue structure to increment the refcount for
Description
Increment the refcount of the request_queue kobject.
Context
Any context.
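Together with blk_put_queue() above, the usual pattern is to take a reference only for the duration of the access, as in this illustrative sketch:

#include <linux/blkdev.h>

/* Pin a request_queue while inspecting it, then drop the reference. */
static bool example_queue_is_dying(struct request_queue *q)
{
        bool dying = false;

        if (blk_get_queue(q)) {         /* take a reference; may fail */
                dying = blk_queue_dying(q);
                blk_put_queue(q);       /* drop it when done */
        }
        return dying;
}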
Parameters
structbio*bio
The bio describing the location in memory and on the device.
Description
This is a version ofsubmit_bio()
that shall only be used for I/O that isresubmitted to lower level drivers by stacking block drivers. All filesystems and other upper level users of the block layer should usesubmit_bio()
instead.
Parameters
structbio*bio
The
structbio
which describes the I/O
Description
submit_bio()
is used to submit I/O requests to block devices. It is passed afully set upstructbio
that describes the I/O that needs to be done. The bio will be sent to the device described by the bi_bdev field.
The success/failure status of the request, along with notification ofcompletion, is delivered asynchronously through the ->bi_end_io() callbackinbio. The bio must NOT be touched by the caller until ->bi_end_io() hasbeen called.
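As a minimal illustration of this interface, the sketch below allocates a one-segment read bio and submits it; it assumes the bio_alloc() signature used by recent kernels, and my_end_io()/my_submit_read() are illustrative names, not part of the API.

#include <linux/bio.h>
#include <linux/blkdev.h>

/* illustrative completion handler, called asynchronously on I/O completion */
static void my_end_io(struct bio *bio)
{
        if (bio->bi_status)
                pr_err("read failed\n");
        bio_put(bio);           /* drop our reference once completion has run */
}

static void my_submit_read(struct block_device *bdev, struct page *page,
                           sector_t sector)
{
        /* one bio_vec is enough for a single page (recent bio_alloc() signature assumed) */
        struct bio *bio = bio_alloc(bdev, 1, REQ_OP_READ, GFP_KERNEL);

        bio->bi_iter.bi_sector = sector;
        __bio_add_page(bio, page, PAGE_SIZE, 0);
        bio->bi_end_io = my_end_io;
        submit_bio(bio);        /* status arrives later via ->bi_end_io() */
}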
Parameters
structbio*bio
bio to poll for
structio_comp_batch*iob
batches of IO
unsignedintflags
BLK_POLL_* flags that control the behavior
Description
Poll for completions on queue associated with the bio. Returns number ofcompleted entries found.
Note
the caller must either be the context that submitted bio, or be in an RCU critical section to prevent freeing of bio.
Parameters
structbio*bio
bio to start account for
Description
Returns the start time that should be passed back to bio_end_io_acct().
- intblk_lld_busy(structrequest_queue*q)¶
Check if underlying low-level drivers of a device are busy
Parameters
structrequest_queue*q
the queue of the device being checked
Description
Check if underlying low-level drivers of a device are busy. If the drivers want to export their busy state, they must set their own exporting function using blk_queue_lld_busy() first.
Basically, this function is used only by request stacking driversto stop dispatching requests to underlying devices when underlyingdevices are busy. This behavior helps more I/O merging on the queueof the request stacking driver and prevents I/O throughput regressionon burst I/O load.
Return
0 - Not busy (The request stacking driver should dispatch request)
1 - Busy (The request stacking driver should stop dispatching request)
- voidblk_start_plug(structblk_plug*plug)¶
initialize blk_plug and track it inside the task_struct
Parameters
structblk_plug*plug
The
structblk_plug
that needs to be initialized
Description
blk_start_plug() indicates to the block layer an intent by the caller to submit multiple I/O requests in a batch. The block layer may use this hint to defer submitting I/Os from the caller until blk_finish_plug() is called. However, the block layer may choose to submit requests before a call to blk_finish_plug() if the number of queued I/Os exceeds BLK_MAX_REQUEST_COUNT, or if the size of the I/O is larger than BLK_PLUG_FLUSH_SIZE. The queued I/Os may also be submitted early if the task schedules (see below).
Tracking blk_plug inside the task_struct will help with auto-flushing the pending I/O should the task end up blocking between blk_start_plug() and blk_finish_plug(). This is important from a performance perspective, but also ensures that we don't deadlock. For instance, if the task is blocking for a memory allocation, memory reclaim could end up wanting to free a page belonging to that request that is currently residing in our private plug. By flushing the pending I/O when the process goes to sleep, we avoid this kind of deadlock.
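A minimal sketch of this pattern, where submit_batch() is an illustrative helper rather than a kernel function:

#include <linux/blkdev.h>

static void submit_batch(struct bio **bios, int nr)
{
        struct blk_plug plug;
        int i;

        blk_start_plug(&plug);          /* hint that a batch of I/O follows */
        for (i = 0; i < nr; i++)
                submit_bio(bios[i]);    /* may be held back in the plug */
        blk_finish_plug(&plug);         /* flush whatever is still queued in the plug */
}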
- voidblk_finish_plug(structblk_plug*plug)¶
mark the end of a batch of submitted I/O
Parameters
structblk_plug*plug
The
structblk_plug
passed toblk_start_plug()
Description
Indicate that a batch of I/O submissions is complete. This function must be paired with an initial call to blk_start_plug(). The intent is to allow the block layer to optimize I/O submission. See the documentation for blk_start_plug() for more information.
- intblk_queue_enter(structrequest_queue*q,blk_mq_req_flags_tflags)¶
try to increase q->q_usage_counter
Parameters
structrequest_queue*q
request queue pointer
blk_mq_req_flags_tflags
BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PM
- intblk_rq_map_user_iov(structrequest_queue*q,structrequest*rq,structrq_map_data*map_data,conststructiov_iter*iter,gfp_tgfp_mask)¶
map user data to a request, for passthrough requests
Parameters
structrequest_queue*q
request queue where request should be inserted
structrequest*rq
request to map data to
structrq_map_data*map_data
pointer to the rq_map_data holding pages (if necessary)
conststructiov_iter*iter
iovec iterator
gfp_tgfp_mask
memory allocation flags
Description
Data will be mapped directly for zero copy I/O, if possible. Otherwise a kernel bounce buffer is used.
A matching blk_rq_unmap_user() must be issued at the end of I/O, while still in process context.
Parameters
structbio*bio
start of bio list
Description
Unmap a rq previously mapped by blk_rq_map_user(). The caller mustsupply the original rq->bio from the blk_rq_map_user() return, sincethe I/O completion may have changed rq->bio.
- intblk_rq_map_kern(structrequest*rq,void*kbuf,unsignedintlen,gfp_tgfp_mask)¶
map kernel data to a request, for passthrough requests
Parameters
structrequest*rq
request to fill
void*kbuf
the kernel buffer
unsignedintlen
length of the kernel data
gfp_tgfp_mask
memory allocation flags
Description
Data will be mapped directly if possible. Otherwise a bouncebuffer is used. Can be called multiple times to append multiplebuffers.
- intblk_register_queue(structgendisk*disk)¶
register a block layer queue with sysfs
Parameters
structgendisk*disk
Disk of which the request queue should be registered with sysfs.
- voidblk_unregister_queue(structgendisk*disk)¶
counterpart of
blk_register_queue()
Parameters
structgendisk*disk
Disk of which the request queue should be unregistered from sysfs.
Note
the caller is responsible for guaranteeing that this function is calledafterblk_register_queue()
has finished.
- voidblk_set_stacking_limits(structqueue_limits*lim)¶
set default limits for stacking devices
Parameters
structqueue_limits*lim
the queue_limits structure to reset
Description
Prepare queue limits for applying limits from underlying devices usingblk_stack_limits()
.
- intqueue_limits_commit_update(structrequest_queue*q,structqueue_limits*lim)¶
commit an atomic update of queue limits
Parameters
structrequest_queue*q
queue to update
structqueue_limits*lim
limits to apply
Description
Apply the limits in lim that were obtained from queue_limits_start_update() and updated by the caller to q. The caller must have frozen the queue or ensure that there are no outstanding I/Os by other means.
Returns 0 if successful, else a negative error code.
- intqueue_limits_commit_update_frozen(structrequest_queue*q,structqueue_limits*lim)¶
commit an atomic update of queue limits
Parameters
structrequest_queue*q
queue to update
structqueue_limits*lim
limits to apply
Description
Apply the limits in lim that were obtained from queue_limits_start_update() and updated with the new values by the caller to q. Freezes the queue before the update and unfreezes it after.
Returns 0 if successful, else a negative error code.
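A sketch of the atomic-update pattern using the frozen variant; it assumes queue_limits_start_update() returns a snapshot of the current limits, and the max_hw_sectors value is an arbitrary example:

#include <linux/blkdev.h>

static int shrink_max_sectors(struct request_queue *q)
{
        struct queue_limits lim;

        /* take the limits lock and snapshot the current limits */
        lim = queue_limits_start_update(q);
        lim.max_hw_sectors = 256;       /* arbitrary example value */

        /* freezes the queue, applies the new limits, unfreezes it */
        return queue_limits_commit_update_frozen(q, &lim);
}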
- intqueue_limits_set(structrequest_queue*q,structqueue_limits*lim)¶
apply queue limits to queue
Parameters
structrequest_queue*q
queue to update
structqueue_limits*lim
limits to apply
Description
Apply the limits inlim that were freshly initialized toq.To update existing limits use queue_limits_start_update() andqueue_limits_commit_update()
instead.
Returns 0 if successful, else a negative error code.
- intblk_stack_limits(structqueue_limits*t,structqueue_limits*b,sector_tstart)¶
adjust queue_limits for stacked devices
Parameters
structqueue_limits*t
the stacking driver limits (top device)
structqueue_limits*b
the underlying queue limits (bottom, component device)
sector_tstart
first data sector within component device
Description
This function is used by stacking drivers like MD and DM to ensure that all component devices have compatible block sizes and alignments. The stacking driver must provide a queue_limits struct (top) and then iteratively call the stacking function for all component (bottom) devices. The stacking function will attempt to combine the values and ensure proper alignment.
Returns 0 if the top and bottom queue_limits are compatible. The top device's block sizes and alignment offsets may be adjusted to ensure alignment with the bottom device. If no compatible sizes and alignments exist, -1 is returned and the resulting top queue_limits will have the misaligned flag set to indicate that the alignment_offset is undefined.
- voidqueue_limits_stack_bdev(structqueue_limits*t,structblock_device*bdev,sector_toffset,constchar*pfx)¶
adjust queue_limits for stacked devices
Parameters
structqueue_limits*t
the stacking driver limits (top device)
structblock_device*bdev
the underlying block device (bottom)
sector_toffset
offset to beginning of data within component device
constchar*pfx
prefix to use for warnings logged
Description
This function is used by stacking drivers like MD and DM to ensure that all component devices have compatible block sizes and alignments. The stacking driver must provide a queue_limits struct (top) and then iteratively call the stacking function for all component (bottom) devices. The stacking function will attempt to combine the values and ensure proper alignment.
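A sketch of how a stacking driver might fold the limits of two component devices into its own, using the helpers above; the function and parameter names are illustrative and the offset of 0 assumes data starts at the first sector of each component:

#include <linux/blkdev.h>

static void stack_two_bdevs(struct queue_limits *lim,
                            struct block_device *bdev_a,
                            struct block_device *bdev_b,
                            const char *name)
{
        blk_set_stacking_limits(lim);   /* start from permissive stacking defaults */

        /* fold in each bottom device, warning with the given prefix on mismatch */
        queue_limits_stack_bdev(lim, bdev_a, 0, name);
        queue_limits_stack_bdev(lim, bdev_b, 0, name);
}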
- boolqueue_limits_stack_integrity(structqueue_limits*t,structqueue_limits*b)¶
stack integrity profile
Parameters
structqueue_limits*t
target queue limits
structqueue_limits*b
base queue limits
Description
Check if the integrity profile in b can be stacked into the target t. Stacking is possible if either:
t does not have any integrity information stacked into it yet
the integrity profile in b is identical to the one in t
If b can be stacked into t, return true. Else return false and clear the integrity information in t.
- voidblk_set_queue_depth(structrequest_queue*q,unsignedintdepth)¶
tell the block layer about the device queue depth
Parameters
structrequest_queue*q
the request queue for the device
unsignedintdepth
queue depth
- intblkdev_issue_flush(structblock_device*bdev)¶
queue a flush
Parameters
structblock_device*bdev
blockdev to issue flush for
Description
Issue a flush for the block device in question.
- intblkdev_issue_discard(structblock_device*bdev,sector_tsector,sector_tnr_sects,gfp_tgfp_mask)¶
queue a discard
Parameters
structblock_device*bdev
blockdev to issue discard for
sector_tsector
start sector
sector_tnr_sects
number of sectors to discard
gfp_tgfp_mask
memory allocation flags (for bio_alloc)
Description
Issue a discard request for the sectors in question.
- int__blkdev_issue_zeroout(structblock_device*bdev,sector_tsector,sector_tnr_sects,gfp_tgfp_mask,structbio**biop,unsignedflags)¶
generate a number of zero-filled write bios
Parameters
structblock_device*bdev
blockdev to issue
sector_tsector
start sector
sector_tnr_sects
number of sectors to write
gfp_tgfp_mask
memory allocation flags (for bio_alloc)
structbio**biop
pointer to anchor bio
unsignedflags
controls detailed behavior
Description
Zero-fill a block range, either using hardware offload or by explicitly writing zeroes to the device.
If a device is using logical block provisioning, the underlying space will not be released if flags contains BLKDEV_ZERO_NOUNMAP.
If flags contains BLKDEV_ZERO_NOFALLBACK, the function will return -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.
- intblkdev_issue_zeroout(structblock_device*bdev,sector_tsector,sector_tnr_sects,gfp_tgfp_mask,unsignedflags)¶
zero-fill a block range
Parameters
structblock_device*bdev
blockdev to write
sector_tsector
start sector
sector_tnr_sects
number of sectors to write
gfp_tgfp_mask
memory allocation flags (for bio_alloc)
unsignedflags
controls detailed behavior
Description
Zero-fill a block range, either using hardware offload or by explicitly writing zeroes to the device. See __blkdev_issue_zeroout() for the valid values for flags.
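For example, a caller that only wants hardware-offloaded zeroing and no explicit write fallback might do something like this sketch (zero_range() is an illustrative name):

#include <linux/blkdev.h>

static int zero_range(struct block_device *bdev, sector_t sector,
                      sector_t nr_sects)
{
        /*
         * BLKDEV_ZERO_NOFALLBACK: fail with -EOPNOTSUPP rather than
         * falling back to explicitly writing zeroes.
         */
        return blkdev_issue_zeroout(bdev, sector, nr_sects, GFP_KERNEL,
                                    BLKDEV_ZERO_NOFALLBACK);
}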
- intblk_rq_map_integrity_sg(structrequest*rq,structscatterlist*sglist)¶
Map integrity metadata into a scatterlist
Parameters
structrequest*rq
request to map
structscatterlist*sglist
target scatterlist
Description
Map the integrity vectors in request into ascatterlist. The scatterlist must be big enough to hold allelements. I.e. sized using blk_rq_count_integrity_sg() orrq->nr_integrity_segments.
- intblk_trace_ioctl(structblock_device*bdev,unsignedcmd,char__user*arg)¶
handle the ioctls associated with tracing
Parameters
structblock_device*bdev
the block device
unsignedcmd
the ioctl cmd
char__user*arg
the argument data, if any
- voidblk_trace_shutdown(structrequest_queue*q)¶
stop and cleanup trace structures
Parameters
structrequest_queue*q
the request queue associated with the device
- voidblk_add_trace_rq(structrequest*rq,blk_status_terror,unsignedintnr_bytes,u32what,u64cgid)¶
Add a trace for a request oriented action
Parameters
structrequest*rq
the source request
blk_status_terror
return status to log
unsignedintnr_bytes
number of completed bytes
u32what
the action
u64cgid
the cgroup info
Description
Records an action against a request. Will log the bio offset + size.
- voidblk_add_trace_bio(structrequest_queue*q,structbio*bio,u32what,interror)¶
Add a trace for a bio oriented action
Parameters
structrequest_queue*q
queue the io is for
structbio*bio
the source bio
u32what
the action
interror
error, if any
Description
Records an action against a bio. Will log the bio offset + size.
- voidblk_add_trace_bio_remap(void*ignore,structbio*bio,dev_tdev,sector_tfrom)¶
Add a trace for a bio-remap operation
Parameters
void*ignore
trace callback data parameter (not used)
structbio*bio
the source bio
dev_tdev
source device
sector_tfrom
source sector
Description
Called after a bio is remapped to a different device and/or sector.
- voidblk_add_trace_rq_remap(void*ignore,structrequest*rq,dev_tdev,sector_tfrom)¶
Add a trace for a request-remap operation
Parameters
void*ignore
trace callback data parameter (not used)
structrequest*rq
the source request
dev_tdev
target device
sector_tfrom
source sector
Description
Device mapper remaps request to other devices.Add a trace for that action.
Parameters
structdevice*dev
the device representing this disk
Description
This function releases all allocated resources of the gendisk.
Drivers which used __device_add_disk() have a gendisk with a request_queueassigned. Since the request_queue sits on top of the gendisk for thesedrivers we also callblk_put_queue()
for them, and we expect therequest_queue refcount to reach 0 at this point, and so the request_queuewill also be freed prior to the disk.
Context
can sleep
- unsignedintbdev_count_inflight(structblock_device*part)¶
get the number of inflight IOs for a block device.
Parameters
structblock_device*part
the block device.
Description
Inflight here means started IO accounting, from bdev_start_io_acct() for a bio-based block device, and from blk_account_io_start() for an rq-based block device.
- int__register_blkdev(unsignedintmajor,constchar*name,void(*probe)(dev_tdevt))¶
register a new block device
Parameters
unsignedintmajor
the requested major device number [1..BLKDEV_MAJOR_MAX-1]. Ifmajor = 0, try to allocate any unused major number.
constchar*name
the name of the new block device as a zero terminated string
void(*probe)(dev_tdevt)
pre-devtmpfs / pre-udev callback used to create disks when theirpre-created device node is accessed. When a probe call usesadd_disk() and it fails the driver must cleanup resources. Thisinterface may soon be removed.
Description
Thename must be unique within the system.
The return value depends on themajor input parameter:
if a major device number was requested in range [1..BLKDEV_MAJOR_MAX-1]then the function returns zero on success, or a negative error code
if any unused major number was requested withmajor = 0 parameterthen the return value is the allocated major number in range[1..BLKDEV_MAJOR_MAX-1] or a negative error code otherwise
SeeLinux allocated devices (4.x+ version) for the list of allocatedmajor numbers.
Use register_blkdev instead for any new code.
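A minimal sketch of dynamic major allocation through the register_blkdev() wrapper; the "mydrv" name is illustrative:

#include <linux/module.h>
#include <linux/blkdev.h>

static int major;

static int __init mydrv_init(void)
{
        /* major 0 asks for any unused major number */
        major = register_blkdev(0, "mydrv");
        if (major < 0)
                return major;
        /* ... allocate and add disks here ... */
        return 0;
}

static void __exit mydrv_exit(void)
{
        unregister_blkdev(major, "mydrv");
}

module_init(mydrv_init);
module_exit(mydrv_exit);
MODULE_LICENSE("GPL");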
- intadd_disk_fwnode(structdevice*parent,structgendisk*disk,conststructattribute_group**groups,structfwnode_handle*fwnode)¶
add disk information to kernel list with fwnode
Parameters
structdevice*parent
parent device for the disk
structgendisk*disk
per-device partitioning information
conststructattribute_group**groups
Additional per-device sysfs groups
structfwnode_handle*fwnode
attached disk fwnode
Description
This function registers the partitioning information indiskwith the kernel. Also attach a fwnode to the disk device.
- intdevice_add_disk(structdevice*parent,structgendisk*disk,conststructattribute_group**groups)¶
add disk information to kernel list
Parameters
structdevice*parent
parent device for the disk
structgendisk*disk
per-device partitioning information
conststructattribute_group**groups
Additional per-device sysfs groups
Description
This function registers the partitioning information indiskwith the kernel.
- voidblk_mark_disk_dead(structgendisk*disk)¶
mark a disk as dead
Parameters
structgendisk*disk
disk to mark as dead
Description
Mark the disk as dead (e.g. surprise removed) and don't accept any new I/O to this disk.
- voiddel_gendisk(structgendisk*disk)¶
remove the gendisk
Parameters
structgendisk*disk
the struct gendisk to remove
Description
Removes the gendisk and all its associated resources. This deletes thepartitions associated with the gendisk, and unregisters the associatedrequest_queue.
This is the counter to the respective __device_add_disk() call.
The final removal of the struct gendisk happens when its refcount reaches 0withput_disk()
, which should be called afterdel_gendisk()
, if__device_add_disk() was used.
Drivers exist which depend on the release of the gendisk being synchronous; it should not be deferred.
Context
can sleep
- voidinvalidate_disk(structgendisk*disk)¶
invalidate the disk
Parameters
structgendisk*disk
the struct gendisk to invalidate
Description
A helper to invalidate the disk. It will clean the disk's associated buffer/page caches and reset its internal state so that the disk can be reused by the drivers.
Context
can sleep
- voidput_disk(structgendisk*disk)¶
decrements the gendisk refcount
Parameters
structgendisk*disk
the struct gendisk to decrement the refcount for
Description
This decrements the refcount for the struct gendisk. When this reaches 0we’ll havedisk_release()
called.
Note
for blk-mq disk put_disk must be called before freeing the tag_setwhen handling probe errors (that is before add_disk() is called).
Context
Any context, but the last reference must not be dropped fromatomic context.
- voidset_disk_ro(structgendisk*disk,boolread_only)¶
set a gendisk read-only
Parameters
structgendisk*disk
gendisk to operate on
boolread_only
true
to set the disk read-only,false
set the disk read/write
Description
This function is used to indicate whether a given disk device should have itsread-only flag set.set_disk_ro()
is typically used by device drivers toindicate whether the underlying physical device is write-protected.
- intbdev_validate_blocksize(structblock_device*bdev,intblock_size)¶
check that this block size is acceptable
Parameters
structblock_device*bdev
blockdevice to check
intblock_size
block size to check
Description
For block device users that do not use buffer heads or the block devicepage cache, make sure that this block size can be used with the device.
Return
On success zero is returned, negative error code on failure.
- intbdev_freeze(structblock_device*bdev)¶
lock a filesystem and force it into a consistent state
Parameters
structblock_device*bdev
blockdevice to lock
Description
If a superblock is found on this device, we take the s_umount semaphoreon it to make sure nobody unmounts until the snapshot creation is done.The reference counter (bd_fsfreeze_count) guarantees that only the lastunfreeze process can unfreeze the frozen filesystem actually when multiplefreeze requests arrive simultaneously. It counts up inbdev_freeze()
andcount down inbdev_thaw()
. When it becomes 0, thaw_bdev() will unfreezeactually.
Return
On success zero is returned, negative error code on failure.
- intbdev_thaw(structblock_device*bdev)¶
unlock filesystem
Parameters
structblock_device*bdev
blockdevice to unlock
Description
Unlocks the filesystem and marks it writeable again afterbdev_freeze()
.
Return
On success zero is returned, negative error code on failure.
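A sketch of the freeze/thaw pairing around a snapshot-style operation; take_snapshot() is a placeholder for whatever work needs a quiescent filesystem:

#include <linux/blkdev.h>

static int take_snapshot(struct block_device *bdev)
{
        return 0;       /* placeholder for the real snapshot work */
}

static int snapshot_with_freeze(struct block_device *bdev)
{
        int ret;

        ret = bdev_freeze(bdev);        /* sync and block writes */
        if (ret)
                return ret;

        ret = take_snapshot(bdev);

        bdev_thaw(bdev);                /* allow writes again */
        return ret;
}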
- intbd_prepare_to_claim(structblock_device*bdev,void*holder,conststructblk_holder_ops*hops)¶
claim a block device
Parameters
structblock_device*bdev
block device of interest
void*holder
holder trying to claimbdev
conststructblk_holder_ops*hops
holder ops.
Description
Claim bdev. This function fails if bdev is already claimed by another holder and waits if another claiming is in progress. On successful return, the caller has ownership of bd_claiming and bd_holder[s].
Return
0 ifbdev can be claimed, -EBUSY otherwise.
- voidbd_abort_claiming(structblock_device*bdev,void*holder)¶
abort claiming of a block device
Parameters
structblock_device*bdev
block device of interest
void*holder
holder that has claimedbdev
Description
Abort claiming of a block device when the exclusive open failed. This can also be used when exclusive open is not actually desired and we just needed to block other exclusive openers for a while.
Parameters
structfile*bdev_file
open block device
Description
Yield claim on the block device and put the file. Ensure that theblock device can be reclaimed before the file is closed which is adeferred operation.
- intlookup_bdev(constchar*pathname,dev_t*dev)¶
Look up a struct block_device by name.
Parameters
constchar*pathname
Name of the block device in the filesystem.
dev_t*dev
Pointer to the block device’s dev_t, if found.
Description
Lookup the block device’s dev_t atpathname in the currentnamespace if possible and return it indev.
Context
May sleep.
Return
0 if succeeded, negative errno otherwise.
- voidbdev_mark_dead(structblock_device*bdev,boolsurprise)¶
mark a block device as dead
Parameters
structblock_device*bdev
block device to operate on
boolsurprise
indicate a surprise removal
Description
Tell the file system that this device or media is dead. If surprise is set to true the device or media is already gone; if not, we are preparing for an orderly removal.
This calls into the file system, which then typically syncs out all dirty data and writes back inodes and then invalidates any cached data in the inodes on the file system. In addition we also invalidate the block device mapping.
Char devices¶
- intregister_chrdev_region(dev_tfrom,unsignedcount,constchar*name)¶
register a range of device numbers
Parameters
dev_tfrom
the first in the desired range of device numbers; must includethe major number.
unsignedcount
the number of consecutive device numbers required
constchar*name
the name of the device or driver.
Description
Return value is zero on success, a negative error code on failure.
- intalloc_chrdev_region(dev_t*dev,unsignedbaseminor,unsignedcount,constchar*name)¶
register a range of char device numbers
Parameters
dev_t*dev
output parameter for first assigned number
unsignedbaseminor
first of the requested range of minor numbers
unsignedcount
the number of minor numbers required
constchar*name
the name of the associated device or driver
Description
Allocates a range of char device numbers. The major number will bechosen dynamically, and returned (along with the first minor number)indev. Returns zero or a negative error code.
- int__register_chrdev(unsignedintmajor,unsignedintbaseminor,unsignedintcount,constchar*name,conststructfile_operations*fops)¶
create and register a cdev occupying a range of minors
Parameters
unsignedintmajor
major device number or 0 for dynamic allocation
unsignedintbaseminor
first of the requested range of minor numbers
unsignedintcount
the number of minor numbers required
constchar*name
name of this range of devices
conststructfile_operations*fops
file operations associated with this devices
Description
If major == 0 this function will dynamically allocate a major and return its number.
If major > 0 this function will attempt to reserve a device with the given major number and will return zero on success.
Returns a -ve errno on failure.
The name of this device has nothing to do with the name of the device in /dev. It only helps to keep track of the different owners of devices. If your module implements only one type of device it's ok to use e.g. the name of the module here.
- voidunregister_chrdev_region(dev_tfrom,unsignedcount)¶
unregister a range of device numbers
Parameters
dev_tfrom
the first in the range of numbers to unregister
unsignedcount
the number of device numbers to unregister
Description
This function will unregister a range ofcount device numbers,starting withfrom. The caller should normally be the one whoallocated those numbers in the first place...
- void__unregister_chrdev(unsignedintmajor,unsignedintbaseminor,unsignedintcount,constchar*name)¶
unregister and destroy a cdev
Parameters
unsignedintmajor
major device number
unsignedintbaseminor
first of the range of minor numbers
unsignedintcount
the number of minor numbers this cdev is occupying
constchar*name
name of this range of devices
Description
Unregister and destroy the cdev occupying the region described bymajor,baseminor andcount. This function undoes what__register_chrdev()
did.
- intcdev_add(structcdev*p,dev_tdev,unsignedcount)¶
add a char device to the system
Parameters
structcdev*p
the cdev structure for the device
dev_tdev
the first device number for which this device is responsible
unsignedcount
the number of consecutive minor numbers corresponding to thisdevice
Description
cdev_add()
adds the device represented byp to the system, making itlive immediately. A negative error code is returned on failure.
- voidcdev_set_parent(structcdev*p,structkobject*kobj)¶
set the parent kobject for a char device
Parameters
structcdev*p
the cdev structure
structkobject*kobj
the kobject to take a reference to
Description
cdev_set_parent()
sets a parent kobject which will be referencedappropriately so the parent is not freed before the cdev. Thisshould be called before cdev_add.
- intcdev_device_add(structcdev*cdev,structdevice*dev)¶
add a char device and it’s corresponding
structdevice
, linkink
Parameters
structcdev*cdev
the cdev structure
structdevice*dev
the device structure
Description
cdev_device_add() adds the char device represented by cdev to the system, just as cdev_add() does. It then adds dev to the system using device_add(). The dev_t for the char device will be taken from the struct device, which needs to be initialized first. This helper function correctly takes a reference to the parent device so the parent will not get released until all references to the cdev are released.
This helper uses dev->devt for the device number. If it is not set it will not add the cdev and it will be equivalent to device_add.
This function should be used whenever the struct cdev and thestructdevice
are members of the same structure whose lifetime ismanaged by thestructdevice
.
NOTE
Callers must assume that userspace was able to open the cdev andcan call cdev fops callbacks at any time, even if this function fails.
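A sketch of the embedded pattern this helper is designed for, where the cdev and the struct device live in one containing structure; struct my_dev and my_dev_add() are illustrative, and a real driver would also set up dev.release, the device name, class, and so on:

#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/fs.h>

struct my_dev {
        struct cdev cdev;
        struct device dev;
};

static int my_dev_add(struct my_dev *md, dev_t devt,
                      const struct file_operations *fops)
{
        device_initialize(&md->dev);
        md->dev.devt = devt;            /* cdev_device_add() uses dev->devt */

        cdev_init(&md->cdev, fops);
        /* registers the cdev and then the device, tying their lifetimes together */
        return cdev_device_add(&md->cdev, &md->dev);
}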
Parameters
structcdev*cdev
the cdev structure
structdevice*dev
the device structure
Description
cdev_device_del()
is a helper function to call cdev_del and device_del.It should be used whenever cdev_device_add is used.
If dev->devt is not set it will not remove the cdev and will be equivalentto device_del.
NOTE
This guarantees that associated sysfs callbacks are not runningor runnable, however any cdevs already open will remain and their fopswill still be callable even after this function returns.
- voidcdev_del(structcdev*p)¶
remove a cdev from the system
Parameters
structcdev*p
the cdev structure to be removed
Description
cdev_del()
removesp from the system, possibly freeing the structureitself.
NOTE
This guarantees that cdev device will no longer be able to beopened, however any cdevs already open will remain and their fops willstill be callable even after cdev_del returns.
- structcdev*cdev_alloc(void)¶
allocate a cdev structure
Parameters
void
no arguments
Description
Allocates and returns a cdev structure, or NULL on failure.
Parameters
structcdev*cdev
the structure to initialize
conststructfile_operations*fops
the file_operations for this device
Description
Initializes cdev, remembering fops, making it ready to add to the system with cdev_add().
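Putting the pieces above together, a minimal sketch of the classic allocate/init/add/teardown sequence; my_fops and the "mychar" name are illustrative:

#include <linux/module.h>
#include <linux/cdev.h>
#include <linux/fs.h>

static dev_t devt;
static struct cdev my_cdev;

static const struct file_operations my_fops = {
        .owner = THIS_MODULE,
        /* .open, .read, .write, ... filled in by a real driver */
};

static int __init mychar_init(void)
{
        int ret;

        /* dynamically allocate one minor under a fresh major */
        ret = alloc_chrdev_region(&devt, 0, 1, "mychar");
        if (ret)
                return ret;

        cdev_init(&my_cdev, &my_fops);
        ret = cdev_add(&my_cdev, devt, 1);      /* the device is live from here on */
        if (ret)
                unregister_chrdev_region(devt, 1);
        return ret;
}

static void __exit mychar_exit(void)
{
        cdev_del(&my_cdev);
        unregister_chrdev_region(devt, 1);
}

module_init(mychar_init);
module_exit(mychar_exit);
MODULE_LICENSE("GPL");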
Clock Framework¶
The clock framework defines programming interfaces to support software management of the system clock tree. This framework is widely used with System-On-Chip (SOC) platforms to support power management and various devices which may need custom clock rates. Note that these “clocks” don't relate to timekeeping or real time clocks (RTCs), each of which have separate frameworks. These struct clk instances may be used to manage for example a 96 MHz signal that is used to shift bits into and out of peripherals or busses, or otherwise trigger synchronous state machine transitions in system hardware.
Power management is supported by explicit software clock gating: unusedclocks are disabled, so the system doesn’t waste power changing thestate of transistors that aren’t in active use. On some systems this maybe backed by hardware clock gating, where clocks are gated without beingdisabled in software. Sections of chips that are powered but not clockedmay be able to retain their last state. This low power state is oftencalled aretention mode. This mode still incurs leakage currents,especially with finer circuit geometries, but for CMOS circuits power ismostly used by clocked state changes.
Power-aware drivers only enable their clocks when the device they manageis in active use. Also, system sleep states often differ according towhich clock domains are active: while a “standby” state may allow wakeupfrom several active domains, a “mem” (suspend-to-RAM) state may requirea more wholesale shutdown of clocks derived from higher speed PLLs andoscillators, limiting the number of possible wakeup event sources. Adriver’s suspend method may need to be aware of system-specific clockconstraints on the target sleep state.
Some platforms support programmable clock generators. These can be usedby external chips of various kinds, such as other CPUs, multimediacodecs, and devices with strict requirements for interface clocking.
- structclk_notifier¶
associate a clk with a notifier
Definition:
struct clk_notifier { struct clk *clk; struct srcu_notifier_head notifier_head; struct list_head node;};
Members
clk
struct clk * to associate the notifier with
notifier_head
a blocking_notifier_head for this clk
node
linked list pointers
Description
A list ofstructclk_notifier
is maintained by the notifier code.An entry is created whenever code registers the first notifier on aparticularclk. Future notifiers on thatclk are added to thenotifier_head.
- structclk_notifier_data¶
rate data to pass to the notifier callback
Definition:
struct clk_notifier_data { struct clk *clk; unsigned long old_rate; unsigned long new_rate;};
Members
clk
struct clk * being changed
old_rate
previous rate of this clk
new_rate
new rate of this clk
Description
For a pre-notifier, old_rate is the clk’s rate before this ratechange, and new_rate is what the rate will be in the future. For apost-notifier, old_rate and new_rate are both set to the clk’scurrent rate (this was done to optimize the implementation).
- structclk_bulk_data¶
Data used for bulk clk operations.
Definition:
struct clk_bulk_data { const char *id; struct clk *clk;};
Members
id
clock consumer ID
clk
struct clk * to store the associated clock
Description
The CLK APIs provide a series of clk_bulk_() API calls asa convenience to consumers which require multiple clks. Thisstructure is used to manage data for these calls.
- intclk_notifier_register(structclk*clk,structnotifier_block*nb)¶
register a clock rate-change notifier callback
Parameters
structclk*clk
clock whose rate we are interested in
structnotifier_block*nb
notifier block with callback function pointer
Description
ProTip: debugging across notifier chains can be frustrating. Make sure thatyour notifier callback function prints a nice big warning in case offailure.
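A sketch of a rate-change notifier; the callback receives the struct clk_notifier_data described earlier, and my_rate_notify()/my_rate_nb are illustrative names:

#include <linux/clk.h>
#include <linux/notifier.h>

static int my_rate_notify(struct notifier_block *nb, unsigned long event,
                          void *data)
{
        struct clk_notifier_data *cnd = data;

        switch (event) {
        case PRE_RATE_CHANGE:
                pr_info("clk rate changing: %lu -> %lu\n",
                        cnd->old_rate, cnd->new_rate);
                break;
        case POST_RATE_CHANGE:
        case ABORT_RATE_CHANGE:
                break;
        }
        return NOTIFY_OK;
}

static struct notifier_block my_rate_nb = {
        .notifier_call = my_rate_notify,
};

/* in probe, with a valid struct clk *clk: clk_notifier_register(clk, &my_rate_nb); */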
- intclk_notifier_unregister(structclk*clk,structnotifier_block*nb)¶
unregister a clock rate-change notifier callback
Parameters
structclk*clk
clock whose rate we are no longer interested in
structnotifier_block*nb
notifier block which will be unregistered
- intdevm_clk_notifier_register(structdevice*dev,structclk*clk,structnotifier_block*nb)¶
register a managed rate-change notifier callback
Parameters
structdevice*dev
device for clock “consumer”
structclk*clk
clock whose rate we are interested in
structnotifier_block*nb
notifier block with callback function pointer
Description
Returns 0 on success, -EERROR otherwise
- longclk_get_accuracy(structclk*clk)¶
obtain the clock accuracy in ppb (parts per billion) for a clock source.
Parameters
structclk*clk
clock source
Description
This gets the clock source accuracy expressed in ppb.A perfect clock returns 0.
Parameters
structclk*clk
clock signal source
intdegrees
number of degrees the signal is shifted
Description
Shifts the phase of a clock signal by the specified degrees. Returns 0 onsuccess, -EERROR otherwise.
Parameters
structclk*clk
clock signal source
Description
Returns the phase shift of a clock node in degrees, otherwise returns-EERROR.
- intclk_set_duty_cycle(structclk*clk,unsignedintnum,unsignedintden)¶
adjust the duty cycle ratio of a clock signal
Parameters
structclk*clk
clock signal source
unsignedintnum
numerator of the duty cycle ratio to be applied
unsignedintden
denominator of the duty cycle ratio to be applied
Description
Adjust the duty cycle of a clock signal by the specified ratio. Returns 0 onsuccess, -EERROR otherwise.
- intclk_get_scaled_duty_cycle(structclk*clk,unsignedintscale)¶
return the duty cycle ratio of a clock signal
Parameters
structclk*clk
clock signal source
unsignedintscale
scaling factor to be applied to represent the ratio as an integer
Description
Returns the duty cycle ratio multiplied by the scale provided, otherwisereturns -EERROR.
- boolclk_is_match(conststructclk*p,conststructclk*q)¶
check if two clk’s point to the same hardware clock
Parameters
conststructclk*p
clk compared against q
conststructclk*q
clk compared against p
Description
Returns true if the two struct clk pointers both point to the same hardwareclock node. Put differently, returns true ifp andqshare the samestructclk_core
object.
Returns false otherwise. Note that two NULL clks are treated as matching.
Parameters
structclk*clk
clock source
Description
This function allows drivers to get exclusive control over the rate of a provider. It prevents any other consumer from performing, even indirectly, an operation which could alter the rate of the provider or cause glitches.
If exclusivity is claimed more than once on a clock, even by the same driver, the rate effectively gets locked as exclusivity can't be preempted.
Must not be called from within atomic context.
Returns success (0) or negative errno.
- intdevm_clk_rate_exclusive_get(structdevice*dev,structclk*clk)¶
devm variant of clk_rate_exclusive_get
Parameters
structdevice*dev
device the exclusivity is bound to
structclk*clk
clock source
Description
Callsclk_rate_exclusive_get()
onclk and registers a devm cleanup handlerondev to callclk_rate_exclusive_put()
.
Must not be called from within atomic context.
Parameters
structclk*clk
clock source
Description
This function allows drivers to release the exclusivity it previously gotfromclk_rate_exclusive_get()
The caller must balance the number ofclk_rate_exclusive_get()
andclk_rate_exclusive_put()
calls.
Must not be called from within atomic context.
Parameters
structclk*clk
clock source
Description
This prepares the clock source for use.
Must not be called from within atomic context.
Parameters
structclk*clk
clock source
Description
Returns true ifclk_prepare()
implicitly enables the clock, effectivelymakingclk_enable()
/clk_disable()
no-ops, false otherwise.
This is of interest mainly to the power management code where actuallydisabling the clock also requires unpreparing it to have any materialeffect.
Regardless of the value returned here, the caller must always invokeclk_enable()
or clk_prepare_enable() and counterparts for usage countsto be right.
Parameters
structclk*clk
clock source
Description
This undoes a previously prepared clock. The caller must balancethe number of prepare and unprepare calls.
Must not be called from within atomic context.
Parameters
structdevice*dev
device for clock “consumer”
constchar*id
clock consumer ID
Description
Returns a struct clk corresponding to the clock producer, orvalidIS_ERR()
condition containing errno. The implementationusesdev andid to determine the clock consumer, and therebythe clock producer. (IOW,id may be identical strings, butclk_get may return different clock producers depending ondev.)
Drivers must assume that the clock source is not enabled.
clk_get should not be called from within interrupt context.
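A typical consumer lifecycle built from the calls documented here, sketched for a driver probe/remove path; the "bus" consumer ID is illustrative:

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/err.h>

static struct clk *bus_clk;

static int my_clk_probe(struct device *dev)
{
        int ret;

        bus_clk = clk_get(dev, "bus");          /* "bus" is an illustrative ID */
        if (IS_ERR(bus_clk))
                return PTR_ERR(bus_clk);

        ret = clk_prepare_enable(bus_clk);      /* prepare + enable in one call */
        if (ret)
                clk_put(bus_clk);
        return ret;
}

static void my_clk_remove(void)
{
        clk_disable_unprepare(bus_clk);
        clk_put(bus_clk);
}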
- intclk_bulk_get(structdevice*dev,intnum_clks,structclk_bulk_data*clks)¶
lookup and obtain a number of references to clock producer.
Parameters
structdevice*dev
device for clock “consumer”
intnum_clks
the number of clk_bulk_data
structclk_bulk_data*clks
the clk_bulk_data table of consumer
Description
This helper function allows drivers to get several clk consumers in oneoperation. If any of the clk cannot be acquired then any clksthat were obtained will be freed before returning to the caller.
Returns 0 if all clocks specified in clk_bulk_data table are obtainedsuccessfully, or validIS_ERR()
condition containing errno.The implementation usesdev andclk_bulk_data.id to determine theclock consumer, and thereby the clock producer.The clock returned is stored in eachclk_bulk_data.clk field.
Drivers must assume that the clock source is not enabled.
clk_bulk_get should not be called from within interrupt context.
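A sketch of the bulk pattern for two clocks; the "bus" and "core" consumer IDs are illustrative, and the combined clk_bulk_prepare_enable()/clk_bulk_disable_unprepare() helpers are assumed to be available:

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/kernel.h>

static struct clk_bulk_data my_clks[] = {
        { .id = "bus" },
        { .id = "core" },
};

static int my_enable_clocks(struct device *dev)
{
        int ret;

        ret = clk_bulk_get(dev, ARRAY_SIZE(my_clks), my_clks);
        if (ret)
                return ret;

        ret = clk_bulk_prepare_enable(ARRAY_SIZE(my_clks), my_clks);
        if (ret)
                clk_bulk_put(ARRAY_SIZE(my_clks), my_clks);
        return ret;
}

static void my_disable_clocks(void)
{
        clk_bulk_disable_unprepare(ARRAY_SIZE(my_clks), my_clks);
        clk_bulk_put(ARRAY_SIZE(my_clks), my_clks);
}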
- intclk_bulk_get_all(structdevice*dev,structclk_bulk_data**clks)¶
lookup and obtain all available references to clock producer.
Parameters
structdevice*dev
device for clock “consumer”
structclk_bulk_data**clks
pointer to the clk_bulk_data table of consumer
Description
This helper function allows drivers to get all clk consumers in oneoperation. If any of the clk cannot be acquired then any clksthat were obtained will be freed before returning to the caller.
Returns a positive value for the number of clocks obtained while theclock references are stored in the clk_bulk_data table inclks field.Returns 0 if there’re none and a negative value if something failed.
Drivers must assume that the clock source is not enabled.
clk_bulk_get should not be called from within interrupt context.
- intclk_bulk_get_optional(structdevice*dev,intnum_clks,structclk_bulk_data*clks)¶
lookup and obtain a number of references to clock producer
Parameters
structdevice*dev
device for clock “consumer”
intnum_clks
the number of clk_bulk_data
structclk_bulk_data*clks
the clk_bulk_data table of consumer
Description
Behaves the same asclk_bulk_get()
except where there is no clock producer.In this case, instead of returning -ENOENT, the function returns 0 andNULL for a clk for which a clock producer could not be determined.
- intdevm_clk_bulk_get(structdevice*dev,intnum_clks,structclk_bulk_data*clks)¶
managed get multiple clk consumers
Parameters
structdevice*dev
device for clock “consumer”
intnum_clks
the number of clk_bulk_data
structclk_bulk_data*clks
the clk_bulk_data table of consumer
Description
Return 0 on success, an errno on failure.
This helper function allows drivers to get several clkconsumers in one operation with management, the clks willautomatically be freed when the device is unbound.
- intdevm_clk_bulk_get_optional(structdevice*dev,intnum_clks,structclk_bulk_data*clks)¶
managed get multiple optional consumer clocks
Parameters
structdevice*dev
device for clock “consumer”
intnum_clks
the number of clk_bulk_data
structclk_bulk_data*clks
pointer to the clk_bulk_data table of consumer
Description
Behaves the same asdevm_clk_bulk_get()
except where there is no clockproducer. In this case, instead of returning -ENOENT, the function returnsNULL for given clk. It is assumed all clocks in clk_bulk_data are optional.
Returns 0 if all clocks specified in clk_bulk_data table are obtainedsuccessfully or for any clk there was no clk provider available, otherwisereturns validIS_ERR()
condition containing errno.The implementation usesdev andclk_bulk_data.id to determine theclock consumer, and thereby the clock producer.The clock returned is stored in eachclk_bulk_data.clk field.
Drivers must assume that the clock source is not enabled.
clk_bulk_get should not be called from within interrupt context.
- intdevm_clk_bulk_get_all(structdevice*dev,structclk_bulk_data**clks)¶
managed get multiple clk consumers
Parameters
structdevice*dev
device for clock “consumer”
structclk_bulk_data**clks
pointer to the clk_bulk_data table of consumer
Description
Returns a positive value for the number of clocks obtained while theclock references are stored in the clk_bulk_data table inclks field.Returns 0 if there’re none and a negative value if something failed.
This helper function allows drivers to get several clkconsumers in one operation with management, the clks willautomatically be freed when the device is unbound.
- intdevm_clk_bulk_get_all_enabled(structdevice*dev,structclk_bulk_data**clks)¶
Get and enable all clocks of the consumer (managed)
Parameters
structdevice*dev
device for clock “consumer”
structclk_bulk_data**clks
pointer to the clk_bulk_data table of consumer
Description
Returns a positive value for the number of clocks obtained while theclock references are stored in the clk_bulk_data table inclks field.Returns 0 if there’re none and a negative value if something failed.
This helper function allows drivers to get all clocks of theconsumer and enables them in one operation with management.The clks will automatically be disabled and freed when the deviceis unbound.
- structclk*devm_clk_get(structdevice*dev,constchar*id)¶
lookup and obtain a managed reference to a clock producer.
Parameters
structdevice*dev
device for clock “consumer”
constchar*id
clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, orvalidIS_ERR()
condition containing errno. The implementationusesdev andid to determine the clock consumer, and therebythe clock producer. (IOW,id may be identical strings, butclk_get may return different clock producers depending ondev.)
Description
Drivers must assume that the clock source is neither prepared norenabled.
The clock will automatically be freed when the device is unboundfrom the bus.
Parameters
structdevice*dev
device for clock “consumer”
constchar*id
clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, orvalidIS_ERR()
condition containing errno. The implementationusesdev andid to determine the clock consumer, and therebythe clock producer. (IOW,id may be identical strings, butclk_get may return different clock producers depending ondev.)
Description
The returned clk (if valid) is prepared. Drivers must however assumethat the clock is not enabled.
The clock will automatically be unprepared and freed when the deviceis unbound from the bus.
- structclk*devm_clk_get_enabled(structdevice*dev,constchar*id)¶
devm_clk_get()
+ clk_prepare_enable()
Parameters
structdevice*dev
device for clock “consumer”
constchar*id
clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, orvalidIS_ERR()
condition containing errno. The implementationusesdev andid to determine the clock consumer, and therebythe clock producer. (IOW,id may be identical strings, butclk_get may return different clock producers depending ondev.)
Description
The returned clk (if valid) is prepared and enabled.
The clock will automatically be disabled, unprepared and freedwhen the device is unbound from the bus.
- structclk*devm_clk_get_optional(structdevice*dev,constchar*id)¶
lookup and obtain a managed reference to an optional clock producer.
Parameters
structdevice*dev
device for clock “consumer”
constchar*id
clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, orvalidIS_ERR()
condition containing errno. The implementationusesdev andid to determine the clock consumer, and therebythe clock producer. If no such clk is found, it returns NULLwhich serves as a dummy clk. That’s the only difference comparedtodevm_clk_get()
.
Description
Drivers must assume that the clock source is neither prepared norenabled.
The clock will automatically be freed when the device is unboundfrom the bus.
Parameters
structdevice*dev
device for clock “consumer”
constchar*id
clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, orvalidIS_ERR()
condition containing errno. The implementationusesdev andid to determine the clock consumer, and therebythe clock producer. If no such clk is found, it returns NULLwhich serves as a dummy clk. That’s the only difference comparedtodevm_clk_get_prepared()
.
Description
The returned clk (if valid) is prepared. Drivers must howeverassume that the clock is not enabled.
The clock will automatically be unprepared and freed when thedevice is unbound from the bus.
- structclk*devm_clk_get_optional_enabled(structdevice*dev,constchar*id)¶
devm_clk_get_optional()
+ clk_prepare_enable()
Parameters
structdevice*dev
device for clock “consumer”
constchar*id
clock consumer ID
Context
May sleep.
Return
a struct clk corresponding to the clock producer, orvalidIS_ERR()
condition containing errno. The implementationusesdev andid to determine the clock consumer, and therebythe clock producer. If no such clk is found, it returns NULLwhich serves as a dummy clk. That’s the only difference comparedtodevm_clk_get_enabled()
.
Description
The returned clk (if valid) is prepared and enabled.
The clock will automatically be disabled, unprepared and freedwhen the device is unbound from the bus.
- structclk*devm_clk_get_optional_enabled_with_rate(structdevice*dev,constchar*id,unsignedlongrate)¶
devm_clk_get_optional()
+clk_set_rate()
+ clk_prepare_enable()
Parameters
structdevice*dev
device for clock “consumer”
constchar*id
clock consumer ID
unsignedlongrate
new clock rate
Context
May sleep.
Return
a struct clk corresponding to the clock producer, orvalidIS_ERR()
condition containing errno. The implementationusesdev andid to determine the clock consumer, and therebythe clock producer. If no such clk is found, it returns NULLwhich serves as a dummy clk. That’s the only difference comparedtodevm_clk_get_enabled()
.
Description
The returned clk (if valid) is prepared and enabled and rate was set.
The clock will automatically be disabled, unprepared and freedwhen the device is unbound from the bus.
- structclk*devm_get_clk_from_child(structdevice*dev,structdevice_node*np,constchar*con_id)¶
lookup and obtain a managed reference to a clock producer from child node.
Parameters
structdevice*dev
device for clock “consumer”
structdevice_node*np
pointer to clock consumer node
constchar*con_id
clock consumer ID
Description
This function parses the clocks, and uses them to look up thestruct clk from the registered list of clock providers by usingnp andcon_id
The clock will automatically be freed when the device is unboundfrom the bus.
Parameters
structclk*clk
clock source
Description
If the clock can not be enabled/disabled, this should return success.
May be called from atomic contexts.
Returns success (0) or negative errno.
- intclk_bulk_enable(intnum_clks,conststructclk_bulk_data*clks)¶
inform the system when the set of clks should be running.
Parameters
intnum_clks
the number of clk_bulk_data
conststructclk_bulk_data*clks
the clk_bulk_data table of consumer
Description
May be called from atomic contexts.
Returns success (0) or negative errno.
Parameters
structclk*clk
clock source
Description
Inform the system that a clock source is no longer required bya driver and may be shut down.
May be called from atomic contexts.
Implementation detail: if the clock source is shared betweenmultiple drivers,clk_enable()
calls must be balanced by thesame number ofclk_disable()
calls for the clock source to bedisabled.
- voidclk_bulk_disable(intnum_clks,conststructclk_bulk_data*clks)¶
inform the system when the set of clks is no longer required.
Parameters
intnum_clks
the number of clk_bulk_data
conststructclk_bulk_data*clks
the clk_bulk_data table of consumer
Description
Inform the system that a set of clks is no longer required bya driver and may be shut down.
May be called from atomic contexts.
Implementation detail: if the set of clks is shared betweenmultiple drivers,clk_bulk_enable()
calls must be balanced by thesame number ofclk_bulk_disable()
calls for the clock source to bedisabled.
- unsignedlongclk_get_rate(structclk*clk)¶
obtain the current clock rate (in Hz) for a clock source. This is only valid once the clock source has been enabled.
Parameters
structclk*clk
clock source
Parameters
structclk*clk
clock source
Note
drivers must ensure that all clk_enable calls made on thisclock source are balanced by clk_disable calls prior to callingthis function.
Description
clk_put should not be called from within interrupt context.
- voidclk_bulk_put(intnum_clks,structclk_bulk_data*clks)¶
“free” the clock source
Parameters
intnum_clks
the number of clk_bulk_data
structclk_bulk_data*clks
the clk_bulk_data table of consumer
Note
drivers must ensure that all clk_bulk_enable calls made on thisclock source are balanced by clk_bulk_disable calls prior to callingthis function.
Description
clk_bulk_put should not be called from within interrupt context.
- voidclk_bulk_put_all(intnum_clks,structclk_bulk_data*clks)¶
“free” all the clock source
Parameters
intnum_clks
the number of clk_bulk_data
structclk_bulk_data*clks
the clk_bulk_data table of consumer
Note
drivers must ensure that all clk_bulk_enable calls made on thisclock source are balanced by clk_bulk_disable calls prior to callingthis function.
Description
clk_bulk_put_all should not be called from within interrupt context.
Parameters
structdevice*dev
device used to acquire the clock
structclk*clk
clock source acquired with
devm_clk_get()
Note
drivers must ensure that all clk_enable calls made on thisclock source are balanced by clk_disable calls prior to callingthis function.
Description
clk_put should not be called from within interrupt context.
- longclk_round_rate(structclk*clk,unsignedlongrate)¶
adjust a rate to the exact rate a clock can provide
Parameters
structclk*clk
clock source
unsignedlongrate
desired clock rate in Hz
Description
This answers the question “if I were to passrate toclk_set_rate()
,what clock rate would I end up with?” without changing the hardwarein any way. In other words:
rate = clk_round_rate(clk, r);
and:
clk_set_rate(clk, r);
rate = clk_get_rate(clk);
are equivalent except the former does not modify the clock hardwarein any way.
Returns rounded clock rate in Hz, or negative errno.
Parameters
structclk*clk
clock source
unsignedlongrate
desired clock rate in Hz
Description
Updating the rate starts at the top-most affected clock and thenwalks the tree down to the bottom-most clock that needs updating.
Returns success (0) or negative errno.
- intclk_set_rate_exclusive(structclk*clk,unsignedlongrate)¶
set the clock rate and claim exclusivity over clock source
Parameters
structclk*clk
clock source
unsignedlongrate
desired clock rate in Hz
Description
This helper function allows drivers to atomically set the rate of a producerand claim exclusivity over the rate control of the producer.
It is essentially a combination of clk_set_rate() and clk_rate_exclusive_get(). The caller must balance this call with a call to clk_rate_exclusive_put().
Returns success (0) or negative errno.
- boolclk_has_parent(conststructclk*clk,conststructclk*parent)¶
check if a clock is a possible parent for another
Parameters
conststructclk*clk
clock source
conststructclk*parent
parent clock source
Description
This function can be used in drivers that need to check that a clock can bethe parent of another without actually changing the parent.
Returns true ifparent is a possible parent forclk, false otherwise.
- intclk_set_rate_range(structclk*clk,unsignedlongmin,unsignedlongmax)¶
set a rate range for a clock source
Parameters
structclk*clk
clock source
unsignedlongmin
desired minimum clock rate in Hz, inclusive
unsignedlongmax
desired maximum clock rate in Hz, inclusive
Description
Returns success (0) or negative errno.
Parameters
structclk*clk
clock source
unsignedlongrate
desired minimum clock rate in Hz, inclusive
Description
Returns success (0) or negative errno.
Parameters
structclk*clk
clock source
unsignedlongrate
desired maximum clock rate in Hz, inclusive
Description
Returns success (0) or negative errno.
Parameters
structclk*clk
clock source
structclk*parent
parent clock source
Description
Returns success (0) or negative errno.
Parameters
structclk*clk
clock source
Description
Returns struct clk corresponding to parent clock source, orvalidIS_ERR()
condition containing errno.
- structclk*clk_get_sys(constchar*dev_id,constchar*con_id)¶
get a clock based upon the device name
Parameters
constchar*dev_id
device name
constchar*con_id
connection ID
Description
Returns a struct clk corresponding to the clock producer, orvalidIS_ERR()
condition containing errno. The implementationusesdev_id andcon_id to determine the clock consumer, andthereby the clock producer. In contrast toclk_get()
this functiontakes the device name instead of the device itself for identification.
Drivers must assume that the clock source is not enabled.
clk_get_sys should not be called from within interrupt context.
- intclk_save_context(void)¶
save clock context for poweroff
Parameters
void
no arguments
Description
Saves the context of the clock registers for power states in which the contents of the registers will be lost. Occurs deep within the suspend code so locking is not necessary.
- voidclk_restore_context(void)¶
restore clock context after poweroff
Parameters
void
no arguments
Description
This occurs with all clocks enabled. Occurs deep within the resume codeso locking is not necessary.
Parameters
structclk*clk
clock source
Description
Returns success (0) or negative errno.
- structclk*clk_get_optional(structdevice*dev,constchar*id)¶
lookup and obtain a reference to an optional clock producer.
Parameters
structdevice*dev
device for clock “consumer”
constchar*id
clock consumer ID
Description
Behaves the same asclk_get()
except where there is no clock producer. Inthis case, instead of returning -ENOENT, the function returns NULL.
Synchronization Primitives¶
Read-Copy Update (RCU)¶
- boolsame_state_synchronize_rcu(unsignedlongoldstate1,unsignedlongoldstate2)¶
Are two old-state values identical?
Parameters
unsignedlongoldstate1
First old-state value.
unsignedlongoldstate2
Second old-state value.
Description
The two old-state values must have been obtained from either get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or get_completed_synchronize_rcu(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.
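The polled grace-period interfaces these old-state values come from can be used roughly as in the sketch below, where a cookie records the grace-period state at one point and is checked later instead of blocking in synchronize_rcu(); the my_* names are illustrative:

#include <linux/rcupdate.h>

static unsigned long my_cookie;

static void my_record_state(void)
{
        /* snapshot the current grace-period state */
        my_cookie = get_state_synchronize_rcu();
}

static bool my_grace_period_elapsed(void)
{
        /* true once a full grace period has passed since the snapshot */
        return poll_state_synchronize_rcu(my_cookie);
}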
- boolrcu_trace_implies_rcu_gp(void)¶
does an RCU Tasks Trace grace period imply an RCU grace period?
Parameters
void
no arguments
Description
As an accident of implementation, an RCU Tasks Trace grace period alsoacts as an RCU grace period. However, this could change at any time.Code relying on this accident must call this function to verify thatthis accident is still happening.
You have been warned!
- cond_resched_tasks_rcu_qs¶
cond_resched_tasks_rcu_qs()
Report potential quiescent states to RCU
Description
This macro resembles cond_resched(), except that it is defined toreport potential quiescent states to RCU-tasks even if the cond_resched()machinery were to be shut off, as some advocate for PREEMPTION kernels.
- rcu_softirq_qs_periodic¶
rcu_softirq_qs_periodic(old_ts)
Report RCU and RCU-Tasks quiescent states
Parameters
old_ts
jiffies at start of processing.
Description
This helper is for long-running softirq handlers, such as NAPI threads innetworking. The caller should initialize the variable passed in asold_tsat the beginning of the softirq handler. When invoked frequently, this macrowill invokercu_softirq_qs()
every 100 milliseconds thereafter, which willprovide both RCU and RCU-Tasks quiescent states. Note that this macromodifies its old_ts argument.
Because regions of code that have disabled softirq act as RCU read-sidecritical sections, this macro should be invoked with softirq (andpreemption) enabled.
The macro is not needed when CONFIG_PREEMPT_RT is defined. RT kernels wouldhave more chance to invoke schedule() calls and provide necessary quiescentstates. As a contrast, calling cond_resched() only won’t achieve the sameeffect because cond_resched() does not provide RCU-Tasks quiescent states.
- RCU_LOCKDEP_WARN¶
RCU_LOCKDEP_WARN(c,s)
emit lockdep splat if specified condition is met
Parameters
c
condition to check
s
informative message
Description
This checks debug_lockdep_rcu_enabled() before checking (c) to prevent early boot splats due to lockdep not yet being initialized, and rechecks it after checking (c) to prevent false-positive splats due to races with lockdep being disabled. See commit 3066820034b5dd ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
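For example, a helper that must run inside an RCU read-side critical section might check its calling context like this (foo_check_context() is hypothetical):
#include <linux/rcupdate.h>

static void foo_check_context(void)
{
	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
			 "foo_check_context() called outside rcu_read_lock()");
}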
- lockdep_assert_in_rcu_read_lock¶
lockdep_assert_in_rcu_read_lock()
WARN if not protected by
rcu_read_lock()
Description
Splats if lockdep is enabled and there is no
rcu_read_lock()
in effect.
- lockdep_assert_in_rcu_read_lock_bh¶
lockdep_assert_in_rcu_read_lock_bh()
WARN if not protected by
rcu_read_lock_bh()
Description
Splats if lockdep is enabled and there is no rcu_read_lock_bh() in effect. Note that local_bh_disable() and friends do not suffice here; instead an actual rcu_read_lock_bh() is required.
- lockdep_assert_in_rcu_read_lock_sched¶
lockdep_assert_in_rcu_read_lock_sched()
WARN if not protected by
rcu_read_lock_sched()
Description
Splats if lockdep is enabled and there is no rcu_read_lock_sched() in effect. Note that preempt_disable() and friends do not suffice here; instead an actual rcu_read_lock_sched() is required.
- lockdep_assert_in_rcu_reader¶
lockdep_assert_in_rcu_reader()
WARN if not within some type of RCU reader
Description
Splats if lockdep is enabled and there is no RCU reader of any type in effect. Note that regions of code protected by things like preempt_disable(), local_bh_disable(), and local_irq_disable() all qualify as RCU readers.
Note that this will never trigger in PREEMPT_NONE or PREEMPT_VOLUNTARY kernels that are not also built with PREEMPT_COUNT. But if you have lockdep enabled, you might as well also enable PREEMPT_COUNT.
- unrcu_pointer¶
unrcu_pointer(p)
mark a pointer as not being RCU protected
Parameters
p
pointer needing to lose its __rcu property
Description
Converts p from an __rcu pointer to a __kernel pointer. This allows an __rcu pointer to be used with xchg() and friends.
- RCU_INITIALIZER¶
RCU_INITIALIZER(v)
statically initialize an RCU-protected global variable
Parameters
v
The value to statically initialize with.
- rcu_assign_pointer¶
rcu_assign_pointer(p,v)
assign to RCU-protected pointer
Parameters
p
pointer to assign to
v
value to assign (publish)
Description
Assigns the specified value to the specified RCU-protected pointer, ensuring that any concurrent RCU readers will see any prior initialization.
Inserts memory barriers on architectures that require them (which is most of them), and also prevents the compiler from reordering the code that initializes the structure after the pointer assignment. More importantly, this call documents which pointers will be dereferenced by RCU read-side code.
In some special cases, you may use RCU_INIT_POINTER() instead of rcu_assign_pointer(). RCU_INIT_POINTER() is a bit faster due to the fact that it does not constrain either the CPU or the compiler. That said, using RCU_INIT_POINTER() when you should have used rcu_assign_pointer() is a very bad thing that results in impossible-to-diagnose memory corruption. So please be careful. See the RCU_INIT_POINTER() comment header for details.
Note that rcu_assign_pointer() evaluates each of its arguments only once, appearances notwithstanding. One of the "extra" evaluations is in typeof() and the other visible only to sparse (__CHECKER__), neither of which actually execute the argument. As with most cpp macros, this execute-arguments-only-once property is important, so please be careful when making changes to rcu_assign_pointer() and the other macros that it invokes.
- rcu_replace_pointer¶
rcu_replace_pointer(rcu_ptr,ptr,c)
replace an RCU pointer, returning its old value
Parameters
rcu_ptr
RCU pointer, whose old value is returned
ptr
regular pointer
c
the lockdep conditions under which the dereference will take place
Description
Perform a replacement, where rcu_ptr is an RCU-annotated pointer and c is the lockdep argument that is passed to the rcu_dereference_protected() call used to read that pointer. The old value of rcu_ptr is returned, and rcu_ptr is set to ptr.
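A sketch of a typical update (struct foo_cfg, the foo_cfg pointer, and foo_lock are hypothetical): the update-side lock satisfies the lockdep condition, and the displaced structure is freed after a grace period.
#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/slab.h>

struct foo_cfg {
	int value;
	struct rcu_head rcu;
};

static struct foo_cfg __rcu *foo_cfg;	/* current configuration */
static DEFINE_SPINLOCK(foo_lock);	/* serializes updates to foo_cfg */

static void foo_set_cfg(struct foo_cfg *new_cfg)
{
	struct foo_cfg *old;

	spin_lock(&foo_lock);
	old = rcu_replace_pointer(foo_cfg, new_cfg,
				  lockdep_is_held(&foo_lock));
	spin_unlock(&foo_lock);
	if (old)
		kfree_rcu(old, rcu);	/* reclaim once readers are done */
}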
- rcu_access_pointer¶
rcu_access_pointer(p)
fetch RCU pointer with no dereferencing
Parameters
p
The pointer to read
Description
Return the value of the specified RCU-protected pointer, but omit the lockdep checks for being in an RCU read-side critical section. This is useful when the value of this pointer is accessed, but the pointer is not dereferenced, for example, when testing an RCU-protected pointer against NULL. Although rcu_access_pointer() may also be used in cases where update-side locks prevent the value of the pointer from changing, you should instead use rcu_dereference_protected() for this use case. Within an RCU read-side critical section, there is little reason to use rcu_access_pointer().
It is usually best to test the rcu_access_pointer() return value directly in order to avoid accidental dereferences being introduced by later inattentive changes. In other words, assigning the rcu_access_pointer() return value to a local variable results in an accident waiting to happen.
It is also permissible to use rcu_access_pointer() when read-side access to the pointer was removed at least one grace period ago, as is the case in the context of the RCU callback that is freeing up the data, or after a synchronize_rcu() returns. This can be useful when tearing down multi-linked structures after a grace period has elapsed. However, rcu_dereference_protected() is normally preferred for this use case.
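For example, testing the hypothetical foo_cfg pointer from the earlier sketch against NULL needs no read-side protection and no dereference:
#include <linux/rcupdate.h>

static bool foo_cfg_present(void)
{
	/* No dereference, so no RCU read-side critical section is needed. */
	return rcu_access_pointer(foo_cfg) != NULL;
}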
- rcu_dereference_check¶
rcu_dereference_check(p,c)
rcu_dereference with debug checking
Parameters
p
The pointer to read, prior to dereferencing
c
The conditions under which the dereference will take place
Description
Do an rcu_dereference(), but check that the conditions under which the dereference will take place are correct. Typically the conditions indicate the various locking conditions that should be held at that point. The check should return true if the conditions are satisfied. An implicit check for being in an RCU read-side critical section (rcu_read_lock()) is included.
For example:
bar = rcu_dereference_check(foo->bar, lockdep_is_held(&foo->lock));
could be used to indicate to lockdep that foo->bar may only be dereferenced if either rcu_read_lock() is held, or that the lock required to replace the bar struct at foo->bar is held.
Note that the list of conditions may also include indications of when a lock need not be held, for example during initialisation or destruction of the target struct:
bar = rcu_dereference_check(foo->bar, lockdep_is_held(&foo->lock) || atomic_read(&foo->usage) == 0);
Inserts memory barriers on architectures that require them (currently only the Alpha), prevents the compiler from refetching (and from merging fetches), and, more importantly, documents exactly which pointers are protected by RCU and checks that the pointer is annotated as __rcu.
- rcu_dereference_bh_check¶
rcu_dereference_bh_check(p,c)
rcu_dereference_bh with debug checking
Parameters
p
The pointer to read, prior to dereferencing
c
The conditions under which the dereference will take place
Description
This is the RCU-bh counterpart to rcu_dereference_check(). However, please note that starting in v5.0 kernels, vanilla RCU grace periods wait for local_bh_disable() regions of code in addition to regions of code demarked by rcu_read_lock() and rcu_read_unlock(). This means that synchronize_rcu(), call_rcu(), and friends all take not only rcu_read_lock() but also rcu_read_lock_bh() into account.
- rcu_dereference_sched_check¶
rcu_dereference_sched_check(p,c)
rcu_dereference_sched with debug checking
Parameters
p
The pointer to read, prior to dereferencing
c
The conditions under which the dereference will take place
Description
This is the RCU-sched counterpart to rcu_dereference_check(). However, please note that starting in v5.0 kernels, vanilla RCU grace periods wait for preempt_disable() regions of code in addition to regions of code demarked by rcu_read_lock() and rcu_read_unlock(). This means that synchronize_rcu(), call_rcu(), and friends all take not only rcu_read_lock() but also rcu_read_lock_sched() into account.
- rcu_dereference_protected¶
rcu_dereference_protected(p,c)
fetch RCU pointer when updates prevented
Parameters
p
The pointer to read, prior to dereferencing
c
The conditions under which the dereference will take place
Description
Return the value of the specified RCU-protected pointer, but omit the READ_ONCE(). This is useful in cases where update-side locks prevent the value of the pointer from changing. Please note that this primitive does not prevent the compiler from repeating this reference or combining it with other references, so it should not be used without protection of appropriate locks.
This function is only for update-side use. Using this function when protected only by rcu_read_lock() will result in infrequent but very ugly failures.
- rcu_dereference¶
rcu_dereference(p)
fetch RCU-protected pointer for dereferencing
Parameters
p
The pointer to read, prior to dereferencing
Description
This is a simple wrapper aroundrcu_dereference_check()
.
- rcu_dereference_bh¶
rcu_dereference_bh(p)
fetch an RCU-bh-protected pointer for dereferencing
Parameters
p
The pointer to read, prior to dereferencing
Description
Makesrcu_dereference_check()
do the dirty work.
- rcu_dereference_sched¶
rcu_dereference_sched(p)
fetch RCU-sched-protected pointer for dereferencing
Parameters
p
The pointer to read, prior to dereferencing
Description
Makesrcu_dereference_check()
do the dirty work.
- rcu_pointer_handoff¶
rcu_pointer_handoff(p)
Hand off a pointer from RCU to other mechanism
Parameters
p
The pointer to hand off
Description
This is simply an identity function, but it documents where a pointer is handed off from RCU to some other synchronization mechanism, for example, reference counting or locking. In C11, it would map to kill_dependency(). It could be used as follows:
rcu_read_lock();
p = rcu_dereference(gp);
long_lived = is_long_lived(p);
if (long_lived) {
	if (!atomic_inc_not_zero(p->refcnt))
		long_lived = false;
	else
		p = rcu_pointer_handoff(p);
}
rcu_read_unlock();
- voidrcu_read_lock(void)¶
mark the beginning of an RCU read-side critical section
Parameters
void
no arguments
Description
When synchronize_rcu() is invoked on one CPU while other CPUs are within RCU read-side critical sections, then the synchronize_rcu() is guaranteed to block until after all the other CPUs exit their critical sections. Similarly, if call_rcu() is invoked on one CPU while other CPUs are within RCU read-side critical sections, invocation of the corresponding RCU callback is deferred until after all the other CPUs exit their critical sections.
Both synchronize_rcu() and call_rcu() also wait for regions of code with preemption disabled, including regions of code with interrupts or softirqs disabled.
Note, however, that RCU callbacks are permitted to run concurrently with new RCU read-side critical sections. One way that this can happen is via the following sequence of events: (1) CPU 0 enters an RCU read-side critical section, (2) CPU 1 invokes call_rcu() to register an RCU callback, (3) CPU 0 exits the RCU read-side critical section, (4) CPU 2 enters an RCU read-side critical section, (5) the RCU callback is invoked. This is legal, because the RCU read-side critical section that was running concurrently with the call_rcu() (and which therefore might be referencing something that the corresponding RCU callback would free up) has completed before the corresponding RCU callback is invoked.
RCU read-side critical sections may be nested. Any deferred actions will be deferred until the outermost RCU read-side critical section completes.
You can avoid reading and understanding the next paragraph by following this rule: don't put anything in an rcu_read_lock() RCU read-side critical section that would block in a !PREEMPTION kernel. But if you want the full story, read on!
In non-preemptible RCU implementations (pure TREE_RCU and TINY_RCU), it is illegal to block while in an RCU read-side critical section. In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION kernel builds, RCU read-side critical sections may be preempted, but explicit blocking is illegal. Finally, in preemptible RCU implementations in real-time (with -rt patchset) kernel builds, RCU read-side critical sections may be preempted and they may also block, but only when acquiring spinlocks that are subject to priority inheritance.
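A minimal reader sketch using the hypothetical gp pointer published in the rcu_assign_pointer() example:
#include <linux/rcupdate.h>

static int foo_read_a(void)
{
	struct foo *p;
	int a = -1;

	rcu_read_lock();
	p = rcu_dereference(gp);	/* valid only inside the critical section */
	if (p)
		a = p->a;
	rcu_read_unlock();
	return a;
}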
- voidrcu_read_unlock(void)¶
marks the end of an RCU read-side critical section.
Parameters
void
no arguments
Description
In almost all situations, rcu_read_unlock() is immune from deadlock. This deadlock immunity also extends to the scheduler's runqueue and priority-inheritance spinlocks, courtesy of the quiescent-state deferral that is carried out when rcu_read_unlock() is invoked with interrupts disabled.
See rcu_read_lock() for more information.
- voidrcu_read_lock_bh(void)¶
mark the beginning of an RCU-bh critical section
Parameters
void
no arguments
Description
This is equivalent to rcu_read_lock(), but also disables softirqs. Note that anything else that disables softirqs can also serve as an RCU read-side critical section. However, please note that this equivalence applies only to v5.0 and later. Before v5.0, rcu_read_lock() and rcu_read_lock_bh() were unrelated.
Note that rcu_read_lock_bh() and the matching rcu_read_unlock_bh() must occur in the same context, for example, it is illegal to invoke rcu_read_unlock_bh() from one task if the matching rcu_read_lock_bh() was invoked from some other task.
- voidrcu_read_unlock_bh(void)¶
marks the end of a softirq-only RCU critical section
- voidrcu_read_lock_sched(void)¶
mark the beginning of a RCU-sched critical section
Parameters
void
no arguments
Description
This is equivalent to rcu_read_lock(), but also disables preemption. Read-side critical sections can also be introduced by anything else that disables preemption, including local_irq_disable() and friends. However, please note that the equivalence to rcu_read_lock() applies only to v5.0 and later. Before v5.0, rcu_read_lock() and rcu_read_lock_sched() were unrelated.
Note that rcu_read_lock_sched() and the matching rcu_read_unlock_sched() must occur in the same context, for example, it is illegal to invoke rcu_read_unlock_sched() from process context if the matching rcu_read_lock_sched() was invoked from an NMI handler.
- voidrcu_read_unlock_sched(void)¶
marks the end of a RCU-classic critical section
- RCU_INIT_POINTER¶
RCU_INIT_POINTER(p,v)
initialize an RCU protected pointer
Parameters
p
The pointer to be initialized.
v
The value to initialized the pointer to.
Description
Initialize an RCU-protected pointer in special cases where readers do not need ordering constraints on the CPU or the compiler. These special cases are:
1. This use of RCU_INIT_POINTER() is NULLing out the pointer, or
2. The caller has taken whatever steps are required to prevent RCU readers from concurrently accessing this pointer, or
3. The referenced data structure has already been exposed to readers either at compile time or via rcu_assign_pointer(), and
   a. You have not made any reader-visible changes to this structure since then, or
   b. It is OK for readers accessing this structure from its new location to see the old state of the structure. (For example, the changes were to statistical counters or to other state where exact synchronization is not required.)
Failure to follow these rules governing use of RCU_INIT_POINTER() will result in impossible-to-diagnose memory corruption. As in, the structures will look OK in crash dumps, but any concurrent RCU readers might see pre-initialized values of the referenced data structure. So please be very careful how you use RCU_INIT_POINTER()!!!
If you are creating an RCU-protected linked structure that is accessed by a single external-to-structure RCU-protected pointer, then you may use RCU_INIT_POINTER() to initialize the internal RCU-protected pointers, but you must use rcu_assign_pointer() to initialize the external-to-structure pointer after you have completely initialized the reader-accessible portions of the linked structure.
Note that unlike rcu_assign_pointer(), RCU_INIT_POINTER() provides no ordering guarantees for either the CPU or the compiler.
- RCU_POINTER_INITIALIZER¶
RCU_POINTER_INITIALIZER(p,v)
statically initialize an RCU protected pointer
Parameters
p
The pointer to be initialized.
v
The value to initialized the pointer to.
Description
GCC-style initialization for an RCU-protected pointer in a structure field.
- kfree_rcu¶
kfree_rcu(ptr,rhf)
kfree an object after a grace period.
Parameters
ptr
pointer to kfree for double-argument invocations.
rhf
the name of the struct rcu_head within the type ofptr.
Description
Many RCU callback functions just call kfree() on the base structure. These functions are trivial, but their size adds up, and furthermore when they are used in a kernel module, that module must invoke the high-latency rcu_barrier() function at module-unload time.
The kfree_rcu() function handles this issue. In order to have a universal callback function handling different offsets of rcu_head, the callback needs to determine the starting address of the freed object, which can be a large kmalloc or vmalloc allocation. To allow simply aligning the pointer down to page boundary for those, only offsets up to 4095 bytes can be accommodated. If the offset is larger than 4095 bytes, a compile-time error will be generated in kvfree_rcu_arg_2(). If this error is triggered, you can either fall back to use of call_rcu() or rearrange the structure to position the rcu_head structure into the first 4096 bytes.
The object to be freed can be allocated either by kmalloc() or kmem_cache_alloc().
Note that the allowable offset might decrease in the future.
The BUILD_BUG_ON check must not involve any function calls, hence the checks are done in macros here.
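A sketch of the double-argument form (struct foo_item and foo_item_free() are hypothetical); the rcu_head member must lie within the first 4096 bytes of the structure:
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo_item {
	struct rcu_head rhf;	/* within the first 4096 bytes */
	int data;
};

/* Free an already-unlinked item once all pre-existing readers are done. */
static void foo_item_free(struct foo_item *item)
{
	kfree_rcu(item, rhf);
}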
- kfree_rcu_mightsleep¶
kfree_rcu_mightsleep(ptr)
kfree an object after a grace period.
Parameters
ptr
pointer to kfree for single-argument invocations.
Description
When it comes to the head-less variant, only one argument is passed and that is just a pointer which has to be freed after a grace period. Therefore the semantic is
kfree_rcu_mightsleep(ptr);
where ptr is the pointer to be freed by kvfree().
Please note, the head-less way of freeing is permitted for use only from a context that may follow a might_sleep() annotation. Otherwise, please switch and embed the rcu_head structure within the type of ptr.
- voidrcu_head_init(structrcu_head*rhp)¶
Initialize rcu_head for
rcu_head_after_call_rcu()
Parameters
structrcu_head*rhp
The rcu_head structure to initialize.
Description
If you intend to invoke rcu_head_after_call_rcu() to test whether a given rcu_head structure has already been passed to call_rcu(), then you must also invoke this rcu_head_init() function on it just after allocating that structure. Calls to this function must not race with calls to call_rcu(), rcu_head_after_call_rcu(), or callback invocation.
- boolrcu_head_after_call_rcu(structrcu_head*rhp,rcu_callback_tf)¶
Has this rcu_head been passed to
call_rcu()
?
Parameters
structrcu_head*rhp
The rcu_head structure to test.
rcu_callback_tf
The function passed to
call_rcu()
along withrhp.
Description
Returns true if the rhp has been passed to call_rcu() with func, and false otherwise. Emits a warning in any other case, including the case where rhp has already been invoked after a grace period. Calls to this function must not race with callback invocation. One way to avoid such races is to enclose the call to rcu_head_after_call_rcu() in an RCU read-side critical section that includes a read-side fetch of the pointer to the structure containing rhp.
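A debug-oriented sketch (struct bar, bar_reclaim(), and bar_already_queued() are hypothetical); the rcu_head must have been prepared with rcu_head_init() when the object was allocated:
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct bar {
	struct rcu_head rh;	/* rcu_head_init(&b->rh) at allocation time */
};

static void bar_reclaim(struct rcu_head *rhp)
{
	kfree(container_of(rhp, struct bar, rh));
}

static bool bar_already_queued(struct bar *b)
{
	return rcu_head_after_call_rcu(&b->rh, bar_reclaim);
}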
- voidrcu_softirq_qs(void)¶
Provide a set of RCU quiescent states in softirq processing
Parameters
void
no arguments
Description
Mark a quiescent state for RCU, Tasks RCU, and Tasks Trace RCU. This is a special-purpose function to be used in the softirq infrastructure and perhaps the occasional long-running softirq handler.
Note that from RCU's viewpoint, a call to rcu_softirq_qs() is equivalent to momentarily completely enabling preemption. For example, given this code:
local_bh_disable();
do_something();
rcu_softirq_qs();  // A
do_something_else();
local_bh_enable();  // B
A call to synchronize_rcu() that began concurrently with the call to do_something() would be guaranteed to wait only until execution reached statement A. Without that rcu_softirq_qs(), that same synchronize_rcu() would instead be guaranteed to wait until execution reached statement B.
- boolrcu_watching_snap_stopped_since(structrcu_data*rdp,intsnap)¶
Has RCU stopped watching a given CPU since the specifiedsnap?
Parameters
structrcu_data*rdp
The rcu_data corresponding to the CPU for which to check EQS.
intsnap
rcu_watching snapshot taken when the CPU wasn’t in an EQS.
Description
Returns true if the CPU corresponding tordp has spent some time in anextended quiescent state sincesnap. Note that this doesn’t check if it/still/ is in an EQS, just that it went through one sincesnap.
This is meant to be used in a loop waiting for a CPU to go through an EQS.
- intrcu_is_cpu_rrupt_from_idle(void)¶
see if ‘interrupted’ from idle
Parameters
void
no arguments
Description
If the current CPU is idle and running at a first-level (not nested)interrupt, or directly, from idle, return true.
The caller must have at least disabled IRQs.
- voidrcu_irq_exit_check_preempt(void)¶
Validate that scheduling is possible
Parameters
void
no arguments
- void__rcu_irq_enter_check_tick(void)¶
Enable scheduler tick on CPU if RCU needs it.
Parameters
void
no arguments
Description
The scheduler tick is not normally enabled when CPUs enter the kernel from nohz_full userspace execution. After all, nohz_full userspace execution is an RCU quiescent state and the time executing in the kernel is quite short. Except of course when it isn't. And it is not hard to cause a large system to spend tens of seconds or even minutes looping in the kernel, which can cause a number of problems, including RCU CPU stall warnings.
Therefore, if a nohz_full CPU fails to report a quiescent state in a timely manner, the RCU grace-period kthread sets that CPU's ->rcu_urgent_qs flag with the expectation that the next interrupt or exception will invoke this function, which will turn on the scheduler tick, which will enable RCU to detect that CPU's quiescent states, for example, due to cond_resched() calls in CONFIG_PREEMPT=n kernels. The tick will be disabled once a quiescent state is reported for this CPU.
Of course, in carefully tuned systems, there might never be an interrupt or exception. In that case, the RCU grace-period kthread will eventually cause one to happen. However, in less carefully controlled environments, this function allows RCU to get what it needs without creating otherwise useless interruptions.
- notraceboolrcu_is_watching(void)¶
RCU read-side critical sections permitted on current CPU?
Parameters
void
no arguments
Description
Return true if RCU is watching the running CPU and false otherwise. A true return means that this CPU can safely enter RCU read-side critical sections.
Although calls to rcu_is_watching() from most parts of the kernel will return true, there are important exceptions. For example, if the current CPU is deep within its idle loop, in kernel entry/exit code, or offline, rcu_is_watching() will return false.
Make notrace because it can be called by the internal functions of ftrace, and making this notrace removes unnecessary recursion calls.
- voidrcu_set_gpwrap_lag(unsignedlonglag_gps)¶
Set RCU GP sequence overflow lag value.
Parameters
unsignedlonglag_gps
Set overflow lag to this many grace period worth of counterswhich is used by rcutorture to quickly force a gpwrap situation.lag_gps = 0 means we reset it back to the boot-time value.
- voidcall_rcu_hurry(structrcu_head*head,rcu_callback_tfunc)¶
Queue RCU callback for invocation after grace period, and flush all lazy callbacks (including the new one) to the main ->cblist while doing so.
Parameters
structrcu_head*head
structure to be used for queueing the RCU updates.
rcu_callback_tfunc
actual callback function to be invoked after the grace period
Description
The callback function will be invoked some time after a full grace period elapses, in other words after all pre-existing RCU read-side critical sections have completed.
Use this API instead of call_rcu() if you don't want the callback to be delayed for very long periods of time, which can happen on systems without memory pressure and on systems which are lightly loaded or mostly idle. This function will cause callbacks to be invoked sooner than later at the expense of extra power. Other than that, this function is identical to, and reuses call_rcu()'s logic. Refer to call_rcu() for more details about memory ordering and other functionality.
- voidcall_rcu(structrcu_head*head,rcu_callback_tfunc)¶
Queue an RCU callback for invocation after a grace period. By default the callbacks are ‘lazy’ and are kept hidden from the main ->cblist to prevent starting of grace periods too soon. If you desire grace periods to start very soon, use
call_rcu_hurry()
.
Parameters
structrcu_head*head
structure to be used for queueing the RCU updates.
rcu_callback_tfunc
actual callback function to be invoked after the grace period
Description
The callback function will be invoked some time after a full grace period elapses, in other words after all pre-existing RCU read-side critical sections have completed. However, the callback function might well execute concurrently with RCU read-side critical sections that started after call_rcu() was invoked.
It is perfectly legal to repost an RCU callback, potentially with a different callback function, from within its callback function. The specified function will be invoked after another full grace period has elapsed. This use case is similar in form to the common practice of reposting a timer from within its own handler.
RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. In addition, but only in v5.0 and later, regions of code across which interrupts, preemption, or softirqs have been disabled also serve as RCU read-side critical sections. This includes hardware interrupt handlers, softirq handlers, and NMI handlers.
Note that all CPUs must agree that the grace period extended beyond all pre-existing RCU read-side critical sections. On systems with more than one CPU, this means that when "func()" is invoked, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU read-side critical section whose beginning preceded the call to call_rcu(). It also means that each CPU executing an RCU read-side critical section that continues beyond the start of "func()" must have executed a memory barrier after the call_rcu() but before the beginning of that RCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.
Furthermore, if CPU A invoked call_rcu() and CPU B invoked the resulting RCU callback function "func()", then both CPU A and CPU B are guaranteed to execute a full memory barrier during the time interval between the call to call_rcu() and the invocation of "func()" -- even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).
Implementation of these memory-ordering guarantees is described here: A Tour Through TREE_RCU's Grace-Period Memory Ordering.
Specific to call_rcu() (as opposed to the other call_rcu*() functions), in kernels built with CONFIG_RCU_LAZY=y, call_rcu() might delay for many seconds before starting the grace period needed by the corresponding callback. This delay can significantly improve energy-efficiency on low-utilization battery-powered devices. To avoid this delay, in latency-sensitive kernel code, use call_rcu_hurry().
- voidsynchronize_rcu(void)¶
wait until a grace period has elapsed.
Parameters
void
no arguments
Description
Control will return to the caller some time after a full grace period has elapsed, in other words after all currently executing RCU read-side critical sections have completed. Note, however, that upon return from synchronize_rcu(), the caller might well be executing concurrently with new RCU read-side critical sections that began while synchronize_rcu() was waiting.
RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. In addition, but only in v5.0 and later, regions of code across which interrupts, preemption, or softirqs have been disabled also serve as RCU read-side critical sections. This includes hardware interrupt handlers, softirq handlers, and NMI handlers.
Note that this guarantee implies further memory-ordering guarantees. On systems with more than one CPU, when synchronize_rcu() returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU read-side critical section whose beginning preceded the call to synchronize_rcu(). In addition, each CPU having an RCU read-side critical section that extends beyond the return from synchronize_rcu() is guaranteed to have executed a full memory barrier after the beginning of synchronize_rcu() and before the beginning of that RCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.
Furthermore, if CPU A invoked synchronize_rcu(), which returned to its caller on CPU B, then both CPU A and CPU B are guaranteed to have executed a full memory barrier during the execution of synchronize_rcu() -- even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).
Implementation of these memory-ordering guarantees is described here: A Tour Through TREE_RCU's Grace-Period Memory Ordering.
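A synchronous-teardown sketch, reusing the hypothetical foo_cfg and foo_lock from earlier: the pointer is unpublished first, synchronize_rcu() waits out all pre-existing readers, and only then is the memory freed.
#include <linux/rcupdate.h>
#include <linux/slab.h>

static void foo_remove_cfg(void)
{
	struct foo_cfg *old;

	spin_lock(&foo_lock);
	old = rcu_replace_pointer(foo_cfg, NULL, lockdep_is_held(&foo_lock));
	spin_unlock(&foo_lock);

	synchronize_rcu();	/* all readers that could see 'old' are done */
	kfree(old);		/* kfree(NULL) is a no-op */
}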
- voidget_completed_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶
Return a full pre-completed polled state cookie
Parameters
structrcu_gp_oldstate*rgosp
Place to put state cookie
Description
Stores intorgosp a value that will always be treated by functionslikepoll_state_synchronize_rcu_full()
as a cookie whose grace periodhas already completed.
- unsignedlongget_state_synchronize_rcu(void)¶
Snapshot current RCU state
Parameters
void
no arguments
Description
Returns a cookie that is used by a later call to cond_synchronize_rcu() or poll_state_synchronize_rcu() to determine whether or not a full grace period has elapsed in the meantime.
- voidget_state_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶
Snapshot RCU state, both normal and expedited
Parameters
structrcu_gp_oldstate*rgosp
location to place combined normal/expedited grace-period state
Description
Places the normal and expedited grace-period states inrgosp. Thisstate value can be passed to a later call tocond_synchronize_rcu_full()
orpoll_state_synchronize_rcu_full()
to determine whether or not agrace period (whether normal or expedited) has elapsed in the meantime.The rcu_gp_oldstate structure takes up twice the memory of an unsignedlong, but is guaranteed to see all grace periods. In contrast, thecombined state occupies less memory, but can sometimes fail to takegrace periods into account.
This does not guarantee that the needed grace period will actuallystart.
- unsignedlongstart_poll_synchronize_rcu(void)¶
Snapshot and start RCU grace period
Parameters
void
no arguments
Description
Returns a cookie that is used by a later call tocond_synchronize_rcu()
orpoll_state_synchronize_rcu()
to determine whether or not a fullgrace period has elapsed in the meantime. If the needed grace periodis not already slated to start, notifies RCU core of the need for thatgrace period.
- voidstart_poll_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶
Take a full snapshot and start RCU grace period
Parameters
structrcu_gp_oldstate*rgosp
value from
get_state_synchronize_rcu_full()
orstart_poll_synchronize_rcu_full()
Description
Places the normal and expedited grace-period states in*rgos. Thisstate value can be passed to a later call tocond_synchronize_rcu_full()
orpoll_state_synchronize_rcu_full()
to determine whether or not agrace period (whether normal or expedited) has elapsed in the meantime.If the needed grace period is not already slated to start, notifiesRCU core of the need for that grace period.
- boolpoll_state_synchronize_rcu(unsignedlongoldstate)¶
Has the specified RCU grace period completed?
Parameters
unsignedlongoldstate
value from
get_state_synchronize_rcu()
orstart_poll_synchronize_rcu()
Description
If a full RCU grace period has elapsed since the earlier call from which oldstate was obtained, return true, otherwise return false. If false is returned, it is the caller's responsibility to invoke this function later on until it does return true. Alternatively, the caller can explicitly wait for a grace period, for example, by passing oldstate to either cond_synchronize_rcu() or cond_synchronize_rcu_expedited() on the one hand or by directly invoking either synchronize_rcu() or synchronize_rcu_expedited() on the other.
Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than a billion grace periods (and way more on a 64-bit system!). Those needing to keep old state values for very long time periods (many hours even on 32-bit systems) should check them occasionally and either refresh them or set a flag indicating that the grace period has completed. Alternatively, they can use get_completed_synchronize_rcu() to get a guaranteed-completed grace-period state.
In addition, because oldstate compresses the grace-period state for both normal and expedited grace periods into a single unsigned long, it can miss a grace period when synchronize_rcu() runs concurrently with synchronize_rcu_expedited(). If this is unacceptable, please instead use the _full() variant of these polling APIs.
This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided oldstate, and that returned at the end of this function.
- boolpoll_state_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶
Has the specified RCU grace period completed?
Parameters
structrcu_gp_oldstate*rgosp
value from
get_state_synchronize_rcu_full()
orstart_poll_synchronize_rcu_full()
Description
If a full RCU grace period has elapsed since the earlier call from which rgosp was obtained, return true, otherwise return false. If false is returned, it is the caller's responsibility to invoke this function later on until it does return true. Alternatively, the caller can explicitly wait for a grace period, for example, by passing rgosp to cond_synchronize_rcu() or by directly invoking synchronize_rcu().
Yes, this function does not take counter wrap into account.But counter wrap is harmless. If the counter wraps, we have waitedfor more than a billion grace periods (and way more on a 64-bitsystem!). Those needing to keep rcu_gp_oldstate values for verylong time periods (many hours even on 32-bit systems) should checkthem occasionally and either refresh them or set a flag indicatingthat the grace period has completed. Alternatively, they can useget_completed_synchronize_rcu_full()
to get a guaranteed-completedgrace-period state.
This function provides the same memory-ordering guarantees that wouldbe provided by asynchronize_rcu()
that was invoked at the call tothe function that providedrgosp, and that returned at the end of thisfunction. And this guarantee requires that the root rcu_node structure’s->gp_seq field be checked instead of that of the rcu_state structure.The problem is that the just-ending grace-period’s callbacks can beinvoked between the time that the root rcu_node structure’s ->gp_seqfield is updated and the time that the rcu_state structure’s ->gp_seqfield is updated. Therefore, if a singlesynchronize_rcu()
is tocause a subsequentpoll_state_synchronize_rcu_full()
to returntrue,then the root rcu_node structure is the one that needs to be polled.
- voidcond_synchronize_rcu(unsignedlongoldstate)¶
Conditionally wait for an RCU grace period
Parameters
unsignedlongoldstate
value from
get_state_synchronize_rcu()
,start_poll_synchronize_rcu()
, orstart_poll_synchronize_rcu_expedited()
Description
If a full RCU grace period has elapsed since the earlier call toget_state_synchronize_rcu()
orstart_poll_synchronize_rcu()
, just return.Otherwise, invokesynchronize_rcu()
to wait for a full grace period.
Yes, this function does not take counter wrap into account.But counter wrap is harmless. If the counter wraps, we have waited formore than 2 billion grace periods (and way more on a 64-bit system!),so waiting for a couple of additional grace periods should be just fine.
This function provides the same memory-ordering guarantees thatwould be provided by asynchronize_rcu()
that was invoked at the callto the function that providedoldstate and that returned at the endof this function.
- voidcond_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)¶
Conditionally wait for an RCU grace period
Parameters
structrcu_gp_oldstate*rgosp
value from
get_state_synchronize_rcu_full()
,start_poll_synchronize_rcu_full()
, orstart_poll_synchronize_rcu_expedited_full()
Description
If a full RCU grace period has elapsed since the call toget_state_synchronize_rcu_full()
,start_poll_synchronize_rcu_full()
,orstart_poll_synchronize_rcu_expedited_full()
from whichrgosp wasobtained, just return. Otherwise, invokesynchronize_rcu()
to waitfor a full grace period.
Yes, this function does not take counter wrap into account.But counter wrap is harmless. If the counter wraps, we have waited formore than 2 billion grace periods (and way more on a 64-bit system!),so waiting for a couple of additional grace periods should be just fine.
This function provides the same memory-ordering guarantees thatwould be provided by asynchronize_rcu()
that was invoked at the callto the function that providedrgosp and that returned at the end ofthis function.
- voidrcu_barrier(void)¶
Wait until all in-flight
call_rcu()
callbacks complete.
Parameters
void
no arguments
Description
Note that this primitive does not necessarily wait for an RCU grace period to complete. For example, if there are no RCU callbacks queued anywhere in the system, then rcu_barrier() is within its rights to return immediately, without waiting for anything, much less an RCU grace period.
- voidrcu_barrier_throttled(void)¶
Do
rcu_barrier()
, but limit to one per second
Parameters
void
no arguments
Description
This can be thought of as guard rails aroundrcu_barrier()
thatpermits unrestricted userspace use, at least assuming the hardware’stry_cmpxchg() is robust. There will be at most one call per second torcu_barrier()
system-wide from use of this function, which means thatcallers might needlessly wait a second or three.
This is intended for use by test suites to avoid OOM by flushing RCUcallbacks from the previous test before starting the next. See thercutree.do_rcu_barrier module parameter for more information.
Why not simply makercu_barrier()
more scalable? That might bethe eventual endpoint, but let’s keep it simple for the time being.Note that the module parameter infrastructure serializes calls to agiven .set() function, but should concurrent .set() invocation ever bepossible, we are ready!
- voidsynchronize_rcu_expedited(void)¶
Brute-force RCU grace period
Parameters
void
no arguments
Description
Wait for an RCU grace period, but expedite it. The basic idea is to IPI all non-idle non-nohz online CPUs. The IPI handler checks whether the CPU is in an RCU critical section, and if so, it sets a flag that causes the outermost rcu_read_unlock() to report the quiescent state for RCU-preempt or asks the scheduler for help for RCU-sched. On the other hand, if the CPU is not in an RCU read-side critical section, the IPI handler reports the quiescent state immediately.
Although this is a great improvement over previous expedited implementations, it is still unfriendly to real-time workloads, so is thus not recommended for any sort of common-case code. In fact, if you are using synchronize_rcu_expedited() in a loop, please restructure your code to batch your updates, and then use a single synchronize_rcu() instead.
This has the same semantics as (but is more brutal than) synchronize_rcu().
- unsignedlongstart_poll_synchronize_rcu_expedited(void)¶
Snapshot current RCU state and start expedited grace period
Parameters
void
no arguments
Description
Returns a cookie to pass to a call tocond_synchronize_rcu()
,cond_synchronize_rcu_expedited()
, orpoll_state_synchronize_rcu()
,allowing them to determine whether or not any sort of grace period haselapsed in the meantime. If the needed expedited grace period is notalready slated to start, initiates that grace period.
- voidstart_poll_synchronize_rcu_expedited_full(structrcu_gp_oldstate*rgosp)¶
Take a full snapshot and start expedited grace period
Parameters
structrcu_gp_oldstate*rgosp
Place to put snapshot of grace-period state
Description
Places the normal and expedited grace-period states in rgosp. Thisstate value can be passed to a later call tocond_synchronize_rcu_full()
orpoll_state_synchronize_rcu_full()
to determine whether or not agrace period (whether normal or expedited) has elapsed in the meantime.If the needed expedited grace period is not already slated to start,initiates that grace period.
- voidcond_synchronize_rcu_expedited(unsignedlongoldstate)¶
Conditionally wait for an expedited RCU grace period
Parameters
unsignedlongoldstate
value from
get_state_synchronize_rcu()
,start_poll_synchronize_rcu()
, orstart_poll_synchronize_rcu_expedited()
Description
If any type of full RCU grace period has elapsed since the earliercall toget_state_synchronize_rcu()
,start_poll_synchronize_rcu()
,orstart_poll_synchronize_rcu_expedited()
, just return. Otherwise,invokesynchronize_rcu_expedited()
to wait for a full grace period.
Yes, this function does not take counter wrap into account.But counter wrap is harmless. If the counter wraps, we have waited formore than 2 billion grace periods (and way more on a 64-bit system!),so waiting for a couple of additional grace periods should be just fine.
This function provides the same memory-ordering guarantees thatwould be provided by asynchronize_rcu()
that was invoked at the callto the function that providedoldstate and that returned at the endof this function.
- voidcond_synchronize_rcu_expedited_full(structrcu_gp_oldstate*rgosp)¶
Conditionally wait for an expedited RCU grace period
Parameters
structrcu_gp_oldstate*rgosp
value from
get_state_synchronize_rcu_full()
,start_poll_synchronize_rcu_full()
, orstart_poll_synchronize_rcu_expedited_full()
Description
If a full RCU grace period has elapsed since the call toget_state_synchronize_rcu_full()
,start_poll_synchronize_rcu_full()
,orstart_poll_synchronize_rcu_expedited_full()
from whichrgosp wasobtained, just return. Otherwise, invokesynchronize_rcu_expedited()
to wait for a full grace period.
Yes, this function does not take counter wrap into account.But counter wrap is harmless. If the counter wraps, we have waited formore than 2 billion grace periods (and way more on a 64-bit system!),so waiting for a couple of additional grace periods should be just fine.
This function provides the same memory-ordering guarantees thatwould be provided by asynchronize_rcu()
that was invoked at the callto the function that providedrgosp and that returned at the end ofthis function.
- boolrcu_read_lock_held_common(bool*ret)¶
might we be in RCU-sched read-side critical section?
Parameters
bool*ret
Best guess answer if lockdep cannot be relied on
Description
Returns true if lockdep must be ignored, in which case*ret
containsthe best guess described below. Otherwise returns false, in whichcase*ret
tells the caller nothing and the caller should insteadconsult lockdep.
If CONFIG_DEBUG_LOCK_ALLOC is selected, set*ret
to nonzero iff in anRCU-sched read-side critical section. In absence ofCONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-sidecritical section unless it can prove otherwise. Note that disablingof preemption (including disabling irqs) counts as an RCU-schedread-side critical section. This is useful for debug checks in functionsthat required that they be called within an RCU-sched read-sidecritical section.
Check debug_lockdep_rcu_enabled() to prevent false positives during bootand while lockdep is disabled.
Note that if the CPU is in the idle loop from an RCU point of view (ie:that we are in the section between ct_idle_enter() and ct_idle_exit())thenrcu_read_lock_held()
sets*ret
to false even if the CPU did anrcu_read_lock()
. The reason for this is that RCU ignores CPUs that arein such a section, considering these as in extended quiescent state,so such a CPU is effectively never in an RCU read-side critical sectionregardless of what RCU primitives it invokes. This state of affairs isrequired --- we need to keep an RCU-free window in idle where the CPU maypossibly enter into low power mode. This way we can notice an extendedquiescent state to other CPUs that started a grace period. Otherwisewe would delay any grace period as long as we run in the idle task.
Similarly, we avoid claiming an RCU read lock held if the currentCPU is offline.
- voidrcu_async_hurry(void)¶
Make future async RCU callbacks not lazy.
Parameters
void
no arguments
Description
After a call to this function, future calls tocall_rcu()
will be processed in a timely fashion.
- voidrcu_async_relax(void)¶
Make future async RCU callbacks lazy.
Parameters
void
no arguments
Description
After a call to this function, future calls tocall_rcu()
will be processed in a lazy fashion.
- voidrcu_expedite_gp(void)¶
Expedite future RCU grace periods
Parameters
void
no arguments
Description
After a call to this function, future calls tosynchronize_rcu()
andfriends act as the correspondingsynchronize_rcu_expedited()
functionhad instead been called.
- voidrcu_unexpedite_gp(void)¶
Cancel prior
rcu_expedite_gp()
invocation
Parameters
void
no arguments
Description
Undo a prior call torcu_expedite_gp()
. If all prior calls torcu_expedite_gp()
are undone by a subsequent call torcu_unexpedite_gp()
,and if the rcu_expedited sysfs/boot parameter is not set, then allsubsequent calls tosynchronize_rcu()
and friends will return totheir normal non-expedited behavior.
- intrcu_read_lock_held(void)¶
might we be in RCU read-side critical section?
Parameters
void
no arguments
Description
If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCUread-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC,this assumes we are in an RCU read-side critical section unless it canprove otherwise. This is useful for debug checks in functions thatrequire that they be called within an RCU read-side critical section.
Checks debug_lockdep_rcu_enabled() to prevent false positives during bootand while lockdep is disabled.
Note thatrcu_read_lock()
and the matchingrcu_read_unlock()
mustoccur in the same context, for example, it is illegal to invokercu_read_unlock()
in process context if the matchingrcu_read_lock()
was invoked from within an irq handler.
Note thatrcu_read_lock()
is disallowed if the CPU is either idle oroffline from an RCU perspective, so check for those as well.
- intrcu_read_lock_bh_held(void)¶
might we be in RCU-bh read-side critical section?
Parameters
void
no arguments
Description
Check for bottom half being disabled, which covers both theCONFIG_PROVE_RCU and not cases. Note that if someone usesrcu_read_lock_bh()
, but then later enables BH, lockdep (if enabled)will show the situation. This is useful for debug checks in functionsthat require that they be called within an RCU read-side criticalsection.
Check debug_lockdep_rcu_enabled() to prevent false positives during boot.
Note thatrcu_read_lock_bh()
is disallowed if the CPU is either idle oroffline from an RCU perspective, so check for those as well.
- voidwakeme_after_rcu(structrcu_head*head)¶
Callback function to awaken a task after grace period
Parameters
structrcu_head*head
Pointer to rcu_head member within rcu_synchronize structure
Description
Awaken the corresponding task now that a grace period has elapsed.
- voidinit_rcu_head_on_stack(structrcu_head*head)¶
initialize on-stack rcu_head for debugobjects
Parameters
structrcu_head*head
pointer to rcu_head structure to be initialized
Description
This function informs debugobjects of a new rcu_head structure thathas been allocated as an auto variable on the stack. This functionis not required for rcu_head structures that are statically defined orthat are dynamically allocated on the heap. This function has noeffect for !CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.
- voiddestroy_rcu_head_on_stack(structrcu_head*head)¶
destroy on-stack rcu_head for debugobjects
Parameters
structrcu_head*head
pointer to rcu_head structure to be initialized
Description
This function informs debugobjects that an on-stack rcu_head structureis about to go out of scope. As withinit_rcu_head_on_stack()
, thisfunction is not required for rcu_head structures that are staticallydefined or that are dynamically allocated on the heap. Also as withinit_rcu_head_on_stack()
, this function has no effect for!CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.
- unsignedlongget_completed_synchronize_rcu(void)¶
Return a pre-completed polled state cookie
Parameters
void
no arguments
Description
Returns a value that will always be treated by functions likepoll_state_synchronize_rcu()
as a cookie whose grace period has alreadycompleted.
- unsignedlongget_completed_synchronize_srcu(void)¶
Return a pre-completed polled state cookie
Parameters
void
no arguments
Description
Returns a value thatpoll_state_synchronize_srcu()
will always treatas a cookie whose grace period has already completed.
- boolsame_state_synchronize_srcu(unsignedlongoldstate1,unsignedlongoldstate2)¶
Are two old-state values identical?
Parameters
unsignedlongoldstate1
First old-state value.
unsignedlongoldstate2
Second old-state value.
Description
The two old-state values must have been obtained from eitherget_state_synchronize_srcu()
,start_poll_synchronize_srcu()
, orget_completed_synchronize_srcu()
. Returnstrue if the two values areidentical andfalse otherwise. This allows structures whose lifetimesare tracked by old-state values to push these values to a list header,allowing those structures to be slightly smaller.
- intsrcu_read_lock_held(conststructsrcu_struct*ssp)¶
might we be in SRCU read-side critical section?
Parameters
conststructsrcu_struct*ssp
The srcu_struct structure to check
Description
If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an SRCU read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an SRCU read-side critical section unless it can prove otherwise.
Checks debug_lockdep_rcu_enabled() to prevent false positives during boot and while lockdep is disabled.
Note that SRCU is based on its own state machine and does not rely on normal RCU, so it can be called from a CPU that is in the idle loop from an RCU point of view, or that is offline.
- srcu_dereference_check¶
srcu_dereference_check(p,ssp,c)
fetch SRCU-protected pointer for later dereferencing
Parameters
p
the pointer to fetch and protect for later dereferencing
ssp
pointer to the srcu_struct, which is used to check that wereally are in an SRCU read-side critical section.
c
condition to check for update-side use
Description
If PROVE_RCU is enabled, invoking this outside of an RCU read-sidecritical section will result in an RCU-lockdep splat, unlessc evaluatesto 1. Thec argument will normally be a logical expression containinglockdep_is_held() calls.
- srcu_dereference¶
srcu_dereference(p,ssp)
fetch SRCU-protected pointer for later dereferencing
Parameters
p
the pointer to fetch and protect for later dereferencing
ssp
pointer to the srcu_struct, which is used to check that wereally are in an SRCU read-side critical section.
Description
Makesrcu_dereference_check()
do the dirty work. If PROVE_RCUis enabled, invoking this outside of an RCU read-side criticalsection will result in an RCU-lockdep splat.
- srcu_dereference_notrace¶
srcu_dereference_notrace(p,ssp)
no tracing and no lockdep calls from here
Parameters
p
the pointer to fetch and protect for later dereferencing
ssp
pointer to the srcu_struct, which is used to check that wereally are in an SRCU read-side critical section.
- intsrcu_read_lock(structsrcu_struct*ssp)¶
register a new reader for an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to register the new reader.
Description
Enter an SRCU read-side critical section. Note that SRCU read-side critical sections may be nested. However, it is illegal to call anything that waits on an SRCU grace period for the same srcu_struct, whether directly or indirectly. Please note that one way to indirectly wait on an SRCU grace period is to acquire a mutex that is held elsewhere while calling synchronize_srcu() or synchronize_srcu_expedited().
The return value from srcu_read_lock() is guaranteed to be non-negative. This value must be passed unaltered to the matching srcu_read_unlock(). Note that srcu_read_lock() and the matching srcu_read_unlock() must occur in the same context, for example, it is illegal to invoke srcu_read_unlock() in an irq handler if the matching srcu_read_lock() was invoked in process context. Or, for that matter, to invoke srcu_read_unlock() from one task and the matching srcu_read_lock() from another.
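A minimal SRCU reader sketch (the foo_srcu domain and foo_srcu_cfg pointer are hypothetical, and struct foo_cfg is reused from the earlier sketches): the index returned by srcu_read_lock() is passed unchanged to srcu_read_unlock(), and the critical section may sleep.
#include <linux/srcu.h>

DEFINE_STATIC_SRCU(foo_srcu);
static struct foo_cfg __rcu *foo_srcu_cfg;

static int foo_srcu_read_value(void)
{
	struct foo_cfg *cfg;
	int idx, val = -1;

	idx = srcu_read_lock(&foo_srcu);
	cfg = srcu_dereference(foo_srcu_cfg, &foo_srcu);
	if (cfg)
		val = cfg->value;	/* sleeping is permitted in SRCU readers */
	srcu_read_unlock(&foo_srcu, idx);
	return val;
}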
- structsrcu_ctr__percpu*srcu_read_lock_fast(structsrcu_struct*ssp)¶
register a new reader for an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to register the new reader.
Description
Enter an SRCU read-side critical section, but for a light-weightsmp_mb()-free reader. Seesrcu_read_lock()
for more information.
Ifsrcu_read_lock_fast()
is ever used on an srcu_struct structure,then none of the other flavors may be used, whether before, during,or after. Note that grace-period auto-expediting is disabled for _fastsrcu_struct structures because auto-expedited grace periods invokesynchronize_rcu_expedited()
, IPIs and all.
Note thatsrcu_read_lock_fast()
can be invoked only from those contextswhere RCU is watching, that is, from contexts where it would be legalto invokercu_read_lock()
. Otherwise, lockdep will complain.
- structsrcu_ctr__percpu*srcu_down_read_fast(structsrcu_struct*ssp)¶
register a new reader for an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to register the new reader.
Description
Enter a semaphore-like SRCU read-side critical section, but fora light-weight smp_mb()-free reader. Seesrcu_read_lock_fast()
andsrcu_down_read()
for more information.
The same srcu_struct may be used concurrently bysrcu_down_read_fast()
andsrcu_read_lock_fast()
.
- intsrcu_read_lock_lite(structsrcu_struct*ssp)¶
register a new reader for an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to register the new reader.
Description
Enter an SRCU read-side critical section, but for a light-weightsmp_mb()-free reader. Seesrcu_read_lock()
for more information.
Ifsrcu_read_lock_lite()
is ever used on an srcu_struct structure,then none of the other flavors may be used, whether before, during,or after. Note that grace-period auto-expediting is disabled for _litesrcu_struct structures because auto-expedited grace periods invokesynchronize_rcu_expedited()
, IPIs and all.
Note thatsrcu_read_lock_lite()
can be invoked only from those contextswhere RCU is watching, that is, from contexts where it would be legalto invokercu_read_lock()
. Otherwise, lockdep will complain.
- intsrcu_read_lock_nmisafe(structsrcu_struct*ssp)¶
register a new reader for an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to register the new reader.
Description
Enter an SRCU read-side critical section, but in an NMI-safe manner.Seesrcu_read_lock()
for more information.
Ifsrcu_read_lock_nmisafe()
is ever used on an srcu_struct structure,then none of the other flavors may be used, whether before, during,or after.
- intsrcu_down_read(structsrcu_struct*ssp)¶
register a new reader for an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to register the new reader.
Description
Enter a semaphore-like SRCU read-side critical section. Note thatSRCU read-side critical sections may be nested. However, it isillegal to call anything that waits on an SRCU grace period for thesame srcu_struct, whether directly or indirectly. Please note thatone way to indirectly wait on an SRCU grace period is to acquirea mutex that is held elsewhere while callingsynchronize_srcu()
orsynchronize_srcu_expedited()
. But if you want lockdep to help youkeep this stuff straight, you should instead usesrcu_read_lock()
.
The semaphore-like nature ofsrcu_down_read()
means that the matchingsrcu_up_read()
can be invoked from some other context, for example,from some other task or from an irq handler. However, neithersrcu_down_read()
norsrcu_up_read()
may be invoked from an NMI handler.
Calls tosrcu_down_read()
may be nested, similar to the manner inwhich calls to down_read() may be nested. The same srcu_struct may beused concurrently bysrcu_down_read()
andsrcu_read_lock()
.
- voidsrcu_read_unlock(structsrcu_struct*ssp,intidx)¶
unregister an old reader from an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to unregister the old reader.
intidx
return value from corresponding
srcu_read_lock()
.
Description
Exit an SRCU read-side critical section.
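As a rough illustration of the lock/unlock pairing (all names below are hypothetical, not part of the kernel API), a minimal SRCU reader might look like this, assuming the protected pointer is published elsewhere with rcu_assign_pointer():

#include <linux/srcu.h>

struct my_config {
	int threshold;
};

/* Hypothetical SRCU domain and protected pointer. */
DEFINE_STATIC_SRCU(my_srcu);
static struct my_config __rcu *my_config_ptr;

static int read_threshold(void)
{
	struct my_config *cfg;
	int idx, val = -1;

	idx = srcu_read_lock(&my_srcu);		/* SRCU readers may sleep */
	cfg = srcu_dereference(my_config_ptr, &my_srcu);
	if (cfg)
		val = cfg->threshold;
	srcu_read_unlock(&my_srcu, idx);	/* pass back the index from srcu_read_lock() */
	return val;
}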
- voidsrcu_read_unlock_fast(structsrcu_struct*ssp,structsrcu_ctr__percpu*scp)¶
unregister an old reader from an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to unregister the old reader.
structsrcu_ctr__percpu*scp
return value from corresponding
srcu_read_lock_fast()
.
Description
Exit a light-weight SRCU read-side critical section.
- voidsrcu_up_read_fast(structsrcu_struct*ssp,structsrcu_ctr__percpu*scp)¶
unregister an old reader from an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to unregister the old reader.
structsrcu_ctr__percpu*scp
return value from corresponding
srcu_read_lock_fast()
.
Description
Exit an SRCU read-side critical section, but not necessarily from the same context as the matching srcu_down_read_fast()
.
- voidsrcu_read_unlock_lite(structsrcu_struct*ssp,intidx)¶
unregister an old reader from an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to unregister the old reader.
intidx
return value from corresponding
srcu_read_lock_lite()
.
Description
Exit a light-weight SRCU read-side critical section.
- voidsrcu_read_unlock_nmisafe(structsrcu_struct*ssp,intidx)¶
unregister an old reader from an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to unregister the old reader.
intidx
return value from corresponding
srcu_read_lock_nmisafe()
.
Description
Exit an SRCU read-side critical section, but in an NMI-safe manner.
- voidsrcu_up_read(structsrcu_struct*ssp,intidx)¶
unregister an old reader from an SRCU-protected structure.
Parameters
structsrcu_struct*ssp
srcu_struct in which to unregister the old reader.
intidx
return value from corresponding
srcu_read_lock()
.
Description
Exit an SRCU read-side critical section, but not necessarily from the same context as the matching srcu_down_read()
.
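A sketch of the semaphore-like usage, where the section is released from a different context than it was entered (the SRCU domain, the work item, and all other names here are hypothetical):

#include <linux/srcu.h>
#include <linux/workqueue.h>
#include <linux/slab.h>

DEFINE_STATIC_SRCU(io_srcu);

struct io_request {
	struct work_struct work;
	int srcu_idx;				/* index returned by srcu_down_read() */
};

static void io_complete(struct work_struct *work)
{
	struct io_request *req = container_of(work, struct io_request, work);

	/* ... finish processing the request ... */
	srcu_up_read(&io_srcu, req->srcu_idx);	/* may run in a different task than the down */
	kfree(req);
}

static void io_submit(struct io_request *req)
{
	req->srcu_idx = srcu_down_read(&io_srcu);	/* reader section spans contexts */
	INIT_WORK(&req->work, io_complete);
	schedule_work(&req->work);
}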
- voidsmp_mb__after_srcu_read_unlock(void)¶
ensure full ordering after srcu_read_unlock
Parameters
void
no arguments
Description
Converts the preceding srcu_read_unlock into a two-way memory barrier.
Call this after srcu_read_unlock, to guarantee that all memory operationsthat occur after smp_mb__after_srcu_read_unlock will appear to happen afterthe preceding srcu_read_unlock.
- voidsmp_mb__after_srcu_read_lock(void)¶
ensure full ordering after srcu_read_lock
Parameters
void
no arguments
Description
Converts the preceding srcu_read_lock into a two-way memory barrier.
Call this after srcu_read_lock, to guarantee that all memory operationsthat occur after smp_mb__after_srcu_read_lock will appear to happen afterthe preceding srcu_read_lock.
- intinit_srcu_struct(structsrcu_struct*ssp)¶
initialize a sleep-RCU structure
Parameters
structsrcu_struct*ssp
structure to initialize.
Description
Must invoke this on a given srcu_struct before passing that srcu_structto any other function. Each srcu_struct represents a separate domainof SRCU protection.
- boolsrcu_readers_active(structsrcu_struct*ssp)¶
returns true if there are readers, and false otherwise
Parameters
structsrcu_struct*ssp
which srcu_struct to count active readers (holding srcu_read_lock).
Description
Note that this is not an atomic primitive, and can therefore suffersevere errors when invoked on an active srcu_struct. That said, itcan be useful as an error check at cleanup time.
- voidcleanup_srcu_struct(structsrcu_struct*ssp)¶
deconstruct a sleep-RCU structure
Parameters
structsrcu_struct*ssp
structure to clean up.
Description
Must invoke this after you are finished using a given srcu_struct thatwas initialized viainit_srcu_struct()
, else you leak memory.
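A minimal lifecycle sketch for a dynamically initialized srcu_struct embedded in a larger structure (the per-device structure and function names are hypothetical):

#include <linux/srcu.h>
#include <linux/slab.h>

struct my_dev {
	struct srcu_struct srcu;
	/* ... other fields ... */
};

static struct my_dev *my_dev_create(void)
{
	struct my_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

	if (!dev)
		return NULL;
	if (init_srcu_struct(&dev->srcu)) {	/* must precede any other SRCU call */
		kfree(dev);
		return NULL;
	}
	return dev;
}

static void my_dev_destroy(struct my_dev *dev)
{
	cleanup_srcu_struct(&dev->srcu);	/* all readers and callbacks must be done */
	kfree(dev);
}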
- voidcall_srcu(structsrcu_struct*ssp,structrcu_head*rhp,rcu_callback_tfunc)¶
Queue a callback for invocation after an SRCU grace period
Parameters
structsrcu_struct*ssp
srcu_struct on which to queue the callback
structrcu_head*rhp
structure to be used for queueing the SRCU callback.
rcu_callback_tfunc
function to be invoked after the SRCU grace period
Description
The callback function will be invoked some time after a full SRCUgrace period elapses, in other words after all pre-existing SRCUread-side critical sections have completed. However, the callbackfunction might well execute concurrently with other SRCU read-sidecritical sections that started aftercall_srcu()
was invoked. SRCUread-side critical sections are delimited bysrcu_read_lock()
andsrcu_read_unlock()
, and may be nested.
The callback will be invoked from process context, but with bhdisabled. The callback function must therefore be fast and mustnot block.
See the description ofcall_rcu()
for more detailed information onmemory ordering guarantees.
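One common use is deferring a free until pre-existing SRCU readers are done. A sketch, assuming the caller provides update-side mutual exclusion (the domain, structure, and helper names are hypothetical):

#include <linux/srcu.h>
#include <linux/slab.h>

DEFINE_STATIC_SRCU(cfg_srcu);

struct cfg {
	int value;
	struct rcu_head rcu;			/* used by call_srcu() for queueing */
};

static struct cfg __rcu *cur_cfg;

static void cfg_free_cb(struct rcu_head *rhp)
{
	/* Runs after all pre-existing cfg_srcu readers have completed. */
	kfree(container_of(rhp, struct cfg, rcu));
}

static void cfg_replace(struct cfg *newcfg)
{
	/* Update-side locking is assumed to be provided by the caller. */
	struct cfg *old = rcu_replace_pointer(cur_cfg, newcfg, true);

	if (old)
		call_srcu(&cfg_srcu, &old->rcu, cfg_free_cb);
}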
- voidsynchronize_srcu_expedited(structsrcu_struct*ssp)¶
Brute-force SRCU grace period
Parameters
structsrcu_struct*ssp
srcu_struct with which to synchronize.
Description
Wait for an SRCU grace period to elapse, but be more aggressive aboutspinning rather than blocking when waiting.
Note thatsynchronize_srcu_expedited()
has the same deadlock andmemory-ordering properties as doessynchronize_srcu()
.
- voidsynchronize_srcu(structsrcu_struct*ssp)¶
wait for prior SRCU read-side critical-section completion
Parameters
structsrcu_struct*ssp
srcu_struct with which to synchronize.
Description
Wait for the count of both indexes to drain to zero. To avoid possible starvation of synchronize_srcu(), it first waits for the count of the index !(ssp->srcu_ctrp - ssp->sda->srcu_ctrs[0]) to drain to zero, and then flips ->srcu_ctrp and waits for the count of the other index.
Can block; must be called from process context.
Note that it is illegal to callsynchronize_srcu()
from the correspondingSRCU read-side critical section; doing so will result in deadlock.However, it is perfectly legal to callsynchronize_srcu()
on onesrcu_struct from some other srcu_struct’s read-side critical section,as long as the resulting graph of srcu_structs is acyclic.
There are memory-ordering constraints implied bysynchronize_srcu()
.On systems with more than one CPU, whensynchronize_srcu()
returns,each CPU is guaranteed to have executed a full memory barrier sincethe end of its last corresponding SRCU read-side critical sectionwhose beginning preceded the call tosynchronize_srcu()
. In addition,each CPU having an SRCU read-side critical section that extends beyondthe return fromsynchronize_srcu()
is guaranteed to have executed afull memory barrier after the beginning ofsynchronize_srcu()
and beforethe beginning of that SRCU read-side critical section. Note that theseguarantees include CPUs that are offline, idle, or executing in user mode,as well as CPUs that are executing in the kernel.
Furthermore, if CPU A invokedsynchronize_srcu()
, which returnedto its caller on CPU B, then both CPU A and CPU B are guaranteedto have executed a full memory barrier during the execution ofsynchronize_srcu()
. This guarantee applies even if CPU A and CPU Bare the same CPU, but again only if the system has more than one CPU.
Of course, these memory-ordering guarantees apply only whensynchronize_srcu()
,srcu_read_lock()
, andsrcu_read_unlock()
arepassed the same srcu_struct structure.
Implementation of these memory-ordering guarantees is similar tothat ofsynchronize_rcu()
.
If SRCU is likely idle as determined by srcu_should_expedite(),expedite the first request. This semantic was provided by Classic SRCU,and is relied upon by its users, so TREE SRCU must also provide it.Note that detecting idleness is heuristic and subject to both falsepositives and negatives.
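The synchronous counterpart of the call_srcu() sketch above is the classic unpublish-then-wait-then-free pattern. A sketch under the same assumptions (hypothetical names, caller-provided update-side exclusion):

#include <linux/srcu.h>
#include <linux/slab.h>

DEFINE_STATIC_SRCU(tbl_srcu);

struct table {
	/* ... */
};

static struct table __rcu *cur_table;

static void table_replace(struct table *newtbl)
{
	struct table *old;

	/* Update-side mutual exclusion is assumed to be provided by the caller. */
	old = rcu_replace_pointer(cur_table, newtbl, true);

	/* Wait for every reader that might still reference the old table. */
	synchronize_srcu(&tbl_srcu);		/* must not be called from a tbl_srcu reader */
	kfree(old);
}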
- unsignedlongget_state_synchronize_srcu(structsrcu_struct*ssp)¶
Provide an end-of-grace-period cookie
Parameters
structsrcu_struct*ssp
srcu_struct to provide cookie for.
Description
This function returns a cookie that can be passed topoll_state_synchronize_srcu()
, which will return true if a full graceperiod has elapsed in the meantime. It is the caller’s responsibilityto make sure that grace period happens, for example, by invokingcall_srcu()
after return fromget_state_synchronize_srcu()
.
- unsignedlongstart_poll_synchronize_srcu(structsrcu_struct*ssp)¶
Provide cookie and start grace period
Parameters
structsrcu_struct*ssp
srcu_struct to provide cookie for.
Description
This function returns a cookie that can be passed topoll_state_synchronize_srcu()
, which will return true if a full graceperiod has elapsed in the meantime. Unlikeget_state_synchronize_srcu()
,this function also ensures that any needed SRCU grace period will bestarted. This convenience does come at a cost in terms of CPU overhead.
- boolpoll_state_synchronize_srcu(structsrcu_struct*ssp,unsignedlongcookie)¶
Has cookie’s grace period ended?
Parameters
structsrcu_struct*ssp
srcu_struct to provide cookie for.
unsignedlongcookie
Return value from
get_state_synchronize_srcu()
orstart_poll_synchronize_srcu()
.
Description
This function takes the cookie that was returned from eitherget_state_synchronize_srcu()
orstart_poll_synchronize_srcu()
, andreturnstrue if an SRCU grace period elapsed since the time that thecookie was created.
Because cookies are finite in size, wrapping/overflow is possible.This is more pronounced on 32-bit systems where cookies are 32 bits,where in theory wrapping could happen in about 14 hours assuming25-microsecond expedited SRCU grace periods. However, a more likelyoverflow lower bound is on the order of 24 days in the case ofone-millisecond SRCU grace periods. Of course, wrapping in a 64-bitsystem requires geologic timespans, as in more than seven million yearseven for expedited SRCU grace periods.
Wrapping/overflow is much more of an issue for CONFIG_SMP=n systemsthat also have CONFIG_PREEMPTION=n, which selects Tiny SRCU. This usesa 16-bit cookie, which rcutorture routinely wraps in a matter of afew minutes. If this proves to be a problem, this counter will beexpanded to the same size as for Tree SRCU.
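A brief sketch of the polled grace-period interface, using a hypothetical domain and cookie variable: record a cookie (and kick off a grace period) now, then later check without blocking whether that grace period has elapsed.

#include <linux/srcu.h>

DEFINE_STATIC_SRCU(poll_srcu);
static unsigned long gp_cookie;

/* Record the current grace-period state and ensure a grace period is started. */
static void begin_deferred_cleanup(void)
{
	gp_cookie = start_poll_synchronize_srcu(&poll_srcu);
}

/* Later, check without blocking whether that grace period has elapsed. */
static bool cleanup_may_proceed(void)
{
	return poll_state_synchronize_srcu(&poll_srcu, gp_cookie);
}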
- voidsrcu_barrier(structsrcu_struct*ssp)¶
Wait until all in-flight
call_srcu()
callbacks complete.
Parameters
structsrcu_struct*ssp
srcu_struct on which to wait for in-flight callbacks.
- unsignedlongsrcu_batches_completed(structsrcu_struct*ssp)¶
return batches completed.
Parameters
structsrcu_struct*ssp
srcu_struct on which to report batch completion.
Description
Report the number of batches, correlated with, but not necessarilyprecisely the same as, the number of grace periods that have elapsed.
- voidhlist_bl_del_rcu(structhlist_bl_node*n)¶
deletes entry from hash list without re-initialization
Parameters
structhlist_bl_node*n
the element to delete from the hash list.
Note
hlist_bl_unhashed() on entry does not return true after this,the entry is in an undefined state. It is useful for RCU basedlockfree traversal.
Description
In particular, it means that we can not poison the forwardpointers that may still be used for walking the hash list.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_bl_add_head_rcu()
orhlist_bl_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_bl_for_each_entry().
- voidhlist_bl_add_head_rcu(structhlist_bl_node*n,structhlist_bl_head*h)¶
Parameters
structhlist_bl_node*n
the element to add to the hash list.
structhlist_bl_head*h
the list to add to.
Description
Adds the specified element to the specified hlist_bl,while permitting racing traversals.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_bl_add_head_rcu()
orhlist_bl_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_bl_for_each_entry_rcu()
, used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock()
.
- hlist_bl_for_each_entry_rcu¶
hlist_bl_for_each_entry_rcu(tpos,pos,head,member)
iterate over rcu list of given type
Parameters
tpos
the type * to use as a loop cursor.
pos
the
structhlist_bl_node
to use as a loop cursor.
head
the head for your list.
member
the name of the hlist_bl_node within the struct.
- list_tail_rcu¶
list_tail_rcu(head)
returns the prev pointer of the head of the list
Parameters
head
the head of the list
Note
This should only be used with the list header, and even thenonly iflist_del()
and similar primitives are not also used on thelist header.
- voidlist_add_rcu(structlist_head*new,structlist_head*head)¶
add a new entry to rcu-protected list
Parameters
structlist_head*new
new entry to be added
structlist_head*head
list head to add it after
Description
Insert a new entry after the specified head.This is good for implementing stacks.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such aslist_add_rcu()
orlist_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such aslist_for_each_entry_rcu()
.
- voidlist_add_tail_rcu(structlist_head*new,structlist_head*head)¶
add a new entry to rcu-protected list
Parameters
structlist_head*new
new entry to be added
structlist_head*head
list head to add it before
Description
Insert a new entry before the specified head.This is useful for implementing queues.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such aslist_add_tail_rcu()
orlist_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such aslist_for_each_entry_rcu()
.
- voidlist_del_rcu(structlist_head*entry)¶
deletes entry from list without re-initialization
Parameters
structlist_head*entry
the element to delete from the list.
Note
list_empty()
on entry does not return true after this,the entry is in an undefined state. It is useful for RCU basedlockfree traversal.
Description
In particular, it means that we can not poison the forwardpointers that may still be used for walking the list.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such aslist_del_rcu()
orlist_add_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such aslist_for_each_entry_rcu()
.
Note that the caller is not permitted to immediately freethe newly deleted entry. Instead, eithersynchronize_rcu()
orcall_rcu()
must be used to defer freeing until an RCUgrace period has elapsed.
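Taken together, list_add_rcu(), list_del_rcu(), and deferred freeing form the usual update side of an RCU-protected list. A sketch, with hypothetical names and a spinlock serializing updaters only:

#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

static LIST_HEAD(entry_list);
static DEFINE_SPINLOCK(entry_lock);	/* serializes updaters, not readers */

struct entry {
	int key;
	struct list_head node;
	struct rcu_head rcu;
};

static void entry_insert(struct entry *e)
{
	spin_lock(&entry_lock);
	list_add_rcu(&e->node, &entry_list);	/* immediately visible to readers */
	spin_unlock(&entry_lock);
}

static void entry_remove(struct entry *e)
{
	spin_lock(&entry_lock);
	list_del_rcu(&e->node);			/* forward pointer left intact for readers */
	spin_unlock(&entry_lock);
	kfree_rcu(e, rcu);			/* defer the free past an RCU grace period */
}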
- voidlist_bidir_del_rcu(structlist_head*entry)¶
deletes entry from list without re-initialization
Parameters
structlist_head*entry
the element to delete from the list.
Description
In contrast tolist_del_rcu()
doesn’t poison the prev pointer thusallowing backwards traversal via list_bidir_prev_rcu().
The caller must take whatever precautions are necessary (such asholding appropriate locks) to avoid racing with another list-mutationprimitive, such aslist_bidir_del_rcu()
orlist_add_rcu()
, running onthis same list. However, it is perfectly legal to run concurrentlywith the _rcu list-traversal primitives, such aslist_for_each_entry_rcu()
.
Note thatlist_del_rcu()
andlist_bidir_del_rcu()
must not be used onthe same list.
Note that the caller is not permitted to immediately freethe newly deleted entry. Instead, eithersynchronize_rcu()
orcall_rcu()
must be used to defer freeing until an RCUgrace period has elapsed.
Note
list_empty()
on entry does not return true after this becausethe entry is in a special undefined state that permits RCU-basedlockfree reverse traversal. In particular this means that we can notpoison the forward and backwards pointers that may still be used forwalking the list.
- voidhlist_del_init_rcu(structhlist_node*n)¶
deletes entry from hash list with re-initialization
Parameters
structhlist_node*n
the element to delete from the hash list.
Note
list_unhashed() on the node returns true after this. It is useful for RCU-based read lockfree traversal if the writer side must know if the list entry is still hashed or already unhashed.
Description
In particular, it means that we can not poison the forward pointersthat may still be used for walking the hash list and we can onlyzero the pprev pointer so list_unhashed() will return true afterthis.
The caller must take whatever precautions are necessary (such asholding appropriate locks) to avoid racing with anotherlist-mutation primitive, such ashlist_add_head_rcu()
orhlist_del_rcu()
, running on this same list. However, it isperfectly legal to run concurrently with the _rcu list-traversalprimitives, such ashlist_for_each_entry_rcu()
.
- voidlist_replace_rcu(structlist_head*old,structlist_head*new)¶
replace old entry by new one
Parameters
structlist_head*old
the element to be replaced
structlist_head*new
the new element to insert
Description
Theold entry will be replaced with thenew entry atomically fromthe perspective of concurrent readers. It is the caller’s responsibilityto synchronize with concurrent updaters, if any.
Note
old should not be empty.
- void__list_splice_init_rcu(structlist_head*list,structlist_head*prev,structlist_head*next,void(*sync)(void))¶
join an RCU-protected list into an existing list.
Parameters
structlist_head*list
the RCU-protected list to splice
structlist_head*prev
points to the last element of the existing list
structlist_head*next
points to the first element of the existing list
void(*sync)(void)
synchronize_rcu, synchronize_rcu_expedited, ...
Description
The list pointed to byprev andnext can be RCU-read traversedconcurrently with this function.
Note that this function blocks.
Important note: the caller must take whatever action is necessary to preventany other updates to the existing list. In principle, it is possible tomodify the list as soon as sync() begins execution. If this sort of thingbecomes necessary, an alternative version based oncall_rcu()
could becreated. But only if -really- needed -- there is no shortage of RCU APImembers.
- voidlist_splice_init_rcu(structlist_head*list,structlist_head*head,void(*sync)(void))¶
splice an RCU-protected list into an existing list, designed for stacks.
Parameters
structlist_head*list
the RCU-protected list to splice
structlist_head*head
the place in the existing list to splice the first list into
void(*sync)(void)
synchronize_rcu, synchronize_rcu_expedited, ...
- voidlist_splice_tail_init_rcu(structlist_head*list,structlist_head*head,void(*sync)(void))¶
splice an RCU-protected list into an existing list, designed for queues.
Parameters
structlist_head*list
the RCU-protected list to splice
structlist_head*head
the place in the existing list to splice the first list into
void(*sync)(void)
synchronize_rcu, synchronize_rcu_expedited, ...
- list_entry_rcu¶
list_entry_rcu(ptr,type,member)
get the struct for this entry
Parameters
ptr
the
structlist_head
pointer.
type
the type of the struct this is embedded in.
member
the name of the list_head within the struct.
Description
This primitive may safely run concurrently with the _rcu list-mutationprimitives such aslist_add_rcu()
as long as it’s guarded byrcu_read_lock()
.
- list_first_or_null_rcu¶
list_first_or_null_rcu(ptr,type,member)
get the first element from a list
Parameters
ptr
the list head to take the element from.
type
the type of the struct this is embedded in.
member
the name of the list_head within the struct.
Description
Note that if the list is empty, it returns NULL.
This primitive may safely run concurrently with the _rcu list-mutationprimitives such aslist_add_rcu()
as long as it’s guarded byrcu_read_lock()
.
- list_next_or_null_rcu¶
list_next_or_null_rcu(head,ptr,type,member)
get the next element from a list
Parameters
head
the head for the list.
ptr
the list head to take the next element from.
type
the type of the struct this is embedded in.
member
the name of the list_head within the struct.
Description
Note that if the ptr is at the end of the list, NULL is returned.
This primitive may safely run concurrently with the _rcu list-mutationprimitives such aslist_add_rcu()
as long as it’s guarded byrcu_read_lock()
.
- list_for_each_entry_rcu¶
list_for_each_entry_rcu(pos,head,member,cond...)
iterate over rcu list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_head within the struct.
cond...
optional lockdep expression if called from non-RCU protection.
Description
This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such aslist_add_rcu()
as long as the traversal is guarded byrcu_read_lock()
.
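The matching read side for the update-side list sketch above (reusing the hypothetical struct entry and entry_list) keeps the whole traversal inside one RCU read-side critical section and does not use the entry after unlock:

#include <linux/rculist.h>
#include <linux/rcupdate.h>

static bool entry_exists(int key)
{
	struct entry *e;
	bool found = false;

	rcu_read_lock();
	list_for_each_entry_rcu(e, &entry_list, node) {
		if (e->key == key) {
			found = true;	/* do not dereference e after rcu_read_unlock() */
			break;
		}
	}
	rcu_read_unlock();
	return found;
}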
- list_for_each_entry_srcu¶
list_for_each_entry_srcu(pos,head,member,cond)
iterate over rcu list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_head within the struct.
cond
lockdep expression for the lock required to traverse the list.
Description
This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such aslist_add_rcu()
as long as the traversal is guarded bysrcu_read_lock()
.The lockdep expressionsrcu_read_lock_held()
can be passed as thecond argument from read side.
- list_entry_lockless¶
list_entry_lockless(ptr,type,member)
get the struct for this entry
Parameters
ptr
the
structlist_head
pointer.
type
the type of the struct this is embedded in.
member
the name of the list_head within the struct.
Description
This primitive may safely run concurrently with the _rculist-mutation primitives such aslist_add_rcu()
, but requires someimplicit RCU read-side guarding. One example is running within a specialexception-time environment where preemption is disabled and where lockdepcannot be invoked. Another example is when items are added to the list,but never deleted.
- list_for_each_entry_lockless¶
list_for_each_entry_lockless(pos,head,member)
iterate over rcu list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_struct within the struct.
Description
This primitive may safely run concurrently with the _rculist-mutation primitives such aslist_add_rcu()
, but requires someimplicit RCU read-side guarding. One example is running within a specialexception-time environment where preemption is disabled and where lockdepcannot be invoked. Another example is when items are added to the list,but never deleted.
- list_for_each_entry_continue_rcu¶
list_for_each_entry_continue_rcu(pos,head,member)
continue iteration over list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_head within the struct.
Description
Continue to iterate over list of given type, continuing afterthe current position which must have been in the list when the RCU readlock was taken.This would typically require either that you obtained the node from aprevious walk of the list in the same RCU read-side critical section, orthat you held some sort of non-RCU reference (such as a reference count)to keep the node aliveand in the list.
This iterator is similar tolist_for_each_entry_from_rcu()
exceptthis starts after the given position and that one starts at the givenposition.
- list_for_each_entry_from_rcu¶
list_for_each_entry_from_rcu(pos,head,member)
iterate over a list from current point
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the list_node within the struct.
Description
Iterate over the tail of a list starting from a given position,which must have been in the list when the RCU read lock was taken.This would typically require either that you obtained the node from aprevious walk of the list in the same RCU read-side critical section, orthat you held some sort of non-RCU reference (such as a reference count)to keep the node aliveand in the list.
This iterator is similar tolist_for_each_entry_continue_rcu()
exceptthis starts from the given position and that one starts from the positionafter the given position.
- voidhlist_del_rcu(structhlist_node*n)¶
deletes entry from hash list without re-initialization
Parameters
structhlist_node*n
the element to delete from the hash list.
Note
list_unhashed() on entry does not return true after this,the entry is in an undefined state. It is useful for RCU basedlockfree traversal.
Description
In particular, it means that we can not poison the forwardpointers that may still be used for walking the hash list.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()
orhlist_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry()
.
- voidhlist_replace_rcu(structhlist_node*old,structhlist_node*new)¶
replace old entry by new one
Parameters
structhlist_node*old
the element to be replaced
structhlist_node*new
the new element to insert
Description
Theold entry will be replaced with thenew entry atomically fromthe perspective of concurrent readers. It is the caller’s responsibilityto synchronize with concurrent updaters, if any.
- voidhlists_swap_heads_rcu(structhlist_head*left,structhlist_head*right)¶
swap the lists the hlist heads point to
Parameters
structhlist_head*left
The hlist head on the left
structhlist_head*right
The hlist head on the right
Description
- The lists start out as [left ][node1 ... ] and
[right ][node2 ... ]
- The lists end up as [left ][node2 ... ]
[right ][node1 ... ]
- voidhlist_add_head_rcu(structhlist_node*n,structhlist_head*h)¶
Parameters
structhlist_node*n
the element to add to the hash list.
structhlist_head*h
the list to add to.
Description
Adds the specified element to the specified hlist,while permitting racing traversals.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()
orhlist_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry_rcu()
, used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock()
.
- voidhlist_add_tail_rcu(structhlist_node*n,structhlist_head*h)¶
Parameters
structhlist_node*n
the element to add to the hash list.
structhlist_head*h
the list to add to.
Description
Adds the specified element to the specified hlist,while permitting racing traversals.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()
orhlist_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry_rcu()
, used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock()
.
- voidhlist_add_before_rcu(structhlist_node*n,structhlist_node*next)¶
Parameters
structhlist_node*n
the new element to add to the hash list.
structhlist_node*next
the existing element to add the new element before.
Description
Adds the specified element to the specified hlistbefore the specified node while permitting racing traversals.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()
orhlist_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry_rcu()
, used to prevent memory-consistencyproblems on Alpha CPUs.
- voidhlist_add_behind_rcu(structhlist_node*n,structhlist_node*prev)¶
Parameters
structhlist_node*n
the new element to add to the hash list.
structhlist_node*prev
the existing element to add the new element after.
Description
Adds the specified element to the specified hlistafter the specified node while permitting racing traversals.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()
orhlist_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry_rcu()
, used to prevent memory-consistencyproblems on Alpha CPUs.
- hlist_for_each_entry_rcu¶
hlist_for_each_entry_rcu(pos,head,member,cond...)
iterate over rcu list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the hlist_node within the struct.
cond...
optional lockdep expression if called from non-RCU protection.
Description
This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such ashlist_add_head_rcu()
as long as the traversal is guarded byrcu_read_lock()
.
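hlist_add_head_rcu() and hlist_for_each_entry_rcu() are the usual building blocks of an RCU-protected hash table. A sketch with hypothetical names, where a spinlock serializes updaters and lookups run under rcu_read_lock() taken by the caller:

#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/types.h>

#define MY_HASH_BITS 6
static struct hlist_head my_hash[1 << MY_HASH_BITS];
static DEFINE_SPINLOCK(my_hash_lock);		/* serializes updaters only */

struct obj {
	u32 key;
	struct hlist_node node;
};

static void obj_insert(struct obj *o)
{
	spin_lock(&my_hash_lock);
	hlist_add_head_rcu(&o->node, &my_hash[o->key & ((1 << MY_HASH_BITS) - 1)]);
	spin_unlock(&my_hash_lock);
}

static struct obj *obj_find(u32 key)
{
	struct obj *o;

	/* Caller must already be within an RCU read-side critical section. */
	hlist_for_each_entry_rcu(o, &my_hash[key & ((1 << MY_HASH_BITS) - 1)], node)
		if (o->key == key)
			return o;
	return NULL;
}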
- hlist_for_each_entry_srcu¶
hlist_for_each_entry_srcu(pos,head,member,cond)
iterate over rcu list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the hlist_node within the struct.
cond
lockdep expression for the lock required to traverse the list.
Description
This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such ashlist_add_head_rcu()
as long as the traversal is guarded bysrcu_read_lock()
.The lockdep expressionsrcu_read_lock_held()
can be passed as thecond argument from read side.
- hlist_for_each_entry_rcu_notrace¶
hlist_for_each_entry_rcu_notrace(pos,head,member)
iterate over rcu list of given type (for tracing)
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the hlist_node within the struct.
Description
This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such ashlist_add_head_rcu()
as long as the traversal is guarded byrcu_read_lock()
.
This is the same ashlist_for_each_entry_rcu()
except that it doesnot do any RCU debugging or tracing.
- hlist_for_each_entry_rcu_bh¶
hlist_for_each_entry_rcu_bh(pos,head,member)
iterate over rcu list of given type
Parameters
pos
the type * to use as a loop cursor.
head
the head for your list.
member
the name of the hlist_node within the struct.
Description
This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such ashlist_add_head_rcu()
as long as the traversal is guarded byrcu_read_lock()
.
- hlist_for_each_entry_continue_rcu¶
hlist_for_each_entry_continue_rcu(pos,member)
iterate over a hlist continuing after current point
Parameters
pos
the type * to use as a loop cursor.
member
the name of the hlist_node within the struct.
- hlist_for_each_entry_continue_rcu_bh¶
hlist_for_each_entry_continue_rcu_bh(pos,member)
iterate over a hlist continuing after current point
Parameters
pos
the type * to use as a loop cursor.
member
the name of the hlist_node within the struct.
- hlist_for_each_entry_from_rcu¶
hlist_for_each_entry_from_rcu(pos,member)
iterate over a hlist continuing from current point
Parameters
pos
the type * to use as a loop cursor.
member
the name of the hlist_node within the struct.
- voidhlist_nulls_del_init_rcu(structhlist_nulls_node*n)¶
deletes entry from hash list with re-initialization
Parameters
structhlist_nulls_node*n
the element to delete from the hash list.
Note
hlist_nulls_unhashed() on the node returns true after this. It is useful for RCU-based read lockfree traversal if the writer side must know if the list entry is still hashed or already unhashed.
Description
In particular, it means that we can not poison the forward pointersthat may still be used for walking the hash list and we can onlyzero the pprev pointer so list_unhashed() will return true afterthis.
The caller must take whatever precautions are necessary (such asholding appropriate locks) to avoid racing with anotherlist-mutation primitive, such ashlist_nulls_add_head_rcu()
orhlist_nulls_del_rcu()
, running on this same list. However, it isperfectly legal to run concurrently with the _rcu list-traversalprimitives, such ashlist_nulls_for_each_entry_rcu()
.
- hlist_nulls_first_rcu¶
hlist_nulls_first_rcu(head)
returns the first element of the hash list.
Parameters
head
the head of the list.
- hlist_nulls_next_rcu¶
hlist_nulls_next_rcu(node)
returns the element of the list afternode.
Parameters
node
element of the list.
- voidhlist_nulls_del_rcu(structhlist_nulls_node*n)¶
deletes entry from hash list without re-initialization
Parameters
structhlist_nulls_node*n
the element to delete from the hash list.
Note
hlist_nulls_unhashed() on entry does not return true after this,the entry is in an undefined state. It is useful for RCU basedlockfree traversal.
Description
In particular, it means that we can not poison the forwardpointers that may still be used for walking the hash list.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_nulls_add_head_rcu()
orhlist_nulls_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_nulls_for_each_entry().
- voidhlist_nulls_add_head_rcu(structhlist_nulls_node*n,structhlist_nulls_head*h)¶
Parameters
structhlist_nulls_node*n
the element to add to the hash list.
structhlist_nulls_head*h
the list to add to.
Description
Adds the specified element to the specified hlist_nulls,while permitting racing traversals.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_nulls_add_head_rcu()
orhlist_nulls_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_nulls_for_each_entry_rcu()
, used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock()
.
- voidhlist_nulls_add_tail_rcu(structhlist_nulls_node*n,structhlist_nulls_head*h)¶
Parameters
structhlist_nulls_node*n
the element to add to the hash list.
structhlist_nulls_head*h
the list to add to.
Description
Adds the specified element to the specified hlist_nulls,while permitting racing traversals.
The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_nulls_add_head_rcu()
orhlist_nulls_del_rcu()
, running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_nulls_for_each_entry_rcu()
, used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock()
.
- hlist_nulls_for_each_entry_rcu¶
hlist_nulls_for_each_entry_rcu(tpos,pos,head,member)
iterate over rcu list of given type
Parameters
tpos
the type * to use as a loop cursor.
pos
the
structhlist_nulls_node
to use as a loop cursor.
head
the head of the list.
member
the name of the hlist_nulls_node within the struct.
Description
The barrier() is needed to make sure the compiler doesn't cache the first element [1], as this loop can be restarted [2].
[1] Documentation/memory-barriers.txt around line 1533
[2] "Using RCU hlist_nulls to protect list and objects" around line 146
- hlist_nulls_for_each_entry_safe¶
hlist_nulls_for_each_entry_safe(tpos,pos,head,member)
iterate over list of given type safe against removal of list entry
Parameters
tpos
the type * to use as a loop cursor.
pos
the
structhlist_nulls_node
to use as a loop cursor.
head
the head of the list.
member
the name of the hlist_nulls_node within the struct.
- boolrcu_sync_is_idle(structrcu_sync*rsp)¶
Are readers permitted to use their fastpaths?
Parameters
structrcu_sync*rsp
Pointer to rcu_sync structure to use for synchronization
Description
Returns true if readers are permitted to use their fastpaths. Must beinvoked within some flavor of RCU read-side critical section.
- voidrcu_sync_init(structrcu_sync*rsp)¶
Initialize an rcu_sync structure
Parameters
structrcu_sync*rsp
Pointer to rcu_sync structure to be initialized
- voidrcu_sync_func(structrcu_head*rhp)¶
Callback function managing reader access to fastpath
Parameters
structrcu_head*rhp
Pointer to rcu_head in rcu_sync structure to use for synchronization
Description
This function is passed tocall_rcu()
function byrcu_sync_enter()
andrcu_sync_exit()
, so that it is invoked after a grace period following that invocation of enter/exit.
If it is called byrcu_sync_enter()
it signals that all the readers wereswitched onto slow path.
If it is called byrcu_sync_exit()
it takes action based on events thathave taken place in the meantime, so that closely spacedrcu_sync_enter()
andrcu_sync_exit()
pairs need not wait for a grace period.
If anotherrcu_sync_enter()
is invoked before the grace periodended, reset state to allow the nextrcu_sync_exit()
to let thereaders back onto their fastpaths (after a grace period). If bothanotherrcu_sync_enter()
and its matchingrcu_sync_exit()
are invokedbefore the grace period ended, re-invokecall_rcu()
on behalf of thatrcu_sync_exit()
. Otherwise, set all state back to idle so that readerscan again use their fastpaths.
- voidrcu_sync_enter(structrcu_sync*rsp)¶
Force readers onto slowpath
Parameters
structrcu_sync*rsp
Pointer to rcu_sync structure to use for synchronization
Description
This function is used by updaters who need readers to make use ofa slowpath during the update. After this function returns, allsubsequent calls torcu_sync_is_idle()
will return false, whichtells readers to stay off their fastpaths. A later call torcu_sync_exit()
re-enables reader fastpaths.
When called in isolation,rcu_sync_enter()
must wait for a graceperiod, however, closely spaced calls torcu_sync_enter()
canoptimize away the grace-period wait via a state machine implementedbyrcu_sync_enter()
,rcu_sync_exit()
, andrcu_sync_func()
.
- voidrcu_sync_exit(structrcu_sync*rsp)¶
Allow readers back onto fast path after grace period
Parameters
structrcu_sync*rsp
Pointer to rcu_sync structure to use for synchronization
Description
This function is used by updaters who have completed, and can thereforenow allow readers to make use of their fastpaths after a grace periodhas elapsed. After this grace period has completed, all subsequentcalls torcu_sync_is_idle()
will return true, which tells readers thatthey can once again use their fastpaths.
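A sketch of the intended usage pattern, in which rcu_sync gates a reader fastpath (all names are hypothetical; the slowpath fallback is elided):

#include <linux/rcu_sync.h>
#include <linux/rcupdate.h>

static struct rcu_sync gate;

static void gate_init(void)
{
	rcu_sync_init(&gate);
}

static void reader(void)
{
	rcu_read_lock();			/* rcu_sync_is_idle() requires an RCU reader */
	if (rcu_sync_is_idle(&gate)) {
		/* fastpath: no writer active */
	} else {
		/* slowpath: fall back to locking (not shown) */
	}
	rcu_read_unlock();
}

static void writer_begin(void)
{
	rcu_sync_enter(&gate);	/* waits for a grace period; readers then take the slowpath */
}

static void writer_end(void)
{
	rcu_sync_exit(&gate);	/* readers regain their fastpaths after a grace period */
}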
- voidrcu_sync_dtor(structrcu_sync*rsp)¶
Clean up an rcu_sync structure
Parameters
structrcu_sync*rsp
Pointer to rcu_sync structure to be cleaned up
- structrcu_tasks_percpu¶
Per-CPU component of definition for a Tasks-RCU-like mechanism.
Definition:
struct rcu_tasks_percpu {
	struct rcu_segcblist cblist;
	raw_spinlock_t __private lock;
	unsigned long rtp_jiffies;
	unsigned long rtp_n_lock_retries;
	struct timer_list lazy_timer;
	unsigned int urgent_gp;
	struct work_struct rtp_work;
	struct irq_work rtp_irq_work;
	struct rcu_head barrier_q_head;
	struct list_head rtp_blkd_tasks;
	struct list_head rtp_exit_list;
	int cpu;
	int index;
	struct rcu_tasks *rtpp;
};
Members
cblist
Callback list.
lock
Lock protecting per-CPU callback list.
rtp_jiffies
Jiffies counter value for statistics.
rtp_n_lock_retries
Rough lock-contention statistic.
lazy_timer
Timer to unlazify callbacks.
urgent_gp
Number of additional non-lazy grace periods.
rtp_work
Work queue for invoking callbacks.
rtp_irq_work
IRQ work queue for deferred wakeups.
barrier_q_head
RCU callback for barrier operation.
rtp_blkd_tasks
List of tasks blocked as readers.
rtp_exit_list
List of tasks in the latter portion of do_exit().
cpu
CPU number corresponding to this entry.
index
Index of this CPU in rtpcp_array of the rcu_tasks structure.
rtpp
Pointer to the rcu_tasks structure.
- structrcu_tasks¶
Definition for a Tasks-RCU-like mechanism.
Definition:
struct rcu_tasks {
	struct rcuwait cbs_wait;
	raw_spinlock_t cbs_gbl_lock;
	struct mutex tasks_gp_mutex;
	int gp_state;
	int gp_sleep;
	int init_fract;
	unsigned long gp_jiffies;
	unsigned long gp_start;
	unsigned long tasks_gp_seq;
	unsigned long n_ipis;
	unsigned long n_ipis_fails;
	struct task_struct *kthread_ptr;
	unsigned long lazy_jiffies;
	rcu_tasks_gp_func_t gp_func;
	pregp_func_t pregp_func;
	pertask_func_t pertask_func;
	postscan_func_t postscan_func;
	holdouts_func_t holdouts_func;
	postgp_func_t postgp_func;
	call_rcu_func_t call_func;
	unsigned int wait_state;
	struct rcu_tasks_percpu __percpu *rtpcpu;
	struct rcu_tasks_percpu **rtpcp_array;
	int percpu_enqueue_shift;
	int percpu_enqueue_lim;
	int percpu_dequeue_lim;
	unsigned long percpu_dequeue_gpseq;
	struct mutex barrier_q_mutex;
	atomic_t barrier_q_count;
	struct completion barrier_q_completion;
	unsigned long barrier_q_seq;
	unsigned long barrier_q_start;
	char *name;
	char *kname;
};
Members
cbs_wait
RCU wait allowing a new callback to get kthread’s attention.
cbs_gbl_lock
Lock protecting callback list.
tasks_gp_mutex
Mutex protecting grace period, needed during mid-boot dead zone.
gp_state
Grace period’s most recent state transition (debugging).
gp_sleep
Per-grace-period sleep to prevent CPU-bound looping.
init_fract
Initial backoff sleep interval.
gp_jiffies
Time of lastgp_state transition.
gp_start
Most recent grace-period start in jiffies.
tasks_gp_seq
Number of grace periods completed since boot in upper bits.
n_ipis
Number of IPIs sent to encourage grace periods to end.
n_ipis_fails
Number of IPI-send failures.
kthread_ptr
This flavor’s grace-period/callback-invocation kthread.
lazy_jiffies
Number of jiffies to allow callbacks to be lazy.
gp_func
This flavor’s grace-period-wait function.
pregp_func
This flavor’s pre-grace-period function (optional).
pertask_func
This flavor’s per-task scan function (optional).
postscan_func
This flavor’s post-task scan function (optional).
holdouts_func
This flavor’s holdout-list scan function (optional).
postgp_func
This flavor’s post-grace-period function (optional).
call_func
This flavor’s
call_rcu()
-equivalent function.
wait_state
Task state for synchronous grace-period waits (default TASK_UNINTERRUPTIBLE).
rtpcpu
This flavor’s rcu_tasks_percpu structure.
rtpcp_array
Array of pointers to rcu_tasks_percpu structure of CPUs in cpu_possible_mask.
percpu_enqueue_shift
Shift down CPU ID this much when enqueuing callbacks.
percpu_enqueue_lim
Number of per-CPU callback queues in use for enqueuing.
percpu_dequeue_lim
Number of per-CPU callback queues in use for dequeuing.
percpu_dequeue_gpseq
RCU grace-period number to propagate enqueue limit to dequeuers.
barrier_q_mutex
Serialize barrier operations.
barrier_q_count
Number of queues being waited on.
barrier_q_completion
Barrier wait/wakeup mechanism.
barrier_q_seq
Sequence number for barrier operations.
barrier_q_start
Most recent barrier start in jiffies.
name
This flavor’s textual name.
kname
This flavor’s kthread name.
- voidcall_rcu_tasks(structrcu_head*rhp,rcu_callback_tfunc)¶
Queue an RCU callback for invocation after a task-based grace period
Parameters
structrcu_head*rhp
structure to be used for queueing the RCU updates.
rcu_callback_tfunc
actual callback function to be invoked after the grace period
Description
The callback function will be invoked some time after a full graceperiod elapses, in other words after all currently executing RCUread-side critical sections have completed.call_rcu_tasks()
assumesthat the read-side critical sections end at a voluntary contextswitch (not a preemption!),cond_resched_tasks_rcu_qs()
, entry into idle,or transition to usermode execution. As such, there are no read-sideprimitives analogous torcu_read_lock()
andrcu_read_unlock()
becausethis primitive is intended to determine that all tasks have passedthrough a safe state, not so much for data-structure synchronization.
See the description ofcall_rcu()
for more detailed information onmemory ordering guarantees.
- voidsynchronize_rcu_tasks(void)¶
wait until an rcu-tasks grace period has elapsed.
Parameters
void
no arguments
Description
Control will return to the caller some time after a full rcu-tasksgrace period has elapsed, in other words after all currentlyexecuting rcu-tasks read-side critical sections have elapsed. Theseread-side critical sections are delimited by calls to schedule(),cond_resched_tasks_rcu_qs()
, idle execution, userspace execution, callstosynchronize_rcu_tasks()
, and (in theory, anyway) cond_resched().
This is a very specialized primitive, intended only for a few uses intracing and other situations requiring manipulation of functionpreambles and profiling hooks. Thesynchronize_rcu_tasks()
functionis not (yet) intended for heavy use from multiple CPUs.
See the description ofsynchronize_rcu()
for more detailed informationon memory ordering guarantees.
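A sketch of the intended tracing-style usage, in which a removed trampoline is freed only after no task can still be executing inside it (the descriptor and helper names are hypothetical, and the trampoline memory is shown as ordinary kmalloc() memory for simplicity):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct trampoline {
	void *text;
	struct rcu_head rcu;
};

static void trampoline_free_cb(struct rcu_head *rhp)
{
	struct trampoline *tr = container_of(rhp, struct trampoline, rcu);

	/* No task can still be executing inside tr->text at this point. */
	kfree(tr->text);
	kfree(tr);
}

static void trampoline_retire(struct trampoline *tr)
{
	/* Asynchronous variant ... */
	call_rcu_tasks(&tr->rcu, trampoline_free_cb);
}

static void trampoline_retire_sync(struct trampoline *tr)
{
	/* ... or block until the Tasks-RCU grace period completes. */
	synchronize_rcu_tasks();
	kfree(tr->text);
	kfree(tr);
}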
- voidrcu_barrier_tasks(void)¶
Wait for in-flight
call_rcu_tasks()
callbacks.
Parameters
void
no arguments
Description
Although the current implementation is guaranteed to wait, it is notobligated to, for example, if there are no pending callbacks.
- voidsynchronize_rcu_tasks_rude(void)¶
wait for a rude rcu-tasks grace period
Parameters
void
no arguments
Description
Control will return to the caller some time after a rude rcu-tasksgrace period has elapsed, in other words after all currentlyexecuting rcu-tasks read-side critical sections have elapsed. Theseread-side critical sections are delimited by calls to schedule(),cond_resched_tasks_rcu_qs()
, userspace execution (which is a schedulablecontext), and (in theory, anyway) cond_resched().
This is a very specialized primitive, intended only for a few uses intracing and other situations requiring manipulation of function preamblesand profiling hooks. Thesynchronize_rcu_tasks_rude()
function is not(yet) intended for heavy use from multiple CPUs.
See the description ofsynchronize_rcu()
for more detailed informationon memory ordering guarantees.
- voidcall_rcu_tasks_trace(structrcu_head*rhp,rcu_callback_tfunc)¶
Queue a callback for invocation after a trace task-based grace period
Parameters
structrcu_head*rhp
structure to be used for queueing the RCU updates.
rcu_callback_tfunc
actual callback function to be invoked after the grace period
Description
The callback function will be invoked some time after a trace rcu-tasksgrace period elapses, in other words after all currently executingtrace rcu-tasks read-side critical sections have completed. Theseread-side critical sections are delimited by calls torcu_read_lock_trace()
andrcu_read_unlock_trace()
.
See the description ofcall_rcu()
for more detailed information onmemory ordering guarantees.
- voidsynchronize_rcu_tasks_trace(void)¶
wait for a trace rcu-tasks grace period
Parameters
void
no arguments
Description
Control will return to the caller some time after a trace rcu-tasksgrace period has elapsed, in other words after all currently executingtrace rcu-tasks read-side critical sections have elapsed. These read-sidecritical sections are delimited by calls torcu_read_lock_trace()
andrcu_read_unlock_trace()
.
This is a very specialized primitive, intended only for a few uses intracing and other situations requiring manipulation of function preamblesand profiling hooks. Thesynchronize_rcu_tasks_trace()
function is not(yet) intended for heavy use from multiple CPUs.
See the description ofsynchronize_rcu()
for more detailed informationon memory ordering guarantees.
- voidrcu_barrier_tasks_trace(void)¶
Wait for in-flight
call_rcu_tasks_trace()
callbacks.
Parameters
void
no arguments
Description
Although the current implementation is guaranteed to wait, it is notobligated to, for example, if there are no pending callbacks.
- voidrcu_cpu_stall_reset(void)¶
restart stall-warning timeout for current grace period
Parameters
void
no arguments
Description
To perform the reset request from the caller, disable stall detection until3 fqs loops have passed. This is required to ensure a fresh jiffies isloaded. It should be safe to do from the fqs loop as enough timerinterrupts and context switches should have passed.
The caller must disable hard irqs.
- intrcu_stall_chain_notifier_register(structnotifier_block*n)¶
Add an RCU CPU stall notifier
Parameters
structnotifier_block*n
Entry to add.
Description
Adds an RCU CPU stall notifier to an atomic notifier chain.Theaction passed to a notifier will beRCU_STALL_NOTIFY_NORM orfriends. Thedata will be the duration of the stalled grace period,in jiffies, coerced to a void* pointer.
Returns 0 on success,-EEXIST
on error.
- intrcu_stall_chain_notifier_unregister(structnotifier_block*n)¶
Remove an RCU CPU stall notifier
Parameters
structnotifier_block*n
Entry to add.
Description
Removes an RCU CPU stall notifier from an atomic notifier chain.
Returns zero on success,-ENOENT
on failure.
- voidrcu_read_lock_trace(void)¶
mark beginning of RCU-trace read-side critical section
Parameters
void
no arguments
Description
Whensynchronize_rcu_tasks_trace()
is invoked by one task, then that task is guaranteed to block until all other tasks exit their read-side critical sections. Similarly, if call_rcu_tasks_trace() is invoked on one task while other tasks are within their read-side critical sections, invocation of the corresponding RCU callback is deferred until after all the other tasks exit their critical sections.
For more details, please see the documentation forrcu_read_lock()
.
- voidrcu_read_unlock_trace(void)¶
mark end of RCU-trace read-side critical section
Parameters
void
no arguments
Description
Pairs with a preceding call torcu_read_lock_trace()
, and nesting isallowed. Invoking arcu_read_unlock_trace()
when there is no matchingrcu_read_lock_trace()
is verboten, and will result in lockdep complaints.
For more details, please see the documentation forrcu_read_unlock()
.
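A sketch of a preemptible trace-RCU reader (all names hypothetical): the program pointer is read under rcu_read_lock_trace(), and the matching updater would use synchronize_rcu_tasks_trace() or call_rcu_tasks_trace() before freeing the program.

#include <linux/rcupdate.h>
#include <linux/rcupdate_trace.h>

struct my_prog {
	void (*run)(struct my_prog *prog);
};

static void run_prog(struct my_prog __rcu **slot)
{
	struct my_prog *prog;

	rcu_read_lock_trace();
	prog = rcu_dereference_check(*slot, rcu_read_lock_trace_held());
	if (prog)
		prog->run(prog);
	rcu_read_unlock_trace();
}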
- synchronize_rcu_mult¶
synchronize_rcu_mult(...)
Wait concurrently for multiple grace periods
Parameters
...
List of
call_rcu()
functions for different grace periods to wait on
Description
This macro waits concurrently for multiple types of RCU grace periods.For example, synchronize_rcu_mult(call_rcu, call_rcu_tasks) would waiton concurrent RCU and RCU-tasks grace periods. Waiting on a given SRCUdomain requires you to write a wrapper function for that SRCU domain’scall_srcu()
function, with this wrapper supplying the pointer to thecorresponding srcu_struct.
Note thatcall_rcu_hurry()
should be used instead ofcall_rcu()
because in kernels built with CONFIG_RCU_LAZY=y the delay between theinvocation ofcall_rcu()
and that of the corresponding RCU callbackcan be multiple seconds.
The first argument tells Tiny RCU’s _wait_rcu_gp() not tobother waiting for RCU. The reason for this is because anywheresynchronize_rcu_mult()
can be called is automatically already a fullgrace period.
- voidrcuref_init(rcuref_t*ref,unsignedintcnt)¶
Initialize a rcuref reference count with the given reference count
Parameters
rcuref_t*ref
Pointer to the reference count
unsignedintcnt
The initial reference count typically ‘1’
- unsignedintrcuref_read(rcuref_t*ref)¶
Read the number of held reference counts of a rcuref
Parameters
rcuref_t*ref
Pointer to the reference count
Return
The number of held references (0 ... N). The value 0 does not indicate that it is safe to schedule the object, protected by this reference counter, for deconstruction. If you want to know whether the reference counter has been marked DEAD (as signaled by rcuref_put()), please use rcuref_is_dead().
- boolrcuref_is_dead(rcuref_t*ref)¶
Check if the rcuref has been already marked dead
Parameters
rcuref_t*ref
Pointer to the reference count
Return
True if the object has been marked DEAD. This signals that a previousinvocation ofrcuref_put()
returned true on this reference counter meaningthe protected object can safely be scheduled for deconstruction.Otherwise, returns false.
- boolrcuref_get(rcuref_t*ref)¶
Acquire one reference on a rcuref reference count
Parameters
rcuref_t*ref
Pointer to the reference count
Description
Similar toatomic_inc_not_zero()
but saturates at RCUREF_MAXREF.
Provides no memory ordering, it is assumed the caller has guaranteed theobject memory to be stable (RCU, etc.). It does provide a control dependencyand thereby orders future stores. See documentation in lib/rcuref.c
Return
False if the attempt to acquire a reference failed. This happenswhen the last reference has been put already
True if a reference was successfully acquired
- boolrcuref_put_rcusafe(rcuref_t*ref)¶
Release one reference for a rcuref reference count RCU safe
Parameters
rcuref_t*ref
Pointer to the reference count
Description
Provides release memory ordering, such that prior loads and stores are donebefore, and provides an acquire ordering on success such that free()must come after.
Can be invoked from contexts, which guarantee that no grace period canhappen which would free the object concurrently if the decrement dropsthe last reference and the slowpath races against a concurrent get() andput() pair.rcu_read_lock()
’ed and atomic contexts qualify.
Return
True if this was the last reference with no future referencespossible. This signals the caller that it can safely release theobject which is protected by the reference counter.
False if there are still active references or the put() racedwith a concurrent get()/put() pair. Caller is not allowed torelease the protected object.
- boolrcuref_put(rcuref_t*ref)¶
Release one reference for a rcuref reference count
Parameters
rcuref_t*ref
Pointer to the reference count
Description
Can be invoked from any context.
Provides release memory ordering, such that prior loads and stores are donebefore, and provides an acquire ordering on success such that free()must come after.
Return
True if this was the last reference with no future referencespossible. This signals the caller that it can safely schedule theobject, which is protected by the reference counter, fordeconstruction.
False if there are still active references or the put() racedwith a concurrent get()/put() pair. Caller is not allowed todeconstruct the protected object.
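A minimal sketch of an rcuref-managed object (hypothetical names), combining rcuref_init(), rcuref_get(), and rcuref_put() with an RCU-deferred free:

#include <linux/rcuref.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct blob {
	rcuref_t ref;
	struct rcu_head rcu;
};

static struct blob *blob_alloc(void)
{
	struct blob *b = kzalloc(sizeof(*b), GFP_KERNEL);

	if (b)
		rcuref_init(&b->ref, 1);	/* start with one reference */
	return b;
}

static struct blob *blob_get(struct blob *b)
{
	/* Fails only if the count already dropped to zero (marked DEAD). */
	return rcuref_get(&b->ref) ? b : NULL;
}

static void blob_put(struct blob *b)
{
	if (rcuref_put(&b->ref))		/* true: this was the last reference */
		kfree_rcu(b, rcu);		/* defer the free past an RCU grace period */
}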
- boolsame_state_synchronize_rcu_full(structrcu_gp_oldstate*rgosp1,structrcu_gp_oldstate*rgosp2)¶
Are two old-state values identical?
Parameters
structrcu_gp_oldstate*rgosp1
First old-state value.
structrcu_gp_oldstate*rgosp2
Second old-state value.
Description
The two old-state values must have been obtained from eitherget_state_synchronize_rcu_full()
,start_poll_synchronize_rcu_full()
,orget_completed_synchronize_rcu_full()
. Returnstrue if the twovalues are identical andfalse otherwise. This allows structureswhose lifetimes are tracked by old-state values to push these valuesto a list header, allowing those structures to be slightly smaller.
Note that equality is judged on a bitwise basis, so that anrcu_gp_oldstate structure with an already-completed state in one fieldwill compare not-equal to a structure with an already-completed statein the other field. After all, thercu_gp_oldstate structure is opaqueso how did such a situation come to pass in the first place?