The Linux Kernel API

List Management Functions

void INIT_LIST_HEAD(struct list_head *list)

Initialize a list_head structure

Parameters

struct list_head *list

list_head structure to be initialized.

Description

Initializes the list_head to point to itself. If it is a list header, the result is an empty list.
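
As a brief illustration (a sketch, not part of the kernel documentation itself; the structure and names are hypothetical), a list_head is usually embedded in a containing structure, and the head of a list is either declared statically with LIST_HEAD() or initialized at run time with INIT_LIST_HEAD():

    #include <linux/list.h>

    struct my_item {
            int value;
            struct list_head node;  /* links this item into a list */
    };

    static LIST_HEAD(my_list);      /* statically initialized, empty list head */

    static void my_head_init(struct list_head *head)
    {
            INIT_LIST_HEAD(head);   /* head->next == head->prev == head */
    }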

void list_add(struct list_head *new, struct list_head *head)

add a new entry

Parameters

struct list_head *new

new entry to be added

struct list_head *head

list head to add it after

Description

Insert a new entry after the specified head. This is good for implementing stacks.

void list_add_tail(struct list_head *new, struct list_head *head)

add a new entry

Parameters

struct list_head *new

new entry to be added

struct list_head *head

list head to add it before

Description

Insert a new entry before the specified head. This is useful for implementing queues.
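
A minimal sketch of the difference, reusing the hypothetical struct my_item and my_list from above (kmalloc() requires <linux/slab.h>): list_add() gives LIFO (stack) ordering, list_add_tail() gives FIFO (queue) ordering.

    static int my_push(int value)
    {
            struct my_item *item = kmalloc(sizeof(*item), GFP_KERNEL);

            if (!item)
                    return -ENOMEM;
            item->value = value;

            list_add(&item->node, &my_list);        /* front: LIFO / stack order */
            /* list_add_tail(&item->node, &my_list);   end: FIFO / queue order */
            return 0;
    }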

void list_del(struct list_head *entry)

deletes entry from list.

Parameters

struct list_head *entry

the element to delete from the list.

Note

list_empty() on entry does not return true after this, the entry is in an undefined state.

void list_replace(struct list_head *old, struct list_head *new)

replace old entry by new one

Parameters

struct list_head *old

the element to be replaced

struct list_head *new

the new element to insert

Description

If old was empty, it will be overwritten.

void list_replace_init(struct list_head *old, struct list_head *new)

replace old entry by new one and initialize the old one

Parameters

struct list_head *old

the element to be replaced

struct list_head *new

the new element to insert

Description

If old was empty, it will be overwritten.

void list_swap(struct list_head *entry1, struct list_head *entry2)

replace entry1 with entry2 and re-add entry1 at entry2's position

Parameters

struct list_head *entry1

the location to place entry2

struct list_head *entry2

the location to place entry1

void list_del_init(struct list_head *entry)

deletes entry from list and reinitialize it.

Parameters

struct list_head *entry

the element to delete from the list.

void list_move(struct list_head *list, struct list_head *head)

delete from one list and add as another's head

Parameters

struct list_head *list

the entry to move

struct list_head *head

the head that will precede our entry

void list_move_tail(struct list_head *list, struct list_head *head)

delete from one list and add as another's tail

Parameters

struct list_head *list

the entry to move

struct list_head *head

the head that will follow our entry

void list_bulk_move_tail(struct list_head *head, struct list_head *first, struct list_head *last)

move a subsection of a list to its tail

Parameters

struct list_head *head

the head that will follow our entry

struct list_head *first

first entry to move

struct list_head *last

last entry to move, can be the same as first

Description

Move all entries between first and including last before head. All three entries must belong to the same linked list.

int list_is_first(const struct list_head *list, const struct list_head *head)

tests whether list is the first entry in list head

Parameters

const struct list_head *list

the entry to test

const struct list_head *head

the head of the list

int list_is_last(const struct list_head *list, const struct list_head *head)

tests whether list is the last entry in list head

Parameters

const struct list_head *list

the entry to test

const struct list_head *head

the head of the list

int list_is_head(const struct list_head *list, const struct list_head *head)

tests whether list is the list head

Parameters

const struct list_head *list

the entry to test

const struct list_head *head

the head of the list

int list_empty(const struct list_head *head)

tests whether a list is empty

Parameters

const struct list_head *head

the list to test.

void list_del_init_careful(struct list_head *entry)

deletes entry from list and reinitialize it.

Parameters

struct list_head *entry

the element to delete from the list.

Description

This is the same as list_del_init(), except designed to be used together with list_empty_careful() in a way to guarantee ordering of other memory operations.

Any memory operations done before a list_del_init_careful() are guaranteed to be visible after a list_empty_careful() test.

int list_empty_careful(const struct list_head *head)

tests whether a list is empty and not being modified

Parameters

const struct list_head *head

the list to test

Description

tests whether a list is empty _and_ checks that no other CPU might be in the process of modifying either member (next or prev)

NOTE

using list_empty_careful() without synchronization can only be safe if the only activity that can happen to the list entry is list_del_init(). Eg. it cannot be used if another CPU could re-list_add() it.

void list_rotate_left(struct list_head *head)

rotate the list to the left

Parameters

struct list_head *head

the head of the list

void list_rotate_to_front(struct list_head *list, struct list_head *head)

Rotate list to specific item.

Parameters

struct list_head *list

The desired new front of the list.

struct list_head *head

The head of the list.

Description

Rotates list so that list becomes the new front of the list.

int list_is_singular(const struct list_head *head)

tests whether a list has just one entry.

Parameters

const struct list_head *head

the list to test.

void list_cut_position(struct list_head *list, struct list_head *head, struct list_head *entry)

cut a list into two

Parameters

struct list_head *list

a new list to add all removed entries

struct list_head *head

a list with entries

struct list_head *entry

an entry within head, could be the head itself and if so we won't cut the list

Description

This helper moves the initial part of head, up to and including entry, from head to list. You should pass on entry an element you know is on head. list should be an empty list or a list you do not care about losing its data.
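
For example (a sketch; my_list and cut_here are hypothetical, and cut_here is assumed to be a struct list_head * known to be on my_list):

    LIST_HEAD(tmp);

    list_cut_position(&tmp, &my_list, cut_here);
    /* tmp now holds my_list's initial entries up to and including cut_here;
     * my_list keeps the remainder. */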

void list_cut_before(struct list_head *list, struct list_head *head, struct list_head *entry)

cut a list into two, before given entry

Parameters

struct list_head *list

a new list to add all removed entries

struct list_head *head

a list with entries

struct list_head *entry

an entry within head, could be the head itself

Description

This helper moves the initial part of head, up to but excluding entry, from head to list. You should pass in entry an element you know is on head. list should be an empty list or a list you do not care about losing its data. If entry == head, all entries on head are moved to list.

void list_splice(const struct list_head *list, struct list_head *head)

join two lists, this is designed for stacks

Parameters

const struct list_head *list

the new list to add.

struct list_head *head

the place to add it in the first list.

void list_splice_tail(struct list_head *list, struct list_head *head)

join two lists, each list being a queue

Parameters

struct list_head *list

the new list to add.

struct list_head *head

the place to add it in the first list.

void list_splice_init(struct list_head *list, struct list_head *head)

join two lists and reinitialise the emptied list.

Parameters

struct list_head *list

the new list to add.

struct list_head *head

the place to add it in the first list.

Description

The list at list is reinitialised

void list_splice_tail_init(struct list_head *list, struct list_head *head)

join two lists and reinitialise the emptied list

Parameters

struct list_head *list

the new list to add.

struct list_head *head

the place to add it in the first list.

Description

Each of the lists is a queue. The list at list is reinitialised
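
A common pattern (sketch; pending, my_lock and the surrounding locking scheme are hypothetical) is to drain a shared list under a lock and then process the entries privately; because list_splice_init() reinitialises the source list, it can keep being used afterwards:

    LIST_HEAD(local);

    spin_lock(&my_lock);
    list_splice_init(&pending, &local);     /* pending is now empty and still usable */
    spin_unlock(&my_lock);

    /* process the entries on "local" without holding my_lock */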

list_entry

list_entry(ptr, type, member)

get the struct for this entry

Parameters

ptr

the struct list_head pointer.

type

the type of the struct this is embedded in.

member

the name of the list_head within the struct.

list_first_entry

list_first_entry(ptr, type, member)

get the first element from a list

Parameters

ptr

the list head to take the element from.

type

the type of the struct this is embedded in.

member

the name of the list_head within the struct.

Description

Note that the list is expected to be non-empty.

list_last_entry

list_last_entry(ptr, type, member)

get the last element from a list

Parameters

ptr

the list head to take the element from.

type

the type of the struct this is embedded in.

member

the name of the list_head within the struct.

Description

Note that the list is expected to be non-empty.

list_first_entry_or_null

list_first_entry_or_null(ptr, type, member)

get the first element from a list

Parameters

ptr

the list head to take the element from.

type

the type of the struct this is embedded in.

member

the name of the list_head within the struct.

Description

Note that if the list is empty, it returns NULL.
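
For instance (a sketch using the hypothetical struct my_item from earlier), list_entry() maps a struct list_head back to its containing structure, and list_first_entry_or_null() is the convenient form when the list may be empty:

    struct my_item *first;

    first = list_first_entry_or_null(&my_list, struct my_item, node);
    if (first)
            pr_info("first value: %d\n", first->value);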

list_next_entry

list_next_entry(pos, member)

get the next element in list

Parameters

pos

the type * to cursor

member

the name of the list_head within the struct.

list_next_entry_circular

list_next_entry_circular(pos, head, member)

get the next element in list

Parameters

pos

the type * to cursor.

head

the list head to take the element from.

member

the name of the list_head within the struct.

Description

Wraparound if pos is the last element (return the first element). Note that the list is expected to be non-empty.

list_prev_entry

list_prev_entry(pos, member)

get the prev element in list

Parameters

pos

the type * to cursor

member

the name of the list_head within the struct.

list_prev_entry_circular

list_prev_entry_circular(pos, head, member)

get the prev element in list

Parameters

pos

the type * to cursor.

head

the list head to take the element from.

member

the name of the list_head within the struct.

Description

Wraparound if pos is the first element (return the last element). Note that the list is expected to be non-empty.

list_for_each

list_for_each(pos, head)

iterate over a list

Parameters

pos

the struct list_head to use as a loop cursor.

head

the head for your list.

list_for_each_rcu

list_for_each_rcu(pos, head)

Iterate over a list in an RCU-safe fashion

Parameters

pos

the struct list_head to use as a loop cursor.

head

the head for your list.

list_for_each_continue

list_for_each_continue(pos, head)

continue iteration over a list

Parameters

pos

the struct list_head to use as a loop cursor.

head

the head for your list.

Description

Continue to iterate over a list, continuing after the current position.

list_for_each_prev

list_for_each_prev(pos, head)

iterate over a list backwards

Parameters

pos

the struct list_head to use as a loop cursor.

head

the head for your list.

list_for_each_safe

list_for_each_safe(pos, n, head)

iterate over a list safe against removal of list entry

Parameters

pos

the struct list_head to use as a loop cursor.

n

another struct list_head to use as temporary storage

head

the head for your list.

list_for_each_prev_safe

list_for_each_prev_safe(pos, n, head)

iterate over a list backwards safe against removal of list entry

Parameters

pos

the struct list_head to use as a loop cursor.

n

another struct list_head to use as temporary storage

head

the head for your list.

size_t list_count_nodes(struct list_head *head)

count nodes in the list

Parameters

struct list_head *head

the head for your list.

list_entry_is_head

list_entry_is_head(pos, head, member)

test if the entry points to the head of the list

Parameters

pos

the type * to cursor

head

the head for your list.

member

the name of the list_head within the struct.

list_for_each_entry

list_for_each_entry(pos, head, member)

iterate over list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_head within the struct.
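
Iteration over the containing structures typically looks like this (a sketch with the hypothetical names used above); the cursor must not be removed from the list inside this loop:

    struct my_item *pos;

    list_for_each_entry(pos, &my_list, node)
            pr_info("value: %d\n", pos->value);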

list_for_each_entry_reverse

list_for_each_entry_reverse(pos, head, member)

iterate backwards over list of given type.

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_head within the struct.

list_prepare_entry

list_prepare_entry(pos, head, member)

prepare a pos entry for use in list_for_each_entry_continue()

Parameters

pos

the type * to use as a start point

head

the head of the list

member

the name of the list_head within the struct.

Description

Prepares a pos entry for use as a start point in list_for_each_entry_continue().

list_for_each_entry_continue

list_for_each_entry_continue(pos, head, member)

continue iteration over list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_head within the struct.

Description

Continue to iterate over list of given type, continuing after the current position.

list_for_each_entry_continue_reverse

list_for_each_entry_continue_reverse(pos, head, member)

iterate backwards from the given point

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_head within the struct.

Description

Start to iterate over list of given type backwards, continuing after the current position.

list_for_each_entry_from

list_for_each_entry_from(pos, head, member)

iterate over list of given type from the current point

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_head within the struct.

Description

Iterate over list of given type, continuing from current position.

list_for_each_entry_from_reverse

list_for_each_entry_from_reverse(pos, head, member)

iterate backwards over list of given type from the current point

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_head within the struct.

Description

Iterate backwards over list of given type, continuing from current position.

list_for_each_entry_safe

list_for_each_entry_safe(pos, n, head, member)

iterate over list of given type safe against removal of list entry

Parameters

pos

the type * to use as a loop cursor.

n

another type * to use as temporary storage

head

the head for your list.

member

the name of the list_head within the struct.
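
When entries are deleted while walking the list, the _safe variant keeps a lookahead pointer so the traversal survives the removal. A sketch (hypothetical names as before):

    struct my_item *pos, *tmp;

    list_for_each_entry_safe(pos, tmp, &my_list, node) {
            if (pos->value < 0) {
                    list_del(&pos->node);
                    kfree(pos);
            }
    }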

list_for_each_entry_safe_continue

list_for_each_entry_safe_continue(pos, n, head, member)

continue list iteration safe against removal

Parameters

pos

the type * to use as a loop cursor.

n

another type * to use as temporary storage

head

the head for your list.

member

the name of the list_head within the struct.

Description

Iterate over list of given type, continuing after current point, safe against removal of list entry.

list_for_each_entry_safe_from

list_for_each_entry_safe_from(pos, n, head, member)

iterate over list from current point safe against removal

Parameters

pos

the type * to use as a loop cursor.

n

another type * to use as temporary storage

head

the head for your list.

member

the name of the list_head within the struct.

Description

Iterate over list of given type from current point, safe against removal of list entry.

list_for_each_entry_safe_reverse

list_for_each_entry_safe_reverse(pos, n, head, member)

iterate backwards over list safe against removal

Parameters

pos

the type * to use as a loop cursor.

n

another type * to use as temporary storage

head

the head for your list.

member

the name of the list_head within the struct.

Description

Iterate backwards over list of given type, safe against removal of list entry.

list_safe_reset_next

list_safe_reset_next(pos, n, member)

reset a stale list_for_each_entry_safe loop

Parameters

pos

the loop cursor used in the list_for_each_entry_safe loop

n

temporary storage used in list_for_each_entry_safe

member

the name of the list_head within the struct.

Description

list_safe_reset_next is not safe to use in general if the list may be modified concurrently (eg. the lock is dropped in the loop body). An exception to this is if the cursor element (pos) is pinned in the list, and list_safe_reset_next is called after re-taking the lock and before completing the current iteration of the loop body.

int hlist_unhashed(const struct hlist_node *h)

Has node been removed from list and reinitialized?

Parameters

const struct hlist_node *h

Node to be checked

Description

Note that not all removal functions will leave a node in unhashed state. For example, hlist_nulls_del_init_rcu() does leave the node in unhashed state, but hlist_nulls_del() does not.

int hlist_unhashed_lockless(const struct hlist_node *h)

Version of hlist_unhashed for lockless use

Parameters

const struct hlist_node *h

Node to be checked

Description

This variant of hlist_unhashed() must be used in lockless contexts to avoid potential load-tearing. The READ_ONCE() is paired with the various WRITE_ONCE() in hlist helpers that are defined below.

int hlist_empty(const struct hlist_head *h)

Is the specified hlist_head structure an empty hlist?

Parameters

const struct hlist_head *h

Structure to check.

void hlist_del(struct hlist_node *n)

Delete the specified hlist_node from its list

Parameters

struct hlist_node *n

Node to delete.

Description

Note that this function leaves the node in hashed state. Use hlist_del_init() or similar instead to unhash n.

void hlist_del_init(struct hlist_node *n)

Delete the specified hlist_node from its list and initialize

Parameters

struct hlist_node *n

Node to delete.

Description

Note that this function leaves the node in unhashed state.

void hlist_add_head(struct hlist_node *n, struct hlist_head *h)

add a new entry at the beginning of the hlist

Parameters

struct hlist_node *n

new entry to be added

struct hlist_head *h

hlist head to add it after

Description

Insert a new entry after the specified head. This is good for implementing stacks.
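
Since an hlist_head is a single pointer, hlists are attractive for hash table buckets. A sketch (the structure, table size and helper names are hypothetical, not from the kernel documentation); a zero-filled static array of hlist_head is already a set of empty buckets:

    #include <linux/hash.h>
    #include <linux/list.h>

    struct my_obj {
            unsigned long key;
            struct hlist_node hnode;
    };

    #define MY_HASH_BITS 4
    static struct hlist_head my_table[1 << MY_HASH_BITS];

    static void my_obj_insert(struct my_obj *obj)
    {
            hlist_add_head(&obj->hnode, &my_table[hash_long(obj->key, MY_HASH_BITS)]);
    }

    static struct my_obj *my_obj_lookup(unsigned long key)
    {
            struct my_obj *obj;

            hlist_for_each_entry(obj, &my_table[hash_long(key, MY_HASH_BITS)], hnode)
                    if (obj->key == key)
                            return obj;
            return NULL;
    }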

void hlist_add_before(struct hlist_node *n, struct hlist_node *next)

add a new entry before the one specified

Parameters

struct hlist_node *n

new entry to be added

struct hlist_node *next

hlist node to add it before, which must be non-NULL

void hlist_add_behind(struct hlist_node *n, struct hlist_node *prev)

add a new entry after the one specified

Parameters

struct hlist_node *n

new entry to be added

struct hlist_node *prev

hlist node to add it after, which must be non-NULL

void hlist_add_fake(struct hlist_node *n)

create a fake hlist consisting of a single headless node

Parameters

struct hlist_node *n

Node to make a fake list out of

Description

This makes n appear to be its own predecessor on a headless hlist. The point of this is to allow things like hlist_del() to work correctly in cases where there is no list.

bool hlist_fake(struct hlist_node *h)

Is this node a fake hlist?

Parameters

struct hlist_node *h

Node to check for being a self-referential fake hlist.

bool hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h)

is node the only element of the specified hlist?

Parameters

struct hlist_node *n

Node to check for singularity.

struct hlist_head *h

Header for potentially singular list.

Description

Check whether the node is the only node of the head without accessing head, thus avoiding unnecessary cache misses.

void hlist_move_list(struct hlist_head *old, struct hlist_head *new)

Move an hlist

Parameters

struct hlist_head *old

hlist_head for old list.

struct hlist_head *new

hlist_head for new list.

Description

Move a list from one list head to another. Fixup the pprev reference of the first entry if it exists.

void hlist_splice_init(struct hlist_head *from, struct hlist_node *last, struct hlist_head *to)

move all entries from one list to another

Parameters

struct hlist_head *from

hlist_head from which entries will be moved

struct hlist_node *last

last entry on the from list

struct hlist_head *to

hlist_head to which entries will be moved

Description

to can be empty, from must contain at least last.

hlist_for_each_entry

hlist_for_each_entry(pos, head, member)

iterate over list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the hlist_node within the struct.

hlist_for_each_entry_continue

hlist_for_each_entry_continue(pos, member)

iterate over a hlist continuing after current point

Parameters

pos

the type * to use as a loop cursor.

member

the name of the hlist_node within the struct.

hlist_for_each_entry_from

hlist_for_each_entry_from(pos, member)

iterate over a hlist continuing from current point

Parameters

pos

the type * to use as a loop cursor.

member

the name of the hlist_node within the struct.

hlist_for_each_entry_safe

hlist_for_each_entry_safe(pos, n, head, member)

iterate over list of given type safe against removal of list entry

Parameters

pos

the type * to use as a loop cursor.

n

a struct hlist_node to use as temporary storage

head

the head for your list.

member

the name of the hlist_node within the struct.

size_t hlist_count_nodes(struct hlist_head *head)

count nodes in the hlist

Parameters

struct hlist_head *head

the head for your hlist.

Basic C Library Functions

When writing drivers, you cannot in general use routines which are from the C Library. Some of the functions have been found generally useful and they are listed below. The behaviour of these functions may vary slightly from those defined by ANSI, and these deviations are noted in the text.

String Conversions

unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)

convert a string to an unsigned long long

Parameters

const char *cp

The start of the string

char **endp

A pointer to the end of the parsed string will be placed here

unsigned int base

The number base to use

Description

This function has caveats. Please use kstrtoull instead.

unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base)

convert a string to an unsigned long

Parameters

const char *cp

The start of the string

char **endp

A pointer to the end of the parsed string will be placed here

unsigned int base

The number base to use

Description

This function has caveats. Please use kstrtoul instead.

long simple_strtol(const char *cp, char **endp, unsigned int base)

convert a string to a signed long

Parameters

const char *cp

The start of the string

char **endp

A pointer to the end of the parsed string will be placed here

unsigned int base

The number base to use

Description

This function has caveats. Please use kstrtol instead.

long long simple_strtoll(const char *cp, char **endp, unsigned int base)

convert a string to a signed long long

Parameters

const char *cp

The start of the string

char **endp

A pointer to the end of the parsed string will be placed here

unsigned int base

The number base to use

Description

This function has caveats. Please use kstrtoll instead.

int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)

Format a string and place it in a buffer

Parameters

char *buf

The buffer to place the result into

size_t size

The size of the buffer, including the trailing null space

const char *fmt_str

The format string to use

va_list args

Arguments for the format string

Description

This function generally follows C99 vsnprintf, but has some extensions and a few limitations:

  • ``%n`` is unsupported

  • ``%p*`` is handled by pointer()

See pointer() or How to get printk format specifiers right for more extensive description.

Please update the documentation in both places when making changes

The return value is the number of characters which would be generated for the given input, excluding the trailing '\0', as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing '\0'), use vscnprintf(). If the return is greater than or equal to size, the resulting string is truncated.

If you're not already dealing with a va_list consider using snprintf().

int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)

Format a string and place it in a buffer

Parameters

char *buf

The buffer to place the result into

size_t size

The size of the buffer, including the trailing null space

const char *fmt

The format string to use

va_list args

Arguments for the format string

Description

The return value is the number of characters which have been written into the buf not including the trailing '\0'. If size is == 0 the function returns 0.

If you're not already dealing with a va_list consider using scnprintf().

See the vsnprintf() documentation for format string extensions over C99.

int snprintf(char *buf, size_t size, const char *fmt, ...)

Format a string and place it in a buffer

Parameters

char *buf

The buffer to place the result into

size_t size

The size of the buffer, including the trailing null space

const char *fmt

The format string to use

...

Arguments for the format string

Description

The return value is the number of characters which would be generated for the given input, excluding the trailing null, as per ISO C99. If the return is greater than or equal to size, the resulting string is truncated.

See the vsnprintf() documentation for format string extensions over C99.

int scnprintf(char *buf, size_t size, const char *fmt, ...)

Format a string and place it in a buffer

Parameters

char *buf

The buffer to place the result into

size_t size

The size of the buffer, including the trailing null space

const char *fmt

The format string to use

...

Arguments for the format string

Description

The return value is the number of characters written into buf not including the trailing '\0'. If size is == 0 the function returns 0.
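
The practical difference: snprintf() returns the length that would have been produced (useful for sizing), while scnprintf() returns what actually landed in the buffer (useful for appending). A sketch, where state and name are hypothetical values being formatted:

    char buf[64];
    size_t len = 0;

    len += scnprintf(buf + len, sizeof(buf) - len, "state=%d ", state);
    len += scnprintf(buf + len, sizeof(buf) - len, "name=%s", name);
    /* len never exceeds sizeof(buf) - 1, even if the output was truncated,
     * so the offset stays valid for further appends. */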

int vsprintf(char *buf, const char *fmt, va_list args)

Format a string and place it in a buffer

Parameters

char *buf

The buffer to place the result into

const char *fmt

The format string to use

va_list args

Arguments for the format string

Description

The function returns the number of characters written into buf. Use vsnprintf() or vscnprintf() in order to avoid buffer overflows.

If you're not already dealing with a va_list consider using sprintf().

See the vsnprintf() documentation for format string extensions over C99.

int sprintf(char *buf, const char *fmt, ...)

Format a string and place it in a buffer

Parameters

char *buf

The buffer to place the result into

const char *fmt

The format string to use

...

Arguments for the format string

Description

The function returns the number of characters written into buf. Use snprintf() or scnprintf() in order to avoid buffer overflows.

See the vsnprintf() documentation for format string extensions over C99.

int vbin_printf(u32 *bin_buf, size_t size, const char *fmt_str, va_list args)

Parse a format string and place args' binary value in a buffer

Parameters

u32 *bin_buf

The buffer to place args' binary value

size_t size

The size of the buffer (in words (32 bits), not characters)

const char *fmt_str

The format string to use

va_list args

Arguments for the format string

Description

The format follows C99 vsnprintf, except %n is ignored, and its argument is skipped.

The return value is the number of words (32 bits) which would be generated for the given input.

NOTE

If the return value is greater than size, the resulting bin_buf is NOT valid for bstr_printf().

int bstr_printf(char *buf, size_t size, const char *fmt_str, const u32 *bin_buf)

Format a string from binary arguments and place it in a buffer

Parameters

char *buf

The buffer to place the result into

size_t size

The size of the buffer, including the trailing null space

const char *fmt_str

The format string to use

const u32 *bin_buf

Binary arguments for the format string

Description

This function is like C99 vsnprintf, with the difference that vsnprintf gets its arguments from the stack, while bstr_printf gets its arguments from bin_buf, a binary buffer generated by vbin_printf.

The format follows C99 vsnprintf, but has some extensions:

see vsnprintf comment for details.

The return value is the number of characters which would be generated for the given input, excluding the trailing '\0', as per ISO C99. If you want to have the exact number of characters written into buf as return value (not including the trailing '\0'), use vscnprintf(). If the return is greater than or equal to size, the resulting string is truncated.

int vsscanf(const char *buf, const char *fmt, va_list args)

Unformat a buffer into a list of arguments

Parameters

const char *buf

input buffer

const char *fmt

format of buffer

va_list args

arguments

int sscanf(const char *buf, const char *fmt, ...)

Unformat a buffer into a list of arguments

Parameters

const char *buf

input buffer

const char *fmt

formatting of buffer

...

resulting arguments

int kstrtoul(const char *s, unsigned int base, unsigned long *res)

convert a string to an unsigned long

Parameters

const char *s

The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.

unsigned int base

The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.

unsigned long *res

Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoul(). Return code must be checked.
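
Unlike simple_strtoul(), the error is carried in the return code, so it must be checked. A sketch, where buf is a hypothetical user-supplied string (for example a sysfs store buffer):

    unsigned long val;
    int ret;

    ret = kstrtoul(buf, 0, &val);   /* base 0: "0x..", "0..", or decimal; a trailing '\n' is accepted */
    if (ret)
            return ret;             /* -ERANGE or -EINVAL */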

int kstrtol(const char *s, unsigned int base, long *res)

convert a string to a long

Parameters

const char *s

The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.

unsigned int base

The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.

long *res

Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtol(). Return code must be checked.

int kstrtoull(const char *s, unsigned int base, unsigned long long *res)

convert a string to an unsigned long long

Parameters

const char *s

The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.

unsigned int base

The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.

unsigned long long *res

Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoull(). Return code must be checked.

int kstrtoll(const char *s, unsigned int base, long long *res)

convert a string to a long long

Parameters

const char *s

The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.

unsigned int base

The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.

long long *res

Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoll(). Return code must be checked.

int kstrtouint(const char *s, unsigned int base, unsigned int *res)

convert a string to an unsigned int

Parameters

const char *s

The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.

unsigned int base

The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.

unsigned int *res

Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtoul(). Return code must be checked.

int kstrtoint(const char *s, unsigned int base, int *res)

convert a string to an int

Parameters

const char *s

The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.

unsigned int base

The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.

int *res

Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Preferred over simple_strtol(). Return code must be checked.

int kstrtobool(const char *s, bool *res)

convert common user inputs into boolean values

Parameters

const char *s

input string

bool *res

result

Description

This routine returns 0 iff the first character is one of 'YyTt1NnFf0', or [oO][NnFf] for "on" and "off". Otherwise it will return -EINVAL. Value pointed to by res is updated upon finding a match.
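
This is the usual helper behind sysfs-style "enable" attributes. A sketch (buf is a hypothetical input string):

    bool enable;

    if (kstrtobool(buf, &enable))
            return -EINVAL;
    /* "y", "Y", "1", "t", "on" -> true;  "n", "N", "0", "f", "off" -> false */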

int string_get_size(u64 size, u64 blk_size, const enum string_size_units units, char *buf, int len)

get the size in the specified units

Parameters

u64 size

The size to be converted in blocks

u64 blk_size

Size of the block (use 1 for size in bytes)

const enum string_size_units units

Units to use (powers of 1000 or 1024), whether to include space separator

char *buf

buffer to format to

int len

length of buffer

Description

This function returns a string formatted to 3 significant figures giving the size in the required units. buf should have room for at least 9 bytes and will always be zero terminated.

Return value: number of characters of output that would have been written (which may be greater than len, if output was truncated).

int parse_int_array_user(const char __user *from, size_t count, int **array)

Split string into a sequence of integers

Parameters

const char __user *from

The user space buffer to read from

size_t count

The maximum number of bytes to read

int **array

Returned pointer to sequence of integers

Description

On success array is allocated and initialized with a sequence of integers extracted from the from plus an additional element that begins the sequence and specifies the integers count.

Caller takes responsibility for freeing array when it is no longer needed.

int string_unescape(char *src, char *dst, size_t size, unsigned int flags)

unquote characters in the given string

Parameters

char *src

source buffer (escaped)

char *dst

destination buffer (unescaped)

size_t size

size of the destination buffer (0 to unlimit)

unsigned int flags

combination of the flags.

Description

The function unquotes characters in the given string.

Because the size of the output will be the same as or less than the size of the input, the transformation may be performed in place.

Caller must provide valid source and destination pointers. Be aware that the destination buffer will always be NULL-terminated. The source string must be NULL-terminated as well. The supported flags are:

UNESCAPE_SPACE:
        '\f' - form feed
        '\n' - new line
        '\r' - carriage return
        '\t' - horizontal tab
        '\v' - vertical tab

UNESCAPE_OCTAL:
        '\NNN' - byte with octal value NNN (1 to 3 digits)

UNESCAPE_HEX:
        '\xHH' - byte with hexadecimal value HH (1 to 2 digits)

UNESCAPE_SPECIAL:
        '\"' - double quote
        '\\' - backslash
        '\a' - alert (BEL)
        '\e' - escape

UNESCAPE_ANY:
        all previous together

Return

The number of characters processed into the destination buffer, excluding the trailing '\0', is returned.

int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz, unsigned int flags, const char *only)

quote characters in the given memory buffer

Parameters

const char *src

source buffer (unescaped)

size_t isz

source buffer size

char *dst

destination buffer (escaped)

size_t osz

destination buffer size

unsigned int flags

combination of the flags

const char *only

NULL-terminated string containing characters used to limit the selected escape class. If characters are included in only that would not normally be escaped by the classes selected in flags, they will be copied to dst unescaped.

Description

The process of escaping byte buffer includes several parts. They are applied in the following sequence.

  1. The character is not matched to the one from only string and thus must go as-is to the output.

  2. The character is matched to the printable and ASCII classes, if asked, and in case of match it passes through to the output.

  3. The character is matched to the printable or ASCII class, if asked, and in case of match it passes through to the output.

  4. The character is checked if it falls into the class given by flags. ESCAPE_OCTAL and ESCAPE_HEX are going last since they cover any character. Note that they actually can't go together, otherwise ESCAPE_HEX will be ignored.

Caller must provide valid source and destination pointers. Be aware that the destination buffer will not be NULL-terminated, thus the caller has to append it if needed. The supported flags are:

%ESCAPE_SPACE: (special white space, not space itself)
        '\f' - form feed
        '\n' - new line
        '\r' - carriage return
        '\t' - horizontal tab
        '\v' - vertical tab

%ESCAPE_SPECIAL:
        '\"' - double quote
        '\\' - backslash
        '\a' - alert (BEL)
        '\e' - escape

%ESCAPE_NULL:
        '\0' - null

%ESCAPE_OCTAL:
        '\NNN' - byte with octal value NNN (3 digits)

%ESCAPE_ANY:
        all previous together

%ESCAPE_NP:
        escape only non-printable characters, checked by isprint()

%ESCAPE_ANY_NP:
        all previous together

%ESCAPE_HEX:
        '\xHH' - byte with hexadecimal value HH (2 digits)

%ESCAPE_NA:
        escape only non-ascii characters, checked by isascii()

%ESCAPE_NAP:
        escape only non-printable or non-ascii characters

%ESCAPE_APPEND:
        append characters from @only to be escaped by the given classes

ESCAPE_APPEND would help to pass additional characters to the escaped, when one of ESCAPE_NP, ESCAPE_NA, or ESCAPE_NAP is provided.

One notable caveat, the ESCAPE_NAP, ESCAPE_NP and ESCAPE_NA have the higher priority than the rest of the flags (ESCAPE_NAP is the highest). It doesn't make much sense to use either of them without ESCAPE_OCTAL or ESCAPE_HEX, because they cover most of the other character classes. ESCAPE_NAP can utilize ESCAPE_SPACE or ESCAPE_SPECIAL in addition to the above.

Return

The total size of the escaped output that would be generated for the given input and flags. To check whether the output was truncated, compare the return value to osz. There is room left in dst for a '\0' terminator if and only if ret < osz.

char **kasprintf_strarray(gfp_t gfp, const char *prefix, size_t n)

allocate and fill array of sequential strings

Parameters

gfp_t gfp

flags for the slab allocator

const char *prefix

prefix to be used

size_t n

amount of lines to be allocated and filled

Description

Allocates and fills n strings using the pattern "%s-%zu", where the prefix is provided by the caller. The caller is responsible for freeing them with kfree_strarray() after use.

Returns array of strings or NULL when memory can't be allocated.

void kfree_strarray(char **array, size_t n)

free a number of dynamically allocated strings contained in an array and the array itself

Parameters

char **array

Dynamically allocated array of strings to free.

size_t n

Number of strings (starting from the beginning of the array) to free.

Description

Passing a non-NULL array and n == 0 as well as a NULL array are valid use-cases. If array is NULL, the function does nothing.

char *skip_spaces(const char *str)

Removes leading whitespace from str.

Parameters

const char *str

The string to be stripped.

Description

Returns a pointer to the first non-whitespace character in str.

char *strim(char *s)

Removes leading and trailing whitespace from s.

Parameters

char *s

The string to be stripped.

Description

Note that the first trailing whitespace is replaced with a NUL-terminator in the given string s. Returns a pointer to the first non-whitespace character in s.

bool sysfs_streq(const char *s1, const char *s2)

return true if strings are equal, modulo trailing newline

Parameters

const char *s1

one string

const char *s2

another string

Description

This routine returns true iff two strings are equal, treating both NUL and newline-then-NUL as equivalent string terminations. It's geared for use with sysfs input strings, which generally terminate with newlines but are compared against values without newlines.

int match_string(const char *const *array, size_t n, const char *string)

matches given string in an array

Parameters

const char *const *array

array of strings

size_t n

number of strings in the array or -1 for NULL terminated arrays

const char *string

string to match with

Description

This routine will look for a string in an array of strings up to the n-th element in the array or until the first NULL element.

Historically the value of -1 for n was used to search in arrays that are NULL terminated. However, the function does not make a distinction when finishing the search: either n elements have been compared OR the first NULL element was found.

Return

index of a string in the array if matches, or -EINVAL otherwise.
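
A sketch of mapping a configuration string to an index (the array contents are hypothetical; ARRAY_SIZE() comes from <linux/kernel.h>):

    static const char * const modes[] = { "off", "slow", "fast" };
    int idx;

    idx = match_string(modes, ARRAY_SIZE(modes), "fast");
    if (idx < 0)
            return idx;     /* -EINVAL: not found */
    /* idx == 2 here */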

int __sysfs_match_string(const char *const *array, size_t n, const char *str)

matches given string in an array

Parameters

const char *const *array

array of strings

size_t n

number of strings in the array or -1 for NULL terminated arrays

const char *str

string to match with

Description

Returns index of str in the array or -EINVAL, just like match_string(). Uses sysfs_streq instead of strcmp for matching.

This routine will look for a string in an array of strings up to the n-th element in the array or until the first NULL element.

Historically the value of -1 for n was used to search in arrays that are NULL terminated. However, the function does not make a distinction when finishing the search: either n elements have been compared OR the first NULL element was found.

char *strreplace(char *str, char old, char new)

Replace all occurrences of character in string.

Parameters

char *str

The string to operate on.

char old

The character being replaced.

char new

The character old is replaced with.

Description

Replaces each old character with a new one in the given string str.

Return

pointer to the string str itself.

void memcpy_and_pad(void *dest, size_t dest_len, const void *src, size_t count, int pad)

Copy one buffer to another with padding

Parameters

void *dest

Where to copy to

size_t dest_len

The destination buffer size

const void *src

Where to copy from

size_t count

The number of bytes to copy

int pad

Character to use for padding if space is left in destination.

String Manipulation

unsafe_memcpy

unsafe_memcpy(dst, src, bytes, justification)

memcpy implementation with no FORTIFY bounds checking

Parameters

dst

Destination memory address to write to

src

Source memory address to read from

bytes

How many bytes to write to dst from src

justification

Free-form text or comment describing why the use is needed

Description

This should be used for corner cases where the compiler cannot do the right thing, or during transitions between APIs, etc. It should be used very rarely, and includes a place for justification detailing where bounds checking has happened, and why existing solutions cannot be employed.

char *strncpy(char *const p, const char *q, __kernel_size_t size)

Copy a string to memory with non-guaranteed NUL padding

Parameters

char *const p

pointer to destination of copy

const char *q

pointer to NUL-terminated source string to copy

__kernel_size_t size

bytes to write at p

Description

If strlen(q) >= size, the copy of q will stop after size bytes, and p will NOT be NUL-terminated

If strlen(q) < size, following the copy of q, trailing NUL bytes will be written to p until size total bytes have been written.

Do not use this function. While FORTIFY_SOURCE tries to avoid over-reads of q, it cannot defend against writing unterminated results to p. Using strncpy() remains ambiguous and fragile. Instead, please choose an alternative, so that the expectation of p's contents is unambiguous:

p needs to be:          padded to size      not padded
NUL-terminated          strscpy_pad()       strscpy()
not NUL-terminated      strtomem_pad()      strtomem()

Note strscpy*()'s differing return values for detecting truncation, and strtomem*()'s expectation that the destination is marked with __nonstring when it is a character array.

__kernel_size_t strnlen(const char *const p, __kernel_size_t maxlen)

Return bounded count of characters in a NUL-terminated string

Parameters

const char *const p

pointer to NUL-terminated string to count.

__kernel_size_t maxlen

maximum number of characters to count.

Description

Returns number of characters in p (NOT including the final NUL), or maxlen, if no NUL has been found up to there.

strlen

strlen(p)

Return count of characters in a NUL-terminated string

Parameters

p

pointer to NUL-terminated string to count.

Description

Do not use this function unless the string length is known at compile-time. When p is unterminated, this function may crash or return unexpected counts that could lead to memory content exposures. Prefer strnlen().

Returns number of characters in p (NOT including the final NUL).

size_t strlcat(char *const p, const char *const q, size_t avail)

Append a string to an existing string

Parameters

char *const p

pointer to NUL-terminated string to append to

const char *const q

pointer to NUL-terminated string to append from

size_t avail

Maximum bytes available in p

Description

Appends NUL-terminated string q after the NUL-terminated string at p, but will not write beyond avail bytes total, potentially truncating the copy from q. p will stay NUL-terminated only if a NUL already existed within the avail bytes of p. If so, the resulting number of bytes copied from q will be at most "avail - strlen(p) - 1".

Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the sizes of p and q are known to the compiler. Prefer building the string with formatting, via scnprintf(), seq_buf, or similar.

Returns total bytes that _would_ have been contained by p regardless of truncation, similar to snprintf(). If return value is >= avail, the string has been truncated.

char *strcat(char *const p, const char *q)

Append a string to an existing string

Parameters

char *const p

pointer to NUL-terminated string to append to

const char *q

pointer to NUL-terminated source string to append from

Description

Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the destination buffer size is known to the compiler. Prefer building the string with formatting, via scnprintf() or similar. At the very least, use strncat().

Returns p.

char *strncat(char *const p, const char *const q, __kernel_size_t count)

Append a string to an existing string

Parameters

char *const p

pointer to NUL-terminated string to append to

const char *const q

pointer to source string to append from

__kernel_size_t count

Maximum bytes to read from q

Description

Appends at most count bytes from q (stopping at the first NUL byte) after the NUL-terminated string at p. p will be NUL-terminated.

Do not use this function. While FORTIFY_SOURCE tries to avoid read and write overflows, this is only possible when the sizes of p and q are known to the compiler. Prefer building the string with formatting, via scnprintf() or similar.

Returns p.

char *strcpy(char *const p, const char *const q)

Copy a string into another string buffer

Parameters

char *const p

pointer to destination of copy

const char *const q

pointer to NUL-terminated source string to copy

Description

Do not use this function. While FORTIFY_SOURCE tries to avoid overflows, this is only possible when the sizes of q and p are known to the compiler. Prefer strscpy(), though note its different return values for detecting truncation.

Returns p.

int strncasecmp(const char *s1, const char *s2, size_t len)

Case insensitive, length-limited string comparison

Parameters

const char *s1

One string

const char *s2

The other string

size_t len

the maximum number of characters to compare

char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)

copy a string from src to dest returning a pointer to the new end of dest, including src's NUL-terminator. May overrun dest.

Parameters

char *__restrict__ dest

pointer to end of string being copied into. Must be large enough to receive copy.

const char *__restrict__ src

pointer to the beginning of string being copied from. Must not overlap dest.

Description

stpcpy differs from strcpy in a key way: the return value is a pointer to the new NUL-terminating character in dest. (For strcpy, the return value is a pointer to the start of dest). This interface is considered unsafe as it doesn't perform bounds checking of the inputs. As such it's not recommended for usage. Instead, its definition is provided in case the compiler lowers other libcalls to stpcpy.

int strcmp(const char *cs, const char *ct)

Compare two strings

Parameters

const char *cs

One string

const char *ct

Another string

int strncmp(const char *cs, const char *ct, size_t count)

Compare two length-limited strings

Parameters

const char *cs

One string

const char *ct

Another string

size_t count

The maximum number of bytes to compare

char *strchr(const char *s, int c)

Find the first occurrence of a character in a string

Parameters

const char *s

The string to be searched

int c

The character to search for

Description

Note that the NUL-terminator is considered part of the string, and can be searched for.

char *strchrnul(const char *s, int c)

Find and return a character in a string, or end of string

Parameters

const char *s

The string to be searched

int c

The character to search for

Description

Returns pointer to first occurrence of 'c' in s. If c is not found, then return a pointer to the null byte at the end of s.

char *strrchr(const char *s, int c)

Find the last occurrence of a character in a string

Parameters

const char *s

The string to be searched

int c

The character to search for

char *strnchr(const char *s, size_t count, int c)

Find a character in a length limited string

Parameters

const char *s

The string to be searched

size_t count

The number of characters to be searched

int c

The character to search for

Description

Note that the NUL-terminator is considered part of the string, and can be searched for.

size_t strspn(const char *s, const char *accept)

Calculate the length of the initial substring of s which only contains letters in accept

Parameters

const char *s

The string to be searched

const char *accept

The string to search for

size_t strcspn(const char *s, const char *reject)

Calculate the length of the initial substring of s which does not contain letters in reject

Parameters

const char *s

The string to be searched

const char *reject

The string to avoid

char *strpbrk(const char *cs, const char *ct)

Find the first occurrence of a set of characters

Parameters

const char *cs

The string to be searched

const char *ct

The characters to search for

char *strsep(char **s, const char *ct)

Split a string into tokens

Parameters

char **s

The string to be searched

const char *ct

The characters to search for

Description

strsep() updates s to point after the token, ready for the next call.

It returns empty tokens, too, behaving exactly like the libc function of that name. In fact, it was stolen from glibc2 and de-fancy-fied. Same semantics, slimmer shape. ;)
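
strsep() consumes the string it parses, so it is normally used on a writable copy. A sketch (the input string is hypothetical):

    char buf[] = "eth0,eth1,,eth2";
    char *s = buf, *tok;

    while ((tok = strsep(&s, ",")) != NULL)
            pr_info("token: '%s'\n", tok);  /* the empty token between ",," is returned too */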

void *memset(void *s, int c, size_t count)

Fill a region of memory with the given value

Parameters

void *s

Pointer to the start of the area.

int c

The byte to fill the area with

size_t count

The size of the area.

Description

Do not use memset() to access IO space, use memset_io() instead.

void *memset16(uint16_t *s, uint16_t v, size_t count)

Fill a memory area with a uint16_t

Parameters

uint16_t *s

Pointer to the start of the area.

uint16_t v

The value to fill the area with

size_t count

The number of values to store

Description

Differs from memset() in that it fills with a uint16_t instead of a byte. Remember that count is the number of uint16_ts to store, not the number of bytes.

void *memset32(uint32_t *s, uint32_t v, size_t count)

Fill a memory area with a uint32_t

Parameters

uint32_t *s

Pointer to the start of the area.

uint32_t v

The value to fill the area with

size_t count

The number of values to store

Description

Differs from memset() in that it fills with a uint32_t instead of a byte. Remember that count is the number of uint32_ts to store, not the number of bytes.

void *memset64(uint64_t *s, uint64_t v, size_t count)

Fill a memory area with a uint64_t

Parameters

uint64_t *s

Pointer to the start of the area.

uint64_t v

The value to fill the area with

size_t count

The number of values to store

Description

Differs from memset() in that it fills with a uint64_t instead of a byte. Remember that count is the number of uint64_ts to store, not the number of bytes.

void *memcpy(void *dest, const void *src, size_t count)

Copy one area of memory to another

Parameters

void *dest

Where to copy to

const void *src

Where to copy from

size_t count

The size of the area.

Description

You should not use this function to access IO space, use memcpy_toio() or memcpy_fromio() instead.

void *memmove(void *dest, const void *src, size_t count)

Copy one area of memory to another

Parameters

void *dest

Where to copy to

const void *src

Where to copy from

size_t count

The size of the area.

Description

Unlike memcpy(), memmove() copes with overlapping areas.

__visible int memcmp(const void *cs, const void *ct, size_t count)

Compare two areas of memory

Parameters

const void *cs

One area of memory

const void *ct

Another area of memory

size_t count

The size of the area.

int bcmp(const void *a, const void *b, size_t len)

returns 0 if and only if the buffers have identical contents.

Parameters

const void *a

pointer to first buffer.

const void *b

pointer to second buffer.

size_t len

size of buffers.

Description

The sign or magnitude of a non-zero return value has no particular meaning, and architectures may implement their own more efficient bcmp(). So while this particular implementation is a simple (tail) call to memcmp, do not rely on anything but whether the return value is zero or non-zero.

void *memscan(void *addr, int c, size_t size)

Find a character in an area of memory.

Parameters

void *addr

The memory area

int c

The byte to search for

size_t size

The size of the area.

Description

returns the address of the first occurrence of c, or 1 byte past the area if c is not found

char *strstr(const char *s1, const char *s2)

Find the first substring in a NUL terminated string

Parameters

const char *s1

The string to be searched

const char *s2

The string to search for

char *strnstr(const char *s1, const char *s2, size_t len)

Find the first substring in a length-limited string

Parameters

const char *s1

The string to be searched

const char *s2

The string to search for

size_t len

the maximum number of characters to search

void *memchr(const void *s, int c, size_t n)

Find a character in an area of memory.

Parameters

const void *s

The memory area

int c

The byte to search for

size_t n

The size of the area.

Description

returns the address of the first occurrence of c, or NULL if c is not found

void *memchr_inv(const void *start, int c, size_t bytes)

Find an unmatching character in an area of memory.

Parameters

const void *start

The memory area

int c

Find a character other than c

size_t bytes

The size of the area.

Description

returns the address of the first character other than c, or NULL if the whole buffer contains just c.
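
memchr_inv() is handy for verifying that a region (for example, reserved or padding bytes) holds only one value. A sketch (hdr and its reserved field are hypothetical):

    if (memchr_inv(hdr->reserved, 0, sizeof(hdr->reserved)))
            return -EINVAL;         /* at least one reserved byte was non-zero */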

void *memdup_array_user(const void __user *src, size_t n, size_t size)

duplicate array from user space

Parameters

const void __user *src

source address in user space

size_t n

number of array members to copy

size_t size

size of one array member

Return

an ERR_PTR() on failure. Result is physically contiguous, to be freed by kfree().

void *vmemdup_array_user(const void __user *src, size_t n, size_t size)

duplicate array from user space

Parameters

const void __user *src

source address in user space

size_t n

number of array members to copy

size_t size

size of one array member

Return

an ERR_PTR() on failure. Result may be not physically contiguous. Use kvfree() to free.

strscpy

strscpy(dst,src,...)

Copy a C-string into a sized buffer

Parameters

dst

Where to copy the string to

src

Where to copy the string from

...

Size of destination buffer (optional)

Description

Copy the source stringsrc, or as much of it as fits, into thedestinationdst buffer. The behavior is undefined if the stringbuffers overlap. The destinationdst buffer is always NUL terminated,unless it’s zero-sized.

The size argument... is only required whendst is not an array, orwhen the copy needs to be smaller than sizeof(dst).

Preferred tostrncpy() since it always returns a valid string, anddoesn’t unnecessarily force the tail of the destination buffer to bezero padded. If padding is desired please usestrscpy_pad().

Returns the number of characters copied indst (not including thetrailingNUL) or -E2BIG ifsize is 0 or the copy fromsrc wastruncated.
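For illustration, a minimal sketch of typical strscpy() use; the buffer name, its size and the error handling are only for this example:

#include <linux/string.h>

static int example_set_label(const char *src)
{
        char label[16];         /* illustrative destination buffer */
        ssize_t len;

        /* dst is an array here, so the size argument can be omitted. */
        len = strscpy(label, src);
        if (len < 0)
                return len;     /* -E2BIG: src did not fit and was truncated */

        /* label is now NUL-terminated; len excludes the trailing NUL. */
        return 0;
}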

strscpy_pad

strscpy_pad(dst,src,...)

Copy a C-string into a sized buffer

Parameters

dst

Where to copy the string to

src

Where to copy the string from

...

Size of destination buffer

Description

Copy the string, or as much of it as fits, into the dest buffer. Thebehavior is undefined if the string buffers overlap. The destinationbuffer is alwaysNUL terminated, unless it’s zero-sized.

If the source string is shorter than the destination buffer, theremaining bytes in the buffer will be filled withNUL bytes.

For full explanation of why you may want to consider using the‘strscpy’ functions please see the function docstring forstrscpy().

Return

  • The number of characters copied (not including the trailingNULs)

  • -E2BIG if count is 0 orsrc was truncated.

boolmem_is_zero(constvoid*s,size_tn)

Check if an area of memory is all 0’s.

Parameters

constvoid*s

The memory area

size_tn

The size of the area

Return

True if the area of memory is all 0’s.

sysfs_match_string

sysfs_match_string(_a,_s)

matches given string in an array

Parameters

_a

array of strings

_s

string to match with

Description

Helper for__sysfs_match_string(). Calculates the size ofa automatically.

boolstrstarts(constchar*str,constchar*prefix)

doesstr start withprefix?

Parameters

constchar*str

string to examine

constchar*prefix

prefix to look for.

voidmemzero_explicit(void*s,size_tcount)

Fill a region of memory (e.g. sensitive keying data) with 0s.

Parameters

void*s

Pointer to the start of the area.

size_tcount

The size of the area.

Note

usually usingmemset() is just fine (!), but in caseswhere clearing out _local_ data at the end of a scope isnecessary,memzero_explicit() should be used instead inorder to prevent the compiler from optimising away zeroing.

Description

memzero_explicit() doesn’t need an arch-specific version asit just invokes the one ofmemset() implicitly.

constchar*kbasename(constchar*path)

return the last part of a pathname.

Parameters

constchar*path

path to extract the filename from.

strtomem_pad

strtomem_pad(dest,src,pad)

Copy NUL-terminated string to non-NUL-terminated buffer

Parameters

dest

Pointer of destination character array (marked as __nonstring)

src

Pointer to NUL-terminated string

pad

Padding character to fill any remaining bytes ofdest after copy

Description

This is a replacement forstrncpy() uses where the destination is nota NUL-terminated string, but with bounds checking on the source size, andan explicit padding character. If padding is not required, usestrtomem().

Note that the size ofdest is not an argument, as the length ofdestmust be discoverable by the compiler.

strtomem

strtomem(dest,src)

Copy NUL-terminated string to non-NUL-terminated buffer

Parameters

dest

Pointer of destination character array (marked as __nonstring)

src

Pointer to NUL-terminated string

Description

This is a replacement forstrncpy() uses where the destination is nota NUL-terminated string, but with bounds checking on the source size, andwithout trailing padding. If padding is required, usestrtomem_pad().

Note that the size ofdest is not an argument, as the length ofdestmust be discoverable by the compiler.

memtostr

memtostr(dest,src)

Copy a possibly non-NUL-term string to a NUL-term string

Parameters

dest

Pointer to destination NUL-terminated string

src

Pointer to character array (likely marked as __nonstring)

Description

This is a replacement forstrncpy() uses where the source is nota NUL-terminated string.

Note that sizes ofdest andsrc must be known at compile-time.

memtostr_pad

memtostr_pad(dest,src)

Copy a possibly non-NUL-term string to a NUL-term string with NUL padding in the destination

Parameters

dest

Pointer to destination NUL-terminated string

src

Pointer to character array (likely marked as __nonstring)

Description

This is a replacement forstrncpy() uses where the source is nota NUL-terminated string.

Note that sizes ofdest andsrc must be known at compile-time.

memset_after

memset_after(obj,v,member)

Set a value after a struct member to the end of a struct

Parameters

obj

Address of target struct instance

v

Byte value to repeatedly write

member

after which struct member to start writing bytes

Description

This is good for clearing padding following the given member.
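As a hedged sketch of the intended use, the purely hypothetical structure below is cleared from the byte after its status member to its end, wiping implicit padding as well:

#include <linux/string.h>
#include <linux/types.h>

/* Hypothetical structure, used only to illustrate memset_after(). */
struct example_reply {
        u32 id;
        u8  status;
        /* implicit padding and the members below are cleared */
        u64 payload;
        u32 crc;
};

static void example_init_reply(struct example_reply *r, u32 id, u8 status)
{
        r->id = id;
        r->status = status;
        /* Zero everything following 'status', including implicit padding. */
        memset_after(r, 0, status);
}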

memset_startat

memset_startat(obj,v,member)

Set a value starting at a member to the end of a struct

Parameters

obj

Address of target struct instance

v

Byte value to repeatedly write

member

struct member to start writing at

Description

Note that if there is padding between the prior member and the targetmember,memset_after() should be used to clear the prior padding.

size_tstr_has_prefix(constchar*str,constchar*prefix)

Test if a string has a given prefix

Parameters

constchar*str

The string to test

constchar*prefix

The string to see ifstr starts with

Description

A common way to test a prefix of a string is to do:

strncmp(str, prefix, sizeof(prefix) - 1)

But this can lead to bugs due to typos, or if prefix is a pointerand not a constant. Instead usestr_has_prefix().

Return

  • strlen(prefix) ifstr starts withprefix

  • 0 ifstr does not start withprefix
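A short sketch of how the returned length is typically used to skip past the prefix; the "cmd:" prefix and the function name are illustrative only:

#include <linux/errno.h>
#include <linux/printk.h>
#include <linux/string.h>

static int example_parse(const char *str)
{
        static const char prefix[] = "cmd:";    /* illustrative prefix */
        size_t len;

        len = str_has_prefix(str, prefix);
        if (!len)
                return -EINVAL;         /* str does not start with "cmd:" */

        /* len == strlen(prefix), so str + len points past the prefix. */
        pr_info("argument: %s\n", str + len);
        return 0;
}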

char*kstrdup(constchar*s,gfp_tgfp)

allocate space for and copy an existing string

Parameters

constchar*s

the string to duplicate

gfp_tgfp

the GFP mask used in thekmalloc() call when allocating memory

Return

newly allocated copy ofs orNULL in case of error

constchar*kstrdup_const(constchar*s,gfp_tgfp)

conditionally duplicate an existing const string

Parameters

constchar*s

the string to duplicate

gfp_tgfp

the GFP mask used in thekmalloc() call when allocating memory

Note

Strings allocated by kstrdup_const should be freed by kfree_const andmust not be passed to krealloc().

Return

source string if it is in .rodata section otherwisefallback to kstrdup.

char*kstrndup(constchar*s,size_tmax,gfp_tgfp)

allocate space for and copy an existing string

Parameters

constchar*s

the string to duplicate

size_tmax

read at mostmax chars froms

gfp_tgfp

the GFP mask used in thekmalloc() call when allocating memory

Note

Usekmemdup_nul() instead if the size is known exactly.

Return

newly allocated copy ofs orNULL in case of error

void*kmemdup(constvoid*src,size_tlen,gfp_tgfp)

duplicate region of memory

Parameters

constvoid*src

memory region to duplicate

size_tlen

memory region length

gfp_tgfp

GFP mask to use

Return

newly allocated copy ofsrc orNULL in case of error,result is physically contiguous. Usekfree() to free.

char*kmemdup_nul(constchar*s,size_tlen,gfp_tgfp)

Create a NUL-terminated string from unterminated data

Parameters

constchar*s

The data to stringify

size_tlen

The size of the data

gfp_tgfp

the GFP mask used in thekmalloc() call when allocating memory

Return

newly allocated copy ofs with NUL-termination orNULL incase of error

void*memdup_user(constvoid__user*src,size_tlen)

duplicate memory region from user space

Parameters

constvoid__user*src

source address in user space

size_tlen

number of bytes to copy

Return

anERR_PTR() on failure. Result is physicallycontiguous, to be freed bykfree().
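A sketch of the common memdup_user() pattern in an ioctl-style handler; the function name and the processing step are assumptions of the example:

#include <linux/err.h>
#include <linux/slab.h>
#include <linux/string.h>

static int example_handle(const void __user *ubuf, size_t len)
{
        void *buf;

        buf = memdup_user(ubuf, len);
        if (IS_ERR(buf))
                return PTR_ERR(buf);    /* e.g. -EFAULT or -ENOMEM */

        /* ... operate on the kernel-space copy in buf ... */

        kfree(buf);
        return 0;
}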

void*vmemdup_user(constvoid__user*src,size_tlen)

duplicate memory region from user space

Parameters

constvoid__user*src

source address in user space

size_tlen

number of bytes to copy

Return

anERR_PTR() on failure. Result may be notphysically contiguous. Usekvfree() to free.

char*strndup_user(constchar__user*s,longn)

duplicate an existing string from user space

Parameters

constchar__user*s

The string to duplicate

longn

Maximum number of bytes to copy, including the trailing NUL.

Return

newly allocated copy ofs or anERR_PTR() in case of error

void*memdup_user_nul(constvoid__user*src,size_tlen)

duplicate memory region from user space and NUL-terminate

Parameters

constvoid__user*src

source address in user space

size_tlen

number of bytes to copy

Return

anERR_PTR() on failure.

Basic Kernel Library Functions

The Linux kernel provides more basic utility functions.

Bit Operations

voidset_bit(longnr,volatileunsignedlong*addr)

Atomically set a bit in memory

Parameters

longnr

the bit to set

volatileunsignedlong*addr

the address to start counting from

Description

This is a relaxed atomic operation (no implied memory barriers).

Note thatnr may be almost arbitrarily large; this function is notrestricted to acting on a single-word quantity.

voidclear_bit(longnr,volatileunsignedlong*addr)

Clears a bit in memory

Parameters

longnr

Bit to clear

volatileunsignedlong*addr

Address to start counting from

Description

This is a relaxed atomic operation (no implied memory barriers).

voidchange_bit(longnr,volatileunsignedlong*addr)

Toggle a bit in memory

Parameters

longnr

Bit to change

volatileunsignedlong*addr

Address to start counting from

Description

This is a relaxed atomic operation (no implied memory barriers).

Note thatnr may be almost arbitrarily large; this function is notrestricted to acting on a single-word quantity.

booltest_and_set_bit(longnr,volatileunsignedlong*addr)

Set a bit and return its old value

Parameters

longnr

Bit to set

volatileunsignedlong*addr

Address to count from

Description

This is an atomic fully-ordered operation (implied full memory barrier).

booltest_and_clear_bit(longnr,volatileunsignedlong*addr)

Clear a bit and return its old value

Parameters

longnr

Bit to clear

volatileunsignedlong*addr

Address to count from

Description

This is an atomic fully-ordered operation (implied full memory barrier).

booltest_and_change_bit(longnr,volatileunsignedlong*addr)

Change a bit and return its old value

Parameters

longnr

Bit to change

volatileunsignedlong*addr

Address to count from

Description

This is an atomic fully-ordered operation (implied full memory barrier).

void___set_bit(unsignedlongnr,volatileunsignedlong*addr)

Set a bit in memory

Parameters

unsignedlongnr

the bit to set

volatileunsignedlong*addr

the address to start counting from

Description

Unlikeset_bit(), this function is non-atomic. If it is called on the sameregion of memory concurrently, the effect may be that only one operationsucceeds.

void___clear_bit(unsignedlongnr,volatileunsignedlong*addr)

Clears a bit in memory

Parameters

unsignedlongnr

the bit to clear

volatileunsignedlong*addr

the address to start counting from

Description

Unlikeclear_bit(), this function is non-atomic. If it is called on the sameregion of memory concurrently, the effect may be that only one operationsucceeds.

void___change_bit(unsignedlongnr,volatileunsignedlong*addr)

Toggle a bit in memory

Parameters

unsignedlongnr

the bit to change

volatileunsignedlong*addr

the address to start counting from

Description

Unlikechange_bit(), this function is non-atomic. If it is called on the sameregion of memory concurrently, the effect may be that only one operationsucceeds.

bool___test_and_set_bit(unsignedlongnr,volatileunsignedlong*addr)

Set a bit and return its old value

Parameters

unsignedlongnr

Bit to set

volatileunsignedlong*addr

Address to count from

Description

This operation is non-atomic. If two instances of this operation race, onecan appear to succeed but actually fail.

bool___test_and_clear_bit(unsignedlongnr,volatileunsignedlong*addr)

Clear a bit and return its old value

Parameters

unsignedlongnr

Bit to clear

volatileunsignedlong*addr

Address to count from

Description

This operation is non-atomic. If two instances of this operation race, onecan appear to succeed but actually fail.

bool___test_and_change_bit(unsignedlongnr,volatileunsignedlong*addr)

Change a bit and return its old value

Parameters

unsignedlongnr

Bit to change

volatileunsignedlong*addr

Address to count from

Description

This operation is non-atomic. If two instances of this operation race, onecan appear to succeed but actually fail.

bool_test_bit(unsignedlongnr,volatileconstunsignedlong*addr)

Determine whether a bit is set

Parameters

unsignedlongnr

bit number to test

constvolatileunsignedlong*addr

Address to start counting from

bool_test_bit_acquire(unsignedlongnr,volatileconstunsignedlong*addr)

Determine, with acquire semantics, whether a bit is set

Parameters

unsignedlongnr

bit number to test

constvolatileunsignedlong*addr

Address to start counting from

voidclear_bit_unlock(longnr,volatileunsignedlong*addr)

Clear a bit in memory, for unlock

Parameters

longnr

the bit to set

volatileunsignedlong*addr

the address to start counting from

Description

This operation is atomic and provides release barrier semantics.

void__clear_bit_unlock(longnr,volatileunsignedlong*addr)

Clears a bit in memory

Parameters

longnr

Bit to clear

volatileunsignedlong*addr

Address to start counting from

Description

This is a non-atomic operation but implies a release barrier before thememory operation. It can be used for an unlock if no other CPUs canconcurrently modify other bits in the word.

booltest_and_set_bit_lock(longnr,volatileunsignedlong*addr)

Set a bit and return its old value, for lock

Parameters

longnr

Bit to set

volatileunsignedlong*addr

Address to count from

Description

This operation is atomic and provides acquire barrier semantics ifthe returned value is 0.It can be used to implement bit locks.
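A minimal sketch of a bit lock built from test_and_set_bit_lock() and clear_bit_unlock(); the flags word and bit number are illustrative:

#include <linux/bitops.h>

#define EXAMPLE_LOCK_BIT        0       /* illustrative bit number */

static unsigned long example_flags;     /* hypothetical flags word */

static bool example_trylock(void)
{
        /* Old value 0 means we acquired the bit; acquire semantics apply. */
        return !test_and_set_bit_lock(EXAMPLE_LOCK_BIT, &example_flags);
}

static void example_unlock(void)
{
        /* Pairs with the acquire above; provides release semantics. */
        clear_bit_unlock(EXAMPLE_LOCK_BIT, &example_flags);
}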

boolxor_unlock_is_negative_byte(unsignedlongmask,volatileunsignedlong*addr)

XOR a single byte in memory and test if it is negative, for unlock.

Parameters

unsignedlongmask

Change the bits which are set in this mask.

volatileunsignedlong*addr

The address of the word containing the byte to change.

Description

Changes some of bits 0-6 in the word pointed to byaddr.This operation is atomic and provides release barrier semantics.Used to optimise some folio operations which are commonly pairedwith an unlock or end of writeback. Bit 7 is used as PG_waiters toindicate whether anybody is waiting for the unlock.

Return

Whether the top bit of the byte is set.

Bitmap Operations

bitmaps provide an array of bits, implemented using anarray of unsigned longs. The number of valid bits in agiven bitmap does _not_ need to be an exact multiple ofBITS_PER_LONG.

The possible unused bits in the last, partially used wordof a bitmap are ‘don’t care’. The implementation makesno particular effort to keep them zero. It ensures thattheir value will not affect the results of any operation.The bitmap operations that return Boolean (bitmap_empty,for example) or scalar (bitmap_weight, for example) resultscarefully filter out these unused bits from impacting theirresults.

The byte ordering of bitmaps is more natural on littleendian architectures. See the big-endian headersinclude/asm-ppc64/bitops.h and include/asm-s390/bitops.hfor the best explanations of this ordering.

The DECLARE_BITMAP(name,bits) macro, in linux/types.h, can be usedto declare an array named ‘name’ of just enough unsigned longs tocontain all bit positions from 0 to ‘bits’ - 1.

The available bitmap operations and their rough meaning in thecase that the bitmap is a single unsigned long are thus:

The generated code is more efficient when nbits is known atcompile-time and at most BITS_PER_LONG.

bitmap_zero(dst, nbits)                     *dst = 0UL
bitmap_fill(dst, nbits)                     *dst = ~0UL
bitmap_copy(dst, src, nbits)                *dst = *src
bitmap_and(dst, src1, src2, nbits)          *dst = *src1 & *src2
bitmap_or(dst, src1, src2, nbits)           *dst = *src1 | *src2
bitmap_xor(dst, src1, src2, nbits)          *dst = *src1 ^ *src2
bitmap_andnot(dst, src1, src2, nbits)       *dst = *src1 & ~(*src2)
bitmap_complement(dst, src, nbits)          *dst = ~(*src)
bitmap_equal(src1, src2, nbits)             Are *src1 and *src2 equal?
bitmap_intersects(src1, src2, nbits)        Do *src1 and *src2 overlap?
bitmap_subset(src1, src2, nbits)            Is *src1 a subset of *src2?
bitmap_empty(src, nbits)                    Are all bits zero in *src?
bitmap_full(src, nbits)                     Are all bits set in *src?
bitmap_weight(src, nbits)                   Hamming Weight: number set bits
bitmap_weight_and(src1, src2, nbits)        Hamming Weight of and'ed bitmap
bitmap_weight_andnot(src1, src2, nbits)     Hamming Weight of andnot'ed bitmap
bitmap_set(dst, pos, nbits)                 Set specified bit area
bitmap_clear(dst, pos, nbits)               Clear specified bit area
bitmap_find_next_zero_area(buf, len, pos, n, mask)  Find bit free area
bitmap_find_next_zero_area_off(buf, len, pos, n, mask, mask_off)  as above
bitmap_shift_right(dst, src, n, nbits)      *dst = *src >> n
bitmap_shift_left(dst, src, n, nbits)       *dst = *src << n
bitmap_cut(dst, src, first, n, nbits)       Cut n bits from first, copy rest
bitmap_replace(dst, old, new, mask, nbits)  *dst = (*old & ~(*mask)) | (*new & *mask)
bitmap_scatter(dst, src, mask, nbits)       *dst = map(dense, sparse)(src)
bitmap_gather(dst, src, mask, nbits)        *dst = map(sparse, dense)(src)
bitmap_remap(dst, src, old, new, nbits)     *dst = map(old, new)(src)
bitmap_bitremap(oldbit, old, new, nbits)    newbit = map(old, new)(oldbit)
bitmap_onto(dst, orig, relmap, nbits)       *dst = orig relative to relmap
bitmap_fold(dst, orig, sz, nbits)           dst bits = orig bits mod sz
bitmap_parse(buf, buflen, dst, nbits)       Parse bitmap dst from kernel buf
bitmap_parse_user(ubuf, ulen, dst, nbits)   Parse bitmap dst from user buf
bitmap_parselist(buf, dst, nbits)           Parse bitmap dst from kernel buf
bitmap_parselist_user(buf, dst, nbits)      Parse bitmap dst from user buf
bitmap_find_free_region(bitmap, bits, order)  Find and allocate bit region
bitmap_release_region(bitmap, pos, order)   Free specified bit region
bitmap_allocate_region(bitmap, pos, order)  Allocate specified bit region
bitmap_from_arr32(dst, buf, nbits)          Copy nbits from u32[] buf to dst
bitmap_from_arr64(dst, buf, nbits)          Copy nbits from u64[] buf to dst
bitmap_to_arr32(buf, src, nbits)            Copy nbits from buf to u32[] dst
bitmap_to_arr64(buf, src, nbits)            Copy nbits from buf to u64[] dst
bitmap_get_value8(map, start)               Get 8bit value from map at start
bitmap_set_value8(map, value, start)        Set 8bit value to map at start
bitmap_read(map, start, nbits)              Read an nbits-sized value from map at start
bitmap_write(map, value, start, nbits)      Write an nbits-sized value to map at start

Note, bitmap_zero() and bitmap_fill() operate over the region of unsigned longs, that is, bits beyond nbits up to the unsigned long boundary will be zeroed or filled as well. Consider using bitmap_clear() or bitmap_set() for explicit zeroing or filling, respectively.

Also the following operations in asm/bitops.h apply to bitmaps:

set_bit(bit, addr)                  *addr |= bit
clear_bit(bit, addr)                *addr &= ~bit
change_bit(bit, addr)               *addr ^= bit
test_bit(bit, addr)                 Is bit set in *addr?
test_and_set_bit(bit, addr)         Set bit and return old value
test_and_clear_bit(bit, addr)       Clear bit and return old value
test_and_change_bit(bit, addr)      Change bit and return old value
find_first_zero_bit(addr, nbits)    Position first zero bit in *addr
find_first_bit(addr, nbits)         Position first set bit in *addr
find_next_zero_bit(addr, nbits, bit)
                                    Position next zero bit in *addr >= bit
find_next_bit(addr, nbits, bit)     Position next set bit in *addr >= bit
find_next_and_bit(addr1, addr2, nbits, bit)
                                    Same as find_next_bit, but in
                                    (*addr1 & *addr2)
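To make the above concrete, a sketch of declaring a bitmap and claiming a free bit in it; the size, the names and the lack of locking are assumptions of the example:

#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/errno.h>

#define EXAMPLE_NR_IDS  100     /* illustrative size */

static DECLARE_BITMAP(example_ids, EXAMPLE_NR_IDS);

static int example_alloc_id(void)
{
        unsigned int id;

        /* Find the first clear bit and claim it (no locking shown here). */
        id = find_first_zero_bit(example_ids, EXAMPLE_NR_IDS);
        if (id >= EXAMPLE_NR_IDS)
                return -ENOSPC;

        __set_bit(id, example_ids);
        return id;
}
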
void__bitmap_shift_right(unsignedlong*dst,constunsignedlong*src,unsignedshift,unsignednbits)

logical right shift of the bits in a bitmap

Parameters

unsignedlong*dst

destination bitmap

constunsignedlong*src

source bitmap

unsignedshift

shift by this many bits

unsignednbits

bitmap size, in bits

Description

Shifting right (dividing) means moving bits in the MS -> LS bitdirection. Zeros are fed into the vacated MS positions and theLS bits shifted off the bottom are lost.

void__bitmap_shift_left(unsignedlong*dst,constunsignedlong*src,unsignedintshift,unsignedintnbits)

logical left shift of the bits in a bitmap

Parameters

unsignedlong*dst

destination bitmap

constunsignedlong*src

source bitmap

unsignedintshift

shift by this many bits

unsignedintnbits

bitmap size, in bits

Description

Shifting left (multiplying) means moving bits in the LS -> MSdirection. Zeros are fed into the vacated LS bit positionsand those MS bits shifted off the top are lost.

voidbitmap_cut(unsignedlong*dst,constunsignedlong*src,unsignedintfirst,unsignedintcut,unsignedintnbits)

remove bit region from bitmap and right shift remaining bits

Parameters

unsignedlong*dst

destination bitmap, might overlap with src

constunsignedlong*src

source bitmap

unsignedintfirst

start bit of region to be removed

unsignedintcut

number of bits to remove

unsignedintnbits

bitmap size, in bits

Description

Set the n-th bit ofdst iff the n-th bit ofsrc is set andn is less thanfirst, or the m-th bit ofsrc is set for anym such thatfirst <= n < nbits, and m = n +cut.

In pictures, example for a big-endian 32-bit architecture:

Thesrc bitmap is:

31                                   63
|                                    |
10000000 11000001 11110010 00010101  10000000 11000001 01110010 00010101
                |  |              |                                    |
               16  14             0                                   32

ifcut is 3, andfirst is 14, bits 14-16 insrc are cut anddst is:

31                                   63
|                                    |
10110000 00011000 00110010 00010101  00010000 00011000 00101110 01000010
                   |              |                                    |
                  14 (bit 17      0                                   32
                   from @src)

Note thatdst andsrc might overlap partially or entirely.

This is implemented in the obvious way, with a shift and carrystep for each moved bit. Optimisation is left as an exercisefor the compiler.

unsignedlongbitmap_find_next_zero_area_off(unsignedlong*map,unsignedlongsize,unsignedlongstart,unsignedintnr,unsignedlongalign_mask,unsignedlongalign_offset)

find a contiguous aligned zero area

Parameters

unsignedlong*map

The address to base the search on

unsignedlongsize

The bitmap size in bits

unsignedlongstart

The bitnumber to start searching at

unsignedintnr

The number of zeroed bits we’re looking for

unsignedlongalign_mask

Alignment mask for zero area

unsignedlongalign_offset

Alignment offset for zero area.

Description

The align_mask should be one less than a power of 2; the effect is that the bit offset of all zero areas this function finds plus align_offset is a multiple of that power of 2.

voidbitmap_remap(unsignedlong*dst,constunsignedlong*src,constunsignedlong*old,constunsignedlong*new,unsignedintnbits)

Apply map defined by a pair of bitmaps to another bitmap

Parameters

unsignedlong*dst

remapped result

constunsignedlong*src

subset to be remapped

constunsignedlong*old

defines domain of map

constunsignedlong*new

defines range of map

unsignedintnbits

number of bits in each of these bitmaps

Description

Letold andnew define a mapping of bit positions, such thatwhatever position is held by the n-th set bit inold is mappedto the n-th set bit innew. In the more general case, allowingfor the possibility that the weight ‘w’ ofnew is less than theweight ofold, map the position of the n-th set bit inold tothe position of the m-th set bit innew, where m == n % w.

If either of theold andnew bitmaps are empty, or ifsrc anddst point to the same location, then this routine copiessrctodst.

The positions of unset bits inold are mapped to themselves(the identity map).

Apply the above specified mapping tosrc, placing the result indst, clearing any bits previously set indst.

For example, lets say thatold has bits 4 through 7 set, andnew has bits 12 through 15 set. This defines the mapping of bitposition 4 to 12, 5 to 13, 6 to 14 and 7 to 15, and of all otherbit positions unchanged. So if saysrc comes into this routinewith bits 1, 5 and 7 set, thendst should leave with bits 1,13 and 15 set.

intbitmap_bitremap(intoldbit,constunsignedlong*old,constunsignedlong*new,intbits)

Apply map defined by a pair of bitmaps to a single bit

Parameters

intoldbit

bit position to be mapped

constunsignedlong*old

defines domain of map

constunsignedlong*new

defines range of map

intbits

number of bits in each of these bitmaps

Description

Letold andnew define a mapping of bit positions, such thatwhatever position is held by the n-th set bit inold is mappedto the n-th set bit innew. In the more general case, allowingfor the possibility that the weight ‘w’ ofnew is less than theweight ofold, map the position of the n-th set bit inold tothe position of the m-th set bit innew, where m == n % w.

The positions of unset bits inold are mapped to themselves(the identity map).

Apply the above specified mapping to bit positionoldbit, returningthe new bit position.

For example, lets say thatold has bits 4 through 7 set, andnew has bits 12 through 15 set. This defines the mapping of bitposition 4 to 12, 5 to 13, 6 to 14 and 7 to 15, and of all otherbit positions unchanged. So if sayoldbit is 5, then this routinereturns 13.

voidbitmap_from_arr32(unsignedlong*bitmap,constu32*buf,unsignedintnbits)

copy the contents of u32 array of bits to bitmap

Parameters

unsignedlong*bitmap

array of unsigned longs, the destination bitmap

constu32*buf

array of u32 (in host byte order), the source bitmap

unsignedintnbits

number of bits inbitmap

voidbitmap_to_arr32(u32*buf,constunsignedlong*bitmap,unsignedintnbits)

copy the contents of bitmap to a u32 array of bits

Parameters

u32*buf

array of u32 (in host byte order), the dest bitmap

constunsignedlong*bitmap

array of unsigned longs, the source bitmap

unsignedintnbits

number of bits inbitmap

voidbitmap_from_arr64(unsignedlong*bitmap,constu64*buf,unsignedintnbits)

copy the contents of u64 array of bits to bitmap

Parameters

unsignedlong*bitmap

array of unsigned longs, the destination bitmap

constu64*buf

array of u64 (in host byte order), the source bitmap

unsignedintnbits

number of bits inbitmap

voidbitmap_to_arr64(u64*buf,constunsignedlong*bitmap,unsignedintnbits)

copy the contents of bitmap to a u64 array of bits

Parameters

u64*buf

array of u64 (in host byte order), the dest bitmap

constunsignedlong*bitmap

array of unsigned longs, the source bitmap

unsignedintnbits

number of bits inbitmap

intbitmap_pos_to_ord(constunsignedlong*buf,unsignedintpos,unsignedintnbits)

find ordinal of set bit at given position in bitmap

Parameters

constunsignedlong*buf

pointer to a bitmap

unsignedintpos

a bit position inbuf (0 <=pos <nbits)

unsignedintnbits

number of valid bit positions inbuf

Description

Map the bit at positionpos inbuf (of lengthnbits) to theordinal of which set bit it is. If it is not set or ifposis not a valid bit position, map to -1.

If for example, just bits 4 through 7 are set inbuf, thenposvalues 4 through 7 will get mapped to 0 through 3, respectively,and otherpos values will get mapped to -1. Whenpos value 7gets mapped to (returns)ord value 3 in this example, that meansthat bit 7 is the 3rd (starting with 0th) set bit inbuf.

The bit positions 0 through nbits - 1 are valid positions in buf.

voidbitmap_onto(unsignedlong*dst,constunsignedlong*orig,constunsignedlong*relmap,unsignedintbits)

translate one bitmap relative to another

Parameters

unsignedlong*dst

resulting translated bitmap

constunsignedlong*orig

original untranslated bitmap

constunsignedlong*relmap

bitmap relative to which translated

unsignedintbits

number of bits in each of these bitmaps

Description

Set the n-th bit ofdst iff there exists some m such that then-th bit ofrelmap is set, the m-th bit oforig is set, andthe n-th bit ofrelmap is also the m-th _set_ bit ofrelmap.(If you understood the previous sentence the first time yourread it, you’re overqualified for your current job.)

In other words,orig is mapped onto (surjectively)dst,using the map { <n, m> | the n-th bit ofrelmap is them-th set bit ofrelmap }.

Any set bits inorig above bit number W, where W is theweight of (number of set bits in)relmap are mapped nowhere.In particular, if for all bits m set inorig, m >= W, thendst will end up empty. In situations where the possibilityof such an empty result is not desired, one way to avoid it isto use thebitmap_fold() operator, below, to first fold theorig bitmap over itself so that all its set bits x are in therange 0 <= x < W. Thebitmap_fold() operator does this bysetting the bit (m % W) indst, for each bit (m) set inorig.

Example [1] for bitmap_onto():

Let’s sayrelmap has bits 30-39 set, andorig has bits1, 3, 5, 7, 9 and 11 set. Then on return from this routine,dst will have bits 31, 33, 35, 37 and 39 set.

When bit 0 is set inorig, it means turn on the bit indst corresponding to whatever is the first bit (if any)that is turned on inrelmap. Since bit 0 was off in theabove example, we leave off that bit (bit 30) indst.

When bit 1 is set inorig (as in the above example), itmeans turn on the bit indst corresponding to whateveris the second bit that is turned on inrelmap. The secondbit inrelmap that was turned on in the above example wasbit 31, so we turned on bit 31 indst.

Similarly, we turned on bits 33, 35, 37 and 39 indst,because they were the 4th, 6th, 8th and 10th set bitsset inrelmap, and the 4th, 6th, 8th and 10th bits oforig (i.e. bits 3, 5, 7 and 9) were also set.

When bit 11 is set in orig, it means turn on the bit in dst corresponding to whatever is the twelfth bit that is turned on in relmap. In the above example, there were only ten bits turned on in relmap (30..39), so the fact that bit 11 was set in orig had no effect on dst.

Example [2] for bitmap_fold() + bitmap_onto():

Let’s sayrelmap has these ten bits set:

40 41 42 43 45 48 53 61 74 95

(for the curious, that’s 40 plus the first ten terms of theFibonacci sequence.)

Further lets say we use the following code, invokingbitmap_fold() then bitmap_onto, as suggested above toavoid the possibility of an emptydst result:

unsigned long *tmp;     // a temporary bitmap's bits

bitmap_fold(tmp, orig, bitmap_weight(relmap, bits), bits);
bitmap_onto(dst, tmp, relmap, bits);

Then this table shows what various values ofdst would be, forvariousorig’s. I list the zero-based positions of each set bit.The tmp column shows the intermediate result, as computed byusingbitmap_fold() to fold theorig bitmap modulo ten(the weight ofrelmap):

orig             tmp             dst
0                0               40
1                1               41
9                9               95
10               0               40 [1]
1 3 5 7          1 3 5 7         41 43 48 61
0 1 2 3 4        0 1 2 3 4       40 41 42 43 45
0 9 18 27        0 9 8 7         40 61 74 95
0 10 20 30       0               40
0 11 22 33       0 1 2 3         40 41 42 43
0 12 24 36       0 2 4 6         40 42 45 53
78 102 211       1 2 8           41 42 74 [1]

[1]

For these marked lines, if we hadn’t first donebitmap_fold()into tmp, then thedst result would have been empty.

If either oforig orrelmap is empty (no set bits), thendstwill be returned empty.

If (as explained above) the only set bits inorig are in positionsm where m >= W, (where W is the weight ofrelmap) thendst willonce again be returned empty.

All bits indst not set by the above rule are cleared.

voidbitmap_fold(unsignedlong*dst,constunsignedlong*orig,unsignedintsz,unsignedintnbits)

fold larger bitmap into smaller, modulo specified size

Parameters

unsignedlong*dst

resulting smaller bitmap

constunsignedlong*orig

original larger bitmap

unsignedintsz

specified size

unsignedintnbits

number of bits in each of these bitmaps

Description

For each bit oldbit inorig, set bit oldbit modsz indst.Clear all other bits indst. See further the comment andExample [2] forbitmap_onto() for why and how to use this.

unsignedlongbitmap_find_next_zero_area(unsignedlong*map,unsignedlongsize,unsignedlongstart,unsignedintnr,unsignedlongalign_mask)

find a contiguous aligned zero area

Parameters

unsignedlong*map

The address to base the search on

unsignedlongsize

The bitmap size in bits

unsignedlongstart

The bitnumber to start searching at

unsignedintnr

The number of zeroed bits we’re looking for

unsignedlongalign_mask

Alignment mask for zero area

Description

The align_mask should be one less than a power of 2; the effect is that the bit offset of all zero areas this function finds is a multiple of that power of 2. An align_mask of 0 means no alignment is required.

boolbitmap_or_equal(constunsignedlong*src1,constunsignedlong*src2,constunsignedlong*src3,unsignedintnbits)

Check whether the or of two bitmaps is equal to a third

Parameters

constunsignedlong*src1

Pointer to bitmap 1

constunsignedlong*src2

Pointer to bitmap 2 will be or’ed with bitmap 1

constunsignedlong*src3

Pointer to bitmap 3. Compare to the result of*src1 |*src2

unsignedintnbits

number of bits in each of these bitmaps

Return

True if (*src1 |*src2) ==*src3, false otherwise

voidbitmap_scatter(unsignedlong*dst,constunsignedlong*src,constunsignedlong*mask,unsignedintnbits)

Scatter a bitmap according to the given mask

Parameters

unsignedlong*dst

scattered bitmap

constunsignedlong*src

gathered bitmap

constunsignedlong*mask

mask representing bits to assign to in the scattered bitmap

unsignedintnbits

number of bits in each of these bitmaps

Description

Scatters bitmap with sequential bits according to the givenmask.

Or in binary form:

src                 mask                dst
0000000001011010    0001001100010011    0000001100000010

(Bits 0, 1, 2, 3, 4, 5 are copied to the bits 0, 1, 4, 8, 9, 12)

A more ‘visual’ description of the operation:

src:  0000000001011010
                ||||||
         +------+|||||
         |  +----+||||
         |  |+----+|||
         |  ||   +-+||
         |  ||   |  ||
mask: ...v..vv...v..vv
      ...0..11...0..10
dst:  0000001100000010

A relationship exists betweenbitmap_scatter() andbitmap_gather(). Seebitmap_gather() for the bitmap gather detailed operations. TL;DR:bitmap_gather() can be seen as the ‘reverse’bitmap_scatter() operation.

Example

Ifsrc bitmap = 0x005a, withmask = 0x1313,dst will be 0x0302.

voidbitmap_gather(unsignedlong*dst,constunsignedlong*src,constunsignedlong*mask,unsignedintnbits)

Gather a bitmap according to given mask

Parameters

unsignedlong*dst

gathered bitmap

constunsignedlong*src

scattered bitmap

constunsignedlong*mask

mask representing bits to extract from in the scattered bitmap

unsignedintnbits

number of bits in each of these bitmaps

Description

Gathers bitmap with sparse bits according to the givenmask.

Or in binary form:

src                 mask                dst
0000001100000010    0001001100010011    0000000000011010

(Bits 0, 1, 4, 8, 9, 12 are copied to the bits 0, 1, 2, 3, 4, 5)

A more ‘visual’ description of the operation:

mask: ...v..vv...v..vv
src:  0000001100000010
         ^  ^^   ^   0
         |  ||   |  10
         |  ||   > 010
         |  |+--> 1010
         |  +--> 11010
         +----> 011010
dst:  0000000000011010

A relationship exists betweenbitmap_gather() andbitmap_scatter(). Seebitmap_scatter() for the bitmap scatter detailed operations. TL;DR:bitmap_scatter() can be seen as the ‘reverse’bitmap_gather() operation.

Suppose scattered computed using bitmap_scatter(scattered, src, mask, n).The operation bitmap_gather(result, scattered, mask, n) leads to a resultequal or equivalent to src.

The result can be ‘equivalent’ becausebitmap_scatter() andbitmap_gather()are not bijective.The result and src values are equivalent in that sense that a call tobitmap_scatter(res, src, mask, n) and a call tobitmap_scatter(res, result, mask, n) will lead to the same res value.

Example

Ifsrc bitmap = 0x0302, withmask = 0x1313,dst will be 0x001a.
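The round trip described above can be sketched as follows; the values are taken from the two examples, while the function itself is illustrative:

#include <linux/bitmap.h>

static void example_scatter_gather(void)
{
        DECLARE_BITMAP(src, 64);
        DECLARE_BITMAP(mask, 64);
        DECLARE_BITMAP(scattered, 64);
        DECLARE_BITMAP(gathered, 64);

        bitmap_from_u64(src, 0x005a);   /* bits 1, 3, 4, 6 set */
        bitmap_from_u64(mask, 0x1313);
        bitmap_zero(scattered, 64);
        bitmap_zero(gathered, 64);

        /* Low 16 bits of scattered become 0x0302, as in the example above. */
        bitmap_scatter(scattered, src, mask, 16);

        /* gathered becomes 0x001a: equivalent to src with respect to mask. */
        bitmap_gather(gathered, scattered, mask, 16);
}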

voidbitmap_release_region(unsignedlong*bitmap,unsignedintpos,intorder)

release allocated bitmap region

Parameters

unsignedlong*bitmap

array of unsigned longs corresponding to the bitmap

unsignedintpos

beginning of bit region to release

intorder

region size (log base 2 of number of bits) to release

Description

This is the complement to __bitmap_find_free_region() and releasesthe found region (by clearing it in the bitmap).

intbitmap_allocate_region(unsignedlong*bitmap,unsignedintpos,intorder)

allocate bitmap region

Parameters

unsignedlong*bitmap

array of unsigned longs corresponding to the bitmap

unsignedintpos

beginning of bit region to allocate

intorder

region size (log base 2 of number of bits) to allocate

Description

Allocate (set bits in) a specified region of a bitmap.

Return

0 on success, or-EBUSY if specified region wasn’tfree (not all bits were zero).

intbitmap_find_free_region(unsignedlong*bitmap,unsignedintbits,intorder)

find a contiguous aligned mem region

Parameters

unsignedlong*bitmap

array of unsigned longs corresponding to the bitmap

unsignedintbits

number of bits in the bitmap

intorder

region size (log base 2 of number of bits) to find

Description

Find a region of free (zero) bits in abitmap ofbits bits andallocate them (set them to one). Only consider regions of lengtha power (order) of two, aligned to that power of two, whichmakes the search algorithm much faster.

Return

the bit offset in bitmap of the allocated region,or -errno on failure.
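A sketch of the usual pair of calls: claim an aligned block of 2^order bits and release it later. The names and the bitmap size are illustrative:

#include <linux/bitmap.h>

#define EXAMPLE_BITS    128     /* illustrative bitmap size */

static DECLARE_BITMAP(example_region_map, EXAMPLE_BITS);

static int example_get_block(int order)
{
        /* Returns the bit offset of the region, or -errno on failure. */
        return bitmap_find_free_region(example_region_map, EXAMPLE_BITS, order);
}

static void example_put_block(unsigned int pos, int order)
{
        bitmap_release_region(example_region_map, pos, order);
}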

BITMAP_FROM_U64

BITMAP_FROM_U64(n)

Represent u64 value in the format suitable for bitmap.

Parameters

n

u64 value

Description

Linux bitmaps are internally arrays of unsigned longs, i.e. 32-bitintegers in 32-bit environment, and 64-bit integers in 64-bit one.

There are four combinations of endianness and length of the word in linuxABIs: LE64, BE64, LE32 and BE32.

On 64-bit kernels 64-bit LE and BE numbers are naturally ordered inbitmaps and therefore don’t require any special handling.

On 32-bit kernels the 32-bit LE ABI orders the lo word of a 64-bit number in memory prior to the hi word, and 32-bit BE orders the hi word prior to the lo word. The bitmap on the other hand is represented as an array of 32-bit words and the position of bit N may therefore be calculated as: word #(N/32) and bit #(N%32) in that word. For example, bit #42 is located at the 10th position of the 2nd word. It matches the 32-bit LE ABI, and we can simply let the compiler store 64-bit values in memory as it usually does. But for BE we need to swap the hi and lo words manually.

With all that, the macroBITMAP_FROM_U64() does explicit reordering of hi andlo parts of u64. For LE32 it does nothing, and for BE environment it swapshi and lo words, as is expected by bitmap.
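A sketch of initializing a bitmap from 64-bit constants in a way that is correct on all four ABIs; the array and the values are illustrative:

#include <linux/bitmap.h>

static const unsigned long example_map[] = {
        BITMAP_FROM_U64(0x00000000000000ffULL), /* bits 0-7 set */
        BITMAP_FROM_U64(0xff00000000000000ULL), /* bits 120-127 set */
};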

voidbitmap_from_u64(unsignedlong*dst,u64mask)

Check and swap words within u64.

Parameters

unsignedlong*dst

destination bitmap

u64mask

source bitmap

Description

In a 32-bit Big Endian kernel, when using (u32 *)(&val)[*] to read a u64 mask, we will get the wrong word. That is, (u32 *)(&val)[0] gets the upper 32 bits, but we expect the lower 32 bits of the u64.

unsignedlongbitmap_read(constunsignedlong*map,unsignedlongstart,unsignedlongnbits)

read a value of n-bits from the memory region

Parameters

constunsignedlong*map

address to the bitmap memory region

unsignedlongstart

bit offset of the n-bit value

unsignedlongnbits

size of value in bits, nonzero, up to BITS_PER_LONG

Return

value ofnbits bits located at thestart bit offset within themap memory region. Fornbits = 0 andnbits > BITS_PER_LONG the returnvalue is undefined.

voidbitmap_write(unsignedlong*map,unsignedlongvalue,unsignedlongstart,unsignedlongnbits)

write n-bit value within a memory region

Parameters

unsignedlong*map

address to the bitmap memory region

unsignedlongvalue

value to write, clamped to nbits

unsignedlongstart

bit offset of the n-bit value

unsignedlongnbits

size of value in bits, nonzero, up to BITS_PER_LONG.

Description

bitmap_write() behaves as-if implemented asnbits calls of __assign_bit(),i.e. bits beyondnbits are ignored:

for (bit = 0; bit < nbits; bit++)
        __assign_bit(start + bit, bitmap, val & BIT(bit));

Fornbits == 0 andnbits > BITS_PER_LONG no writes are performed.
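A sketch packing two small fields into one bitmap with bitmap_write() and reading them back with bitmap_read(); the field layout is purely illustrative:

#include <linux/bitmap.h>
#include <linux/printk.h>

static void example_pack_fields(void)
{
        DECLARE_BITMAP(map, 32);
        unsigned long chan, prio;

        bitmap_zero(map, 32);

        bitmap_write(map, 0x5, 0, 4);   /* 4-bit "channel" field at bit 0 */
        bitmap_write(map, 0x3, 4, 2);   /* 2-bit "priority" field at bit 4 */

        chan = bitmap_read(map, 0, 4);  /* == 0x5 */
        prio = bitmap_read(map, 4, 2);  /* == 0x3 */

        pr_info("chan=%lu prio=%lu\n", chan, prio);
}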

Command-line Parsing

intget_option(char**str,int*pint)

Parse integer from an option string

Parameters

char**str

option string

int*pint

(optional output) integer value parsed fromstr

Read an int from an option string; if available accept a subsequentcomma as well.

Whenpint is NULL the function can be used as a validator ofthe current option in the string.

Return values:

0 - no int in string
1 - int found, no subsequent comma
2 - int found including a subsequent comma
3 - hyphen found to denote a range

Leading hyphen without integer is no integer case, but we consume itfor the sake of simplification.

char*get_options(constchar*str,intnints,int*ints)

Parse a string into a list of integers

Parameters

constchar*str

String to be parsed

intnints

size of integer array

int*ints

integer array (must have room for at least one element)

This function parses a string containing a comma-separatedlist of integers, a hyphen-separated range of _positive_ integers,or a combination of both. The parse halts when the array isfull, or when no more numbers can be retrieved from thestring.

Whennints is 0, the function just validates the givenstr andreturns the amount of parseable integers as described below.

Return

The first element is filled by the number of collected integersin the range. The rest is what was parsed from thestr.

Return value is the character in the string which causedthe parse to end (typically a null terminator, ifstr iscompletely parseable).
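A sketch of parsing a comma/range list such as "1,4-6,9"; the array size and the printing are illustrative:

#include <linux/kernel.h>

static void example_parse_list(const char *arg)
{
        int ints[9];    /* ints[0] holds the count, so room for 8 values */
        int i;

        get_options(arg, ARRAY_SIZE(ints), ints);

        for (i = 1; i <= ints[0]; i++)
                pr_info("value %d: %d\n", i, ints[i]);
}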

unsignedlonglongmemparse(constchar*ptr,char**retptr)

parse a string with mem suffixes into a number

Parameters

constchar*ptr

Where parse begins

char**retptr

(output) Optional pointer to next char after parse completes

Parses a string into a number. The number stored atptr ispotentially suffixed with K, M, G, T, P, E.
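As a sketch, parsing a size argument such as "16M" or "1G"; the helper name and the fallback behaviour are assumptions:

#include <linux/kernel.h>

static unsigned long long example_parse_size(const char *arg)
{
        unsigned long long size;
        char *end;

        size = memparse(arg, &end);     /* "16M" -> 16 * 1024 * 1024 */
        if (end == arg)
                return 0;               /* no digits were consumed */

        return size;
}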

Error Pointers

IS_ERR_VALUE

IS_ERR_VALUE(x)

Detect an error pointer.

Parameters

x

The pointer to check.

Description

LikeIS_ERR(), but does not generate a compiler warning if result is unused.

void*ERR_PTR(longerror)

Create an error pointer.

Parameters

longerror

A negative error code.

Description

Encodeserror into a pointer value. Users should consider the resultopaque and not assume anything about how the error is encoded.

Return

A pointer witherror encoded within its value.

longPTR_ERR(__forceconstvoid*ptr)

Extract the error code from an error pointer.

Parameters

__forceconstvoid*ptr

An error pointer.

Return

The error code withinptr.

boolIS_ERR(__forceconstvoid*ptr)

Detect an error pointer.

Parameters

__forceconstvoid*ptr

The pointer to check.

Return

true ifptr is an error pointer, false otherwise.

boolIS_ERR_OR_NULL(__forceconstvoid*ptr)

Detect an error pointer or a null pointer.

Parameters

__forceconstvoid*ptr

The pointer to check.

Description

LikeIS_ERR(), but also returns true for a null pointer.

void*ERR_CAST(__forceconstvoid*ptr)

Explicitly cast an error-valued pointer to another pointer type

Parameters

__forceconstvoid*ptr

The pointer to cast.

Description

Explicitly cast an error-valued pointer to another pointer type in such away as to make it clear that’s what’s going on.

intPTR_ERR_OR_ZERO(__forceconstvoid*ptr)

Extract the error code from a pointer if it has one.

Parameters

__forceconstvoid*ptr

A potential error pointer.

Description

Convenience function that can be used inside a function that returnsan error code to propagate errors received as error pointers.For example,returnPTR_ERR_OR_ZERO(ptr); replaces:

if (IS_ERR(ptr))
        return PTR_ERR(ptr);
else
        return 0;

Return

The error code withinptr if it is an error pointer; 0 otherwise.
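A sketch of the error-pointer pattern these helpers support; struct example_obj and both functions are hypothetical:

#include <linux/err.h>
#include <linux/errno.h>
#include <linux/slab.h>

struct example_obj {
        int id;
};

static struct example_obj *example_obj_create(gfp_t gfp)
{
        struct example_obj *obj = kzalloc(sizeof(*obj), gfp);

        if (!obj)
                return ERR_PTR(-ENOMEM);        /* encode the error in the pointer */
        return obj;
}

static int example_use(void)
{
        struct example_obj *obj = example_obj_create(GFP_KERNEL);

        if (IS_ERR(obj))
                return PTR_ERR(obj);    /* propagate -ENOMEM */

        /* ... use obj ... */
        kfree(obj);
        return 0;
}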

Sorting

voidsort_r(void*base,size_tnum,size_tsize,cmp_r_func_tcmp_func,swap_r_func_tswap_func,constvoid*priv)

sort an array of elements

Parameters

void*base

pointer to data to sort

size_tnum

number of elements

size_tsize

size of each element

cmp_r_func_tcmp_func

pointer to comparison function

swap_r_func_tswap_func

pointer to swap function or NULL

constvoid*priv

third argument passed to comparison function

Description

This function does a heapsort on the given array. You may providea swap_func function if you need to do something more than a memorycopy (e.g. fix up pointers or auxiliary data), but the built-in swapavoids a slow retpoline and so is significantly faster.

The comparison function must adhere to specific mathematical properties to ensure correct and stable sorting:

  • Antisymmetry: cmp_func(a, b) must return the opposite sign of cmp_func(b, a).

  • Transitivity: if cmp_func(a, b) <= 0 and cmp_func(b, c) <= 0, then cmp_func(a, c) <= 0.

Sorting time is O(n log n) both on average and worst-case. Whilequicksort is slightly faster on average, it suffers from exploitableO(n*n) worst-case behavior and extra memory requirements that makeit less suitable for kernel use.
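A sketch of a comparison function meeting the properties above, sorting an int array; priv is unused here and the names are illustrative:

#include <linux/sort.h>

static int example_cmp(const void *a, const void *b, const void *priv)
{
        int x = *(const int *)a, y = *(const int *)b;

        /* Antisymmetric and transitive, as required. */
        if (x < y)
                return -1;
        if (x > y)
                return 1;
        return 0;
}

static void example_sort(int *vals, size_t n)
{
        /* NULL swap_func selects the fast built-in swap. */
        sort_r(vals, n, sizeof(*vals), example_cmp, NULL, NULL);
}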

voidsort_r_nonatomic(void*base,size_tnum,size_tsize,cmp_r_func_tcmp_func,swap_r_func_tswap_func,constvoid*priv)

sort an array of elements, with cond_resched

Parameters

void*base

pointer to data to sort

size_tnum

number of elements

size_tsize

size of each element

cmp_r_func_tcmp_func

pointer to comparison function

swap_r_func_tswap_func

pointer to swap function or NULL

constvoid*priv

third argument passed to comparison function

Description

Same as sort_r, but preferred for larger arrays as it does a periodiccond_resched().

voidlist_sort(void*priv,structlist_head*head,list_cmp_func_tcmp)

sort a list

Parameters

void*priv

private data, opaque tolist_sort(), passed tocmp

structlist_head*head

the list to sort

list_cmp_func_tcmp

the elements comparison function

Description

The comparison function cmp must return > 0 if a should sort after b ("a > b" if you want an ascending sort), and <= 0 if a should sort before b or their original order should be preserved. It is always called with the element that came first in the input in a, and list_sort is a stable sort, so it is not necessary to distinguish the a < b and a == b cases.

The comparison function must adhere to specific mathematical properties to ensure correct and stable sorting:

  • Antisymmetry: cmp(a, b) must return the opposite sign of cmp(b, a).

  • Transitivity: if cmp(a, b) <= 0 and cmp(b, c) <= 0, then cmp(a, c) <= 0.

This is compatible with two styles of cmp function:

  • The traditional style which returns <0 / =0 / >0, or

  • Returning a boolean 0/1.

The latter offers a chance to save a few cycles in the comparison (which is used by e.g. plug_ctx_cmp() in block/blk-mq.c).

A good way to write a multi-word comparison is:

if (a->high != b->high)
        return a->high > b->high;
if (a->middle != b->middle)
        return a->middle > b->middle;
return a->low > b->low;
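Putting the above together, a sketch of sorting a list of hypothetical elements by a single key using the boolean comparison style:

#include <linux/list.h>
#include <linux/list_sort.h>

/* Hypothetical element type, used only to illustrate list_sort(). */
struct example_item {
        struct list_head node;
        int key;
};

static int example_cmp(void *priv, const struct list_head *a,
                       const struct list_head *b)
{
        const struct example_item *ia = list_entry(a, struct example_item, node);
        const struct example_item *ib = list_entry(b, struct example_item, node);

        /* Boolean style: > 0 when a should sort after b, <= 0 otherwise. */
        return ia->key > ib->key;
}

static void example_sort(struct list_head *head)
{
        list_sort(NULL, head, example_cmp);
}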

This mergesort is as eager as possible while always performing at least2:1 balanced merges. Given two pending sublists of size 2^k, they aremerged to a size-2^(k+1) list as soon as we have 2^k following elements.

Thus, it will avoid cache thrashing as long as 3*2^k elements canfit into the cache. Not quite as good as a fully-eager bottom-upmergesort, but it does use 0.2*n fewer comparisons, so is faster inthe common case that everything fits into L1.

The merging is controlled by “count”, the number of elements in thepending lists. This is beautifully simple code, but rather subtle.

Each time we increment “count”, we set one bit (bit k) and clearbits k-1 .. 0. Each time this happens (except the very first timefor each bit, when count increments to 2^k), we merge two lists ofsize 2^k into one list of size 2^(k+1).

This merge happens exactly when the count reaches an odd multiple of2^k, which is when we have 2^k elements pending in smaller lists,so it’s safe to merge away two lists of size 2^k.

After this happens twice, we have created two lists of size 2^(k+1),which will be merged into a list of size 2^(k+2) before we createa third list of size 2^(k+1), so there are never more than two pending.

The number of pending lists of size 2^k is determined by thestate of bit k of “count” plus two extra pieces of information:

  • The state of bit k-1 (when k == 0, consider bit -1 always set), and

  • Whether the higher-order bits are zero or non-zero (i.e.is count >= 2^(k+1)).

There are six states we distinguish. "x" represents some arbitrary bits, and "y" represents some arbitrary non-zero bits:

0:  00x: 0 pending of size 2^k;           x pending of sizes < 2^k
1:  01x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
2: x10x: 0 pending of size 2^k; 2^k     + x pending of sizes < 2^k
3: x11x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
4: y00x: 1 pending of size 2^k; 2^k     + x pending of sizes < 2^k
5: y01x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
         (merge and loop back to state 2)

We gain lists of size 2^k in the 2->3 and 4->5 transitions (becausebit k-1 is set while the more significant bits are non-zero) andmerge them away in the 5->2 transition. Note in particular that justbefore the 5->2 transition, all lower-order bits are 11 (state 3),so there is one list of each smaller size.

When we reach the end of the input, we merge all the pendinglists, from smallest to largest. If you work through cases 2 to5 above, you can see that the number of elements we merge with a listof size 2^k varies from 2^(k-1) (cases 3 and 5 when x == 0) to2^(k+1) - 1 (second merge of case 5 when x == 2^(k-1) - 1).

Text Searching

INTRODUCTION

The textsearch infrastructure provides text searching facilities forboth linear and non-linear data. Individual search algorithms areimplemented in modules and chosen by the user.

ARCHITECTURE

  User
  +----------------+
  |        finish()|<--------------(6)-----------------+
  |get_next_block()|<--------------(5)---------------+ |
  |                |                     Algorithm   | |
  |                |                    +------------------------------+
  |                |                    |  init()   find()   destroy() |
  |                |                    +------------------------------+
  |                |       Core API           ^       ^          ^
  |                |      +---------------+  (2)     (4)        (8)
  |             (1)|----->| prepare()     |---+       |          |
  |             (3)|----->| find()/next() |-----------+          |
  |             (7)|----->| destroy()     |----------------------+
  +----------------+      +---------------+

(1) User configures a search by calling textsearch_prepare() specifying
    the search parameters such as the pattern and algorithm name.
(2) Core requests the algorithm to allocate and initialize a search
    configuration according to the specified parameters.
(3) User starts the search(es) by calling textsearch_find() or
    textsearch_next() to fetch subsequent occurrences. A state variable
    is provided to the algorithm to store persistent variables.
(4) Core eventually resets the search offset and forwards the find()
    request to the algorithm.
(5) Algorithm calls get_next_block() provided by the user continuously
    to fetch the data to be searched in block by block.
(6) Algorithm invokes finish() after the last call to get_next_block
    to clean up any leftovers from get_next_block. (Optional)
(7) User destroys the configuration by calling textsearch_destroy().
(8) Core notifies the algorithm to destroy algorithm specific
    allocations. (Optional)

USAGE

Before a search can be performed, a configuration must be created by calling textsearch_prepare(), specifying the searching algorithm, the pattern to look for and flags. As a flag, you can set TS_IGNORECASE to perform case insensitive matching, but it might slow down the performance of the algorithm, so use it at your own risk. The returned configuration may then be used an arbitrary number of times and even in parallel as long as a separate struct ts_state variable is provided to every instance.

The actual search is performed by either calling textsearch_find_continuous() for linear data or by providing your own get_next_block() implementation and calling textsearch_find(). Both functions return the position of the first occurrence of the pattern or UINT_MAX if no match was found. Subsequent occurrences can be found by calling textsearch_next() regardless of the linearity of the data.

Once you’re done using a configuration it must be given back viatextsearch_destroy.

EXAMPLE:

int pos;
struct ts_config *conf;
struct ts_state state;
const char *pattern = "chicken";
const char *example = "We dance the funky chicken";

conf = textsearch_prepare("kmp", pattern, strlen(pattern),
                          GFP_KERNEL, TS_AUTOLOAD);
if (IS_ERR(conf)) {
    err = PTR_ERR(conf);
    goto errout;
}

pos = textsearch_find_continuous(conf, &state, example, strlen(example));
if (pos != UINT_MAX)
    panic("Oh my god, dancing chickens at %d\n", pos);

textsearch_destroy(conf);
inttextsearch_register(structts_ops*ops)

register a textsearch module

Parameters

structts_ops*ops

operations lookup table

Description

This function must be called by textsearch modules to announce their presence. The specified ops must have name set to a unique identifier and the callbacks find(), init(), get_pattern(), and get_pattern_len() must be implemented.

Returns 0 or -EEXIST if another module has already registered with the same name.

inttextsearch_unregister(structts_ops*ops)

unregister a textsearch module

Parameters

structts_ops*ops

operations lookup table

Description

This function must be called by textsearch modules to announce their disappearance, for example when the module gets unloaded. The ops parameter must be the same as the one used during registration.

Returns 0 on success or -ENOENT if no matching textsearchregistration was found.

unsignedinttextsearch_find_continuous(structts_config*conf,structts_state*state,constvoid*data,unsignedintlen)

search a pattern in continuous/linear data

Parameters

structts_config*conf

search configuration

structts_state*state

search state

constvoid*data

data to search in

unsignedintlen

length of data

Description

A simplified version oftextsearch_find() for continuous/linear data.Calltextsearch_next() to retrieve subsequent matches.

Returns the position of first occurrence of the pattern orUINT_MAX if no occurrence was found.

structts_config*textsearch_prepare(constchar*algo,constvoid*pattern,unsignedintlen,gfp_tgfp_mask,intflags)

Prepare a search

Parameters

constchar*algo

name of search algorithm

constvoid*pattern

pattern data

unsignedintlen

length of pattern

gfp_tgfp_mask

allocation mask

intflags

search flags

Description

Looks up the search algorithm module and creates a new textsearchconfiguration for the specified pattern.

Returns a new textsearch configuration according to the specified parameters or an ERR_PTR(). If a zero length pattern is passed, this function returns ERR_PTR(-EINVAL).

Note

The format of the pattern may not be compatible between the various search algorithms.

voidtextsearch_destroy(structts_config*conf)

destroy a search configuration

Parameters

structts_config*conf

search configuration

Description

Releases all references of the configuration and freesup the memory.

unsignedinttextsearch_next(structts_config*conf,structts_state*state)

continue searching for a pattern

Parameters

structts_config*conf

search configuration

structts_state*state

search state

Description

Continues a search looking for more occurrences of the pattern.textsearch_find() must be called to find the first occurrencein order to reset the state.

Returns the position of the next occurrence of the pattern or UINT_MAX if no match was found.

unsignedinttextsearch_find(structts_config*conf,structts_state*state)

start searching for a pattern

Parameters

structts_config*conf

search configuration

structts_state*state

search state

Description

Returns the position of the first occurrence of the pattern or UINT_MAX if no match was found.

void*textsearch_get_pattern(structts_config*conf)

return head of the pattern

Parameters

structts_config*conf

search configuration

unsignedinttextsearch_get_pattern_len(structts_config*conf)

return length of the pattern

Parameters

structts_config*conf

search configuration

CRC and Math Functions in Linux

Arithmetic Overflow Checking

check_add_overflow

check_add_overflow(a,b,d)

Calculate addition with overflow checking

Parameters

a

first addend

b

second addend

d

pointer to store sum

Description

Returns true on wrap-around, false otherwise.

*d holds the results of the attempted addition, regardless of whether wrap-around occurred.
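As an illustration (a minimal sketch, not taken from the kernel sources; the function and parameter names are hypothetical), a length check might look like this:

#include <linux/overflow.h>
#include <linux/errno.h>
#include <linux/types.h>

/* Refuse a request whose total length would wrap around. */
static int check_request_len(u32 header_len, u32 payload_len, u32 *total)
{
        if (check_add_overflow(header_len, payload_len, total))
                return -EOVERFLOW;      /* *total still holds the wrapped sum */

        return 0;
}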

wrapping_add

wrapping_add(type,a,b)

Intentionally perform a wrapping addition

Parameters

type

type for result of calculation

a

first addend

b

second addend

Description

Return the potentially wrapped-around addition without tripping any wrap-around sanitizers that may be enabled.

wrapping_assign_add

wrapping_assign_add(var,offset)

Intentionally perform a wrapping increment assignment

Parameters

var

variable to be incremented

offset

amount to add

Description

Increments var by offset with wrap-around. Returns the resulting value of var. Will not trip any wrap-around sanitizers.

Returns the new value of var.

check_sub_overflow

check_sub_overflow(a,b,d)

Calculate subtraction with overflow checking

Parameters

a

minuend; value to subtract from

b

subtrahend; value to subtract froma

d

pointer to store difference

Description

Returns true on wrap-around, false otherwise.

*d holds the results of the attempted subtraction, regardless of whether wrap-around occurred.

wrapping_sub

wrapping_sub(type,a,b)

Intentionally perform a wrapping subtraction

Parameters

type

type for result of calculation

a

minuend; value to subtract from

b

subtrahend; value to subtract froma

Description

Return the potentially wrapped-around subtraction without tripping any wrap-around sanitizers that may be enabled.

wrapping_assign_sub

wrapping_assign_sub(var,offset)

Intentionally perform a wrapping decrement assign

Parameters

var

variable to be decremented

offset

amount to subtract

Description

Decrements var by offset with wrap-around. Returns the resulting value of var. Will not trip any wrap-around sanitizers.

Returns the new value of var.

check_mul_overflow

check_mul_overflow(a,b,d)

Calculate multiplication with overflow checking

Parameters

a

first factor

b

second factor

d

pointer to store product

Description

Returns true on wrap-around, false otherwise.

*d holds the results of the attempted multiplication, regardless of whether wrap-around occurred.

wrapping_mul

wrapping_mul(type,a,b)

Intentionally perform a wrapping multiplication

Parameters

type

type for result of calculation

a

first factor

b

second factor

Description

Return the potentially wrapped-around multiplication without tripping any wrap-around sanitizers that may be enabled.

check_shl_overflow

check_shl_overflow(a,s,d)

Calculate a left-shifted value and check overflow

Parameters

a

Value to be shifted

s

How many bits left to shift

d

Pointer to where to store the result

Description

Computes *d = (a << s)

Returns true if ‘*d’ cannot hold the result or when ‘a << s’ doesn’t make sense. Example conditions:

  • ‘a << s’ causes bits to be lost when stored in *d.

  • ‘s’ is garbage (e.g. negative) or so large that the result of ‘a << s’ is guaranteed to be 0.

  • ‘a’ is negative.

  • ‘a << s’ sets the sign bit, if any, in ‘*d’.

‘*d’ will hold the results of the attempted shift, but is not considered “safe for use” if true is returned.
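For example (an illustrative sketch only; the helper below is hypothetical), converting a page count into a byte count can be guarded like this:

#include <linux/overflow.h>
#include <linux/errno.h>
#include <linux/mm.h>

/* Fail if nr_pages << PAGE_SHIFT cannot be represented in an unsigned long. */
static int pages_to_bytes(unsigned long nr_pages, unsigned long *bytes)
{
        if (check_shl_overflow(nr_pages, PAGE_SHIFT, bytes))
                return -EOVERFLOW;

        return 0;
}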

overflows_type

overflows_type(n,T)

helper for checking the overflows between value, variables, or data type

Parameters

n

source constant value or variable to be checked

T

destination variable or data type proposed to store n

Description

Compares the n expression for whether or not it can safely fit in the storage of the type in T. n and T can have different types. If n is a constant expression, this will also resolve to a constant expression.

Return

true if overflow can occur, false otherwise.

castable_to_type

castable_to_type(n,T)

like __same_type(), but also allows for casted literals

Parameters

n

variable or constant value

T

variable or data type

Description

Unlike the __same_type() macro, this allows a constant value as thefirst argument. If this value would not overflow into an assignmentof the second argument’s type, it returns true. Otherwise, this fallsback to __same_type().

size_tsize_mul(size_tfactor1,size_tfactor2)

Calculate size_t multiplication with saturation at SIZE_MAX

Parameters

size_tfactor1

first factor

size_tfactor2

second factor

Return

Calculate factor1 * factor2, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. The lvalue must be size_t to avoid implicit type conversion.

size_tsize_add(size_taddend1,size_taddend2)

Calculate size_t addition with saturation at SIZE_MAX

Parameters

size_taddend1

first addend

size_taddend2

second addend

Return

Calculate addend1 + addend2, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. The lvalue must be size_t to avoid implicit type conversion.

size_tsize_sub(size_tminuend,size_tsubtrahend)

Calculate size_t subtraction with saturation at SIZE_MAX

Parameters

size_tminuend

value to subtract from

size_tsubtrahend

value to subtract fromminuend

Return

Calculate minuend - subtrahend, both promoted to size_t, with any overflow causing the return value to be SIZE_MAX. For composition with the size_add() and size_mul() helpers, neither argument may be SIZE_MAX (or the result will be forced to SIZE_MAX). The lvalue must be size_t to avoid implicit type conversion.
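Because these helpers saturate at SIZE_MAX, they compose naturally into allocation sizes: an oversized request simply makes the allocation fail. A minimal sketch (the function and parameter names are hypothetical):

#include <linux/overflow.h>
#include <linux/slab.h>

/* Allocate a header followed by n fixed-size records. If n * record_size
 * (or the subsequent addition) overflows, the size saturates at SIZE_MAX
 * and kmalloc() returns NULL. */
static void *alloc_records(size_t header_size, size_t n, size_t record_size)
{
        return kmalloc(size_add(header_size, size_mul(n, record_size)),
                       GFP_KERNEL);
}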

array_size

array_size(a,b)

Calculate size of 2-dimensional array.

Parameters

a

dimension one

b

dimension two

Description

Calculates size of 2-dimensional array:a *b.

Return

number of bytes needed to represent the array or SIZE_MAX onoverflow.

array3_size

array3_size(a,b,c)

Calculate size of 3-dimensional array.

Parameters

a

dimension one

b

dimension two

c

dimension three

Description

Calculates size of 3-dimensional array:a *b *c.

Return

number of bytes needed to represent the array or SIZE_MAX onoverflow.

flex_array_size

flex_array_size(p,member,count)

Calculate size of a flexible array member within an enclosing structure.

Parameters

p

Pointer to the structure.

member

Name of the flexible array member.

count

Number of elements in the array.

Description

Calculates size of a flexible array of count number of member elements, at the end of structure p.

Return

number of bytes needed or SIZE_MAX on overflow.

struct_size

struct_size(p,member,count)

Calculate size of structure with trailing flexible array.

Parameters

p

Pointer to the structure.

member

Name of the array member.

count

Number of elements in the array.

Description

Calculates size of memory needed for structure of p followed by an array of count number of member elements.

Return

number of bytes needed or SIZE_MAX on overflow.
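A common pattern combines struct_size() for the allocation with flex_array_size() for the copy into the trailing array. A minimal sketch (struct sample and sample_create() are hypothetical):

#include <linux/overflow.h>
#include <linux/slab.h>
#include <linux/string.h>

struct sample {
        size_t count;
        u32 values[];
};

static struct sample *sample_create(const u32 *src, size_t count)
{
        struct sample *s;

        /* struct_size() saturates at SIZE_MAX, so kzalloc() fails on overflow. */
        s = kzalloc(struct_size(s, values, count), GFP_KERNEL);
        if (!s)
                return NULL;

        s->count = count;
        /* flex_array_size() bounds the copy into the flexible array member. */
        memcpy(s->values, src, flex_array_size(s, values, count));
        return s;
}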

struct_size_t

struct_size_t(type,member,count)

Calculate size of structure with trailing flexible array

Parameters

type

structure type name.

member

Name of the array member.

count

Number of elements in the array.

Description

Calculates size of memory needed for structure type followed by an array of count number of member elements. Prefer using struct_size() when possible instead, to keep calculations associated with a specific instance variable of type type.

Return

number of bytes needed or SIZE_MAX on overflow.

__DEFINE_FLEX

__DEFINE_FLEX(type,name,member,count,trailer...)

helper macro forDEFINE_FLEX() family. Enables caller macro to pass arbitrary trailing expressions

Parameters

type

structure type name, including “struct” keyword.

name

Name for a variable to define.

member

Name of the array member.

count

Number of elements in the array; must be compile-time const.

trailer...

Trailing expressions for attributes and/or initializers.

_DEFINE_FLEX

_DEFINE_FLEX(type,name,member,count,initializer...)

helper macro forDEFINE_FLEX() family. Enables caller macro to pass (different) initializer.

Parameters

type

structure type name, including “struct” keyword.

name

Name for a variable to define.

member

Name of the array member.

count

Number of elements in the array; must be compile-time const.

initializer...

Initializer expression (e.g., pass= { } at minimum).

DEFINE_RAW_FLEX

DEFINE_RAW_FLEX(type,name,member,count)

Define an on-stack instance of structure with a trailing flexible array member, when it does not have a __counted_by annotation.

Parameters

type

structure type name, including “struct” keyword.

name

Name for a variable to define.

member

Name of the array member.

count

Number of elements in the array; must be compile-time const.

Description

Define a zeroed, on-stack, instance of type structure with a trailing flexible array member. Use __struct_size(name) to get the compile-time size of it afterwards. Use __member_size(name->member) to get the compile-time size of name members. Use STACK_FLEX_ARRAY_SIZE(name, member) to get the compile-time number of elements in array member.

DEFINE_FLEX

DEFINE_FLEX(TYPE,NAME,MEMBER,COUNTER,COUNT)

Define an on-stack instance of structure with a trailing flexible array member.

Parameters

TYPE

structure type name, including “struct” keyword.

NAME

Name for a variable to define.

MEMBER

Name of the array member.

COUNTER

Name of the __counted_by member.

COUNT

Number of elements in the array; must be compile-time const.

Description

Define a zeroed, on-stack, instance of TYPE structure with a trailing flexible array member. Use __struct_size(NAME) to get the compile-time size of it afterwards. Use __member_size(NAME->member) to get the compile-time size of NAME members. Use STACK_FLEX_ARRAY_SIZE(NAME, MEMBER) to get the compile-time number of elements in array MEMBER.
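A minimal sketch of how this might be used (struct report and its members are hypothetical, and the count must be a compile-time constant):

#include <linux/overflow.h>
#include <linux/types.h>

struct report {
        u8 nr_entries;
        u16 entries[] __counted_by(nr_entries);
};

static void build_report(void)
{
        /* Zeroed on-stack instance with room for 4 entries;
         * r->nr_entries is initialized to 4 by the macro. */
        DEFINE_FLEX(struct report, r, entries, nr_entries, 4);
        size_t i;

        for (i = 0; i < STACK_FLEX_ARRAY_SIZE(r, entries); i++)
                r->entries[i] = i;
}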

STACK_FLEX_ARRAY_SIZE

STACK_FLEX_ARRAY_SIZE(name,array)

helper macro forDEFINE_FLEX() family. Returns the number of elements inarray.

Parameters

name

Name for a variable defined inDEFINE_RAW_FLEX()/DEFINE_FLEX().

array

Name of the array member.

CRC Functions

uint8_tcrc4(uint8_tc,uint64_tx,intbits)

calculate the 4-bit crc of a value.

Parameters

uint8_tc

starting crc4

uint64_tx

value to checksum

intbits

number of bits inx to checksum

Description

Returns the crc4 value of x, using polynomial 0b10111.

The x value is treated as left-aligned, and bits above bits are ignored in the crc calculations.

u8crc7_be(u8crc,constu8*buffer,size_tlen)

update the CRC7 for the data buffer

Parameters

u8crc

previous CRC7 value

constu8*buffer

data pointer

size_tlen

number of bytes in the buffer

Context

any

Description

Returns the updated CRC7 value. The CRC7 is left-aligned in the byte (the lsbit is always 0), as that makes the computation easier, and all callers want it in that form.

voidcrc8_populate_msb(u8table[CRC8_TABLE_SIZE],u8polynomial)

fill crc table for given polynomial in reverse bit order.

Parameters

u8table[CRC8_TABLE_SIZE]

table to be filled.

u8polynomial

polynomial for which table is to be filled.

voidcrc8_populate_lsb(u8table[CRC8_TABLE_SIZE],u8polynomial)

fill crc table for given polynomial in regular bit order.

Parameters

u8table[CRC8_TABLE_SIZE]

table to be filled.

u8polynomial

polynomial for which table is to be filled.

u8crc8(constu8table[CRC8_TABLE_SIZE],constu8*pdata,size_tnbytes,u8crc)

calculate a crc8 over the given input data.

Parameters

constu8table[CRC8_TABLE_SIZE]

crc table used for calculation.

constu8*pdata

pointer to data buffer.

size_tnbytes

number of bytes in data buffer.

u8crc

previous returned crc8 value.
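Typical usage fills the lookup table once at initialization and then feeds buffers through crc8(). A sketch (the polynomial and identifiers here are placeholders, not from any real driver):

#include <linux/crc8.h>
#include <linux/init.h>
#include <linux/types.h>

#define SAMPLE_CRC8_POLY        0x07    /* use the polynomial your protocol specifies */

DECLARE_CRC8_TABLE(sample_crc8_table);

static u8 sample_crc8(const u8 *buf, size_t len)
{
        return crc8(sample_crc8_table, buf, len, CRC8_INIT_VALUE);
}

static int __init sample_crc8_init(void)
{
        /* Fill the lookup table once, MSB-first bit order. */
        crc8_populate_msb(sample_crc8_table, SAMPLE_CRC8_POLY);
        return 0;
}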

u16crc16(u16crc,constu8*p,size_tlen)

compute the CRC-16 for the data buffer

Parameters

u16crc

previous CRC value

constu8*p

data pointer

size_tlen

number of bytes in the buffer

Description

Returns the updated CRC value.

u32crc32_generic_shift(u32crc,size_tlen,u32polynomial)

Appendlen 0 bytes to crc, in logarithmic time

Parameters

u32crc

The original little-endian CRC (i.e. lsbit is x^31 coefficient)

size_tlen

The number of bytes. crc is multiplied by x^(8*len).

u32polynomial

The modulus used to reduce the result to 32 bits.

Description

It’s possible to parallelize CRC computations by computing a CRC over separate ranges of a buffer, then summing them. This shifts the given CRC by 8*len bits (i.e. produces the same effect as appending len bytes of zero to the data), in time proportional to log(len).

u16crc_ccitt(u16crc,u8const*buffer,size_tlen)

recompute the CRC (CRC-CCITT variant) for the data buffer

Parameters

u16crc

previous CRC value

u8const*buffer

data pointer

size_tlen

number of bytes in the buffer

u16crc_itu_t(u16crc,constu8*buffer,size_tlen)

Compute the CRC-ITU-T for the data buffer

Parameters

u16crc

previous CRC value

constu8*buffer

data pointer

size_tlen

number of bytes in the buffer

Description

Returns the updated CRC value

Base 2 log and power Functions

boolis_power_of_2(unsignedlongn)

check if a value is a power of two

Parameters

unsignedlongn

the value to check

Description

Determine whether some value is a power of two, where zero isnot considered a power of two.

Return

true ifn is a power of 2, otherwise false.

unsignedlong__roundup_pow_of_two(unsignedlongn)

round up to nearest power of two

Parameters

unsignedlongn

value to round up

unsignedlong__rounddown_pow_of_two(unsignedlongn)

round down to nearest power of two

Parameters

unsignedlongn

value to round down

const_ilog2

const_ilog2(n)

log base 2 of 32-bit or a 64-bit constant unsigned value

Parameters

n

parameter

Description

Use this where sparse expects a true constant expression, e.g. for arrayindices.

ilog2

ilog2(n)

log base 2 of 32-bit or a 64-bit unsigned value

Parameters

n

parameter

Description

Constant-capable log of base 2 calculation - this can be used to initialise global variables from constant data, hence the massive ternary operator construction.

Selects the appropriately-sized optimised version depending on sizeof(n).

roundup_pow_of_two

roundup_pow_of_two(n)

round the given value up to nearest power of two

Parameters

n

parameter

Description

Round the given value up to the nearest power of two. The result is undefined when n == 0. This can be used to initialise global variables from constant data.
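For instance (an illustrative sketch; the helper is hypothetical), sizing a ring buffer so that index wrapping can use a simple mask:

#include <linux/log2.h>

static unsigned long ring_entries(unsigned long requested)
{
        if (!requested)
                return 1;       /* the result is undefined for 0, handle it first */

        return roundup_pow_of_two(requested);
}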

rounddown_pow_of_two

rounddown_pow_of_two(n)

round the given value down to nearest power of two

Parameters

n

parameter

Description

Round the given value down to the nearest power of two. The result is undefined when n == 0. This can be used to initialise global variables from constant data.

order_base_2

order_base_2(n)

calculate the (rounded up) base 2 order of the argument

Parameters

n

parameter

Description

The first few values calculated by this routine:

ob2(0) = 0
ob2(1) = 0
ob2(2) = 1
ob2(3) = 2
ob2(4) = 2
ob2(5) = 3
... and so on.

bits_per

bits_per(n)

calculate the number of bits required for the argument

Parameters

n

parameter

Description

This is constant-capable and can be used for compile-time initializations, e.g. bitfields.

The first few values calculated by this routine:

bf(0) = 1
bf(1) = 1
bf(2) = 2
bf(3) = 2
bf(4) = 3
... and so on.

Integer log and power Functions

unsignedintintlog2(u32value)

computes log2 of a value; the result is shifted left by 24 bits

Parameters

u32value

The value (must be != 0)

Description

to use rational values you can use the following method:

intlog2(value) = intlog2(value * 2^x) - x * 2^24

Some usecase examples:

intlog2(8) will give 3 << 24 = 3 * 2^24

intlog2(9) will give 3 << 24 + ... = 3.16... * 2^24

intlog2(1.5) = intlog2(3) - 2^24 = 0.584... * 2^24

Return

log2(value) * 2^24

unsignedintintlog10(u32value)

computes log10 of a value; the result is shifted left by 24 bits

Parameters

u32value

The value (must be != 0)

Description

to use rational values you can use the following method:

intlog10(value) = intlog10(value * 10^x) - x * 2^24

A use-case example:

intlog10(1000) will give 3 << 24 = 3 * 2^24

Due to the implementation, intlog10(1000) might not be exactly 3 * 2^24.

See intlog2() for similar examples.

Return

log10(value) * 2^24

u64int_pow(u64base,unsignedintexp)

computes the exponentiation of the given base and exponent

Parameters

u64base

base which will be raised to the given power

unsignedintexp

power to be raised to

Description

Computes: pow(base, exp), i.e.base raised to theexp power

unsignedlongint_sqrt(unsignedlongx)

computes the integer square root

Parameters

unsignedlongx

integer of which to calculate the sqrt

Description

Computes: floor(sqrt(x))

u32int_sqrt64(u64x)

strongly typed int_sqrt function when minimum 64 bit input is expected.

Parameters

u64x

64bit integer of which to calculate the sqrt

Division Functions

do_div

do_div(n,base)

returns 2 values: calculate remainder and update new dividend

Parameters

n

uint64_t dividend (will be updated)

base

uint32_t divisor

Description

Summary: uint32_t remainder = n % base; n = n / base;

Return

(uint32_t)remainder

NOTE

macro parameter n is evaluated multiple times, beware of side effects!
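A short sketch of the calling convention (the conversion helper is hypothetical): the dividend must be an lvalue of type u64, it is updated in place, and the remainder is the macro’s value.

#include <linux/math64.h>
#include <linux/time64.h>
#include <linux/types.h>

static u64 sample_ns_to_ms(u64 ns, u32 *rem_ns)
{
        *rem_ns = do_div(ns, NSEC_PER_MSEC);    /* ns now holds the quotient */
        return ns;
}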

u64div_u64_rem(u64dividend,u32divisor,u32*remainder)

unsigned 64bit divide with 32bit divisor with remainder

Parameters

u64dividend

unsigned 64bit dividend

u32divisor

unsigned 32bit divisor

u32*remainder

pointer to unsigned 32bit remainder

Return

sets*remainder, then returns dividend / divisor

Description

This is commonly provided by 32bit archs to provide an optimized 64bit divide.
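The same conversion written with this typed helper (again a hypothetical sketch) avoids the in-place update and returns the remainder through a pointer:

#include <linux/math64.h>
#include <linux/time64.h>
#include <linux/types.h>

static u64 sample_ns_to_ms(u64 ns, u32 *rem_ns)
{
        return div_u64_rem(ns, NSEC_PER_MSEC, rem_ns);
}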

s64div_s64_rem(s64dividend,s32divisor,s32*remainder)

signed 64bit divide with 32bit divisor with remainder

Parameters

s64dividend

signed 64bit dividend

s32divisor

signed 32bit divisor

s32*remainder

pointer to signed 32bit remainder

Return

sets*remainder, then returns dividend / divisor

u64div64_u64_rem(u64dividend,u64divisor,u64*remainder)

unsigned 64bit divide with 64bit divisor and remainder

Parameters

u64dividend

unsigned 64bit dividend

u64divisor

unsigned 64bit divisor

u64*remainder

pointer to unsigned 64bit remainder

Return

sets*remainder, then returns dividend / divisor

u64div64_u64(u64dividend,u64divisor)

unsigned 64bit divide with 64bit divisor

Parameters

u64dividend

unsigned 64bit dividend

u64divisor

unsigned 64bit divisor

Return

dividend / divisor

s64div64_s64(s64dividend,s64divisor)

signed 64bit divide with 64bit divisor

Parameters

s64dividend

signed 64bit dividend

s64divisor

signed 64bit divisor

Return

dividend / divisor

u64div_u64(u64dividend,u32divisor)

unsigned 64bit divide with 32bit divisor

Parameters

u64dividend

unsigned 64bit dividend

u32divisor

unsigned 32bit divisor

Description

This is the most common 64bit divide and should be used if possible, as many 32bit archs can optimize this variant better than a full 64bit divide.

Return

dividend / divisor

s64div_s64(s64dividend,s32divisor)

signed 64bit divide with 32bit divisor

Parameters

s64dividend

signed 64bit dividend

s32divisor

signed 32bit divisor

Return

dividend / divisor

DIV64_U64_ROUND_UP

DIV64_U64_ROUND_UP(ll,d)

unsigned 64bit divide with 64bit divisor rounded up

Parameters

ll

unsigned 64bit dividend

d

unsigned 64bit divisor

Description

Divide unsigned 64bit dividend by unsigned 64bit divisorand round up.

Return

dividend / divisor rounded up

DIV_U64_ROUND_UP

DIV_U64_ROUND_UP(ll,d)

unsigned 64bit divide with 32bit divisor rounded up

Parameters

ll

unsigned 64bit dividend

d

unsigned 32bit divisor

Description

Divide unsigned 64bit dividend by unsigned 32bit divisorand round up.

Return

dividend / divisor rounded up

DIV64_U64_ROUND_CLOSEST

DIV64_U64_ROUND_CLOSEST(dividend,divisor)

unsigned 64bit divide with 64bit divisor rounded to nearest integer

Parameters

dividend

unsigned 64bit dividend

divisor

unsigned 64bit divisor

Description

Divide unsigned 64bit dividend by unsigned 64bit divisorand round to closest integer.

Return

dividend / divisor rounded to nearest integer

DIV_U64_ROUND_CLOSEST

DIV_U64_ROUND_CLOSEST(dividend,divisor)

unsigned 64bit divide with 32bit divisor rounded to nearest integer

Parameters

dividend

unsigned 64bit dividend

divisor

unsigned 32bit divisor

Description

Divide unsigned 64bit dividend by unsigned 32bit divisorand round to closest integer.

Return

dividend / divisor rounded to nearest integer

DIV_S64_ROUND_CLOSEST

DIV_S64_ROUND_CLOSEST(dividend,divisor)

signed 64bit divide with 32bit divisor rounded to nearest integer

Parameters

dividend

signed 64bit dividend

divisor

signed 32bit divisor

Description

Divide signed 64bit dividend by signed 32bit divisorand round to closest integer.

Return

dividend / divisor rounded to nearest integer

u64roundup_u64(u64x,u32y)

Round up a 64bit value to the next specified 32bit multiple

Parameters

u64x

the value to round up

u32y

32bit multiple to round up to

Description

Roundsx to the next multiple ofy. For 32bitx values, see roundup andthe faster round_up() for powers of 2.

Return

rounded up value.

unsignedlonggcd(unsignedlonga,unsignedlongb)

calculate and return the greatest common divisor of 2 unsigned longs

Parameters

unsignedlonga

first value

unsignedlongb

second value

UUID/GUID

voidgenerate_random_uuid(unsignedcharuuid[16])

generate a random UUID

Parameters

unsignedcharuuid[16]

where to put the generated UUID

Description

Random UUID interface

Used to create a Boot ID or a filesystem UUID/GUID, but can beuseful for other kernel drivers.

booluuid_is_valid(constchar*uuid)

checks if a UUID string is valid

Parameters

constchar*uuid

UUID string to check

Description

It checks if the UUID string is following the format:

xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

where x is a hex digit.

Return

true if input is valid UUID string.
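As a sketch (the helper is hypothetical; it assumes the "%pUb" printk format, which renders 16 raw UUID bytes in the canonical form), the two interfaces can be combined like this:

#include <linux/kernel.h>
#include <linux/random.h>
#include <linux/uuid.h>

static bool sample_uuid_roundtrip(void)
{
        unsigned char raw[16];
        char str[UUID_STRING_LEN + 1];

        generate_random_uuid(raw);
        snprintf(str, sizeof(str), "%pUb", raw);

        return uuid_is_valid(str);
}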

Kernel IPC facilities

IPC utilities

intipc_init(void)

initialise ipc subsystem

Parameters

void

no arguments

Description

The various sysv ipc resources (semaphores, messages and sharedmemory) are initialised.

A callback routine is registered into the memory hotplug notifier chain: since msgmni scales to lowmem this callback routine will be called upon successful memory add / remove to recompute msgmni.

voidipc_init_ids(structipc_ids*ids)

initialise ipc identifiers

Parameters

structipc_ids*ids

ipc identifier set

Description

Set up the sequence range to use for the ipc identifier range (limitedbelow ipc_mni) then initialise the keys hashtable and ids idr.

voidipc_init_proc_interface(constchar*path,constchar*header,intids,int(*show)(structseq_file*,void*))

create a proc interface for sysipc types using a seq_file interface.

Parameters

constchar*path

Path in procfs

constchar*header

Banner to be printed at the beginning of the file.

intids

ipc id table to iterate.

int(*show)(structseq_file*,void*)

show routine.

structkern_ipc_perm*ipc_findkey(structipc_ids*ids,key_tkey)

find a key in an ipc identifier set

Parameters

structipc_ids*ids

ipc identifier set

key_tkey

key to find

Description

Returns the locked pointer to the ipc structure if found or NULLotherwise. If key is found ipc points to the owning ipc structure

Called with writer ipc_ids.rwsem held.

intipc_addid(structipc_ids*ids,structkern_ipc_perm*new,intlimit)

add an ipc identifier

Parameters

structipc_ids*ids

ipc identifier set

structkern_ipc_perm*new

new ipc permission set

intlimit

limit for the number of used ids

Description

Add an entry ‘new’ to the ipc ids idr. The permissions object is initialised and the first free entry is set up and the index assigned is returned. The ‘new’ entry is returned in a locked state on success.

On failure the entry is not locked and a negative err-code is returned. The caller must use ipc_rcu_putref() to free the identifier.

Called with writer ipc_ids.rwsem held.

intipcget_new(structipc_namespace*ns,structipc_ids*ids,conststructipc_ops*ops,structipc_params*params)

create a new ipc object

Parameters

structipc_namespace*ns

ipc namespace

structipc_ids*ids

ipc identifier set

conststructipc_ops*ops

the actual creation routine to call

structipc_params*params

its parameters

Description

This routine is called by sys_msgget, sys_semget() and sys_shmget()when the key is IPC_PRIVATE.

intipc_check_perms(structipc_namespace*ns,structkern_ipc_perm*ipcp,conststructipc_ops*ops,structipc_params*params)

check security and permissions for an ipc object

Parameters

structipc_namespace*ns

ipc namespace

structkern_ipc_perm*ipcp

ipc permission set

conststructipc_ops*ops

the actual security routine to call

structipc_params*params

its parameters

Description

This routine is called by sys_msgget(), sys_semget() and sys_shmget()when the key is not IPC_PRIVATE and that key already exists in theds IDR.

On success, the ipc id is returned.

It is called with ipc_ids.rwsem and ipcp->lock held.

intipcget_public(structipc_namespace*ns,structipc_ids*ids,conststructipc_ops*ops,structipc_params*params)

get an ipc object or create a new one

Parameters

structipc_namespace*ns

ipc namespace

structipc_ids*ids

ipc identifier set

conststructipc_ops*ops

the actual creation routine to call

structipc_params*params

its parameters

Description

This routine is called by sys_msgget(), sys_semget() and sys_shmget() when the key is not IPC_PRIVATE. It adds a new entry if the key is not found and does some permission / security checks if the key is found.

On success, the ipc id is returned.

voidipc_kht_remove(structipc_ids*ids,structkern_ipc_perm*ipcp)

remove an ipc from the key hashtable

Parameters

structipc_ids*ids

ipc identifier set

structkern_ipc_perm*ipcp

ipc perm structure containing the key to remove

Description

ipc_ids.rwsem (as a writer) and the spinlock for this ID are heldbefore this function is called, and remain locked on the exit.

intipc_search_maxidx(structipc_ids*ids,intlimit)

search for the highest assigned index

Parameters

structipc_ids*ids

ipc identifier set

intlimit

known upper limit for highest assigned index

Description

The function determines the highest assigned index in ids. It is intended to be called when ids->max_idx needs to be updated. Updating ids->max_idx is necessary when the current highest index ipc object is deleted. If no ipc object is allocated, then -1 is returned.

ipc_ids.rwsem needs to be held by the caller.

voidipc_rmid(structipc_ids*ids,structkern_ipc_perm*ipcp)

remove an ipc identifier

Parameters

structipc_ids*ids

ipc identifier set

structkern_ipc_perm*ipcp

ipc perm structure containing the identifier to remove

Description

ipc_ids.rwsem (as a writer) and the spinlock for this ID are heldbefore this function is called, and remain locked on the exit.

voidipc_set_key_private(structipc_ids*ids,structkern_ipc_perm*ipcp)

switch the key of an existing ipc to IPC_PRIVATE

Parameters

structipc_ids*ids

ipc identifier set

structkern_ipc_perm*ipcp

ipc perm structure containing the key to modify

Description

ipc_ids.rwsem (as a writer) and the spinlock for this ID are heldbefore this function is called, and remain locked on the exit.

intipcperms(structipc_namespace*ns,structkern_ipc_perm*ipcp,shortflag)

check ipc permissions

Parameters

structipc_namespace*ns

ipc namespace

structkern_ipc_perm*ipcp

ipc permission set

shortflag

desired permission set

Description

Check user, group, other permissions for access to ipc resources. Return 0 if allowed.

flag will most probably be 0 or S_...UGO from <linux/stat.h>.

voidkernel_to_ipc64_perm(structkern_ipc_perm*in,structipc64_perm*out)

convert kernel ipc permissions to user

Parameters

structkern_ipc_perm*in

kernel permissions

structipc64_perm*out

new style ipc permissions

Description

Turn the kernel objectin into a set of permissions descriptionsfor returning to userspace (out).

voidipc64_perm_to_ipc_perm(structipc64_perm*in,structipc_perm*out)

convert new ipc permissions to old

Parameters

structipc64_perm*in

new style ipc permissions

structipc_perm*out

old style ipc permissions

Description

Turn the new style permissions objectin into a compatibilityobject and store it into theout pointer.

structkern_ipc_perm*ipc_obtain_object_idr(structipc_ids*ids,intid)

Look for an id in the ipc ids idr and return associated ipc object.

Parameters

structipc_ids*ids

ipc identifier set

intid

ipc id to look for

Description

Call inside the RCU critical section.The ipc object isnot locked on exit.

structkern_ipc_perm*ipc_obtain_object_check(structipc_ids*ids,intid)

Similar toipc_obtain_object_idr() but also checks the ipc object sequence number.

Parameters

structipc_ids*ids

ipc identifier set

intid

ipc id to look for

Description

Call inside the RCU critical section.The ipc object isnot locked on exit.

intipcget(structipc_namespace*ns,structipc_ids*ids,conststructipc_ops*ops,structipc_params*params)

Common sys_*get() code

Parameters

structipc_namespace*ns

namespace

structipc_ids*ids

ipc identifier set

conststructipc_ops*ops

operations to be called on ipc object creation, permission checksand further checks

structipc_params*params

the parameters needed by the previous operations.

Description

Common routine called by sys_msgget(), sys_semget() and sys_shmget().

intipc_update_perm(structipc64_perm*in,structkern_ipc_perm*out)

update the permissions of an ipc object

Parameters

structipc64_perm*in

the permission given as input.

structkern_ipc_perm*out

the permission of the ipc to set.

structkern_ipc_perm*ipcctl_obtain_check(structipc_namespace*ns,structipc_ids*ids,intid,intcmd,structipc64_perm*perm,intextra_perm)

retrieve an ipc object and check permissions

Parameters

structipc_namespace*ns

ipc namespace

structipc_ids*ids

the table of ids where to look for the ipc

intid

the id of the ipc to retrieve

intcmd

the cmd to check

structipc64_perm*perm

the permission to set

intextra_perm

one extra permission parameter used by msq

Description

This function does some common audit and permission checks for some IPC_XXX cmd and is called from semctl_down(), shmctl_down() and msgctl_down().

It:
  • retrieves the ipc object with the given id in the given table.

  • performs some audit and permission checks, depending on the given cmd.

  • returns a pointer to the ipc object or, otherwise, the corresponding error.

Call holding both the rwsem and the rcu read lock.

intipc_parse_version(int*cmd)

ipc call version

Parameters

int*cmd

pointer to command

Description

Return IPC_64 for new style IPC and IPC_OLD for old style IPC. The cmd value is turned from an encoded command and version into just the command code.

structkern_ipc_perm*sysvipc_find_ipc(structipc_ids*ids,loff_t*pos)

Find and lock the ipc structure based on seq pos

Parameters

structipc_ids*ids

ipc identifier set

loff_t*pos

expected position

Description

The function finds an ipc structure, based on the sequence file position pos. If there is no ipc structure at position pos, then the successor is selected. If a structure is found, then it is locked (both rcu_read_lock() and ipc_lock_object()) and pos is set to the position needed to locate the found ipc structure. If nothing is found (i.e. EOF), pos is not modified.

The function returns the found ipc structure, or NULL at EOF.

FIFO Buffer

kfifo interface

DECLARE_KFIFO_PTR

DECLARE_KFIFO_PTR(fifo,type)

macro to declare a fifo pointer object

Parameters

fifo

name of the declared fifo

type

type of the fifo elements

DECLARE_KFIFO

DECLARE_KFIFO(fifo,type,size)

macro to declare a fifo object

Parameters

fifo

name of the declared fifo

type

type of the fifo elements

size

the number of elements in the fifo, this must be a power of 2

INIT_KFIFO

INIT_KFIFO(fifo)

Initialize a fifo declared by DECLARE_KFIFO

Parameters

fifo

name of the declared fifo datatype

DEFINE_KFIFO

DEFINE_KFIFO(fifo,type,size)

macro to define and initialize a fifo

Parameters

fifo

name of the declared fifo datatype

type

type of the fifo elements

size

the number of elements in the fifo, this must be a power of 2

Note

the macro can be used for global and local fifo data type variables.

kfifo_initialized

kfifo_initialized(fifo)

Check if the fifo is initialized

Parameters

fifo

address of the fifo to check

Description

Returntrue if fifo is initialized, otherwisefalse.Assumes the fifo was 0 before.

kfifo_esize

kfifo_esize(fifo)

returns the size of the element managed by the fifo

Parameters

fifo

address of the fifo to be used

kfifo_recsize

kfifo_recsize(fifo)

returns the size of the record length field

Parameters

fifo

address of the fifo to be used

kfifo_size

kfifo_size(fifo)

returns the size of the fifo in elements

Parameters

fifo

address of the fifo to be used

kfifo_reset

kfifo_reset(fifo)

removes the entire fifo content

Parameters

fifo

address of the fifo to be used

Note

usage of kfifo_reset() is dangerous. It should only be called when the fifo is exclusively locked or when it is ensured that no other thread is accessing the fifo.

kfifo_reset_out

kfifo_reset_out(fifo)

skip fifo content

Parameters

fifo

address of the fifo to be used

Note

The usage of kfifo_reset_out() is safe as long as it is only called from the reader thread and there is only one concurrent reader. Otherwise it is dangerous and must be handled in the same way as kfifo_reset().

kfifo_len

kfifo_len(fifo)

returns the number of used elements in the fifo

Parameters

fifo

address of the fifo to be used

kfifo_is_empty

kfifo_is_empty(fifo)

returns true if the fifo is empty

Parameters

fifo

address of the fifo to be used

kfifo_is_empty_spinlocked

kfifo_is_empty_spinlocked(fifo,lock)

returns true if the fifo is empty using a spinlock for locking

Parameters

fifo

address of the fifo to be used

lock

spinlock to be used for locking

kfifo_is_empty_spinlocked_noirqsave

kfifo_is_empty_spinlocked_noirqsave(fifo,lock)

returns true if the fifo is empty using a spinlock for locking, doesn’t disable interrupts

Parameters

fifo

address of the fifo to be used

lock

spinlock to be used for locking

kfifo_is_full

kfifo_is_full(fifo)

returns true if the fifo is full

Parameters

fifo

address of the fifo to be used

kfifo_avail

kfifo_avail(fifo)

returns the number of unused elements in the fifo

Parameters

fifo

address of the fifo to be used

kfifo_skip_count

kfifo_skip_count(fifo,count)

skip output data

Parameters

fifo

address of the fifo to be used

count

count of data to skip

kfifo_skip

kfifo_skip(fifo)

skip output data

Parameters

fifo

address of the fifo to be used

kfifo_peek_len

kfifo_peek_len(fifo)

gets the size of the next fifo record

Parameters

fifo

address of the fifo to be used

Description

This function returns the size of the next fifo record in number of bytes.

kfifo_alloc

kfifo_alloc(fifo,size,gfp_mask)

dynamically allocates a new fifo buffer

Parameters

fifo

pointer to the fifo

size

the number of elements in the fifo, this must be a power of 2

gfp_mask

get_free_pages mask, passed tokmalloc()

Description

This macro dynamically allocates a new fifo buffer.

The number of elements will be rounded up to a power of 2. The fifo will be released with kfifo_free(). Return 0 if no error, otherwise an error code.

kfifo_free

kfifo_free(fifo)

frees the fifo

Parameters

fifo

the fifo to be freed

kfifo_init

kfifo_init(fifo,buffer,size)

initialize a fifo using a preallocated buffer

Parameters

fifo

the fifo to assign the buffer

buffer

the preallocated buffer to be used

size

the size of the internal buffer, this has to be a power of 2

Description

This macro initializes a fifo using a preallocated buffer.

The number of elements will be rounded-up to a power of 2.Return 0 if no error, otherwise an error code.

kfifo_put

kfifo_put(fifo,val)

put data into the fifo

Parameters

fifo

address of the fifo to be used

val

the data to be added

Description

This macro copies the given value into the fifo. It returns 0 if the fifo was full. Otherwise it returns the number of processed elements.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_get

kfifo_get(fifo,val)

get data from the fifo

Parameters

fifo

address of the fifo to be used

val

address where to store the data

Description

This macro reads the data from the fifo. It returns 0 if the fifo was empty. Otherwise it returns the number of processed elements.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.
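A minimal single-producer/single-consumer sketch (the fifo name and helpers are hypothetical), relying on the lockless guarantee described above:

#include <linux/kfifo.h>

/* A fifo of 16 ints, declared and initialized at build time. */
static DEFINE_KFIFO(sample_fifo, int, 16);

/* Producer side: returns 0 if the fifo was full. */
static unsigned int sample_produce(int value)
{
        return kfifo_put(&sample_fifo, value);
}

/* Consumer side: returns 0 if the fifo was empty. */
static unsigned int sample_consume(int *value)
{
        return kfifo_get(&sample_fifo, value);
}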

kfifo_peek

kfifo_peek(fifo,val)

get data from the fifo without removing

Parameters

fifo

address of the fifo to be used

val

address where to store the data

Description

This reads the data from the fifo without removing it from the fifo.It returns 0 if the fifo was empty. Otherwise it returns the numberprocessed elements.

Note that with only one concurrent reader and one concurrentwriter, you don’t need extra locking to use these macro.

kfifo_in

kfifo_in(fifo,buf,n)

put data into the fifo

Parameters

fifo

address of the fifo to be used

buf

the data to be added

n

number of elements to be added

Description

This macro copies the given buffer into the fifo and returns thenumber of copied elements.

Note that with only one concurrent reader and one concurrentwriter, you don’t need extra locking to use these macro.

kfifo_in_spinlocked

kfifo_in_spinlocked(fifo,buf,n,lock)

put data into the fifo using a spinlock for locking

Parameters

fifo

address of the fifo to be used

buf

the data to be added

n

number of elements to be added

lock

pointer to the spinlock to use for locking

Description

This macro copies the given values buffer into the fifo and returns thenumber of copied elements.

kfifo_in_spinlocked_noirqsave

kfifo_in_spinlocked_noirqsave(fifo,buf,n,lock)

put data into fifo using a spinlock for locking, don’t disable interrupts

Parameters

fifo

address of the fifo to be used

buf

the data to be added

n

number of elements to be added

lock

pointer to the spinlock to use for locking

Description

This is a variant ofkfifo_in_spinlocked() but uses spin_lock/unlock()for locking and doesn’t disable interrupts.

kfifo_out

kfifo_out(fifo,buf,n)

get data from the fifo

Parameters

fifo

address of the fifo to be used

buf

pointer to the storage buffer

n

max. number of elements to get

Description

This macro gets some data from the fifo and returns the number of elements copied.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_out_spinlocked

kfifo_out_spinlocked(fifo,buf,n,lock)

get data from the fifo using a spinlock for locking

Parameters

fifo

address of the fifo to be used

buf

pointer to the storage buffer

n

max. number of elements to get

lock

pointer to the spinlock to use for locking

Description

This macro gets the data from the fifo and returns the numbers of elementscopied.

kfifo_out_spinlocked_noirqsave

kfifo_out_spinlocked_noirqsave(fifo,buf,n,lock)

get data from the fifo using a spinlock for locking, don’t disable interrupts

Parameters

fifo

address of the fifo to be used

buf

pointer to the storage buffer

n

max. number of elements to get

lock

pointer to the spinlock to use for locking

Description

This is a variant ofkfifo_out_spinlocked() which uses spin_lock/unlock()for locking and doesn’t disable interrupts.

kfifo_from_user

kfifo_from_user(fifo,from,len,copied)

puts some data from user space into the fifo

Parameters

fifo

address of the fifo to be used

from

pointer to the data to be added

len

the length of the data to be added

copied

pointer to output variable to store the number of copied bytes

Description

This macro copies at most len bytes from from into the fifo, depending on the available space, and returns -EFAULT/0.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_to_user

kfifo_to_user(fifo,to,len,copied)

copies data from the fifo into user space

Parameters

fifo

address of the fifo to be used

to

where the data must be copied

len

the size of the destination buffer

copied

pointer to output variable to store the number of copied bytes

Description

This macro copies at most len bytes from the fifo into the to buffer and returns -EFAULT/0.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_dma_in_prepare_mapped

kfifo_dma_in_prepare_mapped(fifo,sgl,nents,len,dma)

setup a scatterlist for DMA input

Parameters

fifo

address of the fifo to be used

sgl

pointer to the scatterlist array

nents

number of entries in the scatterlist array

len

number of elements to transfer

dma

mapped dma address to fill intosgl

Description

This macro fills a scatterlist for DMA input.It returns the number entries in the scatterlist array.

Note that with only one concurrent reader and one concurrentwriter, you don’t need extra locking to use these macros.

kfifo_dma_in_finish

kfifo_dma_in_finish(fifo,len)

finish a DMA IN operation

Parameters

fifo

address of the fifo to be used

len

number of bytes received

Description

This macro finishes a DMA IN operation. The in counter will be updated bythe len parameter. No error checking will be done.

Note that with only one concurrent reader and one concurrentwriter, you don’t need extra locking to use these macros.

kfifo_dma_out_prepare_mapped

kfifo_dma_out_prepare_mapped(fifo,sgl,nents,len,dma)

setup a scatterlist for DMA output

Parameters

fifo

address of the fifo to be used

sgl

pointer to the scatterlist array

nents

number of entries in the scatterlist array

len

number of elements to transfer

dma

mapped dma address to fill intosgl

Description

This macro fills a scatterlist for DMA output with at most len bytes to transfer. It returns the number of entries in the scatterlist array. A zero means there is no space available and the scatterlist is not filled.

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_dma_out_finish

kfifo_dma_out_finish(fifo,len)

finish a DMA OUT operation

Parameters

fifo

address of the fifo to be used

len

number of bytes transferred

Description

This macro finishes a DMA OUT operation. The out counter will be updated bythe len parameter. No error checking will be done.

Note that with only one concurrent reader and one concurrentwriter, you don’t need extra locking to use these macros.

kfifo_out_peek

kfifo_out_peek(fifo,buf,n)

gets some data from the fifo

Parameters

fifo

address of the fifo to be used

buf

pointer to the storage buffer

n

max. number of elements to get

Description

This macro gets the data from the fifo and returns the numbers of elementscopied. The data is not removed from the fifo.

Note that with only one concurrent reader and one concurrentwriter, you don’t need extra locking to use these macro.

kfifo_out_linear

kfifo_out_linear(fifo,tail,n)

gets a tail of/offset to available data

Parameters

fifo

address of the fifo to be used

tail

pointer to an unsigned int to store the value of tail

n

max. number of elements to point at

Description

This macro obtains the offset (tail) to the available data in the fifo buffer and returns the number of elements available. It returns the available count up to the end of the data or up to the end of the buffer, so it can be used for linear data processing (like memcpy() of (fifo->data + tail) with the count returned).

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

kfifo_out_linear_ptr

kfifo_out_linear_ptr(fifo,ptr,n)

gets a pointer to the available data

Parameters

fifo

address of the fifo to be used

ptr

pointer to data to store the pointer to tail

n

max. number of elements to point at

Description

Similarly to kfifo_out_linear(), this macro obtains the pointer to the available data in the fifo buffer and returns the number of elements available. It returns the available count up to the end of the available data or up to the end of the buffer, so it can be used for linear data processing (like memcpy() of ptr with the count returned).

Note that with only one concurrent reader and one concurrent writer, you don’t need extra locking to use these macros.

relay interface support

Relay interface support is designed to provide an efficient mechanismfor tools and facilities to relay large amounts of data from kernelspace to user space.

relay interface

intrelay_buf_full(structrchan_buf*buf)

boolean, is the channel buffer full?

Parameters

structrchan_buf*buf

channel buffer

Returns 1 if the buffer is full, 0 otherwise.

voidrelay_reset(structrchan*chan)

reset the channel

Parameters

structrchan*chan

the channel

This has the effect of erasing all data from all channel buffers and restarting the channel in its initial state. The buffers are not freed, so any mappings are still in effect.

NOTE. Care should be taken that the channel isn’t actually being used by anything when this call is made.

structrchan*relay_open(constchar*base_filename,structdentry*parent,size_tsubbuf_size,size_tn_subbufs,conststructrchan_callbacks*cb,void*private_data)

create a new relay channel

Parameters

constchar*base_filename

base name of files to create

structdentry*parent

dentry of parent directory,NULL for root directory or buffer

size_tsubbuf_size

size of sub-buffers

size_tn_subbufs

number of sub-buffers

conststructrchan_callbacks*cb

client callback functions

void*private_data

user-defined data

Returns channel pointer if successful,NULL otherwise.

Creates a channel buffer for each cpu using the sizes and attributes specified. The created channel buffer files will be named base_filename0...base_filenameN-1. File permissions will be S_IRUSR.
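A sketch of a typical open (the directory, sizes and callback table are illustrative assumptions, and the rchan_callbacks implementation is omitted):

#include <linux/relay.h>
#include <linux/errno.h>

static struct rchan *sample_chan;

/* 8 sub-buffers of 64 KiB per CPU, rooted at the given debugfs directory. */
static int sample_relay_setup(struct dentry *dir,
                              const struct rchan_callbacks *cb)
{
        sample_chan = relay_open("trace", dir, 64 * 1024, 8, cb, NULL);
        if (!sample_chan)
                return -ENOMEM;

        return 0;
}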

size_trelay_switch_subbuf(structrchan_buf*buf,size_tlength)

switch to a new sub-buffer

Parameters

structrchan_buf*buf

channel buffer

size_tlength

size of current event

Returns either the length passed in or 0 if full.

Performs sub-buffer-switch tasks such as invoking callbacks,updating padding counts, waking up readers, etc.

voidrelay_subbufs_consumed(structrchan*chan,unsignedintcpu,size_tsubbufs_consumed)

update the buffer’s sub-buffers-consumed count

Parameters

structrchan*chan

the channel

unsignedintcpu

the cpu associated with the channel buffer to update

size_tsubbufs_consumed

number of sub-buffers to add to current buf’s count

Adds to the channel buffer’s consumed sub-buffer count. subbufs_consumed should be the number of sub-buffers newly consumed, not the total consumed.

NOTE. Kernel clients don’t need to call this function if the channel mode is ‘overwrite’.

voidrelay_close(structrchan*chan)

close the channel

Parameters

structrchan*chan

the channel

Closes all channel buffers and frees the channel.

voidrelay_flush(structrchan*chan)

close the channel

Parameters

structrchan*chan

the channel

Flushes all channel buffers, i.e. forces buffer switch.

intrelay_mmap_buf(structrchan_buf*buf,structvm_area_struct*vma)
  • mmap channel buffer to process address space

Parameters

structrchan_buf*buf

relay channel buffer

structvm_area_struct*vma

vm_area_struct describing memory to be mapped

Returns 0 if ok, negative on error

Caller should already have grabbed mmap_lock.

void*relay_alloc_buf(structrchan_buf*buf,size_t*size)

allocate a channel buffer

Parameters

structrchan_buf*buf

the buffer struct

size_t*size

total size of the buffer

Returns a pointer to the resulting buffer, NULL if unsuccessful. The passed in size will get page aligned, if it isn’t already.

structrchan_buf*relay_create_buf(structrchan*chan)

allocate and initialize a channel buffer

Parameters

structrchan*chan

the relay channel

Returns channel buffer if successful,NULL otherwise.

voidrelay_destroy_channel(structkref*kref)

free the channel struct

Parameters

structkref*kref

target kernel reference that contains the relay channel

Should only be called fromkref_put().

voidrelay_destroy_buf(structrchan_buf*buf)

destroy an rchan_buf struct and associated buffer

Parameters

structrchan_buf*buf

the buffer struct

voidrelay_remove_buf(structkref*kref)

remove a channel buffer

Parameters

structkref*kref

target kernel reference that contains the relay buffer

Removes the file from the filesystem, which also frees therchan_buf_struct and the channel buffer. Should only be called fromkref_put().

intrelay_buf_empty(structrchan_buf*buf)

boolean, is the channel buffer empty?

Parameters

structrchan_buf*buf

channel buffer

Returns 1 if the buffer is empty, 0 otherwise.

voidwakeup_readers(structirq_work*work)

wake up readers waiting on a channel

Parameters

structirq_work*work

contains the channel buffer

This is the function used to defer reader waking

void__relay_reset(structrchan_buf*buf,unsignedintinit)

reset a channel buffer

Parameters

structrchan_buf*buf

the channel buffer

unsignedintinit

1 if this is a first-time initialization

Seerelay_reset() for description of effect.

voidrelay_close_buf(structrchan_buf*buf)

close a channel buffer

Parameters

structrchan_buf*buf

channel buffer

Marks the buffer finalized and restores the default callbacks. The channel buffer and channel buffer data structure are then freed automatically when the last reference is given up.

intrelay_file_open(structinode*inode,structfile*filp)

open file op for relay files

Parameters

structinode*inode

the inode

structfile*filp

the file

Increments the channel buffer refcount.

intrelay_file_mmap(structfile*filp,structvm_area_struct*vma)

mmap file op for relay files

Parameters

structfile*filp

the file

structvm_area_struct*vma

the vma describing what to map

Calls uponrelay_mmap_buf() to map the file into user space.

__poll_trelay_file_poll(structfile*filp,poll_table*wait)

poll file op for relay files

Parameters

structfile*filp

the file

poll_table*wait

poll table

Poll implementation.

intrelay_file_release(structinode*inode,structfile*filp)

release file op for relay files

Parameters

structinode*inode

the inode

structfile*filp

the file

Decrements the channel refcount, as the filesystem isno longer using it.

size_trelay_file_read_subbuf_avail(size_tread_pos,structrchan_buf*buf)

return bytes available in sub-buffer

Parameters

size_tread_pos

file read position

structrchan_buf*buf

relay channel buffer

size_trelay_file_read_start_pos(structrchan_buf*buf)

find the first available byte to read

Parameters

structrchan_buf*buf

relay channel buffer

If the read_pos is in the middle of padding, return the position of the first actually available byte, otherwise return the original value.

size_trelay_file_read_end_pos(structrchan_buf*buf,size_tread_pos,size_tcount)

return the new read position

Parameters

structrchan_buf*buf

relay channel buffer

size_tread_pos

file read position

size_tcount

number of bytes to be read

Module Support

Kernel module auto-loading

int__request_module(boolwait,constchar*fmt,...)

try to load a kernel module

Parameters

boolwait

wait (or not) for the operation to complete

constchar*fmt

printf style format string for the name of the module

...

arguments as specified in the format string

Description

Load a module using the user mode module loader. The function returns zero on success or a negative errno code or positive exit code from “modprobe” on failure. Note that a successful module load does not mean the module did not then unload and exit on an error of its own. Callers must check that the service they requested is now available, not blindly invoke it.

If module auto-loading support is disabled then this function simply returns -ENOENT.
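A sketch of the usual calling pattern (the module name prefix is hypothetical); request_module() is the common wrapper that passes wait == true to __request_module():

#include <linux/kmod.h>

static int sample_load_proto(const char *name)
{
        int ret;

        ret = request_module("sample-proto-%s", name);
        if (ret)
                return ret;

        /* A successful return only means modprobe ran; callers must still
         * verify that the registration the module was expected to perform
         * actually happened before relying on it. */
        return 0;
}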

Module debugging

Enabling CONFIG_MODULE_STATS enables module debugging statistics whichare useful to monitor and root cause memory pressure issues with moduleloading. These statistics are useful to allow us to improve productionworkloads.

The current module debugging statistics supported help keep track of module loading failures to enable improvements either for kernel module auto-loading usage (request_module()) or interactions with userspace. Statistics are provided to track all possible failures in the finit_module() path and memory wasted in this process space. Each of the failure counters is associated with a type of module loading failure which is known to incur a certain amount of memory allocation loss. In the worst case loading a module will fail after a 3 step memory allocation process:

  1. memory allocated with kernel_read_file_from_fd()

  2. module decompression processes the file read from kernel_read_file_from_fd(), and vmap() is used to map the decompressed module to a new local buffer which represents a copy of the decompressed module passed from userspace. The buffer from kernel_read_file_from_fd() is freed right away.

  3. layout_and_allocate() allocates space for the final resting place where we would keep the module if it were to be processed successfully.

If a failure occurs after these three different allocations only one counter will be incremented with the summation of the allocated bytes freed incurred during this failure. Likewise, if module loading failed only after step 2) a separate counter is used and incremented for the bytes freed and not used during both of those allocations.

Virtual memory space can be limited, for example on x86 virtual memory size defaults to 128 MiB. We should strive to limit and avoid wasting virtual memory allocations when possible. These module debugging statistics help to evaluate how much memory is being wasted on bootup due to module loading failures.

All counters are designed to be incremental. Atomic counters are used so as to remain simple and avoid delays and deadlocks.

dup_failed_modules - tracks duplicate failed modules

Linked list of modules which failed to be loaded because an already existing module with the same name was already being processed or already loaded. The finit_module() system call incurs heavy virtual memory allocations. In the worst case an finit_module() system call can end up allocating virtual memory 3 times:

  1. kernel_read_file_from_fd() call uses vmalloc()

  2. optional module decompression uses vmap()

  3. layout_and_allocate() can use vzalloc() or an arch-specific variation of vmalloc to deal with ELF sections requiring special permissions

In practice on a typical boot today most finit_module() calls fail due to the module with the same name already being loaded or about to be processed. All virtual memory allocated to these failed modules will be freed with no functional use.

To help with this the dup_failed_modules allows us to track modules which failed to load due to the fact that a module was already loaded or being processed. There are only two points at which we can fail such calls, we list them below along with the number of virtual memory allocation calls:

  1. FAIL_DUP_MOD_BECOMING: at the end of early_mod_check() before layout_and_allocate().
     - with module decompression: 2 virtual memory allocation calls
     - without module decompression: 1 virtual memory allocation call

  2. FAIL_DUP_MOD_LOAD: after layout_and_allocate() on add_unformed_module().
     - with module decompression: 3 virtual memory allocation calls
     - without module decompression: 2 virtual memory allocation calls

We should strive to get this list to be as small as possible. If this list is not empty it is a reflection of possible work or optimizations possible either in-kernel or in userspace.

module statistics debugfs counters

The total amount of wasted virtual memory allocation space during moduleloading can be computed by adding the total from the summation:

  • invalid_kread_bytes +invalid_decompress_bytes +invalid_becoming_bytes +invalid_mod_bytes

The following debugfs counters are available to inspect module loadingfailures:

  • total_mod_size: total bytes ever used by all modules we’ve dealt with onthis system

  • total_text_size: total bytes of the .text and .init.text ELF sectionsizes we’ve dealt with on this system

  • invalid_kread_bytes: bytes allocated and then freed on failures whichhappen due to the initial kernel_read_file_from_fd(). kernel_read_file_from_fd()uses vmalloc(). These should typically not happen unless your system isunder memory pressure.

  • invalid_decompress_bytes: number of bytes allocated and freed due tomemory allocations in the module decompression path that usevmap().These typically should not happen unless your system is under memorypressure.

  • invalid_becoming_bytes: total number of bytes allocated and freed that were used to read the kernel module userspace wants us to read before we promote it to be processed and added to our modules linked list. These failures can happen in any check between a successful kernel_read_file_from_fd() call and right before we allocate our private memory for the module (which would be kept if the module is successfully loaded). The most common reason for this failure is when userspace is racing to load a module which it does not yet see loaded. The first module to succeed in add_unformed_module() will add the module to our modules list, and subsequent loads of modules with the same name will error out at the end of early_mod_check(). The check for module_patient_check_exists() at the end of early_mod_check() prevents duplicate allocations in layout_and_allocate() for modules already being processed. These duplicate failed modules are non-fatal, however they are typically indicative of userspace not yet seeing a module as loaded and unnecessarily trying to load a module before the kernel even has a chance to begin to process prior requests. Although duplicate failures can be non-fatal, we should try to reduce vmalloc() pressure proactively, so ideally after boot this counter should be as close to 0 as possible. If module decompression was used, we also add to this counter the cost of the initial kernel_read_file_from_fd() of the compressed module. If module decompression was not used, the value represents the total allocated and freed bytes in kernel_read_file_from_fd() calls for these types of failures. These failures can occur because of:

  • module_sig_check() - module signature checks

  • elf_validity_cache_copy() - some ELF validation issue

  • early_mod_check():

    • blacklisting

    • failed to rewrite section headers

    • version magic

    • live patch requirements didn’t check out

    • the module was detected as being already present

  • invalid_mod_bytes: the total number of bytes allocated and freed due to failures after we did all the sanity checks of the module which userspace passed to us and after our first check that the module is unique. A module can still fail to load if we detect that the module is already loaded after we allocate space for it with layout_and_allocate(); we do this check right before processing the module as live and running its initialization routines. Note that a failure of this type also means the respective kernel_read_file_from_fd() memory space was also freed and not used, and so we increment this counter with twice the size of the module. Additionally, if you used module decompression, the size of the compressed module is also added to this counter.

  • modcount: how many modules we’ve loaded in our kernel’s lifetime

  • failed_kreads: how many modules failed due to failed kernel_read_file_from_fd()

  • failed_decompress: how many failed module decompression attempts we’ve had. These really should not happen unless your compression / decompression is broken.

  • failed_becoming: how many modules failed after we kernel_read_file_from_fd() it and before we allocate memory for it with layout_and_allocate(). This counter is never incremented if you manage to validate the module and call layout_and_allocate() for it.

  • failed_load_modules: how many modules failed once we’ve allocated our private space for the module using layout_and_allocate(). These failures should hopefully mostly be dealt with already. Races in theory could still exist here, but it would just mean the kernel had started processing two threads concurrently up to early_mod_check() and one thread won. These failures are good signs that the kernel or userspace is doing something seriously stupid or that could be improved. We should strive to fix these, but it is perhaps not easy to fix them. A recent example is the module requests incurred for frequency modules, where a separate module request was being issued for each CPU on the system.

Inter Module support

Refer to the files in kernel/module/ for more information.

Hardware Interfaces

DMA Channels

intrequest_dma(unsignedintdmanr,constchar*device_id)

request and reserve a system DMA channel

Parameters

unsignedintdmanr

DMA channel number

constchar*device_id

reserving device ID string, used in /proc/dma

voidfree_dma(unsignedintdmanr)

free a reserved system DMA channel

Parameters

unsignedintdmanr

DMA channel number
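
As a usage illustration, a hypothetical driver could claim and release a channel as sketched below; the channel number and device name are made up:

    #include <linux/errno.h>
    #include <asm/dma.h>

    #define MYDEV_DMA_CHANNEL 3

    static int mydev_claim_dma(void)
    {
            /* request_dma() returns 0 on success, a negative errno otherwise. */
            int ret = request_dma(MYDEV_DMA_CHANNEL, "mydev");

            if (ret)
                    return ret;             /* channel busy or invalid */

            /* ... program the controller and perform transfers ... */

            free_dma(MYDEV_DMA_CHANNEL);    /* give the channel back */
            return 0;
    }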

Resources Management

structresource*request_resource_conflict(structresource*root,structresource*new)

request and reserve an I/O or memory resource

Parameters

structresource*root

root resource descriptor

structresource*new

resource descriptor desired by caller

Description

Returns 0 for success, conflict resource on error.

intfind_next_iomem_res(resource_size_tstart,resource_size_tend,unsignedlongflags,unsignedlongdesc,structresource*res)

Finds the lowest iomem resource that covers part of [start..end].

Parameters

resource_size_tstart

start address of the resource searched for

resource_size_tend

end address of same resource

unsignedlongflags

flags which the resource must have

unsignedlongdesc

descriptor the resource must have

structresource*res

return ptr, if resource found

Description

If a resource is found, returns 0 and *res is overwritten with the part of the resource that’s within [start..end]; if none is found, returns -ENODEV. Returns -EINVAL for invalid parameters.

The caller must specify start, end, flags, and desc (which may be IORES_DESC_NONE).

intreallocate_resource(structresource*root,structresource*old,resource_size_tnewsize,structresource_constraint*constraint)

allocate a slot in the resource tree given range & alignment. The resource will be relocated if the new size cannot be reallocated in the current location.

Parameters

structresource*root

root resource descriptor

structresource*old

resource descriptor desired by caller

resource_size_tnewsize

new size of the resource descriptor

structresource_constraint*constraint

the memory range and alignment constraints to be met.

structresource*lookup_resource(structresource*root,resource_size_tstart)

find an existing resource by a resource start address

Parameters

structresource*root

root resource descriptor

resource_size_tstart

resource start address

Description

Returns a pointer to the resource if found, NULL otherwise

structresource*insert_resource_conflict(structresource*parent,structresource*new)

Inserts resource in the resource tree

Parameters

structresource*parent

parent of the new resource

structresource*new

new resource to insert

Description

Returns 0 on success, conflict resource if the resource can’t be inserted.

This function is equivalent to request_resource_conflict when no conflict happens. If a conflict happens, and the conflicting resources entirely fit within the range of the new resource, then the new resource is inserted and the conflicting resources become children of the new resource.

This function is intended for producers of resources, such as FW modulesand bus drivers.

resource_size_tresource_alignment(structresource*res)

calculate resource’s alignment

Parameters

structresource*res

resource pointer

Description

Returns alignment on success, 0 (invalid alignment) on failure.

voidrelease_mem_region_adjustable(resource_size_tstart,resource_size_tsize)

release a previously reserved memory region

Parameters

resource_size_tstart

resource start address

resource_size_tsize

resource region size

Description

This interface is intended for memory hot-delete. The requested region is released from a currently busy memory resource. The requested region must either match exactly or fit into a single busy resource entry. In the latter case, the remaining resource is adjusted accordingly. Existing children of the busy memory resource must be immutable in the request.

Note

  • Additional release conditions, such as overlapping region, can be supported after they are confirmed as valid cases.

  • When a busy memory resource gets split into two entries, the code assumes that all children remain in the lower address entry for simplicity. Enhance this logic when necessary.

voidmerge_system_ram_resource(structresource*res)

mark the System RAM resource mergeable and try to merge it with adjacent, mergeable resources

Parameters

structresource*res

resource descriptor

Description

This interface is intended for memory hotplug, whereby lots of contiguous system ram resources are added (e.g., via add_memory*()) by a driver, and the actual resource boundaries are not of interest (e.g., it might be relevant for DIMMs). Only resources that are marked mergeable, that have the same parent, and that don’t have any children are considered. All mergeable resources must be immutable during the request.

Note

  • The caller has to make sure that no pointers to resources that are marked mergeable are used anymore after this call - the resource might be freed and the pointer might be stale!

  • release_mem_region_adjustable() will split on demand on memory hotunplug

intrequest_resource(structresource*root,structresource*new)

request and reserve an I/O or memory resource

Parameters

structresource*root

root resource descriptor

structresource*new

resource descriptor desired by caller

Description

Returns 0 for success, negative error code on error.

intrelease_resource(structresource*old)

release a previously reserved resource

Parameters

structresource*old

resource pointer

intwalk_iomem_res_desc(unsignedlongdesc,unsignedlongflags,u64start,u64end,void*arg,int(*func)(structresource*,void*))

Walks through iomem resources and calls func() with matching resource ranges.

Parameters

unsignedlongdesc

I/O resource descriptor. Use IORES_DESC_NONE to skip the desc check.

unsignedlongflags

I/O resource flags

u64start

start addr

u64end

end addr

void*arg

function argument for the callback func

int(*func)(structresource*,void*)

callback function that is called for each qualifying resource area

Description

All the memory ranges which overlap [start, end] and also match flags and desc are valid candidates.

NOTE

For a new descriptor search, define a new IORES_DESC in <linux/ioport.h> and set it in ‘desc’ of a target resource entry.
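
An illustrative sketch (not taken from any particular driver) of walking resources with a callback, assuming persistent-memory ranges are what we want to count:

    #include <linux/ioport.h>

    static int count_range(struct resource *res, void *arg)
    {
            unsigned int *count = arg;

            (*count)++;     /* res covers part of the requested window */
            return 0;       /* keep walking; a non-zero return stops the walk */
    }

    static unsigned int count_pmem_ranges(u64 start, u64 end)
    {
            unsigned int count = 0;

            /* Match busy IORESOURCE_MEM ranges tagged as persistent memory. */
            walk_iomem_res_desc(IORES_DESC_PERSISTENT_MEMORY,
                                IORESOURCE_MEM | IORESOURCE_BUSY,
                                start, end, &count, count_range);
            return count;
    }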

intregion_intersects(resource_size_tstart,size_tsize,unsignedlongflags,unsignedlongdesc)

determine intersection of region with known resources

Parameters

resource_size_tstart

region start address

size_tsize

size of region

unsignedlongflags

flags of resource (in iomem_resource)

unsignedlongdesc

descriptor of resource (in iomem_resource) or IORES_DESC_NONE

Description

Check if the specified region partially overlaps or fully eclipses a resource identified by flags and desc (optional with IORES_DESC_NONE). Return REGION_DISJOINT if the region does not overlap flags/desc, return REGION_MIXED if the region overlaps flags/desc and another resource, and return REGION_INTERSECTS if the region overlaps flags/desc and no other defined resource. Note that REGION_INTERSECTS is also returned in the case when the specified region overlaps RAM and undefined memory holes.

region_intersects() is used by memory remapping functions to ensure the user is not remapping RAM; it is a vast speedup over walking through the resource table page by page.
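
A hedged sketch of that typical pattern, refusing to remap a range that overlaps System RAM:

    #include <linux/ioport.h>
    #include <linux/errno.h>

    static int check_not_ram(resource_size_t start, size_t size)
    {
            int ret = region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
                                        IORES_DESC_NONE);

            /* REGION_DISJOINT means the range touches no System RAM at all. */
            return ret == REGION_DISJOINT ? 0 : -EBUSY;
    }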

intfind_resource_space(structresource*root,structresource*new,resource_size_tsize,structresource_constraint*constraint)

Find empty space in the resource tree

Parameters

structresource*root

Root resource descriptor

structresource*new

Resource descriptor awaiting an empty resource space

resource_size_tsize

The minimum size of the empty space

structresource_constraint*constraint

The range and alignment constraints to be met

Description

Finds an empty space under root in the resource tree satisfying the range and alignment constraints.

Return

  • 0 - if successful; new’s start, end, and flags members are altered.

  • -EBUSY - if no empty space was found.

intallocate_resource(structresource*root,structresource*new,resource_size_tsize,resource_size_tmin,resource_size_tmax,resource_size_talign,resource_alignfalignf,void*alignf_data)

allocate empty slot in the resource tree given range & alignment. The resource will be reallocated with a new size if it was already allocated

Parameters

structresource*root

root resource descriptor

structresource*new

resource descriptor desired by caller

resource_size_tsize

requested resource region size

resource_size_tmin

minimum boundary to allocate

resource_size_tmax

maximum boundary to allocate

resource_size_talign

alignment requested, in bytes

resource_alignfalignf

alignment function, optional, called if not NULL

void*alignf_data

arbitrary data to pass to the alignf function
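
A minimal sketch with made-up addresses: carve a 1 MiB, 1 MiB-aligned window out of the iomem tree somewhere inside a fictitious 0x90000000-0xa0000000 range:

    #include <linux/ioport.h>
    #include <linux/sizes.h>

    static struct resource mydev_window = {
            .name  = "mydev window",
            .flags = IORESOURCE_MEM,
    };

    static int mydev_alloc_window(void)
    {
            /* No custom alignf callback: pass NULL and let size/align rule. */
            return allocate_resource(&iomem_resource, &mydev_window,
                                     SZ_1M, 0x90000000, 0xa0000000,
                                     SZ_1M, NULL, NULL);
    }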

intinsert_resource(structresource*parent,structresource*new)

Inserts a resource in the resource tree

Parameters

structresource*parent

parent of the new resource

structresource*new

new resource to insert

Description

Returns 0 on success, -EBUSY if the resource can’t be inserted.

This function is intended for producers of resources, such as FW modules and bus drivers.

voidinsert_resource_expand_to_fit(structresource*root,structresource*new)

Insert a resource into the resource tree

Parameters

structresource*root

root resource descriptor

structresource*new

new resource to insert

Description

Insert a resource into the resource tree, possibly expanding it in order to make it encompass any conflicting resources.

intremove_resource(structresource*old)

Remove a resource in the resource tree

Parameters

structresource*old

resource to remove

Description

Returns 0 on success, -EINVAL if the resource is not valid.

This function removes a resource previously inserted by insert_resource() or insert_resource_conflict(), and moves the children (if any) up to where they were before. insert_resource() and insert_resource_conflict() insert a new resource, and move any conflicting resources down to the children of the new resource.

insert_resource(), insert_resource_conflict() and remove_resource() are intended for producers of resources, such as FW modules and bus drivers.
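
A sketch of that producer-side pairing, using an invented firmware-described range:

    #include <linux/ioport.h>

    static struct resource fw_region = {
            .name  = "example-fw-region",
            .start = 0xfed40000,
            .end   = 0xfed44fff,
            .flags = IORESOURCE_MEM,
    };

    static int publish_fw_region(void)
    {
            return insert_resource(&iomem_resource, &fw_region);
    }

    static void unpublish_fw_region(void)
    {
            /* Any children gained in the meantime move back up a level. */
            remove_resource(&fw_region);
    }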

intadjust_resource(structresource*res,resource_size_tstart,resource_size_tsize)

modify a resource’s start and size

Parameters

structresource*res

resource to modify

resource_size_tstart

new start value

resource_size_tsize

new size

Description

Given an existing resource, change its start and size to match the arguments. Returns 0 on success, -EBUSY if it can’t fit. Existing children of the resource are assumed to be immutable.

structresource*__request_region(structresource*parent,resource_size_tstart,resource_size_tn,constchar*name,intflags)

create a new busy resource region

Parameters

structresource*parent

parent resource descriptor

resource_size_tstart

resource start address

resource_size_tn

resource region size

constchar*name

reserving caller’s ID string

intflags

IO resource flags

void__release_region(structresource*parent,resource_size_tstart,resource_size_tn)

release a previously reserved resource region

Parameters

structresource*parent

parent resource descriptor

resource_size_tstart

resource start address

resource_size_tn

resource region size

Description

The described resource region must match a currently busy region.
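
A sketch of a busy region claim against the iomem tree; in drivers this is normally spelled via the request_mem_region()/release_mem_region() helpers, which wrap these two functions:

    #include <linux/ioport.h>
    #include <linux/errno.h>

    static int claim_mmio(resource_size_t base, resource_size_t len)
    {
            struct resource *res;

            res = __request_region(&iomem_resource, base, len, "mydev", 0);
            if (!res)
                    return -EBUSY;          /* someone else owns the range */

            /* ... ioremap() and use the hardware ... */

            __release_region(&iomem_resource, base, len);
            return 0;
    }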

intdevm_request_resource(structdevice*dev,structresource*root,structresource*new)

request and reserve an I/O or memory resource

Parameters

structdevice*dev

device for which to request the resource

structresource*root

root of the resource tree from which to request the resource

structresource*new

descriptor of the resource to request

Description

This is a device-managed version of request_resource(). There is usually no need to release resources requested by this function explicitly since that will be taken care of when the device is unbound from its driver. If for some reason the resource needs to be released explicitly, because of ordering issues for example, drivers must call devm_release_resource() rather than the regular release_resource().

When a conflict is detected between any existing resources and the newly requested resource, an error message will be printed.

Returns 0 on success or a negative error code on failure.
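
A hedged probe() sketch; the device name, addresses, and platform_device usage are purely illustrative:

    #include <linux/device.h>
    #include <linux/ioport.h>
    #include <linux/platform_device.h>

    static struct resource mydev_regs = {
            .name  = "mydev registers",
            .start = 0xd0000000,
            .end   = 0xd0000fff,
            .flags = IORESOURCE_MEM,
    };

    static int mydev_probe(struct platform_device *pdev)
    {
            int ret;

            /* Released automatically when the device is unbound. */
            ret = devm_request_resource(&pdev->dev, &iomem_resource, &mydev_regs);
            if (ret)
                    return ret;     /* conflict already logged by the core */

            return 0;
    }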

voiddevm_release_resource(structdevice*dev,structresource*new)

release a previously requested resource

Parameters

structdevice*dev

device for which to release the resource

structresource*new

descriptor of the resource to release

Description

Releases a resource previously requested using devm_request_resource().

structresource*devm_request_free_mem_region(structdevice*dev,structresource*base,unsignedlongsize)

find free region for device private memory

Parameters

structdevice*dev

device struct to bind the resource to

structresource*base

resource tree to look in

unsignedlongsize

size in bytes of the device memory to add

Description

This function tries to find an empty range of physical addresses big enough to contain the new resource, so that it can later be hotplugged as ZONE_DEVICE memory, which in turn allocates struct pages.

structresource*alloc_free_mem_region(structresource*base,unsignedlongsize,unsignedlongalign,constchar*name)

find a free region relative to base

Parameters

structresource*base

resource that will parent the new resource

unsignedlongsize

size in bytes of memory to allocate from base

unsignedlongalign

alignment requirements for the allocation

constchar*name

resource name

Description

Buses like CXL, which can dynamically instantiate new memory regions, need a method to allocate physical address space for those regions. Allocate and insert a new resource to cover a free range in the span of base that is not claimed by any descendant of base.

MTRR Handling

intarch_phys_wc_add(unsignedlongbase,unsignedlongsize)

add a WC MTRR and handle errors if PAT is unavailable

Parameters

unsignedlongbase

Physical base address

unsignedlongsize

Size of region

Description

If PAT is available, this does nothing. If PAT is unavailable, it attempts to add a WC MTRR covering size bytes starting at base and logs an error if this fails.

The caller should provide a power-of-two size on an equivalent power-of-two boundary.

Drivers must store the return value to pass to mtrr_del_wc_if_needed, but drivers should not try to interpret that return value.
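
A typical framebuffer-style use, sketched with invented fields; in current kernels the matching teardown helper is arch_phys_wc_del():

    #include <linux/io.h>

    struct myfb {
            unsigned long fb_base;
            unsigned long fb_size;
            int wc_cookie;
    };

    static void myfb_enable_wc(struct myfb *fb)
    {
            /* Opaque cookie: only ever handed back to the deletion helper. */
            fb->wc_cookie = arch_phys_wc_add(fb->fb_base, fb->fb_size);
    }

    static void myfb_disable_wc(struct myfb *fb)
    {
            arch_phys_wc_del(fb->wc_cookie);
    }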

Security Framework

intsecurity_init(void)

initializes the security framework

Parameters

void

no arguments

Description

This should be called early in the kernel initialization sequence.

voidsecurity_add_hooks(structsecurity_hook_list*hooks,intcount,conststructlsm_id*lsmid)

Add a module’s hooks to the hook lists.

Parameters

structsecurity_hook_list*hooks

the hooks to add

intcount

the number of hooks to add

conststructlsm_id*lsmid

the identification information for the security module

Description

Each LSM has to register its hooks with the infrastructure.
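
A skeleton of an LSM registering a single hook, assuming a made-up "toylsm" name and a placeholder LSM id (real ids come from uapi/linux/lsm.h):

    #include <linux/lsm_hooks.h>

    static int toylsm_file_open(struct file *file)
    {
            return 0;       /* always allow */
    }

    static struct security_hook_list toylsm_hooks[] __ro_after_init = {
            LSM_HOOK_INIT(file_open, toylsm_file_open),
    };

    static const struct lsm_id toylsm_lsmid = {
            .name = "toylsm",
            .id   = 107,    /* placeholder id for illustration only */
    };

    static int __init toylsm_init(void)
    {
            security_add_hooks(toylsm_hooks, ARRAY_SIZE(toylsm_hooks),
                               &toylsm_lsmid);
            return 0;
    }

    DEFINE_LSM(toylsm) = {
            .name = "toylsm",
            .init = toylsm_init,
    };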

intlsm_blob_alloc(void**dest,size_tsize,gfp_tgfp)

allocate a composite blob

Parameters

void**dest

the destination for the blob

size_tsize

the size of the blob

gfp_tgfp

allocation type

Description

Allocate a blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

intlsm_cred_alloc(structcred*cred,gfp_tgfp)

allocate a composite cred blob

Parameters

structcred*cred

the cred that needs a blob

gfp_tgfp

allocation type

Description

Allocate the cred blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

voidlsm_early_cred(structcred*cred)

during initialization allocate a composite cred blob

Parameters

structcred*cred

the cred that needs a blob

Description

Allocate the cred blob for all the modules

intlsm_file_alloc(structfile*file)

allocate a composite file blob

Parameters

structfile*file

the file that needs a blob

Description

Allocate the file blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

intlsm_inode_alloc(structinode*inode,gfp_tgfp)

allocate a composite inode blob

Parameters

structinode*inode

the inode that needs a blob

gfp_tgfp

allocation flags

Description

Allocate the inode blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

intlsm_task_alloc(structtask_struct*task)

allocate a composite task blob

Parameters

structtask_struct*task

the task that needs a blob

Description

Allocate the task blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

intlsm_ipc_alloc(structkern_ipc_perm*kip)

allocate a composite ipc blob

Parameters

structkern_ipc_perm*kip

the ipc that needs a blob

Description

Allocate the ipc blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

intlsm_key_alloc(structkey*key)

allocate a composite key blob

Parameters

structkey*key

the key that needs a blob

Description

Allocate the key blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

intlsm_msg_msg_alloc(structmsg_msg*mp)

allocate a composite msg_msg blob

Parameters

structmsg_msg*mp

the msg_msg that needs a blob

Description

Allocate the ipc blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

intlsm_bdev_alloc(structblock_device*bdev)

allocate a composite block_device blob

Parameters

structblock_device*bdev

the block_device that needs a blob

Description

Allocate the block_device blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

voidlsm_early_task(structtask_struct*task)

during initialization allocate a composite task blob

Parameters

structtask_struct*task

the task that needs a blob

Description

Allocate the task blob for all the modules

intlsm_superblock_alloc(structsuper_block*sb)

allocate a composite superblock blob

Parameters

structsuper_block*sb

the superblock that needs a blob

Description

Allocate the superblock blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

intlsm_fill_user_ctx(structlsm_ctx__user*uctx,u32*uctx_len,void*val,size_tval_len,u64id,u64flags)

Fill a user space lsm_ctx structure

Parameters

structlsm_ctx__user*uctx

a userspace LSM context to be filled

u32*uctx_len

available uctx size (input), used uctx size (output)

void*val

the new LSM context value

size_tval_len

the size of the new LSM context value

u64id

LSM id

u64flags

LSM defined flags

Description

Fill all of the fields in a userspace lsm_ctx structure. If uctx is NULL, simply calculate the required size to output via uctx_len and return success.

Returns 0 on success, -E2BIG if the userspace buffer is not large enough, -EFAULT on a copyout error, -ENOMEM if memory can’t be allocated.

intsecurity_binder_set_context_mgr(conststructcred*mgr)

Check if becoming binder ctx mgr is ok

Parameters

conststructcred*mgr

task credentials of current binder process

Description

Check whether mgr is allowed to be the binder context manager.

Return

Return 0 if permission is granted.

intsecurity_binder_transaction(conststructcred*from,conststructcred*to)

Check if a binder transaction is allowed

Parameters

conststructcred*from

sending process

conststructcred*to

receiving process

Description

Check whether from is allowed to invoke a binder transaction call to to.

Return

Returns 0 if permission is granted.

intsecurity_binder_transfer_binder(conststructcred*from,conststructcred*to)

Check if a binder transfer is allowed

Parameters

conststructcred*from

sending process

conststructcred*to

receiving process

Description

Check whether from is allowed to transfer a binder reference to to.

Return

Returns 0 if permission is granted.

intsecurity_binder_transfer_file(conststructcred*from,conststructcred*to,conststructfile*file)

Check if a binder file xfer is allowed

Parameters

conststructcred*from

sending process

conststructcred*to

receiving process

conststructfile*file

file being transferred

Description

Check whether from is allowed to transfer file to to.

Return

Returns 0 if permission is granted.

intsecurity_ptrace_access_check(structtask_struct*child,unsignedintmode)

Check if tracing is allowed

Parameters

structtask_struct*child

target process

unsignedintmode

PTRACE_MODE flags

Description

Check permission before allowing the current process to trace the child process. Security modules may also want to perform a process tracing check during an execve in the bprm_set_creds hook of binprm_security_ops if the process is being traced and its security attributes would be changed by the execve.

Return

Returns 0 if permission is granted.

intsecurity_ptrace_traceme(structtask_struct*parent)

Check if tracing is allowed

Parameters

structtask_struct*parent

tracing process

Description

Check that the parent process has sufficient permission to trace the current process before allowing the current process to present itself to the parent process for tracing.

Return

Returns 0 if permission is granted.

intsecurity_capget(conststructtask_struct*target,kernel_cap_t*effective,kernel_cap_t*inheritable,kernel_cap_t*permitted)

Get the capability sets for a process

Parameters

conststructtask_struct*target

target process

kernel_cap_t*effective

effective capability set

kernel_cap_t*inheritable

inheritable capability set

kernel_cap_t*permitted

permitted capability set

Description

Get the effective, inheritable, and permitted capability sets for the target process. The hook may also perform permission checking to determine if the current process is allowed to see the capability sets of the target process.

Return

Returns 0 if the capability sets were successfully obtained.

intsecurity_capset(structcred*new,conststructcred*old,constkernel_cap_t*effective,constkernel_cap_t*inheritable,constkernel_cap_t*permitted)

Set the capability sets for a process

Parameters

structcred*new

new credentials for the target process

conststructcred*old

current credentials of the target process

constkernel_cap_t*effective

effective capability set

constkernel_cap_t*inheritable

inheritable capability set

constkernel_cap_t*permitted

permitted capability set

Description

Set the effective, inheritable, and permitted capability sets for the current process.

Return

Returns 0 and updates new if permission is granted.

intsecurity_capable(conststructcred*cred,structuser_namespace*ns,intcap,unsignedintopts)

Check if a process has the necessary capability

Parameters

conststructcred*cred

credentials to examine

structuser_namespace*ns

user namespace

intcap

capability requested

unsignedintopts

capability check options

Description

Check whether the tsk process has the cap capability in the indicated credentials. cap contains the capability <include/linux/capability.h>. opts contains options for the capable check <include/linux/security.h>.

Return

Returns 0 if the capability is granted.

intsecurity_quotactl(intcmds,inttype,intid,conststructsuper_block*sb)

Check if a quotactl() syscall is allowed for this fs

Parameters

intcmds

commands

inttype

type

intid

id

conststructsuper_block*sb

filesystem

Description

Check whether the quotactl syscall is allowed for this sb.

Return

Returns 0 if permission is granted.

intsecurity_quota_on(structdentry*dentry)

Check if QUOTAON is allowed for a dentry

Parameters

structdentry*dentry

dentry

Description

Check whether QUOTAON is allowed for dentry.

Return

Returns 0 if permission is granted.

intsecurity_syslog(inttype)

Check if accessing the kernel message ring is allowed

Parameters

inttype

SYSLOG_ACTION_* type

Description

Check permission before accessing the kernel message ring or changing logging to the console. See the syslog(2) manual page for an explanation of the type values.

Return

Return 0 if permission is granted.

intsecurity_settime64(conststructtimespec64*ts,conststructtimezone*tz)

Check if changing the system time is allowed

Parameters

conststructtimespec64*ts

new time

conststructtimezone*tz

timezone

Description

Check permission to change the system time. struct timespec64 is defined in <include/linux/time64.h> and timezone is defined in <include/linux/time.h>.

Return

Returns 0 if permission is granted.

intsecurity_vm_enough_memory_mm(structmm_struct*mm,longpages)

Check if allocating a new mem map is allowed

Parameters

structmm_struct*mm

mm struct

longpages

number of pages

Description

Check permissions for allocating a new virtual mapping. If all LSMs return a positive value, __vm_enough_memory() will be called with cap_sys_admin set. If at least one LSM returns 0 or negative, __vm_enough_memory() will be called with cap_sys_admin cleared.

Return

Returns 0 if permission is granted by the LSM infrastructure to the

caller.

intsecurity_bprm_creds_for_exec(structlinux_binprm*bprm)

Prepare the credentials for exec()

Parameters

structlinux_binprm*bprm

binary program information

Description

If the setup in prepare_exec_creds did not set up bprm->cred->security properly for executing bprm->file, update the LSM’s portion of bprm->cred->security to be what commit_creds needs to install for the new program. This hook may also optionally check permissions (e.g. for transitions between security domains). The hook must set bprm->secureexec to 1 if AT_SECURE should be set to request libc enable secure mode. bprm contains the linux_binprm structure.

If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is set. The result must be the same as without this flag even if the execution will never really happen and bprm will always be dropped.

This hook must not change current->cred, only bprm->cred.

Return

Returns 0 if the hook is successful and permission is granted.

intsecurity_bprm_creds_from_file(structlinux_binprm*bprm,conststructfile*file)

Update linux_binprm creds based on file

Parameters

structlinux_binprm*bprm

binary program information

conststructfile*file

associated file

Description

If file is setpcap, suid, sgid or otherwise marked to change privilege upon exec, update bprm->cred to reflect that change. This is called after finding the binary that will be executed without an interpreter. This ensures that the credentials will not be derived from a script that the binary will need to reopen, which when reopened may end up being a completely different file. This hook may also optionally check permissions (e.g. for transitions between security domains). The hook must set bprm->secureexec to 1 if AT_SECURE should be set to request libc enable secure mode. The hook must add to bprm->per_clear any personality flags that should be cleared from current->personality. bprm contains the linux_binprm structure.

Return

Returns 0 if the hook is successful and permission is granted.

intsecurity_bprm_check(structlinux_binprm*bprm)

Mediate binary handler search

Parameters

structlinux_binprm*bprm

binary program information

Description

This hook mediates the point when a search for a binary handler will begin. It allows a check against the bprm->cred->security value which was set in the preceding creds_for_exec call. The argv list and envp list are reliably available in bprm. This hook may be called multiple times during a single execve. bprm contains the linux_binprm structure.

Return

Returns 0 if the hook is successful and permission is granted.

voidsecurity_bprm_committing_creds(conststructlinux_binprm*bprm)

Install creds for a process during exec()

Parameters

conststructlinux_binprm*bprm

binary program information

Description

Prepare to install the new security attributes of a process being transformed by an execve operation, based on the old credentials pointed to by current->cred and the information set in bprm->cred by the bprm_creds_for_exec hook. bprm points to the linux_binprm structure. This hook is a good place to perform state changes on the process such as closing open file descriptors to which access will no longer be granted when the attributes are changed. This is called immediately before commit_creds().

voidsecurity_bprm_committed_creds(conststructlinux_binprm*bprm)

Tidy up after cred install during exec()

Parameters

conststructlinux_binprm*bprm

binary program information

Description

Tidy up after the installation of the new security attributes of a process being transformed by an execve operation. The new credentials have, by this point, been set to current->cred. bprm points to the linux_binprm structure. This hook is a good place to perform state changes on the process such as clearing out non-inheritable signal state. This is called immediately after commit_creds().

intsecurity_fs_context_submount(structfs_context*fc,structsuper_block*reference)

Initialise fc->security

Parameters

structfs_context*fc

new filesystem context

structsuper_block*reference

dentry reference for submount/remount

Description

Fill out the ->security field for a new fs_context.

Return

Returns 0 on success or negative error code on failure.

intsecurity_fs_context_dup(structfs_context*fc,structfs_context*src_fc)

Duplicate a fs_context LSM blob

Parameters

structfs_context*fc

destination filesystem context

structfs_context*src_fc

source filesystem context

Description

Allocate and attach a security structure to fc->security. This pointer is initialised to NULL by the caller. fc indicates the new filesystem context. src_fc indicates the original filesystem context.

Return

Returns 0 on success or a negative error code on failure.

intsecurity_fs_context_parse_param(structfs_context*fc,structfs_parameter*param)

Configure a filesystem context

Parameters

structfs_context*fc

filesystem context

structfs_parameter*param

filesystem parameter

Description

Userspace provided a parameter to configure a superblock. The LSM can consume the parameter or return it to the caller for use elsewhere.

Return

If the parameter is used by the LSM it should return 0, if it is

returned to the caller -ENOPARAM is returned, otherwise a negative error code is returned.

intsecurity_sb_alloc(structsuper_block*sb)

Allocate a super_block LSM blob

Parameters

structsuper_block*sb

filesystem superblock

Description

Allocate and attach a security structure to the sb->s_security field. The s_security field is initialized to NULL when the structure is allocated. sb contains the super_block structure to be modified.

Return

Returns 0 if operation was successful.

voidsecurity_sb_delete(structsuper_block*sb)

Release super_block LSM associated objects

Parameters

structsuper_block*sb

filesystem superblock

Description

Release objects tied to a superblock (e.g. inodes). sb contains the super_block structure being released.

voidsecurity_sb_free(structsuper_block*sb)

Free a super_block LSM blob

Parameters

structsuper_block*sb

filesystem superblock

Description

Deallocate and clear the sb->s_security field. sb contains the super_block structure to be modified.

intsecurity_sb_kern_mount(conststructsuper_block*sb)

Check if a kernel mount is allowed

Parameters

conststructsuper_block*sb

filesystem superblock

Description

Mount this sb if allowed by permissions.

Return

Returns 0 if permission is granted.

intsecurity_sb_show_options(structseq_file*m,structsuper_block*sb)

Output the mount options for a superblock

Parameters

structseq_file*m

output file

structsuper_block*sb

filesystem superblock

Description

Show (print on m) mount options for this sb.

Return

Returns 0 on success, negative values on failure.

intsecurity_sb_statfs(structdentry*dentry)

Check if accessing fs stats is allowed

Parameters

structdentry*dentry

superblock handle

Description

Check permission before obtaining filesystem statistics for the mnt mountpoint. dentry is a handle on the superblock for the filesystem.

Return

Returns 0 if permission is granted.

intsecurity_sb_mount(constchar*dev_name,conststructpath*path,constchar*type,unsignedlongflags,void*data)

Check permission for mounting a filesystem

Parameters

constchar*dev_name

filesystem backing device

conststructpath*path

mount point

constchar*type

filesystem type

unsignedlongflags

mount flags

void*data

filesystem specific data

Description

Check permission before an object specified by dev_name is mounted on the mount point named by nd. For an ordinary mount, dev_name identifies a device if the file system type requires a device. For a remount (flags & MS_REMOUNT), dev_name is irrelevant. For a loopback/bind mount (flags & MS_BIND), dev_name identifies the pathname of the object being mounted.

Return

Returns 0 if permission is granted.

intsecurity_sb_umount(structvfsmount*mnt,intflags)

Check permission for unmounting a filesystem

Parameters

structvfsmount*mnt

mounted filesystem

intflags

unmount flags

Description

Check permission before the mnt file system is unmounted.

Return

Returns 0 if permission is granted.

intsecurity_sb_pivotroot(conststructpath*old_path,conststructpath*new_path)

Check permissions for pivoting the rootfs

Parameters

conststructpath*old_path

new location for current rootfs

conststructpath*new_path

location of the new rootfs

Description

Check permission before pivoting the root filesystem.

Return

Returns 0 if permission is granted.

intsecurity_move_mount(conststructpath*from_path,conststructpath*to_path)

Check permissions for moving a mount

Parameters

conststructpath*from_path

source mount point

conststructpath*to_path

destination mount point

Description

Check permission before a mount is moved.

Return

Returns 0 if permission is granted.

intsecurity_path_notify(conststructpath*path,u64mask,unsignedintobj_type)

Check if setting a watch is allowed

Parameters

conststructpath*path

file path

u64mask

event mask

unsignedintobj_type

file path type

Description

Check permissions before setting a watch on events as defined by mask, on an object at path, whose type is defined by obj_type.

Return

Returns 0 if permission is granted.

intsecurity_inode_alloc(structinode*inode,gfp_tgfp)

Allocate an inode LSM blob

Parameters

structinode*inode

the inode

gfp_tgfp

allocation flags

Description

Allocate and attach a security structure to inode->i_security. The i_security field is initialized to NULL when the inode structure is allocated.

Return

Return 0 if operation was successful.

voidsecurity_inode_free(structinode*inode)

Free an inode’s LSM blob

Parameters

structinode*inode

the inode

Description

Release any LSM resources associated with inode, although due to the inode’s RCU protections it is possible that the resources will not be fully released until after the current RCU grace period has elapsed.

It is important for LSMs to note that despite being present in a call to security_inode_free(), inode may still be referenced in a VFS path walk and calls to security_inode_permission() may be made during, or after, a call to security_inode_free(). For this reason the inode->i_security field is released via a call_rcu() callback and any LSMs which need to retain inode state for use in security_inode_permission() should only release that state in the inode_free_security_rcu() LSM hook callback.

intsecurity_inode_init_security_anon(structinode*inode,conststructqstr*name,conststructinode*context_inode)

Initialize an anonymous inode

Parameters

structinode*inode

the inode

conststructqstr*name

the anonymous inode class

conststructinode*context_inode

an optional related inode

Description

Set up the incore security field for the new anonymous inode and return whether the inode creation is permitted by the security module or not.

Return

Returns 0 on success, -EACCES if the security module denies the creation of this inode, or another -errno upon other errors.

voidsecurity_path_post_mknod(structmnt_idmap*idmap,structdentry*dentry)

Update inode security after reg file creation

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*dentry

new file

Description

Update inode security field after a regular file has been created.

intsecurity_path_rmdir(conststructpath*dir,structdentry*dentry)

Check if removing a directory is allowed

Parameters

conststructpath*dir

parent directory

structdentry*dentry

directory to remove

Description

Check the permission to remove a directory.

Return

Returns 0 if permission is granted.

intsecurity_path_symlink(conststructpath*dir,structdentry*dentry,constchar*old_name)

Check if creating a symbolic link is allowed

Parameters

conststructpath*dir

parent directory

structdentry*dentry

symbolic link

constchar*old_name

file pathname

Description

Check the permission to create a symbolic link to a file.

Return

Returns 0 if permission is granted.

intsecurity_path_link(structdentry*old_dentry,conststructpath*new_dir,structdentry*new_dentry)

Check if creating a hard link is allowed

Parameters

structdentry*old_dentry

existing file

conststructpath*new_dir

new parent directory

structdentry*new_dentry

new link

Description

Check permission before creating a new hard link to a file.

Return

Returns 0 if permission is granted.

intsecurity_path_truncate(conststructpath*path)

Check if truncating a file is allowed

Parameters

conststructpath*path

file

Description

Check permission before truncating the file indicated by path. Note that truncation permissions may also be checked based on already opened files, using the security_file_truncate() hook.

Return

Returns 0 if permission is granted.

intsecurity_path_chmod(conststructpath*path,umode_tmode)

Check if changing the file’s mode is allowed

Parameters

conststructpath*path

file

umode_tmode

new mode

Description

Check for permission to change a mode of the file path. The new mode is specified in mode which is a bitmask of constants from <include/uapi/linux/stat.h>.

Return

Returns 0 if permission is granted.

intsecurity_path_chown(conststructpath*path,kuid_tuid,kgid_tgid)

Check if changing the file’s owner/group is allowed

Parameters

conststructpath*path

file

kuid_tuid

file owner

kgid_tgid

file group

Description

Check for permission to change owner/group of a file or directory.

Return

Returns 0 if permission is granted.

intsecurity_path_chroot(conststructpath*path)

Check if changing the root directory is allowed

Parameters

conststructpath*path

directory

Description

Check for permission to change root directory.

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_create_tmpfile(structmnt_idmap*idmap,structinode*inode)

Update inode security of new tmpfile

Parameters

structmnt_idmap*idmap

idmap of the mount

structinode*inode

inode of the new tmpfile

Description

Update inode security data after a tmpfile has been created.

intsecurity_inode_link(structdentry*old_dentry,structinode*dir,structdentry*new_dentry)

Check if creating a hard link is allowed

Parameters

structdentry*old_dentry

existing file

structinode*dir

new parent directory

structdentry*new_dentry

new link

Description

Check permission before creating a new hard link to a file.

Return

Returns 0 if permission is granted.

intsecurity_inode_unlink(structinode*dir,structdentry*dentry)

Check if removing a hard link is allowed

Parameters

structinode*dir

parent directory

structdentry*dentry

file

Description

Check the permission to remove a hard link to a file.

Return

Returns 0 if permission is granted.

intsecurity_inode_symlink(structinode*dir,structdentry*dentry,constchar*old_name)

Check if creating a symbolic link is allowed

Parameters

structinode*dir

parent directory

structdentry*dentry

symbolic link

constchar*old_name

existing filename

Description

Check the permission to create a symbolic link to a file.

Return

Returns 0 if permission is granted.

intsecurity_inode_rmdir(structinode*dir,structdentry*dentry)

Check if removing a directory is allowed

Parameters

structinode*dir

parent directory

structdentry*dentry

directory to be removed

Description

Check the permission to remove a directory.

Return

Returns 0 if permission is granted.

intsecurity_inode_mknod(structinode*dir,structdentry*dentry,umode_tmode,dev_tdev)

Check if creating a special file is allowed

Parameters

structinode*dir

parent directory

structdentry*dentry

new file

umode_tmode

new file mode

dev_tdev

device number

Description

Check permissions when creating a special file (or a socket or a fifo file created via the mknod system call). Note that if the mknod operation is being done for a regular file, then the create hook will be called and not this hook.

Return

Returns 0 if permission is granted.

intsecurity_inode_rename(structinode*old_dir,structdentry*old_dentry,structinode*new_dir,structdentry*new_dentry,unsignedintflags)

Check if renaming a file is allowed

Parameters

structinode*old_dir

parent directory of the old file

structdentry*old_dentry

the old file

structinode*new_dir

parent directory of the new file

structdentry*new_dentry

the new file

unsignedintflags

flags

Description

Check for permission to rename a file or directory.

Return

Returns 0 if permission is granted.

intsecurity_inode_readlink(structdentry*dentry)

Check if reading a symbolic link is allowed

Parameters

structdentry*dentry

link

Description

Check the permission to read the symbolic link.

Return

Returns 0 if permission is granted.

intsecurity_inode_follow_link(structdentry*dentry,structinode*inode,boolrcu)

Check if following a symbolic link is allowed

Parameters

structdentry*dentry

link dentry

structinode*inode

link inode

boolrcu

true if in RCU-walk mode

Description

Check permission to follow a symbolic link when looking up a pathname. If rcu is true, inode is not stable.

Return

Returns 0 if permission is granted.

intsecurity_inode_permission(structinode*inode,intmask)

Check if accessing an inode is allowed

Parameters

structinode*inode

inode

intmask

access mask

Description

Check permission before accessing an inode. This hook is called by the existing Linux permission function, so a security module can use it to provide additional checking for existing Linux permission checks. Notice that this hook is called when a file is opened (as well as many other operations), whereas the file_security_ops permission hook is called when the actual read/write operations are performed.

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_setattr(structmnt_idmap*idmap,structdentry*dentry,intia_valid)

Update the inode after a setattr operation

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*dentry

file

intia_valid

file attributes set

Description

Update the inode security field after successfully setting file attributes.

intsecurity_inode_getattr(conststructpath*path)

Check if getting file attributes is allowed

Parameters

conststructpath*path

file

Description

Check permission before obtaining file attributes.

Return

Returns 0 if permission is granted.

intsecurity_inode_setxattr(structmnt_idmap*idmap,structdentry*dentry,constchar*name,constvoid*value,size_tsize,intflags)

Check if setting file xattrs is allowed

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*dentry

file

constchar*name

xattr name

constvoid*value

xattr value

size_tsize

size of xattr value

intflags

flags

Description

This hook performs the desired permission checks before setting the extended attributes (xattrs) on dentry. It is important to note that we have some additional logic before the main LSM implementation calls to detect if we need to perform an additional capability check at the LSM layer.

Normally we enforce a capability check prior to executing the various LSM hook implementations, but if an LSM wants to avoid this capability check, it can register an ‘inode_xattr_skipcap’ hook and return a value of 1 for xattrs for which it wants to avoid the capability check, leaving the LSM fully responsible for enforcing the access control for the specific xattr. If all of the enabled LSMs refrain from registering an ‘inode_xattr_skipcap’ hook, or return a 0 (the default return value), the capability check is still performed. If no ‘inode_xattr_skipcap’ hooks are registered the capability check is performed.

Return

Returns 0 if permission is granted.

intsecurity_inode_set_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name,structposix_acl*kacl)

Check if setting posix acls is allowed

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*dentry

file

constchar*acl_name

acl name

structposix_acl*kacl

acl struct

Description

Check permission before setting posix acls; the posix acls in kacl are identified by acl_name.

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_set_acl(structdentry*dentry,constchar*acl_name,structposix_acl*kacl)

Update inode security from posix acls set

Parameters

structdentry*dentry

file

constchar*acl_name

acl name

structposix_acl*kacl

acl struct

Description

Update inode security data after successfully setting posix acls on dentry. The posix acls in kacl are identified by acl_name.

intsecurity_inode_get_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name)

Check if reading posix acls is allowed

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*dentry

file

constchar*acl_name

acl name

Description

Check permission before getting posix acls; the posix acls are identified by acl_name.

Return

Returns 0 if permission is granted.

intsecurity_inode_remove_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name)

Check if removing a posix acl is allowed

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*dentry

file

constchar*acl_name

acl name

Description

Check permission before removing posix acls; the posix acls are identified by acl_name.

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_remove_acl(structmnt_idmap*idmap,structdentry*dentry,constchar*acl_name)

Update inode security after rm posix acls

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*dentry

file

constchar*acl_name

acl name

Description

Update inode security data after successfully removing posix acls on dentry in idmap. The posix acls are identified by acl_name.

voidsecurity_inode_post_setxattr(structdentry*dentry,constchar*name,constvoid*value,size_tsize,intflags)

Update the inode after a setxattr operation

Parameters

structdentry*dentry

file

constchar*name

xattr name

constvoid*value

xattr value

size_tsize

xattr value size

intflags

flags

Description

Update inode security field after successful setxattr operation.

intsecurity_inode_getxattr(structdentry*dentry,constchar*name)

Check if xattr access is allowed

Parameters

structdentry*dentry

file

constchar*name

xattr name

Description

Check permission before obtaining the extended attributes identified by name for dentry.

Return

Returns 0 if permission is granted.

intsecurity_inode_listxattr(structdentry*dentry)

Check if listing xattrs is allowed

Parameters

structdentry*dentry

file

Description

Check permission before obtaining the list of extended attribute names for dentry.

Return

Returns 0 if permission is granted.

intsecurity_inode_removexattr(structmnt_idmap*idmap,structdentry*dentry,constchar*name)

Check if removing an xattr is allowed

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*dentry

file

constchar*name

xattr name

Description

This hook performs the desired permission checks before removing the extended attribute (xattr) identified by name on dentry. It is important to note that we have some additional logic before the main LSM implementation calls to detect if we need to perform an additional capability check at the LSM layer.

Normally we enforce a capability check prior to executing the various LSM hook implementations, but if an LSM wants to avoid this capability check, it can register an ‘inode_xattr_skipcap’ hook and return a value of 1 for xattrs for which it wants to avoid the capability check, leaving the LSM fully responsible for enforcing the access control for the specific xattr. If all of the enabled LSMs refrain from registering an ‘inode_xattr_skipcap’ hook, or return a 0 (the default return value), the capability check is still performed. If no ‘inode_xattr_skipcap’ hooks are registered the capability check is performed.

Return

Returns 0 if permission is granted.

voidsecurity_inode_post_removexattr(structdentry*dentry,constchar*name)

Update the inode after a removexattr op

Parameters

structdentry*dentry

file

constchar*name

xattr name

Description

Update the inode after a successful removexattr operation.

intsecurity_inode_need_killpriv(structdentry*dentry)

Check ifsecurity_inode_killpriv() required

Parameters

structdentry*dentry

associated dentry

Description

Called when an inode has been changed to determine if security_inode_killpriv() should be called.

Return

Return <0 on error to abort the inode change operation, return 0 if

security_inode_killpriv() does not need to be called, return >0 if security_inode_killpriv() does need to be called.

intsecurity_inode_killpriv(structmnt_idmap*idmap,structdentry*dentry)

The setuid bit is removed, update LSM state

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*dentry

associated dentry

Description

The dentry’s setuid bit is being removed. Remove similar security labels. Called with the dentry->d_inode->i_mutex held.

Return

Return 0 on success. If error is returned, then the operation

causing the setuid bit removal fails.

intsecurity_inode_getsecurity(structmnt_idmap*idmap,structinode*inode,constchar*name,void**buffer,boolalloc)

Get the xattr security label of an inode

Parameters

structmnt_idmap*idmap

idmap of the mount

structinode*inode

inode

constchar*name

xattr name

void**buffer

security label buffer

boolalloc

allocation flag

Description

Retrieve a copy of the extended attribute representation of the security label associated with name for inode via buffer. Note that name is the remainder of the attribute name after the security prefix has been removed. alloc is used to specify if the call should return a value via the buffer or just the value length.

Return

Returns size of buffer on success.

intsecurity_inode_setsecurity(structinode*inode,constchar*name,constvoid*value,size_tsize,intflags)

Set the xattr security label of an inode

Parameters

structinode*inode

inode

constchar*name

xattr name

constvoid*value

security label

size_tsize

length of security label

intflags

flags

Description

Set the security label associated with name for inode from the extended attribute value value. size indicates the size of the value in bytes. flags may be XATTR_CREATE, XATTR_REPLACE, or 0. Note that name is the remainder of the attribute name after the security. prefix has been removed.

Return

Returns 0 on success.

voidsecurity_inode_getlsmprop(structinode*inode,structlsm_prop*prop)

Get an inode’s LSM data

Parameters

structinode*inode

inode

structlsm_prop*prop

lsm specific information to return

Description

Get the LSM-specific information associated with the inode.

intsecurity_kernfs_init_security(structkernfs_node*kn_dir,structkernfs_node*kn)

Init LSM context for a kernfs node

Parameters

structkernfs_node*kn_dir

parent kernfs node

structkernfs_node*kn

the kernfs node to initialize

Description

Initialize the security context of a newly created kernfs node based on its own and its parent’s attributes.

Return

Returns 0 if permission is granted.

intsecurity_file_permission(structfile*file,intmask)

Check file permissions

Parameters

structfile*file

file

intmask

requested permissions

Description

Check file permissions before accessing an open file. This hook is called by various operations that read or write files. A security module can use this hook to perform additional checking on these operations, e.g. to revalidate permissions on use to support privilege bracketing or policy changes. Notice that this hook is used when the actual read/write operations are performed, whereas the inode_security_ops hook is called when a file is opened (as well as many other operations). Although this hook can be used to revalidate permissions for various system call operations that read or write files, it does not address the revalidation of permissions for memory-mapped files. Security modules must handle this separately if they need such revalidation.

Return

Returns 0 if permission is granted.

intsecurity_file_alloc(structfile*file)

Allocate and init a file’s LSM blob

Parameters

structfile*file

the file

Description

Allocate and attach a security structure to the file->f_security field. The security field is initialized to NULL when the structure is first created.

Return

Return 0 if the hook is successful and permission is granted.

voidsecurity_file_release(structfile*file)

Perform actions before releasing the file ref

Parameters

structfile*file

the file

Description

Perform actions before releasing the last reference to a file.

voidsecurity_file_free(structfile*file)

Free a file’s LSM blob

Parameters

structfile*file

the file

Description

Deallocate and free any security structures stored in file->f_security.

intsecurity_mmap_file(structfile*file,unsignedlongprot,unsignedlongflags)

Check if mmap’ing a file is allowed

Parameters

structfile*file

file

unsignedlongprot

protection applied by the kernel

unsignedlongflags

flags

Description

Check permissions for a mmap operation. The file may be NULL, e.g. if mapping anonymous memory.

Return

Returns 0 if permission is granted.

intsecurity_mmap_addr(unsignedlongaddr)

Check if mmap’ing an address is allowed

Parameters

unsignedlongaddr

address

Description

Check permissions for a mmap operation ataddr.

Return

Returns 0 if permission is granted.

intsecurity_file_mprotect(structvm_area_struct*vma,unsignedlongreqprot,unsignedlongprot)

Check if changing memory protections is allowed

Parameters

structvm_area_struct*vma

memory region

unsignedlongreqprot

application requested protection

unsignedlongprot

protection applied by the kernel

Description

Check permissions before changing memory access permissions.

Return

Returns 0 if permission is granted.

intsecurity_file_lock(structfile*file,unsignedintcmd)

Check if a file lock is allowed

Parameters

structfile*file

file

unsignedintcmd

lock operation (e.g. F_RDLCK, F_WRLCK)

Description

Check permission before performing file locking operations. Note the hookmediates both flock and fcntl style locks.

Return

Returns 0 if permission is granted.

intsecurity_file_fcntl(structfile*file,unsignedintcmd,unsignedlongarg)

Check if fcntl() op is allowed

Parameters

structfile*file

file

unsignedintcmd

fcntl command

unsignedlongarg

command argument

Description

Check permission before allowing the file operation specified by cmd to be performed on the file file. Note that arg sometimes represents a user space pointer; in other cases, it may be a simple integer value. When arg represents a user space pointer, it should never be used by the security module.

Return

Returns 0 if permission is granted.

voidsecurity_file_set_fowner(structfile*file)

Set the file owner info in the LSM blob

Parameters

structfile*file

the file

Description

Save owner security information (typically from current->security) infile->f_security for later use by the send_sigiotask hook.

This hook is called with file->f_owner.lock held.

Return

Returns 0 on success.

intsecurity_file_send_sigiotask(structtask_struct*tsk,structfown_struct*fown,intsig)

Check if sending SIGIO/SIGURG is allowed

Parameters

structtask_struct*tsk

target task

structfown_struct*fown

signal sender

intsig

signal to be sent, SIGIO is sent if 0

Description

Check permission for the file owner fown to send SIGIO or SIGURG to the process tsk. Note that this hook is sometimes called from interrupt. Note that the fown_struct, fown, is never outside the context of a struct file, so the file structure (and associated security information) can always be obtained: container_of(fown, struct file, f_owner).

Return

Returns 0 if permission is granted.

intsecurity_file_receive(structfile*file)

Check if receiving a file via IPC is allowed

Parameters

structfile*file

file being received

Description

This hook allows security modules to control the ability of a process toreceive an open file descriptor via socket IPC.

Return

Returns 0 if permission is granted.

intsecurity_file_open(structfile*file)

Save open() time state for later use by the LSM

Parameters

structfile*file

Description

Save open-time permission checking state for later use upon file_permission, and recheck access if anything has changed since inode_permission.

We can check if a file is opened for execution (e.g. execve(2) call), either directly or indirectly (e.g. ELF's ld.so), by checking file->f_flags & __FMODE_EXEC.

Return

Returns 0 if permission is granted.
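
As a small illustration of the __FMODE_EXEC test mentioned above, a file_open hook could distinguish execution opens roughly as follows (the policy body itself is hypothetical):

static int example_file_open(struct file *file)
{
        if (file->f_flags & __FMODE_EXEC) {
                /*
                 * opened for execution, either directly (execve(2)) or
                 * indirectly (e.g. ELF's ld.so); record or check
                 * execution-specific policy here
                 */
        }
        return 0;
}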

intsecurity_file_truncate(structfile*file)

Check if truncating a file is allowed

Parameters

structfile*file

file

Description

Check permission before truncating a file, i.e. using ftruncate. Note thattruncation permission may also be checked based on the path, using thepath_truncate hook.

Return

Returns 0 if permission is granted.

intsecurity_task_alloc(structtask_struct*task,unsignedlongclone_flags)

Allocate a task’s LSM blob

Parameters

structtask_struct*task

the task

unsignedlongclone_flags

flags indicating what is being shared

Description

Handle allocation of task-related resources.

Return

Returns a zero on success, negative values on failure.

voidsecurity_task_free(structtask_struct*task)

Free a task’s LSM blob and related resources

Parameters

structtask_struct*task

task

Description

Handle release of task-related resources. Note that this can be called frominterrupt context.

intsecurity_cred_alloc_blank(structcred*cred,gfp_tgfp)

Allocate the min memory to allow cred_transfer

Parameters

structcred*cred

credentials

gfp_tgfp

gfp flags

Description

Only allocate sufficient memory and attach to cred such that cred_transfer() will not get ENOMEM.

Return

Returns 0 on success, negative values on failure.

voidsecurity_cred_free(structcred*cred)

Free the cred’s LSM blob and associated resources

Parameters

structcred*cred

credentials

Description

Deallocate and clear the cred->security field in a set of credentials.

intsecurity_prepare_creds(structcred*new,conststructcred*old,gfp_tgfp)

Prepare a new set of credentials

Parameters

structcred*new

new credentials

conststructcred*old

original credentials

gfp_tgfp

gfp flags

Description

Prepare a new set of credentials by copying the data from the old set.

Return

Returns 0 on success, negative values on failure.

voidsecurity_transfer_creds(structcred*new,conststructcred*old)

Transfer creds

Parameters

structcred*new

target credentials

conststructcred*old

original credentials

Description

Transfer data from original creds to new creds.

intsecurity_kernel_act_as(structcred*new,u32secid)

Set the kernel credentials to act as secid

Parameters

structcred*new

credentials

u32secid

secid

Description

Set the credentials for a kernel service to act as (subjective context). The current task must be the one that nominated secid.

Return

Returns 0 if successful.

intsecurity_kernel_create_files_as(structcred*new,structinode*inode)

Set file creation context using an inode

Parameters

structcred*new

target credentials

structinode*inode

reference inode

Description

Set the file creation context in a set of credentials to be the same as the objective context of the specified inode. The current task must be the one that nominated inode.

Return

Returns 0 if successful.

intsecurity_kernel_module_request(char*kmod_name)

Check if loading a module is allowed

Parameters

char*kmod_name

module name

Description

Ability to trigger the kernel to automatically upcall to userspace foruserspace to load a kernel module with the given name.

Return

Returns 0 if successful.

intsecurity_task_fix_setuid(structcred*new,conststructcred*old,intflags)

Update LSM with new user id attributes

Parameters

structcred*new

updated credentials

conststructcred*old

credentials being replaced

intflags

LSM_SETID_* flag values

Description

Update the module’s state after setting one or more of the user identity attributes of the current process. The flags parameter indicates which of the set*uid system calls invoked this hook. new is the set of credentials that will be installed; modifications should be made to this rather than to current->cred.

Return

Returns 0 on success.
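
A task_fix_setuid implementation typically inspects flags and works only on new, since the caller installs new after every LSM has agreed. The sketch below is illustrative (the action taken is just a debug print); LSM_SETID_* values come from the flags parameter described above.

static int example_task_fix_setuid(struct cred *new, const struct cred *old,
                                   int flags)
{
        /* modify @new, never current->cred */
        switch (flags) {
        case LSM_SETID_ID:
        case LSM_SETID_RE:
        case LSM_SETID_RES:
                if (uid_eq(new->euid, GLOBAL_ROOT_UID) &&
                    !uid_eq(old->euid, GLOBAL_ROOT_UID))
                        pr_debug("example: task gained root euid\n");
                break;
        case LSM_SETID_FS:
                /* filesystem uid changed; adjust fs-related state here */
                break;
        }
        return 0;
}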

intsecurity_task_fix_setgid(structcred*new,conststructcred*old,intflags)

Update LSM with new group id attributes

Parameters

structcred*new

updated credentials

conststructcred*old

credentials being replaced

intflags

LSM_SETID_* flag value

Description

Update the module’s state after setting one or more of the group identity attributes of the current process. The flags parameter indicates which of the set*gid system calls invoked this hook. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.

Return

Returns 0 on success.

intsecurity_task_fix_setgroups(structcred*new,conststructcred*old)

Update LSM with new supplementary groups

Parameters

structcred*new

updated credentials

conststructcred*old

credentials being replaced

Description

Update the module’s state after setting the supplementary group identity attributes of the current process. new is the set of credentials that will be installed. Modifications should be made to this rather than to current->cred.

Return

Returns 0 on success.

intsecurity_task_setpgid(structtask_struct*p,pid_tpgid)

Check if setting the pgid is allowed

Parameters

structtask_struct*p

task being modified

pid_tpgid

new pgid

Description

Check permission before setting the process group identifier of the process p to pgid.

Return

Returns 0 if permission is granted.

intsecurity_task_getpgid(structtask_struct*p)

Check if getting the pgid is allowed

Parameters

structtask_struct*p

task

Description

Check permission before getting the process group identifier of the processp.

Return

Returns 0 if permission is granted.

intsecurity_task_getsid(structtask_struct*p)

Check if getting the session id is allowed

Parameters

structtask_struct*p

task

Description

Check permission before getting the session identifier of the processp.

Return

Returns 0 if permission is granted.

intsecurity_task_setnice(structtask_struct*p,intnice)

Check if setting a task’s nice value is allowed

Parameters

structtask_struct*p

target task

intnice

nice value

Description

Check permission before setting the nice value of p to nice.

Return

Returns 0 if permission is granted.

intsecurity_task_setioprio(structtask_struct*p,intioprio)

Check if setting a task’s ioprio is allowed

Parameters

structtask_struct*p

target task

intioprio

ioprio value

Description

Check permission before setting the ioprio value of p to ioprio.

Return

Returns 0 if permission is granted.

intsecurity_task_getioprio(structtask_struct*p)

Check if getting a task’s ioprio is allowed

Parameters

structtask_struct*p

task

Description

Check permission before getting the ioprio value ofp.

Return

Returns 0 if permission is granted.

intsecurity_task_prlimit(conststructcred*cred,conststructcred*tcred,unsignedintflags)

Check if get/setting resources limits is allowed

Parameters

conststructcred*cred

current task credentials

conststructcred*tcred

target task credentials

unsignedintflags

LSM_PRLIMIT_* flag bits indicating a get/set/both

Description

Check permission before getting and/or setting the resource limits ofanother task.

Return

Returns 0 if permission is granted.

intsecurity_task_setrlimit(structtask_struct*p,unsignedintresource,structrlimit*new_rlim)

Check if setting a new rlimit value is allowed

Parameters

structtask_struct*p

target task’s group leader

unsignedintresource

resource whose limit is being set

structrlimit*new_rlim

new resource limit

Description

Check permission before setting the resource limits of process p for resource to new_rlim. The old resource limit values can be examined by dereferencing (p->signal->rlim + resource).

Return

Returns 0 if permission is granted.
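
For example, a hook might compare the requested limit against the old value reachable via p->signal->rlim; the specific rule below is purely illustrative.

static int example_task_setrlimit(struct task_struct *p, unsigned int resource,
                                  struct rlimit *new_rlim)
{
        struct rlimit *old_rlim = p->signal->rlim + resource;

        /* illustrative rule: raising a hard limit needs CAP_SYS_RESOURCE */
        if (new_rlim->rlim_max > old_rlim->rlim_max &&
            !capable(CAP_SYS_RESOURCE))
                return -EPERM;
        return 0;
}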

intsecurity_task_setscheduler(structtask_struct*p)

Check if setting sched policy/param is allowed

Parameters

structtask_struct*p

target task

Description

Check permission before setting scheduling policy and/or parameters ofprocessp.

Return

Returns 0 if permission is granted.

intsecurity_task_getscheduler(structtask_struct*p)

Check if getting scheduling info is allowed

Parameters

structtask_struct*p

target task

Description

Check permission before obtaining scheduling information for processp.

Return

Returns 0 if permission is granted.

intsecurity_task_movememory(structtask_struct*p)

Check if moving memory is allowed

Parameters

structtask_struct*p

task

Description

Check permission before moving memory owned by processp.

Return

Returns 0 if permission is granted.

intsecurity_task_kill(structtask_struct*p,structkernel_siginfo*info,intsig,conststructcred*cred)

Check if sending a signal is allowed

Parameters

structtask_struct*p

target process

structkernel_siginfo*info

signal information

intsig

signal value

conststructcred*cred

credentials of the signal sender, NULL ifcurrent

Description

Check permission before sending signal sig to p. info can be NULL, the constant 1, or a pointer to a kernel_siginfo structure. If info is 1 or SI_FROMKERNEL(info) is true, then the signal should be viewed as coming from the kernel and should typically be permitted. SIGIO signals are handled separately by the send_sigiotask hook in file_security_ops.

Return

Returns 0 if permission is granted.

intsecurity_task_prctl(intoption,unsignedlongarg2,unsignedlongarg3,unsignedlongarg4,unsignedlongarg5)

Check if a prctl op is allowed

Parameters

intoption

operation

unsignedlongarg2

argument

unsignedlongarg3

argument

unsignedlongarg4

argument

unsignedlongarg5

argument

Description

Check permission before performing a process control operation on thecurrent process.

Return

Return -ENOSYS if no-one wanted to handle this op, any other value to cause prctl() to return immediately with that value.
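
The -ENOSYS convention means a task_prctl hook should decline options it does not recognize, as in this sketch (PR_EXAMPLE is a hypothetical prctl option invented for the example):

#define PR_EXAMPLE 0x45584d50   /* hypothetical option handled by this LSM */

static int example_task_prctl(int option, unsigned long arg2,
                              unsigned long arg3, unsigned long arg4,
                              unsigned long arg5)
{
        if (option != PR_EXAMPLE)
                return -ENOSYS;         /* not ours; let others handle it */

        /* ... validate arg2..arg5 and apply the operation ... */
        return 0;
}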

voidsecurity_task_to_inode(structtask_struct*p,structinode*inode)

Set the security attributes of a task’s inode

Parameters

structtask_struct*p

task

structinode*inode

inode

Description

Set the security attributes for an inode based on an associated task’ssecurity attributes, e.g. for /proc/pid inodes.

intsecurity_create_user_ns(conststructcred*cred)

Check if creating a new userns is allowed

Parameters

conststructcred*cred

prepared creds

Description

Check permission prior to creating a new user namespace.

Return

Returns 0 if successful, otherwise < 0 error code.

intsecurity_ipc_permission(structkern_ipc_perm*ipcp,shortflag)

Check if sysv ipc access is allowed

Parameters

structkern_ipc_perm*ipcp

ipc permission structure

shortflag

requested permissions

Description

Check permissions for access to IPC.

Return

Returns 0 if permission is granted.

voidsecurity_ipc_getlsmprop(structkern_ipc_perm*ipcp,structlsm_prop*prop)

Get the sysv ipc object LSM data

Parameters

structkern_ipc_perm*ipcp

ipc permission structure

structlsm_prop*prop

pointer to lsm information

Description

Get the lsm information associated with the ipc object.

intsecurity_msg_msg_alloc(structmsg_msg*msg)

Allocate a sysv ipc message LSM blob

Parameters

structmsg_msg*msg

message structure

Description

Allocate and attach a security structure to the msg->security field. Thesecurity field is initialized to NULL when the structure is first created.

Return

Return 0 if operation was successful and permission is granted.

voidsecurity_msg_msg_free(structmsg_msg*msg)

Free a sysv ipc message LSM blob

Parameters

structmsg_msg*msg

message structure

Description

Deallocate the security structure for this message.

intsecurity_msg_queue_alloc(structkern_ipc_perm*msq)

Allocate a sysv ipc msg queue LSM blob

Parameters

structkern_ipc_perm*msq

sysv ipc permission structure

Description

Allocate and attach a security structure to msq. The security field is initialized to NULL when the structure is first created.

Return

Returns 0 if operation was successful and permission is granted.

voidsecurity_msg_queue_free(structkern_ipc_perm*msq)

Free a sysv ipc msg queue LSM blob

Parameters

structkern_ipc_perm*msq

sysv ipc permission structure

Description

Deallocate security fieldperm->security for the message queue.

intsecurity_msg_queue_associate(structkern_ipc_perm*msq,intmsqflg)

Check if a msg queue operation is allowed

Parameters

structkern_ipc_perm*msq

sysv ipc permission structure

intmsqflg

operation flags

Description

Check permission when a message queue is requested through the msgget systemcall. This hook is only called when returning the message queue identifierfor an existing message queue, not when a new message queue is created.

Return

Return 0 if permission is granted.

intsecurity_msg_queue_msgctl(structkern_ipc_perm*msq,intcmd)

Check if a msg queue operation is allowed

Parameters

structkern_ipc_perm*msq

sysv ipc permission structure

intcmd

operation

Description

Check permission when a message control operation specified by cmd is to be performed on the message queue with permissions msq.

Return

Returns 0 if permission is granted.

intsecurity_msg_queue_msgsnd(structkern_ipc_perm*msq,structmsg_msg*msg,intmsqflg)

Check if sending a sysv ipc message is allowed

Parameters

structkern_ipc_perm*msq

sysv ipc permission structure

structmsg_msg*msg

message

intmsqflg

operation flags

Description

Check permission before a message, msg, is enqueued on the message queue with permissions specified in msq.

Return

Returns 0 if permission is granted.

intsecurity_msg_queue_msgrcv(structkern_ipc_perm*msq,structmsg_msg*msg,structtask_struct*target,longtype,intmode)

Check if receiving a sysv ipc msg is allowed

Parameters

structkern_ipc_perm*msq

sysv ipc permission structure

structmsg_msg*msg

message

structtask_struct*target

target task

longtype

type of message requested

intmode

operation flags

Description

Check permission before a message, msg, is removed from the message queue. The target task structure contains a pointer to the process that will be receiving the message (not equal to the current process when inline receives are being performed).

Return

Returns 0 if permission is granted.

intsecurity_shm_alloc(structkern_ipc_perm*shp)

Allocate a sysv shm LSM blob

Parameters

structkern_ipc_perm*shp

sysv ipc permission structure

Description

Allocate and attach a security structure to theshp security field. Thesecurity field is initialized to NULL when the structure is first created.

Return

Returns 0 if operation was successful and permission is granted.

voidsecurity_shm_free(structkern_ipc_perm*shp)

Free a sysv shm LSM blob

Parameters

structkern_ipc_perm*shp

sysv ipc permission structure

Description

Deallocate the security structureperm->security for the memory segment.

intsecurity_shm_associate(structkern_ipc_perm*shp,intshmflg)

Check if a sysv shm operation is allowed

Parameters

structkern_ipc_perm*shp

sysv ipc permission structure

intshmflg

operation flags

Description

Check permission when a shared memory region is requested through the shmgetsystem call. This hook is only called when returning the shared memoryregion identifier for an existing region, not when a new shared memoryregion is created.

Return

Returns 0 if permission is granted.

intsecurity_shm_shmctl(structkern_ipc_perm*shp,intcmd)

Check if a sysv shm operation is allowed

Parameters

structkern_ipc_perm*shp

sysv ipc permission structure

intcmd

operation

Description

Check permission when a shared memory control operation specified by cmd is to be performed on the shared memory region with permissions in shp.

Return

Return 0 if permission is granted.

intsecurity_shm_shmat(structkern_ipc_perm*shp,char__user*shmaddr,intshmflg)

Check if a sysv shm attach operation is allowed

Parameters

structkern_ipc_perm*shp

sysv ipc permission structure

char__user*shmaddr

address of memory region to attach

intshmflg

operation flags

Description

Check permissions prior to allowing the shmat system call to attach the shared memory segment with permissions shp to the data segment of the calling process. The attaching address is specified by shmaddr.

Return

Returns 0 if permission is granted.

intsecurity_sem_alloc(structkern_ipc_perm*sma)

Allocate a sysv semaphore LSM blob

Parameters

structkern_ipc_perm*sma

sysv ipc permission structure

Description

Allocate and attach a security structure to thesma security field. Thesecurity field is initialized to NULL when the structure is first created.

Return

Returns 0 if operation was successful and permission is granted.

voidsecurity_sem_free(structkern_ipc_perm*sma)

Free a sysv semaphore LSM blob

Parameters

structkern_ipc_perm*sma

sysv ipc permission structure

Description

Deallocate security structuresma->security for the semaphore.

intsecurity_sem_associate(structkern_ipc_perm*sma,intsemflg)

Check if a sysv semaphore operation is allowed

Parameters

structkern_ipc_perm*sma

sysv ipc permission structure

intsemflg

operation flags

Description

Check permission when a semaphore is requested through the semget systemcall. This hook is only called when returning the semaphore identifier foran existing semaphore, not when a new one must be created.

Return

Returns 0 if permission is granted.

intsecurity_sem_semctl(structkern_ipc_perm*sma,intcmd)

Check if a sysv semaphore operation is allowed

Parameters

structkern_ipc_perm*sma

sysv ipc permission structure

intcmd

operation

Description

Check permission when a semaphore operation specified bycmd is to beperformed on the semaphore.

Return

Returns 0 if permission is granted.

intsecurity_sem_semop(structkern_ipc_perm*sma,structsembuf*sops,unsignednsops,intalter)

Check if a sysv semaphore operation is allowed

Parameters

structkern_ipc_perm*sma

sysv ipc permission structure

structsembuf*sops

operations to perform

unsignednsops

number of operations

intalter

flag indicating changes will be made

Description

Check permissions before performing operations on members of the semaphore set. If the alter flag is nonzero, the semaphore set may be modified.

Return

Returns 0 if permission is granted.

intsecurity_getselfattr(unsignedintattr,structlsm_ctx__user*uctx,u32__user*size,u32flags)

Read an LSM attribute of the current process.

Parameters

unsignedintattr

which attribute to return

structlsm_ctx__user*uctx

the user-space destination for the information, or NULL

u32__user*size

pointer to the size of space available to receive the data

u32flags

special handling options. LSM_FLAG_SINGLE indicates that only attributes associated with the LSM identified in the passed ctx be reported.

Description

A NULL value for uctx can be used to get both the number of attributes and the size of the data.

Returns the number of attributes found on success, negative value on error. size is reset to the total size of the data. If size is insufficient to contain the data, -E2BIG is returned.

intsecurity_setselfattr(unsignedintattr,structlsm_ctx__user*uctx,u32size,u32flags)

Set an LSM attribute on the current process.

Parameters

unsignedintattr

which attribute to set

structlsm_ctx__user*uctx

the user-space source for the information

u32size

the size of the data

u32flags

reserved for future use, must be 0

Description

Set an LSM attribute for the current process. The LSM, attribute and new value are included in uctx.

Returns 0 on success, -EINVAL if the input is inconsistent, -EFAULT if the user buffer is inaccessible, E2BIG if size is too big, or an LSM specific failure.
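
These entry points back the lsm_get_self_attr(2) and lsm_set_self_attr(2) system calls. From userspace, the size negotiation described above (NULL buffer first, then a sized fetch) can be used as in this hedged sketch; lsm_get_self_attr() here stands in for however your environment exposes the system call (e.g. a syscall(2) wrapper), and is declared only as an assumption.

#include <stdint.h>
#include <stdlib.h>
#include <linux/lsm.h>          /* struct lsm_ctx, LSM_ATTR_CURRENT */

/* hypothetical wrapper mirroring the kernel entry point's argument order */
extern int lsm_get_self_attr(unsigned int attr, struct lsm_ctx *ctx,
                             uint32_t *size, uint32_t flags);

struct lsm_ctx *fetch_current_attrs(uint32_t *nattrs)
{
        uint32_t size = 0;
        struct lsm_ctx *ctx;
        int rc;

        /* first call with a NULL buffer: learn the required size */
        rc = lsm_get_self_attr(LSM_ATTR_CURRENT, NULL, &size, 0);
        if (rc < 0)
                return NULL;

        ctx = malloc(size);
        if (!ctx)
                return NULL;

        rc = lsm_get_self_attr(LSM_ATTR_CURRENT, ctx, &size, 0);
        if (rc < 0) {
                free(ctx);
                return NULL;
        }
        *nattrs = rc;           /* the call returns the number of attributes */
        return ctx;
}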

intsecurity_getprocattr(structtask_struct*p,intlsmid,constchar*name,char**value)

Read an attribute for a task

Parameters

structtask_struct*p

the task

intlsmid

LSM identification

constchar*name

attribute name

char**value

attribute value

Description

Read attribute name for task p and store it into value if allowed.

Return

Returns the length ofvalue on success, a negative value otherwise.

intsecurity_setprocattr(intlsmid,constchar*name,void*value,size_tsize)

Set an attribute for a task

Parameters

intlsmid

LSM identification

constchar*name

attribute name

void*value

attribute value

size_tsize

attribute value size

Description

Write (set) the current task’s attribute name to value, size size, if allowed.

Return

Returns bytes written on success, a negative value otherwise.

intsecurity_post_notification(conststructcred*w_cred,conststructcred*cred,structwatch_notification*n)

Check if a watch notification can be posted

Parameters

conststructcred*w_cred

credentials of the task that set the watch

conststructcred*cred

credentials of the task which triggered the watch

structwatch_notification*n

the notification

Description

Check to see if a watch notification can be posted to a particular queue.

Return

Returns 0 if permission is granted.

intsecurity_watch_key(structkey*key)

Check if a task is allowed to watch for key events

Parameters

structkey*key

the key to watch

Description

Check to see if a process is allowed to watch for event notifications froma key or keyring.

Return

Returns 0 if permission is granted.

intsecurity_netlink_send(structsock*sk,structsk_buff*skb)

Save info and check if netlink sending is allowed

Parameters

structsock*sk

sending socket

structsk_buff*skb

netlink message

Description

Save security information for a netlink message so that permission checking can be performed when the message is processed. The security information can be saved using the eff_cap field of the netlink_skb_parms structure. Also may be used to provide fine grained control over message transmission.

Return

Returns 0 if the information was successfully saved and the message is allowed to be transmitted.

intsecurity_socket_create(intfamily,inttype,intprotocol,intkern)

Check if creating a new socket is allowed

Parameters

intfamily

protocol family

inttype

communications type

intprotocol

requested protocol

intkern

set to 1 if a kernel socket is requested

Description

Check permissions prior to creating a new socket.

Return

Returns 0 if permission is granted.
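
A socket_create hook sees the request before the socket exists; the kern flag lets a policy exempt kernel-internal sockets. The rule below is purely illustrative.

static int example_socket_create(int family, int type, int protocol, int kern)
{
        if (kern)
                return 0;       /* kernel sockets: not subject to this policy */

        /* illustrative rule: packet sockets require CAP_NET_RAW */
        if (family == PF_PACKET && !capable(CAP_NET_RAW))
                return -EACCES;

        return 0;
}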

intsecurity_socket_post_create(structsocket*sock,intfamily,inttype,intprotocol,intkern)

Initialize a newly created socket

Parameters

structsocket*sock

socket

intfamily

protocol family

inttype

communications type

intprotocol

requested protocol

intkern

set to 1 if a kernel socket is requested

Description

This hook allows a module to update or allocate a per-socket security structure. Note that the security field was not added directly to the socket structure, but rather, the socket security information is stored in the associated inode. Typically, the inode alloc_security hook will allocate and attach security information to SOCK_INODE(sock)->i_security. This hook may be used to update the SOCK_INODE(sock)->i_security field with additional information that wasn’t available when the inode was allocated.

Return

Returns 0 if permission is granted.

intsecurity_socket_bind(structsocket*sock,structsockaddr*address,intaddrlen)

Check if a socket bind operation is allowed

Parameters

structsocket*sock

socket

structsockaddr*address

requested bind address

intaddrlen

length of address

Description

Check permission before socket protocol layer bind operation is performed and the socket sock is bound to the address specified in the address parameter.

Return

Returns 0 if permission is granted.

intsecurity_socket_connect(structsocket*sock,structsockaddr*address,intaddrlen)

Check if a socket connect operation is allowed

Parameters

structsocket*sock

socket

structsockaddr*address

address of remote connection point

intaddrlen

length of address

Description

Check permission before socket protocol layer connect operation attempts to connect socket sock to a remote address, address.

Return

Returns 0 if permission is granted.

intsecurity_socket_listen(structsocket*sock,intbacklog)

Check if a socket is allowed to listen

Parameters

structsocket*sock

socket

intbacklog

connection queue size

Description

Check permission before socket protocol layer listen operation.

Return

Returns 0 if permission is granted.

intsecurity_socket_accept(structsocket*sock,structsocket*newsock)

Check if a socket is allowed to accept connections

Parameters

structsocket*sock

listening socket

structsocket*newsock

newly created connection socket

Description

Check permission before accepting a new connection. Note that the new socket, newsock, has been created and some information copied to it, but the accept operation has not actually been performed.

Return

Returns 0 if permission is granted.

intsecurity_socket_sendmsg(structsocket*sock,structmsghdr*msg,intsize)

Check if sending a message is allowed

Parameters

structsocket*sock

sending socket

structmsghdr*msg

message to send

intsize

size of message

Description

Check permission before transmitting a message to another socket.

Return

Returns 0 if permission is granted.

intsecurity_socket_recvmsg(structsocket*sock,structmsghdr*msg,intsize,intflags)

Check if receiving a message is allowed

Parameters

structsocket*sock

receiving socket

structmsghdr*msg

message to receive

intsize

size of message

intflags

operational flags

Description

Check permission before receiving a message from a socket.

Return

Returns 0 if permission is granted.

intsecurity_socket_getsockname(structsocket*sock)

Check if reading the socket addr is allowed

Parameters

structsocket*sock

socket

Description

Check permission before reading the local address (name) of the socketobject.

Return

Returns 0 if permission is granted.

intsecurity_socket_getpeername(structsocket*sock)

Check if reading the peer’s addr is allowed

Parameters

structsocket*sock

socket

Description

Check permission before reading the remote address (name) of a socket object.

Return

Returns 0 if permission is granted.

intsecurity_socket_getsockopt(structsocket*sock,intlevel,intoptname)

Check if reading a socket option is allowed

Parameters

structsocket*sock

socket

intlevel

option’s protocol level

intoptname

option name

Description

Check permissions before retrieving the options associated with socketsock.

Return

Returns 0 if permission is granted.

intsecurity_socket_setsockopt(structsocket*sock,intlevel,intoptname)

Check if setting a socket option is allowed

Parameters

structsocket*sock

socket

intlevel

option’s protocol level

intoptname

option name

Description

Check permissions before setting the options associated with socketsock.

Return

Returns 0 if permission is granted.

intsecurity_socket_shutdown(structsocket*sock,inthow)

Checks if shutting down the socket is allowed

Parameters

structsocket*sock

socket

inthow

flag indicating how sends and receives are handled

Description

Checks permission before all or part of a connection on the socketsock isshut down.

Return

Returns 0 if permission is granted.

intsecurity_socket_getpeersec_stream(structsocket*sock,sockptr_toptval,sockptr_toptlen,unsignedintlen)

Get the remote peer label

Parameters

structsocket*sock

socket

sockptr_toptval

destination buffer

sockptr_toptlen

size of peer label copied into the buffer

unsignedintlen

maximum size of the destination buffer

Description

This hook allows the security module to provide peer socket security state for unix or connected tcp sockets to userspace via getsockopt SO_GETPEERSEC. For tcp sockets this can be meaningful if the socket is associated with an ipsec SA.

Return

Returns 0 if all is well, otherwise, typical getsockopt return values.

intlsm_sock_alloc(structsock*sock,gfp_tgfp)

allocate a composite sock blob

Parameters

structsock*sock

the sock that needs a blob

gfp_tgfp

allocation mode

Description

Allocate the sock blob for all the modules

Returns 0, or -ENOMEM if memory can’t be allocated.

intsecurity_sk_alloc(structsock*sk,intfamily,gfp_tpriority)

Allocate and initialize a sock’s LSM blob

Parameters

structsock*sk

sock

intfamily

protocol family

gfp_tpriority

gfp flags

Description

Allocate and attach a security structure to the sk->sk_security field, whichis used to copy security attributes between local stream sockets.

Return

Returns 0 on success, error on failure.

voidsecurity_sk_free(structsock*sk)

Free the sock’s LSM blob

Parameters

structsock*sk

sock

Description

Deallocate security structure.

voidsecurity_inet_csk_clone(structsock*newsk,conststructrequest_sock*req)

Set new sock LSM state based on request_sock

Parameters

structsock*newsk

new sock

conststructrequest_sock*req

connection request_sock

Description

Set the LSM state of newsk using the LSM state from req.

intsecurity_mptcp_add_subflow(structsock*sk,structsock*ssk)

Inherit the LSM label from the MPTCP socket

Parameters

structsock*sk

the owning MPTCP socket

structsock*ssk

the new subflow

Description

Update the labeling for the given MPTCP subflow, to match the one of the owning MPTCP socket. This hook has to be called after the socket creation and initialization via the security_socket_create() and security_socket_post_create() LSM hooks.

Return

Returns 0 on success or a negative error code on failure.

intsecurity_xfrm_policy_clone(structxfrm_sec_ctx*old_ctx,structxfrm_sec_ctx**new_ctxp)

Clone xfrm policy LSM state

Parameters

structxfrm_sec_ctx*old_ctx

xfrm security context

structxfrm_sec_ctx**new_ctxp

target xfrm security context

Description

Allocate a security structure in new_ctxp that contains the information fromthe old_ctx structure.

Return

Return 0 if operation was successful.

intsecurity_xfrm_policy_delete(structxfrm_sec_ctx*ctx)

Check if deleting a xfrm policy is allowed

Parameters

structxfrm_sec_ctx*ctx

xfrm security context

Description

Authorize deletion of a SPD entry.

Return

Returns 0 if permission is granted.

intsecurity_xfrm_state_alloc_acquire(structxfrm_state*x,structxfrm_sec_ctx*polsec,u32secid)

Allocate a xfrm state LSM blob

Parameters

structxfrm_state*x

xfrm state being added to the SAD

structxfrm_sec_ctx*polsec

associated policy’s security context

u32secid

secid from the flow

Description

Allocate a security structure to the x->security field; the security fieldis initialized to NULL when the xfrm_state is allocated. Set the context tocorrespond to secid.

Return

Returns 0 if operation was successful.

voidsecurity_xfrm_state_free(structxfrm_state*x)

Free a xfrm state

Parameters

structxfrm_state*x

xfrm state

Description

Deallocate x->security.

intsecurity_xfrm_policy_lookup(structxfrm_sec_ctx*ctx,u32fl_secid)

Check if using a xfrm policy is allowed

Parameters

structxfrm_sec_ctx*ctx

target xfrm security context

u32fl_secid

flow secid used to authorize access

Description

Check permission when a flow selects a xfrm_policy for processing XFRMs on apacket. The hook is called when selecting either a per-socket policy or ageneric xfrm policy.

Return

Return 0 if permission is granted, -ESRCH otherwise, or -errno on other errors.

intsecurity_xfrm_state_pol_flow_match(structxfrm_state*x,structxfrm_policy*xp,conststructflowi_common*flic)

Check for a xfrm match

Parameters

structxfrm_state*x

xfrm state to match

structxfrm_policy*xp

xfrm policy to check for a match

conststructflowi_common*flic

flow to check for a match.

Description

Check xp and flic for a match with x.

Return

Returns 1 if there is a match.

intsecurity_xfrm_decode_session(structsk_buff*skb,u32*secid)

Determine the xfrm secid for a packet

Parameters

structsk_buff*skb

xfrm packet

u32*secid

secid

Description

Decode the packet in skb and return the security label in secid.

Return

Return 0 if all xfrms used have the same secid.

intsecurity_key_alloc(structkey*key,conststructcred*cred,unsignedlongflags)

Allocate and initialize a kernel key LSM blob

Parameters

structkey*key

key

conststructcred*cred

credentials

unsignedlongflags

allocation flags

Description

Permit allocation of a key and assign security data. Note that key does nothave a serial number assigned at this point.

Return

Return 0 if permission is granted, -ve error otherwise.

voidsecurity_key_free(structkey*key)

Free a kernel key LSM blob

Parameters

structkey*key

key

Description

Notification of destruction; free security data.

intsecurity_key_permission(key_ref_tkey_ref,conststructcred*cred,enumkey_need_permneed_perm)

Check if a kernel key operation is allowed

Parameters

key_ref_tkey_ref

key reference

conststructcred*cred

credentials of actor requesting access

enumkey_need_permneed_perm

requested permissions

Description

See whether a specific operational right is granted to a process on a key.

Return

Return 0 if permission is granted, -ve error otherwise.

intsecurity_key_getsecurity(structkey*key,char**buffer)

Get the key’s security label

Parameters

structkey*key

key

char**buffer

security label buffer

Description

Get a textual representation of the security context attached to a key for the purposes of honouring KEYCTL_GETSECURITY. This function allocates the storage for the NUL-terminated string and the caller should free it.

Return

Returns the length of buffer (including terminating NUL) or -ve if an error occurs. May also return 0 (and a NULL buffer pointer) if there is no security label assigned to the key.
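
The ownership rule above means a caller must free the returned string. A typical call pattern (a sketch; the pr_info() reporting is illustrative) looks like:

static int example_show_key_label(struct key *key)
{
        char *label = NULL;
        int len;

        len = security_key_getsecurity(key, &label);
        if (len < 0)
                return len;             /* error from the LSM */
        if (len > 0)
                pr_info("key %d label: %s\n", key->serial, label);
        kfree(label);                   /* caller owns the string; kfree(NULL) is safe */
        return 0;
}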

voidsecurity_key_post_create_or_update(structkey*keyring,structkey*key,constvoid*payload,size_tpayload_len,unsignedlongflags,boolcreate)

Notification of key create or update

Parameters

structkey*keyring

keyring to which the key is linked to

structkey*key

created or updated key

constvoid*payload

data used to instantiate or update the key

size_tpayload_len

length of payload

unsignedlongflags

key flags

boolcreate

flag indicating whether the key was created or updated

Description

Notify the caller of a key creation or update.

intsecurity_audit_rule_init(u32field,u32op,char*rulestr,void**lsmrule,gfp_tgfp)

Allocate and init an LSM audit rule struct

Parameters

u32field

audit action

u32op

rule operator

char*rulestr

rule context

void**lsmrule

receive buffer for audit rule struct

gfp_tgfp

GFP flag used for kmalloc

Description

Allocate and initialize an LSM audit rule structure.

Return

Return 0 if lsmrule has been successfully set, -EINVAL in case of an invalid rule.

intsecurity_audit_rule_known(structaudit_krule*krule)

Check if an audit rule contains LSM fields

Parameters

structaudit_krule*krule

audit rule

Description

Specifies whether givenkrule contains any fields related to the currentLSM.

Return

Returns 1 in case of relation found, 0 otherwise.

voidsecurity_audit_rule_free(void*lsmrule)

Free an LSM audit rule struct

Parameters

void*lsmrule

audit rule struct

Description

Deallocate the LSM audit rule structure previously allocated byaudit_rule_init().

intsecurity_audit_rule_match(structlsm_prop*prop,u32field,u32op,void*lsmrule)

Check if a label matches an audit rule

Parameters

structlsm_prop*prop

security label

u32field

LSM audit field

u32op

matching operator

void*lsmrule

audit rule

Description

Determine if the given prop matches a rule previously approved by security_audit_rule_known().

Return

Returns 1 if prop matches the rule, 0 if it does not, -ERRNO on failure.
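
Taken together, the audit-rule hooks form a small lifecycle: initialize a rule from its string form, match labels against it, and free it, much as the audit filtering code does internally. A hedged sketch of that flow (the field and operator values are illustrative, and the lsm_prop is assumed to come from one of the *_getlsmprop() helpers documented above):

static bool example_label_matches_rule(struct lsm_prop *prop, char *rulestr)
{
        void *lsmrule = NULL;
        bool match = false;

        if (security_audit_rule_init(AUDIT_SUBJ_USER, Audit_equal,
                                     rulestr, &lsmrule, GFP_KERNEL))
                return false;

        if (security_audit_rule_match(prop, AUDIT_SUBJ_USER, Audit_equal,
                                      lsmrule) > 0)
                match = true;

        security_audit_rule_free(lsmrule);
        return match;
}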

intsecurity_bpf(intcmd,unionbpf_attr*attr,unsignedintsize,boolkernel)

Check if the bpf syscall operation is allowed

Parameters

intcmd

command

unionbpf_attr*attr

bpf attribute

unsignedintsize

size

boolkernel

whether or not call originated from kernel

Description

Do an initial check for all bpf syscalls after the attribute is copied into the kernel. The actual security module can implement its own rules to check the specific cmd it needs.

Return

Returns 0 if permission is granted.

intsecurity_bpf_map(structbpf_map*map,fmode_tfmode)

Check if access to a bpf map is allowed

Parameters

structbpf_map*map

bpf map

fmode_tfmode

mode

Description

Do a check when the kernel generates and returns a file descriptor for eBPFmaps.

Return

Returns 0 if permission is granted.

intsecurity_bpf_prog(structbpf_prog*prog)

Check if access to a bpf program is allowed

Parameters

structbpf_prog*prog

bpf program

Description

Do a check when the kernel generates and returns a file descriptor for eBPFprograms.

Return

Returns 0 if permission is granted.

intsecurity_bpf_map_create(structbpf_map*map,unionbpf_attr*attr,structbpf_token*token,boolkernel)

Check if BPF map creation is allowed

Parameters

structbpf_map*map

BPF map object

unionbpf_attr*attr

BPF syscall attributes used to create BPF map

structbpf_token*token

BPF token used to grant user access

boolkernel

whether or not call originated from kernel

Description

Do a check when the kernel creates a new BPF map. This is also thepoint where LSM blob is allocated for LSMs that need them.

Return

Returns 0 on success, error on failure.

intsecurity_bpf_prog_load(structbpf_prog*prog,unionbpf_attr*attr,structbpf_token*token,boolkernel)

Check if loading of BPF program is allowed

Parameters

structbpf_prog*prog

BPF program object

unionbpf_attr*attr

BPF syscall attributes used to create BPF program

structbpf_token*token

BPF token used to grant user access to BPF subsystem

boolkernel

whether or not call originated from kernel

Description

Perform an access control check when the kernel loads a BPF program andallocates associated BPF program object. This hook is also responsible forallocating any required LSM state for the BPF program.

Return

Returns 0 on success, error on failure.

intsecurity_bpf_token_create(structbpf_token*token,unionbpf_attr*attr,conststructpath*path)

Check if creating of BPF token is allowed

Parameters

structbpf_token*token

BPF token object

unionbpf_attr*attr

BPF syscall attributes used to create BPF token

conststructpath*path

path pointing to BPF FS mount point from which BPF token is created

Description

Do a check when the kernel instantiates a new BPF token object from BPF FSinstance. This is also the point where LSM blob can be allocated for LSMs.

Return

Returns 0 on success, error on failure.

intsecurity_bpf_token_cmd(conststructbpf_token*token,enumbpf_cmdcmd)

Check if BPF token is allowed to delegate requested BPF syscall command

Parameters

conststructbpf_token*token

BPF token object

enumbpf_cmdcmd

BPF syscall command requested to be delegated by BPF token

Description

Do a check when the kernel decides whether provided BPF token should allowdelegation of requested BPF syscall command.

Return

Returns 0 on success, error on failure.

intsecurity_bpf_token_capable(conststructbpf_token*token,intcap)

Check if BPF token is allowed to delegate requested BPF-related capability

Parameters

conststructbpf_token*token

BPF token object

intcap

capabilities requested to be delegated by BPF token

Description

Do a check when the kernel decides whether provided BPF token should allowdelegation of requested BPF-related capabilities.

Return

Returns 0 on success, error on failure.

voidsecurity_bpf_map_free(structbpf_map*map)

Free a bpf map’s LSM blob

Parameters

structbpf_map*map

bpf map

Description

Clean up the security information stored inside bpf map.

voidsecurity_bpf_prog_free(structbpf_prog*prog)

Free a BPF program’s LSM blob

Parameters

structbpf_prog*prog

BPF program struct

Description

Clean up the security information stored inside BPF program.

voidsecurity_bpf_token_free(structbpf_token*token)

Free a BPF token’s LSM blob

Parameters

structbpf_token*token

BPF token struct

Description

Clean up the security information stored inside BPF token.

intsecurity_perf_event_open(inttype)

Check if a perf event open is allowed

Parameters

inttype

type of event

Description

Check whether thetype of perf_event_open syscall is allowed.

Return

Returns 0 if permission is granted.

intsecurity_perf_event_alloc(structperf_event*event)

Allocate a perf event LSM blob

Parameters

structperf_event*event

perf event

Description

Allocate and save perf_event security info.

Return

Returns 0 on success, error on failure.

voidsecurity_perf_event_free(structperf_event*event)

Free a perf event LSM blob

Parameters

structperf_event*event

perf event

Description

Release (free) perf_event security info.

intsecurity_perf_event_read(structperf_event*event)

Check if reading a perf event label is allowed

Parameters

structperf_event*event

perf event

Description

Read perf_event security info if allowed.

Return

Returns 0 if permission is granted.

intsecurity_perf_event_write(structperf_event*event)

Check if writing a perf event label is allowed

Parameters

structperf_event*event

perf event

Description

Write perf_event security info if allowed.

Return

Returns 0 if permission is granted.

intsecurity_uring_override_creds(conststructcred*new)

Check if overriding creds is allowed

Parameters

conststructcred*new

new credentials

Description

Check if the current task, executing an io_uring operation, is allowed to override its credentials with new.

Return

Returns 0 if permission is granted.

intsecurity_uring_sqpoll(void)

Check if IORING_SETUP_SQPOLL is allowed

Parameters

void

no arguments

Description

Check whether the current task is allowed to spawn an io_uring polling thread (IORING_SETUP_SQPOLL).

Return

Returns 0 if permission is granted.

intsecurity_uring_cmd(structio_uring_cmd*ioucmd)

Check if a io_uring passthrough command is allowed

Parameters

structio_uring_cmd*ioucmd

command

Description

Check whether the file_operations uring_cmd is allowed to run.

Return

Returns 0 if permission is granted.

intsecurity_uring_allowed(void)

Check if io_uring_setup() is allowed

Parameters

void

no arguments

Description

Check whether the current task is allowed to call io_uring_setup().

Return

Returns 0 if permission is granted.

voidsecurity_initramfs_populated(void)

Notify LSMs that initramfs has been loaded

Parameters

void

no arguments

Description

Tells the LSMs the initramfs has been unpacked into the rootfs.

structdentry*securityfs_create_file(constchar*name,umode_tmode,structdentry*parent,void*data,conststructfile_operations*fops)

create a file in the securityfs filesystem

Parameters

constchar*name

a pointer to a string containing the name of the file to create.

umode_tmode

the permission that the file should have

structdentry*parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the securityfs filesystem.

void*data

a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open() call.

conststructfile_operations*fops

a pointer to a struct file_operations that should be used forthis file.

Description

This function creates a file in securityfs with the given name.

This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, the function will return the error value (via ERR_PTR).

If securityfs is not enabled in the kernel, the value -ENODEV is returned.

structdentry*securityfs_create_dir(constchar*name,structdentry*parent)

create a directory in the securityfs filesystem

Parameters

constchar*name

a pointer to a string containing the name of the directory tocreate.

structdentry*parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the directory will be created in the root of the securityfs filesystem.

Description

This function creates a directory in securityfs with the given name.

This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, the function will return the error value (via ERR_PTR).

If securityfs is not enabled in the kernel, the value -ENODEV is returned.

structdentry*securityfs_create_symlink(constchar*name,structdentry*parent,constchar*target,conststructinode_operations*iops)

create a symlink in the securityfs filesystem

Parameters

constchar*name

a pointer to a string containing the name of the symlink tocreate.

structdentry*parent

a pointer to the parent dentry for the symlink. This should be a directory dentry if set. If this parameter is NULL, then the directory will be created in the root of the securityfs filesystem.

constchar*target

a pointer to a string containing the name of the symlink’s target. If this parameter is NULL, then the iops parameter needs to be setup to handle .readlink and .get_link inode_operations.

conststructinode_operations*iops

a pointer to the struct inode_operations to use for the symlink. If this parameter is NULL, then the default simple_symlink_inode operations will be used.

Description

This function creates a symlink in securityfs with the given name.

This function returns a pointer to a dentry if it succeeds. This pointer must be passed to the securityfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here). If an error occurs, the function will return the error value (via ERR_PTR).

If securityfs is not enabled in the kernel, the value -ENODEV is returned.

voidsecurityfs_remove(structdentry*dentry)

removes a file or directory from the securityfs filesystem

Parameters

structdentry*dentry

a pointer to the dentry of the file or directory to be removed.

Description

This function removes a file or directory in securityfs that was previously created with a call to another securityfs function (like securityfs_create_file() or variants thereof.)

This function is required to be called in order for the file to be removed. No automatic cleanup of files will happen when a module is removed; you are responsible here.

voidsecurityfs_recursive_remove(structdentry*dentry)

recursively removes a file or directory

Parameters

structdentry*dentry

a pointer to the dentry of the file or directory to be removed.

Description

This function recursively removes a file or directory in securityfs that was previously created with a call to another securityfs function (like securityfs_create_file() or variants thereof.)
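
Putting the securityfs helpers together, a module might create a directory containing one read-only file and tear both down on exit. The sketch below is illustrative (the names and file contents are made up); note the ERR_PTR error convention and the explicit securityfs_remove() calls required by the descriptions above.

static struct dentry *example_dir, *example_file;

static ssize_t example_read(struct file *file, char __user *buf,
                            size_t count, loff_t *ppos)
{
        static const char msg[] = "enabled\n";

        return simple_read_from_buffer(buf, count, ppos, msg, sizeof(msg) - 1);
}

static const struct file_operations example_fops = {
        .read = example_read,
};

static int __init example_securityfs_init(void)
{
        example_dir = securityfs_create_dir("example", NULL);
        if (IS_ERR(example_dir))
                return PTR_ERR(example_dir);

        example_file = securityfs_create_file("status", 0444, example_dir,
                                              NULL, &example_fops);
        if (IS_ERR(example_file)) {
                securityfs_remove(example_dir);
                return PTR_ERR(example_file);
        }
        return 0;
}

static void example_securityfs_exit(void)
{
        /* no automatic cleanup: remove what was created, children first */
        securityfs_remove(example_file);
        securityfs_remove(example_dir);
}

With securityfs mounted, the file from this sketch would appear as /sys/kernel/security/example/status.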

Audit Interfaces

structaudit_buffer*audit_log_start(structaudit_context*ctx,gfp_tgfp_mask,inttype)

obtain an audit buffer

Parameters

structaudit_context*ctx

audit_context (may be NULL)

gfp_tgfp_mask

type of allocation

inttype

audit message type

Description

Returns audit_buffer pointer on success or NULL on error.

Obtain an audit buffer. This routine does locking to obtain the audit buffer, but then no locking is required for calls to audit_log_*format. If the task (ctx) is a task that is currently in a syscall, then the syscall is marked as auditable and an audit record will be written at syscall exit. If there is no associated task, then task context (ctx) should be NULL.

voidaudit_log_format(structaudit_buffer*ab,constchar*fmt,...)

format a message into the audit buffer.

Parameters

structaudit_buffer*ab

audit_buffer

constchar*fmt

format string

...

optional parameters matchingfmt string

Description

All the work is done in audit_log_vformat.

voidaudit_log_end(structaudit_buffer*ab)

end one audit record

Parameters

structaudit_buffer*ab

the audit_buffer

Description

We can not do a netlink send inside an irq context because it blocks (last arg, flags, is not set to MSG_DONTWAIT), so the audit buffer is placed on a queue and a kthread is scheduled to remove them from the queue outside the irq context. May be called in any context.

voidaudit_log(structaudit_context*ctx,gfp_tgfp_mask,inttype,constchar*fmt,...)

Log an audit record

Parameters

structaudit_context*ctx

audit context

gfp_tgfp_mask

type of allocation

inttype

audit message type

constchar*fmt

format string to use

...

variable parameters matching the format string

Description

This is a convenience function that calls audit_log_start, audit_log_vformat, and audit_log_end. It may be called in any context.
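
The usual pattern is start/format/end on an audit_buffer, with audit_log() as shorthand for a single format call. A minimal sketch (the AUDIT_KERNEL record type and the message fields here are illustrative choices, not a prescribed format):

#include <linux/audit.h>

static void example_log_event(int value)
{
        struct audit_buffer *ab;

        ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_KERNEL);
        if (!ab)
                return;         /* auditing disabled or buffer unavailable */
        audit_log_format(ab, "example_event value=%d", value);
        audit_log_end(ab);
}

The same record could be emitted in one step with audit_log(audit_context(), GFP_KERNEL, AUDIT_KERNEL, "example_event value=%d", value).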

int__audit_filter_op(structtask_struct*tsk,structaudit_context*ctx,structlist_head*list,structaudit_names*name,unsignedlongop)

common filter helper for operations (syscall/uring/etc)

Parameters

structtask_struct*tsk

associated task

structaudit_context*ctx

audit context

structlist_head*list

audit filter list

structaudit_names*name

audit_name (can be NULL)

unsignedlongop

current syscall/uring_op

Description

Run the audit filters specified in list against tsk using ctx, name, and op, as necessary; the caller is responsible for ensuring that the call is made while the RCU read lock is held. The name parameter can be NULL, but all others must be specified. Returns 1/true if the filter finds a match, 0/false if none are found.

voidaudit_filter_uring(structtask_struct*tsk,structaudit_context*ctx)

apply filters to an io_uring operation

Parameters

structtask_struct*tsk

associated task

structaudit_context*ctx

audit context

voidaudit_reset_context(structaudit_context*ctx)

reset a audit_context structure

Parameters

structaudit_context*ctx

the audit_context to reset

Description

All fields in the audit_context will be reset to an initial state, allreferences held by fields will be dropped, and private memory will bereleased. When this function returns the audit_context will be suitablefor reuse, so long as the passed context is not NULL or a dummy context.

intaudit_alloc(structtask_struct*tsk)

allocate an audit context block for a task

Parameters

structtask_struct*tsk

task

Description

Filter on the task information and allocate a per-task audit contextif necessary. Doing so turns on system call auditing for thespecified task. This is called from copy_process, so no lock isneeded.

voidaudit_log_uring(structaudit_context*ctx)

generate an AUDIT_URINGOP record

Parameters

structaudit_context*ctx

the audit context

void__audit_free(structtask_struct*tsk)

free a per-task audit context

Parameters

structtask_struct*tsk

task whose audit context block to free

Description

Called from copy_process, do_exit, and the io_uring code

voidaudit_return_fixup(structaudit_context*ctx,intsuccess,longcode)

fixup the return codes in the audit_context

Parameters

structaudit_context*ctx

the audit_context

intsuccess

true/false value to indicate if the operation succeeded or not

longcode

operation return code

Description

We need to fixup the return code in the audit logs if the actual returncodes are later going to be fixed by the arch specific signal handlers.

void__audit_uring_entry(u8op)

prepare the kernel task’s audit context for io_uring

Parameters

u8op

the io_uring opcode

Description

This is similar to audit_syscall_entry() but is intended for use by io_uring operations. This function should only ever be called from audit_uring_entry() as we rely on the audit context checking present in that function.

void__audit_uring_exit(intsuccess,longcode)

wrap up the kernel task’s audit context after io_uring

Parameters

intsuccess

true/false value to indicate if the operation succeeded or not

longcode

operation return code

Description

This is similar to audit_syscall_exit() but is intended for use by io_uring operations. This function should only ever be called from audit_uring_exit() as we rely on the audit context checking present in that function.

void__audit_syscall_entry(intmajor,unsignedlonga1,unsignedlonga2,unsignedlonga3,unsignedlonga4)

fill in an audit record at syscall entry

Parameters

intmajor

major syscall type (function)

unsignedlonga1

additional syscall register 1

unsignedlonga2

additional syscall register 2

unsignedlonga3

additional syscall register 3

unsignedlonga4

additional syscall register 4

Description

Fill in audit context at syscall entry. This only happens if the audit context was created when the task was created and the state or filters demand the audit context be built. If the state from the per-task filter or from the per-syscall filter is AUDIT_STATE_RECORD, then the record will be written at syscall exit time (otherwise, it will only be written if another part of the kernel requests that it be written).

void__audit_syscall_exit(intsuccess,longreturn_code)

deallocate audit context after a system call

Parameters

intsuccess

success value of the syscall

longreturn_code

return value of the syscall

Description

Tear down after system call. If the audit context has been marked as auditable (either because of the AUDIT_STATE_RECORD state from filtering, or because some other part of the kernel wrote an audit message), then write out the syscall information. In all cases, free the names stored from getname().

structfilename*__audit_reusename(__userconstchar*uptr)

fill out filename with info from existing entry

Parameters

const__userchar*uptr

userland ptr to pathname

Description

Search the audit_names list for the current audit context. If there is an existing entry with a matching “uptr” then return the filename associated with that audit_name. If not, return NULL.

void__audit_getname(structfilename*name)

add a name to the list

Parameters

structfilename*name

name to add

Description

Add a name to the list of audit names for this context.Called from fs/namei.c:getname().

void__audit_inode(structfilename*name,conststructdentry*dentry,unsignedintflags)

store the inode and device from a lookup

Parameters

structfilename*name

name being audited

conststructdentry*dentry

dentry being audited

unsignedintflags

attributes for this particular entry

intauditsc_get_stamp(structaudit_context*ctx,structtimespec64*t,unsignedint*serial)

get local copies of audit_context values

Parameters

structaudit_context*ctx

audit_context for the task

structtimespec64*t

timespec64 to store time recorded in the audit_context

unsignedint*serial

serial value that is recorded in the audit_context

Description

Also sets the context as auditable.

void__audit_mq_open(intoflag,umode_tmode,structmq_attr*attr)

record audit data for a POSIX MQ open

Parameters

intoflag

open flag

umode_tmode

mode bits

structmq_attr*attr

queue attributes

void__audit_mq_sendrecv(mqd_tmqdes,size_tmsg_len,unsignedintmsg_prio,conststructtimespec64*abs_timeout)

record audit data for a POSIX MQ timed send/receive

Parameters

mqd_tmqdes

MQ descriptor

size_tmsg_len

Message length

unsignedintmsg_prio

Message priority

conststructtimespec64*abs_timeout

Message timeout in absolute time

void__audit_mq_notify(mqd_tmqdes,conststructsigevent*notification)

record audit data for a POSIX MQ notify

Parameters

mqd_tmqdes

MQ descriptor

conststructsigevent*notification

Notification event

void__audit_mq_getsetattr(mqd_tmqdes,structmq_attr*mqstat)

record audit data for a POSIX MQ get/set attribute

Parameters

mqd_tmqdes

MQ descriptor

structmq_attr*mqstat

MQ flags

void__audit_ipc_obj(structkern_ipc_perm*ipcp)

record audit data for ipc object

Parameters

structkern_ipc_perm*ipcp

ipc permissions

void__audit_ipc_set_perm(unsignedlongqbytes,uid_tuid,gid_tgid,umode_tmode)

record audit data for new ipc permissions

Parameters

unsignedlongqbytes

msgq bytes

uid_tuid

msgq user id

gid_tgid

msgq group id

umode_tmode

msgq mode (permissions)

Description

Called only after audit_ipc_obj().

int__audit_socketcall(intnargs,unsignedlong*args)

record audit data for sys_socketcall

Parameters

intnargs

number of args, which should not be more than AUDITSC_ARGS.

unsignedlong*args

args array

void__audit_fd_pair(intfd1,intfd2)

record audit data for pipe and socketpair

Parameters

intfd1

the first file descriptor

intfd2

the second file descriptor

int__audit_sockaddr(intlen,void*a)

record audit data for sys_bind, sys_connect, sys_sendto

Parameters

intlen

data length in user space

void*a

data address in kernel space

Description

Returns 0 for success or NULL context or < 0 on error.

intaudit_signal_info_syscall(structtask_struct*t)

record signal info for syscalls

Parameters

structtask_struct*t

task being signaled

Description

If the audit subsystem is being terminated, record the task (pid)and uid that is doing that.

int__audit_log_bprm_fcaps(structlinux_binprm*bprm,conststructcred*new,conststructcred*old)

store information about a loading bprm and relevant fcaps

Parameters

structlinux_binprm*bprm

pointer to the bprm being processed

conststructcred*new

the proposed new credentials

conststructcred*old

the old credentials

Description

Simply check if the proc already has the caps given by the file and if notstore the priv escalation info for later auditing at the end of the syscall

-Eric

void__audit_log_capset(conststructcred*new,conststructcred*old)

store information about the arguments to the capset syscall

Parameters

conststructcred*new

the new credentials

conststructcred*old

the old (current) credentials

Description

Record the arguments userspace sent to sys_capset for later printing by theaudit system if applicable

voidaudit_core_dumps(longsignr)

record information about processes that end abnormally

Parameters

longsignr

signal value

Description

If a process ends with a core dump, something fishy is going on and weshould record the event for investigation.

voidaudit_seccomp(unsignedlongsyscall,longsignr,intcode)

record information about a seccomp action

Parameters

unsignedlongsyscall

syscall number

longsignr

signal value

intcode

the seccomp action

Description

Record the information associated with a seccomp action. Event filtering for seccomp actions that are not to be logged is done in seccomp_log(). Therefore, this function forces auditing independent of the audit_enabled and dummy context state because seccomp actions should be logged even when audit is not in use.

intaudit_rule_change(inttype,intseq,void*data,size_tdatasz)

apply all rules to the specified message type

Parameters

inttype

audit message type

intseq

netlink audit message sequence (serial) number

void*data

payload data

size_tdatasz

size of payload data

intaudit_list_rules_send(structsk_buff*request_skb,intseq)

list the audit rules

Parameters

structsk_buff*request_skb

skb of request we are replying to (used to target the reply)

intseq

netlink audit message sequence (serial) number

intparent_len(constchar*path)

find the length of the parent portion of a pathname

Parameters

constchar*path

pathname of which to determine length

intaudit_compare_dname_path(conststructqstr*dname,constchar*path,intparentlen)

compare given dentry name with last component in given path. Return of 0 indicates a match.

Parameters

conststructqstr*dname

dentry name that we’re comparing

constchar*path

full pathname that we’re comparing

intparentlen

length of the parent if known. Passing in AUDIT_NAME_FULLhere indicates that we must compute this value.

Accounting Framework

longsys_acct(constchar__user*name)

enable/disable process accounting

Parameters

constchar__user*name

file name for accounting records or NULL to shutdown accounting

Description

sys_acct() is the only system call needed to implement process accounting. It takes the name of the file where accounting records should be written. If the filename is NULL, accounting will be shut down.

Return

0 for success or negative errno values for failure.

voidacct_collect(longexitcode,intgroup_dead)

collect accounting information into pacct_struct

Parameters

longexitcode

task exit code

intgroup_dead

not 0, if this thread is the last one in the process.

voidacct_process(void)

handles process accounting for an exiting task

Parameters

void

no arguments

Block Devices

voidbio_advance(structbio*bio,unsignedintnbytes)

increment/complete a bio by some number of bytes

Parameters

structbio*bio

bio to advance

unsignedintnbytes

number of bytes to complete

Description

This updates bi_sector, bi_size and bi_idx; if the number of bytes to complete doesn’t align with a bvec boundary, then bv_len and bv_offset will be updated on the last bvec as well.

bio will then represent the remaining, uncompleted portion of the io.

structfolio_iter

State for iterating all folios in a bio.

Definition:

struct folio_iter {    struct folio *folio;    size_t offset;    size_t length;};

Members

folio

The current folio we’re iterating. NULL after the last folio.

offset

The byte offset within the current folio.

length

The number of bytes in this iteration (will not cross folioboundary).

bio_for_each_folio_all

bio_for_each_folio_all(fi,bio)

Iterate over each folio in a bio.

Parameters

fi

structfolio_iter which is updated for each folio.

bio

struct bio to iterate over.
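Illustrative sketch (not part of the kernel-doc above): the usual way this iterator appears in a read-completion handler. The handler name is assumed, and folio_mark_uptodate() merely stands in for whatever per-folio completion work the caller needs.

#include <linux/bio.h>
#include <linux/pagemap.h>

/* Walk every folio the bio touched and mark it up to date on success. */
static void example_read_end_io(struct bio *bio)
{
	struct folio_iter fi;

	bio_for_each_folio_all(fi, bio) {
		/* fi.folio, fi.offset and fi.length describe each chunk */
		if (!bio->bi_status)
			folio_mark_uptodate(fi.folio);
	}
	bio_put(bio);
}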

structbio*bio_next_split(structbio*bio,intsectors,gfp_tgfp,structbio_set*bs)

get nextsectors from a bio, splitting if necessary

Parameters

structbio*bio

bio to split

intsectors

number of sectors to split from the front ofbio

gfp_tgfp

gfp mask

structbio_set*bs

bio set to allocate from

Return

a bio representing the next sectors of bio - if the bio is smaller than sectors, returns the original bio unchanged.

unsignedintbio_add_max_vecs(void*kaddr,unsignedintlen)

number of bio_vecs needed to add data to a bio

Parameters

void*kaddr

kernel virtual address to add

unsignedintlen

length in bytes to add

Description

Calculate how many bio_vecs need to be allocated to add the kernel virtual address range in [kaddr:len] in the worst case.

boolbio_is_zone_append(structbio*bio)

is this a zone append bio?

Parameters

structbio*bio

bio to check

Description

Check if bio is a zone append operation. Core block layer code and end_io handlers must use this instead of an open coded REQ_OP_ZONE_APPEND check because the block layer can rewrite REQ_OP_ZONE_APPEND to REQ_OP_WRITE if it is not natively supported.

voidblk_queue_flag_set(unsignedintflag,structrequest_queue*q)

atomically set a queue flag

Parameters

unsignedintflag

flag to be set

structrequest_queue*q

request queue

voidblk_queue_flag_clear(unsignedintflag,structrequest_queue*q)

atomically clear a queue flag

Parameters

unsignedintflag

flag to be cleared

structrequest_queue*q

request queue

constchar*blk_op_str(enumreq_opop)

Return string XXX in the REQ_OP_XXX.

Parameters

enumreq_opop

REQ_OP_XXX.

Description

Centralize block layer function to convert REQ_OP_XXX into string format. Useful for debugging and tracing a bio or request. For an invalid REQ_OP_XXX it returns the string “UNKNOWN”.

voidblk_sync_queue(structrequest_queue*q)

cancel any pending callbacks on a queue

Parameters

structrequest_queue*q

the queue

Description

The block layer may perform asynchronous callback activity on a queue, such as calling the unplug function after a timeout. A block device may call blk_sync_queue to ensure that any such activity is cancelled, thus allowing it to release resources that the callbacks might use. The caller must already have made sure that its ->submit_bio will not re-add plugging prior to calling this function.

This function does not cancel any asynchronous activity arising out of elevator or throttling code. That would require elevator_exit() and blkcg_exit_queue() to be called with queue lock initialized.

voidblk_set_pm_only(structrequest_queue*q)

increment pm_only counter

Parameters

structrequest_queue*q

request queue pointer

voidblk_put_queue(structrequest_queue*q)

decrement the request_queue refcount

Parameters

structrequest_queue*q

the request_queue structure to decrement the refcount for

Description

Decrements the refcount of the request_queue and free it when the refcountreaches 0.

boolblk_get_queue(structrequest_queue*q)

increment the request_queue refcount

Parameters

structrequest_queue*q

the request_queue structure to increment the refcount for

Description

Increment the refcount of the request_queue kobject.

Context

Any context.

voidsubmit_bio_noacct(structbio*bio)

re-submit a bio to the block device layer for I/O

Parameters

structbio*bio

The bio describing the location in memory and on the device.

Description

This is a version of submit_bio() that shall only be used for I/O that is resubmitted to lower level drivers by stacking block drivers. All file systems and other upper level users of the block layer should use submit_bio() instead.

voidsubmit_bio(structbio*bio)

submit a bio to the block device layer for I/O

Parameters

structbio*bio

Thestructbio which describes the I/O

Description

submit_bio() is used to submit I/O requests to block devices. It is passed a fully set up struct bio that describes the I/O that needs to be done. The bio will be sent to the device described by the bi_bdev field.

The success/failure status of the request, along with notification of completion, is delivered asynchronously through the ->bi_end_io() callback in bio. The bio must NOT be touched by the caller until ->bi_end_io() has been called.
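To make the asynchronous completion contract concrete, here is a minimal sketch (the helper names are illustrative, and the four-argument bio_alloc() shown is the form used in recent kernels) that reads one page and waits for ->bi_end_io() via a completion:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>

static void example_end_io(struct bio *bio)
{
	/* signal the submitter; the caller still owns the bio */
	complete(bio->bi_private);
}

static int example_read_page(struct block_device *bdev, struct page *page,
			     sector_t sector)
{
	DECLARE_COMPLETION_ONSTACK(done);
	struct bio *bio;
	int ret;

	bio = bio_alloc(bdev, 1, REQ_OP_READ, GFP_KERNEL);
	bio->bi_iter.bi_sector = sector;
	__bio_add_page(bio, page, PAGE_SIZE, 0);
	bio->bi_end_io = example_end_io;
	bio->bi_private = &done;

	submit_bio(bio);
	wait_for_completion(&done);
	ret = blk_status_to_errno(bio->bi_status);
	bio_put(bio);
	return ret;
}

For simple synchronous cases the submit_bio_wait() helper wraps this wait-for-completion pattern.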

intbio_poll(structbio*bio,structio_comp_batch*iob,unsignedintflags)

poll for BIO completions

Parameters

structbio*bio

bio to poll for

structio_comp_batch*iob

batches of IO

unsignedintflags

BLK_POLL_* flags that control the behavior

Description

Poll for completions on queue associated with the bio. Returns number ofcompleted entries found.

Note

the caller must either be the context that submitted bio, or be in an RCU critical section to prevent freeing of bio.

unsignedlongbio_start_io_acct(structbio*bio)

start I/O accounting for bio based drivers

Parameters

structbio*bio

bio to start account for

Description

Returns the start time that should be passed back to bio_end_io_acct().

intblk_lld_busy(structrequest_queue*q)

Check if underlying low-level drivers of a device are busy

Parameters

structrequest_queue*q

the queue of the device being checked

Description

Check if underlying low-level drivers of a device are busy. If the drivers want to export their busy state, they must set their own exporting function using blk_queue_lld_busy() first.

Basically, this function is used only by request stacking drivers to stop dispatching requests to underlying devices when underlying devices are busy. This behavior helps more I/O merging on the queue of the request stacking driver and prevents I/O throughput regression on burst I/O load.

Return

0 - Not busy (The request stacking driver should dispatch request)

1 - Busy (The request stacking driver should stop dispatching request)

voidblk_start_plug(structblk_plug*plug)

initialize blk_plug and track it inside the task_struct

Parameters

structblk_plug*plug

Thestructblk_plug that needs to be initialized

Description

blk_start_plug() indicates to the block layer an intent by the caller to submit multiple I/O requests in a batch. The block layer may use this hint to defer submitting I/Os from the caller until blk_finish_plug() is called. However, the block layer may choose to submit requests before a call to blk_finish_plug() if the number of queued I/Os exceeds BLK_MAX_REQUEST_COUNT, or if the size of the I/O is larger than BLK_PLUG_FLUSH_SIZE. The queued I/Os may also be submitted early if the task schedules (see below).

Tracking blk_plug inside the task_struct will help with auto-flushing the pending I/O should the task end up blocking between blk_start_plug() and blk_finish_plug(). This is important from a performance perspective, but also ensures that we don’t deadlock. For instance, if the task is blocking for a memory allocation, memory reclaim could end up wanting to free a page belonging to that request that is currently residing in our private plug. By flushing the pending I/O when the process goes to sleep, we avoid this kind of deadlock.

voidblk_finish_plug(structblk_plug*plug)

mark the end of a batch of submitted I/O

Parameters

structblk_plug*plug

Thestructblk_plug passed toblk_start_plug()

Description

Indicate that a batch of I/O submissions is complete. This function must be paired with an initial call to blk_start_plug(). The intent is to allow the block layer to optimize I/O submission. See the documentation for blk_start_plug() for more information.
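A sketch of the pairing described above (the wrapper function and the bio_list are illustrative): batch every submission between blk_start_plug() and blk_finish_plug() so the block layer can merge or defer them.

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Submit a whole list of bios under one plug. */
static void example_submit_many(struct bio_list *bios)
{
	struct blk_plug plug;
	struct bio *bio;

	blk_start_plug(&plug);
	while ((bio = bio_list_pop(bios)))
		submit_bio(bio);
	blk_finish_plug(&plug);
}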

intblk_queue_enter(structrequest_queue*q,blk_mq_req_flags_tflags)

try to increase q->q_usage_counter

Parameters

structrequest_queue*q

request queue pointer

blk_mq_req_flags_tflags

BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PM

intblk_rq_map_user_iov(structrequest_queue*q,structrequest*rq,structrq_map_data*map_data,conststructiov_iter*iter,gfp_tgfp_mask)

map user data to a request, for passthrough requests

Parameters

structrequest_queue*q

request queue where request should be inserted

structrequest*rq

request to map data to

structrq_map_data*map_data

pointer to the rq_map_data holding pages (if necessary)

conststructiov_iter*iter

iovec iterator

gfp_tgfp_mask

memory allocation flags

Description

Data will be mapped directly for zero copy I/O, if possible. Otherwisea kernel bounce buffer is used.

A matchingblk_rq_unmap_user() must be issued at the end of I/O, whilestill in process context.

intblk_rq_unmap_user(structbio*bio)

unmap a request with user data

Parameters

structbio*bio

start of bio list

Description

Unmap a rq previously mapped by blk_rq_map_user(). The caller must supply the original rq->bio from the blk_rq_map_user() return, since the I/O completion may have changed rq->bio.

intblk_rq_map_kern(structrequest*rq,void*kbuf,unsignedintlen,gfp_tgfp_mask)

map kernel data to a request, for passthrough requests

Parameters

structrequest*rq

request to fill

void*kbuf

the kernel buffer

unsignedintlen

length of user data

gfp_tgfp_mask

memory allocation flags

Description

Data will be mapped directly if possible. Otherwise a bounce buffer is used. Can be called multiple times to append multiple buffers.

intblk_register_queue(structgendisk*disk)

register a block layer queue with sysfs

Parameters

structgendisk*disk

Disk of which the request queue should be registered with sysfs.

voidblk_unregister_queue(structgendisk*disk)

counterpart ofblk_register_queue()

Parameters

structgendisk*disk

Disk of which the request queue should be unregistered from sysfs.

Note

the caller is responsible for guaranteeing that this function is calledafterblk_register_queue() has finished.

voidblk_set_stacking_limits(structqueue_limits*lim)

set default limits for stacking devices

Parameters

structqueue_limits*lim

the queue_limits structure to reset

Description

Prepare queue limits for applying limits from underlying devices usingblk_stack_limits().

intqueue_limits_commit_update(structrequest_queue*q,structqueue_limits*lim)

commit an atomic update of queue limits

Parameters

structrequest_queue*q

queue to update

structqueue_limits*lim

limits to apply

Description

Apply the limits in lim that were obtained from queue_limits_start_update() and updated by the caller to q. The caller must have frozen the queue or ensure that there are no outstanding I/Os by other means.

Returns 0 if successful, else a negative error code.

intqueue_limits_commit_update_frozen(structrequest_queue*q,structqueue_limits*lim)

commit an atomic update of queue limits

Parameters

structrequest_queue*q

queue to update

structqueue_limits*lim

limits to apply

Description

Apply the limits in lim that were obtained from queue_limits_start_update() and updated with the new values by the caller to q. Freezes the queue before the update and unfreezes it after.

Returns 0 if successful, else a negative error code.

intqueue_limits_set(structrequest_queue*q,structqueue_limits*lim)

apply queue limits to queue

Parameters

structrequest_queue*q

queue to update

structqueue_limits*lim

limits to apply

Description

Apply the limits in lim that were freshly initialized to q. To update existing limits use queue_limits_start_update() and queue_limits_commit_update() instead.

Returns 0 if successful, else a negative error code.

intblk_stack_limits(structqueue_limits*t,structqueue_limits*b,sector_tstart)

adjust queue_limits for stacked devices

Parameters

structqueue_limits*t

the stacking driver limits (top device)

structqueue_limits*b

the underlying queue limits (bottom, component device)

sector_tstart

first data sector within component device

Description

This function is used by stacking drivers like MD and DM to ensure that all component devices have compatible block sizes and alignments. The stacking driver must provide a queue_limits struct (top) and then iteratively call the stacking function for all component (bottom) devices. The stacking function will attempt to combine the values and ensure proper alignment.

Returns 0 if the top and bottom queue_limits are compatible. The top device’s block sizes and alignment offsets may be adjusted to ensure alignment with the bottom device. If no compatible sizes and alignments exist, -1 is returned and the resulting top queue_limits will have the misaligned flag set to indicate that the alignment_offset is undefined.

voidqueue_limits_stack_bdev(structqueue_limits*t,structblock_device*bdev,sector_toffset,constchar*pfx)

adjust queue_limits for stacked devices

Parameters

structqueue_limits*t

the stacking driver limits (top device)

structblock_device*bdev

the underlying block device (bottom)

sector_toffset

offset to beginning of data within component device

constchar*pfx

prefix to use for warnings logged

Description

This function is used by stacking drivers like MD and DM to ensure that all component devices have compatible block sizes and alignments. The stacking driver must provide a queue_limits struct (top) and then iteratively call the stacking function for all component (bottom) devices. The stacking function will attempt to combine the values and ensure proper alignment.

boolqueue_limits_stack_integrity(structqueue_limits*t,structqueue_limits*b)

stack integrity profile

Parameters

structqueue_limits*t

target queue limits

structqueue_limits*b

base queue limits

Description

Check if the integrity profile in b can be stacked into the target t. Stacking is possible if either:

  1. t does not have any integrity information stacked into it yet

  2. the integrity profile in b is identical to the one in t

If b can be stacked into t, return true. Else return false and clear the integrity information in t.

voidblk_set_queue_depth(structrequest_queue*q,unsignedintdepth)

tell the block layer about the device queue depth

Parameters

structrequest_queue*q

the request queue for the device

unsignedintdepth

queue depth

intblkdev_issue_flush(structblock_device*bdev)

queue a flush

Parameters

structblock_device*bdev

blockdev to issue flush for

Description

Issue a flush for the block device in question.

intblkdev_issue_discard(structblock_device*bdev,sector_tsector,sector_tnr_sects,gfp_tgfp_mask)

queue a discard

Parameters

structblock_device*bdev

blockdev to issue discard for

sector_tsector

start sector

sector_tnr_sects

number of sectors to discard

gfp_tgfp_mask

memory allocation flags (for bio_alloc)

Description

Issue a discard request for the sectors in question.

int__blkdev_issue_zeroout(structblock_device*bdev,sector_tsector,sector_tnr_sects,gfp_tgfp_mask,structbio**biop,unsignedflags)

generate a number of zero-filled write bios

Parameters

structblock_device*bdev

blockdev to issue

sector_tsector

start sector

sector_tnr_sects

number of sectors to write

gfp_tgfp_mask

memory allocation flags (for bio_alloc)

structbio**biop

pointer to anchor bio

unsignedflags

controls detailed behavior

Description

Zero-fill a block range, either using hardware offload or by explicitlywriting zeroes to the device.

If a device is using logical block provisioning, the underlying space will not be released if flags contains BLKDEV_ZERO_NOUNMAP.

If flags contains BLKDEV_ZERO_NOFALLBACK, the function will return -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.

intblkdev_issue_zeroout(structblock_device*bdev,sector_tsector,sector_tnr_sects,gfp_tgfp_mask,unsignedflags)

zero-fill a block range

Parameters

structblock_device*bdev

blockdev to write

sector_tsector

start sector

sector_tnr_sects

number of sectors to write

gfp_tgfp_mask

memory allocation flags (for bio_alloc)

unsignedflags

controls detailed behavior

Description

Zero-fill a block range, either using hardware offload or by explicitly writing zeroes to the device. See __blkdev_issue_zeroout() for the valid values for flags.
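An illustrative sketch (the helper name and the 1 MiB size are arbitrary): zeroing a byte range reduces to converting it to sectors and choosing the fallback policy via flags. Passing 0 allows a software fallback; BLKDEV_ZERO_NOFALLBACK would instead fail with -EOPNOTSUPP when no offload is available.

#include <linux/blkdev.h>
#include <linux/sizes.h>

/* Zero 1 MiB starting at byte offset pos, preferring a hardware offload. */
static int example_zero_range(struct block_device *bdev, loff_t pos)
{
	sector_t sector = pos >> SECTOR_SHIFT;
	sector_t nr_sects = SZ_1M >> SECTOR_SHIFT;

	return blkdev_issue_zeroout(bdev, sector, nr_sects, GFP_KERNEL, 0);
}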

intblk_rq_map_integrity_sg(structrequest*rq,structscatterlist*sglist)

Map integrity metadata into a scatterlist

Parameters

structrequest*rq

request to map

structscatterlist*sglist

target scatterlist

Description

Map the integrity vectors in request into ascatterlist. The scatterlist must be big enough to hold allelements. I.e. sized using blk_rq_count_integrity_sg() orrq->nr_integrity_segments.

intblk_trace_ioctl(structblock_device*bdev,unsignedcmd,char__user*arg)

handle the ioctls associated with tracing

Parameters

structblock_device*bdev

the block device

unsignedcmd

the ioctl cmd

char__user*arg

the argument data, if any

voidblk_trace_shutdown(structrequest_queue*q)

stop and cleanup trace structures

Parameters

structrequest_queue*q

the request queue associated with the device

voidblk_add_trace_rq(structrequest*rq,blk_status_terror,unsignedintnr_bytes,u32what,u64cgid)

Add a trace for a request oriented action

Parameters

structrequest*rq

the source request

blk_status_terror

return status to log

unsignedintnr_bytes

number of completed bytes

u32what

the action

u64cgid

the cgroup info

Description

Records an action against a request. Will log the bio offset + size.

voidblk_add_trace_bio(structrequest_queue*q,structbio*bio,u32what,interror)

Add a trace for a bio oriented action

Parameters

structrequest_queue*q

queue the io is for

structbio*bio

the source bio

u32what

the action

interror

error, if any

Description

Records an action against a bio. Will log the bio offset + size.

voidblk_add_trace_bio_remap(void*ignore,structbio*bio,dev_tdev,sector_tfrom)

Add a trace for a bio-remap operation

Parameters

void*ignore

trace callback data parameter (not used)

structbio*bio

the source bio

dev_tdev

source device

sector_tfrom

source sector

Description

Called after a bio is remapped to a different device and/or sector.

voidblk_add_trace_rq_remap(void*ignore,structrequest*rq,dev_tdev,sector_tfrom)

Add a trace for a request-remap operation

Parameters

void*ignore

trace callback data parameter (not used)

structrequest*rq

the source request

dev_tdev

target device

sector_tfrom

source sector

Description

Device mapper remaps request to other devices.Add a trace for that action.

voiddisk_release(structdevice*dev)

releases all allocated resources of the gendisk

Parameters

structdevice*dev

the device representing this disk

Description

This function releases all allocated resources of the gendisk.

Drivers which used __device_add_disk() have a gendisk with a request_queue assigned. Since the request_queue sits on top of the gendisk for these drivers we also call blk_put_queue() for them, and we expect the request_queue refcount to reach 0 at this point, and so the request_queue will also be freed prior to the disk.

Context

can sleep

unsignedintbdev_count_inflight(structblock_device*part)

get the number of inflight IOs for a block device.

Parameters

structblock_device*part

the block device.

Description

Inflight here means started IO accounting, from bdev_start_io_acct() for bio-based block devices, and from blk_account_io_start() for rq-based block devices.

int__register_blkdev(unsignedintmajor,constchar*name,void(*probe)(dev_tdevt))

register a new block device

Parameters

unsignedintmajor

the requested major device number [1..BLKDEV_MAJOR_MAX-1]. Ifmajor = 0, try to allocate any unused major number.

constchar*name

the name of the new block device as a zero terminated string

void(*probe)(dev_tdevt)

pre-devtmpfs / pre-udev callback used to create disks when their pre-created device node is accessed. When a probe call uses add_disk() and it fails the driver must cleanup resources. This interface may soon be removed.

Description

Thename must be unique within the system.

The return value depends on themajor input parameter:

  • if a major device number was requested in range [1..BLKDEV_MAJOR_MAX-1] then the function returns zero on success, or a negative error code

  • if any unused major number was requested with major = 0 parameter then the return value is the allocated major number in range [1..BLKDEV_MAJOR_MAX-1] or a negative error code otherwise

See Linux allocated devices (4.x+ version) for the list of allocated major numbers.

Use register_blkdev instead for any new code.

intadd_disk_fwnode(structdevice*parent,structgendisk*disk,conststructattribute_group**groups,structfwnode_handle*fwnode)

add disk information to kernel list with fwnode

Parameters

structdevice*parent

parent device for the disk

structgendisk*disk

per-device partitioning information

conststructattribute_group**groups

Additional per-device sysfs groups

structfwnode_handle*fwnode

attached disk fwnode

Description

This function registers the partitioning information in disk with the kernel. Also attach a fwnode to the disk device.

intdevice_add_disk(structdevice*parent,structgendisk*disk,conststructattribute_group**groups)

add disk information to kernel list

Parameters

structdevice*parent

parent device for the disk

structgendisk*disk

per-device partitioning information

conststructattribute_group**groups

Additional per-device sysfs groups

Description

This function registers the partitioning information in disk with the kernel.

voidblk_mark_disk_dead(structgendisk*disk)

mark a disk as dead

Parameters

structgendisk*disk

disk to mark as dead

Description

Mark the disk as dead (e.g. surprise removed) and don’t accept any new I/O to this disk.

voiddel_gendisk(structgendisk*disk)

remove the gendisk

Parameters

structgendisk*disk

the struct gendisk to remove

Description

Removes the gendisk and all its associated resources. This deletes the partitions associated with the gendisk, and unregisters the associated request_queue.

This is the counter to the respective __device_add_disk() call.

The final removal of the struct gendisk happens when its refcount reaches 0 with put_disk(), which should be called after del_gendisk(), if __device_add_disk() was used.

Drivers exist which depend on the release of the gendisk being synchronous; it should not be deferred.

Context

can sleep

voidinvalidate_disk(structgendisk*disk)

invalidate the disk

Parameters

structgendisk*disk

the struct gendisk to invalidate

Description

A helper to invalidate the disk. It will clean the disk’s associated buffer/page caches and reset its internal state so that the disk can be reused by the drivers.

Context

can sleep

voidput_disk(structgendisk*disk)

decrements the gendisk refcount

Parameters

structgendisk*disk

the struct gendisk to decrement the refcount for

Description

This decrements the refcount for the struct gendisk. When this reaches 0 we’ll have disk_release() called.

Note

for blk-mq disk put_disk must be called before freeing the tag_setwhen handling probe errors (that is before add_disk() is called).

Context

Any context, but the last reference must not be dropped fromatomic context.

voidset_disk_ro(structgendisk*disk,boolread_only)

set a gendisk read-only

Parameters

structgendisk*disk

gendisk to operate on

boolread_only

true to set the disk read-only, false to set the disk read/write

Description

This function is used to indicate whether a given disk device should have its read-only flag set. set_disk_ro() is typically used by device drivers to indicate whether the underlying physical device is write-protected.

intbdev_validate_blocksize(structblock_device*bdev,intblock_size)

check that this block size is acceptable

Parameters

structblock_device*bdev

blockdevice to check

intblock_size

block size to check

Description

For block device users that do not use buffer heads or the block devicepage cache, make sure that this block size can be used with the device.

Return

On success zero is returned, negative error code on failure.

intbdev_freeze(structblock_device*bdev)

lock a filesystem and force it into a consistent state

Parameters

structblock_device*bdev

blockdevice to lock

Description

If a superblock is found on this device, we take the s_umount semaphore on it to make sure nobody unmounts until the snapshot creation is done. The reference counter (bd_fsfreeze_count) guarantees that only the last unfreeze process can actually unfreeze the frozen filesystem when multiple freeze requests arrive simultaneously. It counts up in bdev_freeze() and down in bdev_thaw(). When it becomes 0, thaw_bdev() actually unfreezes.

Return

On success zero is returned, negative error code on failure.

intbdev_thaw(structblock_device*bdev)

unlock filesystem

Parameters

structblock_device*bdev

blockdevice to unlock

Description

Unlocks the filesystem and marks it writeable again afterbdev_freeze().

Return

On success zero is returned, negative error code on failure.
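A sketch of the expected freeze/thaw pairing around a block-level snapshot, assuming a hypothetical take_snapshot() that needs the filesystem quiesced:

#include <linux/blkdev.h>

static int example_snapshot(struct block_device *bdev)
{
	int ret;

	ret = bdev_freeze(bdev);
	if (ret)
		return ret;

	ret = take_snapshot(bdev);	/* hypothetical work on the frozen fs */

	bdev_thaw(bdev);		/* must balance the freeze */
	return ret;
}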

intbd_prepare_to_claim(structblock_device*bdev,void*holder,conststructblk_holder_ops*hops)

claim a block device

Parameters

structblock_device*bdev

block device of interest

void*holder

holder trying to claimbdev

conststructblk_holder_ops*hops

holder ops.

Description

Claim bdev. This function fails if bdev is already claimed by another holder and waits if another claiming is in progress. On successful return, the caller has ownership of bd_claiming and bd_holder[s].

Return

0 ifbdev can be claimed, -EBUSY otherwise.

voidbd_abort_claiming(structblock_device*bdev,void*holder)

abort claiming of a block device

Parameters

structblock_device*bdev

block device of interest

void*holder

holder that has claimedbdev

Description

Abort claiming of a block device when the exclusive open failed. This can also be used when exclusive open is not actually desired and we just needed to block other exclusive openers for a while.

voidbdev_fput(structfile*bdev_file)

yield claim to the block device and put the file

Parameters

structfile*bdev_file

open block device

Description

Yield claim on the block device and put the file. Ensure that the block device can be reclaimed before the file is closed, which is a deferred operation.

intlookup_bdev(constchar*pathname,dev_t*dev)

Look up a struct block_device by name.

Parameters

constchar*pathname

Name of the block device in the filesystem.

dev_t*dev

Pointer to the block device’s dev_t, if found.

Description

Lookup the block device’s dev_t at pathname in the current namespace if possible and return it in dev.

Context

May sleep.

Return

0 if succeeded, negative errno otherwise.
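A small sketch of the lookup (the wrapper is illustrative); note that this only resolves the dev_t and does not open or claim the device:

#include <linux/blkdev.h>
#include <linux/printk.h>

static int example_lookup(const char *path, dev_t *devt)
{
	int ret;

	ret = lookup_bdev(path, devt);
	if (ret)
		pr_warn("no block device at %s (%d)\n", path, ret);
	return ret;
}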

voidbdev_mark_dead(structblock_device*bdev,boolsurprise)

mark a block device as dead

Parameters

structblock_device*bdev

block device to operate on

boolsurprise

indicate a surprise removal

Description

Tell the file system that this device or media is dead. If surprise is set to true the device or media is already gone; if not, we are preparing for an orderly removal.

This calls into the file system, which then typically syncs out all dirty data and writes back inodes and then invalidates any cached data in the inodes on the file system. In addition we also invalidate the block device mapping.

Char devices

intregister_chrdev_region(dev_tfrom,unsignedcount,constchar*name)

register a range of device numbers

Parameters

dev_tfrom

the first in the desired range of device numbers; must includethe major number.

unsignedcount

the number of consecutive device numbers required

constchar*name

the name of the device or driver.

Description

Return value is zero on success, a negative error code on failure.

intalloc_chrdev_region(dev_t*dev,unsignedbaseminor,unsignedcount,constchar*name)

register a range of char device numbers

Parameters

dev_t*dev

output parameter for first assigned number

unsignedbaseminor

first of the requested range of minor numbers

unsignedcount

the number of minor numbers required

constchar*name

the name of the associated device or driver

Description

Allocates a range of char device numbers. The major number will be chosen dynamically, and returned (along with the first minor number) in dev. Returns zero or a negative error code.

int__register_chrdev(unsignedintmajor,unsignedintbaseminor,unsignedintcount,constchar*name,conststructfile_operations*fops)

create and register a cdev occupying a range of minors

Parameters

unsignedintmajor

major device number or 0 for dynamic allocation

unsignedintbaseminor

first of the requested range of minor numbers

unsignedintcount

the number of minor numbers required

constchar*name

name of this range of devices

conststructfile_operations*fops

file operations associated with these devices

Description

If major == 0 this function will dynamically allocate a major and return its number.

If major > 0 this function will attempt to reserve a device with the given major number and will return zero on success.

Returns a -ve errno on failure.

The name of this device has nothing to do with the name of the device in /dev. It only helps to keep track of the different owners of devices. If your module name has only one type of devices it’s ok to use e.g. the name of the module here.

voidunregister_chrdev_region(dev_tfrom,unsignedcount)

unregister a range of device numbers

Parameters

dev_tfrom

the first in the range of numbers to unregister

unsignedcount

the number of device numbers to unregister

Description

This function will unregister a range of count device numbers, starting with from. The caller should normally be the one who allocated those numbers in the first place...

void__unregister_chrdev(unsignedintmajor,unsignedintbaseminor,unsignedintcount,constchar*name)

unregister and destroy a cdev

Parameters

unsignedintmajor

major device number

unsignedintbaseminor

first of the range of minor numbers

unsignedintcount

the number of minor numbers this cdev is occupying

constchar*name

name of this range of devices

Description

Unregister and destroy the cdev occupying the region described by major, baseminor and count. This function undoes what __register_chrdev() did.

intcdev_add(structcdev*p,dev_tdev,unsignedcount)

add a char device to the system

Parameters

structcdev*p

the cdev structure for the device

dev_tdev

the first device number for which this device is responsible

unsignedcount

the number of consecutive minor numbers corresponding to thisdevice

Description

cdev_add() adds the device represented by p to the system, making it live immediately. A negative error code is returned on failure.

voidcdev_set_parent(structcdev*p,structkobject*kobj)

set the parent kobject for a char device

Parameters

structcdev*p

the cdev structure

structkobject*kobj

the kobject to take a reference to

Description

cdev_set_parent() sets a parent kobject which will be referenced appropriately so the parent is not freed before the cdev. This should be called before cdev_add.

intcdev_device_add(structcdev*cdev,structdevice*dev)

add a char device and its corresponding struct device, linking them together

Parameters

structcdev*cdev

the cdev structure

structdevice*dev

the device structure

Description

cdev_device_add() adds the char device represented by cdev to the system, just as cdev_add does. It then adds dev to the system using device_add. The dev_t for the char device will be taken from the struct device which needs to be initialized first. This helper function correctly takes a reference to the parent device so the parent will not get released until all references to the cdev are released.

This helper uses dev->devt for the device number. If it is not setit will not add the cdev and it will be equivalent to device_add.

This function should be used whenever the struct cdev and the struct device are members of the same structure whose lifetime is managed by the struct device.

NOTE

Callers must assume that userspace was able to open the cdev andcan call cdev fops callbacks at any time, even if this function fails.

voidcdev_device_del(structcdev*cdev,structdevice*dev)

inverse of cdev_device_add

Parameters

structcdev*cdev

the cdev structure

structdevice*dev

the device structure

Description

cdev_device_del() is a helper function to call cdev_del and device_del.It should be used whenever cdev_device_add is used.

If dev->devt is not set it will not remove the cdev and will be equivalentto device_del.

NOTE

This guarantees that associated sysfs callbacks are not running or runnable; however, any cdevs already open will remain and their fops will still be callable even after this function returns.

voidcdev_del(structcdev*p)

remove a cdev from the system

Parameters

structcdev*p

the cdev structure to be removed

Description

cdev_del() removesp from the system, possibly freeing the structureitself.

NOTE

This guarantees that the cdev device will no longer be able to be opened; however, any cdevs already open will remain and their fops will still be callable even after cdev_del returns.

structcdev*cdev_alloc(void)

allocate a cdev structure

Parameters

void

no arguments

Description

Allocates and returns a cdev structure, or NULL on failure.

voidcdev_init(structcdev*cdev,conststructfile_operations*fops)

initialize a cdev structure

Parameters

structcdev*cdev

the structure to initialize

conststructfile_operations*fops

the file_operations for this device

Description

Initializes cdev, remembering fops, making it ready to add to the system with cdev_add().
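Pulling the char-device calls above together, a minimal module-style sketch (the “example” name and the empty file_operations are placeholders) reserves one dynamic minor, initializes the cdev and makes it live, then tears everything down in reverse order:

#include <linux/cdev.h>
#include <linux/fs.h>
#include <linux/module.h>

static dev_t example_devt;
static struct cdev example_cdev;

static const struct file_operations example_fops = {
	.owner = THIS_MODULE,
	/* .open / .read / .write ... go here */
};

static int __init example_init(void)
{
	int ret;

	/* dynamic major, one minor, name shown in /proc/devices */
	ret = alloc_chrdev_region(&example_devt, 0, 1, "example");
	if (ret)
		return ret;

	cdev_init(&example_cdev, &example_fops);
	example_cdev.owner = THIS_MODULE;

	ret = cdev_add(&example_cdev, example_devt, 1);
	if (ret)
		unregister_chrdev_region(example_devt, 1);
	return ret;
}

static void __exit example_exit(void)
{
	cdev_del(&example_cdev);
	unregister_chrdev_region(example_devt, 1);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");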

Clock Framework

The clock framework defines programming interfaces to support software management of the system clock tree. This framework is widely used with System-On-Chip (SOC) platforms to support power management and various devices which may need custom clock rates. Note that these “clocks” don’t relate to timekeeping or real time clocks (RTCs), each of which have separate frameworks. These struct clk instances may be used to manage for example a 96 MHz signal that is used to shift bits into and out of peripherals or busses, or otherwise trigger synchronous state machine transitions in system hardware.

Power management is supported by explicit software clock gating: unused clocks are disabled, so the system doesn’t waste power changing the state of transistors that aren’t in active use. On some systems this may be backed by hardware clock gating, where clocks are gated without being disabled in software. Sections of chips that are powered but not clocked may be able to retain their last state. This low power state is often called a retention mode. This mode still incurs leakage currents, especially with finer circuit geometries, but for CMOS circuits power is mostly used by clocked state changes.

Power-aware drivers only enable their clocks when the device they manage is in active use. Also, system sleep states often differ according to which clock domains are active: while a “standby” state may allow wakeup from several active domains, a “mem” (suspend-to-RAM) state may require a more wholesale shutdown of clocks derived from higher speed PLLs and oscillators, limiting the number of possible wakeup event sources. A driver’s suspend method may need to be aware of system-specific clock constraints on the target sleep state.

Some platforms support programmable clock generators. These can be usedby external chips of various kinds, such as other CPUs, multimediacodecs, and devices with strict requirements for interface clocking.

structclk_notifier

associate a clk with a notifier

Definition:

struct clk_notifier {    struct clk                      *clk;    struct srcu_notifier_head       notifier_head;    struct list_head                node;};

Members

clk

struct clk * to associate the notifier with

notifier_head

a blocking_notifier_head for this clk

node

linked list pointers

Description

A list of struct clk_notifier is maintained by the notifier code. An entry is created whenever code registers the first notifier on a particular clk. Future notifiers on that clk are added to notifier_head.

structclk_notifier_data

rate data to pass to the notifier callback

Definition:

struct clk_notifier_data {    struct clk              *clk;    unsigned long           old_rate;    unsigned long           new_rate;};

Members

clk

struct clk * being changed

old_rate

previous rate of this clk

new_rate

new rate of this clk

Description

For a pre-notifier, old_rate is the clk’s rate before this rate change, and new_rate is what the rate will be in the future. For a post-notifier, old_rate and new_rate are both set to the clk’s current rate (this was done to optimize the implementation).

structclk_bulk_data

Data used for bulk clk operations.

Definition:

struct clk_bulk_data {    const char              *id;    struct clk              *clk;};

Members

id

clock consumer ID

clk

struct clk * to store the associated clock

Description

The CLK APIs provide a series of clk_bulk_() API calls as a convenience to consumers which require multiple clks. This structure is used to manage data for these calls.

intclk_notifier_register(structclk*clk,structnotifier_block*nb)

register a clock rate-change notifier callback

Parameters

structclk*clk

clock whose rate we are interested in

structnotifier_block*nb

notifier block with callback function pointer

Description

ProTip: debugging across notifier chains can be frustrating. Make sure that your notifier callback function prints a nice big warning in case of failure.
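For illustration, a sketch of a callback suitable for clk_notifier_register() (the names and the logging are placeholders; a PRE_RATE_CHANGE handler may also veto the change by returning NOTIFY_BAD instead of NOTIFY_OK):

#include <linux/clk.h>
#include <linux/notifier.h>
#include <linux/printk.h>

static int example_clk_notify(struct notifier_block *nb,
			      unsigned long event, void *data)
{
	struct clk_notifier_data *ndata = data;

	if (event == PRE_RATE_CHANGE)
		pr_info("clk rate changing: %lu -> %lu\n",
			ndata->old_rate, ndata->new_rate);
	return NOTIFY_OK;
}

static struct notifier_block example_nb = {
	.notifier_call = example_clk_notify,
};

/* clk_notifier_register(clk, &example_nb) in probe,
 * clk_notifier_unregister(clk, &example_nb) in remove. */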

intclk_notifier_unregister(structclk*clk,structnotifier_block*nb)

unregister a clock rate-change notifier callback

Parameters

structclk*clk

clock whose rate we are no longer interested in

structnotifier_block*nb

notifier block which will be unregistered

intdevm_clk_notifier_register(structdevice*dev,structclk*clk,structnotifier_block*nb)

register a managed rate-change notifier callback

Parameters

structdevice*dev

device for clock “consumer”

structclk*clk

clock whose rate we are interested in

structnotifier_block*nb

notifier block with callback function pointer

Description

Returns 0 on success, -EERROR otherwise

longclk_get_accuracy(structclk*clk)

obtain the clock accuracy in ppb (parts per billion) for a clock source.

Parameters

structclk*clk

clock source

Description

This gets the clock source accuracy expressed in ppb.A perfect clock returns 0.

intclk_set_phase(structclk*clk,intdegrees)

adjust the phase shift of a clock signal

Parameters

structclk*clk

clock signal source

intdegrees

number of degrees the signal is shifted

Description

Shifts the phase of a clock signal by the specified degrees. Returns 0 onsuccess, -EERROR otherwise.

intclk_get_phase(structclk*clk)

return the phase shift of a clock signal

Parameters

structclk*clk

clock signal source

Description

Returns the phase shift of a clock node in degrees, otherwise returns-EERROR.

intclk_set_duty_cycle(structclk*clk,unsignedintnum,unsignedintden)

adjust the duty cycle ratio of a clock signal

Parameters

structclk*clk

clock signal source

unsignedintnum

numerator of the duty cycle ratio to be applied

unsignedintden

denominator of the duty cycle ratio to be applied

Description

Adjust the duty cycle of a clock signal by the specified ratio. Returns 0 onsuccess, -EERROR otherwise.

intclk_get_scaled_duty_cycle(structclk*clk,unsignedintscale)

return the duty cycle ratio of a clock signal

Parameters

structclk*clk

clock signal source

unsignedintscale

scaling factor to be applied to represent the ratio as an integer

Description

Returns the duty cycle ratio multiplied by the scale provided, otherwisereturns -EERROR.

boolclk_is_match(conststructclk*p,conststructclk*q)

check if two clk’s point to the same hardware clock

Parameters

conststructclk*p

clk compared against q

conststructclk*q

clk compared against p

Description

Returns true if the two struct clk pointers both point to the same hardware clock node. Put differently, returns true if p and q share the same struct clk_core object.

Returns false otherwise. Note that two NULL clks are treated as matching.

intclk_rate_exclusive_get(structclk*clk)

get exclusivity over the rate control of a producer

Parameters

structclk*clk

clock source

Description

This function allows drivers to get exclusive control over the rate of a provider. It prevents any other consumer from executing, even indirectly, an operation which could alter the rate of the provider or cause glitches.

If exclusivity is claimed more than once on a clock, even by the same driver, the rate effectively gets locked as exclusivity can’t be preempted.

Must not be called from within atomic context.

Returns success (0) or negative errno.

intdevm_clk_rate_exclusive_get(structdevice*dev,structclk*clk)

devm variant of clk_rate_exclusive_get

Parameters

structdevice*dev

device the exclusivity is bound to

structclk*clk

clock source

Description

Calls clk_rate_exclusive_get() on clk and registers a devm cleanup handler on dev to call clk_rate_exclusive_put().

Must not be called from within atomic context.

voidclk_rate_exclusive_put(structclk*clk)

release exclusivity over the rate control of a producer

Parameters

structclk*clk

clock source

Description

This function allows drivers to release the exclusivity it previously gotfromclk_rate_exclusive_get()

The caller must balance the number of clk_rate_exclusive_get() and clk_rate_exclusive_put() calls.

Must not be called from within atomic context.
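A sketch of the balanced get/put usage around rate-sensitive work (the wrapper and do_transfer() are hypothetical):

#include <linux/clk.h>

static int example_locked_transfer(struct clk *clk)
{
	int ret;

	ret = clk_rate_exclusive_get(clk);
	if (ret)
		return ret;

	ret = do_transfer();		/* hypothetical work at a fixed rate */

	clk_rate_exclusive_put(clk);	/* must balance the get */
	return ret;
}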

intclk_prepare(structclk*clk)

prepare a clock source

Parameters

structclk*clk

clock source

Description

This prepares the clock source for use.

Must not be called from within atomic context.

boolclk_is_enabled_when_prepared(structclk*clk)

indicate if preparing a clock also enables it.

Parameters

structclk*clk

clock source

Description

Returns true if clk_prepare() implicitly enables the clock, effectively making clk_enable()/clk_disable() no-ops, false otherwise.

This is of interest mainly to the power management code where actually disabling the clock also requires unpreparing it to have any material effect.

Regardless of the value returned here, the caller must always invoke clk_enable() or clk_prepare_enable() and counterparts for usage counts to be right.

voidclk_unprepare(structclk*clk)

undo preparation of a clock source

Parameters

structclk*clk

clock source

Description

This undoes a previously prepared clock. The caller must balancethe number of prepare and unprepare calls.

Must not be called from within atomic context.

structclk*clk_get(structdevice*dev,constchar*id)

lookup and obtain a reference to a clock producer.

Parameters

structdevice*dev

device for clock “consumer”

constchar*id

clock consumer ID

Description

Returns a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.)

Drivers must assume that the clock source is not enabled.

clk_get should not be called from within interrupt context.
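A sketch of the non-managed consumer pattern in a probe routine (the “bus” consumer ID is illustrative). A real driver would keep the struct clk pointer in its private data and call clk_disable_unprepare() and clk_put() on remove:

#include <linux/clk.h>
#include <linux/err.h>
#include <linux/platform_device.h>

static int example_probe(struct platform_device *pdev)
{
	struct clk *clk;
	int ret;

	clk = clk_get(&pdev->dev, "bus");
	if (IS_ERR(clk))
		return PTR_ERR(clk);

	ret = clk_prepare_enable(clk);
	if (ret) {
		clk_put(clk);
		return ret;
	}

	/* hardware is clocked from here on; store clk for remove() */
	return 0;
}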

intclk_bulk_get(structdevice*dev,intnum_clks,structclk_bulk_data*clks)

lookup and obtain a number of references to clock producer.

Parameters

structdevice*dev

device for clock “consumer”

intnum_clks

the number of clk_bulk_data

structclk_bulk_data*clks

the clk_bulk_data table of consumer

Description

This helper function allows drivers to get several clk consumers in one operation. If any of the clk cannot be acquired then any clks that were obtained will be freed before returning to the caller.

Returns 0 if all clocks specified in clk_bulk_data table are obtained successfully, or valid IS_ERR() condition containing errno. The implementation uses dev and clk_bulk_data.id to determine the clock consumer, and thereby the clock producer. The clock returned is stored in each clk_bulk_data.clk field.

Drivers must assume that the clock source is not enabled.

clk_bulk_get should not be called from within interrupt context.
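A sketch of bulk acquisition with illustrative consumer IDs; clk_bulk_prepare_enable() and clk_bulk_put() are the companion helpers used to bring the whole set up and to release it on error:

#include <linux/clk.h>
#include <linux/kernel.h>

static struct clk_bulk_data example_clks[] = {
	{ .id = "axi" },	/* illustrative consumer IDs */
	{ .id = "ahb" },
};

static int example_enable_clocks(struct device *dev)
{
	int ret;

	ret = clk_bulk_get(dev, ARRAY_SIZE(example_clks), example_clks);
	if (ret)
		return ret;

	ret = clk_bulk_prepare_enable(ARRAY_SIZE(example_clks), example_clks);
	if (ret)
		clk_bulk_put(ARRAY_SIZE(example_clks), example_clks);
	return ret;
}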

intclk_bulk_get_all(structdevice*dev,structclk_bulk_data**clks)

lookup and obtain all available references to clock producer.

Parameters

structdevice*dev

device for clock “consumer”

structclk_bulk_data**clks

pointer to the clk_bulk_data table of consumer

Description

This helper function allows drivers to get all clk consumers in one operation. If any of the clk cannot be acquired then any clks that were obtained will be freed before returning to the caller.

Returns a positive value for the number of clocks obtained while the clock references are stored in the clk_bulk_data table in the clks field. Returns 0 if there are none and a negative value if something failed.

Drivers must assume that the clock source is not enabled.

clk_bulk_get should not be called from within interrupt context.

intclk_bulk_get_optional(structdevice*dev,intnum_clks,structclk_bulk_data*clks)

lookup and obtain a number of references to clock producer

Parameters

structdevice*dev

device for clock “consumer”

intnum_clks

the number of clk_bulk_data

structclk_bulk_data*clks

the clk_bulk_data table of consumer

Description

Behaves the same asclk_bulk_get() except where there is no clock producer.In this case, instead of returning -ENOENT, the function returns 0 andNULL for a clk for which a clock producer could not be determined.

intdevm_clk_bulk_get(structdevice*dev,intnum_clks,structclk_bulk_data*clks)

managed get multiple clk consumers

Parameters

structdevice*dev

device for clock “consumer”

intnum_clks

the number of clk_bulk_data

structclk_bulk_data*clks

the clk_bulk_data table of consumer

Description

Return 0 on success, an errno on failure.

This helper function allows drivers to get several clk consumers in one operation with management; the clks will automatically be freed when the device is unbound.

intdevm_clk_bulk_get_optional(structdevice*dev,intnum_clks,structclk_bulk_data*clks)

managed get multiple optional consumer clocks

Parameters

structdevice*dev

device for clock “consumer”

intnum_clks

the number of clk_bulk_data

structclk_bulk_data*clks

pointer to the clk_bulk_data table of consumer

Description

Behaves the same as devm_clk_bulk_get() except where there is no clock producer. In this case, instead of returning -ENOENT, the function returns NULL for the given clk. It is assumed all clocks in clk_bulk_data are optional.

Returns 0 if all clocks specified in clk_bulk_data table are obtained successfully or for any clk there was no clk provider available, otherwise returns valid IS_ERR() condition containing errno. The implementation uses dev and clk_bulk_data.id to determine the clock consumer, and thereby the clock producer. The clock returned is stored in each clk_bulk_data.clk field.

Drivers must assume that the clock source is not enabled.

clk_bulk_get should not be called from within interrupt context.

intdevm_clk_bulk_get_all(structdevice*dev,structclk_bulk_data**clks)

managed get multiple clk consumers

Parameters

structdevice*dev

device for clock “consumer”

structclk_bulk_data**clks

pointer to the clk_bulk_data table of consumer

Description

Returns a positive value for the number of clocks obtained while the clock references are stored in the clk_bulk_data table in the clks field. Returns 0 if there are none and a negative value if something failed.

This helper function allows drivers to get several clk consumers in one operation with management; the clks will automatically be freed when the device is unbound.

intdevm_clk_bulk_get_all_enabled(structdevice*dev,structclk_bulk_data**clks)

Get and enable all clocks of the consumer (managed)

Parameters

structdevice*dev

device for clock “consumer”

structclk_bulk_data**clks

pointer to the clk_bulk_data table of consumer

Description

Returns a positive value for the number of clocks obtained while the clock references are stored in the clk_bulk_data table in the clks field. Returns 0 if there are none and a negative value if something failed.

This helper function allows drivers to get all clocks of the consumer and enables them in one operation with management. The clks will automatically be disabled and freed when the device is unbound.

structclk*devm_clk_get(structdevice*dev,constchar*id)

lookup and obtain a managed reference to a clock producer.

Parameters

structdevice*dev

device for clock “consumer”

constchar*id

clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.)

Description

Drivers must assume that the clock source is neither prepared norenabled.

The clock will automatically be freed when the device is unboundfrom the bus.

structclk*devm_clk_get_prepared(structdevice*dev,constchar*id)

devm_clk_get() +clk_prepare()

Parameters

structdevice*dev

device for clock “consumer”

constchar*id

clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.)

Description

The returned clk (if valid) is prepared. Drivers must however assumethat the clock is not enabled.

The clock will automatically be unprepared and freed when the deviceis unbound from the bus.

structclk*devm_clk_get_enabled(structdevice*dev,constchar*id)

devm_clk_get() + clk_prepare_enable()

Parameters

structdevice*dev

device for clock “consumer”

constchar*id

clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. (IOW, id may be identical strings, but clk_get may return different clock producers depending on dev.)

Description

The returned clk (if valid) is prepared and enabled.

The clock will automatically be disabled, unprepared and freed when the device is unbound from the bus.
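
As a sketch of typical usage (the function and the "bus" consumer ID are hypothetical), a driver can combine lookup, prepare and enable in its probe path and rely on devres for teardown:

#include <linux/clk.h>
#include <linux/device.h>

static int foo_probe_clock(struct device *dev)
{
        struct clk *clk;

        /* Looked up by the hypothetical "bus" consumer ID; prepared and
         * enabled here, automatically disabled, unprepared and freed on
         * unbind. */
        clk = devm_clk_get_enabled(dev, "bus");
        if (IS_ERR(clk))
                return PTR_ERR(clk);

        dev_info(dev, "bus clock running at %lu Hz\n", clk_get_rate(clk));
        return 0;
}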

structclk*devm_clk_get_optional(structdevice*dev,constchar*id)

lookup and obtain a managed reference to an optional clock producer.

Parameters

structdevice*dev

device for clock “consumer”

constchar*id

clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL which serves as a dummy clk. That’s the only difference compared to devm_clk_get().

Description

Drivers must assume that the clock source is neither prepared nor enabled.

The clock will automatically be freed when the device is unbound from the bus.

structclk*devm_clk_get_optional_prepared(structdevice*dev,constchar*id)

devm_clk_get_optional() +clk_prepare()

Parameters

structdevice*dev

device for clock “consumer”

constchar*id

clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL which serves as a dummy clk. That’s the only difference compared to devm_clk_get_prepared().

Description

The returned clk (if valid) is prepared. Drivers must however assume that the clock is not enabled.

The clock will automatically be unprepared and freed when the device is unbound from the bus.

structclk*devm_clk_get_optional_enabled(structdevice*dev,constchar*id)

devm_clk_get_optional() + clk_prepare_enable()

Parameters

structdevice*dev

device for clock “consumer”

constchar*id

clock consumer ID

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL which serves as a dummy clk. That’s the only difference compared to devm_clk_get_enabled().

Description

The returned clk (if valid) is prepared and enabled.

The clock will automatically be disabled, unprepared and freed when the device is unbound from the bus.

structclk*devm_clk_get_optional_enabled_with_rate(structdevice*dev,constchar*id,unsignedlongrate)

devm_clk_get_optional() +clk_set_rate() + clk_prepare_enable()

Parameters

structdevice*dev

device for clock “consumer”

constchar*id

clock consumer ID

unsignedlongrate

new clock rate

Context

May sleep.

Return

a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev and id to determine the clock consumer, and thereby the clock producer. If no such clk is found, it returns NULL which serves as a dummy clk. That’s the only difference compared to devm_clk_get_enabled().

Description

The returned clk (if valid) is prepared and enabled, and its rate has been set.

The clock will automatically be disabled, unprepared and freed when the device is unbound from the bus.

structclk*devm_get_clk_from_child(structdevice*dev,structdevice_node*np,constchar*con_id)

lookup and obtain a managed reference to a clock producer from child node.

Parameters

structdevice*dev

device for clock “consumer”

structdevice_node*np

pointer to clock consumer node

constchar*con_id

clock consumer ID

Description

This function parses the clocks, and uses them to look up the struct clk from the registered list of clock providers by using np and con_id.

The clock will automatically be freed when the device is unbound from the bus.

intclk_enable(structclk*clk)

inform the system when the clock source should be running.

Parameters

structclk*clk

clock source

Description

If the clock can not be enabled/disabled, this should return success.

May be called from atomic contexts.

Returns success (0) or negative errno.

intclk_bulk_enable(intnum_clks,conststructclk_bulk_data*clks)

inform the system when the set of clks should be running.

Parameters

intnum_clks

the number of clk_bulk_data

conststructclk_bulk_data*clks

the clk_bulk_data table of consumer

Description

May be called from atomic contexts.

Returns success (0) or negative errno.

voidclk_disable(structclk*clk)

inform the system when the clock source is no longer required.

Parameters

structclk*clk

clock source

Description

Inform the system that a clock source is no longer required by a driver and may be shut down.

May be called from atomic contexts.

Implementation detail: if the clock source is shared between multiple drivers, clk_enable() calls must be balanced by the same number of clk_disable() calls for the clock source to be disabled.
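
A minimal sketch of the balancing rule, assuming clk has already been obtained and prepared elsewhere (for example at probe time) and ret is a local int: every successful clk_enable() is paired with exactly one clk_disable().

/* Atomic-context section that needs the clock running. */
ret = clk_enable(clk);
if (ret)
        return ret;

/* ... touch the hardware block driven by clk ... */

clk_disable(clk);       /* balances the clk_enable() above */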

voidclk_bulk_disable(intnum_clks,conststructclk_bulk_data*clks)

inform the system when the set of clks is no longer required.

Parameters

intnum_clks

the number of clk_bulk_data

conststructclk_bulk_data*clks

the clk_bulk_data table of consumer

Description

Inform the system that a set of clks is no longer required by a driver and may be shut down.

May be called from atomic contexts.

Implementation detail: if the set of clks is shared between multiple drivers, clk_bulk_enable() calls must be balanced by the same number of clk_bulk_disable() calls for the clock source to be disabled.

unsignedlongclk_get_rate(structclk*clk)

obtain the current clock rate (in Hz) for a clock source. This is only valid once the clock source has been enabled.

Parameters

structclk*clk

clock source

voidclk_put(structclk*clk)

“free” the clock source

Parameters

structclk*clk

clock source

Note

drivers must ensure that all clk_enable calls made on this clock source are balanced by clk_disable calls prior to calling this function.

Description

clk_put should not be called from within interrupt context.

voidclk_bulk_put(intnum_clks,structclk_bulk_data*clks)

“free” the clock source

Parameters

intnum_clks

the number of clk_bulk_data

structclk_bulk_data*clks

the clk_bulk_data table of consumer

Note

drivers must ensure that all clk_bulk_enable calls made on this clock source are balanced by clk_bulk_disable calls prior to calling this function.

Description

clk_bulk_put should not be called from within interrupt context.

voidclk_bulk_put_all(intnum_clks,structclk_bulk_data*clks)

“free” all the clock source

Parameters

intnum_clks

the number of clk_bulk_data

structclk_bulk_data*clks

the clk_bulk_data table of consumer

Note

drivers must ensure that all clk_bulk_enable calls made on this clock source are balanced by clk_bulk_disable calls prior to calling this function.

Description

clk_bulk_put_all should not be called from within interrupt context.

voiddevm_clk_put(structdevice*dev,structclk*clk)

“free” a managed clock source

Parameters

structdevice*dev

device used to acquire the clock

structclk*clk

clock source acquired withdevm_clk_get()

Note

drivers must ensure that all clk_enable calls made on this clock source are balanced by clk_disable calls prior to calling this function.

Description

clk_put should not be called from within interrupt context.

longclk_round_rate(structclk*clk,unsignedlongrate)

adjust a rate to the exact rate a clock can provide

Parameters

structclk*clk

clock source

unsignedlongrate

desired clock rate in Hz

Description

This answers the question “if I were to pass rate to clk_set_rate(), what clock rate would I end up with?” without changing the hardware in any way. In other words:

rate = clk_round_rate(clk, r);

and:

clk_set_rate(clk, r);
rate = clk_get_rate(clk);

are equivalent except the former does not modify the clock hardware in any way.

Returns rounded clock rate in Hz, or negative errno.

intclk_set_rate(structclk*clk,unsignedlongrate)

set the clock rate for a clock source

Parameters

structclk*clk

clock source

unsignedlongrate

desired clock rate in Hz

Description

Updating the rate starts at the top-most affected clock and then walks the tree down to the bottom-most clock that needs updating.

Returns success (0) or negative errno.
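
As an illustrative sketch (clk is assumed to have been obtained earlier, and the 19.2 MHz target is an arbitrary example value), a driver can query the achievable rate with clk_round_rate() and only then commit it with clk_set_rate():

long rounded;
int ret;

rounded = clk_round_rate(clk, 19200000);
if (rounded < 0)
        return rounded;                 /* no usable rate */

ret = clk_set_rate(clk, rounded);       /* rate propagation walks the clock tree */
if (ret)
        return ret;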

intclk_set_rate_exclusive(structclk*clk,unsignedlongrate)

set the clock rate and claim exclusivity over clock source

Parameters

structclk*clk

clock source

unsignedlongrate

desired clock rate in Hz

Description

This helper function allows drivers to atomically set the rate of a producer and claim exclusivity over the rate control of the producer.

It is essentially a combination of clk_set_rate() and clk_rate_exclusive_get(). Caller must balance this call with a call to clk_rate_exclusive_put().

Returns success (0) or negative errno.

boolclk_has_parent(conststructclk*clk,conststructclk*parent)

check if a clock is a possible parent for another

Parameters

conststructclk*clk

clock source

conststructclk*parent

parent clock source

Description

This function can be used in drivers that need to check that a clock can be the parent of another without actually changing the parent.

Returns true if parent is a possible parent for clk, false otherwise.

intclk_set_rate_range(structclk*clk,unsignedlongmin,unsignedlongmax)

set a rate range for a clock source

Parameters

structclk*clk

clock source

unsignedlongmin

desired minimum clock rate in Hz, inclusive

unsignedlongmax

desired maximum clock rate in Hz, inclusive

Description

Returns success (0) or negative errno.

intclk_set_min_rate(structclk*clk,unsignedlongrate)

set a minimum clock rate for a clock source

Parameters

structclk*clk

clock source

unsignedlongrate

desired minimum clock rate in Hz, inclusive

Description

Returns success (0) or negative errno.

intclk_set_max_rate(structclk*clk,unsignedlongrate)

set a maximum clock rate for a clock source

Parameters

structclk*clk

clock source

unsignedlongrate

desired maximum clock rate in Hz, inclusive

Description

Returns success (0) or negative errno.

intclk_set_parent(structclk*clk,structclk*parent)

set the parent clock source for this clock

Parameters

structclk*clk

clock source

structclk*parent

parent clock source

Description

Returns success (0) or negative errno.

structclk*clk_get_parent(structclk*clk)

get the parent clock source for this clock

Parameters

structclk*clk

clock source

Description

Returns struct clk corresponding to parent clock source, or valid IS_ERR() condition containing errno.

structclk*clk_get_sys(constchar*dev_id,constchar*con_id)

get a clock based upon the device name

Parameters

constchar*dev_id

device name

constchar*con_id

connection ID

Description

Returns a struct clk corresponding to the clock producer, or valid IS_ERR() condition containing errno. The implementation uses dev_id and con_id to determine the clock consumer, and thereby the clock producer. In contrast to clk_get() this function takes the device name instead of the device itself for identification.

Drivers must assume that the clock source is not enabled.

clk_get_sys should not be called from within interrupt context.

intclk_save_context(void)

save clock context for poweroff

Parameters

void

no arguments

Description

Saves the context of the clock register for powerstates in which the contents of the registers will be lost. Occurs deep within the suspend code so locking is not necessary.

voidclk_restore_context(void)

restore clock context after poweroff

Parameters

void

no arguments

Description

This occurs with all clocks enabled. Occurs deep within the resume code so locking is not necessary.

intclk_drop_range(structclk*clk)

Reset any range set on that clock

Parameters

structclk*clk

clock source

Description

Returns success (0) or negative errno.

structclk*clk_get_optional(structdevice*dev,constchar*id)

lookup and obtain a reference to an optional clock producer.

Parameters

structdevice*dev

device for clock “consumer”

constchar*id

clock consumer ID

Description

Behaves the same as clk_get() except where there is no clock producer. In this case, instead of returning -ENOENT, the function returns NULL.

Synchronization Primitives

Read-Copy Update (RCU)

boolsame_state_synchronize_rcu(unsignedlongoldstate1,unsignedlongoldstate2)

Are two old-state values identical?

Parameters

unsignedlongoldstate1

First old-state value.

unsignedlongoldstate2

Second old-state value.

Description

The two old-state values must have been obtained from either get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or get_completed_synchronize_rcu(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.

boolrcu_trace_implies_rcu_gp(void)

does an RCU Tasks Trace grace period imply an RCU grace period?

Parameters

void

no arguments

Description

As an accident of implementation, an RCU Tasks Trace grace period also acts as an RCU grace period. However, this could change at any time. Code relying on this accident must call this function to verify that this accident is still happening.

You have been warned!

cond_resched_tasks_rcu_qs

cond_resched_tasks_rcu_qs()

Report potential quiescent states to RCU

Description

This macro resembles cond_resched(), except that it is defined to report potential quiescent states to RCU-tasks even if the cond_resched() machinery were to be shut off, as some advocate for PREEMPTION kernels.

rcu_softirq_qs_periodic

rcu_softirq_qs_periodic(old_ts)

Report RCU and RCU-Tasks quiescent states

Parameters

old_ts

jiffies at start of processing.

Description

This helper is for long-running softirq handlers, such as NAPI threads in networking. The caller should initialize the variable passed in as old_ts at the beginning of the softirq handler. When invoked frequently, this macro will invoke rcu_softirq_qs() every 100 milliseconds thereafter, which will provide both RCU and RCU-Tasks quiescent states. Note that this macro modifies its old_ts argument.

Because regions of code that have disabled softirq act as RCU read-side critical sections, this macro should be invoked with softirq (and preemption) enabled.

The macro is not needed when CONFIG_PREEMPT_RT is defined. RT kernels would have more chance to invoke schedule() calls and provide necessary quiescent states. As a contrast, calling cond_resched() only won’t achieve the same effect because cond_resched() does not provide RCU-Tasks quiescent states.

RCU_LOCKDEP_WARN

RCU_LOCKDEP_WARN(c,s)

emit lockdep splat if specified condition is met

Parameters

c

condition to check

s

informative message

Description

This checks debug_lockdep_rcu_enabled() before checking (c) to prevent early boot splats due to lockdep not yet being initialized, and rechecks it after checking (c) to prevent false-positive splats due to races with lockdep being disabled. See commit 3066820034b5dd (“rcu: Reject RCU_LOCKDEP_WARN() false positives”) for more detail.

lockdep_assert_in_rcu_read_lock

lockdep_assert_in_rcu_read_lock()

WARN if not protected byrcu_read_lock()

Description

Splats if lockdep is enabled and there is no rcu_read_lock() in effect.

lockdep_assert_in_rcu_read_lock_bh

lockdep_assert_in_rcu_read_lock_bh()

WARN if not protected byrcu_read_lock_bh()

Description

Splats if lockdep is enabled and there is no rcu_read_lock_bh() in effect. Note that local_bh_disable() and friends do not suffice here, instead an actual rcu_read_lock_bh() is required.

lockdep_assert_in_rcu_read_lock_sched

lockdep_assert_in_rcu_read_lock_sched()

WARN if not protected byrcu_read_lock_sched()

Description

Splats if lockdep is enabled and there is no rcu_read_lock_sched() in effect. Note that preempt_disable() and friends do not suffice here, instead an actual rcu_read_lock_sched() is required.

lockdep_assert_in_rcu_reader

lockdep_assert_in_rcu_reader()

WARN if not within some type of RCU reader

Description

Splats if lockdep is enabled and there is no RCU reader of any type in effect. Note that regions of code protected by things like preempt_disable, local_bh_disable(), and local_irq_disable() all qualify as RCU readers.

Note that this will never trigger in PREEMPT_NONE or PREEMPT_VOLUNTARY kernels that are not also built with PREEMPT_COUNT. But if you have lockdep enabled, you might as well also enable PREEMPT_COUNT.

unrcu_pointer

unrcu_pointer(p)

mark a pointer as not being RCU protected

Parameters

p

pointer needing to lose its __rcu property

Description

Converts p from an __rcu pointer to a __kernel pointer. This allows an __rcu pointer to be used with xchg() and friends.

RCU_INITIALIZER

RCU_INITIALIZER(v)

statically initialize an RCU-protected global variable

Parameters

v

The value to statically initialize with.

rcu_assign_pointer

rcu_assign_pointer(p,v)

assign to RCU-protected pointer

Parameters

p

pointer to assign to

v

value to assign (publish)

Description

Assigns the specified value to the specified RCU-protected pointer, ensuring that any concurrent RCU readers will see any prior initialization.

Inserts memory barriers on architectures that require them (which is most of them), and also prevents the compiler from reordering the code that initializes the structure after the pointer assignment. More importantly, this call documents which pointers will be dereferenced by RCU read-side code.

In some special cases, you may use RCU_INIT_POINTER() instead of rcu_assign_pointer(). RCU_INIT_POINTER() is a bit faster due to the fact that it does not constrain either the CPU or the compiler. That said, using RCU_INIT_POINTER() when you should have used rcu_assign_pointer() is a very bad thing that results in impossible-to-diagnose memory corruption. So please be careful. See the RCU_INIT_POINTER() comment header for details.

Note that rcu_assign_pointer() evaluates each of its arguments only once, appearances notwithstanding. One of the “extra” evaluations is in typeof() and the other visible only to sparse (__CHECKER__), neither of which actually execute the argument. As with most cpp macros, this execute-arguments-only-once property is important, so please be careful when making changes to rcu_assign_pointer() and the other macros that it invokes.
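
A minimal publish sketch, assuming a hypothetical struct foo and a hypothetical global gp pointer (reused by later sketches in this section): the structure is fully initialized before rcu_assign_pointer() makes it visible to readers.

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
        int a;
        struct rcu_head rcu;
};

static struct foo __rcu *gp;

/* Updater: initialize first, then publish. */
static int publish_foo(int value)
{
        struct foo *p = kzalloc(sizeof(*p), GFP_KERNEL);

        if (!p)
                return -ENOMEM;
        p->a = value;                   /* complete the initialization ... */
        rcu_assign_pointer(gp, p);      /* ... before readers can see it */
        return 0;
}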

rcu_replace_pointer

rcu_replace_pointer(rcu_ptr,ptr,c)

replace an RCU pointer, returning its old value

Parameters

rcu_ptr

RCU pointer, whose old value is returned

ptr

regular pointer

c

the lockdep conditions under which the dereference will take place

Description

Perform a replacement, where rcu_ptr is an RCU-annotated pointer and c is the lockdep argument that is passed to the rcu_dereference_protected() call used to read that pointer. The old value of rcu_ptr is returned, and rcu_ptr is set to ptr.

rcu_access_pointer

rcu_access_pointer(p)

fetch RCU pointer with no dereferencing

Parameters

p

The pointer to read

Description

Return the value of the specified RCU-protected pointer, but omit the lockdep checks for being in an RCU read-side critical section. This is useful when the value of this pointer is accessed, but the pointer is not dereferenced, for example, when testing an RCU-protected pointer against NULL. Although rcu_access_pointer() may also be used in cases where update-side locks prevent the value of the pointer from changing, you should instead use rcu_dereference_protected() for this use case. Within an RCU read-side critical section, there is little reason to use rcu_access_pointer().

It is usually best to test the rcu_access_pointer() return value directly in order to avoid accidental dereferences being introduced by later inattentive changes. In other words, assigning the rcu_access_pointer() return value to a local variable results in an accident waiting to happen.

It is also permissible to use rcu_access_pointer() when read-side access to the pointer was removed at least one grace period ago, as is the case in the context of the RCU callback that is freeing up the data, or after a synchronize_rcu() returns. This can be useful when tearing down multi-linked structures after a grace period has elapsed. However, rcu_dereference_protected() is normally preferred for this use case.

rcu_dereference_check

rcu_dereference_check(p,c)

rcu_dereference with debug checking

Parameters

p

The pointer to read, prior to dereferencing

c

The conditions under which the dereference will take place

Description

Do an rcu_dereference(), but check that the conditions under which the dereference will take place are correct. Typically the conditions indicate the various locking conditions that should be held at that point. The check should return true if the conditions are satisfied. An implicit check for being in an RCU read-side critical section (rcu_read_lock()) is included.

For example:

bar = rcu_dereference_check(foo->bar, lockdep_is_held(foo->lock));

could be used to indicate to lockdep that foo->bar may only be dereferenced if either rcu_read_lock() is held, or that the lock required to replace the bar struct at foo->bar is held.

Note that the list of conditions may also include indications of when a lock need not be held, for example during initialisation or destruction of the target struct:

bar = rcu_dereference_check(foo->bar, lockdep_is_held(foo->lock) ||
                            atomic_read(foo->usage) == 0);

Inserts memory barriers on architectures that require them (currently only the Alpha), prevents the compiler from refetching (and from merging fetches), and, more importantly, documents exactly which pointers are protected by RCU and checks that the pointer is annotated as __rcu.

rcu_dereference_bh_check

rcu_dereference_bh_check(p,c)

rcu_dereference_bh with debug checking

Parameters

p

The pointer to read, prior to dereferencing

c

The conditions under which the dereference will take place

Description

This is the RCU-bh counterpart to rcu_dereference_check(). However, please note that starting in v5.0 kernels, vanilla RCU grace periods wait for local_bh_disable() regions of code in addition to regions of code demarked by rcu_read_lock() and rcu_read_unlock(). This means that synchronize_rcu(), call_rcu, and friends all take not only rcu_read_lock() but also rcu_read_lock_bh() into account.

rcu_dereference_sched_check

rcu_dereference_sched_check(p,c)

rcu_dereference_sched with debug checking

Parameters

p

The pointer to read, prior to dereferencing

c

The conditions under which the dereference will take place

Description

This is the RCU-sched counterpart to rcu_dereference_check(). However, please note that starting in v5.0 kernels, vanilla RCU grace periods wait for preempt_disable() regions of code in addition to regions of code demarked by rcu_read_lock() and rcu_read_unlock(). This means that synchronize_rcu(), call_rcu, and friends all take not only rcu_read_lock() but also rcu_read_lock_sched() into account.

rcu_dereference_protected

rcu_dereference_protected(p,c)

fetch RCU pointer when updates prevented

Parameters

p

The pointer to read, prior to dereferencing

c

The conditions under which the dereference will take place

Description

Return the value of the specified RCU-protected pointer, but omit the READ_ONCE(). This is useful in cases where update-side locks prevent the value of the pointer from changing. Please note that this primitive does not prevent the compiler from repeating this reference or combining it with other references, so it should not be used without protection of appropriate locks.

This function is only for update-side use. Using this function when protected only by rcu_read_lock() will result in infrequent but very ugly failures.

rcu_dereference

rcu_dereference(p)

fetch RCU-protected pointer for dereferencing

Parameters

p

The pointer to read, prior to dereferencing

Description

This is a simple wrapper around rcu_dereference_check().

rcu_dereference_bh

rcu_dereference_bh(p)

fetch an RCU-bh-protected pointer for dereferencing

Parameters

p

The pointer to read, prior to dereferencing

Description

Makes rcu_dereference_check() do the dirty work.

rcu_dereference_sched

rcu_dereference_sched(p)

fetch RCU-sched-protected pointer for dereferencing

Parameters

p

The pointer to read, prior to dereferencing

Description

Makes rcu_dereference_check() do the dirty work.

rcu_pointer_handoff

rcu_pointer_handoff(p)

Hand off a pointer from RCU to other mechanism

Parameters

p

The pointer to hand off

Description

This is simply an identity function, but it documents where a pointer is handed off from RCU to some other synchronization mechanism, for example, reference counting or locking. In C11, it would map to kill_dependency(). It could be used as follows:

rcu_read_lock();
p = rcu_dereference(gp);
long_lived = is_long_lived(p);
if (long_lived) {
        if (!atomic_inc_not_zero(p->refcnt))
                long_lived = false;
        else
                p = rcu_pointer_handoff(p);
}
rcu_read_unlock();

voidrcu_read_lock(void)

mark the beginning of an RCU read-side critical section

Parameters

void

no arguments

Description

When synchronize_rcu() is invoked on one CPU while other CPUs are within RCU read-side critical sections, then the synchronize_rcu() is guaranteed to block until after all the other CPUs exit their critical sections. Similarly, if call_rcu() is invoked on one CPU while other CPUs are within RCU read-side critical sections, invocation of the corresponding RCU callback is deferred until after all the other CPUs exit their critical sections.

Both synchronize_rcu() and call_rcu() also wait for regions of code with preemption disabled, including regions of code with interrupts or softirqs disabled.

Note, however, that RCU callbacks are permitted to run concurrently with new RCU read-side critical sections. One way that this can happen is via the following sequence of events: (1) CPU 0 enters an RCU read-side critical section, (2) CPU 1 invokes call_rcu() to register an RCU callback, (3) CPU 0 exits the RCU read-side critical section, (4) CPU 2 enters an RCU read-side critical section, (5) the RCU callback is invoked. This is legal, because the RCU read-side critical section that was running concurrently with the call_rcu() (and which therefore might be referencing something that the corresponding RCU callback would free up) has completed before the corresponding RCU callback is invoked.

RCU read-side critical sections may be nested. Any deferred actions will be deferred until the outermost RCU read-side critical section completes.

You can avoid reading and understanding the next paragraph by following this rule: don’t put anything in an rcu_read_lock() RCU read-side critical section that would block in a !PREEMPTION kernel. But if you want the full story, read on!

In non-preemptible RCU implementations (pure TREE_RCU and TINY_RCU), it is illegal to block while in an RCU read-side critical section. In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION kernel builds, RCU read-side critical sections may be preempted, but explicit blocking is illegal. Finally, in preemptible RCU implementations in real-time (with -rt patchset) kernel builds, RCU read-side critical sections may be preempted and they may also block, but only when acquiring spinlocks that are subject to priority inheritance.
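
A minimal reader-side sketch against the hypothetical struct foo and gp pointer from the rcu_assign_pointer() example earlier: the dereferenced pointer is used only inside the rcu_read_lock()/rcu_read_unlock() pair.

static int read_foo_a(void)
{
        struct foo *p;
        int a = -1;

        rcu_read_lock();
        p = rcu_dereference(gp);        /* fetch the RCU-protected pointer */
        if (p)
                a = p->a;               /* use it only inside the section */
        rcu_read_unlock();

        return a;
}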

voidrcu_read_unlock(void)

marks the end of an RCU read-side critical section.

Parameters

void

no arguments

Description

In almost all situations, rcu_read_unlock() is immune from deadlock. This deadlock immunity also extends to the scheduler’s runqueue and priority-inheritance spinlocks, courtesy of the quiescent-state deferral that is carried out when rcu_read_unlock() is invoked with interrupts disabled.

See rcu_read_lock() for more information.

voidrcu_read_lock_bh(void)

mark the beginning of an RCU-bh critical section

Parameters

void

no arguments

Description

This is equivalent to rcu_read_lock(), but also disables softirqs. Note that anything else that disables softirqs can also serve as an RCU read-side critical section. However, please note that this equivalence applies only to v5.0 and later. Before v5.0, rcu_read_lock() and rcu_read_lock_bh() were unrelated.

Note that rcu_read_lock_bh() and the matching rcu_read_unlock_bh() must occur in the same context, for example, it is illegal to invoke rcu_read_unlock_bh() from one task if the matching rcu_read_lock_bh() was invoked from some other task.

voidrcu_read_unlock_bh(void)

marks the end of a softirq-only RCU critical section

Parameters

void

no arguments

Description

See rcu_read_lock_bh() for more information.

voidrcu_read_lock_sched(void)

mark the beginning of a RCU-sched critical section

Parameters

void

no arguments

Description

This is equivalent to rcu_read_lock(), but also disables preemption. Read-side critical sections can also be introduced by anything else that disables preemption, including local_irq_disable() and friends. However, please note that the equivalence to rcu_read_lock() applies only to v5.0 and later. Before v5.0, rcu_read_lock() and rcu_read_lock_sched() were unrelated.

Note that rcu_read_lock_sched() and the matching rcu_read_unlock_sched() must occur in the same context, for example, it is illegal to invoke rcu_read_unlock_sched() from process context if the matching rcu_read_lock_sched() was invoked from an NMI handler.

voidrcu_read_unlock_sched(void)

marks the end of a RCU-classic critical section

Parameters

void

no arguments

Description

See rcu_read_lock_sched() for more information.

RCU_INIT_POINTER

RCU_INIT_POINTER(p,v)

initialize an RCU protected pointer

Parameters

p

The pointer to be initialized.

v

The value to initialize the pointer to.

Description

Initialize an RCU-protected pointer in special cases where readers do not need ordering constraints on the CPU or the compiler. These special cases are:

  1. This use of RCU_INIT_POINTER() is NULLing out the pointer, or

  2. The caller has taken whatever steps are required to prevent RCU readers from concurrently accessing this pointer, or

  3. The referenced data structure has already been exposed to readers either at compile time or via rcu_assign_pointer(), and

    1. You have not made any reader-visible changes to this structure since then, or

    2. It is OK for readers accessing this structure from its new location to see the old state of the structure. (For example, the changes were to statistical counters or to other state where exact synchronization is not required.)

Failure to follow these rules governing use of RCU_INIT_POINTER() will result in impossible-to-diagnose memory corruption. As in the structures will look OK in crash dumps, but any concurrent RCU readers might see pre-initialized values of the referenced data structure. So please be very careful how you use RCU_INIT_POINTER()!!!

If you are creating an RCU-protected linked structure that is accessed by a single external-to-structure RCU-protected pointer, then you may use RCU_INIT_POINTER() to initialize the internal RCU-protected pointers, but you must use rcu_assign_pointer() to initialize the external-to-structure pointer after you have completely initialized the reader-accessible portions of the linked structure, as sketched below.

Note that unlike rcu_assign_pointer(), RCU_INIT_POINTER() provides no ordering guarantees for either the CPU or the compiler.
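
A sketch of the internal/external-pointer guidance above, continuing the hypothetical struct foo example (struct bar and global_bar are likewise hypothetical): the not-yet-visible internal pointer may use RCU_INIT_POINTER(), while the external pointer that finally exposes the structure must use rcu_assign_pointer().

struct bar {
        struct foo __rcu *inner;
};

static struct bar __rcu *global_bar;

static int publish_bar(struct foo *f)
{
        struct bar *b = kzalloc(sizeof(*b), GFP_KERNEL);

        if (!b)
                return -ENOMEM;
        RCU_INIT_POINTER(b->inner, f);          /* b is not reader-visible yet */
        rcu_assign_pointer(global_bar, b);      /* publication needs ordering */
        return 0;
}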

RCU_POINTER_INITIALIZER

RCU_POINTER_INITIALIZER(p,v)

statically initialize an RCU protected pointer

Parameters

p

The pointer to be initialized.

v

The value to initialize the pointer to.

Description

GCC-style initialization for an RCU-protected pointer in a structure field.

kfree_rcu

kfree_rcu(ptr,rhf)

kfree an object after a grace period.

Parameters

ptr

pointer to kfree for double-argument invocations.

rhf

the name of the struct rcu_head within the type ofptr.

Description

Many rcu callback functions just call kfree() on the base structure. These functions are trivial, but their size adds up, and furthermore when they are used in a kernel module, that module must invoke the high-latency rcu_barrier() function at module-unload time.

The kfree_rcu() function handles this issue. In order to have a universal callback function handling different offsets of rcu_head, the callback needs to determine the starting address of the freed object, which can be a large kmalloc or vmalloc allocation. To allow simply aligning the pointer down to page boundary for those, only offsets up to 4095 bytes can be accommodated. If the offset is larger than 4095 bytes, a compile-time error will be generated in kvfree_rcu_arg_2(). If this error is triggered, you can either fall back to use of call_rcu() or rearrange the structure to position the rcu_head structure into the first 4096 bytes.

The object to be freed can be allocated either by kmalloc() or kmem_cache_alloc().

Note that the allowable offset might decrease in the future.

The BUILD_BUG_ON check must not involve any function calls, hence the checks are done in macros here.
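
A minimal sketch using the hypothetical struct foo from the earlier examples, whose embedded rcu_head member is named rcu: the double-argument form names that member and frees the object after a grace period.

static void retire_foo(struct foo *old)
{
        /* Readers that already hold a reference keep seeing *old until a
         * grace period elapses; then the object is kfree()d. */
        kfree_rcu(old, rcu);
}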

kfree_rcu_mightsleep

kfree_rcu_mightsleep(ptr)

kfree an object after a grace period.

Parameters

ptr

pointer to kfree for single-argument invocations.

Description

When it comes to the head-less variant, only one argument is passed, and that is just a pointer which has to be freed after a grace period. Therefore the semantic is

kfree_rcu_mightsleep(ptr);

where ptr is the pointer to be freed by kvfree().

Please note, the head-less way of freeing is permitted only from a context that can follow the might_sleep() annotation. Otherwise, please switch and embed the rcu_head structure within the type of ptr.

voidrcu_head_init(structrcu_head*rhp)

Initialize rcu_head forrcu_head_after_call_rcu()

Parameters

structrcu_head*rhp

The rcu_head structure to initialize.

Description

If you intend to invoke rcu_head_after_call_rcu() to test whether a given rcu_head structure has already been passed to call_rcu(), then you must also invoke this rcu_head_init() function on it just after allocating that structure. Calls to this function must not race with calls to call_rcu(), rcu_head_after_call_rcu(), or callback invocation.

boolrcu_head_after_call_rcu(structrcu_head*rhp,rcu_callback_tf)

Has this rcu_head been passed tocall_rcu()?

Parameters

structrcu_head*rhp

The rcu_head structure to test.

rcu_callback_tf

The function passed tocall_rcu() along withrhp.

Description

Returns true if the rhp has been passed to call_rcu() with func, and false otherwise. Emits a warning in any other case, including the case where rhp has already been invoked after a grace period. Calls to this function must not race with callback invocation. One way to avoid such races is to enclose the call to rcu_head_after_call_rcu() in an RCU read-side critical section that includes a read-side fetch of the pointer to the structure containing rhp.

voidrcu_softirq_qs(void)

Provide a set of RCU quiescent states in softirq processing

Parameters

void

no arguments

Description

Mark a quiescent state for RCU, Tasks RCU, and Tasks Trace RCU. This is a special-purpose function to be used in the softirq infrastructure and perhaps the occasional long-running softirq handler.

Note that from RCU’s viewpoint, a call to rcu_softirq_qs() is equivalent to momentarily completely enabling preemption. For example, given this code:

local_bh_disable();
do_something();
rcu_softirq_qs();  // A
do_something_else();
local_bh_enable();  // B

A call to synchronize_rcu() that began concurrently with the call to do_something() would be guaranteed to wait only until execution reached statement A. Without that rcu_softirq_qs(), that same synchronize_rcu() would instead be guaranteed to wait until execution reached statement B.

boolrcu_watching_snap_stopped_since(structrcu_data*rdp,intsnap)

Has RCU stopped watching a given CPU since the specifiedsnap?

Parameters

structrcu_data*rdp

The rcu_data corresponding to the CPU for which to check EQS.

intsnap

rcu_watching snapshot taken when the CPU wasn’t in an EQS.

Description

Returns true if the CPU corresponding to rdp has spent some time in an extended quiescent state since snap. Note that this doesn’t check if it /still/ is in an EQS, just that it went through one since snap.

This is meant to be used in a loop waiting for a CPU to go through an EQS.

intrcu_is_cpu_rrupt_from_idle(void)

see if ‘interrupted’ from idle

Parameters

void

no arguments

Description

If the current CPU is idle and running at a first-level (not nested) interrupt, or directly, from idle, return true.

The caller must have at least disabled IRQs.

voidrcu_irq_exit_check_preempt(void)

Validate that scheduling is possible

Parameters

void

no arguments

void__rcu_irq_enter_check_tick(void)

Enable scheduler tick on CPU if RCU needs it.

Parameters

void

no arguments

Description

The scheduler tick is not normally enabled when CPUs enter the kernel from nohz_full userspace execution. After all, nohz_full userspace execution is an RCU quiescent state and the time executing in the kernel is quite short. Except of course when it isn’t. And it is not hard to cause a large system to spend tens of seconds or even minutes looping in the kernel, which can cause a number of problems, including RCU CPU stall warnings.

Therefore, if a nohz_full CPU fails to report a quiescent state in a timely manner, the RCU grace-period kthread sets that CPU’s ->rcu_urgent_qs flag with the expectation that the next interrupt or exception will invoke this function, which will turn on the scheduler tick, which will enable RCU to detect that CPU’s quiescent states, for example, due to cond_resched() calls in CONFIG_PREEMPT=n kernels. The tick will be disabled once a quiescent state is reported for this CPU.

Of course, in carefully tuned systems, there might never be an interrupt or exception. In that case, the RCU grace-period kthread will eventually cause one to happen. However, in less carefully controlled environments, this function allows RCU to get what it needs without creating otherwise useless interruptions.

notraceboolrcu_is_watching(void)

RCU read-side critical sections permitted on current CPU?

Parameters

void

no arguments

Description

Return true if RCU is watching the running CPU and false otherwise. A true return means that this CPU can safely enter RCU read-side critical sections.

Although calls to rcu_is_watching() from most parts of the kernel will return true, there are important exceptions. For example, if the current CPU is deep within its idle loop, in kernel entry/exit code, or offline, rcu_is_watching() will return false.

Make notrace because it can be called by the internal functions of ftrace, and making this notrace removes unnecessary recursion calls.

voidrcu_set_gpwrap_lag(unsignedlonglag_gps)

Set RCU GP sequence overflow lag value.

Parameters

unsignedlonglag_gps

Set overflow lag to this many grace periods’ worth of counters, which is used by rcutorture to quickly force a gpwrap situation. lag_gps = 0 means we reset it back to the boot-time value.

voidcall_rcu_hurry(structrcu_head*head,rcu_callback_tfunc)

Queue RCU callback for invocation after grace period, and flush all lazy callbacks (including the new one) to the main ->cblist while doing so.

Parameters

structrcu_head*head

structure to be used for queueing the RCU updates.

rcu_callback_tfunc

actual callback function to be invoked after the grace period

Description

The callback function will be invoked some time after a full grace period elapses, in other words after all pre-existing RCU read-side critical sections have completed.

Use this API instead of call_rcu() if you don’t want the callback to be delayed for very long periods of time, which can happen on systems without memory pressure and on systems which are lightly loaded or mostly idle. This function will cause callbacks to be invoked sooner than later at the expense of extra power. Other than that, this function is identical to, and reuses call_rcu()’s logic. Refer to call_rcu() for more details about memory ordering and other functionality.

voidcall_rcu(structrcu_head*head,rcu_callback_tfunc)

Queue an RCU callback for invocation after a grace period. By default the callbacks are ‘lazy’ and are kept hidden from the main ->cblist to prevent starting of grace periods too soon. If you desire grace periods to start very soon, use call_rcu_hurry().

Parameters

structrcu_head*head

structure to be used for queueing the RCU updates.

rcu_callback_tfunc

actual callback function to be invoked after the grace period

Description

The callback function will be invoked some time after a full grace period elapses, in other words after all pre-existing RCU read-side critical sections have completed. However, the callback function might well execute concurrently with RCU read-side critical sections that started after call_rcu() was invoked.

It is perfectly legal to repost an RCU callback, potentially with a different callback function, from within its callback function. The specified function will be invoked after another full grace period has elapsed. This use case is similar in form to the common practice of reposting a timer from within its own handler.

RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. In addition, but only in v5.0 and later, regions of code across which interrupts, preemption, or softirqs have been disabled also serve as RCU read-side critical sections. This includes hardware interrupt handlers, softirq handlers, and NMI handlers.

Note that all CPUs must agree that the grace period extended beyond all pre-existing RCU read-side critical sections. On systems with more than one CPU, this means that when “func()” is invoked, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU read-side critical section whose beginning preceded the call to call_rcu(). It also means that each CPU executing an RCU read-side critical section that continues beyond the start of “func()” must have executed a memory barrier after the call_rcu() but before the beginning of that RCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.

Furthermore, if CPU A invoked call_rcu() and CPU B invoked the resulting RCU callback function “func()”, then both CPU A and CPU B are guaranteed to execute a full memory barrier during the time interval between the call to call_rcu() and the invocation of “func()” -- even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).

Implementation of these memory-ordering guarantees is described here: A Tour Through TREE_RCU’s Grace-Period Memory Ordering.

Specific to call_rcu() (as opposed to the other call_rcu*() functions), in kernels built with CONFIG_RCU_LAZY=y, call_rcu() might delay for many seconds before starting the grace period needed by the corresponding callback. This delay can significantly improve energy efficiency on low-utilization battery-powered devices. To avoid this delay, in latency-sensitive kernel code, use call_rcu_hurry().
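
A minimal callback-based reclamation sketch, continuing the hypothetical struct foo and gp pointer from earlier (gp_lock is a hypothetical updater-side lock): the callback recovers the enclosing object with container_of() and frees it.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(gp_lock);        /* hypothetical updater-side lock */

static void foo_reclaim(struct rcu_head *rhp)
{
        kfree(container_of(rhp, struct foo, rcu));
}

static void replace_foo(struct foo *newp)
{
        struct foo *old;

        spin_lock(&gp_lock);
        old = rcu_replace_pointer(gp, newp, lockdep_is_held(&gp_lock));
        spin_unlock(&gp_lock);
        if (old)
                call_rcu(&old->rcu, foo_reclaim);   /* freed after a grace period */
}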

voidsynchronize_rcu(void)

wait until a grace period has elapsed.

Parameters

void

no arguments

Description

Control will return to the caller some time after a full grace period has elapsed, in other words after all currently executing RCU read-side critical sections have completed. Note, however, that upon return from synchronize_rcu(), the caller might well be executing concurrently with new RCU read-side critical sections that began while synchronize_rcu() was waiting.

RCU read-side critical sections are delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested. In addition, but only in v5.0 and later, regions of code across which interrupts, preemption, or softirqs have been disabled also serve as RCU read-side critical sections. This includes hardware interrupt handlers, softirq handlers, and NMI handlers.

Note that this guarantee implies further memory-ordering guarantees. On systems with more than one CPU, when synchronize_rcu() returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU read-side critical section whose beginning preceded the call to synchronize_rcu(). In addition, each CPU having an RCU read-side critical section that extends beyond the return from synchronize_rcu() is guaranteed to have executed a full memory barrier after the beginning of synchronize_rcu() and before the beginning of that RCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.

Furthermore, if CPU A invoked synchronize_rcu(), which returned to its caller on CPU B, then both CPU A and CPU B are guaranteed to have executed a full memory barrier during the execution of synchronize_rcu() -- even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).

Implementation of these memory-ordering guarantees is described here: A Tour Through TREE_RCU’s Grace-Period Memory Ordering.
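
A sketch of the blocking alternative to call_rcu() for the same hypothetical structure: unpublish, wait for a grace period, then free directly.

static void remove_foo(void)
{
        struct foo *old;

        spin_lock(&gp_lock);
        old = rcu_replace_pointer(gp, NULL, lockdep_is_held(&gp_lock));
        spin_unlock(&gp_lock);

        synchronize_rcu();      /* wait out all pre-existing readers */
        kfree(old);             /* nothing can still reference it now */
}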

voidget_completed_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)

Return a full pre-completed polled state cookie

Parameters

structrcu_gp_oldstate*rgosp

Place to put state cookie

Description

Stores into rgosp a value that will always be treated by functions like poll_state_synchronize_rcu_full() as a cookie whose grace period has already completed.

unsignedlongget_state_synchronize_rcu(void)

Snapshot current RCU state

Parameters

void

no arguments

Description

Returns a cookie that is used by a later call to cond_synchronize_rcu() or poll_state_synchronize_rcu() to determine whether or not a full grace period has elapsed in the meantime.

voidget_state_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)

Snapshot RCU state, both normal and expedited

Parameters

structrcu_gp_oldstate*rgosp

location to place combined normal/expedited grace-period state

Description

Places the normal and expedited grace-period states in rgosp. This state value can be passed to a later call to cond_synchronize_rcu_full() or poll_state_synchronize_rcu_full() to determine whether or not a grace period (whether normal or expedited) has elapsed in the meantime. The rcu_gp_oldstate structure takes up twice the memory of an unsigned long, but is guaranteed to see all grace periods. In contrast, the combined state occupies less memory, but can sometimes fail to take grace periods into account.

This does not guarantee that the needed grace period will actually start.

unsignedlongstart_poll_synchronize_rcu(void)

Snapshot and start RCU grace period

Parameters

void

no arguments

Description

Returns a cookie that is used by a later call to cond_synchronize_rcu() or poll_state_synchronize_rcu() to determine whether or not a full grace period has elapsed in the meantime. If the needed grace period is not already slated to start, notifies RCU core of the need for that grace period.

voidstart_poll_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)

Take a full snapshot and start RCU grace period

Parameters

structrcu_gp_oldstate*rgosp

value fromget_state_synchronize_rcu_full() orstart_poll_synchronize_rcu_full()

Description

Places the normal and expedited grace-period states in *rgosp. This state value can be passed to a later call to cond_synchronize_rcu_full() or poll_state_synchronize_rcu_full() to determine whether or not a grace period (whether normal or expedited) has elapsed in the meantime. If the needed grace period is not already slated to start, notifies RCU core of the need for that grace period.

boolpoll_state_synchronize_rcu(unsignedlongoldstate)

Has the specified RCU grace period completed?

Parameters

unsignedlongoldstate

value fromget_state_synchronize_rcu() orstart_poll_synchronize_rcu()

Description

If a full RCU grace period has elapsed since the earlier call from which oldstate was obtained, return true, otherwise return false. If false is returned, it is the caller’s responsibility to invoke this function later on until it does return true. Alternatively, the caller can explicitly wait for a grace period, for example, by passing oldstate to either cond_synchronize_rcu() or cond_synchronize_rcu_expedited() on the one hand or by directly invoking either synchronize_rcu() or synchronize_rcu_expedited() on the other.

Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than a billion grace periods (and way more on a 64-bit system!). Those needing to keep old state values for very long time periods (many hours even on 32-bit systems) should check them occasionally and either refresh them or set a flag indicating that the grace period has completed. Alternatively, they can use get_completed_synchronize_rcu() to get a guaranteed-completed grace-period state.

In addition, because oldstate compresses the grace-period state for both normal and expedited grace periods into a single unsigned long, it can miss a grace period when synchronize_rcu() runs concurrently with synchronize_rcu_expedited(). If this is unacceptable, please instead use the _full() variant of these polling APIs.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided oldstate, and that returned at the end of this function.
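
An illustrative polling sketch, again using the hypothetical gp pointer and gp_lock from the earlier examples: take a cookie when unpublishing, and free only once a full grace period is known to have elapsed.

unsigned long cookie;
struct foo *old;

spin_lock(&gp_lock);
old = rcu_replace_pointer(gp, NULL, lockdep_is_held(&gp_lock));
spin_unlock(&gp_lock);
cookie = start_poll_synchronize_rcu();      /* snapshot state and start a GP */

/* ... later, possibly from a different context ... */
if (!poll_state_synchronize_rcu(cookie))
        cond_synchronize_rcu(cookie);       /* block until the GP completes */
kfree(old);                                 /* now safe to free */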

boolpoll_state_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)

Has the specified RCU grace period completed?

Parameters

structrcu_gp_oldstate*rgosp

value fromget_state_synchronize_rcu_full() orstart_poll_synchronize_rcu_full()

Description

If a full RCU grace period has elapsed since the earlier call from which rgosp was obtained, return true, otherwise return false. If false is returned, it is the caller’s responsibility to invoke this function later on until it does return true. Alternatively, the caller can explicitly wait for a grace period, for example, by passing rgosp to cond_synchronize_rcu() or by directly invoking synchronize_rcu().

Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than a billion grace periods (and way more on a 64-bit system!). Those needing to keep rcu_gp_oldstate values for very long time periods (many hours even on 32-bit systems) should check them occasionally and either refresh them or set a flag indicating that the grace period has completed. Alternatively, they can use get_completed_synchronize_rcu_full() to get a guaranteed-completed grace-period state.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided rgosp, and that returned at the end of this function. And this guarantee requires that the root rcu_node structure’s ->gp_seq field be checked instead of that of the rcu_state structure. The problem is that the just-ending grace-period’s callbacks can be invoked between the time that the root rcu_node structure’s ->gp_seq field is updated and the time that the rcu_state structure’s ->gp_seq field is updated. Therefore, if a single synchronize_rcu() is to cause a subsequent poll_state_synchronize_rcu_full() to return true, then the root rcu_node structure is the one that needs to be polled.

voidcond_synchronize_rcu(unsignedlongoldstate)

Conditionally wait for an RCU grace period

Parameters

unsignedlongoldstate

value fromget_state_synchronize_rcu(),start_poll_synchronize_rcu(), orstart_poll_synchronize_rcu_expedited()

Description

If a full RCU grace period has elapsed since the earlier call to get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just return. Otherwise, invoke synchronize_rcu() to wait for a full grace period.

Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for a couple of additional grace periods should be just fine.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided oldstate and that returned at the end of this function.

voidcond_synchronize_rcu_full(structrcu_gp_oldstate*rgosp)

Conditionally wait for an RCU grace period

Parameters

structrcu_gp_oldstate*rgosp

value fromget_state_synchronize_rcu_full(),start_poll_synchronize_rcu_full(), orstart_poll_synchronize_rcu_expedited_full()

Description

If a full RCU grace period has elapsed since the call to get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full() from which rgosp was obtained, just return. Otherwise, invoke synchronize_rcu() to wait for a full grace period.

Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for a couple of additional grace periods should be just fine.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided rgosp and that returned at the end of this function.

voidrcu_barrier(void)

Wait until all in-flightcall_rcu() callbacks complete.

Parameters

void

no arguments

Description

Note that this primitive does not necessarily wait for an RCU grace period to complete. For example, if there are no RCU callbacks queued anywhere in the system, then rcu_barrier() is within its rights to return immediately, without waiting for anything, much less an RCU grace period.

voidrcu_barrier_throttled(void)

Dorcu_barrier(), but limit to one per second

Parameters

void

no arguments

Description

This can be thought of as guard rails around rcu_barrier() that permit unrestricted userspace use, at least assuming the hardware’s try_cmpxchg() is robust. There will be at most one call per second to rcu_barrier() system-wide from use of this function, which means that callers might needlessly wait a second or three.

This is intended for use by test suites to avoid OOM by flushing RCU callbacks from the previous test before starting the next. See the rcutree.do_rcu_barrier module parameter for more information.

Why not simply make rcu_barrier() more scalable? That might be the eventual endpoint, but let’s keep it simple for the time being. Note that the module parameter infrastructure serializes calls to a given .set() function, but should concurrent .set() invocation ever be possible, we are ready!

voidsynchronize_rcu_expedited(void)

Brute-force RCU grace period

Parameters

void

no arguments

Description

Wait for an RCU grace period, but expedite it. The basic idea is to IPI all non-idle non-nohz online CPUs. The IPI handler checks whether the CPU is in an RCU critical section, and if so, it sets a flag that causes the outermost rcu_read_unlock() to report the quiescent state for RCU-preempt or asks the scheduler for help for RCU-sched. On the other hand, if the CPU is not in an RCU read-side critical section, the IPI handler reports the quiescent state immediately.

Although this is a great improvement over previous expedited implementations, it is still unfriendly to real-time workloads, and is thus not recommended for any sort of common-case code. In fact, if you are using synchronize_rcu_expedited() in a loop, please restructure your code to batch your updates, and then use a single synchronize_rcu() instead.

This has the same semantics as (but is more brutal than) synchronize_rcu().

unsignedlongstart_poll_synchronize_rcu_expedited(void)

Snapshot current RCU state and start expedited grace period

Parameters

void

no arguments

Description

Returns a cookie to pass to a call to cond_synchronize_rcu(), cond_synchronize_rcu_expedited(), or poll_state_synchronize_rcu(), allowing them to determine whether or not any sort of grace period has elapsed in the meantime. If the needed expedited grace period is not already slated to start, initiates that grace period.

voidstart_poll_synchronize_rcu_expedited_full(structrcu_gp_oldstate*rgosp)

Take a full snapshot and start expedited grace period

Parameters

structrcu_gp_oldstate*rgosp

Place to put snapshot of grace-period state

Description

Places the normal and expedited grace-period states in rgosp. This state value can be passed to a later call to cond_synchronize_rcu_full() or poll_state_synchronize_rcu_full() to determine whether or not a grace period (whether normal or expedited) has elapsed in the meantime. If the needed expedited grace period is not already slated to start, initiates that grace period.

voidcond_synchronize_rcu_expedited(unsignedlongoldstate)

Conditionally wait for an expedited RCU grace period

Parameters

unsignedlongoldstate

value fromget_state_synchronize_rcu(),start_poll_synchronize_rcu(), orstart_poll_synchronize_rcu_expedited()

Description

If any type of full RCU grace period has elapsed since the earlier call to get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited(), just return. Otherwise, invoke synchronize_rcu_expedited() to wait for a full grace period.

Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for a couple of additional grace periods should be just fine.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided oldstate and that returned at the end of this function.
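
A minimal sketch of the cookie-based pattern described above; the do_other_update_work() helper is hypothetical and simply stands in for work that can overlap with the grace period:

    #include <linux/rcupdate.h>

    static void do_other_update_work(void);      /* hypothetical */

    static void expedited_deferred_wait(void)
    {
            unsigned long cookie;

            /* Start an expedited grace period now and remember where we were. */
            cookie = start_poll_synchronize_rcu_expedited();

            do_other_update_work();              /* overlaps with the grace period */

            /* Returns immediately if a full grace period has already elapsed,
             * otherwise falls back to synchronize_rcu_expedited(). */
            cond_synchronize_rcu_expedited(cookie);
    }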

voidcond_synchronize_rcu_expedited_full(structrcu_gp_oldstate*rgosp)

Conditionally wait for an expedited RCU grace period

Parameters

structrcu_gp_oldstate*rgosp

value from get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full()

Description

If a full RCU grace period has elapsed since the call to get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full() from which rgosp was obtained, just return. Otherwise, invoke synchronize_rcu_expedited() to wait for a full grace period.

Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for a couple of additional grace periods should be just fine.

This function provides the same memory-ordering guarantees that would be provided by a synchronize_rcu() that was invoked at the call to the function that provided rgosp and that returned at the end of this function.

boolrcu_read_lock_held_common(bool*ret)

might we be in RCU-sched read-side critical section?

Parameters

bool*ret

Best guess answer if lockdep cannot be relied on

Description

Returns true if lockdep must be ignored, in which case *ret contains the best guess described below. Otherwise returns false, in which case *ret tells the caller nothing and the caller should instead consult lockdep.

If CONFIG_DEBUG_LOCK_ALLOC is selected, set *ret to nonzero iff in an RCU-sched read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side critical section unless it can prove otherwise. Note that disabling of preemption (including disabling irqs) counts as an RCU-sched read-side critical section. This is useful for debug checks in functions that require that they be called within an RCU-sched read-side critical section.

Check debug_lockdep_rcu_enabled() to prevent false positives during bootand while lockdep is disabled.

Note that if the CPU is in the idle loop from an RCU point of view (i.e., we are in the section between ct_idle_enter() and ct_idle_exit()), then rcu_read_lock_held() sets *ret to false even if the CPU did an rcu_read_lock(). The reason for this is that RCU ignores CPUs that are in such a section, considering them to be in an extended quiescent state, so such a CPU is effectively never in an RCU read-side critical section regardless of what RCU primitives it invokes. This state of affairs is required --- we need to keep an RCU-free window in idle where the CPU may possibly enter into low power mode. This way, CPUs that started a grace period can observe our extended quiescent state. Otherwise we would delay any grace period as long as we run in the idle task.

Similarly, we avoid claiming an RCU read lock held if the currentCPU is offline.

voidrcu_async_hurry(void)

Make future async RCU callbacks not lazy.

Parameters

void

no arguments

Description

After a call to this function, future calls to call_rcu() will be processed in a timely fashion.

voidrcu_async_relax(void)

Make future async RCU callbacks lazy.

Parameters

void

no arguments

Description

After a call to this function, future calls to call_rcu() will be processed in a lazy fashion.

voidrcu_expedite_gp(void)

Expedite future RCU grace periods

Parameters

void

no arguments

Description

After a call to this function, future calls to synchronize_rcu() and friends act as if the corresponding synchronize_rcu_expedited() function had instead been called.

voidrcu_unexpedite_gp(void)

Cancel priorrcu_expedite_gp() invocation

Parameters

void

no arguments

Description

Undo a prior call to rcu_expedite_gp(). If all prior calls to rcu_expedite_gp() are undone by a subsequent call to rcu_unexpedite_gp(), and if the rcu_expedited sysfs/boot parameter is not set, then all subsequent calls to synchronize_rcu() and friends will return to their normal non-expedited behavior.

intrcu_read_lock_held(void)

might we be in RCU read-side critical section?

Parameters

void

no arguments

Description

If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU read-side critical section unless it can prove otherwise. This is useful for debug checks in functions that require that they be called within an RCU read-side critical section.

Checks debug_lockdep_rcu_enabled() to prevent false positives during bootand while lockdep is disabled.

Note that rcu_read_lock() and the matching rcu_read_unlock() must occur in the same context; for example, it is illegal to invoke rcu_read_unlock() in process context if the matching rcu_read_lock() was invoked from within an irq handler.

Note that rcu_read_lock() is disallowed if the CPU is either idle or offline from an RCU perspective, so check for those as well.
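
A minimal sketch of the kind of debug check this enables; the foo structure, foo_list, and foo_lookup() are hypothetical:

    #include <linux/rcupdate.h>
    #include <linux/rculist.h>

    struct foo {
            struct list_head node;
            int key;
    };

    static LIST_HEAD(foo_list);          /* hypothetical RCU-protected list */

    /* Must be called within an RCU read-side critical section. */
    static struct foo *foo_lookup(int key)
    {
            struct foo *fp;

            WARN_ON_ONCE(!rcu_read_lock_held());

            list_for_each_entry_rcu(fp, &foo_list, node)
                    if (fp->key == key)
                            return fp;
            return NULL;
    }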

intrcu_read_lock_bh_held(void)

might we be in RCU-bh read-side critical section?

Parameters

void

no arguments

Description

Check for bottom half being disabled, which covers both the CONFIG_PROVE_RCU and not cases. Note that if someone uses rcu_read_lock_bh(), but then later enables BH, lockdep (if enabled) will show the situation. This is useful for debug checks in functions that require that they be called within an RCU read-side critical section.

Check debug_lockdep_rcu_enabled() to prevent false positives during boot.

Note thatrcu_read_lock_bh() is disallowed if the CPU is either idle oroffline from an RCU perspective, so check for those as well.

voidwakeme_after_rcu(structrcu_head*head)

Callback function to awaken a task after grace period

Parameters

structrcu_head*head

Pointer to rcu_head member within rcu_synchronize structure

Description

Awaken the corresponding task now that a grace period has elapsed.

voidinit_rcu_head_on_stack(structrcu_head*head)

initialize on-stack rcu_head for debugobjects

Parameters

structrcu_head*head

pointer to rcu_head structure to be initialized

Description

This function informs debugobjects of a new rcu_head structure that has been allocated as an auto variable on the stack. This function is not required for rcu_head structures that are statically defined or that are dynamically allocated on the heap. This function has no effect for !CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.

voiddestroy_rcu_head_on_stack(structrcu_head*head)

destroy on-stack rcu_head for debugobjects

Parameters

structrcu_head*head

pointer to rcu_head structure to be cleaned up

Description

This function informs debugobjects that an on-stack rcu_head structure is about to go out of scope. As with init_rcu_head_on_stack(), this function is not required for rcu_head structures that are statically defined or that are dynamically allocated on the heap. Also as with init_rcu_head_on_stack(), this function has no effect for !CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.
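
The following hedged sketch shows the intended pairing: an rcu_head embedded in an on-stack structure is registered with debugobjects before use and destroyed before it goes out of scope. The my_wait structure and callback are hypothetical.

    #include <linux/rcupdate.h>
    #include <linux/completion.h>

    struct my_wait {
            struct rcu_head rh;                  /* lives on the stack */
            struct completion done;
    };

    static void my_wait_cb(struct rcu_head *rhp)
    {
            complete(&container_of(rhp, struct my_wait, rh)->done);
    }

    static void my_wait_for_grace_period(void)
    {
            struct my_wait mw;

            init_rcu_head_on_stack(&mw.rh);      /* tell debugobjects about the on-stack rcu_head */
            init_completion(&mw.done);
            call_rcu(&mw.rh, my_wait_cb);
            wait_for_completion(&mw.done);
            destroy_rcu_head_on_stack(&mw.rh);   /* before mw goes out of scope */
    }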

unsignedlongget_completed_synchronize_rcu(void)

Return a pre-completed polled state cookie

Parameters

void

no arguments

Description

Returns a value that will always be treated by functions likepoll_state_synchronize_rcu() as a cookie whose grace period has alreadycompleted.

unsignedlongget_completed_synchronize_srcu(void)

Return a pre-completed polled state cookie

Parameters

void

no arguments

Description

Returns a value thatpoll_state_synchronize_srcu() will always treatas a cookie whose grace period has already completed.

boolsame_state_synchronize_srcu(unsignedlongoldstate1,unsignedlongoldstate2)

Are two old-state values identical?

Parameters

unsignedlongoldstate1

First old-state value.

unsignedlongoldstate2

Second old-state value.

Description

The two old-state values must have been obtained from either get_state_synchronize_srcu(), start_poll_synchronize_srcu(), or get_completed_synchronize_srcu(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.

intsrcu_read_lock_held(conststructsrcu_struct*ssp)

might we be in SRCU read-side critical section?

Parameters

conststructsrcu_struct*ssp

The srcu_struct structure to check

Description

If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an SRCU read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an SRCU read-side critical section unless it can prove otherwise.

Checks debug_lockdep_rcu_enabled() to prevent false positives during bootand while lockdep is disabled.

Note that SRCU is based on its own state machine and does not rely on normal RCU, so it can be called from a CPU that is in the idle loop from an RCU point of view, or even from an offline CPU.

srcu_dereference_check

srcu_dereference_check(p,ssp,c)

fetch SRCU-protected pointer for later dereferencing

Parameters

p

the pointer to fetch and protect for later dereferencing

ssp

pointer to the srcu_struct, which is used to check that wereally are in an SRCU read-side critical section.

c

condition to check for update-side use

Description

If PROVE_RCU is enabled, invoking this outside of an RCU read-side critical section will result in an RCU-lockdep splat, unless c evaluates to 1. The c argument will normally be a logical expression containing lockdep_is_held() calls.
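
A hedged sketch of typical usage: a pointer that readers access under a hypothetical my_srcu domain and that updaters access while holding a hypothetical my_mutex, so either context may legitimately fetch it:

    #include <linux/srcu.h>
    #include <linux/mutex.h>

    struct config {
            int value;
    };

    DEFINE_STATIC_SRCU(my_srcu);                 /* hypothetical SRCU domain */
    static DEFINE_MUTEX(my_mutex);               /* hypothetical update-side lock */
    static struct config __rcu *cur_config;

    /* Legal from within srcu_read_lock(&my_srcu) or with my_mutex held. */
    static struct config *get_config(void)
    {
            return srcu_dereference_check(cur_config, &my_srcu,
                                          lockdep_is_held(&my_mutex));
    }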

srcu_dereference

srcu_dereference(p,ssp)

fetch SRCU-protected pointer for later dereferencing

Parameters

p

the pointer to fetch and protect for later dereferencing

ssp

pointer to the srcu_struct, which is used to check that wereally are in an SRCU read-side critical section.

Description

Makesrcu_dereference_check() do the dirty work. If PROVE_RCUis enabled, invoking this outside of an RCU read-side criticalsection will result in an RCU-lockdep splat.

srcu_dereference_notrace

srcu_dereference_notrace(p,ssp)

no tracing and no lockdep calls from here

Parameters

p

the pointer to fetch and protect for later dereferencing

ssp

pointer to the srcu_struct, which is used to check that wereally are in an SRCU read-side critical section.

intsrcu_read_lock(structsrcu_struct*ssp)

register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to register the new reader.

Description

Enter an SRCU read-side critical section. Note that SRCU read-side critical sections may be nested. However, it is illegal to call anything that waits on an SRCU grace period for the same srcu_struct, whether directly or indirectly. Please note that one way to indirectly wait on an SRCU grace period is to acquire a mutex that is held elsewhere while calling synchronize_srcu() or synchronize_srcu_expedited().

The return value from srcu_read_lock() is guaranteed to be non-negative. This value must be passed unaltered to the matching srcu_read_unlock(). Note that srcu_read_lock() and the matching srcu_read_unlock() must occur in the same context; for example, it is illegal to invoke srcu_read_unlock() in an irq handler if the matching srcu_read_lock() was invoked in process context. Nor, for that matter, may srcu_read_unlock() be invoked from one task when the matching srcu_read_lock() was invoked from another.
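
A minimal read-side sketch, using the same hypothetical my_srcu domain and cur_config pointer names as the previous sketch; note that the index returned by srcu_read_lock() is passed back unaltered:

    #include <linux/srcu.h>

    struct config {
            int value;
    };

    DEFINE_STATIC_SRCU(my_srcu);                 /* hypothetical SRCU domain */
    static struct config __rcu *cur_config;      /* hypothetical SRCU-protected pointer */

    static int read_config_value(void)
    {
            struct config *cfg;
            int idx, val = -1;

            idx = srcu_read_lock(&my_srcu);
            cfg = srcu_dereference(cur_config, &my_srcu);
            if (cfg)
                    val = cfg->value;
            srcu_read_unlock(&my_srcu, idx);     /* same idx, same srcu_struct */
            return val;
    }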

structsrcu_ctr__percpu*srcu_read_lock_fast(structsrcu_struct*ssp)

register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to register the new reader.

Description

Enter an SRCU read-side critical section, but for a light-weightsmp_mb()-free reader. Seesrcu_read_lock() for more information.

If srcu_read_lock_fast() is ever used on an srcu_struct structure, then none of the other flavors may be used, whether before, during, or after. Note that grace-period auto-expediting is disabled for _fast srcu_struct structures because auto-expedited grace periods invoke synchronize_rcu_expedited(), IPIs and all.

Note that srcu_read_lock_fast() can be invoked only from those contexts where RCU is watching, that is, from contexts where it would be legal to invoke rcu_read_lock(). Otherwise, lockdep will complain.

structsrcu_ctr__percpu*srcu_down_read_fast(structsrcu_struct*ssp)

register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to register the new reader.

Description

Enter a semaphore-like SRCU read-side critical section, but fora light-weight smp_mb()-free reader. Seesrcu_read_lock_fast() andsrcu_down_read() for more information.

The same srcu_struct may be used concurrently bysrcu_down_read_fast()andsrcu_read_lock_fast().

intsrcu_read_lock_lite(structsrcu_struct*ssp)

register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to register the new reader.

Description

Enter an SRCU read-side critical section, but for a light-weightsmp_mb()-free reader. Seesrcu_read_lock() for more information.

Ifsrcu_read_lock_lite() is ever used on an srcu_struct structure,then none of the other flavors may be used, whether before, during,or after. Note that grace-period auto-expediting is disabled for _litesrcu_struct structures because auto-expedited grace periods invokesynchronize_rcu_expedited(), IPIs and all.

Note thatsrcu_read_lock_lite() can be invoked only from those contextswhere RCU is watching, that is, from contexts where it would be legalto invokercu_read_lock(). Otherwise, lockdep will complain.

intsrcu_read_lock_nmisafe(structsrcu_struct*ssp)

register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to register the new reader.

Description

Enter an SRCU read-side critical section, but in an NMI-safe manner.Seesrcu_read_lock() for more information.

Ifsrcu_read_lock_nmisafe() is ever used on an srcu_struct structure,then none of the other flavors may be used, whether before, during,or after.

intsrcu_down_read(structsrcu_struct*ssp)

register a new reader for an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to register the new reader.

Description

Enter a semaphore-like SRCU read-side critical section. Note that SRCU read-side critical sections may be nested. However, it is illegal to call anything that waits on an SRCU grace period for the same srcu_struct, whether directly or indirectly. Please note that one way to indirectly wait on an SRCU grace period is to acquire a mutex that is held elsewhere while calling synchronize_srcu() or synchronize_srcu_expedited(). But if you want lockdep to help you keep this stuff straight, you should instead use srcu_read_lock().

The semaphore-like nature of srcu_down_read() means that the matching srcu_up_read() can be invoked from some other context, for example, from some other task or from an irq handler. However, neither srcu_down_read() nor srcu_up_read() may be invoked from an NMI handler.

Calls to srcu_down_read() may be nested, similar to the manner in which calls to down_read() may be nested. The same srcu_struct may be used concurrently by srcu_down_read() and srcu_read_lock().
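
A hedged sketch of the semaphore-like usage: the read-side critical section is entered at submission time and exited from a different context (a workqueue here). The my_request structure, its initialization, and the use of queue_work() are illustrative assumptions.

    #include <linux/srcu.h>
    #include <linux/workqueue.h>

    DEFINE_STATIC_SRCU(io_srcu);                 /* hypothetical SRCU domain */

    struct my_request {
            struct work_struct work;             /* INIT_WORK() not shown */
            int srcu_idx;                        /* carried from submitter to completion */
    };

    static void my_submit(struct my_request *req)
    {
            req->srcu_idx = srcu_down_read(&io_srcu);
            queue_work(system_wq, &req->work);   /* completion runs elsewhere */
    }

    static void my_complete(struct work_struct *work)
    {
            struct my_request *req = container_of(work, struct my_request, work);

            /* ... finish processing the request ... */
            srcu_up_read(&io_srcu, req->srcu_idx);
    }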

voidsrcu_read_unlock(structsrcu_struct*ssp,intidx)

unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to unregister the old reader.

intidx

return value from correspondingsrcu_read_lock().

Description

Exit an SRCU read-side critical section.

voidsrcu_read_unlock_fast(structsrcu_struct*ssp,structsrcu_ctr__percpu*scp)

unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to unregister the old reader.

structsrcu_ctr__percpu*scp

return value from correspondingsrcu_read_lock_fast().

Description

Exit a light-weight SRCU read-side critical section.

voidsrcu_up_read_fast(structsrcu_struct*ssp,structsrcu_ctr__percpu*scp)

unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to unregister the old reader.

structsrcu_ctr__percpu*scp

return value from correspondingsrcu_read_lock_fast().

Description

Exit an SRCU read-side critical section, but not necessarily from the same context as the matching srcu_down_read_fast().

voidsrcu_read_unlock_lite(structsrcu_struct*ssp,intidx)

unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to unregister the old reader.

intidx

return value from correspondingsrcu_read_lock_lite().

Description

Exit a light-weight SRCU read-side critical section.

voidsrcu_read_unlock_nmisafe(structsrcu_struct*ssp,intidx)

unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to unregister the old reader.

intidx

return value from correspondingsrcu_read_lock_nmisafe().

Description

Exit an SRCU read-side critical section, but in an NMI-safe manner.

voidsrcu_up_read(structsrcu_struct*ssp,intidx)

unregister an old reader from an SRCU-protected structure.

Parameters

structsrcu_struct*ssp

srcu_struct in which to unregister the old reader.

intidx

return value from correspondingsrcu_read_lock().

Description

Exit an SRCU read-side critical section, but not necessarily from the same context as the matching srcu_down_read().

voidsmp_mb__after_srcu_read_unlock(void)

ensure full ordering after srcu_read_unlock

Parameters

void

no arguments

Description

Converts the preceding srcu_read_unlock into a two-way memory barrier.

Call this after srcu_read_unlock, to guarantee that all memory operationsthat occur after smp_mb__after_srcu_read_unlock will appear to happen afterthe preceding srcu_read_unlock.

voidsmp_mb__after_srcu_read_lock(void)

ensure full ordering after srcu_read_lock

Parameters

void

no arguments

Description

Converts the preceding srcu_read_lock into a two-way memory barrier.

Call this after srcu_read_lock, to guarantee that all memory operationsthat occur after smp_mb__after_srcu_read_lock will appear to happen afterthe preceding srcu_read_lock.

intinit_srcu_struct(structsrcu_struct*ssp)

initialize a sleep-RCU structure

Parameters

structsrcu_struct*ssp

structure to initialize.

Description

Must invoke this on a given srcu_struct before passing that srcu_structto any other function. Each srcu_struct represents a separate domainof SRCU protection.

boolsrcu_readers_active(structsrcu_struct*ssp)

returns true if there are readers, and false otherwise

Parameters

structsrcu_struct*ssp

which srcu_struct to count active readers (holding srcu_read_lock).

Description

Note that this is not an atomic primitive, and can therefore suffersevere errors when invoked on an active srcu_struct. That said, itcan be useful as an error check at cleanup time.

voidcleanup_srcu_struct(structsrcu_struct*ssp)

deconstruct a sleep-RCU structure

Parameters

structsrcu_struct*ssp

structure to clean up.

Description

Must invoke this after you are finished using a given srcu_struct thatwas initialized viainit_srcu_struct(), else you leak memory.

voidcall_srcu(structsrcu_struct*ssp,structrcu_head*rhp,rcu_callback_tfunc)

Queue a callback for invocation after an SRCU grace period

Parameters

structsrcu_struct*ssp

srcu_struct on which to queue the callback

structrcu_head*rhp

structure to be used for queueing the SRCU callback.

rcu_callback_tfunc

function to be invoked after the SRCU grace period

Description

The callback function will be invoked some time after a full SRCU grace period elapses, in other words after all pre-existing SRCU read-side critical sections have completed. However, the callback function might well execute concurrently with other SRCU read-side critical sections that started after call_srcu() was invoked. SRCU read-side critical sections are delimited by srcu_read_lock() and srcu_read_unlock(), and may be nested.

The callback will be invoked from process context, but with bh disabled. The callback function must therefore be fast and must not block.

See the description ofcall_rcu() for more detailed information onmemory ordering guarantees.
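
A hedged sketch of deferred freeing via call_srcu(); the obj structure and its SRCU domain are hypothetical:

    #include <linux/srcu.h>
    #include <linux/slab.h>

    DEFINE_STATIC_SRCU(obj_srcu);                /* hypothetical SRCU domain */

    struct obj {
            struct rcu_head rh;
            int data;
    };

    static void obj_free_cb(struct rcu_head *rhp)
    {
            /* Runs in process context with bh disabled; must not block. */
            kfree(container_of(rhp, struct obj, rh));
    }

    static void obj_retire(struct obj *op)
    {
            /* op is freed only after all pre-existing obj_srcu readers finish. */
            call_srcu(&obj_srcu, &op->rh, obj_free_cb);
    }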

voidsynchronize_srcu_expedited(structsrcu_struct*ssp)

Brute-force SRCU grace period

Parameters

structsrcu_struct*ssp

srcu_struct with which to synchronize.

Description

Wait for an SRCU grace period to elapse, but be more aggressive aboutspinning rather than blocking when waiting.

Note thatsynchronize_srcu_expedited() has the same deadlock andmemory-ordering properties as doessynchronize_srcu().

voidsynchronize_srcu(structsrcu_struct*ssp)

wait for prior SRCU read-side critical-section completion

Parameters

structsrcu_struct*ssp

srcu_struct with which to synchronize.

Description

Wait for the count to drain to zero of both indexes. To avoid the possible starvation of synchronize_srcu(), it waits for the count of the index=!(ssp->srcu_ctrp - ssp->sda->srcu_ctrs[0]) to drain to zero at first, and then flips ->srcu_ctrp and waits for the count of the other index.

Can block; must be called from process context.

Note that it is illegal to call synchronize_srcu() from the corresponding SRCU read-side critical section; doing so will result in deadlock. However, it is perfectly legal to call synchronize_srcu() on one srcu_struct from some other srcu_struct's read-side critical section, as long as the resulting graph of srcu_structs is acyclic.

There are memory-ordering constraints implied by synchronize_srcu(). On systems with more than one CPU, when synchronize_srcu() returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last corresponding SRCU read-side critical section whose beginning preceded the call to synchronize_srcu(). In addition, each CPU having an SRCU read-side critical section that extends beyond the return from synchronize_srcu() is guaranteed to have executed a full memory barrier after the beginning of synchronize_srcu() and before the beginning of that SRCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.

Furthermore, if CPU A invokedsynchronize_srcu(), which returnedto its caller on CPU B, then both CPU A and CPU B are guaranteedto have executed a full memory barrier during the execution ofsynchronize_srcu(). This guarantee applies even if CPU A and CPU Bare the same CPU, but again only if the system has more than one CPU.

Of course, these memory-ordering guarantees apply only whensynchronize_srcu(),srcu_read_lock(), andsrcu_read_unlock() arepassed the same srcu_struct structure.

Implementation of these memory-ordering guarantees is similar tothat ofsynchronize_rcu().

If SRCU is likely idle as determined by srcu_should_expedite(), expedite the first request. This semantic was provided by Classic SRCU, and is relied upon by its users, so TREE SRCU must also provide it. Note that detecting idleness is heuristic and subject to both false positives and negatives.
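
A hedged update-side sketch pairing rcu_assign_pointer() with synchronize_srcu(); the cfg_srcu domain, cfg_mutex, and cur_config pointer are hypothetical:

    #include <linux/srcu.h>
    #include <linux/mutex.h>
    #include <linux/slab.h>

    struct config {
            int value;
    };

    DEFINE_STATIC_SRCU(cfg_srcu);                /* hypothetical SRCU domain */
    static DEFINE_MUTEX(cfg_mutex);              /* serializes updaters */
    static struct config __rcu *cur_config;

    static void set_config(struct config *new_cfg)
    {
            struct config *old_cfg;

            mutex_lock(&cfg_mutex);
            old_cfg = rcu_dereference_protected(cur_config,
                                                lockdep_is_held(&cfg_mutex));
            rcu_assign_pointer(cur_config, new_cfg);
            mutex_unlock(&cfg_mutex);

            /* Wait out all cfg_srcu readers that might still see old_cfg. */
            synchronize_srcu(&cfg_srcu);
            kfree(old_cfg);
    }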

unsignedlongget_state_synchronize_srcu(structsrcu_struct*ssp)

Provide an end-of-grace-period cookie

Parameters

structsrcu_struct*ssp

srcu_struct to provide cookie for.

Description

This function returns a cookie that can be passed to poll_state_synchronize_srcu(), which will return true if a full grace period has elapsed in the meantime. It is the caller's responsibility to make sure that grace period happens, for example, by invoking call_srcu() after return from get_state_synchronize_srcu().

unsignedlongstart_poll_synchronize_srcu(structsrcu_struct*ssp)

Provide cookie and start grace period

Parameters

structsrcu_struct*ssp

srcu_struct to provide cookie for.

Description

This function returns a cookie that can be passed to poll_state_synchronize_srcu(), which will return true if a full grace period has elapsed in the meantime. Unlike get_state_synchronize_srcu(), this function also ensures that any needed SRCU grace period will be started. This convenience does come at a cost in terms of CPU overhead.
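
A hedged sketch of the polling pattern: grab a cookie (starting a grace period if needed) and later check it without blocking. The poll_srcu domain and helper names are hypothetical.

    #include <linux/srcu.h>

    DEFINE_STATIC_SRCU(poll_srcu);               /* hypothetical SRCU domain */

    /* Start an SRCU grace period now; remember the cookie for later. */
    static unsigned long retire_begin(void)
    {
            return start_poll_synchronize_srcu(&poll_srcu);
    }

    /* Later: non-blocking check whether that grace period has completed. */
    static bool retire_done(unsigned long cookie)
    {
            return poll_state_synchronize_srcu(&poll_srcu, cookie);
    }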

boolpoll_state_synchronize_srcu(structsrcu_struct*ssp,unsignedlongcookie)

Has cookie’s grace period ended?

Parameters

structsrcu_struct*ssp

srcu_struct to provide cookie for.

unsignedlongcookie

Return value fromget_state_synchronize_srcu() orstart_poll_synchronize_srcu().

Description

This function takes the cookie that was returned from either get_state_synchronize_srcu() or start_poll_synchronize_srcu(), and returns true if an SRCU grace period elapsed since the time that the cookie was created.

Because cookies are finite in size, wrapping/overflow is possible. This is more pronounced on 32-bit systems, where cookies are 32 bits and wrapping could in theory happen in about 14 hours assuming 25-microsecond expedited SRCU grace periods. However, a more likely overflow lower bound is on the order of 24 days in the case of one-millisecond SRCU grace periods. Of course, wrapping in a 64-bit system requires geologic timespans, as in more than seven million years even for expedited SRCU grace periods.

Wrapping/overflow is much more of an issue for CONFIG_SMP=n systems that also have CONFIG_PREEMPTION=n, which selects Tiny SRCU. This uses a 16-bit cookie, which rcutorture routinely wraps in a matter of a few minutes. If this proves to be a problem, this counter will be expanded to the same size as for Tree SRCU.

voidsrcu_barrier(structsrcu_struct*ssp)

Wait until all in-flight call_srcu() callbacks complete.

Parameters

structsrcu_struct*ssp

srcu_struct on which to wait for in-flight callbacks.

unsignedlongsrcu_batches_completed(structsrcu_struct*ssp)

return batches completed.

Parameters

structsrcu_struct*ssp

srcu_struct on which to report batch completion.

Description

Report the number of batches, correlated with, but not necessarily precisely the same as, the number of grace periods that have elapsed.

voidhlist_bl_del_rcu(structhlist_bl_node*n)

deletes entry from hash list without re-initialization

Parameters

structhlist_bl_node*n

the element to delete from the hash list.

Note

hlist_bl_unhashed() on entry does not return true after this,the entry is in an undefined state. It is useful for RCU basedlockfree traversal.

Description

In particular, it means that we can not poison the forwardpointers that may still be used for walking the hash list.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_bl_add_head_rcu()orhlist_bl_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_bl_for_each_entry().

voidhlist_bl_add_head_rcu(structhlist_bl_node*n,structhlist_bl_head*h)

Parameters

structhlist_bl_node*n

the element to add to the hash list.

structhlist_bl_head*h

the list to add to.

Description

Adds the specified element to the specified hlist_bl,while permitting racing traversals.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_bl_add_head_rcu()orhlist_bl_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_bl_for_each_entry_rcu(), used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock().

hlist_bl_for_each_entry_rcu

hlist_bl_for_each_entry_rcu(tpos,pos,head,member)

iterate over rcu list of given type

Parameters

tpos

the type * to use as a loop cursor.

pos

thestructhlist_bl_node to use as a loop cursor.

head

the head for your list.

member

the name of the hlist_bl_node within the struct.

list_tail_rcu

list_tail_rcu(head)

returns the prev pointer of the head of the list

Parameters

head

the head of the list

Note

This should only be used with the list header, and even thenonly iflist_del() and similar primitives are not also used on thelist header.

voidlist_add_rcu(structlist_head*new,structlist_head*head)

add a new entry to rcu-protected list

Parameters

structlist_head*new

new entry to be added

structlist_head*head

list head to add it after

Description

Insert a new entry after the specified head.This is good for implementing stacks.

The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_add_rcu() or list_del_rcu(), running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu().
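
A hedged insertion sketch; the entry structure, entries list, and entries_lock spinlock are hypothetical stand-ins for the caller's own data and locking:

    #include <linux/rculist.h>
    #include <linux/spinlock.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    struct entry {
            struct list_head node;
            int key;
    };

    static LIST_HEAD(entries);                   /* hypothetical RCU-protected list */
    static DEFINE_SPINLOCK(entries_lock);        /* excludes other updaters */

    static int entry_insert(int key)
    {
            struct entry *e = kzalloc(sizeof(*e), GFP_KERNEL);

            if (!e)
                    return -ENOMEM;
            e->key = key;

            spin_lock(&entries_lock);
            list_add_rcu(&e->node, &entries);    /* readers may traverse concurrently */
            spin_unlock(&entries_lock);
            return 0;
    }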

voidlist_add_tail_rcu(structlist_head*new,structlist_head*head)

add a new entry to rcu-protected list

Parameters

structlist_head*new

new entry to be added

structlist_head*head

list head to add it before

Description

Insert a new entry before the specified head.This is useful for implementing queues.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such aslist_add_tail_rcu()orlist_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such aslist_for_each_entry_rcu().

voidlist_del_rcu(structlist_head*entry)

deletes entry from list without re-initialization

Parameters

structlist_head*entry

the element to delete from the list.

Note

list_empty() on entry does not return true after this,the entry is in an undefined state. It is useful for RCU basedlockfree traversal.

Description

In particular, it means that we can not poison the forwardpointers that may still be used for walking the list.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such aslist_del_rcu()orlist_add_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such aslist_for_each_entry_rcu().

Note that the caller is not permitted to immediately free the newly deleted entry. Instead, either synchronize_rcu() or call_rcu() must be used to defer freeing until an RCU grace period has elapsed.
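
A hedged removal sketch matching the note above: unlink under the update-side lock, then defer the free with kfree_rcu() (call_rcu() with a kfree() callback would work equally well). The names are hypothetical and continue the previous insertion sketch.

    #include <linux/rculist.h>
    #include <linux/spinlock.h>
    #include <linux/slab.h>

    struct entry {
            struct list_head node;
            struct rcu_head rh;
            int key;
    };

    static LIST_HEAD(entries);                   /* hypothetical RCU-protected list */
    static DEFINE_SPINLOCK(entries_lock);

    static void entry_remove(struct entry *e)
    {
            spin_lock(&entries_lock);
            list_del_rcu(&e->node);              /* readers may still be referencing e */
            spin_unlock(&entries_lock);

            kfree_rcu(e, rh);                    /* freed only after a grace period */
    }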

voidlist_bidir_del_rcu(structlist_head*entry)

deletes entry from list without re-initialization

Parameters

structlist_head*entry

the element to delete from the list.

Description

In contrast to list_del_rcu(), this function does not poison the prev pointer, thus allowing backwards traversal via list_bidir_prev_rcu().

The caller must take whatever precautions are necessary (such asholding appropriate locks) to avoid racing with another list-mutationprimitive, such aslist_bidir_del_rcu() orlist_add_rcu(), running onthis same list. However, it is perfectly legal to run concurrentlywith the _rcu list-traversal primitives, such aslist_for_each_entry_rcu().

Note thatlist_del_rcu() andlist_bidir_del_rcu() must not be used onthe same list.

Note that the caller is not permitted to immediately freethe newly deleted entry. Instead, eithersynchronize_rcu()orcall_rcu() must be used to defer freeing until an RCUgrace period has elapsed.

Note

list_empty() on entry does not return true after this becausethe entry is in a special undefined state that permits RCU-basedlockfree reverse traversal. In particular this means that we can notpoison the forward and backwards pointers that may still be used forwalking the list.

voidhlist_del_init_rcu(structhlist_node*n)

deletes entry from hash list with re-initialization

Parameters

structhlist_node*n

the element to delete from the hash list.

Note

list_unhashed() on the node returns true after this. It is useful for RCU based read lockfree traversal if the writer side must know if the list entry is still hashed or already unhashed.

Description

In particular, it means that we can not poison the forward pointersthat may still be used for walking the hash list and we can onlyzero the pprev pointer so list_unhashed() will return true afterthis.

The caller must take whatever precautions are necessary (such asholding appropriate locks) to avoid racing with anotherlist-mutation primitive, such ashlist_add_head_rcu() orhlist_del_rcu(), running on this same list. However, it isperfectly legal to run concurrently with the _rcu list-traversalprimitives, such ashlist_for_each_entry_rcu().

voidlist_replace_rcu(structlist_head*old,structlist_head*new)

replace old entry by new one

Parameters

structlist_head*old

the element to be replaced

structlist_head*new

the new element to insert

Description

Theold entry will be replaced with thenew entry atomically fromthe perspective of concurrent readers. It is the caller’s responsibilityto synchronize with concurrent updaters, if any.

Note

old should not be empty.

void__list_splice_init_rcu(structlist_head*list,structlist_head*prev,structlist_head*next,void(*sync)(void))

join an RCU-protected list into an existing list.

Parameters

structlist_head*list

the RCU-protected list to splice

structlist_head*prev

points to the last element of the existing list

structlist_head*next

points to the first element of the existing list

void(*sync)(void)

synchronize_rcu, synchronize_rcu_expedited, ...

Description

The list pointed to byprev andnext can be RCU-read traversedconcurrently with this function.

Note that this function blocks.

Important note: the caller must take whatever action is necessary to preventany other updates to the existing list. In principle, it is possible tomodify the list as soon as sync() begins execution. If this sort of thingbecomes necessary, an alternative version based oncall_rcu() could becreated. But only if -really- needed -- there is no shortage of RCU APImembers.

voidlist_splice_init_rcu(structlist_head*list,structlist_head*head,void(*sync)(void))

splice an RCU-protected list into an existing list, designed for stacks.

Parameters

structlist_head*list

the RCU-protected list to splice

structlist_head*head

the place in the existing list to splice the first list into

void(*sync)(void)

synchronize_rcu, synchronize_rcu_expedited, ...

voidlist_splice_tail_init_rcu(structlist_head*list,structlist_head*head,void(*sync)(void))

splice an RCU-protected list into an existing list, designed for queues.

Parameters

structlist_head*list

the RCU-protected list to splice

structlist_head*head

the place in the existing list to splice the first list into

void(*sync)(void)

synchronize_rcu, synchronize_rcu_expedited, ...

list_entry_rcu

list_entry_rcu(ptr,type,member)

get the struct for this entry

Parameters

ptr

thestructlist_head pointer.

type

the type of the struct this is embedded in.

member

the name of the list_head within the struct.

Description

This primitive may safely run concurrently with the _rcu list-mutationprimitives such aslist_add_rcu() as long as it’s guarded byrcu_read_lock().

list_first_or_null_rcu

list_first_or_null_rcu(ptr,type,member)

get the first element from a list

Parameters

ptr

the list head to take the element from.

type

the type of the struct this is embedded in.

member

the name of the list_head within the struct.

Description

Note that if the list is empty, it returns NULL.

This primitive may safely run concurrently with the _rcu list-mutationprimitives such aslist_add_rcu() as long as it’s guarded byrcu_read_lock().

list_next_or_null_rcu

list_next_or_null_rcu(head,ptr,type,member)

get the next element from a list

Parameters

head

the head for the list.

ptr

the list head to take the next element from.

type

the type of the struct this is embedded in.

member

the name of the list_head within the struct.

Description

Note that if the ptr is at the end of the list, NULL is returned.

This primitive may safely run concurrently with the _rcu list-mutationprimitives such aslist_add_rcu() as long as it’s guarded byrcu_read_lock().

list_for_each_entry_rcu

list_for_each_entry_rcu(pos,head,member,cond...)

iterate over rcu list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_head within the struct.

cond...

optional lockdep expression if called from non-RCU protection.

Description

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu() as long as the traversal is guarded by rcu_read_lock().
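
A hedged read-side traversal sketch; the entry structure and entries list are hypothetical:

    #include <linux/rcupdate.h>
    #include <linux/rculist.h>

    struct entry {
            struct list_head node;
            int key;
            int value;
    };

    static LIST_HEAD(entries);                   /* hypothetical RCU-protected list */

    static int entry_lookup_value(int key)
    {
            struct entry *e;
            int value = -1;

            rcu_read_lock();                     /* guards the traversal */
            list_for_each_entry_rcu(e, &entries, node) {
                    if (e->key == key) {
                            value = e->value;
                            break;
                    }
            }
            rcu_read_unlock();
            return value;
    }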

list_for_each_entry_srcu

list_for_each_entry_srcu(pos,head,member,cond)

iterate over rcu list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_head within the struct.

cond

lockdep expression for the lock required to traverse the list.

Description

This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such aslist_add_rcu()as long as the traversal is guarded bysrcu_read_lock().The lockdep expressionsrcu_read_lock_held() can be passed as thecond argument from read side.

list_entry_lockless

list_entry_lockless(ptr,type,member)

get the struct for this entry

Parameters

ptr

thestructlist_head pointer.

type

the type of the struct this is embedded in.

member

the name of the list_head within the struct.

Description

This primitive may safely run concurrently with the _rculist-mutation primitives such aslist_add_rcu(), but requires someimplicit RCU read-side guarding. One example is running within a specialexception-time environment where preemption is disabled and where lockdepcannot be invoked. Another example is when items are added to the list,but never deleted.

list_for_each_entry_lockless

list_for_each_entry_lockless(pos,head,member)

iterate over rcu list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_struct within the struct.

Description

This primitive may safely run concurrently with the _rculist-mutation primitives such aslist_add_rcu(), but requires someimplicit RCU read-side guarding. One example is running within a specialexception-time environment where preemption is disabled and where lockdepcannot be invoked. Another example is when items are added to the list,but never deleted.

list_for_each_entry_continue_rcu

list_for_each_entry_continue_rcu(pos,head,member)

continue iteration over list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_head within the struct.

Description

Continue to iterate over a list of the given type, continuing after the current position, which must have been in the list when the RCU read lock was taken. This would typically require either that you obtained the node from a previous walk of the list in the same RCU read-side critical section, or that you held some sort of non-RCU reference (such as a reference count) to keep the node alive and in the list.

This iterator is similar tolist_for_each_entry_from_rcu() exceptthis starts after the given position and that one starts at the givenposition.

list_for_each_entry_from_rcu

list_for_each_entry_from_rcu(pos,head,member)

iterate over a list from current point

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the list_node within the struct.

Description

Iterate over the tail of a list starting from a given position,which must have been in the list when the RCU read lock was taken.This would typically require either that you obtained the node from aprevious walk of the list in the same RCU read-side critical section, orthat you held some sort of non-RCU reference (such as a reference count)to keep the node aliveand in the list.

This iterator is similar tolist_for_each_entry_continue_rcu() exceptthis starts from the given position and that one starts from the positionafter the given position.

voidhlist_del_rcu(structhlist_node*n)

deletes entry from hash list without re-initialization

Parameters

structhlist_node*n

the element to delete from the hash list.

Note

list_unhashed() on entry does not return true after this,the entry is in an undefined state. It is useful for RCU basedlockfree traversal.

Description

In particular, it means that we can not poison the forwardpointers that may still be used for walking the hash list.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()orhlist_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry().

voidhlist_replace_rcu(structhlist_node*old,structhlist_node*new)

replace old entry by new one

Parameters

structhlist_node*old

the element to be replaced

structhlist_node*new

the new element to insert

Description

Theold entry will be replaced with thenew entry atomically fromthe perspective of concurrent readers. It is the caller’s responsibilityto synchronize with concurrent updaters, if any.

voidhlists_swap_heads_rcu(structhlist_head*left,structhlist_head*right)

swap the lists the hlist heads point to

Parameters

structhlist_head*left

The hlist head on the left

structhlist_head*right

The hlist head on the right

Description

The lists start out as

[left ][node1 ... ]
[right ][node2 ... ]

and end up as

[left ][node2 ... ]
[right ][node1 ... ]

voidhlist_add_head_rcu(structhlist_node*n,structhlist_head*h)

Parameters

structhlist_node*n

the element to add to the hash list.

structhlist_head*h

the list to add to.

Description

Adds the specified element to the specified hlist,while permitting racing traversals.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()orhlist_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry_rcu(), used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock().

voidhlist_add_tail_rcu(structhlist_node*n,structhlist_head*h)

Parameters

structhlist_node*n

the element to add to the hash list.

structhlist_head*h

the list to add to.

Description

Adds the specified element to the specified hlist,while permitting racing traversals.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()orhlist_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry_rcu(), used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock().

voidhlist_add_before_rcu(structhlist_node*n,structhlist_node*next)

Parameters

structhlist_node*n

the new element to add to the hash list.

structhlist_node*next

the existing element to add the new element before.

Description

Adds the specified element to the specified hlistbefore the specified node while permitting racing traversals.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()orhlist_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry_rcu(), used to prevent memory-consistencyproblems on Alpha CPUs.

voidhlist_add_behind_rcu(structhlist_node*n,structhlist_node*prev)

Parameters

structhlist_node*n

the new element to add to the hash list.

structhlist_node*prev

the existing element to add the new element after.

Description

Adds the specified element to the specified hlistafter the specified node while permitting racing traversals.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_add_head_rcu()orhlist_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_for_each_entry_rcu(), used to prevent memory-consistencyproblems on Alpha CPUs.

hlist_for_each_entry_rcu

hlist_for_each_entry_rcu(pos,head,member,cond...)

iterate over rcu list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the hlist_node within the struct.

cond...

optional lockdep expression if called from non-RCU protection.

Description

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().
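
A hedged hash-bucket lookup sketch; the obj structure, table size, and hashing are hypothetical, and the caller is assumed to hold rcu_read_lock() for as long as it uses the returned object:

    #include <linux/rculist.h>

    #define NBUCKETS 16                          /* hypothetical table size */

    struct obj {
            struct hlist_node hnode;
            unsigned int key;
    };

    static struct hlist_head obj_hash[NBUCKETS]; /* hypothetical hash table */

    /* Caller must hold rcu_read_lock(); the returned object is guaranteed
     * to exist only until the caller drops that lock. */
    static struct obj *obj_find(unsigned int key)
    {
            struct obj *o;

            hlist_for_each_entry_rcu(o, &obj_hash[key % NBUCKETS], hnode)
                    if (o->key == key)
                            return o;
            return NULL;
    }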

hlist_for_each_entry_srcu

hlist_for_each_entry_srcu(pos,head,member,cond)

iterate over rcu list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the hlist_node within the struct.

cond

lockdep expression for the lock required to traverse the list.

Description

This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such ashlist_add_head_rcu()as long as the traversal is guarded bysrcu_read_lock().The lockdep expressionsrcu_read_lock_held() can be passed as thecond argument from read side.

hlist_for_each_entry_rcu_notrace

hlist_for_each_entry_rcu_notrace(pos,head,member)

iterate over rcu list of given type (for tracing)

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the hlist_node within the struct.

Description

This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such ashlist_add_head_rcu()as long as the traversal is guarded byrcu_read_lock().

This is the same ashlist_for_each_entry_rcu() except that it doesnot do any RCU debugging or tracing.

hlist_for_each_entry_rcu_bh

hlist_for_each_entry_rcu_bh(pos,head,member)

iterate over rcu list of given type

Parameters

pos

the type * to use as a loop cursor.

head

the head for your list.

member

the name of the hlist_node within the struct.

Description

This list-traversal primitive may safely run concurrently withthe _rcu list-mutation primitives such ashlist_add_head_rcu()as long as the traversal is guarded byrcu_read_lock().

hlist_for_each_entry_continue_rcu

hlist_for_each_entry_continue_rcu(pos,member)

iterate over a hlist continuing after current point

Parameters

pos

the type * to use as a loop cursor.

member

the name of the hlist_node within the struct.

hlist_for_each_entry_continue_rcu_bh

hlist_for_each_entry_continue_rcu_bh(pos,member)

iterate over a hlist continuing after current point

Parameters

pos

the type * to use as a loop cursor.

member

the name of the hlist_node within the struct.

hlist_for_each_entry_from_rcu

hlist_for_each_entry_from_rcu(pos,member)

iterate over a hlist continuing from current point

Parameters

pos

the type * to use as a loop cursor.

member

the name of the hlist_node within the struct.

voidhlist_nulls_del_init_rcu(structhlist_nulls_node*n)

deletes entry from hash list with re-initialization

Parameters

structhlist_nulls_node*n

the element to delete from the hash list.

Note

hlist_nulls_unhashed() on the node returns true after this. It is useful for RCU based read lockfree traversal if the writer side must know if the list entry is still hashed or already unhashed.

Description

In particular, it means that we can not poison the forward pointersthat may still be used for walking the hash list and we can onlyzero the pprev pointer so list_unhashed() will return true afterthis.

The caller must take whatever precautions are necessary (such asholding appropriate locks) to avoid racing with anotherlist-mutation primitive, such ashlist_nulls_add_head_rcu() orhlist_nulls_del_rcu(), running on this same list. However, it isperfectly legal to run concurrently with the _rcu list-traversalprimitives, such ashlist_nulls_for_each_entry_rcu().

hlist_nulls_first_rcu

hlist_nulls_first_rcu(head)

returns the first element of the hash list.

Parameters

head

the head of the list.

hlist_nulls_next_rcu

hlist_nulls_next_rcu(node)

returns the element of the list afternode.

Parameters

node

element of the list.

voidhlist_nulls_del_rcu(structhlist_nulls_node*n)

deletes entry from hash list without re-initialization

Parameters

structhlist_nulls_node*n

the element to delete from the hash list.

Note

hlist_nulls_unhashed() on entry does not return true after this,the entry is in an undefined state. It is useful for RCU basedlockfree traversal.

Description

In particular, it means that we can not poison the forwardpointers that may still be used for walking the hash list.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_nulls_add_head_rcu()orhlist_nulls_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_nulls_for_each_entry().

voidhlist_nulls_add_head_rcu(structhlist_nulls_node*n,structhlist_nulls_head*h)

Parameters

structhlist_nulls_node*n

the element to add to the hash list.

structhlist_nulls_head*h

the list to add to.

Description

Adds the specified element to the specified hlist_nulls,while permitting racing traversals.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_nulls_add_head_rcu()orhlist_nulls_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_nulls_for_each_entry_rcu(), used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock().

voidhlist_nulls_add_tail_rcu(structhlist_nulls_node*n,structhlist_nulls_head*h)

Parameters

structhlist_nulls_node*n

the element to add to the hash list.

structhlist_nulls_head*h

the list to add to.

Description

Adds the specified element to the specified hlist_nulls,while permitting racing traversals.

The caller must take whatever precautions are necessary(such as holding appropriate locks) to avoid racingwith another list-mutation primitive, such ashlist_nulls_add_head_rcu()orhlist_nulls_del_rcu(), running on this same list.However, it is perfectly legal to run concurrently withthe _rcu list-traversal primitives, such ashlist_nulls_for_each_entry_rcu(), used to prevent memory-consistencyproblems on Alpha CPUs. Regardless of the type of CPU, thelist-traversal primitive must be guarded byrcu_read_lock().

hlist_nulls_for_each_entry_rcu

hlist_nulls_for_each_entry_rcu(tpos,pos,head,member)

iterate over rcu list of given type

Parameters

tpos

the type * to use as a loop cursor.

pos

thestructhlist_nulls_node to use as a loop cursor.

head

the head of the list.

member

the name of the hlist_nulls_node within the struct.

Description

The barrier() is needed to make sure the compiler doesn't cache the first element [1], as this loop can be restarted [2].

[1] Documentation/memory-barriers.txt, around line 1533.
[2] "Using RCU hlist_nulls to protect list and objects", around line 146.

hlist_nulls_for_each_entry_safe

hlist_nulls_for_each_entry_safe(tpos,pos,head,member)

iterate over list of given type safe against removal of list entry

Parameters

tpos

the type * to use as a loop cursor.

pos

thestructhlist_nulls_node to use as a loop cursor.

head

the head of the list.

member

the name of the hlist_nulls_node within the struct.

boolrcu_sync_is_idle(structrcu_sync*rsp)

Are readers permitted to use their fastpaths?

Parameters

structrcu_sync*rsp

Pointer to rcu_sync structure to use for synchronization

Description

Returns true if readers are permitted to use their fastpaths. Must beinvoked within some flavor of RCU read-side critical section.

voidrcu_sync_init(structrcu_sync*rsp)

Initialize an rcu_sync structure

Parameters

structrcu_sync*rsp

Pointer to rcu_sync structure to be initialized

voidrcu_sync_func(structrcu_head*rhp)

Callback function managing reader access to fastpath

Parameters

structrcu_head*rhp

Pointer to rcu_head in rcu_sync structure to use for synchronization

Description

This function is passed to call_rcu() by rcu_sync_enter() and rcu_sync_exit(), so that it is invoked after a grace period following that invocation of enter/exit.

If it is called by rcu_sync_enter(), it signals that all the readers were switched onto the slow path.

If it is called by rcu_sync_exit(), it takes action based on events that have taken place in the meantime, so that closely spaced rcu_sync_enter() and rcu_sync_exit() pairs need not wait for a grace period.

If another rcu_sync_enter() is invoked before the grace period ended, reset state to allow the next rcu_sync_exit() to let the readers back onto their fastpaths (after a grace period). If both another rcu_sync_enter() and its matching rcu_sync_exit() are invoked before the grace period ended, re-invoke call_rcu() on behalf of that rcu_sync_exit(). Otherwise, set all state back to idle so that readers can again use their fastpaths.

voidrcu_sync_enter(structrcu_sync*rsp)

Force readers onto slowpath

Parameters

structrcu_sync*rsp

Pointer to rcu_sync structure to use for synchronization

Description

This function is used by updaters who need readers to make use of a slowpath during the update. After this function returns, all subsequent calls to rcu_sync_is_idle() will return false, which tells readers to stay off their fastpaths. A later call to rcu_sync_exit() re-enables reader fastpaths.

When called in isolation, rcu_sync_enter() must wait for a grace period; however, closely spaced calls to rcu_sync_enter() can optimize away the grace-period wait via a state machine implemented by rcu_sync_enter(), rcu_sync_exit(), and rcu_sync_func().
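
A hedged sketch of the reader/updater split that rcu_sync supports; my_sync is assumed to have been initialized with rcu_sync_init(), and fast_path(), slow_path(), and do_update() are hypothetical:

    #include <linux/rcupdate.h>
    #include <linux/rcu_sync.h>

    static struct rcu_sync my_sync;              /* rcu_sync_init(&my_sync) at setup time */

    void fast_path(void);                        /* hypothetical */
    void slow_path(void);                        /* hypothetical */
    void do_update(void);                        /* hypothetical */

    static void reader(void)
    {
            rcu_read_lock();
            if (rcu_sync_is_idle(&my_sync))
                    fast_path();                 /* no updater is active */
            else
                    slow_path();                 /* an updater forced the slow path */
            rcu_read_unlock();
    }

    static void updater(void)
    {
            rcu_sync_enter(&my_sync);            /* may wait for a grace period */
            do_update();                         /* all readers now take slow_path() */
            rcu_sync_exit(&my_sync);             /* fast paths return after a grace period */
    }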

voidrcu_sync_exit(structrcu_sync*rsp)

Allow readers back onto fast path after grace period

Parameters

structrcu_sync*rsp

Pointer to rcu_sync structure to use for synchronization

Description

This function is used by updaters who have completed, and can thereforenow allow readers to make use of their fastpaths after a grace periodhas elapsed. After this grace period has completed, all subsequentcalls torcu_sync_is_idle() will return true, which tells readers thatthey can once again use their fastpaths.

voidrcu_sync_dtor(structrcu_sync*rsp)

Clean up an rcu_sync structure

Parameters

structrcu_sync*rsp

Pointer to rcu_sync structure to be cleaned up

structrcu_tasks_percpu

Per-CPU component of definition for a Tasks-RCU-like mechanism.

Definition:

struct rcu_tasks_percpu {
        struct rcu_segcblist cblist;
        raw_spinlock_t __private lock;
        unsigned long rtp_jiffies;
        unsigned long rtp_n_lock_retries;
        struct timer_list lazy_timer;
        unsigned int urgent_gp;
        struct work_struct rtp_work;
        struct irq_work rtp_irq_work;
        struct rcu_head barrier_q_head;
        struct list_head rtp_blkd_tasks;
        struct list_head rtp_exit_list;
        int cpu;
        int index;
        struct rcu_tasks *rtpp;
};

Members

cblist

Callback list.

lock

Lock protecting per-CPU callback list.

rtp_jiffies

Jiffies counter value for statistics.

rtp_n_lock_retries

Rough lock-contention statistic.

lazy_timer

Timer to unlazify callbacks.

urgent_gp

Number of additional non-lazy grace periods.

rtp_work

Work queue for invoking callbacks.

rtp_irq_work

IRQ work queue for deferred wakeups.

barrier_q_head

RCU callback for barrier operation.

rtp_blkd_tasks

List of tasks blocked as readers.

rtp_exit_list

List of tasks in the latter portion of do_exit().

cpu

CPU number corresponding to this entry.

index

Index of this CPU in rtpcp_array of the rcu_tasks structure.

rtpp

Pointer to the rcu_tasks structure.

structrcu_tasks

Definition for a Tasks-RCU-like mechanism.

Definition:

struct rcu_tasks {
        struct rcuwait cbs_wait;
        raw_spinlock_t cbs_gbl_lock;
        struct mutex tasks_gp_mutex;
        int gp_state;
        int gp_sleep;
        int init_fract;
        unsigned long gp_jiffies;
        unsigned long gp_start;
        unsigned long tasks_gp_seq;
        unsigned long n_ipis;
        unsigned long n_ipis_fails;
        struct task_struct *kthread_ptr;
        unsigned long lazy_jiffies;
        rcu_tasks_gp_func_t gp_func;
        pregp_func_t pregp_func;
        pertask_func_t pertask_func;
        postscan_func_t postscan_func;
        holdouts_func_t holdouts_func;
        postgp_func_t postgp_func;
        call_rcu_func_t call_func;
        unsigned int wait_state;
        struct rcu_tasks_percpu __percpu *rtpcpu;
        struct rcu_tasks_percpu **rtpcp_array;
        int percpu_enqueue_shift;
        int percpu_enqueue_lim;
        int percpu_dequeue_lim;
        unsigned long percpu_dequeue_gpseq;
        struct mutex barrier_q_mutex;
        atomic_t barrier_q_count;
        struct completion barrier_q_completion;
        unsigned long barrier_q_seq;
        unsigned long barrier_q_start;
        char *name;
        char *kname;
};

Members

cbs_wait

RCU wait allowing a new callback to get kthread’s attention.

cbs_gbl_lock

Lock protecting callback list.

tasks_gp_mutex

Mutex protecting grace period, needed during mid-boot dead zone.

gp_state

Grace period’s most recent state transition (debugging).

gp_sleep

Per-grace-period sleep to prevent CPU-bound looping.

init_fract

Initial backoff sleep interval.

gp_jiffies

Time of last gp_state transition.

gp_start

Most recent grace-period start in jiffies.

tasks_gp_seq

Number of grace periods completed since boot in upper bits.

n_ipis

Number of IPIs sent to encourage grace periods to end.

n_ipis_fails

Number of IPI-send failures.

kthread_ptr

This flavor’s grace-period/callback-invocation kthread.

lazy_jiffies

Number of jiffies to allow callbacks to be lazy.

gp_func

This flavor’s grace-period-wait function.

pregp_func

This flavor’s pre-grace-period function (optional).

pertask_func

This flavor’s per-task scan function (optional).

postscan_func

This flavor’s post-task scan function (optional).

holdouts_func

This flavor’s holdout-list scan function (optional).

postgp_func

This flavor’s post-grace-period function (optional).

call_func

This flavor’s call_rcu()-equivalent function.

wait_state

Task state for synchronous grace-period waits (default TASK_UNINTERRUPTIBLE).

rtpcpu

This flavor’s rcu_tasks_percpu structure.

rtpcp_array

Array of pointers to rcu_tasks_percpu structure of CPUs in cpu_possible_mask.

percpu_enqueue_shift

Shift down CPU ID this much when enqueuing callbacks.

percpu_enqueue_lim

Number of per-CPU callback queues in use for enqueuing.

percpu_dequeue_lim

Number of per-CPU callback queues in use for dequeuing.

percpu_dequeue_gpseq

RCU grace-period number to propagate enqueue limit to dequeuers.

barrier_q_mutex

Serialize barrier operations.

barrier_q_count

Number of queues being waited on.

barrier_q_completion

Barrier wait/wakeup mechanism.

barrier_q_seq

Sequence number for barrier operations.

barrier_q_start

Most recent barrier start in jiffies.

name

This flavor’s textual name.

kname

This flavor’s kthread name.

void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)

Queue an RCU callback for invocation after a task-based grace period

Parameters

struct rcu_head *rhp

structure to be used for queueing the RCU updates.

rcu_callback_t func

actual callback function to be invoked after the grace period

Description

The callback function will be invoked some time after a full grace period elapses, in other words after all currently executing RCU read-side critical sections have completed. call_rcu_tasks() assumes that the read-side critical sections end at a voluntary context switch (not a preemption!), cond_resched_tasks_rcu_qs(), entry into idle, or transition to usermode execution. As such, there are no read-side primitives analogous to rcu_read_lock() and rcu_read_unlock() because this primitive is intended to determine that all tasks have passed through a safe state, not so much for data-structure synchronization.

See the description of call_rcu() for more detailed information on memory ordering guarantees.
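
As a hedged sketch of the intended usage pattern (struct trampoline, trampoline_free_cb(), and unhook_trampoline() below are hypothetical), a tracing-style user first detaches a code trampoline and then hands its memory to call_rcu_tasks(), so the free happens only after every task has passed through a voluntary context switch or an equivalent safe state:

struct trampoline {
        struct rcu_head rh;
        /* ... code-page bookkeeping ... */
};

static void trampoline_free_cb(struct rcu_head *rhp)
{
        struct trampoline *tp = container_of(rhp, struct trampoline, rh);

        kfree(tp);      /* safe: no task can still be executing in it */
}

static void trampoline_release(struct trampoline *tp)
{
        unhook_trampoline(tp);          /* hypothetical detach step */
        call_rcu_tasks(&tp->rh, trampoline_free_cb);
}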

void synchronize_rcu_tasks(void)

wait until an rcu-tasks grace period has elapsed.

Parameters

void

no arguments

Description

Control will return to the caller some time after a full rcu-tasks grace period has elapsed, in other words after all currently executing rcu-tasks read-side critical sections have elapsed. These read-side critical sections are delimited by calls to schedule(), cond_resched_tasks_rcu_qs(), idle execution, userspace execution, calls to synchronize_rcu_tasks(), and (in theory, anyway) cond_resched().

This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks() function is not (yet) intended for heavy use from multiple CPUs.

See the description of synchronize_rcu() for more detailed information on memory ordering guarantees.
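
A minimal sketch of the synchronous form of the same pattern, assuming a sleepable context (the trampoline pointer and unhook_trampoline() helper are hypothetical):

static void trampoline_release_sync(struct trampoline *tp)
{
        unhook_trampoline(tp);          /* no task can newly enter the code */
        synchronize_rcu_tasks();        /* wait for all tasks to reach a safe state */
        kfree(tp);                      /* now guaranteed to be unused */
}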

void rcu_barrier_tasks(void)

Wait for in-flight call_rcu_tasks() callbacks.

Parameters

void

no arguments

Description

Although the current implementation is guaranteed to wait, it is not obligated to do so if, for example, there are no pending callbacks.
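
A hedged teardown sketch (my_tracer_exit() and remove_all_hooks() are hypothetical): before the callback functions or their data can disappear, stop queueing new callbacks and then wait for any already-queued ones to finish.

static void my_tracer_exit(void)
{
        remove_all_hooks();     /* hypothetical: no further call_rcu_tasks() callbacks */
        rcu_barrier_tasks();    /* wait for in-flight callbacks to be invoked */
}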

void synchronize_rcu_tasks_rude(void)

wait for a rude rcu-tasks grace period

Parameters

void

no arguments

Description

Control will return to the caller some time after a rude rcu-tasks grace period has elapsed, in other words after all currently executing rcu-tasks read-side critical sections have elapsed. These read-side critical sections are delimited by calls to schedule(), cond_resched_tasks_rcu_qs(), userspace execution (which is a schedulable context), and (in theory, anyway) cond_resched().

This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks_rude() function is not (yet) intended for heavy use from multiple CPUs.

See the description of synchronize_rcu() for more detailed information on memory ordering guarantees.

void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func)

Queue a callback for invocation after a trace task-based grace period

Parameters

struct rcu_head *rhp

structure to be used for queueing the RCU updates.

rcu_callback_t func

actual callback function to be invoked after the grace period

Description

The callback function will be invoked some time after a trace rcu-tasks grace period elapses, in other words after all currently executing trace rcu-tasks read-side critical sections have completed. These read-side critical sections are delimited by calls to rcu_read_lock_trace() and rcu_read_unlock_trace().

See the description of call_rcu() for more detailed information on memory ordering guarantees.
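
A minimal sketch pairing the explicit read-side markers with deferred reclamation (struct foo, read_foo(), and retire_foo() are hypothetical; publication and lookup of the object are omitted):

struct foo {
        struct rcu_head rh;
        int data;
};

static void free_foo_cb(struct rcu_head *rhp)
{
        kfree(container_of(rhp, struct foo, rh));
}

/* Reader: explicit, bounded critical section. */
static int read_foo(struct foo *p)
{
        int val;

        rcu_read_lock_trace();
        val = p->data;
        rcu_read_unlock_trace();
        return val;
}

/* Updater: once the object is no longer reachable by new readers,
 * defer the free past a trace rcu-tasks grace period. */
static void retire_foo(struct foo *p)
{
        call_rcu_tasks_trace(&p->rh, free_foo_cb);
}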

void synchronize_rcu_tasks_trace(void)

wait for a trace rcu-tasks grace period

Parameters

void

no arguments

Description

Control will return to the caller some time after a trace rcu-tasks grace period has elapsed, in other words after all currently executing trace rcu-tasks read-side critical sections have elapsed. These read-side critical sections are delimited by calls to rcu_read_lock_trace() and rcu_read_unlock_trace().

This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks_trace() function is not (yet) intended for heavy use from multiple CPUs.

See the description of synchronize_rcu() for more detailed information on memory ordering guarantees.

void rcu_barrier_tasks_trace(void)

Wait for in-flight call_rcu_tasks_trace() callbacks.

Parameters

void

no arguments

Description

Although the current implementation is guaranteed to wait, it is not obligated to do so if, for example, there are no pending callbacks.

void rcu_cpu_stall_reset(void)

restart stall-warning timeout for current grace period

Parameters

void

no arguments

Description

To perform the reset request from the caller, disable stall detection until 3 fqs loops have passed. This is required to ensure a fresh jiffies value is loaded. It should be safe to do from the fqs loop as enough timer interrupts and context switches should have passed.

The caller must disable hard irqs.

int rcu_stall_chain_notifier_register(struct notifier_block *n)

Add an RCU CPU stall notifier

Parameters

struct notifier_block *n

Entry to add.

Description

Adds an RCU CPU stall notifier to an atomic notifier chain. The action passed to a notifier will be RCU_STALL_NOTIFY_NORM or friends. The data will be the duration of the stalled grace period, in jiffies, coerced to a void* pointer.

Returns 0 on success, -EEXIST on error.
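
A hedged sketch of a notifier (the my_stall_notify() function and my_stall_nb block are hypothetical): the action argument is RCU_STALL_NOTIFY_NORM or a related value, and data carries the stall duration in jiffies.

static int my_stall_notify(struct notifier_block *nb, unsigned long action,
                           void *data)
{
        unsigned long duration = (unsigned long)data;

        pr_info("RCU stall notifier: action=%lu, duration=%lu jiffies\n",
                action, duration);
        return NOTIFY_OK;
}

static struct notifier_block my_stall_nb = {
        .notifier_call = my_stall_notify,
};

/* At init time:  rcu_stall_chain_notifier_register(&my_stall_nb);   */
/* At exit time:  rcu_stall_chain_notifier_unregister(&my_stall_nb); */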

int rcu_stall_chain_notifier_unregister(struct notifier_block *n)

Remove an RCU CPU stall notifier

Parameters

struct notifier_block *n

Entry to remove.

Description

Removes an RCU CPU stall notifier from an atomic notifier chain.

Returns zero on success, -ENOENT on failure.

void rcu_read_lock_trace(void)

mark beginning of RCU-trace read-side critical section

Parameters

void

no arguments

Description

When synchronize_rcu_tasks_trace() is invoked by one task, that task is guaranteed to block until all other tasks exit their read-side critical sections. Similarly, if call_rcu_tasks_trace() is invoked on one task while other tasks are within RCU read-side critical sections, invocation of the corresponding RCU callback is deferred until after all the other tasks exit their critical sections.

For more details, please see the documentation for rcu_read_lock().

void rcu_read_unlock_trace(void)

mark end of RCU-trace read-side critical section

Parameters

void

no arguments

Description

Pairs with a preceding call to rcu_read_lock_trace(), and nesting is allowed. Invoking rcu_read_unlock_trace() when there is no matching rcu_read_lock_trace() is verboten, and will result in lockdep complaints.

For more details, please see the documentation for rcu_read_unlock().
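
A minimal sketch of the pairing and nesting rules (nested_reader() is hypothetical):

static void nested_reader(void)
{
        rcu_read_lock_trace();          /* outer critical section */
        rcu_read_lock_trace();          /* nesting is allowed */
        /* ... read-side accesses ... */
        rcu_read_unlock_trace();        /* pairs with the inner lock */
        rcu_read_unlock_trace();        /* pairs with the outer lock */
}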

synchronize_rcu_mult

synchronize_rcu_mult(...)

Wait concurrently for multiple grace periods

Parameters

...

List of call_rcu() functions for different grace periods to wait on

Description

This macro waits concurrently for multiple types of RCU grace periods. For example, synchronize_rcu_mult(call_rcu, call_rcu_tasks) would wait on concurrent RCU and RCU-tasks grace periods. Waiting on a given SRCU domain requires you to write a wrapper function for that SRCU domain’s call_srcu() function, with this wrapper supplying the pointer to the corresponding srcu_struct.

Note that call_rcu_hurry() should be used instead of call_rcu() because in kernels built with CONFIG_RCU_LAZY=y the delay between the invocation of call_rcu() and that of the corresponding RCU callback can be multiple seconds.

The first argument tells Tiny RCU’s _wait_rcu_gp() not to bother waiting for RCU. The reason for this is that anywhere synchronize_rcu_mult() can be called is automatically already a full grace period.
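
A hedged sketch of waiting on RCU-tasks and an SRCU domain at the same time (my_srcu, call_my_srcu(), and wait_for_everything() are hypothetical); the wrapper exists only to supply the srcu_struct pointer to call_srcu():

DEFINE_SRCU(my_srcu);

static void call_my_srcu(struct rcu_head *head, rcu_callback_t func)
{
        call_srcu(&my_srcu, head, func);
}

static void wait_for_everything(void)
{
        synchronize_rcu_mult(call_rcu_tasks, call_my_srcu);
}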

void rcuref_init(rcuref_t *ref, unsigned int cnt)

Initialize a rcuref reference count with the given reference count

Parameters

rcuref_t *ref

Pointer to the reference count

unsigned int cnt

The initial reference count, typically 1

unsigned int rcuref_read(rcuref_t *ref)

Read the number of held reference counts of a rcuref

Parameters

rcuref_t *ref

Pointer to the reference count

Return

The number of held references (0 ... N). The value 0 does not indicate that it is safe to schedule the object, protected by this reference counter, for deconstruction. If you want to know whether the reference counter has been marked DEAD (as signaled by rcuref_put()), please use rcuref_is_dead().

bool rcuref_is_dead(rcuref_t *ref)

Check if the rcuref has been already marked dead

Parameters

rcuref_t *ref

Pointer to the reference count

Return

True if the object has been marked DEAD. This signals that a previous invocation of rcuref_put() returned true on this reference counter, meaning the protected object can safely be scheduled for deconstruction. Otherwise, returns false.

bool rcuref_get(rcuref_t *ref)

Acquire one reference on a rcuref reference count

Parameters

rcuref_t *ref

Pointer to the reference count

Description

Similar to atomic_inc_not_zero() but saturates at RCUREF_MAXREF.

Provides no memory ordering; it is assumed the caller has guaranteed the object memory to be stable (RCU, etc.). It does provide a control dependency and thereby orders future stores. See the documentation in lib/rcuref.c.

Return

False if the attempt to acquire a reference failed. This happens when the last reference has been put already.

True if a reference was successfully acquired

bool rcuref_put_rcusafe(rcuref_t *ref)

Release one reference for a rcuref reference count, RCU safe

Parameters

rcuref_t *ref

Pointer to the reference count

Description

Provides release memory ordering, such that prior loads and stores are done before, and provides an acquire ordering on success such that free() must come after.

Can be invoked from contexts which guarantee that no grace period can happen that would free the object concurrently if the decrement drops the last reference and the slowpath races against a concurrent get() and put() pair. rcu_read_lock()’ed and atomic contexts qualify.

Return

True if this was the last reference with no future references possible. This signals the caller that it can safely release the object which is protected by the reference counter.

False if there are still active references or the put() raced with a concurrent get()/put() pair. Caller is not allowed to release the protected object.

bool rcuref_put(rcuref_t *ref)

Release one reference for a rcuref reference count

Parameters

rcuref_t *ref

Pointer to the reference count

Description

Can be invoked from any context.

Provides release memory ordering, such that prior loads and stores are done before, and provides an acquire ordering on success such that free() must come after.

Return

True if this was the last reference with no future references possible. This signals the caller that it can safely schedule the object, which is protected by the reference counter, for deconstruction.

False if there are still active references or the put() raced with a concurrent get()/put() pair. Caller is not allowed to deconstruct the protected object.
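
A hedged sketch of the rcuref lifecycle (struct gadget, gadget_get(), and gadget_put() are hypothetical; the counter is assumed to have been set up with rcuref_init(&g->ref, 1) at allocation time):

struct gadget {
        rcuref_t ref;
        struct rcu_head rh;
        /* ... payload ... */
};

/* Called with the object's memory kept stable, e.g. under rcu_read_lock(). */
static struct gadget *gadget_get(struct gadget *g)
{
        if (!rcuref_get(&g->ref))
                return NULL;    /* counter already marked DEAD */
        return g;
}

static void gadget_put(struct gadget *g)
{
        if (rcuref_put(&g->ref))
                kfree_rcu(g, rh);       /* last reference: schedule destruction */
}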

bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1, struct rcu_gp_oldstate *rgosp2)

Are two old-state values identical?

Parameters

struct rcu_gp_oldstate *rgosp1

First old-state value.

struct rcu_gp_oldstate *rgosp2

Second old-state value.

Description

The two old-state values must have been obtained from either get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or get_completed_synchronize_rcu_full(). Returns true if the two values are identical and false otherwise. This allows structures whose lifetimes are tracked by old-state values to push these values to a list header, allowing those structures to be slightly smaller.

Note that equality is judged on a bitwise basis, so that an rcu_gp_oldstate structure with an already-completed state in one field will compare not-equal to a structure with an already-completed state in the other field. After all, the rcu_gp_oldstate structure is opaque, so how did such a situation come to pass in the first place?
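
A minimal sketch of comparing two cookies (dedup_cookies() is hypothetical):

static void dedup_cookies(void)
{
        struct rcu_gp_oldstate gos1, gos2;

        get_state_synchronize_rcu_full(&gos1);
        get_state_synchronize_rcu_full(&gos2);

        /* Closely spaced snapshots often capture identical grace-period
         * state; a structure tracking both could then store just one. */
        if (same_state_synchronize_rcu_full(&gos1, &gos2))
                pr_debug("old-state cookies are interchangeable\n");
}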