Driver Basics

Driver Entry and Exit points

module_init(x): driver initialization entry point

Parameters

x: function to be run at kernel boot time or module insertion

Description

module_init() will either be called during do_initcalls() (if builtin) or at module insertion time (if a module). There can only be one per module.

module_exit(x): driver exit entry point

Parameters

x: function to be run when driver is removed

Description

module_exit() will wrap the driver clean-up code with cleanup_module() when used with rmmod when the driver is a module. If the driver is statically compiled into the kernel, module_exit() has no effect. There can only be one per module.

Driver device table

struct pci_device_id: PCI device ID structure

Definition

struct pci_device_id {
    __u32 vendor, device;
    __u32 subvendor, subdevice;
    __u32 class, class_mask;
    kernel_ulong_t driver_data;
};

Members

vendor: Vendor ID to match (or PCI_ANY_ID)
device: Device ID to match (or PCI_ANY_ID)
subvendor: Subsystem vendor ID to match (or PCI_ANY_ID)
subdevice: Subsystem device ID to match (or PCI_ANY_ID)
class: Device class, subclass, and “interface” to match. See Appendix D of the PCI Local Bus Spec or include/linux/pci_ids.h for a full list of classes. Most drivers do not need to specify class/class_mask as vendor/device is normally sufficient.
class_mask: Limit which sub-fields of the class field are compared. See drivers/scsi/sym53c8xx_2/ for example of usage.
driver_data: Data private to the driver. Most drivers don’t need to use the driver_data field. Best practice is to use driver_data as an index into a static list of equivalent device types, instead of using it as a pointer.
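The wildcard and mask rules above can be modeled in plain user-space C. This is a sketch under stated assumptions, not the kernel's implementation: `struct pci_id_model`, `struct pci_dev_model`, `pci_id_matches()`, `pci_find_id()`, `ANY_ID` and the `board_*` names are hypothetical stand-ins, the class field is renamed `class_code`, and the class comparison is assumed to look only at the bits selected by class_mask. It also shows driver_data used as an index into a static table rather than as a pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu /* stand-in for PCI_ANY_ID (assumed all-ones) */

/* Hypothetical user-space mirror of struct pci_device_id. */
struct pci_id_model {
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    uint32_t class_code, class_mask;
    unsigned long driver_data;
};

/* Hypothetical description of one probed device. */
struct pci_dev_model {
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    uint32_t class_code;
};

/* A table field matches if it is the wildcard or equals the device's value. */
static bool field_matches(uint32_t want, uint32_t have)
{
    return want == ANY_ID || want == have;
}

/* True if the table entry accepts the device. */
static bool pci_id_matches(const struct pci_id_model *id,
                           const struct pci_dev_model *dev)
{
    return field_matches(id->vendor, dev->vendor) &&
           field_matches(id->device, dev->device) &&
           field_matches(id->subvendor, dev->subvendor) &&
           field_matches(id->subdevice, dev->subdevice) &&
           ((id->class_code ^ dev->class_code) & id->class_mask) == 0;
}

/* driver_data is an index into this static list, not a pointer. */
enum board_type { BOARD_A, BOARD_B };
static const char *const board_names[] = {
    [BOARD_A] = "board-a",
    [BOARD_B] = "board-b",
};

/* Return the first matching entry; an all-zero entry ends the table. */
static const struct pci_id_model *
pci_find_id(const struct pci_id_model *table, const struct pci_dev_model *dev)
{
    for (; table->vendor || table->subvendor || table->class_mask; table++)
        if (pci_id_matches(table, dev))
            return table;
    return NULL;
}
```

With a table holding one exact vendor/device entry followed by a class-based entry, a lookup returns the entry whose driver_data then selects the per-board data through `board_names[]`.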

struct usb_device_id: identifies USB devices for probing and hotplugging

Definition

struct usb_device_id {
    __u16 match_flags;
    __u16 idVendor;
    __u16 idProduct;
    __u16 bcdDevice_lo;
    __u16 bcdDevice_hi;
    __u8 bDeviceClass;
    __u8 bDeviceSubClass;
    __u8 bDeviceProtocol;
    __u8 bInterfaceClass;
    __u8 bInterfaceSubClass;
    __u8 bInterfaceProtocol;
    __u8 bInterfaceNumber;
    kernel_ulong_t driver_info;
};

Members

match_flags: Bit mask controlling which of the other fields are used to match against new devices. Any field except for driver_info may be used, although some only make sense in conjunction with other fields. This is usually set by a USB_DEVICE_*() macro, which sets all other fields in this structure except for driver_info.
idVendor: USB vendor ID for a device; numbers are assigned by the USB forum to its members.
idProduct: Vendor-assigned product ID.
bcdDevice_lo: Low end of range of vendor-assigned product version numbers. This is also used to identify individual product versions, for a range consisting of a single device.
bcdDevice_hi: High end of version number range. The range of product versions is inclusive.
bDeviceClass: Class of device; numbers are assigned by the USB forum. Products may choose to implement classes, or be vendor-specific. Device classes specify behavior of all the interfaces on a device.
bDeviceSubClass: Subclass of device; associated with bDeviceClass.
bDeviceProtocol: Protocol of device; associated with bDeviceClass.
bInterfaceClass: Class of interface; numbers are assigned by the USB forum. Products may choose to implement classes, or be vendor-specific. Interface classes specify behavior only of a given interface; other interfaces may support other classes.
bInterfaceSubClass: Subclass of interface; associated with bInterfaceClass.
bInterfaceProtocol: Protocol of interface; associated with bInterfaceClass.
bInterfaceNumber: Number of interface; composite devices may use fixed interface numbers to differentiate between vendor-specific interfaces.
driver_info: Holds information used by the driver. Usually it holds a pointer to a descriptor understood by the driver, or perhaps device flags.

Description

In most cases, drivers will create a table of device IDs by using USB_DEVICE(), or similar macros designed for that purpose. They will then export it to userspace using MODULE_DEVICE_TABLE(), and provide it to the USB core through their usb_driver structure.

See the usb_match_id() function for information about how matches are performed. Briefly, you will normally use one of several macros to help construct these entries. Each entry you provide will either identify one or more specific products, or will identify a class of products which have agreed to behave the same. You should put the more specific matches towards the beginning of your table, so that driver_info can record quirks of specific products.
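The advice about table ordering follows from a first-match scan, which a small user-space model can illustrate. Everything here is hypothetical: the `MATCH_*` bits are not the kernel's USB_DEVICE_ID_MATCH_* values, `struct usb_id_model` keeps only a few of the real fields, and the end-of-table test is this sketch's own convention rather than the one usb_match_id() uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits chosen for this sketch only. */
#define MATCH_VENDOR    0x1u
#define MATCH_PRODUCT   0x2u
#define MATCH_INT_CLASS 0x4u

/* Reduced, hypothetical mirror of struct usb_device_id. */
struct usb_id_model {
    uint16_t match_flags;
    uint16_t idVendor;
    uint16_t idProduct;
    uint8_t  bInterfaceClass;
    unsigned long driver_info;
};

/* What the bus reports for one interface of one device. */
struct usb_if_model {
    uint16_t idVendor;
    uint16_t idProduct;
    uint8_t  bInterfaceClass;
};

/* Only the fields selected by match_flags take part in the comparison. */
static int usb_id_matches(const struct usb_id_model *id,
                          const struct usb_if_model *intf)
{
    if ((id->match_flags & MATCH_VENDOR) && id->idVendor != intf->idVendor)
        return 0;
    if ((id->match_flags & MATCH_PRODUCT) && id->idProduct != intf->idProduct)
        return 0;
    if ((id->match_flags & MATCH_INT_CLASS) &&
        id->bInterfaceClass != intf->bInterfaceClass)
        return 0;
    return 1;
}

/* First match wins; an entry with no flags and no driver_info ends the table. */
static const struct usb_id_model *
usb_find_id(const struct usb_id_model *table, const struct usb_if_model *intf)
{
    for (; table->match_flags || table->driver_info; table++)
        if (usb_id_matches(table, intf))
            return table;
    return NULL;
}
```

If a vendor/product entry carrying a quirk value in driver_info precedes a generic interface-class entry, a quirky product that also satisfies the class entry still picks up its quirk. With the order reversed, the generic entry would win and the quirk would be lost.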

struct mdio_device_id: identifies PHY devices on an MDIO/MII bus

Definition

struct mdio_device_id {
    __u32 phy_id;
    __u32 phy_id_mask;
};

Members

phy_id: The result of (mdio_read(MII_PHYSID1) << 16 | mdio_read(MII_PHYSID2)) & phy_id_mask for this PHY type
phy_id_mask: Defines the significant bits of phy_id. A value of 0 is used to terminate an array of struct mdio_device_id.
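As a rough illustration of the two members, the following user-space sketch combines two 16-bit register values into a 32-bit ID and scans a table until the phy_id_mask == 0 terminator. `struct mdio_id_model`, `phy_id_from_regs()` and `mdio_find_id()` are hypothetical names, and the register values are passed in as plain arguments rather than read from a bus.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of struct mdio_device_id. */
struct mdio_id_model {
    uint32_t phy_id;
    uint32_t phy_id_mask;
};

/* Combine the two ID registers: MII_PHYSID1 is the high half, MII_PHYSID2 the low half. */
static uint32_t phy_id_from_regs(uint16_t physid1, uint16_t physid2)
{
    return ((uint32_t)physid1 << 16) | physid2;
}

/* Walk the table until the phy_id_mask == 0 terminator; compare masked bits only. */
static const struct mdio_id_model *
mdio_find_id(const struct mdio_id_model *table, uint32_t hw_id)
{
    for (; table->phy_id_mask; table++)
        if ((hw_id & table->phy_id_mask) ==
            (table->phy_id & table->phy_id_mask))
            return table;
    return NULL;
}
```

A mask such as 0xfffffff0 ignores the lowest four bits, so one entry can cover several revisions of the same PHY type.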

struct amba_id: identifies a device on an AMBA bus

Definition

struct amba_id {
    unsigned int id;
    unsigned int mask;
    void *data;
};

Members

id: The significant bits of the hardware device ID
mask: Bitmask specifying which bits of the id field are significant when matching. A driver binds to a device when ((hardware device ID) & mask) == id.
data: Private data used by the driver.

struct mips_cdmm_device_id: identifies devices in MIPS CDMM bus

Definition

struct mips_cdmm_device_id {
    __u8 type;
};

Members

type: Device type identifier.

struct mei_cl_device_id: MEI client device identifier

Definition

struct mei_cl_device_id {
    char name[MEI_CL_NAME_SIZE];
    uuid_le uuid;
    __u8 version;
    kernel_ulong_t driver_info;
};

Members

name: helper name
uuid: client uuid
version: client protocol version
driver_info: information used by the driver.

Description

Identifies an MEI client device by UUID and name.

struct rio_device_id: RIO device identifier

Definition

struct rio_device_id {
    __u16 did, vid;
    __u16 asm_did, asm_vid;
};

Members

did: RapidIO device ID
vid: RapidIO vendor ID
asm_did: RapidIO assembly device ID
asm_vid: RapidIO assembly vendor ID

Description

Identifies a RapidIO device based on both the device/vendor IDs and the assembly device/vendor IDs.

struct fsl_mc_device_id: MC object device identifier

Definition

struct fsl_mc_device_id {
    __u16 vendor;
    const char obj_type[16];
};

Members

vendor: vendor ID
obj_type: MC object type

Description

Type of entries in the “device Id” table for MC object devices supported by an MC object device driver. The last entry of the table has vendor set to 0x0.

struct tb_service_id: Thunderbolt service identifiers

Definition

struct tb_service_id {
    __u32 match_flags;
    char protocol_key[8 + 1];
    __u32 protocol_id;
    __u32 protocol_version;
    __u32 protocol_revision;
    kernel_ulong_t driver_data;
};

Members

match_flags: Flags used to match the structure
protocol_key: Protocol key the service supports
protocol_id: Protocol id the service supports
protocol_version: Version of the protocol
protocol_revision: Revision of the protocol software
driver_data: Driver specific data

Description

Thunderbolt XDomain services are exposed as devices where each device carries the protocol information the service supports. Thunderbolt XDomain service drivers match against that information.

struct typec_device_id: USB Type-C alternate mode identifiers

Definition

struct typec_device_id {
    __u16 svid;
    __u8 mode;
    kernel_ulong_t driver_data;
};

Members

svid: Standard or Vendor ID
mode: Mode index
driver_data: Driver specific data

struct tee_client_device_id: TEE based device identifier

Definition

struct tee_client_device_id {
    uuid_t uuid;
};

Members

uuid: For TEE based client devices we use the device uuid as the identifier.

struct wmi_device_id: WMI device identifier

Definition

struct wmi_device_id {
    const char guid_string[UUID_STRING_LEN+1];
    const void *context;
};

Members

guid_string: 36 char string of the form fa50ff2b-f2e8-45de-83fa-65417f2f49ba
context: pointer to driver specific data

struct mhi_device_id: MHI device identification

Definition

struct mhi_device_id {
    const char chan[MHI_NAME_SIZE];
    kernel_ulong_t driver_data;
};

Members

chan: MHI channel name
driver_data: driver data

Delaying, scheduling, and timer routines

struct prev_cputime: snapshot of system and user cputime

Definition

struct prev_cputime {
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
    u64 utime;
    u64 stime;
    raw_spinlock_t lock;
#endif
};

Members

utime: time spent in user mode
stime: time spent in system mode
lock: protects the above two fields

Description

Stores previous user/system time values such that we can guarantee monotonicity.

struct util_est: Estimation utilization of FAIR tasks

Definition

struct util_est {
    unsigned int enqueued;
    unsigned int ewma;
#define UTIL_EST_WEIGHT_SHIFT 2
};

Members

enqueued: instantaneous estimated utilization of a task/cpu
ewma: the Exponential Weighted Moving Average (EWMA) utilization of a task

Description

Support data structure to track an Exponential Weighted Moving Average (EWMA) of a FAIR task’s utilization. New samples are added to the moving average each time a task completes an activation. A sample’s weight is chosen so that the EWMA will be relatively insensitive to transient changes to the task’s workload.

The enqueued attribute has a slightly different meaning for tasks and CPUs:

- task: the task’s util_avg at last task dequeue time
- cfs_rq: the sum of util_est.enqueued for each RUNNABLE task on that CPU

Thus, the util_est.enqueued of a task represents the contribution on the estimated utilization of the CPU where that task is currently enqueued.

Only for tasks we track a moving average of the past instantaneous estimated utilization. This makes it possible to absorb sporadic drops in utilization of an otherwise almost periodic task.
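To make the moving average concrete, here is a minimal user-space sketch of one EWMA step with a new-sample weight of 1/4, which is what a weight shift of 2 implies. `struct util_est_model` and `util_est_update()` are hypothetical names; the scheduler's real update path applies further conditions that are not modeled here.

```c
#define UTIL_EST_WEIGHT_SHIFT 2 /* new-sample weight w = 1 / 2^2 = 1/4 */

/* Hypothetical user-space mirror of struct util_est. */
struct util_est_model {
    unsigned int enqueued;
    unsigned int ewma;
};

/*
 * Fold one sample, taken when the task completes an activation, into the
 * moving average:
 *   ewma(t) = w * sample + (1 - w) * ewma(t-1),  w = 1 / 2^SHIFT
 * evaluated with shifts as (ewma * 2^SHIFT + (sample - ewma)) / 2^SHIFT.
 */
static void util_est_update(struct util_est_model *ue, unsigned int sample)
{
    long diff = (long)sample - (long)ue->ewma;
    long scaled = ((long)ue->ewma << UTIL_EST_WEIGHT_SHIFT) + diff;

    ue->enqueued = sample;  /* instantaneous value */
    ue->ewma = (unsigned int)(scaled >> UTIL_EST_WEIGHT_SHIFT);
}
```

Because each sample contributes only a quarter of its value, a single activation with unusually low utilization pulls the average down by at most 25%, which is the "absorb sporadic drops" behavior described above.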

int pid_alive(const struct task_struct *p): check that a task structure is not stale

Parameters

const struct task_struct *p: Task structure to be checked.

Description

Test if a process is not yet dead (at most zombie state). If pid_alive fails, then pointers within the task structure can be stale and must not be dereferenced.

Return

1 if the process is alive. 0 otherwise.

int is_global_init(struct task_struct *tsk): check if a task structure is init. Since init is free to have sub-threads, we need to check tgid.

Parameters

struct task_struct *tsk: Task structure to be checked.

Description

Check if a task structure is the first user space task the kernel created.

Return

1 if the task structure is init. 0 otherwise.

int task_nice(const struct task_struct *p): return the nice value of a given task.

Parameters

const struct task_struct *p: the task in question.

Return

The nice value [ -20 … 0 … 19 ].

bool is_idle_task(const struct task_struct *p): is the specified task an idle task?

Parameters

const struct task_struct *p: the task in question.

Return

1 if p is an idle task. 0 otherwise.

int wake_up_process(struct task_struct *p): Wake up a specific process

Parameters

struct task_struct *p: The process to be woken up.

Description

Attempt to wake up the nominated process and move it to the set of runnable processes.

This function executes a full memory barrier before accessing the task state.

Return

1 if the process was woken up, 0 if it was already running.

void preempt_notifier_register(struct preempt_notifier *notifier): tell me when current is being preempted & rescheduled

Parameters

struct preempt_notifier *notifier: notifier struct to register

void preempt_notifier_unregister(struct preempt_notifier *notifier): no longer interested in preemption notifications

Parameters

struct preempt_notifier *notifier: notifier struct to unregister

Description

This is not safe to call from within a preemption notifier.

__visible void notrace preempt_schedule_notrace(void): preempt_schedule called by tracing

Parameters

void: no arguments

Description

The tracing infrastructure uses preempt_enable_notrace to prevent recursion and tracing preempt enabling caused by the tracing infrastructure itself. But as tracing can happen in areas coming from userspace or just about to enter userspace, a preempt enable can occur before user_exit() is called. This will cause the scheduler to be called when the system is still in usermode.

To prevent this, the preempt_enable_notrace will use this function instead of preempt_schedule() to exit user context if needed before calling the scheduler.

int sched_setscheduler(struct task_struct *p, int policy, const struct sched_param *param): change the scheduling policy and/or RT priority of a thread.

Parameters

struct task_struct *p: the task in question.
int policy: new policy.
const struct sched_param *param: structure containing the new RT priority.

Return

0 on success. An error code otherwise.

Description

NOTE that the task may be already dead.

int sched_setscheduler_nocheck(struct task_struct *p, int policy, const struct sched_param *param): change the scheduling policy and/or RT priority of a thread from kernelspace.

Parameters

struct task_struct *p: the task in question.
int policy: new policy.
const struct sched_param *param: structure containing the new RT priority.

Description

Just like sched_setscheduler, only don’t bother checking if the current context has permission. For example, this is needed in stop_machine(): we create temporary high priority worker threads, but our caller might not have that capability.

Return

0 on success. An error code otherwise.

void yield(void): yield the current processor to other threads.

Parameters

void: no arguments

Description

Do not ever use this function; there’s a 99% chance you’re doing it wrong.

The scheduler is at all times free to pick the calling task as the most eligible task to run. If removing the yield() call from your code breaks it, it’s already broken.

Typical broken usage is:

    while (!event)
        yield();

where one assumes that yield() will let ‘the other’ process run that will make event true. If the current task is a SCHED_FIFO task that will never happen. Never use yield() as a progress guarantee!!

If you want to use yield() to wait for something, use wait_event(). If you want to use yield() to be ‘nice’ for others, use cond_resched(). If you still want to use yield(), do not!

int yield_to(struct task_struct *p, bool preempt): yield the current processor to another thread in your thread group, or accelerate that thread toward the processor it’s on.

Parameters

struct task_struct *p: target task
bool preempt: whether task preemption is allowed or not

Description

It’s the caller’s job to ensure that the target task struct can’t go away on us before we can do any checks.

Return

true (>0) if we indeed boosted the target task.
false (0) if we failed to boost the target.
-ESRCH if there’s no task to yield to.

int cpupri_find_fitness(struct cpupri *cp, struct task_struct *p, struct cpumask *lowest_mask, bool (*fitness_fn)(struct task_struct *p, int cpu)): find the best (lowest-pri) CPU in the system

Parameters

struct cpupri *cp: The cpupri context
struct task_struct *p: The task
struct cpumask *lowest_mask: A mask to fill in with selected CPUs (or NULL)
bool (*fitness_fn)(struct task_struct *p, int cpu): A pointer to a function to do custom checks whether the CPU fits a specific criteria so that we only return those CPUs.

Note

This function returns the recommended CPUs as calculated during the current invocation. By the time the call returns, the CPUs may have in fact changed priorities any number of times. While not ideal, it is not an issue of correctness since the normal rebalancer logic will correct any discrepancies created by racing against the uncertainty of the current priority configuration.

Return

(int)bool - CPUs were found

void cpupri_set(struct cpupri *cp, int cpu, int newpri): update the CPU priority setting

Parameters

struct cpupri *cp: The cpupri context
int cpu: The target CPU
int newpri: The priority (INVALID-RT99) to assign to this CPU

Note

Assumes cpu_rq(cpu)->lock is locked

Return

(void)

int cpupri_init(struct cpupri *cp): initialize the cpupri structure

Parameters

struct cpupri *cp: The cpupri context

Return

-ENOMEM on memory allocation failure.

void cpupri_cleanup(struct cpupri *cp): clean up the cpupri structure

Parameters

struct cpupri *cp: The cpupri context

void update_tg_load_avg(struct cfs_rq *cfs_rq, int force): update the tg’s load avg

Parameters

struct cfs_rq *cfs_rq: the cfs_rq whose avg changed
int force: update regardless of how small the difference

Description

This function ‘ensures’: tg->load_avg := Sum tg->cfs_rq[]->avg.load. However, because tg->load_avg is a global value there are performance considerations.

In order to avoid having to look at the other cfs_rq’s, we use a differential update where we store the last value we propagated. This in turn allows skipping updates if the differential is ‘small’.

Updating tg’s load_avg is necessary before update_cfs_share().
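The differential update can be sketched in user-space C as follows. This is a model under stated assumptions, not the scheduler's code: `struct tg_model`, `struct cfs_rq_model` and `tg_load_avg_update()` are hypothetical, the shared sum is a plain `long` instead of an atomic, and the ‘small’ threshold of 1/64 of the last propagated value is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the task group's shared sum. */
struct tg_model {
    long load_avg;            /* global value, expensive to touch */
};

/* Hypothetical stand-in for one per-CPU cfs_rq of that group. */
struct cfs_rq_model {
    long load_avg;            /* current local average */
    long tg_load_avg_contrib; /* last value propagated to the group */
    struct tg_model *tg;
};

/* Assumed threshold: a change within 1/64 of the last contribution is 'small'. */
#define SMALL_DELTA_DIVISOR 64

/* Returns true if the shared sum was updated. */
static bool tg_load_avg_update(struct cfs_rq_model *cfs_rq, bool force)
{
    long delta = cfs_rq->load_avg - cfs_rq->tg_load_avg_contrib;

    if (!force &&
        labs(delta) <= cfs_rq->tg_load_avg_contrib / SMALL_DELTA_DIVISOR)
        return false;   /* skip: not worth touching the shared value */

    cfs_rq->tg->load_avg += delta;                  /* apply only the difference */
    cfs_rq->tg_load_avg_contrib = cfs_rq->load_avg; /* remember what was propagated */
    return true;
}
```

Because only the difference from the last propagated value is added, the group sum stays consistent without reading any other cfs_rq, and small fluctuations never reach the shared value unless force is set.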

int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq): update the cfs_rq’s load/util averages

Parameters

u64 now: current time, as per cfs_rq_clock_pelt()
struct cfs_rq *cfs_rq: cfs_rq to update

Description

The cfs_rq avg is the direct sum of all its entities (blocked and runnable) avg. The immediate corollary is that all (fair) tasks must be attached, see post_init_entity_util_avg().

cfs_rq->avg is used for task_h_load() and update_cfs_share() for example.

Returns true if the load decayed or we removed load.

Since both these conditions indicate a changed cfs_rq->avg.load we should call update_tg_load_avg() when this function returns true.

void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se): attach this entity to its cfs_rq load avg

Parameters

struct cfs_rq *cfs_rq: cfs_rq to attach to
struct sched_entity *se: sched_entity to attach

Description

Must call update_cfs_rq_load_avg() before this, since we rely on cfs_rq->avg.last_update_time being current.

void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se): detach this entity from its cfs_rq load avg

Parameters

struct cfs_rq *cfs_rq: cfs_rq to detach from
struct sched_entity *se: sched_entity to detach

Description

Must call update_cfs_rq_load_avg() before this, since we rely on cfs_rq->avg.last_update_time being current.

unsigned long cpu_util(int cpu)

Parameters

int cpu: the CPU to get the utilization of

Description

The unit of the return value must be the one of capacity so we can compare the utilization with the capacity of the CPU that is available for CFS tasks (i.e. cpu_capacity).

cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the recent utilization of currently non-runnable tasks on a CPU. It represents the amount of utilization of a CPU in the range [0..capacity_orig] where capacity_orig is the cpu_capacity available at the highest frequency (arch_scale_freq_capacity()). The utilization of a CPU converges towards a sum equal to or less than the current capacity (capacity_curr <= capacity_orig) of the CPU because it is the running time on this CPU scaled by capacity_curr.

The estimated utilization of a CPU is defined to be the maximum between its cfs_rq.avg.util_avg and the sum of the estimated utilization of the tasks currently RUNNABLE on that CPU. This makes it possible to properly represent the expected utilization of a CPU which has just started running a big task after a long sleep period. At the same time however it preserves the benefits of the “blocked utilization” in describing the potential for other tasks waking up on the same CPU.

Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even higher than capacity_orig because of unfortunate rounding in cfs.avg.util_avg or just after migrating tasks and new task wakeups until the average stabilizes with the new running time. We need to check that the utilization stays within the range of [0..capacity_orig] and cap it if necessary. Without utilization capping, a group could be seen as overloaded (CPU0 utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of available capacity. We allow utilization to overshoot capacity_curr (but not capacity_orig) as it is useful for predicting the capacity required after task migrations (scheduler-driven DVFS).

Return

the (estimated) utilization for the specified CPU
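The max-then-cap rule described above reduces to a few lines. The sketch below is a user-space model only: `struct cpu_util_model` and `estimated_cpu_util()` are hypothetical names, and the three inputs are passed in as a snapshot instead of being read from a run queue.

```c
/* Hypothetical snapshot of the per-CPU values the description refers to. */
struct cpu_util_model {
    unsigned long util_avg;      /* cfs_rq.avg.util_avg */
    unsigned long est_enqueued;  /* summed estimated utilization of RUNNABLE tasks */
    unsigned long capacity_orig; /* capacity at the highest frequency */
};

/* Estimated utilization: max of the two signals, capped to capacity_orig. */
static unsigned long estimated_cpu_util(const struct cpu_util_model *c)
{
    unsigned long util = c->util_avg;

    if (c->est_enqueued > util)
        util = c->est_enqueued;   /* max(util_avg, estimated utilization) */
    if (util > c->capacity_orig)
        util = c->capacity_orig;  /* keep the result within [0..capacity_orig] */
    return util;
}
```

With capacity_orig at 1024, a util_avg that has overshot to about 121% (1240) is reported as 1024, so a neighbouring CPU at 80% is still seen as having spare capacity.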

void update_sg_lb_stats(struct lb_env *env, struct sched_group *group, struct sg_lb_stats *sgs, int *sg_status): Update sched_group’s statistics for load balancing.

Parameters

struct lb_env *env: The load balancing environment.
struct sched_group *group: sched_group whose statistics are to be updated.
struct sg_lb_stats *sgs: variable to hold the statistics for this group.
int *sg_status: Holds flag indicating the status of the sched_group

bool update_sd_pick_busiest(struct lb_env *env, struct sd_lb_stats *sds, struct sched_group *sg, struct sg_lb_stats *sgs): return 1 on busiest group

Parameters

struct lb_env *env: The load balancing environment.
struct sd_lb_stats *sds: sched_domain statistics
struct sched_group *sg: sched_group candidate to be checked for being the busiest
struct sg_lb_stats *sgs: sched_group statistics

Description

Determine if sg is a busier group than the previously selected busiest group.

Return

true if sg is a busier group than the previously selected busiest group. false otherwise.

int idle_cpu_without(int cpu, struct task_struct *p): would a given CPU be idle without p?

Parameters

int cpu: the processor on which idleness is tested.
struct task_struct *p: task which should be ignored.

Return

1 if the CPU would be idle. 0 otherwise.

void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds): Update sched_domain’s statistics for load balancing.

Parameters

struct lb_env *env: The load balancing environment.
struct sd_lb_stats *sds: variable to hold the statistics for this sched_domain.

void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds): Calculate the amount of imbalance present within the groups of a given sched_domain during load balance.

Parameters

struct lb_env *env: load balance environment
struct sd_lb_stats *sds: statistics of the sched_domain whose imbalance is to be calculated.

struct sched_group *find_busiest_group(struct lb_env *env): Returns the busiest group within the sched_domain if there is an imbalance.

Parameters

struct lb_env *env: The load balancing environment.

Description

Also calculates the amount of runnable load which should be moved to restore balance.

Return

The busiest group if imbalance exists.

DECLARE_COMPLETION(work): declare and initialize a completion structure

Parameters

work: identifier for the completion structure

Description

This macro declares and initializes a completion structure. Generally used for static declarations. You should use the _ONSTACK variant for automatic variables.

DECLARE_COMPLETION_ONSTACK(work): declare and initialize a completion structure

Parameters

work: identifier for the completion structure

Description

This macro declares and initializes a completion structure on the kernel stack.

void __init_completion(struct completion *x): Initialize a dynamically allocated completion

Parameters

struct completion *x: pointer to completion structure that is to be initialized

Description

This inline function will initialize a dynamically created completion structure.

void reinit_completion(struct completion *x): reinitialize a completion structure

Parameters

struct completion *x: pointer to completion structure that is to be reinitialized

Description

This inline function should be used to reinitialize a completion structure so it can be reused. This is especially important after complete_all() is used.
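Why reinitialization matters after complete_all() can be seen in a single-threaded user-space model that reduces a completion to a counter. This is a sketch, not the kernel structure: `struct completion_model` and the `model_*()` helpers are hypothetical, there is no wait queue or locking, and it assumes that complete_all() leaves the counter saturated so that every later wait succeeds immediately.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: a completion reduced to its "done" counter. */
struct completion_model {
    unsigned int done;
};

static void model_init(struct completion_model *x)   { x->done = 0; }
static void model_reinit(struct completion_model *x) { x->done = 0; }

/* complete(): let exactly one waiter proceed. */
static void model_complete(struct completion_model *x)
{
    if (x->done != UINT_MAX)
        x->done++;
}

/* complete_all(): let every current and future waiter proceed (assumed saturation). */
static void model_complete_all(struct completion_model *x)
{
    x->done = UINT_MAX;
}

/* Non-blocking wait: true if a waiter would proceed immediately. */
static bool model_try_wait(struct completion_model *x)
{
    if (!x->done)
        return false;
    if (x->done != UINT_MAX)
        x->done--;  /* consume one complete() */
    return true;
}
```

In this model a single complete() is consumed by one wait, whereas after complete_all() every wait keeps succeeding until the structure is reinitialized, which is why reuse without reinitialization would not block.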

unsigned long __round_jiffies(unsigned long j, int cpu): function to round jiffies to a full second

Parameters

unsigned long j: the time in (absolute) jiffies that should be rounded
int cpu: the processor number on which the timeout will happen

Description

__round_jiffies() rounds an absolute time in the future (in jiffies) up or down to (approximately) full seconds. This is useful for timers for which the exact time they fire does not matter too much, as long as they fire approximately every X seconds.

By rounding these timers to whole seconds, all such timers will fire at the same time, rather than at various times spread out. The goal of this is to have the CPU wake up less, which saves power.

The exact rounding is skewed for each processor to avoid all processors firing at the exact same time, which could lead to lock contention or spurious cache line bouncing.

The return value is the rounded version of the j parameter.
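The rounding and the per-processor skew can be modeled in user space. The sketch below is illustrative only: `round_jiffies_model()` is a hypothetical name, the current time is passed in as `now` instead of being read from jiffies, and the tick rate (`HZ` of 1000), the skew of 3 jiffies per CPU, and the quarter-second round-down threshold are all assumptions chosen for this example.

```c
#include <stdbool.h>

#define HZ 1000     /* assumed tick rate for this sketch */
#define CPU_SKEW 3  /* assumed per-CPU skew, in jiffies */

/*
 * Round the absolute time j to a second boundary as seen from the given
 * CPU. "now" stands in for the current jiffies value; force_up models the
 * round-up-only variants.
 */
static unsigned long round_jiffies_model(unsigned long j, int cpu,
                                         bool force_up, unsigned long now)
{
    unsigned long original = j;
    unsigned long rem;

    j += (unsigned long)cpu * CPU_SKEW;  /* shift into this CPU's skewed frame */
    rem = j % HZ;
    if (rem < HZ / 4 && !force_up)
        j -= rem;                        /* just past a boundary: round down */
    else
        j = j - rem + HZ;                /* otherwise round up */
    j -= (unsigned long)cpu * CPU_SKEW;  /* undo the skew */

    /* Never hand back a time that is not in the future. */
    return (long)(j - now) > 0 ? j : original;
}
```

Under these assumptions, 1100 rounds down to 1000 on CPU 0 but to 994 on CPU 2, so the two processors do not fire on the same tick. A relative variant can be modeled by adding `now` before the call and subtracting it from the result.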

unsigned long __round_jiffies_relative(unsigned long j, int cpu): function to round jiffies to a full second

Parameters

unsigned long j: the time in (relative) jiffies that should be rounded
int cpu: the processor number on which the timeout will happen

Description

__round_jiffies_relative() rounds a time delta in the future (in jiffies) up or down to (approximately) full seconds. This is useful for timers for which the exact time they fire does not matter too much, as long as they fire approximately every X seconds.

By rounding these timers to whole seconds, all such timers will fire at the same time, rather than at various times spread out. The goal of this is to have the CPU wake up less, which saves power.

The exact rounding is skewed for each processor to avoid all processors firing at the exact same time, which could lead to lock contention or spurious cache line bouncing.

The return value is the rounded version of the j parameter.

unsigned long round_jiffies(unsigned long j): function to round jiffies to a full second

Parameters

unsigned long j: the time in (absolute) jiffies that should be rounded

Description

round_jiffies() rounds an absolute time in the future (in jiffies) up or down to (approximately) full seconds. This is useful for timers for which the exact time they fire does not matter too much, as long as they fire approximately every X seconds.

By rounding these timers to whole seconds, all such timers will fire at the same time, rather than at various times spread out. The goal of this is to have the CPU wake up less, which saves power.

The return value is the rounded version of the j parameter.

unsigned long round_jiffies_relative(unsigned long j): function to round jiffies to a full second

Parameters

unsigned long j: the time in (relative) jiffies that should be rounded

Description

round_jiffies_relative() rounds a time delta in the future (in jiffies) up or down to (approximately) full seconds. This is useful for timers for which the exact time they fire does not matter too much, as long as they fire approximately every X seconds.

By rounding these timers to whole seconds, all such timers will fire at the same time, rather than at various times spread out. The goal of this is to have the CPU wake up less, which saves power.

The return value is the rounded version of the j parameter.

unsigned long __round_jiffies_up(unsigned long j, int cpu): function to round jiffies up to a full second

Parameters

unsigned long j: the time in (absolute) jiffies that should be rounded
int cpu: the processor number on which the timeout will happen

Description

This is the same as __round_jiffies() except that it will never round down. This is useful for timeouts for which the exact time of firing does not matter too much, as long as they don’t fire too early.

unsigned long __round_jiffies_up_relative(unsigned long j, int cpu): function to round jiffies up to a full second

Parameters

unsigned long j: the time in (relative) jiffies that should be rounded
int cpu: the processor number on which the timeout will happen

Description

This is the same as __round_jiffies_relative() except that it will never round down. This is useful for timeouts for which the exact time of firing does not matter too much, as long as they don’t fire too early.

unsigned long round_jiffies_up(unsigned long j): function to round jiffies up to a full second

Parameters

unsigned long j: the time in (absolute) jiffies that should be rounded

Description

This is the same as round_jiffies() except that it will never round down. This is useful for timeouts for which the exact time of firing does not matter too much, as long as they don’t fire too early.

unsigned long round_jiffies_up_relative(unsigned long j): function to round jiffies up to a full second

Parameters

unsigned long j: the time in (relative) jiffies that should be rounded

Description

This is the same as round_jiffies_relative() except that it will never round down. This is useful for timeouts for which the exact time of firing does not matter too much, as long as they don’t fire too early.

void init_timer_key(struct timer_list *timer, void (*func)(struct timer_list *), unsigned int flags, const char *name, struct lock_class_key *key): initialize a timer

Parameters

struct timer_list *timer: the timer to be initialized
void (*func)(struct timer_list *): timer callback function
unsigned int flags: timer flags
const char *name: name of the timer
struct lock_class_key *key: lockdep class key of the fake lock used for tracking timer sync lock dependencies

Description

init_timer_key() must be done to a timer prior to calling any of the other timer functions.

int mod_timer_pending(struct timer_list *timer, unsigned long expires): modify a pending timer’s timeout

Parameters

struct timer_list *timer: the pending timer to be modified
unsigned long expires: new timeout in jiffies

Description

mod_timer_pending() is the same for pending timers as mod_timer(), but will not re-activate and modify already deleted timers.

It is useful for unserialized use of timers.

int mod_timer(struct timer_list *timer, unsigned long expires): modify a timer’s timeout

Parameters

struct timer_list *timer: the timer to be modified
unsigned long expires: new timeout in jiffies

Description

mod_timer() is a more efficient way to update the expire field of an active timer (if the timer is inactive it will be activated).

mod_timer(timer, expires) is equivalent to:

del_timer(timer); timer->expires = expires; add_timer(timer);

Note that if there are multiple unserialized concurrent users of the same timer, then mod_timer() is the only safe way to modify the timeout, since add_timer() cannot modify an already running timer.

The function returns whether it has modified a pending timer or not. (i.e. mod_timer() of an inactive timer returns 0, mod_timer() of an active timer returns 1.)
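The stated equivalence and return values can be checked against a toy single-timer model. This user-space sketch is not the timer subsystem: `struct timer_model` and the `model_*()` functions are hypothetical, and there is no timer wheel, locking, or callback. In particular it cannot show the property that makes the real mod_timer() safe for unserialized users, namely that the three steps happen as one operation.

```c
#include <stdbool.h>

/* Hypothetical single-timer model: just a pending flag and an expiry. */
struct timer_model {
    bool pending;
    unsigned long expires;
};

/* del_timer(): returns 1 if a pending timer was deactivated, else 0. */
static int model_del_timer(struct timer_model *t)
{
    int was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* add_timer(): start a timer whose expires field is already set. */
static void model_add_timer(struct timer_model *t)
{
    t->pending = true;
}

/* mod_timer(): same effect as del_timer + set expires + add_timer. */
static int model_mod_timer(struct timer_model *t, unsigned long expires)
{
    int was_pending = model_del_timer(t);

    t->expires = expires;
    model_add_timer(t);
    return was_pending;  /* 0: timer was inactive, 1: timer was active */
}
```

Calling `model_mod_timer()` on an inactive timer returns 0 and activates it, and a second call returns 1 and only moves the expiry.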

int timer_reduce(struct timer_list *timer, unsigned long expires): Modify a timer’s timeout if it would reduce the timeout

Parameters

struct timer_list *timer: The timer to be modified
unsigned long expires: New timeout in jiffies

Description

timer_reduce() is very similar to mod_timer(), except that it will only modify a running timer if that would reduce the expiration time (it will start a timer that isn’t running).
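A toy model makes the "only if it reduces" rule explicit, including a comparison that stays correct when the jiffies counter wraps. This is a user-space sketch with hypothetical names (`struct reduce_timer_model`, `before()`, `model_timer_reduce()`); the return value convention, 1 if the timer was already pending and 0 if this call started it, is assumed by analogy with mod_timer().

```c
#include <stdbool.h>

/* Wrap-safe "a is before b" for free-running unsigned counters. */
static bool before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Hypothetical single-timer model: just a pending flag and an expiry. */
struct reduce_timer_model {
    bool pending;
    unsigned long expires;
};

/* Returns 1 if the timer was already pending, 0 if this call started it. */
static int model_timer_reduce(struct reduce_timer_model *t,
                              unsigned long expires)
{
    if (t->pending) {
        if (before(expires, t->expires))
            t->expires = expires;  /* earlier deadline: take it */
        return 1;                  /* later or equal deadline: leave as is */
    }
    t->expires = expires;          /* not running: start it */
    t->pending = true;
    return 0;
}
```

Several callers can therefore each request "no later than X" on a shared timer, and the earliest request wins without any caller pushing the deadline back.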

void add_timer(struct timer_list *timer): start a timer

Parameters

struct timer_list *timer: the timer to be added

Description

The kernel will do a ->function(timer) callback from the timer interrupt at the ->expires point in the future. The current time is ‘jiffies’.

The timer’s ->expires, ->function fields must be set prior to calling this function.

Timers with an ->expires field in the past will be executed in the next timer tick.

void add_timer_on(struct timer_list *timer, int cpu): start a timer on a particular CPU

Parameters

struct timer_list *timer: the timer to be added
int cpu: the CPU to start it on

Description

This is not very scalable on SMP. Double adds are not possible.

int del_timer(struct timer_list *timer): deactivate a timer.

Parameters

struct timer_list *timer: the timer to be deactivated

Description

del_timer() deactivates a timer - this works on both active and inactive timers.

The function returns whether it has deactivated a pending timer or not. (i.e. del_timer() of an inactive timer returns 0, del_timer() of an active timer returns 1.)

int try_to_del_timer_sync(struct timer_list *timer)¶: Try to deactivate a timer

Parameters

struct timer_list *timer: timer to delete

Description

This function tries to deactivate a timer. Upon successful (ret >= 0) exit the timer is not queued and the handler is not running on any CPU.

int del_timer_sync(struct timer_list *timer)¶: deactivate a timer and wait for the handler to finish.

Parameters

struct timer_list *timer: the timer to be deactivated

Description

This function only differs from del_timer() on SMP: besides deactivating the timer it also makes sure the handler has finished executing on other CPUs.

Synchronization rules: Callers must prevent restarting of the timer, otherwise this function is meaningless. It must not be called from interrupt contexts unless the timer is an irqsafe one. The caller must not hold locks which would prevent completion of the timer’s handler. The timer’s handler must not call add_timer_on(). Upon exit the timer is not queued and the handler is not running on any CPU.

The function returns whether it has deactivated a pending timer or not.

Note

For !irqsafe timers, you must not hold locks that are held in interrupt context while calling this function, even if the lock has nothing to do with the timer in question. Here’s why:

    CPU0                             CPU1
    ----                             ----
                                     <SOFTIRQ>
                                       call_timer_fn();
                                       base->running_timer = mytimer;
    spin_lock_irq(somelock);
                                     <IRQ>
                                        spin_lock(somelock);
    del_timer_sync(mytimer);
    while (base->running_timer == mytimer);

Now del_timer_sync() will never return and never release somelock. The interrupt on the other CPU is waiting to grab somelock but it has interrupted the softirq that CPU0 is waiting to finish.

signed long schedule_timeout(signed long timeout)¶: sleep until timeout

Parameters

signed long timeout: timeout value in jiffies

Description

Make the current task sleep until timeout jiffies have elapsed. The function behavior depends on the current task state (see also set_current_state() description):

TASK_RUNNING - the scheduler is called, but the task does not sleep at all. That happens because sched_submit_work() does nothing for tasks in TASK_RUNNING state.

TASK_UNINTERRUPTIBLE - at least timeout jiffies are guaranteed to pass before the routine returns unless the current task is explicitly woken up (e.g. by wake_up_process()).

TASK_INTERRUPTIBLE - the routine may return early if a signal is delivered to the current task or the current task is explicitly woken up.

The current task state is guaranteed to be TASK_RUNNING when this routine returns.

Specifying a timeout value of MAX_SCHEDULE_TIMEOUT will schedule the CPU away without a bound on the timeout. In this case the return value will be MAX_SCHEDULE_TIMEOUT.

Returns 0 when the timer has expired; otherwise the remaining time in jiffies will be returned. In all cases the return value is guaranteed to be non-negative.

void msleep(unsigned int msecs)¶: sleep safely even with waitqueue interruptions

Parameters

unsigned int msecs: Time in milliseconds to sleep for

unsigned long msleep_interruptible(unsigned int msecs)¶: sleep waiting for signals

Parameters

unsigned int msecs: Time in milliseconds to sleep for

void usleep_range(unsigned long min, unsigned long max)¶: Sleep for an approximate time

Parameters

unsigned long min: Minimum time in usecs to sleep
unsigned long max: Maximum time in usecs to sleep

Description

In non-atomic context where the exact wakeup time is flexible, use usleep_range() instead of udelay(). The sleep improves responsiveness by avoiding the CPU-hogging busy-wait of udelay(), and the range reduces power usage by allowing hrtimers to take advantage of an already-scheduled interrupt instead of scheduling a new one just for this sleep.

Wait queues and Wake events¶

int waitqueue_active(struct wait_queue_head *wq_head)¶: locklessly test for waiters on the queue

Parameters

struct wait_queue_head *wq_head: the waitqueue to test for waiters

Description

Returns true if the wait list is not empty.

Use either while holding wait_queue_head::lock or when used for wakeups with an extra smp_mb() like:

    CPU0 - waker                    CPU1 - waiter

                                    for (;;) {
    @cond = true;                     prepare_to_wait(&wq_head, &wait, state);
    smp_mb();                         // smp_mb() from set_current_state()
    if (waitqueue_active(wq_head))         if (@cond)
      wake_up(wq_head);                      break;
                                      schedule();
                                    }
                                    finish_wait(&wq_head, &wait);

Because without the explicit smp_mb() it’s possible for the waitqueue_active() load to get hoisted over the cond store such that we’ll observe an empty wait list while the waiter might not observe cond.

Also note that this ‘optimization’ trades a spin_lock() for an smp_mb(), which (when the lock is uncontended) are of roughly equal cost.

Note

This function is lockless and requires care; incorrect usage _will_ lead to sporadic and non-obvious failure.

bool wq_has_single_sleeper(struct wait_queue_head *wq_head)¶: check if there is only one sleeper

Parameters

struct wait_queue_head *wq_head: wait queue head

Description

Returns true if wq_head has only one sleeper on the list.

Please refer to the comment for waitqueue_active.

bool wq_has_sleeper(struct wait_queue_head *wq_head)¶: check if there are any waiting processes

Parameters

struct wait_queue_head *wq_head: wait queue head

Description

Returns true if wq_head has waiting processes

Please refer to the comment for waitqueue_active.

wait_event(wq_head, condition)¶: sleep until a condition gets true

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_UNINTERRUPTIBLE) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

wait_event_freezable(wq_head, condition)¶: sleep (or freeze) until a condition gets true

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_INTERRUPTIBLE – so as not to contribute to system load) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

wait_event_timeout(wq_head, condition, timeout)¶: sleep until a condition gets true or a timeout elapses

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
timeout: timeout, in jiffies

Description

The process is put to sleep (TASK_UNINTERRUPTIBLE) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

Return

0 if the condition evaluated to false after the timeout elapsed, 1 if the condition evaluated to true after the timeout elapsed, or the remaining jiffies (at least 1) if the condition evaluated to true before the timeout elapsed.
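The three-way return rule above is easy to misread, so here is a minimal user-space sketch of it. The helper names are hypothetical and this is not the kernel macro; it only encodes the documented mapping from "final condition value" and "jiffies left" to the return value.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: compute the documented return value from the
 * final value of the condition and the jiffies left when the wait
 * ended. The wait only ends when the condition is true or no time
 * is left.
 */
long model_wait_timeout_result(bool condition, long remaining)
{
	assert(remaining >= 0);
	assert(condition || remaining == 0);

	if (condition && remaining == 0)
		return 1;	/* true, but only once the timeout had elapsed */
	return remaining;	/* 0: timed out; >= 1: true with time to spare */
}

/* Caller-side view: any non-zero result means the condition held. */
bool model_wait_succeeded(long ret)
{
	return ret != 0;
}
```

The practical consequence is that a caller can treat 0 as "timed out with the condition still false" and any positive value as success, without a separate re-check of the condition.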

wait_event_cmd(wq_head, condition, cmd1, cmd2)¶: sleep until a condition gets true

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
cmd1: the command will be executed before sleep
cmd2: the command will be executed after sleep

Description

The process is put to sleep (TASK_UNINTERRUPTIBLE) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

wait_event_interruptible(wq_head, condition)¶: sleep until a condition gets true

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

The function will return -ERESTARTSYS if it was interrupted by a signal and 0 if condition evaluated to true.

wait_event_interruptible_timeout(wq_head, condition, timeout)¶: sleep until a condition gets true or a timeout elapses

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
timeout: timeout, in jiffies

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

Return

0 if the condition evaluated to false after the timeout elapsed, 1 if the condition evaluated to true after the timeout elapsed, the remaining jiffies (at least 1) if the condition evaluated to true before the timeout elapsed, or -ERESTARTSYS if it was interrupted by a signal.

wait_event_hrtimeout(wq_head, condition, timeout)¶: sleep until a condition gets true or a timeout elapses

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
timeout: timeout, as a ktime_t

Description

The process is put to sleep (TASK_UNINTERRUPTIBLE) until the condition evaluates to true or the timeout elapses. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

The function returns 0 if condition became true, or -ETIME if the timeout elapsed.

wait_event_interruptible_hrtimeout(wq, condition, timeout)¶: sleep until a condition gets true or a timeout elapses

Parameters

wq: the waitqueue to wait on
condition: a C expression for the event to wait for
timeout: timeout, as a ktime_t

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

The function returns 0 if condition became true, -ERESTARTSYS if it was interrupted by a signal, or -ETIME if the timeout elapsed.

wait_event_idle(wq_head, condition)¶: wait for a condition without contributing to system load

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_IDLE) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

wait_event_idle_exclusive(wq_head, condition)¶: wait for a condition without contributing to system load

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_IDLE) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

The process is put on the wait queue with a WQ_FLAG_EXCLUSIVE flag set, thus if other processes wait on the same list, when this process is woken further processes are not considered.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

wait_event_idle_timeout(wq_head, condition, timeout)¶: sleep without load until a condition becomes true or a timeout elapses

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
timeout: timeout, in jiffies

Description

The process is put to sleep (TASK_IDLE) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

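Return

0 if the condition evaluated to false after the timeout elapsed, 1 if the condition evaluated to true after the timeout elapsed, or the remaining jiffies (at least 1) if the condition evaluated to true before the timeout elapsed.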

wait_event_idle_exclusive_timeout(wq_head, condition, timeout)¶: sleep without load until a condition becomes true or a timeout elapses

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
timeout: timeout, in jiffies

Description

The process is put to sleep (TASK_IDLE) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

The process is put on the wait queue with a WQ_FLAG_EXCLUSIVE flag set, thus if other processes wait on the same list, when this process is woken further processes are not considered.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

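Return

0 if the condition evaluated to false after the timeout elapsed, 1 if the condition evaluated to true after the timeout elapsed, or the remaining jiffies (at least 1) if the condition evaluated to true before the timeout elapsed.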

wait_event_interruptible_locked(wq, condition)¶: sleep until a condition gets true

Parameters

wq: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq is woken up.

It must be called with wq.lock being held. This spinlock is unlocked while sleeping but condition testing is done while lock is held and when this macro exits the lock is held.

The lock is locked/unlocked using spin_lock()/spin_unlock() functions which must match the way they are locked/unlocked outside of this macro.

wake_up_locked() has to be called after changing any variable that could change the result of the wait condition.

The function will return -ERESTARTSYS if it was interrupted by a signal and 0 if condition evaluated to true.

wait_event_interruptible_locked_irq(wq, condition)¶: sleep until a condition gets true

Parameters

wq: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq is woken up.

It must be called with wq.lock being held. This spinlock is unlocked while sleeping but condition testing is done while lock is held and when this macro exits the lock is held.

The lock is locked/unlocked using spin_lock_irq()/spin_unlock_irq() functions which must match the way they are locked/unlocked outside of this macro.

wake_up_locked() has to be called after changing any variable that could change the result of the wait condition.

The function will return -ERESTARTSYS if it was interrupted by a signal and 0 if condition evaluated to true.

wait_event_interruptible_exclusive_locked(wq, condition)¶: sleep exclusively until a condition gets true

Parameters

wq: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq is woken up.

It must be called with wq.lock being held. This spinlock is unlocked while sleeping but condition testing is done while lock is held and when this macro exits the lock is held.

The lock is locked/unlocked using spin_lock()/spin_unlock() functions which must match the way they are locked/unlocked outside of this macro.

The process is put on the wait queue with a WQ_FLAG_EXCLUSIVE flag set, thus if other processes wait on the same list, when this process is woken further processes are not considered.

wake_up_locked() has to be called after changing any variable that could change the result of the wait condition.

The function will return -ERESTARTSYS if it was interrupted by a signal and 0 if condition evaluated to true.

wait_event_interruptible_exclusive_locked_irq(wq, condition)¶: sleep until a condition gets true

Parameters

wq: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq is woken up.

It must be called with wq.lock being held. This spinlock is unlocked while sleeping but condition testing is done while lock is held and when this macro exits the lock is held.

The lock is locked/unlocked using spin_lock_irq()/spin_unlock_irq() functions which must match the way they are locked/unlocked outside of this macro.

The process is put on the wait queue with a WQ_FLAG_EXCLUSIVE flag set, thus if other processes wait on the same list, when this process is woken further processes are not considered.

wake_up_locked() has to be called after changing any variable that could change the result of the wait condition.

The function will return -ERESTARTSYS if it was interrupted by a signal and 0 if condition evaluated to true.

wait_event_killable(wq_head, condition)¶: sleep until a condition gets true

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for

Description

The process is put to sleep (TASK_KILLABLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

The function will return -ERESTARTSYS if it was interrupted by a signal and 0 if condition evaluated to true.

wait_event_killable_timeout(wq_head, condition, timeout)¶: sleep until a condition gets true or a timeout elapses

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
timeout: timeout, in jiffies

Description

The process is put to sleep (TASK_KILLABLE) until the condition evaluates to true or a kill signal is received. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

Only kill signals interrupt this process.

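Return

0 if the condition evaluated to false after the timeout elapsed, 1 if the condition evaluated to true after the timeout elapsed, the remaining jiffies (at least 1) if the condition evaluated to true before the timeout elapsed, or -ERESTARTSYS if it was interrupted by a kill signal.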

wait_event_lock_irq_cmd(wq_head, condition, lock, cmd)¶: sleep until a condition gets true. The condition is checked under the lock. This is expected to be called with the lock taken.

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
lock: a locked spinlock_t, which will be released before cmd and schedule() and reacquired afterwards.
cmd: a command which is invoked outside the critical section before sleep

Description

The process is put to sleep (TASK_UNINTERRUPTIBLE) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

This is supposed to be called while holding the lock. The lock is dropped before invoking the cmd and going to sleep and is reacquired afterwards.

wait_event_lock_irq(wq_head, condition, lock)¶: sleep until a condition gets true. The condition is checked under the lock. This is expected to be called with the lock taken.

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
lock: a locked spinlock_t, which will be released before schedule() and reacquired afterwards.

Description

The process is put to sleep (TASK_UNINTERRUPTIBLE) until the condition evaluates to true. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

This is supposed to be called while holding the lock. The lock is dropped before going to sleep and is reacquired afterwards.

wait_event_interruptible_lock_irq_cmd(wq_head, condition, lock, cmd)¶: sleep until a condition gets true. The condition is checked under the lock. This is expected to be called with the lock taken.

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
lock: a locked spinlock_t, which will be released before cmd and schedule() and reacquired afterwards.
cmd: a command which is invoked outside the critical section before sleep

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

This is supposed to be called while holding the lock. The lock is dropped before invoking the cmd and going to sleep and is reacquired afterwards.

The macro will return -ERESTARTSYS if it was interrupted by a signal and 0 if condition evaluated to true.

wait_event_interruptible_lock_irq(wq_head, condition, lock)¶: sleep until a condition gets true. The condition is checked under the lock. This is expected to be called with the lock taken.

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
lock: a locked spinlock_t, which will be released before schedule() and reacquired afterwards.

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

This is supposed to be called while holding the lock. The lock is dropped before going to sleep and is reacquired afterwards.

The macro will return -ERESTARTSYS if it was interrupted by a signal and 0 if condition evaluated to true.

wait_event_interruptible_lock_irq_timeout(wq_head, condition, lock, timeout)¶: sleep until a condition gets true or a timeout elapses. The condition is checked under the lock. This is expected to be called with the lock taken.

Parameters

wq_head: the waitqueue to wait on
condition: a C expression for the event to wait for
lock: a locked spinlock_t, which will be released before schedule() and reacquired afterwards.
timeout: timeout, in jiffies

Description

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq_head is woken up.

wake_up() has to be called after changing any variable that could change the result of the wait condition.

This is supposed to be called while holding the lock. The lock is dropped before going to sleep and is reacquired afterwards.

The function returns 0 if the timeout elapsed, -ERESTARTSYS if it was interrupted by a signal, or the remaining jiffies if the condition evaluated to true before the timeout elapsed.

void __wake_up(struct wait_queue_head *wq_head, unsigned int mode, int nr_exclusive, void *key)¶: wake up threads blocked on a waitqueue.

Parameters

struct wait_queue_head *wq_head: the waitqueue
unsigned int mode: which threads
int nr_exclusive: how many wake-one or wake-many threads to wake up
void *key: is directly passed to the wakeup function

Description

If this function wakes up a task, it executes a full memory barrier before accessing the task state.
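How nr_exclusive interacts with exclusive (WQ_FLAG_EXCLUSIVE) waiters can be sketched with a toy queue walk. Everything here is a hypothetical user-space model, not the kernel implementation: it ignores mode and key, treats the queue as a plain array, and assumes that a value of 0 for nr_exclusive means "no limit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy waiter; 'exclusive' stands in for WQ_FLAG_EXCLUSIVE. */
struct model_waiter {
	bool exclusive;
	bool woken;
};

/*
 * Walk the queue in order and wake each sleeping waiter. Stop once
 * nr_exclusive exclusive waiters have been woken. With nr_exclusive
 * == 0 the stop condition is never met, so every waiter is woken
 * (an assumption of this model).
 * Returns the number of waiters woken by this call.
 */
int model_wake_up(struct model_waiter *q, size_t n, int nr_exclusive)
{
	int woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].woken)
			continue;	/* already off the queue */
		q[i].woken = true;
		woken++;
		if (q[i].exclusive && --nr_exclusive == 0)
			break;		/* further waiters are not considered */
	}
	return woken;
}
```

With two non-exclusive waiters followed by two exclusive ones, a call with nr_exclusive set to 1 wakes the first three in this model and leaves the last exclusive waiter asleep until the next wakeup.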

void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void *key)¶: wake up threads blocked on a waitqueue.

Parameters

struct wait_queue_head *wq_head: the waitqueue
unsigned int mode: which threads
void *key: opaque value to be passed to wakeup targets

Description

The sync wakeup differs in that the waker knows that it will schedule away soon, so while the target thread will be woken up, it will not be migrated to another CPU - i.e. the two threads are ‘synchronized’ with each other. This can prevent needless bouncing between CPUs.

On UP it can prevent extra preemption.

If this function wakes up a task, it executes a full memory barrier before accessing the task state.

void __wake_up_locked_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void *key)¶: wake up a thread blocked on a locked waitqueue.

Parameters

struct wait_queue_head *wq_head: the waitqueue
unsigned int mode: which threads
void *key: opaque value to be passed to wakeup targets

Description

The sync wakeup differs in that the waker knows that it will schedule away soon, so while the target thread will be woken up, it will not be migrated to another CPU - i.e. the two threads are ‘synchronized’ with each other. This can prevent needless bouncing between CPUs.

On UP it can prevent extra preemption.

If this function wakes up a task, it executes a full memory barrier before accessing the task state.

void finish_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)¶: clean up after waiting in a queue

Parameters

struct wait_queue_head *wq_head: waitqueue waited on
struct wait_queue_entry *wq_entry: wait descriptor

Description

Sets current thread back to running state and removes the wait descriptor from the given waitqueue if still queued.

High-resolution timers¶

ktime_t ktime_set(const s64 secs, const unsigned long nsecs)¶: Set a ktime_t variable from a seconds/nanoseconds value

Parameters

const s64 secs: seconds to set
const unsigned long nsecs: nanoseconds to set

Return

The ktime_t representation of the value.

int ktime_compare(const ktime_t cmp1, const ktime_t cmp2)¶: Compares two ktime_t variables for less, greater or equal

Parameters

const ktime_t cmp1: comparable1
const ktime_t cmp2: comparable2

Return

cmp1 < cmp2: return <0
cmp1 == cmp2: return 0
cmp1 > cmp2: return >0

bool ktime_after(const ktime_t cmp1, const ktime_t cmp2)¶: Compare if a ktime_t value is bigger than another one.

Parameters

const ktime_t cmp1: comparable1
const ktime_t cmp2: comparable2

Return

true if cmp1 happened after cmp2.

bool ktime_before(const ktime_t cmp1, const ktime_t cmp2)¶: Compare if a ktime_t value is smaller than another one.

Parameters

const ktime_t cmp1: comparable1
const ktime_t cmp2: comparable2

Return

true if cmp1 happened before cmp2.
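The relationship between ktime_set(), ktime_compare(), ktime_after() and ktime_before() can be illustrated with a user-space model. It assumes that a ktime_t value is a signed 64-bit count of nanoseconds; the model_* names are hypothetical, and the sketch omits any overflow handling the real helpers may perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a ktime_t-like value is signed 64-bit nanoseconds. */
typedef int64_t model_ktime_t;

#define MODEL_NSEC_PER_SEC 1000000000LL

/* Combine seconds and nanoseconds into one scalar (no overflow check). */
model_ktime_t model_ktime_set(int64_t secs, unsigned long nsecs)
{
	return secs * MODEL_NSEC_PER_SEC + (int64_t)nsecs;
}

/* Three-way comparison: <0, 0 or >0, as in the Return table above. */
int model_ktime_compare(model_ktime_t cmp1, model_ktime_t cmp2)
{
	if (cmp1 < cmp2)
		return -1;
	if (cmp1 > cmp2)
		return 1;
	return 0;
}

/* true if cmp1 happened after cmp2 */
bool model_ktime_after(model_ktime_t cmp1, model_ktime_t cmp2)
{
	return model_ktime_compare(cmp1, cmp2) > 0;
}

/* true if cmp1 happened before cmp2 */
bool model_ktime_before(model_ktime_t cmp1, model_ktime_t cmp2)
{
	return model_ktime_compare(cmp1, cmp2) < 0;
}
```

Callers should rely only on the sign of the comparison result, since the documentation promises <0, 0 or >0 rather than specific values.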

bool ktime_to_timespec64_cond(const ktime_t kt, struct timespec64 *ts)¶: convert a ktime_t variable to timespec64 format only if the variable contains data

Parameters

const ktime_t kt: the ktime_t variable to convert
struct timespec64 *ts: the timespec variable to store the result in

Return

true if there was a successful conversion, false if kt was 0.

struct hrtimer¶: the basic hrtimer structure

Definition

struct hrtimer {
  struct timerqueue_node node;
  ktime_t _softexpires;
  enum hrtimer_restart (*function)(struct hrtimer *);
  struct hrtimer_clock_base *base;
  u8 state;
  u8 is_rel;
  u8 is_soft;
  u8 is_hard;
};

Members

node: timerqueue node, which also manages node.expires, the absolute expiry time in the hrtimers internal representation. The time is related to the clock on which the timer is based. It is set up by adding slack to the _softexpires value; for non-range timers it is identical to _softexpires.
_softexpires: the absolute earliest expiry time of the hrtimer. The time which was given as expiry time when the timer was armed.
function: timer expiry callback function
base: pointer to the timer base (per cpu and per clock)
state: state information (See bit values above)
is_rel: Set if the timer was armed relative
is_soft: Set if hrtimer will be expired in soft interrupt context.
is_hard: Set if hrtimer will be expired in hard interrupt context even on RT.

Description

The hrtimer structure must be initialized by hrtimer_init()

struct hrtimer_sleeper¶: simple sleeper structure

Definition

struct hrtimer_sleeper {
  struct hrtimer timer;
  struct task_struct *task;
};

Members

timer: embedded timer structure
task: task to wake up

Description

task is set to NULL, when the timer expires.

struct hrtimer_clock_base¶: the timer base for a specific clock

Definition

struct hrtimer_clock_base {
  struct hrtimer_cpu_base *cpu_base;
  unsigned int index;
  clockid_t clockid;
  seqcount_t seq;
  struct hrtimer *running;
  struct timerqueue_head active;
  ktime_t (*get_time)(void);
  ktime_t offset;
};

Members

cpu_base: per cpu clock base
index: clock type index for per_cpu support when moving a timer to a base on another cpu.
clockid: clock id for per_cpu support
seq: seqcount around __run_hrtimer
running: pointer to the currently running hrtimer
active: red black tree root node for the active timers
get_time: function to retrieve the current time of the clock
offset: offset of this clock to the monotonic base

struct hrtimer_cpu_base¶: the per cpu clock bases

Definition

struct hrtimer_cpu_base {
  raw_spinlock_t lock;
  unsigned int cpu;
  unsigned int active_bases;
  unsigned int clock_was_set_seq;
  unsigned int hres_active : 1,
               in_hrtirq : 1,
               hang_detected : 1,
               softirq_activated : 1;
#ifdef CONFIG_HIGH_RES_TIMERS
  unsigned int nr_events;
  unsigned short nr_retries;
  unsigned short nr_hangs;
  unsigned int max_hang_time;
#endif
#ifdef CONFIG_PREEMPT_RT
  spinlock_t softirq_expiry_lock;
  atomic_t timer_waiters;
#endif
  ktime_t expires_next;
  struct hrtimer *next_timer;
  ktime_t softirq_expires_next;
  struct hrtimer *softirq_next_timer;
  struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
};

Members

lock: lock protecting the base and associated clock bases and timers
cpu: cpu number
active_bases: Bitfield to mark bases with active timers
clock_was_set_seq: Sequence counter of clock was set events
hres_active: State of high resolution mode
in_hrtirq: hrtimer_interrupt() is currently executing
hang_detected: The last hrtimer interrupt detected a hang
softirq_activated: indicates whether the softirq is raised; if so, an update of softirq related settings is not required.
nr_events: Total number of hrtimer interrupt events
nr_retries: Total number of hrtimer interrupt retries
nr_hangs: Total number of hrtimer interrupt hangs
max_hang_time: Maximum time spent in hrtimer_interrupt
softirq_expiry_lock: Lock which is taken while softirq based hrtimers are expired
timer_waiters: A hrtimer_cancel() invocation waits for the timer callback to finish.
expires_next: absolute time of the next event, is required for remote hrtimer enqueue; it is the total first expiry time (hard and soft hrtimer are taken into account)
next_timer: Pointer to the first expiring timer
softirq_expires_next: Time to check whether soft queues also need to be expired
softirq_next_timer: Pointer to the first expiring softirq based timer
clock_base: array of clock bases for this cpu

Note

next_timer is just an optimization for __remove_hrtimer(). Do not dereference the pointer because it is not reliable on cross cpu removals.

void hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)¶: (re)start an hrtimer

Parameters

struct hrtimer *timer: the timer to be added
ktime_t tim: expiry time
const enum hrtimer_mode mode: timer mode: absolute (HRTIMER_MODE_ABS) or relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED); softirq based mode is considered for debug purpose only!

bool hrtimer_is_queued(struct hrtimer *timer)¶

Parameters

struct hrtimer *timer: Timer to check

Return

True if the timer is queued, false otherwise

Description

The function can be used lockless, but it gives only a current snapshot.

u64 hrtimer_forward_now(struct hrtimer *timer, ktime_t interval)¶: forward the timer expiry so it expires after now

Parameters

struct hrtimer *timer: hrtimer to forward
ktime_t interval: the interval to forward

Description

Forward the timer expiry so it will expire after the current time of the hrtimer clock base. Returns the number of overruns.

Can be safely called from the callback function of timer. If called from other contexts timer must neither be enqueued nor running the callback and the caller needs to take care of serialization.

Note

This only updates the timer expiry value and does not requeue the timer.

u64 hrtimer_forward(struct hrtimer * timer, ktime_t now, ktime_t interval)¶: forward the timer expiry

Parameters

struct hrtimer *timer: hrtimer to forward
ktime_t now: forward past this time
ktime_t interval: the interval to forward

Description

Forward the timer expiry so it will expire in the future. Returns the number of overruns.

Can be safely called from the callback function of timer. If called from other contexts, timer must neither be enqueued nor running the callback, and the caller needs to take care of serialization.

Note

This only updates the timer expiry value and does not requeue the timer.
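The overrun count can be easier to picture with numbers. The following is a minimal userspace sketch of the forwarding arithmetic only, not the kernel implementation: struct model_timer and model_forward() are hypothetical names, times are plain 64-bit nanosecond values rather than ktime_t, and the additional checks the real function performs are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a timer's expiry value, in nanoseconds. */
struct model_timer {
    int64_t expires;
};

/*
 * Model of the forwarding arithmetic: move the expiry forward in whole
 * multiples of interval until it lies after now, and return how many
 * intervals were added (the number of overruns).
 */
uint64_t model_forward(struct model_timer *t, int64_t now, int64_t interval)
{
    int64_t delta = now - t->expires;
    uint64_t overruns;

    assert(interval > 0);

    if (delta < 0)
        return 0;   /* expiry is already in the future: nothing to do */

    /* Skip every interval that has fully elapsed, plus one to pass now. */
    overruns = (uint64_t)(delta / interval) + 1;
    t->expires += interval * (int64_t)overruns;
    return overruns;
}
```

With an expiry of 100, now = 350 and an interval of 100, the model reports 3 overruns and leaves the expiry at 400. As in the note above, the model only changes the expiry value; it does not requeue anything.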

void hrtimer_start_range_ns(struct hrtimer * timer, ktime_t tim, u64 delta_ns, const enum hrtimer_mode mode)¶: (re)start an hrtimer

Parameters

struct hrtimer *timer: the timer to be added
ktime_t tim: expiry time
u64 delta_ns: “slack” range for the timer
const enum hrtimer_mode mode: timer mode: absolute (HRTIMER_MODE_ABS) or relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED); softirq based mode is considered for debug purpose only!

int hrtimer_try_to_cancel(struct hrtimer * timer)¶: try to deactivate a timer

Parameters

struct hrtimer *timer: hrtimer to stop

Return

0 when the timer was not active
1 when the timer was active
-1 when the timer is currently executing the callback function and cannot be stopped

int hrtimer_cancel(struct hrtimer * timer)¶: cancel a timer and wait for the handler to finish.

Parameters

struct hrtimer *timer: the timer to be cancelled

Return

0 when the timer was not active
1 when the timer was active
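The return values of hrtimer_try_to_cancel() and hrtimer_cancel() are related: the second never reports -1 because it waits for a running callback. The sketch below is a userspace model of that relationship under simplifying assumptions; the state enum, model_try_to_cancel(), model_cancel() and the wait hook are all hypothetical and do not reflect the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer states for this model; not the kernel's state bits. */
enum model_state { MODEL_INACTIVE, MODEL_ENQUEUED, MODEL_CALLBACK_RUNNING };

struct model_timer {
    enum model_state state;
};

/* Mirrors the documented results: 0 inactive, 1 active, -1 callback running. */
int model_try_to_cancel(struct model_timer *t)
{
    switch (t->state) {
    case MODEL_CALLBACK_RUNNING:
        return -1;                  /* cannot be stopped right now */
    case MODEL_ENQUEUED:
        t->state = MODEL_INACTIVE;  /* dequeue the timer */
        return 1;
    default:
        return 0;
    }
}

/*
 * Cancel-and-wait expressed as a retry loop: keep trying while the
 * callback is running. wait_for_callback is a hypothetical hook that
 * stands in for however the caller waits for the handler to finish.
 */
int model_cancel(struct model_timer *t,
                 void (*wait_for_callback)(struct model_timer *))
{
    int ret;

    assert(wait_for_callback != NULL);
    do {
        ret = model_try_to_cancel(t);
        if (ret < 0)
            wait_for_callback(t);
    } while (ret < 0);
    return ret;
}

/* Example hook: pretend the callback finished without re-arming the timer. */
void model_callback_done(struct model_timer *t)
{
    t->state = MODEL_INACTIVE;
}
```

In this model, cancelling a timer whose callback is running and does not re-arm itself returns 0, because by the time the retry succeeds the timer is no longer active.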

ktime_t __hrtimer_get_remaining(const struct hrtimer * timer, bool adjust)¶: get remaining time for the timer

Parameters

const struct hrtimer *timer: the timer to read
bool adjust: adjust relative timers when CONFIG_TIME_LOW_RES=y

void hrtimer_init(struct hrtimer * timer, clockid_t clock_id, enum hrtimer_mode mode)¶: initialize a timer to the given clock

Parameters

struct hrtimer *timer: the timer to be initialized
clockid_t clock_id: the clock to be used
enum hrtimer_mode mode: The modes which are relevant for initialization: HRTIMER_MODE_ABS, HRTIMER_MODE_REL, HRTIMER_MODE_ABS_SOFT, HRTIMER_MODE_REL_SOFT

The PINNED variants of the above can be handed in, but the PINNED bit is ignored as pinning happens when the hrtimer is started.

void hrtimer_sleeper_start_expires(struct hrtimer_sleeper * sl, enum hrtimer_mode mode)¶: Start a hrtimer sleeper timer

Parameters

struct hrtimer_sleeper *sl: sleeper to be started
enum hrtimer_mode mode: timer mode abs/rel

Description

Wrapper around hrtimer_start_expires() for hrtimer_sleeper based timers to allow PREEMPT_RT to tweak the delivery mode (soft/hardirq context).

void hrtimer_init_sleeper(struct hrtimer_sleeper * sl, clockid_t clock_id, enum hrtimer_mode mode)¶: initialize sleeper to the given clock

Parameters

struct hrtimer_sleeper *sl: sleeper to be initialized
clockid_t clock_id: the clock to be used
enum hrtimer_mode mode: timer mode abs/rel

int schedule_hrtimeout_range(ktime_t * expires, u64 delta, const enum hrtimer_mode mode)¶: sleep until timeout

Parameters

ktime_t *expires: timeout value (ktime_t)
u64 delta: slack in expires timeout (ktime_t)
const enum hrtimer_mode mode: timer mode

Description

Make the current task sleep until the given expiry time has elapsed. The routine will return immediately unless the current task state has been set (see set_current_state()).

The delta argument gives the kernel the freedom to schedule the actual wakeup to a time that is both power and performance friendly. The kernel gives the normal best effort behavior for “expires + delta”, but may decide to fire the timer earlier, though no earlier than expires.

You can set the task state as follows:

TASK_UNINTERRUPTIBLE - at least timeout time is guaranteed to pass before the routine returns unless the current task is explicitly woken up (e.g. by wake_up_process()).

TASK_INTERRUPTIBLE - the routine may return early if a signal is delivered to the current task or the current task is explicitly woken up.

The current task state is guaranteed to be TASK_RUNNING when this routine returns.

Returns 0 when the timer has expired. If the task was woken before the timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or by an explicit wakeup, it returns -EINTR.

int schedule_hrtimeout(ktime_t * expires, const enum hrtimer_mode mode)¶: sleep until timeout

Parameters

ktime_t *expires: timeout value (ktime_t)
const enum hrtimer_mode mode: timer mode

Description

Make the current task sleep until the given expiry time has elapsed. The routine will return immediately unless the current task state has been set (see set_current_state()).

You can set the task state as follows:

TASK_UNINTERRUPTIBLE - at least timeout time is guaranteed to pass before the routine returns unless the current task is explicitly woken up (e.g. by wake_up_process()).

TASK_INTERRUPTIBLE - the routine may return early if a signal is delivered to the current task or the current task is explicitly woken up.

The current task state is guaranteed to be TASK_RUNNING when this routine returns.

Returns 0 when the timer has expired. If the task was woken before the timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or by an explicit wakeup, it returns -EINTR.

Workqueues and Kevents¶

struct workqueue_attrs¶: A struct for workqueue attributes.

Definition

struct workqueue_attrs {  int nice;  cpumask_var_t cpumask;  bool no_numa;};

Members

nice: nice level
cpumask: allowed CPUs
no_numa: disable NUMA affinity

Unlike other fields, no_numa isn’t a property of a worker_pool. It only modifies how apply_workqueue_attrs() selects pools and thus doesn’t participate in pool hash calculations or equality comparisons.

Description

This can be used to change attributes of an unbound workqueue.

work_pending(work)¶: Find out whether a work item is currently pending

Parameters

work: The work item in question

delayed_work_pending(w)¶: Find out whether a delayable work item is currently pending

Parameters

w: The work item in question

struct workqueue_struct *alloc_workqueue(const char * fmt, unsigned int flags, int max_active, ...)¶: allocate a workqueue

Parameters

const char *fmt: printf format for the name of the workqueue
unsigned int flags: WQ_* flags
int max_active: max in-flight work items, 0 for default
...: remaining args, used as arguments for fmt

Description

Allocate a workqueue with the specified parameters. For detailed information on WQ_* flags, please refer to Documentation/core-api/workqueue.rst.

Return

Pointer to the allocated workqueue on success, NULL on failure.

alloc_ordered_workqueue(fmt, flags, args)¶: allocate an ordered workqueue

Parameters

fmt: printf format for the name of the workqueue
flags: WQ_* flags (only WQ_FREEZABLE and WQ_MEM_RECLAIM are meaningful)
args: args for fmt

Description

Allocate an ordered workqueue. An ordered workqueue executes at most one work item at any given time in the queued order. They are implemented as unbound workqueues with max_active of one.

Return

Pointer to the allocated workqueue on success, NULL on failure.

bool queue_work(struct workqueue_struct * wq, struct work_struct * work)¶: queue work on a workqueue

Parameters

struct workqueue_struct *wq: workqueue to use
struct work_struct *work: work to queue

Description

Returns false if work was already on a queue, true otherwise.

We queue the work to the CPU on which it was submitted, but if the CPU dies it can be processed by another CPU.

Memory-ordering properties: If it returns true, guarantees that all stores preceding the call to queue_work() in the program order will be visible from the CPU which will execute work by the time such work executes, e.g.,

{ x is initially 0 }

CPU0                              CPU1
WRITE_ONCE(x, 1);                 [ work is being executed ]
r0 = queue_work(wq, work);        r1 = READ_ONCE(x);

Forbids: r0 == true && r1 == 0
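The "false if already on a queue" behaviour can be pictured as a single claimable flag per work item. The following single-threaded userspace sketch models only that return-value convention; struct model_work, struct model_wq and the model_* functions are hypothetical, and the sketch makes no attempt to reproduce per-CPU queueing or the memory-ordering guarantee described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_work {
    atomic_bool pending;                /* stands in for "already queued" */
    void (*func)(struct model_work *);
    struct model_work *next;
};

struct model_wq {
    struct model_work *head, *tail;     /* simple FIFO of queued items */
};

int model_runs;                         /* counts callback executions */

void model_count(struct model_work *w)
{
    (void)w;
    model_runs++;
}

/* Returns false if work was already queued, true if this call queued it. */
bool model_queue_work(struct model_wq *wq, struct model_work *work)
{
    assert(work->func != NULL);

    /* Claim the pending flag; only the first caller sees it clear. */
    if (atomic_exchange(&work->pending, true))
        return false;

    work->next = NULL;
    if (wq->tail)
        wq->tail->next = work;
    else
        wq->head = work;
    wq->tail = work;
    return true;
}

/* Run the oldest queued item; returns false if the queue was empty. */
bool model_run_one(struct model_wq *wq)
{
    struct model_work *work = wq->head;

    if (!work)
        return false;
    wq->head = work->next;
    if (!wq->head)
        wq->tail = NULL;

    /* In this model the flag is cleared before the callback runs,
     * so the item can be queued again from that point on. */
    atomic_store(&work->pending, false);
    work->func(work);
    return true;
}
```

Queueing the same item twice in a row yields true and then false; once the item has been picked up for execution, queueing it succeeds again.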

bool queue_delayed_work(struct workqueue_struct * wq, struct delayed_work * dwork, unsigned long delay)¶: queue work on a workqueue after delay

Parameters

struct workqueue_struct *wq: workqueue to use
struct delayed_work *dwork: delayable work to queue
unsigned long delay: number of jiffies to wait before queueing

Description

Equivalent to queue_delayed_work_on() but tries to use the local CPU.

bool mod_delayed_work(struct workqueue_struct * wq, struct delayed_work * dwork, unsigned long delay)¶: modify delay of or queue a delayed work

Parameters

struct workqueue_struct *wq: workqueue to use
struct delayed_work *dwork: work to queue
unsigned long delay: number of jiffies to wait before queueing

Description

mod_delayed_work_on() on local CPU.

bool schedule_work_on(int cpu, struct work_struct * work)¶: put work task on a specific cpu

Parameters

int cpu: cpu to put the work task on
struct work_struct *work: job to be done

Description

This puts a job on a specific cpu.

bool schedule_work(struct work_struct * work)¶: put work task in global workqueue

Parameters

struct work_struct *work: job to be done

Description

Returns false if work was already on the kernel-global workqueue and true otherwise.

This puts a job in the kernel-global workqueue if it was not already queued and leaves it in the same position on the kernel-global workqueue otherwise.

Shares the same memory-ordering properties of queue_work(), cf. the DocBook header of queue_work().

void flush_scheduled_work(void)¶: ensure that any scheduled work has run to completion.

Parameters

void: no arguments

Description

Forces execution of the kernel-global workqueue and blocks until its completion.

Think twice before calling this function! It’s very easy to get into trouble if you don’t take great care. Either of the following situations will lead to deadlock:

One of the work items currently on the workqueue needs to acquire a lock held by your code or its caller.
Your code is running in the context of a work routine.

They will be detected by lockdep when they occur, but the first might not occur very often. It depends on what work items are on the workqueue and what locks they need, which you have no control over.

In most situations flushing the entire workqueue is overkill; you merely need to know that a particular work item isn’t queued and isn’t running. In such cases you should use cancel_delayed_work_sync() or cancel_work_sync() instead.

bool schedule_delayed_work_on(int cpu, struct delayed_work * dwork, unsigned long delay)¶: queue work in global workqueue on CPU after delay

Parameters

int cpu: cpu to use
struct delayed_work *dwork: job to be done
unsigned long delay: number of jiffies to wait

Description

After waiting for a given time this puts a job in the kernel-global workqueue on the specified CPU.

bool schedule_delayed_work(struct delayed_work * dwork, unsigned long delay)¶: put work task in global workqueue after delay

Parameters

struct delayed_work *dwork: job to be done
unsigned long delay: number of jiffies to wait or 0 for immediate execution

Description

After waiting for a given time this puts a job in the kernel-global workqueue.

bool queue_work_on(int cpu, struct workqueue_struct * wq, struct work_struct * work)¶: queue work on specific cpu

Parameters

int cpu: CPU number to execute work on
struct workqueue_struct *wq: workqueue to use
struct work_struct *work: work to queue

Description

We queue the work to a specific CPU; the caller must ensure it can’t go away.

Return

false if work was already on a queue, true otherwise.

bool queue_work_node(int node, struct workqueue_struct * wq, struct work_struct * work)¶: queue work on a “random” cpu for a given NUMA node

Parameters

int node: NUMA node that we are targeting the work for
struct workqueue_struct *wq: workqueue to use
struct work_struct *work: work to queue

Description

We queue the work to a “random” CPU within a given NUMA node. The basic idea here is to provide a way to somehow associate work with a given NUMA node.

This function will only make a best effort attempt at getting this onto the right NUMA node. If no node is requested or the requested node is offline then we just fall back to standard queue_work behavior.

Currently the “random” CPU ends up being the first available CPU in the intersection of cpu_online_mask and the cpumask of the node, unless we are running on the node. In that case we just use the current CPU.

Return

false if work was already on a queue, true otherwise.

bool queue_delayed_work_on(int cpu, struct workqueue_struct * wq, struct delayed_work * dwork, unsigned long delay)¶: queue work on specific CPU after delay

Parameters

int cpu: CPU number to execute work on
struct workqueue_struct *wq: workqueue to use
struct delayed_work *dwork: work to queue
unsigned long delay: number of jiffies to wait before queueing

Return

false if work was already on a queue, true otherwise. If delay is zero and dwork is idle, it will be scheduled for immediate execution.

bool mod_delayed_work_on(int cpu, struct workqueue_struct * wq, struct delayed_work * dwork, unsigned long delay)¶: modify delay of or queue a delayed work on specific CPU

Parameters

int cpu: CPU number to execute work on
struct workqueue_struct *wq: workqueue to use
struct delayed_work *dwork: work to queue
unsigned long delay: number of jiffies to wait before queueing

Description

If dwork is idle, equivalent to queue_delayed_work_on(); otherwise, modify dwork’s timer so that it expires after delay. If delay is zero, work is guaranteed to be scheduled immediately regardless of its current state.

This function is safe to call from any context including IRQ handler. See try_to_grab_pending() for details.

Return

false if dwork was idle and queued, true if dwork was pending and its timer was modified.
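The return conventions of the delayed-work helpers are easy to mix up: queueing reports true when the item was newly queued, while modifying reports true when the item was already pending. The userspace sketch below models just those conventions, together with the one documented for cancel_delayed_work() elsewhere in this section. struct model_dwork and the model_* functions are hypothetical, and the "jiffies" values are plain integers supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dwork {
    bool pending;            /* timer armed or work queued */
    unsigned long expires;   /* model time at which the work should run */
};

/* Queue convention: false if already pending, true if newly queued. */
bool model_queue_delayed(struct model_dwork *dw, unsigned long now,
                         unsigned long delay)
{
    assert(dw != NULL);
    if (dw->pending)
        return false;        /* the existing expiry is left untouched */
    dw->pending = true;
    dw->expires = now + delay;
    return true;
}

/* Modify convention: false if it was idle, true if a pending timer moved. */
bool model_mod_delayed(struct model_dwork *dw, unsigned long now,
                       unsigned long delay)
{
    bool was_pending;

    assert(dw != NULL);
    was_pending = dw->pending;
    dw->pending = true;
    dw->expires = now + delay;   /* always takes the new delay */
    return was_pending;
}

/* Cancel convention: true if it was pending and has been cancelled. */
bool model_cancel_delayed(struct model_dwork *dw)
{
    bool was_pending;

    assert(dw != NULL);
    was_pending = dw->pending;
    dw->pending = false;
    return was_pending;
}
```

The practical difference shows up when the item is already pending: the queue variant leaves the original expiry alone, whereas the modify variant replaces it.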

bool queue_rcu_work(struct workqueue_struct * wq, struct rcu_work * rwork)¶: queue work after a RCU grace period

Parameters

struct workqueue_struct *wq: workqueue to use
struct rcu_work *rwork: work to queue

Return

false if rwork was already pending, true otherwise. Note that a full RCU grace period is guaranteed only after a true return. While rwork is guaranteed to be executed after a false return, the execution may happen before a full RCU grace period has passed.

void flush_workqueue(struct workqueue_struct * wq)¶: ensure that any scheduled work has run to completion.

Parameters

struct workqueue_struct *wq: workqueue to flush

Description

This function sleeps until all work items which were queued on entry have finished execution, but it is not livelocked by new incoming ones.

void drain_workqueue(struct workqueue_struct * wq)¶: drain a workqueue

Parameters

struct workqueue_struct *wq: workqueue to drain

Description

Wait until the workqueue becomes empty. While draining is in progress, only chain queueing is allowed. In other words, only currently pending or running work items on wq can queue further work items on it. wq is flushed repeatedly until it becomes empty. The number of flushes is determined by the depth of chaining and should be relatively small. Whine if it takes too long.

bool flush_work(struct work_struct * work)¶: wait for a work to finish executing the last queueing instance

Parameters

struct work_struct *work: the work to flush

Description

Wait until work has finished execution. work is guaranteed to be idle on return if it hasn’t been requeued since flush started.

Return

true if flush_work() waited for the work to finish execution, false if it was already idle.

bool cancel_work_sync(struct work_struct * work)¶: cancel a work and wait for it to finish

Parameters

struct work_struct *work: the work to cancel

Description

Cancel work and wait for its execution to finish. This function can be used even if the work re-queues itself or migrates to another workqueue. On return from this function, work is guaranteed to be not pending or executing on any CPU.

cancel_work_sync(delayed_work->work) must not be used for delayed work items. Use cancel_delayed_work_sync() instead.

The caller must ensure that the workqueue on which work was last queued can’t be destroyed before this function returns.

Return

true if work was pending, false otherwise.

bool flush_delayed_work(struct delayed_work * dwork)¶: wait for a dwork to finish executing the last queueing

Parameters

struct delayed_work *dwork: the delayed work to flush

Description

Delayed timer is cancelled and the pending work is queued for immediate execution. Like flush_work(), this function only considers the last queueing instance of dwork.

Return

true if flush_work() waited for the work to finish execution, false if it was already idle.

bool flush_rcu_work(struct rcu_work * rwork)¶: wait for a rwork to finish executing the last queueing

Parameters

struct rcu_work *rwork: the rcu work to flush

Return

true if flush_rcu_work() waited for the work to finish execution, false if it was already idle.

bool cancel_delayed_work(struct delayed_work * dwork)¶: cancel a delayed work

Parameters

struct delayed_work *dwork: delayed_work to cancel

Description

Kill off a pending delayed_work.

This function is safe to call from any context including IRQ handler.

Return

true if dwork was pending and canceled; false if it wasn’t pending.

Note

The work callback function may still be running on return, unless it returns true and the work doesn’t re-arm itself. Explicitly flush or use cancel_delayed_work_sync() to wait on it.

bool cancel_delayed_work_sync(struct delayed_work * dwork)¶: cancel a delayed work and wait for it to finish

Parameters

struct delayed_work *dwork: the delayed work to cancel

Description

This is cancel_work_sync() for delayed works.

Return

true if dwork was pending, false otherwise.

int execute_in_process_context(work_func_t fn, struct execute_work * ew)¶: reliably execute the routine with user context

Parameters

work_func_t fn: the function to execute
struct execute_work *ew: guaranteed storage for the execute work structure (must be available when the work executes)

Description

Executes the function immediately if process context is available, otherwise schedules the function for delayed execution.

Return

0 - function was executed
1 - function was scheduled for execution

void destroy_workqueue(struct workqueue_struct * wq)¶: safely terminate a workqueue

Parameters

struct workqueue_struct *wq: target workqueue

Description

Safely destroy a workqueue. All work currently pending will be done first.

void workqueue_set_max_active(struct workqueue_struct * wq, int max_active)¶: adjust max_active of a workqueue

Parameters

struct workqueue_struct *wq: target workqueue
int max_active: new max_active value.

Description

Set max_active of wq to max_active.

Context

Don’t call from IRQ context.

struct work_struct *current_work(void)¶: retrieve current task’s work struct

Parameters

void: no arguments

Description

Determine if current task is a workqueue worker and what it’s working on. Useful to find out the context that the current task is running in.

Return

work struct if current task is a workqueue worker, NULL otherwise.

bool workqueue_congested(int cpu, struct workqueue_struct * wq)¶: test whether a workqueue is congested

Parameters

int cpu: CPU in question
struct workqueue_struct *wq: target workqueue

Description

Test whether wq’s cpu workqueue for cpu is congested. There is no synchronization around this function and the test result is unreliable and only useful as advisory hints or for debugging.

If cpu is WORK_CPU_UNBOUND, the test is performed on the local CPU. Note that both per-cpu and unbound workqueues may be associated with multiple pool_workqueues which have separate congested states. A workqueue being congested on one CPU doesn’t mean the workqueue is also congested on other CPUs / NUMA nodes.

Return

true if congested, false otherwise.

unsigned int work_busy(struct work_struct * work)¶: test whether a work is currently pending or running

Parameters

struct work_struct *work: the work to be tested

Description

Test whether work is currently pending or running. There is no synchronization around this function and the test result is unreliable and only useful as advisory hints or for debugging.

Return

OR’d bitmask of WORK_BUSY_* bits.

void set_worker_desc(const char * fmt, ...)¶: set description for the current work item

Parameters

const char *fmt: printf-style format string
...: arguments for the format string

Description

This function can be called by a running work function to describe what the work item is about. If the worker task gets dumped, this information will be printed out together to help debugging. The description can be at most WORKER_DESC_LEN including the trailing ‘\0’.

long work_on_cpu(int cpu, long (*fn)(void *), void * arg)¶: run a function in thread context on a particular cpu

Parameters

int cpu: the cpu to run on
long (*fn)(void *): the function to run
void *arg: the function arg

Description

It is up to the caller to ensure that the cpu doesn’t go offline. The caller must not hold any locks which would prevent fn from completing.

Return

The value fn returns.

long work_on_cpu_safe(int cpu, long (*fn)(void *), void * arg)¶: run a function in thread context on a particular cpu

Parameters

int cpu: the cpu to run on
long (*fn)(void *): the function to run
void *arg: the function argument

Description

Disables CPU hotplug and calls work_on_cpu(). The caller must not hold any locks which would prevent fn from completing.

Return

The value fn returns.

Internal Functions¶

int wait_task_stopped(struct wait_opts * wo, int ptrace, struct task_struct * p)¶: Wait for TASK_STOPPED or TASK_TRACED

Parameters

struct wait_opts *wo: wait options
int ptrace: is the wait for ptrace
struct task_struct *p: task to wait for

Description

Handle sys_wait4() work for p in state TASK_STOPPED or TASK_TRACED.

Context

read_lock(tasklist_lock), which is released if return value is non-zero. Also, grabs and releases p->sighand->siglock.

Return

0 if wait condition didn’t exist and search for other wait conditions should continue. Non-zero return, -errno on failure and p’s pid on success, implies that tasklist_lock is released and wait condition search should terminate.

bool task_set_jobctl_pending(struct task_struct * task, unsigned long mask)¶: set jobctl pending bits

Parameters

struct task_struct *task: target task
unsigned long mask: pending bits to set

Description

Set mask in task->jobctl. mask must be a subset of JOBCTL_PENDING_MASK | JOBCTL_STOP_CONSUME | JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING. If stop signo is being set, the existing signo is cleared. If task is already being killed or exiting, this function becomes a noop.

Context

Must be called with task->sighand->siglock held.

Return

true if mask is set, false if made noop because task was dying.
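The rule that an existing stop signo is cleared before a new one is set matters because the signo occupies a multi-bit field: OR-ing a new number over an old one would mix the two. The sketch below is a userspace model of the described behaviour. The M_* bit layout, struct model_task and model_set_jobctl_pending() are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit layout for this model only; the kernel's values may differ. */
#define M_STOP_SIGMASK  0xffffUL        /* signo of the last group stop */
#define M_STOP_PENDING  (1UL << 17)
#define M_STOP_CONSUME  (1UL << 18)
#define M_TRAP_STOP     (1UL << 19)
#define M_TRAPPING      (1UL << 21)
#define M_PENDING_MASK  (M_STOP_PENDING | M_TRAP_STOP)
#define M_SETTABLE      (M_PENDING_MASK | M_STOP_CONSUME | \
                         M_STOP_SIGMASK | M_TRAPPING)

struct model_task {
    unsigned long jobctl;
    bool dying;             /* stands in for "being killed or exiting" */
};

bool model_set_jobctl_pending(struct model_task *t, unsigned long mask)
{
    /* mask must stay within the set of bits that may be set here. */
    assert((mask & ~M_SETTABLE) == 0);

    if (t->dying)
        return false;                   /* no-op for a dying task */

    /* A new signo replaces the old one instead of being OR-ed into it. */
    if (mask & M_STOP_SIGMASK)
        t->jobctl &= ~M_STOP_SIGMASK;

    t->jobctl |= mask;
    return true;
}
```

Starting from a recorded signo of 19, setting signo 20 leaves exactly 20 in the field; without the clearing step the field would hold 19 | 20 = 23.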

void task_clear_jobctl_trapping(struct task_struct * task)¶: clear jobctl trapping bit

Parameters

struct task_struct *task: target task

Description

If JOBCTL_TRAPPING is set, a ptracer is waiting for us to enter TRACED. Clear it and wake up the ptracer. Note that we don’t need any further locking. task->siglock guarantees that task->parent points to the ptracer.

Context

Must be called with task->sighand->siglock held.

void task_clear_jobctl_pending(struct task_struct * task, unsigned long mask)¶: clear jobctl pending bits

Parameters

struct task_struct *task: target task
unsigned long mask: pending bits to clear

Description

Clear mask from task->jobctl. mask must be a subset of JOBCTL_PENDING_MASK. If JOBCTL_STOP_PENDING is being cleared, other STOP bits are cleared together.

If clearing of mask leaves no stop or trap pending, this function calls task_clear_jobctl_trapping().

Context

Must be called with task->sighand->siglock held.

bool task_participate_group_stop(struct task_struct * task)¶: participate in a group stop

Parameters

struct task_struct *task: task participating in a group stop

Description

task has JOBCTL_STOP_PENDING set and is participating in a group stop. Group stop states are cleared and the group stop count is consumed if JOBCTL_STOP_CONSUME was set. If the consumption completes the group stop, the appropriate SIGNAL_* flags are set.

Context

Must be called with task->sighand->siglock held.

Return

true if group stop completion should be notified to the parent, false otherwise.

void ptrace_trap_notify(struct task_struct * t)¶: schedule trap to notify ptracer

Parameters

struct task_struct *t: tracee wanting to notify tracer

Description

This function schedules sticky ptrace trap which is cleared on the next TRAP_STOP to notify ptracer of an event. t must have been seized by ptracer.

If t is running, STOP trap will be taken. If trapped for STOP and ptracer is listening for events, tracee is woken up so that it can re-trap for the new event. If trapped otherwise, STOP trap will be eventually taken without returning to userland after the existing traps are finished by PTRACE_CONT.

Context

Must be called with task->sighand->siglock held.

void do_notify_parent_cldstop(struct task_struct * tsk, bool for_ptracer, int why)¶: notify parent of stopped/continued state change

Parameters

struct task_struct *tsk: task reporting the state change
bool for_ptracer: the notification is for ptracer
int why: CLD_{CONTINUED|STOPPED|TRAPPED} to report

Description

Notify tsk’s parent that the stopped/continued state has changed. If for_ptracer is false, tsk’s group leader notifies its real parent. If true, tsk reports to tsk->parent which should be the ptracer.

Context

Must be called with tasklist_lock at least read locked.

bool do_signal_stop(int signr)¶: handle group stop for SIGSTOP and other stop signals

Parameters

int signr: signr causing group stop if initiating

Description

If JOBCTL_STOP_PENDING is not set yet, initiate group stop with signr and participate in it. If already set, participate in the existing group stop. If participated in a group stop (and thus slept), true is returned with siglock released.

If ptraced, this function doesn’t handle stop itself. Instead, JOBCTL_TRAP_STOP is scheduled and false is returned with siglock untouched. The caller must ensure that INTERRUPT trap handling takes place afterwards.

Context

Must be called with current->sighand->siglock held, which is released on true return.

Return

false if group stop is already cancelled or ptrace trap is scheduled. true if participated in group stop.

void do_jobctl_trap(void)¶: take care of ptrace jobctl traps

Parameters

void: no arguments

Description

When PT_SEIZED, it’s used for both group stop and explicit SEIZE/INTERRUPT traps. Both generate PTRACE_EVENT_STOP trap with accompanying siginfo. If stopped, lower eight bits of exit_code contain the stop signal; otherwise, SIGTRAP.

When !PT_SEIZED, it’s used only for group stop trap with stop signal number as exit_code and no siginfo.

Context

Must be called with current->sighand->siglock held, which may be released and re-acquired before returning with intervening sleep.

void do_freezer_trap(void)¶: handle the freezer jobctl trap

Parameters

void: no arguments

Description

Puts the task into frozen state, unless the task is about to quit, in which case it drops JOBCTL_TRAP_FREEZE.

Context

Must be called with current->sighand->siglock held, which is always released before returning.

void signal_delivered(struct ksignal * ksig, int stepping)¶

Parameters

struct ksignal *ksig: kernel signal struct
int stepping: nonzero if debugger single-step or block-step in use

Description

This function should be called when a signal has successfully been delivered. It updates the blocked signals accordingly (ksig->ka.sa.sa_mask is always blocked, and the signal itself is blocked unless SA_NODEFER is set in ksig->ka.sa.sa_flags). Tracing is notified.

long sys_restart_syscall(void)¶: restart a system call

Parameters

void: no arguments

void set_current_blocked(sigset_t * newset)¶: change current->blocked mask

Parameters

sigset_t *newset: new mask

Description

It is wrong to change ->blocked directly; this helper should be used to ensure the process can’t miss a shared signal we are going to block.

long sys_rt_sigprocmask(int how, sigset_t __user * nset, sigset_t __user * oset, size_t sigsetsize)¶: change the list of currently blocked signals

Parameters

int how: whether to add, remove, or set signals
sigset_t __user *nset: signals to add or remove (if non-null)
sigset_t __user *oset: previous value of signal mask if non-null
size_t sigsetsize: size of sigset_t type

long sys_rt_sigpending(sigset_t __user * uset, size_t sigsetsize)¶: examine a pending signal that has been raised while blocked

Parameters

sigset_t __user *uset: stores pending signals
size_t sigsetsize: size of sigset_t type or larger

void copy_siginfo_to_external32(struct compat_siginfo * to, const struct kernel_siginfo * from)¶: copy a kernel siginfo into a compat user siginfo

Parameters

struct compat_siginfo *to: compat siginfo destination
const struct kernel_siginfo *from: kernel siginfo source

Note

This function does not work properly for SIGCHLD on x32, but fortunately it doesn’t have to. The only valid callers for this function are copy_siginfo_to_user32, which is overridden for x32, and the coredump code. The latter does not care because SIGCHLD will never cause a coredump.

int do_sigtimedwait(const sigset_t * which, kernel_siginfo_t * info, const struct timespec64 * ts)¶: wait for queued signals specified in which

Parameters

const sigset_t *which: queued signals to wait for
kernel_siginfo_t *info: if non-null, the signal’s siginfo is returned here
const struct timespec64 *ts: upper bound on process time suspension

long sys_rt_sigtimedwait(const sigset_t __user * uthese, siginfo_t __user * uinfo, const struct __kernel_timespec __user * uts, size_t sigsetsize)¶: synchronously wait for queued signals specified in uthese

Parameters

const sigset_t __user *uthese: queued signals to wait for
siginfo_t __user *uinfo: if non-null, the signal’s siginfo is returned here
const struct __kernel_timespec __user *uts: upper bound on process time suspension
size_t sigsetsize: size of sigset_t type
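A userspace caller usually reaches this through the POSIX sigtimedwait() wrapper. The following sketch, assuming a single-threaded program, shows the two outcomes that the "upper bound on process time suspension" implies: a timeout when nothing is queued, and the signal number when something is. wait_for_signal() and its raise_first parameter are hypothetical and exist only to make the example self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/*
 * Wait up to timeout_ms for signo, which is kept blocked while waiting.
 * If raise_first is nonzero, the signal is queued before the wait.
 * Returns the signal number on success or -errno on failure
 * (for example -EAGAIN when the timeout expires).
 */
int wait_for_signal(int signo, long timeout_ms, int raise_first)
{
    sigset_t set, old;
    siginfo_t info;
    struct timespec ts;
    int ret;

    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -errno;

    if (raise_first)
        raise(signo);               /* queue the signal before waiting */

    ret = sigtimedwait(&set, &info, &ts);
    if (ret < 0)
        ret = -errno;               /* capture errno before other calls */
    else
        assert(info.si_signo == ret);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return ret;
}
```

With nothing queued, wait_for_signal(SIGUSR2, 10, 0) is expected to return -EAGAIN after roughly ten milliseconds; with raise_first set, it returns SIGUSR2 without waiting for the timeout.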

long sys_kill(pid_t pid, int sig)¶: send a signal to a process

Parameters

pid_t pid: the PID of the process
int sig: signal to be sent

long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user * info, unsigned int flags)¶: Signal a process through a pidfd

Parameters

int pidfd: file descriptor of the process
int sig: signal to send
siginfo_t __user *info: signal info
unsigned int flags: future flags

Description

The syscall currently only signals via PIDTYPE_PID, which covers kill(<positive-pid>, <signal>). It does not signal threads or process groups. In order to extend the syscall to threads and process groups, the flags argument should be used. In essence, the flags argument will determine what is signaled and not the file descriptor itself. Put in other words, grouping is a property of the flags argument, not a property of the file descriptor.

Return

0 on success, negative errno on failure

long sys_tgkill(pid_t tgid, pid_t pid, int sig)¶: send signal to one specific thread

Parameters

pid_t tgid: the thread group ID of the thread
pid_t pid: the PID of the thread
int sig: signal to be sent

This syscall also checks the tgid and returns -ESRCH even if the PID exists but no longer belongs to the target process. This method solves the problem of threads exiting and PIDs getting reused.
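The tgid check can be observed from userspace on Linux by probing the calling thread with signal 0, which performs the lookup and permission checks without delivering anything. The sketch below invokes the system call directly through syscall(); probe_thread() and current_tid() are hypothetical helper names, and the example assumes a Linux system where SYS_tgkill and SYS_gettid are available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread ID of the caller, obtained directly from the kernel. */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/*
 * Probe (tgid, tid) with signal 0: nothing is delivered, only the
 * lookup is performed. Returns 0 if tid is a live thread belonging to
 * thread group tgid, or -errno otherwise.
 */
int probe_thread(pid_t tgid, pid_t tid)
{
    assert(tgid > 0 && tid > 0);
    if (syscall(SYS_tgkill, tgid, tid, 0) == -1)
        return -errno;
    return 0;
}
```

Probing the caller with its own process ID as tgid succeeds, while pairing the same thread ID with any other positive tgid fails with ESRCH even though the thread exists, which is the protection against PID reuse described above.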

long sys_tkill(pid_t pid, int sig)¶: send signal to one specific task

Parameters

pid_t pid: the PID of the task
int sig: signal to be sent

Send a signal to only one task, even if it’s a CLONE_THREAD task.

long sys_rt_sigqueueinfo(pid_t pid, int sig, siginfo_t __user * uinfo)¶: send a signal together with signal information to a process

Parameters

pid_t pid: the PID of the thread
int sig: signal to be sent
siginfo_t __user *uinfo: signal info to be sent

long sys_sigpending(old_sigset_t __user * uset)¶: examine pending signals

Parameters

old_sigset_t __user *uset: where mask of pending signal is returned

long sys_sigprocmask(int how, old_sigset_t __user * nset, old_sigset_t __user * oset)¶: examine and change blocked signals

Parameters

int how: whether to add, remove, or set signals
old_sigset_t __user *nset: signals to add or remove (if non-null)
old_sigset_t __user *oset: previous value of signal mask if non-null

Description

Some platforms have their own version with special arguments; others support only sys_rt_sigprocmask.

long sys_rt_sigaction(int sig, const struct sigaction __user * act, struct sigaction __user * oact, size_t sigsetsize)¶: alter an action taken by a process

Parameters

int sig: signal whose action is to be changed
const struct sigaction __user *act: new sigaction
struct sigaction __user *oact: used to save the previous sigaction
size_t sigsetsize: size of sigset_t type

long sys_rt_sigsuspend(sigset_t __user * unewset, size_t sigsetsize)¶: replace the signal mask with the unewset value until a signal is received

Parameters

sigset_t __user *unewset: new signal mask value
size_t sigsetsize: size of sigset_t type

kthread_create(threadfn, data, namefmt, arg)¶: create a kthread on the current node

Parameters

threadfn: the function to run in the thread
data: data pointer for threadfn()
namefmt: printf-style format string for the thread name
arg: arguments for namefmt.

Description

This macro will create a kthread on the current node, leaving it in the stopped state. This is just a helper for kthread_create_on_node(); see the documentation there for more details.

kthread_run(threadfn, data, namefmt, …)¶: create and wake a thread.

Parameters

threadfn: the function to run until signal_pending(current).
data: data ptr for threadfn.
namefmt: printf-style name for the thread.
...: variable arguments

Description

Convenient wrapper for kthread_create() followed by wake_up_process(). Returns the kthread or ERR_PTR(-ENOMEM).

bool kthread_should_stop(void)¶: should this kthread return now?

Parameters

void: no arguments

Description

When someone calls kthread_stop() on your kthread, it will be woken and this will return true. You should then return, and your return value will be passed through to kthread_stop().

bool kthread_should_park(void)¶: should this kthread park now?

Parameters

void: no arguments

Description

When someone calls kthread_park() on your kthread, it will be woken and this will return true. You should then do the necessary cleanup and call kthread_parkme().

Similar to kthread_should_stop(), but this keeps the thread alive and in a park position. kthread_unpark() “restarts” the thread and calls the thread function again.

bool kthread_freezable_should_stop(bool * was_frozen)¶: should this freezable kthread return now?

Parameters

bool *was_frozen: optional out parameter, indicates whether current was frozen

Description

kthread_should_stop() for freezable kthreads, which will enter the refrigerator if necessary. This function is safe from kthread_stop() / freezer deadlock, and freezable kthreads should use this function instead of calling try_to_freeze() directly.

void *kthread_func(struct task_struct * task)¶: return the function specified on kthread creation

Parameters

struct task_struct *task: kthread task in question

Description

Returns NULL if the task is not a kthread.

void *kthread_data(struct task_struct * task)¶: return data value specified on kthread creation

Parameters

struct task_struct *task: kthread task in question

Description

Return the data value specified when kthread task was created. The caller is responsible for ensuring the validity of task when calling this function.

struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), void * data, int node, const char namefmt[], ...)¶: create a kthread.

Parameters

int (*threadfn)(void *data): the function to run until signal_pending(current).
void *data: data ptr for threadfn.
int node: task and thread structures for the thread are allocated on this node
const char namefmt[]: printf-style name for the thread.
...: variable arguments

Description

This helper function creates and names a kernel thread. The thread will be stopped: use wake_up_process() to start it. See also kthread_run(). The new thread has SCHED_NORMAL policy and is affine to all CPUs.

If the thread is going to be bound to a particular cpu, give its node in node to get NUMA affinity for the kthread stack, or else give NUMA_NO_NODE. When woken, the thread will run threadfn() with data as its argument. threadfn() can either call do_exit() directly if it is a standalone thread for which no one will call kthread_stop(), or return when kthread_should_stop() is true (which means kthread_stop() has been called). The return value should be zero or a negative error number; it will be passed to kthread_stop().

Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).

void kthread_bind(struct task_struct * p, unsigned int cpu)¶: bind a just-created kthread to a cpu.

Parameters

struct task_struct *p: thread created by kthread_create().
unsigned int cpu: cpu (might not be online, must be possible) for p to run on.

Description

This function is equivalent to set_cpus_allowed(), except that cpu doesn’t need to be online, and the thread must be stopped (i.e., just returned from kthread_create()).

void kthread_unpark(struct task_struct * k)¶: unpark a thread created by kthread_create().

Parameters

struct task_struct *k: thread created by kthread_create().

Description

Sets kthread_should_park() for k to return false, wakes it, and waits for it to return. If the thread is marked percpu then it is bound to the cpu again.

int kthread_park(struct task_struct * k)¶: park a thread created by kthread_create().

Parameters

struct task_struct *k: thread created by kthread_create().

Description

Sets kthread_should_park() for k to return true, wakes it, and waits for it to return. This can also be called after kthread_create() instead of calling wake_up_process(): the thread will park without calling threadfn().

Returns 0 if the thread is parked, -ENOSYS if the thread exited. If called by the kthread itself, just the park bit is set.

int kthread_stop(struct task_struct * k)¶: stop a thread created by kthread_create().

Parameters

struct task_struct *k: thread created by kthread_create().

Description

Sets kthread_should_stop() for k to return true, wakes it, and waits for it to exit. This can also be called after kthread_create() instead of calling wake_up_process(): the thread will exit without calling threadfn().

If threadfn() may call do_exit() itself, the caller must ensure task_struct can’t go away.

Returns the result of threadfn(), or -EINTR if wake_up_process() was never called.

int kthread_worker_fn(void * worker_ptr)¶: kthread function to process kthread_worker

Parameters

void *worker_ptr: pointer to initialized kthread_worker

Description

This function implements the main cycle of a kthread worker. It processes work_list until it is stopped with kthread_stop(). It sleeps when the queue is empty.

The works are not allowed to keep any locks, or to leave preemption or interrupts disabled, when they finish. A safe point for freezing is defined after one work finishes and before a new one is started.

Also, the works must not be handled by more than one worker at the same time; see also kthread_queue_work().

struct kthread_worker *kthread_create_worker(unsigned int flags, const char namefmt[], ...)¶: create a kthread worker

Parameters

unsigned int flags: flags modifying the default behavior of the worker
const char namefmt[]: printf-style name for the kthread worker (task).
...: variable arguments

Description

Returns a pointer to the allocated worker on success, ERR_PTR(-ENOMEM) when the needed structures could not get allocated, and ERR_PTR(-EINTR) when the worker was SIGKILLed.

struct kthread_worker *kthread_create_worker_on_cpu(int cpu, unsigned int flags, const char namefmt[], ...)¶: create a kthread worker and bind it to a given CPU and the associated NUMA node.

Parameters

int cpu: CPU number
unsigned int flags: flags modifying the default behavior of the worker
const char namefmt[]: printf-style name for the kthread worker (task).
...: variable arguments

Description

Use a valid CPU number if you want to bind the kthread worker to the given CPU and the associated NUMA node.

A good practice is to also add the cpu number to the worker name. For example, use kthread_create_worker_on_cpu(cpu, "helper/%d", cpu).

Returns a pointer to the allocated worker on success, ERR_PTR(-ENOMEM) when the needed structures could not get allocated, and ERR_PTR(-EINTR) when the worker was SIGKILLed.

bool kthread_queue_work(struct kthread_worker * worker, struct kthread_work * work)¶: queue a kthread_work

Parameters

struct kthread_worker *worker: target kthread_worker
struct kthread_work *work: kthread_work to queue

Description

Queue work to the work processor worker for async execution. worker must have been created with kthread_create_worker(). Returns true if work was successfully queued, false if it was already pending.

Reinitialize the work if it needs to be used by another worker, for example when the worker was stopped and started again.

void kthread_delayed_work_timer_fn(struct timer_list * t)¶: callback that queues the associated kthread delayed work when the timer expires.

Parameters

struct timer_list *t: pointer to the expired timer

Description

The format of the function is defined by struct timer_list. It should be called from an irqsafe timer with IRQs already off.

bool kthread_queue_delayed_work(struct kthread_worker * worker, struct kthread_delayed_work * dwork, unsigned long delay)¶: queue the associated kthread work after a delay.

Parameters

struct kthread_worker *worker: target kthread_worker
struct kthread_delayed_work *dwork: kthread_delayed_work to queue
unsigned long delay: number of jiffies to wait before queuing

Description

If the work is not already pending, this starts a timer that will queue the work after the given delay. If delay is zero, it queues the work immediately.

Return

false if the work was already pending, meaning that either the timer was running or the work was queued. It returns true otherwise.

void kthread_flush_work(struct kthread_work * work)¶: flush a kthread_work

Parameters

struct kthread_work *work: work to flush

Description

If work is queued or executing, wait for it to finish execution.

bool kthread_mod_delayed_work(struct kthread_worker * worker, struct kthread_delayed_work * dwork, unsigned long delay)¶: modify delay of or queue a kthread delayed work

Parameters

struct kthread_worker *worker: kthread worker to use
struct kthread_delayed_work *dwork: kthread delayed work to queue
unsigned long delay: number of jiffies to wait before queuing

Description

If dwork is idle, equivalent to kthread_queue_delayed_work(). Otherwise, modify dwork’s timer so that it expires after delay. If delay is zero, work is guaranteed to be queued immediately.

A special case is when the work is being canceled in parallel. It might be caused either by the real kthread_cancel_delayed_work_sync() or yet another kthread_mod_delayed_work() call. We let the other command win and return false here. The caller is supposed to synchronize these operations in a reasonable way.

This function is safe to call from any context, including an IRQ handler. See __kthread_cancel_work() and kthread_delayed_work_timer_fn() for details.

Return

true if dwork was pending and its timer was modified, false otherwise.

bool kthread_cancel_work_sync(struct kthread_work * work)¶: cancel a kthread work and wait for it to finish

Parameters

struct kthread_work *work: the kthread work to cancel

Description

Cancel work and wait for its execution to finish. This function can be used even if the work re-queues itself. On return from this function, work is guaranteed to be not pending or executing on any CPU.

kthread_cancel_work_sync(delayed_work->work) must not be used for delayed works. Use kthread_cancel_delayed_work_sync() instead.

The caller must ensure that the worker on which work was last queued can’t be destroyed before this function returns.

Return

true if work was pending, false otherwise.

bool kthread_cancel_delayed_work_sync(struct kthread_delayed_work * dwork)¶: cancel a kthread delayed work and wait for it to finish.

Parameters

struct kthread_delayed_work *dwork: the kthread delayed work to cancel

Description

This is kthread_cancel_work_sync() for delayed works.

Return

true if dwork was pending, false otherwise.

void kthread_flush_worker(struct kthread_worker * worker)¶: flush all current works on a kthread_worker

Parameters

struct kthread_worker *worker: worker to flush

Description

Wait until all currently executing or pending works on worker are finished.

void kthread_destroy_worker(struct kthread_worker * worker)¶: destroy a kthread worker

Parameters

struct kthread_worker *worker: worker to be destroyed

Description

Flush and destroy worker. The simple flush is enough because the kthread worker API is used only in trivial scenarios. No multi-step state machines are needed.

void kthread_use_mm(struct mm_struct * mm)¶: make the calling kthread operate on an address space

Parameters

struct mm_struct *mm: address space to operate on

void kthread_unuse_mm(struct mm_struct * mm)¶: reverse the effect of kthread_use_mm()

Parameters

struct mm_struct *mm: address space to operate on

void kthread_associate_blkcg(struct cgroup_subsys_state * css)¶: associate blkcg to current kthread

Parameters

struct cgroup_subsys_state *css: the cgroup info

Description

Current thread must be a kthread. The thread is running jobs on behalf of other threads. In some cases, we expect the jobs to attach the cgroup info of the original threads instead of that of the current thread. This function stores the original thread’s cgroup info in the current kthread context for later retrieval.

struct cgroup_subsys_state *kthread_blkcg(void)¶: get associated blkcg css of current kthread

Parameters

void: no arguments

Description

Current thread must be a kthread.

Reference counting¶

struct refcount_struct¶: variant of atomic_t specialized for reference counts

Definition

struct refcount_struct {  atomic_t refs;};

Members

refs: atomic_t counter field

Description

The counter saturates at REFCOUNT_SATURATED and will not move once there. This avoids wrapping the counter and causing ‘spurious’ use-after-free bugs.

void refcount_set(refcount_t * r, int n)¶: set a refcount’s value

Parameters

refcount_t *r: the refcount
int n: value to which the refcount will be set

unsigned int refcount_read(const refcount_t * r)¶: get a refcount’s value

Parameters

const refcount_t *r: the refcount

Return

the refcount’s value

bool refcount_add_not_zero(int i, refcount_t * r)¶: add a value to a refcount unless it is 0

Parameters

int i: the value to add to the refcount
refcount_t *r: the refcount

Description

Will saturate at REFCOUNT_SATURATED and WARN.

Provides no memory ordering; it is assumed the caller has guaranteed the object memory to be stable (RCU, etc.). It does provide a control dependency and thereby orders future stores. See the comment on top.

Use of this function is not recommended for the normal reference counting use case in which references are taken and released one at a time. In these cases, refcount_inc(), or one of its variants, should instead be used to increment a reference count.

Return

false if the passed refcount is 0, true otherwise

void refcount_add(int i, refcount_t * r)¶: add a value to a refcount

Parameters

int i: the value to add to the refcount
refcount_t *r: the refcount

Description

Similar to atomic_add(), but will saturate at REFCOUNT_SATURATED and WARN.

bool refcount_inc_not_zero(refcount_t * r)¶: increment a refcount unless it is 0

Parameters

refcount_t *r: the refcount to increment

Description

Similar to atomic_inc_not_zero(), but will saturate at REFCOUNT_SATURATED and WARN.

Return

true if the increment was successful, false otherwise

void refcount_inc(refcount_t * r)¶: increment a refcount

Parameters

refcount_t *r: the refcount to increment

Description

Similar to atomic_inc(), but will saturate at REFCOUNT_SATURATED and WARN.

Provides no memory ordering; it is assumed the caller already has a reference on the object.

Will WARN if the refcount is 0, as this represents a possible use-after-free condition.

bool refcount_sub_and_test(int i, refcount_t * r)¶: subtract from a refcount and test if it is 0

Parameters

int i: amount to subtract from the refcount
refcount_t *r: the refcount

Description

Similar to atomic_dec_and_test(), but it will WARN, return false and ultimately leak on underflow, and will fail to decrement when saturated at REFCOUNT_SATURATED.

Provides release memory ordering, such that prior loads and stores are done before, and provides an acquire ordering on success such that free() must come after.

Use of this function is not recommended for the normal reference counting use case in which references are taken and released one at a time. In these cases, refcount_dec(), or one of its variants, should instead be used to decrement a reference count.

Return

true if the resulting refcount is 0, false otherwise

bool refcount_dec_and_test(refcount_t * r)¶: decrement a refcount and test if it is 0

Parameters

refcount_t *r: the refcount

Description

Similar to atomic_dec_and_test(), it will WARN on underflow and fail to decrement when saturated at REFCOUNT_SATURATED.

Provides release memory ordering, such that prior loads and stores are done before, and provides an acquire ordering on success such that free() must come after.

Return

true if the resulting refcount is 0, false otherwise
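The get/put lifetime pattern that refcount_inc_not_zero() and refcount_dec_and_test() support can be sketched in userspace with C11 atomics. This is only a model under stated assumptions: struct obj, obj_new(), obj_tryget(), obj_put() and the obj_freed counter are hypothetical names, the kernel's saturation and WARN behaviour is not reproduced, and the memory orderings are chosen to mirror the release/acquire description above rather than copied from the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for an object embedding a refcount_t. */
struct obj {
	atomic_int refs;
	int payload;
};

static int obj_freed;	/* counts frees, for demonstration only */

static struct obj *obj_new(int payload)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refs, 1);	/* the creator holds the first reference */
	o->payload = payload;
	return o;
}

/* Models refcount_inc_not_zero(): take a reference unless the count is 0. */
static bool obj_tryget(struct obj *o)
{
	int old = atomic_load_explicit(&o->refs, memory_order_relaxed);

	do {
		if (old == 0)
			return false;	/* object is already on its way out */
	} while (!atomic_compare_exchange_weak_explicit(&o->refs, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

/* Models refcount_dec_and_test(): release on decrement, acquire before free. */
static bool obj_put(struct obj *o)
{
	if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release) != 1)
		return false;
	atomic_thread_fence(memory_order_acquire);
	free(o);
	obj_freed++;
	return true;
}
```

Only the caller that drops the count to zero sees obj_put() return true, which is why the free can safely live on that path.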

void refcount_dec(refcount_t * r)¶: decrement a refcount

Parameters

refcount_t *r: the refcount

Description

Similar to atomic_dec(), it will WARN on underflow and fail to decrement when saturated at REFCOUNT_SATURATED.

Provides release memory ordering, such that prior loads and stores are done before.

bool refcount_dec_if_one(refcount_t * r)¶: decrement a refcount if it is 1

Parameters

refcount_t *r: the refcount

Description

No atomic_t counterpart; it attempts a 1 -> 0 transition and returns the success thereof.

Like all decrement operations, it provides release memory order and provides a control dependency.

It can be used like a try-delete operator; this explicit case is provided rather than a generic cmpxchg, because the latter would allow implementing unsafe operations.

Return

true if the resulting refcount is 0, false otherwise

bool refcount_dec_not_one(refcount_t * r)¶: decrement a refcount if it is not 1

Parameters

refcount_t *r: the refcount

Description

No atomic_t counterpart; it decrements unless the value is 1, in which case it will return false.

Was often done like: atomic_add_unless(var, -1, 1)

Return

true if the decrement operation was successful, false otherwise
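The "decrement unless 1" behaviour can be modelled in userspace with a compare-and-exchange loop. The sketch below is an assumption-laden illustration, not the kernel code: model_dec_not_one() is a hypothetical name, and the kernel's handling of saturated or underflowing counters is deliberately left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of "decrement *r unless it is 1".
 * Returns true if the decrement happened, false if the value was 1.
 * Saturation and underflow checks are omitted in this sketch.
 */
static bool model_dec_not_one(atomic_int *r)
{
	int old = atomic_load_explicit(r, memory_order_relaxed);

	do {
		if (old == 1)
			return false;	/* the last reference is left alone */
	} while (!atomic_compare_exchange_weak_explicit(r, &old, old - 1,
							memory_order_release,
							memory_order_relaxed));
	return true;
}
```

Starting from a count of 3, two calls succeed and the third returns false, leaving the count at 1 for a slower path (for example one that takes a lock) to handle.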

bool refcount_dec_and_mutex_lock(refcount_t * r, struct mutex * lock)¶: return holding mutex if able to decrement refcount to 0

Parameters

refcount_t *r: the refcount
struct mutex *lock: the mutex to be locked

Description

Similar to atomic_dec_and_mutex_lock(), it will WARN on underflow and fail to decrement when saturated at REFCOUNT_SATURATED.

Provides release memory ordering, such that prior loads and stores are done before, and provides a control dependency such that free() must come after. See the comment on top.

Return

true, with the mutex held, if able to decrement the refcount to 0; false otherwise

bool refcount_dec_and_lock(refcount_t * r, spinlock_t * lock)¶: return holding spinlock if able to decrement refcount to 0

Parameters

refcount_t *r: the refcount
spinlock_t *lock: the spinlock to be locked

Description

Similar to atomic_dec_and_lock(), it will WARN on underflow and fail to decrement when saturated at REFCOUNT_SATURATED.

Provides release memory ordering, such that prior loads and stores are done before, and provides a control dependency such that free() must come after. See the comment on top.

Return

true, with the spinlock held, if able to decrement the refcount to 0; false otherwise

bool refcount_dec_and_lock_irqsave(refcount_t * r, spinlock_t * lock, unsigned long * flags)¶: return holding spinlock with disabled interrupts if able to decrement refcount to 0

Parameters

refcount_t *r: the refcount
spinlock_t *lock: the spinlock to be locked
unsigned long *flags: saved IRQ flags if the lock is acquired

Description

Same as refcount_dec_and_lock() above, except that the spinlock is acquired with interrupts disabled.

Return

true, with the spinlock held, if able to decrement the refcount to 0; false otherwise

Atomics¶

int arch_atomic_read(const atomic_t * v)¶: read atomic variable

Parameters

const atomic_t *v: pointer of type atomic_t

Description

Atomically reads the value of v.

void arch_atomic_set(atomic_t * v, int i)¶: set atomic variable

Parameters

atomic_t *v: pointer of type atomic_t
int i: required value

Description

Atomically sets the value of v to i.

void arch_atomic_add(int i, atomic_t * v)¶: add integer to atomic variable

Parameters

int i: integer value to add
atomic_t *v: pointer of type atomic_t

Description

Atomically adds i to v.

void arch_atomic_sub(int i, atomic_t * v)¶: subtract integer from atomic variable

Parameters

int i: integer value to subtract
atomic_t *v: pointer of type atomic_t

Description

Atomically subtracts i from v.

bool arch_atomic_sub_and_test(int i, atomic_t * v)¶: subtract value from variable and test result

Parameters

int i: integer value to subtract
atomic_t *v: pointer of type atomic_t

Description

Atomically subtracts i from v and returns true if the result is zero, or false for all other cases.

void arch_atomic_inc(atomic_t * v)¶: increment atomic variable

Parameters

atomic_t *v: pointer of type atomic_t

Description

Atomically increments v by 1.

void arch_atomic_dec(atomic_t * v)¶: decrement atomic variable

Parameters

atomic_t *v: pointer of type atomic_t

Description

Atomically decrements v by 1.

bool arch_atomic_dec_and_test(atomic_t * v)¶: decrement and test

Parameters

atomic_t *v: pointer of type atomic_t

Description

Atomically decrements v by 1 and returns true if the result is 0, or false for all other cases.

bool arch_atomic_inc_and_test(atomic_t * v)¶: increment and test

Parameters

atomic_t *v: pointer of type atomic_t

Description

Atomically increments v by 1 and returns true if the result is zero, or false for all other cases.

bool arch_atomic_add_negative(int i, atomic_t * v)¶: add and test if negative

Parameters

int i: integer value to add
atomic_t *v: pointer of type atomic_t

Description

Atomically adds i to v and returns true if the result is negative, or false when the result is greater than or equal to zero.

int arch_atomic_add_return(int i, atomic_t * v)¶: add integer and return

Parameters

int i: integer value to add
atomic_t *v: pointer of type atomic_t

Description

Atomically adds i to v and returns i + v.

int arch_atomic_sub_return(int i, atomic_t * v)¶: subtract integer and return

Parameters

int i: integer value to subtract
atomic_t *v: pointer of type atomic_t

Description

Atomically subtracts i from v and returns v - i.
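The difference between the "return" and "test" flavours is easy to miss: they report the new value (or a predicate on it), not the old one. A userspace sketch with C11 atomics, whose fetch operations return the old value, makes the relationship explicit. The model_* names are hypothetical and this is not how any architecture implements the arch_atomic_*() operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* New value after the addition, like the *_add_return() flavour. */
static int model_add_return(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i;
}

/* New value after the subtraction, like the *_sub_return() flavour. */
static int model_sub_return(int i, atomic_int *v)
{
	return atomic_fetch_sub(v, i) - i;
}

/* True only when the decrement brought the counter to zero. */
static bool model_dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

/* True when the sum is negative, like the *_add_negative() flavour. */
static bool model_add_negative(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i < 0;
}
```

Each helper derives its result from the value observed by the single atomic operation, so the answer stays consistent even if other threads modify the counter concurrently.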

Kernel objects manipulation¶

char *kobject_get_path(struct kobject * kobj, gfp_t gfp_mask)¶: Allocate memory and fill in the path for kobj.

Parameters

struct kobject *kobj: kobject in question, with which to build the path
gfp_t gfp_mask: the allocation type used to allocate the path

Return

The newly allocated memory; the caller must free it with kfree().

int kobject_set_name(struct kobject * kobj, const char * fmt, ...)¶: Set the name of a kobject.

Parameters

struct kobject *kobj: struct kobject to set the name of
const char *fmt: format string used to build the name
...: variable arguments

Description

This sets the name of the kobject. If you have already added the kobject to the system, you must call kobject_rename() in order to change the name of the kobject.

void kobject_init(struct kobject * kobj, struct kobj_type * ktype)¶: Initialize a kobject structure.

Parameters

struct kobject *kobj: pointer to the kobject to initialize
struct kobj_type *ktype: pointer to the ktype for this kobject.

Description

This function will properly initialize a kobject such that it can then be passed to the kobject_add() call.

After this function is called, the kobject MUST be cleaned up by a call to kobject_put(), not by a call to kfree() directly, to ensure that all of the memory is cleaned up properly.

int kobject_add(struct kobject * kobj, struct kobject * parent, const char * fmt, ...)¶: The main kobject add function.

Parameters

struct kobject *kobj: the kobject to add
struct kobject *parent: pointer to the parent of the kobject.
const char *fmt: format to name the kobject with.
...: variable arguments

Description

The kobject name is set and added to the kobject hierarchy in this function.

If parent is set, then the parent of the kobj will be set to it. If parent is NULL, then the parent of the kobj will be set to the kobject associated with the kset assigned to this kobject. If no kset is assigned to the kobject, then the kobject will be located in the root of the sysfs tree.

Note, no “add” uevent will be created with this call. The caller should set up all of the necessary sysfs files for the object and then call kobject_uevent() with the KOBJ_ADD parameter to ensure that userspace is properly notified of this kobject’s creation.

Return

If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Under no circumstances should the kobject that is passed to this function be directly freed with a call to kfree(); that can leak memory.

If this function returns success, kobject_put() must also be called in order to properly clean up the memory associated with the object.

In short, once this function is called, kobject_put() MUST be called when the use of the object is finished in order to properly free everything.

int kobject_init_and_add(struct kobject * kobj, struct kobj_type * ktype, struct kobject * parent, const char * fmt, ...)¶: Initialize a kobject structure and add it to the kobject hierarchy.

Parameters

struct kobject *kobj: pointer to the kobject to initialize
struct kobj_type *ktype: pointer to the ktype for this kobject.
struct kobject *parent: pointer to the parent of this kobject.
const char *fmt: the name of the kobject.
...: variable arguments

Description

This function combines the calls to kobject_init() and kobject_add().

If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. This is the same type of error handling as after a call to kobject_add(), and kobject lifetime rules are the same here.

int kobject_rename(struct kobject * kobj, const char * new_name)¶: Change the name of an object.

Parameters

struct kobject *kobj: object in question.
const char *new_name: object’s new name

Description

It is the responsibility of the caller to provide mutual exclusion between two different calls of kobject_rename() on the same kobject and to ensure that new_name is valid and won’t conflict with other kobjects.

int kobject_move(struct kobject * kobj, struct kobject * new_parent)¶: Move object to another parent.

Parameters

struct kobject *kobj: object in question.
struct kobject *new_parent: object’s new parent (can be NULL)

void kobject_del(struct kobject * kobj)¶: Unlink kobject from hierarchy.

Parameters

struct kobject *kobj: object.

Description

This is the function that should be called to delete an object successfully added via kobject_add().

struct kobject *kobject_get(struct kobject * kobj)¶: Increment refcount for object.

Parameters

struct kobject *kobj: object.

void kobject_put(struct kobject * kobj)¶: Decrement refcount for object.

Parameters

struct kobject *kobj: object.

Description

Decrement the refcount, and if 0, call kobject_cleanup().

struct kobject *kobject_create_and_add(const char * name, struct kobject * parent)¶: Create a struct kobject dynamically and register it with sysfs.

Parameters

const char *name: the name for the kobject
struct kobject *parent: the parent kobject of this kobject, if any.

Description

This function creates a kobject structure dynamically and registers it with sysfs. When you are finished with this structure, call kobject_put() and the structure will be dynamically freed when it is no longer being used.

If the kobject was not able to be created, NULL will be returned.

int kset_register(struct kset * k)¶: Initialize and add a kset.

Parameters

struct kset *k: kset.

void kset_unregister(struct kset * k)¶: Remove a kset.

Parameters

struct kset *k: kset.

struct kobject *kset_find_obj(struct kset * kset, const char * name)¶: Search for object in kset.

Parameters

struct kset *kset: kset we’re looking in.
const char *name: object’s name.

Description

Lock kset via kset->subsys, and iterate over kset->list, looking for a matching kobject. If a matching object is found, take a reference and return the object.

struct kset *kset_create_and_add(const char * name, const struct kset_uevent_ops * uevent_ops, struct kobject * parent_kobj)¶: Create a struct kset dynamically and add it to sysfs.

Parameters

const char *name: the name for the kset
const struct kset_uevent_ops *uevent_ops: a struct kset_uevent_ops for the kset
struct kobject *parent_kobj: the parent kobject of this kset, if any.

Description

This function creates a kset structure dynamically and registers it with sysfs. When you are finished with this structure, call kset_unregister() and the structure will be dynamically freed when it is no longer being used.

If the kset was not able to be created, NULL will be returned.

Kernel utility functions¶

REPEAT_BYTE(x)¶: repeat the value x multiple times as an unsigned long value

Parameters

x: value to repeat

NOTE

x is not checked for > 0xff; larger values produce odd results.
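A common use of a repeated byte pattern is word-at-a-time scanning. The userspace sketch below defines REPEAT_BYTE() the way it is usually written and uses it for the classic "does this word contain a zero byte" test; has_zero_byte() is a hypothetical helper added for illustration and is not part of the documented API.

```c
#include <stdbool.h>

/* Spread a single byte value across every byte of an unsigned long. */
#define REPEAT_BYTE(x)	((~0ul / 0xff) * (x))

/*
 * True if any byte of `word` is zero. Subtracting 0x01 from every byte
 * sets a byte's top bit only if that byte was zero (or already had its
 * top bit set, which the `& ~word` term filters out).
 */
static bool has_zero_byte(unsigned long word)
{
	return ((word - REPEAT_BYTE(0x01ul)) & ~word & REPEAT_BYTE(0x80ul)) != 0;
}
```

Because the macro scales with the width of unsigned long, the same helper works unchanged on 32-bit and 64-bit builds.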

ARRAY_SIZE(arr)¶: get the number of elements in array arr

Parameters

arr: array to be sized
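As a rough illustration, a simplified userspace version of the macro can be written with sizeof alone. This sketch assumes the argument really is an array (the kernel macro additionally rejects pointers at compile time, which is omitted here); primes and sum_primes() are hypothetical names used only for the example.

```c
#include <stddef.h>

/* Simplified: element count of a true array; no pointer check. */
#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))

static const int primes[] = { 2, 3, 5, 7, 11 };

/* Iterate without hard-coding the element count. */
static int sum_primes(void)
{
	int sum = 0;

	for (size_t i = 0; i < ARRAY_SIZE(primes); i++)
		sum += primes[i];
	return sum;
}
```

Adding or removing an initializer changes the loop bound automatically, which is the main reason to prefer the macro over a literal count.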

round_up(x, y)¶: round up to the next multiple of a specified power of 2

Parameters

x: the value to round
y: multiple to round up to (must be a power of 2)

Description

Rounds x up to the next multiple of y (which must be a power of 2). To perform arbitrary rounding up, use roundup() below.

round_down(x, y)¶: round down to the next multiple of a specified power of 2

Parameters

x: the value to round
y: multiple to round down to (must be a power of 2)

Description

Rounds x down to the next multiple of y (which must be a power of 2). To perform arbitrary rounding down, use rounddown() below.
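The power-of-2 restriction exists because both macros can then be implemented with bit masks instead of division. A userspace sketch, assuming a GCC-compatible compiler for __typeof__: ROUND_MASK() and page_align() are hypothetical names, and the 4096-byte page size is an assumption made only for the example.

```c
/* Mask of the low bits below the power-of-2 multiple y, in x's type. */
#define ROUND_MASK(x, y)	((__typeof__(x))((y) - 1))

/* Set all low bits of (x - 1), then add 1: next multiple of y at or above x. */
#define round_up(x, y)		((((x) - 1) | ROUND_MASK(x, y)) + 1)

/* Clear the low bits: nearest multiple of y at or below x. */
#define round_down(x, y)	((x) & ~ROUND_MASK(x, y))

/* Round a length up to a whole number of (assumed) 4096-byte pages. */
static unsigned long page_align(unsigned long len)
{
	return round_up(len, 4096ul);
}
```

If y is not a power of 2 the mask no longer describes "the low bits", and the result is meaningless, which is why the general-purpose variants exist.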

roundup(x, y)¶: round up to the next specified multiple

Parameters

x: the value to round up
y: multiple to round up to

Description

Rounds x up to the next multiple of y. If y will always be a power of 2, consider using the faster round_up().

rounddown(x, y)¶: round down to the next specified multiple

Parameters

x: the value to round
y: multiple to round down to

Description

Rounds x down to the next multiple of y. If y will always be a power of 2, consider using the faster round_down().
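For multiples that are not powers of 2, the rounding has to go through division and remainder. The following userspace sketch uses simplified macro bodies that evaluate their arguments more than once (so arguments with side effects should be avoided); pad_to_record() and the 24-byte record size are hypothetical and exist only to show a call site.

```c
/* Simplified models; each argument may be evaluated more than once. */
#define roundup(x, y)	((((x) + ((y) - 1)) / (y)) * (y))
#define rounddown(x, y)	((x) - ((x) % (y)))

/* Pad a length to a whole number of (assumed) 24-byte records. */
static unsigned int pad_to_record(unsigned int len)
{
	return roundup(len, 24u);
}
```

The division makes these slower than the mask-based variants, which is the trade-off the descriptions above refer to.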

upper_32_bits(n)¶: return bits 32-63 of a number

Parameters

n: the number we’re accessing

Description

A basic shift-right of a 64- or 32-bit quantity. Use this to suppress the “right shift count >= width of type” warning when that quantity is 32 bits.

lower_32_bits(n)¶: return bits 0-31 of a number

Parameters

n: the number we’re accessing
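A typical use is splitting a 64-bit address into two 32-bit register or descriptor fields. The sketch below is a userspace model: the double 16-bit shift is what keeps the expression well defined when the argument is only 32 bits wide. struct desc and desc_set_addr() are hypothetical names invented for the example.

```c
#include <stdint.h>

/* Two 16-bit shifts avoid an undefined 32-bit shift on 32-bit operands. */
#define upper_32_bits(n)	((uint32_t)(((n) >> 16) >> 16))
#define lower_32_bits(n)	((uint32_t)((n) & 0xffffffff))

/* Hypothetical hardware descriptor with a split 64-bit address. */
struct desc {
	uint32_t addr_lo;
	uint32_t addr_hi;
};

static void desc_set_addr(struct desc *d, uint64_t addr)
{
	d->addr_lo = lower_32_bits(addr);
	d->addr_hi = upper_32_bits(addr);
}
```

The same call site compiles cleanly whether the address type is 32 or 64 bits wide; in the 32-bit case the upper half is simply 0.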

might_sleep()¶: annotation for functions that can sleep

Description

This macro will print a stack trace if it is executed in an atomic context (spinlock, irq-handler, …). Additional sections where blocking is not allowed can be annotated with non_block_start() and non_block_end() pairs.

This is a useful debugging aid for catching problems early, rather than being bitten later when the calling function happens to sleep when it is not supposed to.

cant_sleep()¶: annotation for functions that cannot sleep

Description

This macro will print a stack trace if it is executed with preemption enabled.

non_block_start()¶: annotate the start of section where sleeping is prohibited

Description

This is on behalf of the oom reaper, specifically when it is calling the mmu notifiers. The problem is that if the notifier were to block on, for example, mutex_lock() and if the process which holds that mutex were to perform a sleeping memory allocation, the oom reaper is now blocked on completion of that memory allocation. Other blocking calls like wait_event() pose similar issues.

non_block_end()¶: annotate the end of section where sleeping is prohibited

Description

Closes a section opened by non_block_start().

abs(x)¶: return absolute value of an argument

Parameters

x: the value. If it is an unsigned type, it is converted to a signed type first. char is treated as if it were signed (regardless of whether it really is), but the macro’s return type is preserved as char.

Return

an absolute value of x.

u32 reciprocal_scale(u32 val, u32 ep_ro)¶: “scale” a value into range [0, ep_ro)

Parameters

u32 val: value
u32 ep_ro: right open interval endpoint

Description

Perform a “reciprocal multiplication” in order to “scale” a value into the range [0, ep_ro), where the upper interval endpoint is right-open. This is useful, e.g., for computing an index into an array containing ep_ro elements. Think of it as a sort of modulus, only that the result isn’t that of modulo. ;) Note that if the initial input is a small value, the result will be 0.

Return

a result based on val in interval [0, ep_ro).
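The scaling can be sketched in userspace as a 64-bit multiply followed by a 32-bit right shift. The sketch assumes stdint.h types in place of u32 and u64, and treats val as a fraction of the full 32-bit range.

```c
#include <stdint.h>

/*
 * Multiply in 64 bits, then keep the upper 32 bits of the product.
 * This maps val from [0, 2^32) onto [0, ep_ro) without a division.
 */
static inline uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}
```

Because the result depends on the high bits of val, small inputs such as 100 scale to 0, unlike a modulo. The technique therefore suits inputs that are spread across the full 32-bit range, such as hash values.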

int kstrtoul(const char * s, unsigned int base, unsigned long * res)¶: convert a string to an unsigned long

Parameters

const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x, the number will be parsed as hexadecimal (case insensitive); if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
unsigned long *res: Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for simple_strtoul(). Return code must be checked.
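The contract described above (an optional leading plus sign, an optional single trailing newline, and an error code instead of a silently truncated parse) can be approximated in userspace on top of strtoul(). The function parse_ulong() is a hypothetical stand-in written for this example, not the kernel implementation, and it may differ from kstrtoul() in corner cases.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace approximation of the kstrtoul() contract. */
static int parse_ulong(const char *s, unsigned int base, unsigned long *res)
{
	char *end;
	unsigned long val;

	if (*s == '+')
		s++;
	/* strtoul() would accept whitespace or a minus sign here; reject them. */
	if (!isxdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, (int)base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)
		return -EINVAL;
	if (*end == '\n')	/* a single trailing newline is allowed */
		end++;
	if (*end != '\0')	/* anything else left over is a parse error */
		return -EINVAL;

	*res = val;		/* written only on success */
	return 0;
}
```

As with kstrtoul(), the caller has to check the return value, because *res is left untouched on failure.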

int kstrtol(const char * s, unsigned int base, long * res)¶: convert a string to a long

Parameters

const char *s: The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
unsigned int base: The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics: if it begins with 0x, the number will be parsed as hexadecimal (case insensitive); if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
long *res: Where to write the result of the conversion on success.

Description

Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for simple_strtol(). Return code must be checked.

trace_printk(fmt,…)¶: printf formatting in the ftrace buffer

Parameters

fmt: the printf format for printing
...: variable arguments

Note

__trace_printk is an internal function for trace_printk() and the ip is passed in via the trace_printk() macro.

Description

This function allows a kernel developer to debug fast path sections that printk is not appropriate for. By scattering printk-like tracing calls through the code, a developer can quickly see where problems are occurring.

This is intended as a debugging tool for the developer only. Please refrain from leaving trace_printks scattered around in your code. (Extra memory is used for special buffers that are allocated when trace_printk() is used.)

A little optimization trick is done here. If there’s only one argument, there’s no need to scan the string for printf formats. The trace_puts() will suffice. But how can we take advantage of using trace_puts() when trace_printk() has only one argument? By stringifying the args and checking the size we can tell whether or not there are args. __stringify((__VA_ARGS__)) will turn into “()\0” with a size of 3 when there are no args; anything else will be bigger. All we need to do is define a string to this, and then take its size and compare to 3. If it’s bigger, use do_trace_printk(); otherwise, optimize it to trace_puts(). Then just let gcc optimize the rest.
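The size check can be reproduced in a few lines of userspace C. HAS_ARGS() is a hypothetical helper written for this sketch, not a kernel macro. Calling it with no variadic arguments relies on behaviour that GCC and Clang accept by default and that C23 standardizes, so strictly pedantic pre-C23 builds may warn.

```c
/* Two-level stringification so that macro arguments are expanded first. */
#define STRINGIFY_1(...) #__VA_ARGS__
#define STRINGIFY(...)   STRINGIFY_1(__VA_ARGS__)

/*
 * With no arguments after fmt, (__VA_ARGS__) stringifies to "()",
 * which occupies 3 bytes including the terminating NUL.
 * Any real argument makes the string longer.
 */
#define HAS_ARGS(fmt, ...) (sizeof(STRINGIFY((__VA_ARGS__))) > 3)
```

Because sizeof is evaluated at compile time, a branch on HAS_ARGS() can be resolved by the compiler, which is what lets the one-argument case collapse to the cheaper call.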

trace_puts(str)¶: write a string into the ftrace buffer

Parameters

str: the string to record

Note

__trace_bputs is an internal function for trace_puts and the ip is passed in via the trace_puts macro.

Description

This is similar to trace_printk() but is made for those really fast paths where a developer wants the least amount of “Heisenbug” effects and where the processing of the print format is still too much.

This is intended as a debugging tool for the developer only. Please refrain from leaving trace_puts scattered around in your code. (Extra memory is used for special buffers that are allocated when trace_puts() is used.)

Return

0 if nothing was written, a positive number if the string was (1 when __trace_bputs is used, strlen(str) when __trace_puts is used).

min(x,y)¶: return minimum of two values of the same or compatible types

Parameters

x: first value
y: second value

max(x,y)¶: return maximum of two values of the same or compatible types

Parameters

x: first value
y: second value

min3(x,y,z)¶: return minimum of three values

Parameters

x: first value
y: second value
z: third value

max3(x,y,z)¶: return maximum of three values

Parameters

x: first value
y: second value
z: third value

min_not_zero(x,y)¶: return the minimum that is _not_ zero, unless both are zero

Parameters

x: value1
y: value2
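The selection rule can be spelled out as a small function. The name min_not_zero_uint() and the fixed unsigned int type are assumptions made for this sketch, whereas the kernel macro is type-generic.

```c
/* Treat 0 as "unset": prefer the other value, else take the smaller one. */
static unsigned int min_not_zero_uint(unsigned int x, unsigned int y)
{
	if (x == 0)
		return y;	/* also covers the case where both are zero */
	if (y == 0)
		return x;
	return x < y ? x : y;
}
```

This pattern fits limits where zero means "no limit configured".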

clamp(val,lo,hi)¶: return a value clamped to a given range with strict typechecking

Parameters

val: current value
lo: lowest allowable value
hi: highest allowable value

Description

This macro does strict typechecking of lo/hi to make sure they are of the same type as val. See the otherwise unnecessary pointer comparisons in its definition.
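Setting the type checks aside, the clamping itself amounts to the following. clamp_int() is a hypothetical fixed-type helper for illustration, and it assumes lo <= hi.

```c
/* Return val limited to the closed range [lo, hi]; assumes lo <= hi. */
static int clamp_int(int val, int lo, int hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```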

min_t(type,x,y)¶: return minimum of two values, using the specified type

Parameters

type: data type to use
x: first value
y: second value

max_t(type,x,y)¶: return maximum of two values, using the specified type

Parameters

type: data type to use
x: first value
y: second value

clamp_t(type,val,lo,hi)¶: return a value clamped to a given range using a given type

Parameters

type: the type of variable to use
val: current value
lo: minimum allowable value
hi: maximum allowable value

Description

This macro does no typechecking and uses temporary variables of type type to make all the comparisons.

clamp_val(val,lo,hi)¶: return a value clamped to a given range using val’s type

Parameters

val: current value
lo: minimum allowable value
hi: maximum allowable value

Description

This macro does no typechecking and uses temporary variables of whatever type the input argument val is. This is useful when val is an unsigned type and lo and hi are literals that will otherwise be assigned a signed integer type.

swap(a,b)¶: swap values of a and b

Parameters

a: first value
b: second value

container_of(ptr,type,member)¶: cast a member of a structure out to the containing structure

Parameters

ptr: the pointer to the member.
type: the type of the container struct this is embedded in.
member: the name of the member within the struct.
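A userspace sketch of the idea: subtract the member's offset from the member pointer to recover the enclosing structure. This simplified macro omits the type checking that the kernel version performs, and struct widget and struct list_node are hypothetical types invented for the example.

```c
#include <stddef.h>

/* Simplified form: step back from the member by its offset in the struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types: a generic node embedded in a larger object. */
struct list_node {
	struct list_node *next;
};

struct widget {
	int id;
	struct list_node node;
};

/* Code that only sees the embedded node can get back to the widget. */
static struct widget *widget_from_node(struct list_node *n)
{
	return container_of(n, struct widget, node);
}
```

The pointer passed in must really point at the named member of an object of the given type, otherwise the result is meaningless.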

container_of_safe(ptr,type,member)¶: cast a member of a structure out to the containing structure

Parameters

ptr: the pointer to the member.
type: the type of the container struct this is embedded in.
member: the name of the member within the struct.

Description

If IS_ERR_OR_NULL(ptr), ptr is returned unchanged.

__visible int printk(const char * fmt, ...)¶: print a kernel message

Parameters

const char *fmt: format string
...: variable arguments

Description

This is printk(). It can be called from any context. We want it to work.

We try to grab the console_lock. If we succeed, it’s easy - we log the output and call the console drivers. If we fail to get the semaphore, we place the output into the log buffer and return. The current holder of the console_sem will notice the new output in console_unlock() and will send it to the consoles before releasing the lock.

One effect of this deferred printing is that code which calls printk() and then changes console_loglevel may break. This is because console_loglevel is inspected when the actual printing occurs.

Device Resource Management¶

void *devres_alloc_node(dr_release_t release, size_t size, gfp_t gfp, int nid)¶: Allocate device resource data

Parameters

dr_release_t release: Release function devres will be associated with
size_t size: Allocation size
gfp_t gfp: Allocation flags
int nid: NUMA node

Description

Allocate devres of size bytes. The allocated area is zeroed, then associated with release. The returned pointer can be passed to other devres_*() functions.

Return

Pointer to allocated devres on success, NULL on failure.

void devres_for_each_res(struct device * dev, dr_release_t release, dr_match_t match, void * match_data, void (*fn)(struct device *, void *, void *), void * data)¶: Resource iterator

Parameters

struct device *dev: Device to iterate resource from
dr_release_t release: Look for resources associated with this release function
dr_match_t match: Match function (optional)
void *match_data: Data for the match function
void (*)(struct device *, void *, void *) fn: Function to be called for each matched resource.
void *data: Data for fn, the 3rd parameter of fn

Description

Call fn for each devres of dev which is associated with release and for which match returns 1.

Return

void

void devres_free(void * res)¶: Free device resource data

Parameters

void *res: Pointer to devres data to free

Description

Free devres created with devres_alloc().

void devres_add(struct device * dev, void * res)¶: Register device resource

Parameters

struct device *dev: Device to add resource to
void *res: Resource to register

Description

Register devres res to dev. res should have been allocated using devres_alloc(). On driver detach, the associated release function will be invoked and devres will be freed automatically.

void *devres_find(struct device * dev, dr_release_t release, dr_match_t match, void * match_data)¶: Find device resource

Parameters

struct device *dev: Device to lookup resource from
dr_release_t release: Look for resources associated with this release function
dr_match_t match: Match function (optional)
void *match_data: Data for the match function

Description

Find the latest devres of dev which is associated with release and for which match returns 1. If match is NULL, it’s considered to match all.

Return

Pointer to found devres, NULL if not found.

void *devres_get(struct device * dev, void * new_res, dr_match_t match, void * match_data)¶: Find devres, if non-existent, add one atomically

Parameters

struct device *dev: Device to lookup or add devres for
void *new_res: Pointer to new initialized devres to add if not found
dr_match_t match: Match function (optional)
void *match_data: Data for the match function

Description

Find the latest devres of dev which has the same release function as new_res and for which match returns 1. If found, new_res is freed; otherwise, new_res is added atomically.

Return

Pointer to found or added devres.

void *devres_remove(struct device * dev, dr_release_t release, dr_match_t match, void * match_data)¶: Find a device resource and remove it

Parameters

struct device *dev: Device to find resource from
dr_release_t release: Look for resources associated with this release function
dr_match_t match: Match function (optional)
void *match_data: Data for the match function

Description

Find the latest devres of dev associated with release and for which match returns 1. If match is NULL, it’s considered to match all. If found, the resource is removed atomically and returned.

Return

Pointer to removed devres on success, NULL if not found.

int devres_destroy(struct device * dev, dr_release_t release, dr_match_t match, void * match_data)¶: Find a device resource and destroy it

Parameters

struct device *dev: Device to find resource from
dr_release_t release: Look for resources associated with this release function
dr_match_t match: Match function (optional)
void *match_data: Data for the match function

Description

Find the latest devres of dev associated with release and for which match returns 1. If match is NULL, it’s considered to match all. If found, the resource is removed atomically and freed.

Note that the release function for the resource will not be called; only the devres-allocated data will be freed. The caller becomes responsible for freeing any other data.

Return

0 if devres is found and freed, -ENOENT if not found.

int devres_release(struct device * dev, dr_release_t release, dr_match_t match, void * match_data)¶: Find a device resource and destroy it, calling release

Parameters

struct device *dev: Device to find resource from
dr_release_t release: Look for resources associated with this release function
dr_match_t match: Match function (optional)
void *match_data: Data for the match function

Description

Find the latest devres of dev associated with release and for which match returns 1. If match is NULL, it’s considered to match all. If found, the resource is removed atomically, the release function called and the resource freed.

Return

0 if devres is found and freed, -ENOENT if not found.

void *devres_open_group(struct device * dev, void * id, gfp_t gfp)¶: Open a new devres group

Parameters

struct device *dev: Device to open devres group for
void *id: Separator ID
gfp_t gfp: Allocation flags

Description

Open a new devres group for dev with id. For id, using a pointer to an object which won’t be used for another group is recommended. If id is NULL, an address-wise unique ID is created.

Return

ID of the new group, NULL on failure.

void devres_close_group(struct device * dev, void * id)¶: Close a devres group

Parameters

struct device *dev: Device to close devres group for
void *id: ID of target group, can be NULL

Description

Close the group identified by id. If id is NULL, the latest open group is selected.

void devres_remove_group(struct device * dev, void * id)¶: Remove a devres group

Parameters

struct device *dev: Device to remove group for
void *id: ID of target group, can be NULL

Description

Remove the group identified by id. If id is NULL, the latest open group is selected. Note that removing a group doesn’t affect any other resources.

int devres_release_group(struct device * dev, void * id)¶: Release resources in a devres group

Parameters

struct device *dev: Device to release group for
void *id: ID of target group, can be NULL

Description

Release all resources in the group identified by id. If id is NULL, the latest open group is selected. The selected group and groups properly nested inside the selected group are removed.

Return

The number of released non-group resources.

int devm_add_action(struct device * dev, void (*action)(void *), void * data)¶: add a custom action to list of managed resources

Parameters

struct device *dev: Device that owns the action
void (*)(void *) action: Function that should be called
void *data: Pointer to data passed to action implementation

Description

This adds a custom action to the list of managed resources so that it gets executed as part of standard resource unwinding.
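The idea of a per-device list of actions that is unwound at release time can be modelled in userspace. Everything below (struct toy_device, toy_add_action(), toy_release_all(), toy_record()) is a hypothetical toy invented for this sketch and is not kernel API. It assumes that unwinding runs the most recently registered action first.

```c
#include <stdlib.h>

/* Hypothetical model of one registered action and its argument. */
struct toy_action {
	void (*action)(void *);
	void *data;
	struct toy_action *prev;	/* action registered before this one */
};

/* Hypothetical device: only tracks the most recently added action. */
struct toy_device {
	struct toy_action *last;
};

/* Register an action; returns 0 on success, -1 if allocation fails. */
static int toy_add_action(struct toy_device *dev,
			  void (*action)(void *), void *data)
{
	struct toy_action *a = malloc(sizeof(*a));

	if (!a)
		return -1;
	a->action = action;
	a->data = data;
	a->prev = dev->last;
	dev->last = a;
	return 0;
}

/* Unwind: run and free every registered action, newest first. */
static void toy_release_all(struct toy_device *dev)
{
	while (dev->last) {
		struct toy_action *a = dev->last;

		dev->last = a->prev;
		a->action(a->data);
		free(a);
	}
}

/* Sample action that records the order in which it was called. */
static int toy_log[8];
static int toy_log_len;

static void toy_record(void *data)
{
	toy_log[toy_log_len++] = *(int *)data;
}
```

Registering cleanup next to the setup it undoes means the error paths and the normal teardown share one unwinding routine.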

void devm_remove_action(struct device * dev, void (*action)(void *), void * data)¶: removes previously added custom action

Parameters

struct device *dev: Device that owns the action
void (*)(void *) action: Function implementing the action
void *data: Pointer to data passed to action implementation

Description

Removes an instance of action previously added by devm_add_action(). Both action and data should match one of the existing entries.

void devm_release_action(struct device * dev, void (*action)(void *), void * data)¶: release previously added custom action

Parameters

struct device *dev: Device that owns the action
void (*)(void *) action: Function implementing the action
void *data: Pointer to data passed to action implementation

Description

Releases and removes an instance of action previously added by devm_add_action(). Both action and data should match one of the existing entries.

void *devm_kmalloc(struct device * dev, size_t size, gfp_t gfp)¶: Resource-managed kmalloc

Parameters

struct device *dev: Device to allocate memory for
size_t size: Allocation size
gfp_t gfp: Allocation gfp flags

Description

Managed kmalloc. Memory allocated with this function is automatically freed on driver detach. Like all other devres resources, the guaranteed alignment is that of unsigned long long.

Return

Pointer to allocated memory on success, NULL on failure.

char *devm_kstrdup(struct device * dev, const char * s, gfp_t gfp)¶: Allocate resource managed space and copy an existing string into that.

Parameters

struct device *dev: Device to allocate memory for
const char *s: the string to duplicate
gfp_t gfp: the GFP mask used in the devm_kmalloc() call when allocating memory

Return

Pointer to allocated string on success, NULL on failure.

const char *devm_kstrdup_const(struct device * dev, const char * s, gfp_t gfp)¶: resource managed conditional string duplication

Parameters

struct device *dev: device for which to duplicate the string
const char *s: the string to duplicate
gfp_t gfp: the GFP mask used in the kmalloc() call when allocating memory

Description

Strings allocated by devm_kstrdup_const will be automatically freed when the associated device is detached.

Return

The source string if it is in the .rodata section; otherwise it falls back to devm_kstrdup.

char *devm_kvasprintf(struct device * dev, gfp_t gfp, const char * fmt, va_list ap)¶: Allocate resource managed space and format a string into that.

Parameters

struct device *dev: Device to allocate memory for
gfp_t gfp: the GFP mask used in the devm_kmalloc() call when allocating memory
const char *fmt: The printf()-style format string
va_list ap: Arguments for the format string

Return

Pointer to allocated string on success, NULL on failure.

char *devm_kasprintf(struct device * dev, gfp_t gfp, const char * fmt, ...)¶: Allocate resource managed space and format a string into that.

Parameters

struct device *dev: Device to allocate memory for
gfp_t gfp: the GFP mask used in the devm_kmalloc() call when allocating memory
const char *fmt: The printf()-style format string
...: Arguments for the format string

Return

Pointer to allocated string on success, NULL on failure.

void devm_kfree(struct device * dev, const void * p)¶: Resource-managed kfree

Parameters

struct device *dev: Device this memory belongs to
const void *p: Memory to free

Description

Free memory allocated with devm_kmalloc().

void *devm_kmemdup(struct device * dev, const void * src, size_t len, gfp_t gfp)¶: Resource-managed kmemdup

Parameters

struct device *dev: Device this memory belongs to
const void *src: Memory region to duplicate
size_t len: Memory region length
gfp_t gfp: GFP mask to use

Description

Duplicate a region of memory using resource-managed kmalloc.

unsigned long devm_get_free_pages(struct device * dev, gfp_t gfp_mask, unsigned int order)¶: Resource-managed __get_free_pages

Parameters

struct device *dev: Device to allocate memory for
gfp_t gfp_mask: Allocation gfp flags
unsigned int order: Allocation size is (1 << order) pages

Description

Managed get_free_pages. Memory allocated with this function is automatically freed on driver detach.

Return

Address of allocated memory on success, 0 on failure.

void devm_free_pages(struct device * dev, unsigned long addr)¶: Resource-managed free_pages

Parameters

struct device *dev: Device this memory belongs to
unsigned long addr: Memory to free

Description

Free memory allocated with devm_get_free_pages(). Unlike free_pages, there is no need to supply the order.

void __percpu *__devm_alloc_percpu(struct device * dev, size_t size, size_t align)¶: Resource-managed alloc_percpu

Parameters

struct device *dev: Device to allocate per-cpu memory for
size_t size: Size of per-cpu memory to allocate
size_t align: Alignment of per-cpu memory to allocate

Description

Managed alloc_percpu. Per-cpu memory allocated with this function is automatically freed on driver detach.

Return

Pointer to allocated memory on success, NULL on failure.

void devm_free_percpu(struct device * dev, void __percpu * pdata)¶: Resource-managed free_percpu

Parameters

struct device *dev: Device this memory belongs to
void __percpu *pdata: Per-cpu memory to free

Description

Free memory allocated with devm_alloc_percpu().
