Workqueue
- Date: September, 2010
- Author: Tejun Heo <tj@kernel.org>
- Author: Florian Mickler <florian@mickler.org>
Introduction
There are many cases where an asynchronous process execution context is needed and the workqueue (wq) API is the most commonly used mechanism for such cases.
When such an asynchronous execution context is needed, a work item describing which function to execute is put on a queue. An independent thread serves as the asynchronous execution context. The queue is called workqueue and the thread is called worker.
While there are work items on the workqueue the worker executes the functions associated with the work items one after the other. When there is no work item left on the workqueue the worker becomes idle. When a new work item gets queued, the worker begins executing again.
Why Concurrency Managed Workqueue?
In the original wq implementation, a multi threaded (MT) wq had one worker thread per CPU and a single threaded (ST) wq had one worker thread system-wide. A single MT wq needed to keep around the same number of workers as the number of CPUs. The kernel grew a lot of MT wq users over the years and with the number of CPU cores continuously rising, some systems saturated the default 32k PID space just booting up.
Although MT wq wasted a lot of resources, the level of concurrency provided was unsatisfactory. The limitation was common to both ST and MT wq albeit less severe on MT. Each wq maintained its own separate worker pool. An MT wq could provide only one execution context per CPU while an ST wq one for the whole system. Work items had to compete for those very limited execution contexts leading to various problems including proneness to deadlocks around the single execution context.
The tension between the provided level of concurrency and resource usage also forced its users to make unnecessary tradeoffs like libata choosing to use ST wq for polling PIOs and accepting an unnecessary limitation that no two polling PIOs can progress at the same time. As MT wq don't provide much better concurrency, users which require a higher level of concurrency, like async or fscache, had to implement their own thread pool.
Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with focus on the following goals:
- Maintain compatibility with the original workqueue API.
- Use per-CPU unified worker pools shared by all wq to provide flexible level of concurrency on demand without wasting a lot of resource.
- Automatically regulate worker pool and level of concurrency so that the API users don't need to worry about such details.
The Design
In order to ease the asynchronous execution of functions a new abstraction, the work item, is introduced.
A work item is a simple struct that holds a pointer to the function that is to be executed asynchronously. Whenever a driver or subsystem wants a function to be executed asynchronously it has to set up a work item pointing to that function and queue that work item on a workqueue.
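For example, a driver can embed a work item in its own data structure, point it at a handler with INIT_WORK() and queue it from a context that cannot block. A minimal sketch (the mydrv_* names are illustrative assumptions; system_wq is just the default system workqueue):

#include <linux/workqueue.h>

/* Hypothetical per-device data with an embedded work item. */
struct mydrv_data {
        struct work_struct work;
        int event;
};

/* Executed later in process context by a kworker. */
static void mydrv_work_fn(struct work_struct *work)
{
        struct mydrv_data *data = container_of(work, struct mydrv_data, work);

        pr_info("handling event %d\n", data->event);
}

static void mydrv_setup(struct mydrv_data *data)
{
        INIT_WORK(&data->work, mydrv_work_fn);
}

/* Called from a context that cannot block, e.g. an interrupt handler. */
static void mydrv_kick(struct mydrv_data *data, int event)
{
        data->event = event;
        queue_work(system_wq, &data->work);     /* or schedule_work() */
}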
A work item can be executed in either a thread or the BH (softirq) context.
For threaded workqueues, special purpose threads, called [k]workers, execute the functions off of the queue, one after the other. If no work is queued, the worker threads become idle. These worker threads are managed in worker-pools.
The cmwq design differentiates between the user-facing workqueues that subsystems and drivers queue work items on and the backend mechanism which manages worker-pools and processes the queued work items.
There are two worker-pools, one for normal work items and the other for high priority ones, for each possible CPU and some extra worker-pools to serve work items queued on unbound workqueues - the number of these backing pools is dynamic.
BH workqueues use the same framework. However, as there can only be one concurrent execution context, there's no need to worry about concurrency. Each per-CPU BH worker pool contains only one pseudo worker which represents the BH execution context. A BH workqueue can be considered a convenience interface to softirq.
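As a sketch, queueing to a BH workqueue looks the same as queueing to any other workqueue (assuming a kernel that provides WQ_BH and system_bh_wq; the handler name is illustrative):

#include <linux/workqueue.h>

/* Runs in the queueing CPU's softirq context; must not sleep. */
static void mydrv_bh_fn(struct work_struct *work)
{
        /* ... acknowledge the hypothetical device ... */
}

static DECLARE_WORK(mydrv_bh_work, mydrv_bh_fn);

static void mydrv_raise_bh(void)
{
        /*
         * system_bh_wq is the system-provided BH workqueue; a driver can
         * also create its own with alloc_workqueue("mydrv_bh", WQ_BH, 0).
         */
        queue_work(system_bh_wq, &mydrv_bh_work);
}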
Subsystems and drivers can create and queue work items through special workqueue API functions as they see fit. They can influence some aspects of the way the work items are executed by setting flags on the workqueue they are putting the work item on. These flags include things like CPU locality, concurrency limits, priority and more. To get a detailed overview refer to the API description of alloc_workqueue() below.
When a work item is queued to a workqueue, the target worker-pool is determined according to the queue parameters and workqueue attributes and appended on the shared worklist of the worker-pool. For example, unless specifically overridden, a work item of a bound workqueue will be queued on the worklist of either normal or highpri worker-pool that is associated to the CPU the issuer is running on.
For any thread pool implementation, managing the concurrency level (how many execution contexts are active) is an important issue. cmwq tries to keep the concurrency at a minimal but sufficient level. Minimal to save resources and sufficient in that the system is used at its full capacity.
Each worker-pool bound to an actual CPU implements concurrency management by hooking into the scheduler. The worker-pool is notified whenever an active worker wakes up or sleeps and keeps track of the number of the currently runnable workers. Generally, work items are not expected to hog a CPU and consume many cycles. That means maintaining just enough concurrency to prevent work processing from stalling should be optimal. As long as there are one or more runnable workers on the CPU, the worker-pool doesn't start execution of a new work, but, when the last running worker goes to sleep, it immediately schedules a new worker so that the CPU doesn't sit idle while there are pending work items. This allows using a minimal number of workers without losing execution bandwidth.
Keeping idle workers around doesn't cost anything other than the memory space for kthreads, so cmwq holds onto idle ones for a while before killing them.
For unbound workqueues, the number of backing pools is dynamic. Unbound workqueue can be assigned custom attributes using apply_workqueue_attrs() and workqueue will automatically create backing worker pools matching the attributes. The responsibility of regulating concurrency level is on the users. There is also a flag to mark a bound wq to ignore the concurrency management. Please refer to the API section for details.
Forward progress guarantee relies on workers being creatable when more execution contexts are necessary, which in turn is guaranteed through the use of rescue workers. All work items which might be used on code paths that handle memory reclaim are required to be queued on wq's that have a rescue-worker reserved for execution under memory pressure. Else it is possible that the worker-pool deadlocks waiting for execution contexts to free up.
Application Programming Interface (API)
alloc_workqueue() allocates a wq. The original create_*workqueue() functions are deprecated and scheduled for removal. alloc_workqueue() takes three arguments - @name, @flags and @max_active. @name is the name of the wq and also used as the name of the rescuer thread if there is one.
A wq no longer manages execution resources but serves as a domain for forward progress guarantee, flush and work item attributes. @flags and @max_active control how work items are assigned execution resources, scheduled and executed.
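As an illustrative sketch (the queue name and flag choice are assumptions, not a recommendation for any particular driver), allocation and teardown might look like:

#include <linux/workqueue.h>

static struct workqueue_struct *mydrv_wq;

static int mydrv_init_wq(void)
{
        /*
         * Unbound, usable on the memory-reclaim path, default max_active
         * (0).  The name is also used for the rescuer thread created by
         * WQ_MEM_RECLAIM.
         */
        mydrv_wq = alloc_workqueue("mydrv", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
        if (!mydrv_wq)
                return -ENOMEM;
        return 0;
}

static void mydrv_exit_wq(void)
{
        /* Drains pending work items and frees the workqueue. */
        destroy_workqueue(mydrv_wq);
}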
flags
WQ_BH
  BH workqueues can be considered a convenience interface to softirq. BH workqueues are always per-CPU and all BH work items are executed in the queueing CPU's softirq context in the queueing order.

  All BH workqueues must have 0 max_active and WQ_HIGHPRI is the only allowed additional flag.

  BH work items cannot sleep. All other features such as delayed queueing, flushing and canceling are supported.
WQ_PERCPU
  Work items queued to a per-cpu wq are bound to a specific CPU. This flag is the right choice when CPU locality is important.

  This flag is the complement of WQ_UNBOUND.

WQ_UNBOUND
  Work items queued to an unbound wq are served by the special worker-pools which host workers which are not bound to any specific CPU. This makes the wq behave as a simple execution context provider without concurrency management. The unbound worker-pools try to start execution of work items as soon as possible. Unbound wq sacrifices locality but is useful for the following cases:
  - Wide fluctuation in the concurrency level requirement is expected and using bound wq may end up creating large number of mostly unused workers across different CPUs as the issuer hops through different CPUs.

  - Long running CPU intensive workloads which can be better managed by the system scheduler.
WQ_FREEZABLE
  A freezable wq participates in the freeze phase of the system suspend operations. Work items on the wq are drained and no new work item starts execution until thawed.
WQ_MEM_RECLAIM
  All wq which might be used in the memory reclaim paths MUST have this flag set. The wq is guaranteed to have at least one execution context regardless of memory pressure.
WQ_HIGHPRI
  Work items of a highpri wq are queued to the highpri worker-pool of the target cpu. Highpri worker-pools are served by worker threads with elevated nice level.

  Note that normal and highpri worker-pools don't interact with each other. Each maintains its separate pool of workers and implements concurrency management among its workers.
WQ_CPU_INTENSIVE
  Work items of a CPU intensive wq do not contribute to the concurrency level. In other words, runnable CPU intensive work items will not prevent other work items in the same worker-pool from starting execution. This is useful for bound work items which are expected to hog CPU cycles so that their execution is regulated by the system scheduler.

  Although CPU intensive work items don't contribute to the concurrency level, start of their executions is still regulated by the concurrency management and runnable non-CPU-intensive work items can delay execution of CPU intensive work items.

  This flag is meaningless for unbound wq.
max_active
@max_active determines the maximum number of execution contexts per CPU which can be assigned to the work items of a wq. For example, with @max_active of 16, at most 16 work items of the wq can be executing at the same time per CPU. This is always a per-CPU attribute, even for unbound workqueues.
The maximum limit for @max_active is 2048 and the default value used when 0 is specified is 1024. These values are chosen sufficiently high such that they are not the limiting factor while providing protection in runaway cases.
The number of active work items of a wq is usually regulated by the users of the wq, more specifically, by how many work items the users may queue at the same time. Unless there is a specific need for throttling the number of active work items, specifying '0' is recommended.
Some users depend on strict execution ordering where only one work item is in flight at any given time and the work items are processed in queueing order. While the combination of @max_active of 1 and WQ_UNBOUND used to achieve this behavior, this is no longer the case. Use alloc_ordered_workqueue() instead.
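A minimal sketch of such a replacement (the queue name is an illustrative assumption):

#include <linux/workqueue.h>

static struct workqueue_struct *mydrv_ordered_wq;

static int mydrv_init_ordered(void)
{
        /* At most one work item in flight, executed in queueing order. */
        mydrv_ordered_wq = alloc_ordered_workqueue("mydrv_ordered", 0);
        if (!mydrv_ordered_wq)
                return -ENOMEM;
        return 0;
}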
Example Execution Scenarios
The following example execution scenarios try to illustrate how cmwq behaves under different configurations.
Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU. w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms again before finishing. w1 and w2 burn CPU for 5ms then sleep for 10ms.
Ignoring all other tasks, works and processing overhead, and assuming simple FIFO scheduling, the following is one highly simplified version of possible sequences of events with the original wq.
TIME IN MSECS  EVENT
0              w0 starts and burns CPU
5              w0 sleeps
15             w0 wakes up and burns CPU
20             w0 finishes
20             w1 starts and burns CPU
25             w1 sleeps
35             w1 wakes up and finishes
35             w2 starts and burns CPU
40             w2 sleeps
50             w2 wakes up and finishes
And with cmwq with @max_active >= 3,
TIME IN MSECS  EVENT
0              w0 starts and burns CPU
5              w0 sleeps
5              w1 starts and burns CPU
10             w1 sleeps
10             w2 starts and burns CPU
15             w2 sleeps
15             w0 wakes up and burns CPU
20             w0 finishes
20             w1 wakes up and finishes
25             w2 wakes up and finishes
If @max_active == 2,
TIME IN MSECS  EVENT
0              w0 starts and burns CPU
5              w0 sleeps
5              w1 starts and burns CPU
10             w1 sleeps
15             w0 wakes up and burns CPU
20             w0 finishes
20             w1 wakes up and finishes
20             w2 starts and burns CPU
25             w2 sleeps
35             w2 wakes up and finishes
Now, let's assume w1 and w2 are queued to a different wq q1 which has WQ_CPU_INTENSIVE set,
TIME IN MSECS  EVENT
0              w0 starts and burns CPU
5              w0 sleeps
5              w1 and w2 start and burn CPU
10             w1 sleeps
15             w2 sleeps
15             w0 wakes up and burns CPU
20             w0 finishes
20             w1 wakes up and finishes
25             w2 wakes up and finishes
Guidelines
- Do not forget to use WQ_MEM_RECLAIM if a wq may process work items which are used during memory reclaim. Each wq with WQ_MEM_RECLAIM set has an execution context reserved for it. If there is dependency among multiple work items used during memory reclaim, they should be queued to separate wq each with WQ_MEM_RECLAIM.

- Unless strict ordering is required, there is no need to use ST wq.

- Unless there is a specific need, using 0 for @max_active is recommended. In most use cases, concurrency level usually stays well under the default limit.

- A wq serves as a domain for forward progress guarantee (WQ_MEM_RECLAIM), flush and work item attributes. Work items which are not involved in memory reclaim and don't need to be flushed as a part of a group of work items, and don't require any special attribute, can use one of the system wq. There is no difference in execution characteristics between using a dedicated wq and a system wq.

  Note: If something may generate more than @max_active outstanding work items (do stress test your producers), it may saturate a system wq and potentially lead to deadlock. It should utilize its own dedicated workqueue rather than the system wq.

- Unless work items are expected to consume a huge amount of CPU cycles, using a bound wq is usually beneficial due to the increased level of locality in wq operations and work item execution.
Affinity Scopes
An unbound workqueue groups CPUs according to its affinity scope to improve cache locality. For example, if a workqueue is using the default affinity scope of "cache", it will group CPUs according to last level cache boundaries. A work item queued on the workqueue will be assigned to a worker on one of the CPUs which share the last level cache with the issuing CPU. Once started, the worker may or may not be allowed to move outside the scope depending on the affinity_strict setting of the scope.
Workqueue currently supports the following affinity scopes.
default
  Use the scope in module parameter workqueue.default_affinity_scope which is always set to one of the scopes below.

cpu
  CPUs are not grouped. A work item issued on one CPU is processed by a worker on the same CPU. This makes unbound workqueues behave as per-cpu workqueues without concurrency management.

smt
  CPUs are grouped according to SMT boundaries. This usually means that the logical threads of each physical CPU core are grouped together.

cache
  CPUs are grouped according to cache boundaries. Which specific cache boundary is used is determined by the arch code. L3 is used in a lot of cases. This is the default affinity scope.

numa
  CPUs are grouped according to NUMA boundaries.

system
  All CPUs are put in the same group. Workqueue makes no effort to process a work item on a CPU close to the issuing CPU.
The default affinity scope can be changed with the module parameter workqueue.default_affinity_scope and a specific workqueue's affinity scope can be changed using apply_workqueue_attrs().
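A sketch of switching an unbound workqueue to a strict NUMA scope with the attrs helpers referenced above (error handling trimmed; whether apply_workqueue_attrs() is available to a given caller depends on the kernel version):

#include <linux/workqueue.h>

static int mydrv_set_numa_strict(struct workqueue_struct *wq)
{
        struct workqueue_attrs *attrs;
        int ret;

        attrs = alloc_workqueue_attrs();
        if (!attrs)
                return -ENOMEM;

        attrs->affn_scope = WQ_AFFN_NUMA;       /* group workers per NUMA node */
        attrs->affn_strict = true;              /* never leave the node */

        ret = apply_workqueue_attrs(wq, attrs);
        free_workqueue_attrs(attrs);
        return ret;
}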
If WQ_SYSFS is set, the workqueue will have the following affinity scope related interface files under its /sys/devices/virtual/workqueue/WQ_NAME/ directory.
affinity_scope
  Read to see the current affinity scope. Write to change.

  When default is the current scope, reading this file will also show the current effective scope in parentheses, for example, default(cache).

affinity_strict
  0 by default indicating that affinity scopes are not strict. When a work item starts execution, workqueue makes a best-effort attempt to ensure that the worker is inside its affinity scope, which is called repatriation. Once started, the scheduler is free to move the worker anywhere in the system as it sees fit. This enables benefiting from scope locality while still being able to utilize other CPUs if necessary and available.

  If set to 1, all workers of the scope are guaranteed always to be in the scope. This may be useful when crossing affinity scopes has other implications, for example, in terms of power consumption or workload isolation. Strict NUMA scope can also be used to match the workqueue behavior of older kernels.
Affinity Scopes and Performance
It'd be ideal if an unbound workqueue's behavior is optimal for the vast majority of use cases without further tuning. Unfortunately, in the current kernel, there exists a pronounced trade-off between locality and utilization necessitating explicit configurations when workqueues are heavily used.
Higher locality leads to higher efficiency where more work is performed for the same number of consumed CPU cycles. However, higher locality may also cause lower overall system utilization if the work items are not spread enough across the affinity scopes by the issuers. The following performance testing with dm-crypt clearly illustrates this trade-off.
The tests are run on a CPU with 12-cores/24-threads split across four L3 caches (AMD Ryzen 9 3900x). CPU clock boost is turned off for consistency. /dev/dm-0 is a dm-crypt device created on NVME SSD (Samsung 990 PRO) and opened with cryptsetup with default settings.
Scenario 1: Enough issuers and work spread across the machine
The command used:
$ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k --ioengine=libaio \
      --iodepth=64 --runtime=60 --numjobs=24 --time_based --group_reporting \
      --name=iops-test-job --verify=sha512
There are 24 issuers, each issuing 64 IOs concurrently. --verify=sha512 makes fio generate and read back the content each time which makes execution locality matter between the issuer and kcryptd. The following are the read bandwidths and CPU utilizations depending on different affinity scope settings on kcryptd measured over five runs. Bandwidths are in MiBps, and CPU util in percents.
| Affinity       | Bandwidth (MiBps) | CPU util (%) |
|----------------|-------------------|--------------|
| system         | 1159.40 ±1.34     | 99.31 ±0.02  |
| cache          | 1166.40 ±0.89     | 99.34 ±0.01  |
| cache (strict) | 1166.00 ±0.71     | 99.35 ±0.01  |
With enough issuers spread across the system, there is no downside to "cache", strict or otherwise. All three configurations saturate the whole machine but the cache-affine ones outperform by 0.6% thanks to improved locality.
Scenario 2: Fewer issuers, enough work for saturation
The command used:
$ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k \
      --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=8 \
      --time_based --group_reporting --name=iops-test-job --verify=sha512
The only difference from the previous scenario is --numjobs=8. There are a third of the issuers but there is still enough total work to saturate the system.
| Affinity       | Bandwidth (MiBps) | CPU util (%) |
|----------------|-------------------|--------------|
| system         | 1155.40 ±0.89     | 97.41 ±0.05  |
| cache          | 1154.40 ±1.14     | 96.15 ±0.09  |
| cache (strict) | 1112.00 ±4.64     | 93.26 ±0.35  |
This is more than enough work to saturate the system. Both "system" and "cache" are nearly saturating the machine but not fully. "cache" is using less CPU but the better efficiency puts it at the same bandwidth as "system".
Eight issuers moving around over four L3 cache scopes still allow "cache (strict)" to mostly saturate the machine but the loss of work conservation is now starting to hurt with 3.7% bandwidth loss.
Scenario 3: Even fewer issuers, not enough work to saturate
The command used:
$ fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=32k \
      --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=4 \
      --time_based --group_reporting --name=iops-test-job --verify=sha512
Again, the only difference is --numjobs=4. With the number of issuers reduced to four, there now isn't enough work to saturate the whole system and the bandwidth becomes dependent on completion latencies.
| Affinity       | Bandwidth (MiBps) | CPU util (%) |
|----------------|-------------------|--------------|
| system         | 993.60 ±1.82      | 75.49 ±0.06  |
| cache          | 973.40 ±1.52      | 74.90 ±0.07  |
| cache (strict) | 828.20 ±4.49      | 66.84 ±0.29  |
Now, the tradeoff between locality and utilization is clearer. "cache" shows 2% bandwidth loss compared to "system" and "cache (strict)" a whopping 20%.
Conclusion and Recommendations
- In the above experiments, the efficiency advantage of the "cache" affinity scope over "system" is, while consistent and noticeable, small. However, the impact is dependent on the distances between the scopes and may be more pronounced in processors with more complex topologies.

- While the loss of work-conservation in certain scenarios hurts, it is a lot better than "cache (strict)" and maximizing workqueue utilization is unlikely to be the common case anyway. As such, "cache" is the default affinity scope for unbound pools.

- As there is no one option which is great for most cases, workqueue usages that may consume a significant amount of CPU are recommended to configure the workqueues using apply_workqueue_attrs() and/or enable WQ_SYSFS.

- An unbound workqueue with strict "cpu" affinity scope behaves the same as a WQ_CPU_INTENSIVE per-cpu workqueue. There is no real advantage to the latter and an unbound workqueue provides a lot more flexibility.

- Affinity scopes are introduced in Linux v6.5. To emulate the previous behavior, use strict "numa" affinity scope.

- The loss of work-conservation in non-strict affinity scopes is likely originating from the scheduler. There is no theoretical reason why the kernel wouldn't be able to do the right thing and maintain work-conservation in most cases. As such, it is possible that future scheduler improvements may make most of these tunables unnecessary.
Examining Configuration
Use tools/workqueue/wq_dump.py to examine unbound CPU affinity configuration, worker pools and how workqueues map to the pools:
$ tools/workqueue/wq_dump.py
Affinity Scopes
===============
wq_unbound_cpumask=0000000f

CPU
  nr_pods  4
  pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
  pod_node [0]=0 [1]=0 [2]=1 [3]=1
  cpu_pod  [0]=0 [1]=1 [2]=2 [3]=3

SMT
  nr_pods  4
  pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
  pod_node [0]=0 [1]=0 [2]=1 [3]=1
  cpu_pod  [0]=0 [1]=1 [2]=2 [3]=3

CACHE (default)
  nr_pods  2
  pod_cpus [0]=00000003 [1]=0000000c
  pod_node [0]=0 [1]=1
  cpu_pod  [0]=0 [1]=0 [2]=1 [3]=1

NUMA
  nr_pods  2
  pod_cpus [0]=00000003 [1]=0000000c
  pod_node [0]=0 [1]=1
  cpu_pod  [0]=0 [1]=0 [2]=1 [3]=1

SYSTEM
  nr_pods  1
  pod_cpus [0]=0000000f
  pod_node [0]=-1
  cpu_pod  [0]=0 [1]=0 [2]=0 [3]=0

Worker Pools
============
pool[00] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  0
pool[01] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  0
pool[02] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  1
pool[03] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  1
pool[04] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  2
pool[05] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  2
pool[06] ref= 1 nice=  0 idle/workers=  3/  3 cpu=  3
pool[07] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  3
pool[08] ref=42 nice=  0 idle/workers=  6/  6 cpus=0000000f
pool[09] ref=28 nice=  0 idle/workers=  3/  3 cpus=00000003
pool[10] ref=28 nice=  0 idle/workers= 17/ 17 cpus=0000000c
pool[11] ref= 1 nice=-20 idle/workers=  1/  1 cpus=0000000f
pool[12] ref= 2 nice=-20 idle/workers=  1/  1 cpus=00000003
pool[13] ref= 2 nice=-20 idle/workers=  1/  1 cpus=0000000c

Workqueue CPU -> pool
=====================
[    workqueue \ CPU              0  1  2  3 dfl]
events                   percpu   0  2  4  6
events_highpri           percpu   1  3  5  7
events_long              percpu   0  2  4  6
events_unbound           unbound  9  9 10 10  8
events_freezable         percpu   0  2  4  6
events_power_efficient   percpu   0  2  4  6
events_freezable_pwr_ef  percpu   0  2  4  6
rcu_gp                   percpu   0  2  4  6
rcu_par_gp               percpu   0  2  4  6
slub_flushwq             percpu   0  2  4  6
netns                    ordered  8  8  8  8  8
...
See the command’s help message for more info.
Monitoring
Use tools/workqueue/wq_monitor.py to monitor workqueue operations:
$ tools/workqueue/wq_monitor.py events
                            total  infl CPUtime  CPUhog CMW/RPR  mayday rescued
events                      18545     0     6.1       0       5       -       -
events_highpri                  8     0     0.0       0       0       -       -
events_long                     3     0     0.0       0       0       -       -
events_unbound              38306     0     0.1       -       7       -       -
events_freezable                0     0     0.0       0       0       -       -
events_power_efficient      29598     0     0.2       0       0       -       -
events_freezable_pwr_ef        10     0     0.0       0       0       -       -
sock_diag_events                0     0     0.0       0       0       -       -

                            total  infl CPUtime  CPUhog CMW/RPR  mayday rescued
events                      18548     0     6.1       0       5       -       -
events_highpri                  8     0     0.0       0       0       -       -
events_long                     3     0     0.0       0       0       -       -
events_unbound              38322     0     0.1       -       7       -       -
events_freezable                0     0     0.0       0       0       -       -
events_power_efficient      29603     0     0.2       0       0       -       -
events_freezable_pwr_ef        10     0     0.0       0       0       -       -
sock_diag_events                0     0     0.0       0       0       -       -

...
See the command’s help message for more info.
Debugging
Because the work functions are executed by generic worker threads there are a few tricks needed to shed some light on misbehaving workqueue users.
Worker threads show up in the process list as:
root      5671  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/0:1]
root      5672  0.0  0.0      0     0 ?        S    12:07   0:00 [kworker/1:2]
root      5673  0.0  0.0      0     0 ?        S    12:12   0:00 [kworker/0:0]
root      5674  0.0  0.0      0     0 ?        S    12:13   0:00 [kworker/1:0]
If kworkers are going crazy (using too much cpu), there are two types of possible problems:

1. Something being scheduled in rapid succession
2. A single work item that consumes lots of cpu cycles
The first one can be tracked using tracing:
$ echo workqueue:workqueue_queue_work > /sys/kernel/tracing/set_event
$ cat /sys/kernel/tracing/trace_pipe > out.txt
(wait a few secs)
^C
If something is busy looping on work queueing, it would be dominating the output and the offender can be determined with the work item function.
For the second type of problems it should be possible to just check the stack trace of the offending worker thread.
$ cat /proc/THE_OFFENDING_KWORKER/stack
The work item's function should be trivially visible in the stack trace.
Non-reentrance Conditions
Workqueue guarantees that a work item cannot be re-entrant if the following conditions hold after a work item gets queued:

1. The work function hasn't been changed.
2. No one queues the work item to another workqueue.
3. The work item hasn't been reinitiated.

In other words, if the above conditions hold, the work item is guaranteed to be executed by at most one worker system-wide at any given time.
Note that requeuing the work item (to the same queue) in the self function doesn't break these conditions, so it's safe to do. Otherwise, caution is required when breaking the conditions inside a work function.
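For example, a work function that requeues its own work item to the same workqueue stays within these conditions; a sketch (the batch counter is a hypothetical stand-in for real driver state):

#include <linux/workqueue.h>
#include <linux/atomic.h>

static atomic_t mydrv_pending;          /* hypothetical batch counter */

static void mydrv_pump_fn(struct work_struct *work);
static DECLARE_WORK(mydrv_pump, mydrv_pump_fn);

static void mydrv_pump_fn(struct work_struct *work)
{
        /* ... process one batch ... */

        /*
         * Requeueing to the same queue from the work function itself keeps
         * the non-reentrance guarantee intact.
         */
        if (atomic_dec_return(&mydrv_pending) > 0)
                queue_work(system_wq, &mydrv_pump);
}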
Kernel Inline Documentation Reference
- struct workqueue_attrs
A struct for workqueue attributes.
Definition:
struct workqueue_attrs {
    int nice;
    cpumask_var_t cpumask;
    cpumask_var_t __pod_cpumask;
    bool affn_strict;
    enum wq_affn_scope affn_scope;
    bool ordered;
};

Members
nice: nice level
cpumask: allowed CPUs
Work items in this workqueue are affine to these CPUs and not allowed to execute on other CPUs. A pool serving a workqueue must have the same cpumask.
__pod_cpumask: internal attribute used to create per-pod pools
Internal use only.
Per-pod unbound worker pools are used to improve locality. Always a subset of ->cpumask. A workqueue can be associated with multiple worker pools with disjoint __pod_cpumask's. Whether the enforcement of a pool's __pod_cpumask is strict depends on affn_strict.
affn_strict: affinity scope is strict
If clear, workqueue will make a best-effort attempt at starting the worker inside __pod_cpumask but the scheduler is free to migrate it outside.
If set, workers are only allowed to run inside __pod_cpumask.
affn_scope: unbound CPU affinity scope
CPU pods are used to improve execution locality of unbound work items. There are multiple pod types, one for each wq_affn_scope, and every CPU in the system belongs to one pod in every pod type. CPUs that belong to the same pod share the worker pool. For example, selecting WQ_AFFN_NUMA makes the workqueue use a separate worker pool for each NUMA node.
ordered: work items must be executed one by one in queueing order
Description
This can be used to change attributes of an unbound workqueue.
- work_pending
work_pending(work)
Find out whether a work item is currently pending
Parameters
work: The work item in question
- delayed_work_pending
delayed_work_pending(w)
Find out whether a delayable work item is currently pending
Parameters
w: The work item in question
- struct workqueue_struct *alloc_workqueue(const char *fmt, unsigned int flags, int max_active, ...)
allocate a workqueue
Parameters
const char *fmt: printf format for the name of the workqueue
unsigned int flags: WQ_* flags
int max_active: max in-flight work items, 0 for default
...: args for fmt
Description
For a per-cpu workqueue, max_active limits the number of in-flight work items for each CPU. e.g. max_active of 1 indicates that each CPU can be executing at most one work item for the workqueue.
For unbound workqueues, max_active limits the number of in-flight work items for the whole system. e.g. max_active of 16 indicates that there can be at most 16 work items executing for the workqueue in the whole system.
As sharing the same active counter for an unbound workqueue across multiple NUMA nodes can be expensive, max_active is distributed to each NUMA node according to the proportion of the number of online CPUs and enforced independently.
Depending on online CPU distribution, a node may end up with per-node max_active which is significantly lower than max_active, which can lead to deadlocks if the per-node concurrency limit is lower than the maximum number of interdependent work items for the workqueue.
To guarantee forward progress regardless of online CPU distribution, the concurrency limit on every node is guaranteed to be equal to or greater than min_active which is set to min(max_active, WQ_DFL_MIN_ACTIVE). This means that the sum of per-node max_active's may be larger than max_active.
For detailed information on WQ_* flags, please refer to Workqueue.
Return
Pointer to the allocated workqueue on success, NULL on failure.
- struct workqueue_struct *alloc_workqueue_lockdep_map(const char *fmt, unsigned int flags, int max_active, struct lockdep_map *lockdep_map, ...)
allocate a workqueue with user-defined lockdep_map
Parameters
const char *fmt: printf format for the name of the workqueue
unsigned int flags: WQ_* flags
int max_active: max in-flight work items, 0 for default
struct lockdep_map *lockdep_map: user-defined lockdep_map
...: args for fmt
Description
Same as alloc_workqueue but with a user-defined lockdep_map. Useful for workqueues created with the same purpose and to avoid leaking a lockdep_map on each workqueue creation.
Return
Pointer to the allocated workqueue on success, NULL on failure.
- alloc_ordered_workqueue_lockdep_map
alloc_ordered_workqueue_lockdep_map(fmt,flags,lockdep_map,args...)
allocate an ordered workqueue with user-defined lockdep_map
Parameters
fmt: printf format for the name of the workqueue
flags: WQ_* flags (only WQ_FREEZABLE and WQ_MEM_RECLAIM are meaningful)
lockdep_map: user-defined lockdep_map
args...: args for fmt
Description
Same as alloc_ordered_workqueue but with a user-defined lockdep_map. Useful for workqueues created with the same purpose and to avoid leaking a lockdep_map on each workqueue creation.
Return
Pointer to the allocated workqueue on success, NULL on failure.
- alloc_ordered_workqueue
alloc_ordered_workqueue(fmt,flags,args...)
allocate an ordered workqueue
Parameters
fmt: printf format for the name of the workqueue
flags: WQ_* flags (only WQ_FREEZABLE and WQ_MEM_RECLAIM are meaningful)
args...: args for fmt
Description
Allocate an ordered workqueue. An ordered workqueue executes at most one work item at any given time in the queued order. They are implemented as unbound workqueues with max_active of one.
Return
Pointer to the allocated workqueue on success, NULL on failure.
- bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
queue work on a workqueue
Parameters
struct workqueue_struct *wq: workqueue to use
struct work_struct *work: work to queue
Description
Returns false if work was already on a queue, true otherwise.
We queue the work to the CPU on which it was submitted, but if the CPU dies it can be processed by another CPU.
Memory-ordering properties: If it returns true, guarantees that all stores preceding the call to queue_work() in the program order will be visible from the CPU which will execute work by the time such work executes, e.g.,
{ x is initially 0 }
CPU0                              CPU1

WRITE_ONCE(x, 1);                 [ work is being executed ]
r0 = queue_work(wq, work);        r1 = READ_ONCE(x);
Forbids: r0 == true && r1 == 0
- bool queue_delayed_work(struct workqueue_struct *wq, struct delayed_work *dwork, unsigned long delay)
queue work on a workqueue after delay
Parameters
struct workqueue_struct *wq: workqueue to use
struct delayed_work *dwork: delayable work to queue
unsigned long delay: number of jiffies to wait before queueing
Description
Equivalent to queue_delayed_work_on() but tries to use the local CPU.
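A sketch of a common pattern, periodic polling with a delayed work item that re-arms itself (the names and interval are illustrative):

#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void mydrv_poll_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(mydrv_poll, mydrv_poll_fn);

static void mydrv_poll_fn(struct work_struct *work)
{
        /* ... check the hypothetical device state ... */

        /* Re-arm for roughly 100ms later. */
        queue_delayed_work(system_wq, &mydrv_poll, msecs_to_jiffies(100));
}

static void mydrv_start_polling(void)
{
        queue_delayed_work(system_wq, &mydrv_poll, msecs_to_jiffies(100));
}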
- bool mod_delayed_work(struct workqueue_struct *wq, struct delayed_work *dwork, unsigned long delay)
modify delay of or queue a delayed work
Parameters
struct workqueue_struct *wq: workqueue to use
struct delayed_work *dwork: work to queue
unsigned long delay: number of jiffies to wait before queueing
Description
mod_delayed_work_on() on local CPU.
- bool schedule_work_on(int cpu, struct work_struct *work)
put work task on a specific cpu
Parameters
int cpu: cpu to put the work task on
struct work_struct *work: job to be done
Description
This puts a job on a specific cpu
- bool schedule_work(struct work_struct *work)
put work task in global workqueue
Parameters
struct work_struct *work: job to be done
Description
Returns false if work was already on the kernel-global workqueue and true otherwise.
This puts a job in the kernel-global workqueue if it was not already queued and leaves it in the same position on the kernel-global workqueue otherwise.
Shares the same memory-ordering properties of queue_work(), cf. the DocBook header of queue_work().
- bool enable_and_queue_work(struct workqueue_struct *wq, struct work_struct *work)
Enable and queue a work item on a specific workqueue
Parameters
struct workqueue_struct *wq: The target workqueue
struct work_struct *work: The work item to be enabled and queued
Description
This function combines the operations of enable_work() and queue_work(), providing a convenient way to enable and queue a work item in a single call. It invokes enable_work() on work and then queues it if the disable depth reached 0. Returns true if the disable depth reached 0 and work is queued, and false otherwise.
Note that work is always queued when disable depth reaches zero. If the desired behavior is queueing only if certain events took place while work is disabled, the user should implement the necessary state tracking and perform explicit conditional queueing after enable_work().
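A sketch of the intended pairing with the disable/enable API (assuming a kernel that provides disable_work_sync(); names are illustrative):

#include <linux/workqueue.h>

static void mydrv_work_fn(struct work_struct *work)
{
        /* ... talk to the hypothetical device ... */
}

static DECLARE_WORK(mydrv_work, mydrv_work_fn);

static void mydrv_reconfigure(void)
{
        /* Block further executions and wait for a running one to finish. */
        disable_work_sync(&mydrv_work);

        /* ... reconfigure the hypothetical device ... */

        /* Drop the disable count; queue the work if the count reached 0. */
        enable_and_queue_work(system_wq, &mydrv_work);
}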
- bool schedule_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay)
queue work in global workqueue on CPU after delay
Parameters
int cpu: cpu to use
struct delayed_work *dwork: job to be done
unsigned long delay: number of jiffies to wait
Description
After waiting for a given time this puts a job in the kernel-global workqueue on the specified CPU.
- bool schedule_delayed_work(struct delayed_work *dwork, unsigned long delay)
put work task in global workqueue after delay
Parameters
struct delayed_work *dwork: job to be done
unsigned long delay: number of jiffies to wait or 0 for immediate execution
Description
After waiting for a given time this puts a job in the kernel-global workqueue.
- for_each_pool
for_each_pool(pool,pi)
iterate through all worker_pools in the system
Parameters
pool: iteration cursor
pi: integer used for iteration
Description
This must be called either with wq_pool_mutex held or RCU read locked. If the pool needs to be used beyond the locking in effect, the caller is responsible for guaranteeing that the pool stays online.
The if/else clause exists only for the lockdep assertion and can be ignored.
- for_each_pool_worker
for_each_pool_worker(worker,pool)
iterate through all workers of a worker_pool
Parameters
worker: iteration cursor
pool: worker_pool to iterate workers of
Description
This must be called with wq_pool_attach_mutex.
The if/else clause exists only for the lockdep assertion and can be ignored.
- for_each_pwq
for_each_pwq(pwq,wq)
iterate through all pool_workqueues of the specified workqueue
Parameters
pwq: iteration cursor
wq: the target workqueue
Description
This must be called either with wq->mutex held or RCU read locked. If the pwq needs to be used beyond the locking in effect, the caller is responsible for guaranteeing that the pwq stays online.
The if/else clause exists only for the lockdep assertion and can be ignored.
- int worker_pool_assign_id(struct worker_pool *pool)
allocate ID and assign it to pool
Parameters
struct worker_pool *pool: the pool pointer of interest
Description
Returns 0 if ID in [0, WORK_OFFQ_POOL_NONE) is allocated and assigned successfully, -errno on failure.
- struct cpumask *unbound_effective_cpumask(struct workqueue_struct *wq)
effective cpumask of an unbound workqueue
Parameters
struct workqueue_struct *wq: workqueue of interest
Description
wq->unbound_attrs->cpumask contains the cpumask requested by the user which is masked with wq_unbound_cpumask to determine the effective cpumask. The default pwq is always mapped to the pool with the current effective cpumask.
- struct worker_pool *get_work_pool(struct work_struct *work)
return the worker_pool a given work was associated with
Parameters
struct work_struct *work: the work item of interest
Description
Pools are created and destroyed under wq_pool_mutex, and allows read access under RCU read lock. As such, this function should be called under wq_pool_mutex or inside of a rcu_read_lock() region.
All fields of the returned pool are accessible as long as the above mentioned locking is in effect. If the returned pool needs to be used beyond the critical section, the caller is responsible for ensuring the returned pool is and stays online.
Return
The worker_pool work was last associated with. NULL if none.
- void worker_set_flags(struct worker *worker, unsigned int flags)
set worker flags and adjust nr_running accordingly
Parameters
struct worker *worker: self
unsigned int flags: flags to set
Description
Set flags in worker->flags and adjust nr_running accordingly.
- void worker_clr_flags(struct worker *worker, unsigned int flags)
clear worker flags and adjust nr_running accordingly
Parameters
struct worker *worker: self
unsigned int flags: flags to clear
Description
Clear flags in worker->flags and adjust nr_running accordingly.
Parameters
struct worker *worker: worker which is entering idle state
Description
worker is entering idle state. Update stats and idle timer if necessary.
LOCKING: raw_spin_lock_irq(pool->lock).
Parameters
struct worker *worker: worker which is leaving idle state
Description
worker is leaving idle state. Update stats.
LOCKING: raw_spin_lock_irq(pool->lock).
- struct worker *find_worker_executing_work(struct worker_pool *pool, struct work_struct *work)
find worker which is executing a work
Parameters
struct worker_pool *pool: pool of interest
struct work_struct *work: work to find worker for
Description
Find a worker which is executing work on pool by searching pool->busy_hash which is keyed by the address of work. For a worker to match, its current execution should match the address of work and its work function. This is to avoid unwanted dependency between unrelated work executions through a work item being recycled while still being executed.
This is a bit tricky. A work item may be freed once its execution starts and nothing prevents the freed area from being recycled for another work item. If the same work item address ends up being reused before the original execution finishes, workqueue will identify the recycled work item as currently executing and make it wait until the current execution finishes, introducing an unwanted dependency.
This function checks the work item address and work function to avoid false positives. Note that this isn't complete as one may construct a work function which can introduce dependency onto itself through a recycled work item. Well, if somebody wants to shoot oneself in the foot that badly, there's only so much we can do, and if such deadlock actually occurs, it should be easy to locate the culprit work function.
Context
raw_spin_lock_irq(pool->lock).
Return
Pointer to worker which is executing work if found, NULL otherwise.
- void move_linked_works(struct work_struct *work, struct list_head *head, struct work_struct **nextp)
move linked works to a list
Parameters
struct work_struct *work: start of series of works to be scheduled
struct list_head *head: target list to append work to
struct work_struct **nextp: out parameter for nested worklist walking
Description
Schedule linked works starting from work to head. Work series to be scheduled starts at work and includes any consecutive work with WORK_STRUCT_LINKED set in its predecessor. See assign_work() for details on nextp.
Context
raw_spin_lock_irq(pool->lock).
- bool assign_work(struct work_struct *work, struct worker *worker, struct work_struct **nextp)
assign a work item and its linked work items to a worker
Parameters
struct work_struct *work: work to assign
struct worker *worker: worker to assign to
struct work_struct **nextp: out parameter for nested worklist walking
Description
Assign work and its linked work items to worker. If work is already being executed by another worker in the same pool, it'll be punted there.
If nextp is not NULL, it's updated to point to the next work of the last scheduled work. This allows assign_work() to be nested inside list_for_each_entry_safe().
Returns true if work was successfully assigned to worker. false if work was punted to another worker already executing it.
- bool kick_pool(struct worker_pool *pool)
wake up an idle worker if necessary
Parameters
struct worker_pool *pool: pool to kick
Description
pool may have pending work items. Wake up worker if necessary. Returns whether a worker was woken up.
- void wq_worker_running(struct task_struct *task)
a worker is running again
Parameters
struct task_struct *task: task waking up
Description
This function is called when a worker returns from schedule().
- void wq_worker_sleeping(struct task_struct *task)
a worker is going to sleep
Parameters
struct task_struct *task: task going to sleep
Description
This function is called from schedule() when a busy worker is going to sleep.
- void wq_worker_tick(struct task_struct *task)
a scheduler tick occurred while a kworker is running
Parameters
struct task_struct *task: task currently running
Description
Called from sched_tick(). We're in the IRQ context and the current worker's fields which follow the 'K' locking rule can be accessed safely.
- work_func_t wq_worker_last_func(struct task_struct *task)
retrieve worker’s last work function
Parameters
struct task_struct *task: Task to retrieve last work function of.
Description
Determine the last function a worker executed. This is called from the scheduler to get a worker's last known identity.
This function is called during schedule() when a kworker is going to sleep. It's used by psi to identify aggregation workers during dequeuing, to allow periodic aggregation to shut-off when that worker is the last task in the system or cgroup to go to sleep.
As this function doesn't involve any workqueue-related locking, it only returns stable values when called from inside the scheduler's queuing and dequeuing paths, when task, which must be a kworker, is guaranteed to not be processing any works.
Context
raw_spin_lock_irq(rq->lock)
Return
The last work function current executed as a worker, NULL if it hasn't executed any work yet.
- struct wq_node_nr_active *wq_node_nr_active(struct workqueue_struct *wq, int node)
Determine wq_node_nr_active to use
Parameters
struct workqueue_struct *wq: workqueue of interest
int node: NUMA node, can be NUMA_NO_NODE
Description
Determine wq_node_nr_active to use for wq on node. Returns:

- NULL for per-cpu workqueues as they don't need to use shared nr_active.
- node_nr_active[nr_node_ids] if node is NUMA_NO_NODE.
- Otherwise, node_nr_active[node].
- void wq_update_node_max_active(struct workqueue_struct *wq, int off_cpu)
Update per-node max_actives to use
Parameters
struct workqueue_struct *wq: workqueue to update
int off_cpu: CPU that's going down, -1 if a CPU is not going down
Description
Update wq->node_nr_active[]->max. wq must be unbound. max_active is distributed among nodes according to the proportions of numbers of online cpus. The result is always between wq->min_active and max_active.
- void get_pwq(struct pool_workqueue *pwq)
get an extra reference on the specified pool_workqueue
Parameters
struct pool_workqueue *pwq: pool_workqueue to get
Description
Obtain an extra reference on pwq. The caller should guarantee that pwq has positive refcnt and be holding the matching pool->lock.
- void put_pwq(struct pool_workqueue *pwq)
put a pool_workqueue reference
Parameters
struct pool_workqueue *pwq: pool_workqueue to put
Description
Drop a reference of pwq. If its refcnt reaches zero, schedule its destruction. The caller should be holding the matching pool->lock.
Parameters
struct pool_workqueue *pwq: pool_workqueue to put (can be NULL)
Description
put_pwq() with locking. This function also allows NULL pwq.
- bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
Try to increment nr_active for a pwq
Parameters
struct pool_workqueue *pwq: pool_workqueue of interest
bool fill: max_active may have increased, try to increase concurrency level
Description
Try to increment nr_active for pwq. Returns true if an nr_active count is successfully obtained. false otherwise.
- bool pwq_activate_first_inactive(struct pool_workqueue *pwq, bool fill)
Activate the first inactive work item on a pwq
Parameters
struct pool_workqueue *pwq: pool_workqueue of interest
bool fill: max_active may have increased, try to increase concurrency level
Description
Activate the first inactive work item of pwq if available and allowed by max_active limit.
Returns true if an inactive work item has been activated. false if no inactive work item is found or max_active limit is reached.
- void unplug_oldest_pwq(struct workqueue_struct *wq)
unplug the oldest pool_workqueue
Parameters
struct workqueue_struct *wq: workqueue_struct where its oldest pwq is to be unplugged
Description
This function should only be called for ordered workqueues where only the oldest pwq is unplugged, the others are plugged to suspend execution to ensure proper work item ordering:

    dfl_pwq --------------+     [P] - plugged
                          |
                          v
    pwqs -> A -> B [P] -> C [P] (newest)
            |    |        |
            1    3        5
            |    |        |
            2    4        6

When the oldest pwq is drained and removed, this function should be called to unplug the next oldest one to start its work item execution. Note that pwq's are linked into wq->pwqs with the oldest first, so the first one in the list is the oldest.
- void node_activate_pending_pwq(struct wq_node_nr_active *nna, struct worker_pool *caller_pool)
Activate a pending pwq on a wq_node_nr_active
Parameters
struct wq_node_nr_active *nna: wq_node_nr_active to activate a pending pwq for
struct worker_pool *caller_pool: worker_pool the caller is locking
Description
Activate a pwq in nna->pending_pwqs. Called with caller_pool locked. caller_pool may be unlocked and relocked to lock other worker_pools.
- void pwq_dec_nr_active(struct pool_workqueue *pwq)
Retire an active count
Parameters
struct pool_workqueue *pwq: pool_workqueue of interest
Description
Decrement pwq's nr_active and try to activate the first inactive work item. For unbound workqueues, this function may temporarily drop pwq->pool->lock.
- void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, unsigned long work_data)
decrement pwq’s nr_in_flight
Parameters
struct pool_workqueue *pwq: pwq of interest
unsigned long work_data: work_data of work which left the queue
Description
A work either has completed or is removed from pending queue, decrement nr_in_flight of its pwq and handle workqueue flushing.
NOTE
For unbound workqueues, this function may temporarily drop pwq->pool->lock and thus should be called after all other state updates for the in-flight work item is complete.
Context
raw_spin_lock_irq(pool->lock).
- int try_to_grab_pending(struct work_struct *work, u32 cflags, unsigned long *irq_flags)
steal work item from worklist and disable irq
Parameters
struct work_struct *work: work item to steal
u32 cflags: WORK_CANCEL_ flags
unsigned long *irq_flags: place to store irq state
Description
Try to grab PENDING bit of work. This function can handle work in any stable state - idle, on timer or on worklist.

1: if work was pending and we successfully stole PENDING
0: if work was idle and we claimed PENDING
-EAGAIN: if PENDING couldn't be grabbed at the moment, safe to busy-retry
Note
On >= 0 return, the caller owns work's PENDING bit. To avoid getting interrupted while holding PENDING and work off queue, irq must be disabled on entry. This, combined with delayed_work->timer being irqsafe, ensures that we return -EAGAIN for finite short period of time.
On successful return, >= 0, irq is disabled and the caller is responsible for releasing it using local_irq_restore(*irq_flags).
This function is safe to call from any context including IRQ handler.
- bool work_grab_pending(struct work_struct *work, u32 cflags, unsigned long *irq_flags)
steal work item from worklist and disable irq
Parameters
struct work_struct *work: work item to steal
u32 cflags: WORK_CANCEL_ flags
unsigned long *irq_flags: place to store IRQ state
Description
Grab PENDING bit of work. work can be in any stable state - idle, on timer or on worklist.
Can be called from any context. IRQ is disabled on return with IRQ state stored in *irq_flags. The caller is responsible for re-enabling it using local_irq_restore().
Returns true if work was pending. false if idle.
- void insert_work(struct pool_workqueue *pwq, struct work_struct *work, struct list_head *head, unsigned int extra_flags)
insert a work into a pool
Parameters
struct pool_workqueue *pwq: pwq work belongs to
struct work_struct *work: work to insert
struct list_head *head: insertion point
unsigned int extra_flags: extra WORK_STRUCT_* flags to set
Description
Insert work which belongs to pwq after head. extra_flags is or'd to work_struct flags.
Context
raw_spin_lock_irq(pool->lock).
- bool queue_work_on(int cpu, struct workqueue_struct *wq, struct work_struct *work)
queue work on specific cpu
Parameters
int cpu: CPU number to execute work on
struct workqueue_struct *wq: workqueue to use
struct work_struct *work: work to queue
Description
We queue the work to a specific CPU, the caller must ensure it can't go away. Callers that fail to ensure that the specified CPU cannot go away will execute on a randomly chosen CPU. But note well that callers specifying a CPU that never has been online will get a splat.
Return
false if work was already on a queue, true otherwise.
- int select_numa_node_cpu(int node)
Select a CPU based on NUMA node
Parameters
int node: NUMA node ID that we want to select a CPU from
Description
This function will attempt to find a "random" cpu available on a given node. If there are no CPUs available on the given node it will return WORK_CPU_UNBOUND indicating that we should just schedule to any available CPU if we need to schedule this work.
- bool queue_work_node(int node, struct workqueue_struct *wq, struct work_struct *work)
queue work on a “random” cpu for a given NUMA node
Parameters
int node: NUMA node that we are targeting the work for
struct workqueue_struct *wq: workqueue to use
struct work_struct *work: work to queue
Description
We queue the work to a "random" CPU within a given NUMA node. The basic idea here is to provide a way to somehow associate work with a given NUMA node.
This function will only make a best effort attempt at getting this onto the right NUMA node. If no node is requested or the requested node is offline then we just fall back to standard queue_work behavior.
Currently the "random" CPU ends up being the first available CPU in the intersection of cpu_online_mask and the cpumask of the node, unless we are running on the node. In that case we just use the current CPU.
Return
false if work was already on a queue, true otherwise.
- bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, struct delayed_work *dwork, unsigned long delay)
queue work on specific CPU after delay
Parameters
int cpu: CPU number to execute work on
struct workqueue_struct *wq: workqueue to use
struct delayed_work *dwork: work to queue
unsigned long delay: number of jiffies to wait before queueing
Description
We queue the delayed_work to a specific CPU, for non-zero delays the caller must ensure it is online and can't go away. Callers that fail to ensure this, may get dwork->timer queued to an offlined CPU and this will prevent queueing of dwork->work unless the offlined CPU becomes online again.
Return
false if work was already on a queue, true otherwise. If delay is zero and dwork is idle, it will be scheduled for immediate execution.
- bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq, struct delayed_work *dwork, unsigned long delay)
modify delay of or queue a delayed work on specific CPU
Parameters
int cpu: CPU number to execute work on
struct workqueue_struct *wq: workqueue to use
struct delayed_work *dwork: work to queue
unsigned long delay: number of jiffies to wait before queueing
Description
If dwork is idle, equivalent to queue_delayed_work_on(); otherwise, modify dwork's timer so that it expires after delay. If delay is zero, work is guaranteed to be scheduled immediately regardless of its current state.
This function is safe to call from any context including IRQ handler. See try_to_grab_pending() for details.
Return
false if dwork was idle and queued, true if dwork was pending and its timer was modified.
- bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork)
queue work after a RCU grace period
Parameters
struct workqueue_struct *wq: workqueue to use
struct rcu_work *rwork: work to queue
Return
false if rwork was already pending, true otherwise. Note that a full RCU grace period is guaranteed only after a true return. While rwork is guaranteed to be executed after a false return, the execution may happen before a full RCU grace period has passed.
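A sketch of deferring the freeing of an RCU-protected object until after a grace period (the object and names are illustrative):

#include <linux/workqueue.h>
#include <linux/slab.h>

struct mydrv_obj {
        struct rcu_work free_rwork;
        /* ... RCU-protected payload ... */
};

static void mydrv_free_fn(struct work_struct *work)
{
        struct mydrv_obj *obj =
                container_of(to_rcu_work(work), struct mydrv_obj, free_rwork);

        /* Runs in process context, after an RCU grace period has elapsed. */
        kfree(obj);
}

static void mydrv_release(struct mydrv_obj *obj)
{
        INIT_RCU_WORK(&obj->free_rwork, mydrv_free_fn);
        queue_rcu_work(system_wq, &obj->free_rwork);
}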
Parameters
struct worker *worker: worker to be attached
struct worker_pool *pool: the target pool
Description
Attach worker to pool. Once attached, the WORKER_UNBOUND flag and cpu-binding of worker are kept coordinated with the pool across cpu-[un]hotplugs.
Parameters
struct worker *worker: worker which is attached to its pool
Description
Undo the attaching which had been done in worker_attach_to_pool(). The caller worker shouldn't access the pool after detaching unless it has another reference to the pool.
- struct worker *create_worker(struct worker_pool *pool)
create a new workqueue worker
Parameters
struct worker_pool *pool: pool the new worker will belong to
Description
Create and start a new worker which is attached to pool.
Context
Might sleep. Does GFP_KERNEL allocations.
Return
Pointer to the newly created worker.
Parameters
struct worker *worker: worker to be destroyed
struct list_head *list: transfer worker away from its pool->idle_list and into list
Description
Tag worker for destruction and adjust pool stats accordingly. The worker should be idle.
Context
raw_spin_lock_irq(pool->lock).
- void idle_worker_timeout(struct timer_list *t)
check if some idle workers can now be deleted.
Parameters
struct timer_list *t: The pool's idle_timer that just expired
Description
The timer is armed in worker_enter_idle(). Note that it isn't disarmed in worker_leave_idle(), as a worker flicking between idle and active while its pool is at the too_many_workers() tipping point would cause too much timer housekeeping overhead. Since IDLE_WORKER_TIMEOUT is long enough, we just let it expire and re-evaluate things from there.
- void idle_cull_fn(struct work_struct *work)
cull workers that have been idle for too long.
Parameters
struct work_struct *work: the pool's work for handling these idle workers
Description
This goes through a pool's idle workers and gets rid of those that have been idle for at least IDLE_WORKER_TIMEOUT seconds.
We don't want to disturb isolated CPUs because of a pcpu kworker being culled, so this also resets worker affinity. This requires a sleepable context, hence the split between timer callback and work item.
- void maybe_create_worker(struct worker_pool *pool)
create a new worker if necessary
Parameters
struct worker_pool *pool: pool to create a new worker for
Description
Create a new worker for pool if necessary. pool is guaranteed to have at least one idle worker on return from this function. If creating a new worker takes longer than MAYDAY_INTERVAL, mayday is sent to all rescuers with works scheduled on pool to resolve possible allocation deadlock.
On return, need_to_create_worker() is guaranteed to be false and may_start_working() true.
LOCKING: raw_spin_lock_irq(pool->lock) which may be released and regrabbed multiple times. Does GFP_KERNEL allocations. Called only from manager.
Parameters
struct worker *worker: self
Description
Assume the manager role and manage the worker pool worker belongs to. At any given time, there can be only zero or one manager per pool. The exclusion is handled automatically by this function.
The caller can safely start processing works on false return. On true return, it's guaranteed that need_to_create_worker() is false and may_start_working() is true.
Context
raw_spin_lock_irq(pool->lock) which may be released and regrabbedmultiple times. Does GFP_KERNEL allocations.
Return
false if the pool doesn’t need management and the caller can safelystart processing works,true if management function was performed andthe conditions that the caller verified before calling the function mayno longer be true.
- void process_one_work(struct worker *worker, struct work_struct *work)¶
process single work
Parameters
struct worker *worker: self
struct work_struct *work: work to process
Description
Process work. This function contains all the logic necessary to process a single work item, including synchronization against and interaction with other workers on the same cpu, queueing and flushing. As long as the context requirement is met, any worker can call this function to process a work item.
Context
raw_spin_lock_irq(pool->lock) which is released and regrabbed.
- void process_scheduled_works(struct worker *worker)¶
process scheduled works
Parameters
struct worker *worker: self
Description
Process all scheduled works. Please note that the scheduled list may change while processing a work, so this function repeatedly fetches a work from the top and executes it.
Context
raw_spin_lock_irq(pool->lock) which may be released and regrabbed multiple times.
- int worker_thread(void *__worker)¶
the worker thread function
Parameters
void *__worker: self
Description
The worker thread function. All workers belong to a worker_pool - either a per-cpu one or a dynamic unbound one. These workers process all work items regardless of their specific target workqueue. The only exception is work items which belong to workqueues with a rescuer, which will be explained in rescuer_thread().
Return
0
- int rescuer_thread(void *__rescuer)¶
the rescuer thread function
Parameters
void *__rescuer: self
Description
Workqueue rescuer thread function. There's one rescuer for each workqueue which has WQ_MEM_RECLAIM set.
Regular work processing on a pool may block trying to create a new worker, which uses a GFP_KERNEL allocation that has a slight chance of developing into a deadlock if some works currently on the same queue need to be processed to satisfy the GFP_KERNEL allocation. This is the problem the rescuer solves.
When such a condition is possible, the pool summons the rescuers of all workqueues which have works queued on the pool and lets them process those works so that forward progress can be guaranteed.
This should happen rarely.
Return
0
- void check_flush_dependency(struct workqueue_struct *target_wq, struct work_struct *target_work, bool from_cancel)¶
check for flush dependency sanity
Parameters
struct workqueue_struct *target_wq: workqueue being flushed
struct work_struct *target_work: work item being flushed (NULL for workqueue flushes)
bool from_cancel: are we called from the work cancel path
Description
current is trying to flush the whole target_wq or target_work on it. If this is not the cancel path (which implies work being flushed is either already running, or will not be at all), check if target_wq doesn't have WQ_MEM_RECLAIM and verify that current is not reclaiming memory or running on a workqueue which doesn't have WQ_MEM_RECLAIM, as that can break the forward-progress guarantee leading to a deadlock.
- void insert_wq_barrier(struct pool_workqueue *pwq, struct wq_barrier *barr, struct work_struct *target, struct worker *worker)¶
insert a barrier work
Parameters
struct pool_workqueue *pwq: pwq to insert barrier into
struct wq_barrier *barr: wq_barrier to insert
struct work_struct *target: target work to attach barr to
struct worker *worker: worker currently executing target, NULL if target is not executing
Description
barr is linked to target such that barr is completed only after target finishes execution. Please note that the ordering guarantee is observed only with respect to target and on the local cpu.
Currently, a queued barrier can't be canceled. This is because try_to_grab_pending() can't determine whether the work to be grabbed is at the head of the queue and thus can't clear the LINKED flag of the previous work while there must be a valid next work after a work with the LINKED flag set.
Note that when worker is non-NULL, target may be modified underneath us, so we can't reliably determine pwq from target.
Context
raw_spin_lock_irq(pool->lock).
- bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq, int flush_color, int work_color)¶
prepare pwqs for workqueue flushing
Parameters
struct workqueue_struct *wq: workqueue being flushed
int flush_color: new flush color, < 0 for no-op
int work_color: new work color, < 0 for no-op
Description
Prepare pwqs for workqueue flushing.
If flush_color is non-negative, flush_color on all pwqs should be -1. If no pwq has in-flight commands at the specified color, all pwq->flush_color's stay at -1 and false is returned. If any pwq has in-flight commands, its pwq->flush_color is set to flush_color, wq->nr_pwqs_to_flush is updated accordingly, pwq wakeup logic is armed and true is returned.
The caller should have initialized wq->first_flusher prior to calling this function with a non-negative flush_color. If flush_color is negative, no flush color update is done and false is returned.
If work_color is non-negative, all pwqs should have the same work_color which is previous to work_color and all will be advanced to work_color.
Context
mutex_lock(wq->mutex).
Return
true if flush_color >= 0 and there's something to flush. false otherwise.
- void __flush_workqueue(struct workqueue_struct *wq)¶
ensure that any scheduled work has run to completion.
Parameters
struct workqueue_struct *wq: workqueue to flush
Description
This function sleeps until all work items which were queued on entry have finished execution, but it is not livelocked by new incoming ones.
- void drain_workqueue(struct workqueue_struct *wq)¶
drain a workqueue
Parameters
struct workqueue_struct *wq: workqueue to drain
Description
Wait until the workqueue becomes empty. While draining is in progress, only chain queueing is allowed. IOW, only currently pending or running work items on wq can queue further work items on it. wq is flushed repeatedly until it becomes empty. The number of flushes is determined by the depth of chaining and should be relatively short. Whine if it takes too long.
- bool flush_work(struct work_struct *work)¶
wait for a work to finish executing the last queueing instance
Parameters
struct work_struct *work: the work to flush
Description
Wait until work has finished execution. work is guaranteed to be idle on return if it hasn't been requeued since flush started.
Return
true if flush_work() waited for the work to finish execution, false if it was already idle.
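As a hedged illustration, a driver-style sketch (hypothetical struct bar and bar_read_stats()) that waits for the last queued instance before consuming its results:

    #include <linux/workqueue.h>
    #include <linux/printk.h>

    /* Hypothetical driver state with a deferred stats refresh. */
    struct bar {
            struct work_struct refresh_work;
    };

    static void bar_read_stats(struct bar *b)
    {
            /* Ensure the most recently queued refresh has completed
             * before the caller inspects the results. */
            if (flush_work(&b->refresh_work))
                    pr_debug("waited for a refresh in flight\n");
            /* Stats are now at least as new as the last queueing. */
    }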
- bool flush_delayed_work(struct delayed_work *dwork)¶
wait for a dwork to finish executing the last queueing
Parameters
struct delayed_work *dwork: the delayed work to flush
Description
Delayed timer is cancelled and the pending work is queued for immediate execution. Like flush_work(), this function only considers the last queueing instance of dwork.
Return
true if flush_work() waited for the work to finish execution, false if it was already idle.
- bool flush_rcu_work(struct rcu_work *rwork)¶
wait for a rwork to finish executing the last queueing
Parameters
struct rcu_work *rwork: the rcu work to flush
Return
true if flush_rcu_work() waited for the work to finish execution, false if it was already idle.
- bool cancel_work_sync(struct work_struct *work)¶
cancel a work and wait for it to finish
Parameters
struct work_struct *work: the work to cancel
Description
Cancel work and wait for its execution to finish. This function can be used even if the work re-queues itself or migrates to another workqueue. On return from this function, work is guaranteed to be not pending or executing on any CPU as long as there aren't racing enqueues.
cancel_work_sync(&delayed_work->work) must not be used for delayed_work's. Use cancel_delayed_work_sync() instead.
Must be called from a sleepable context if work was last queued on a non-BH workqueue. Can also be called from non-hardirq atomic contexts, including BH, if work was last queued on a BH workqueue.
Returns true if work was pending, false otherwise.
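A typical teardown sketch, assuming a hypothetical struct baz whose work item may still be queued or running when the object goes away:

    #include <linux/workqueue.h>
    #include <linux/slab.h>

    struct baz {
            struct work_struct event_work;
    };

    static void baz_teardown(struct baz *z)
    {
            /* After this returns, event_work is neither pending nor running
             * on any CPU, so freeing z is safe (assuming nothing else can
             * still queue it concurrently). */
            cancel_work_sync(&z->event_work);
            kfree(z);
    }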
- bool cancel_delayed_work(struct delayed_work *dwork)¶
cancel a delayed work
Parameters
struct delayed_work *dwork: delayed_work to cancel
Description
Kill off a pending delayed_work.
Return
true if dwork was pending and canceled; false if it wasn't pending.
Note
The work callback function may still be running on return, unless it returns true and the work doesn't re-arm itself. Explicitly flush or use cancel_delayed_work_sync() to wait on it.
This function is safe to call from any context including IRQ handler.
- bool cancel_delayed_work_sync(struct delayed_work *dwork)¶
cancel a delayed work and wait for it to finish
Parameters
struct delayed_work *dwork: the delayed work to cancel
Description
This is cancel_work_sync() for delayed works.
Return
true if dwork was pending, false otherwise.
- bool disable_work(struct work_struct *work)¶
Disable and cancel a work item
Parameters
struct work_struct *work: work item to disable
Description
Disable work by incrementing its disable count and cancel it if currently pending. As long as the disable count is non-zero, any attempt to queue work will fail and return false. The maximum supported disable depth is 2 to the power of WORK_OFFQ_DISABLE_BITS, currently 65536.
Can be called from any context. Returns true if work was pending, false otherwise.
- bool disable_work_sync(struct work_struct *work)¶
Disable, cancel and drain a work item
Parameters
struct work_struct *work: work item to disable
Description
Similar to disable_work() but also wait for work to finish if currently executing.
Must be called from a sleepable context if work was last queued on a non-BH workqueue. Can also be called from non-hardirq atomic contexts, including BH, if work was last queued on a BH workqueue.
Returns true if work was pending, false otherwise.
- bool enable_work(struct work_struct *work)¶
Enable a work item
Parameters
struct work_struct *work: work item to enable
Description
Undo disable_work[_sync]() by decrementing work's disable count. work can only be queued if its disable count is 0.
Can be called from any context. Returns true if the disable count reached 0. Otherwise, false.
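A sketch of the disable/enable pairing around a window in which the work item must not run (hypothetical struct qux and qux_reconfigure()):

    #include <linux/workqueue.h>

    struct qux {
            struct work_struct event_work;
    };

    static void qux_reconfigure(struct qux *q)
    {
            /* Bump the disable count, cancel a pending instance and wait for
             * a running one to finish; queueing attempts now fail. */
            disable_work_sync(&q->event_work);

            /* ... reconfigure state that event_work must not observe ... */

            /* Drop the disable count; queueing works again once it reaches 0. */
            enable_work(&q->event_work);
    }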
- bool disable_delayed_work(struct delayed_work *dwork)¶
Disable and cancel a delayed work item
Parameters
struct delayed_work *dwork: delayed work item to disable
Description
disable_work() for delayed work items.
- bool disable_delayed_work_sync(struct delayed_work *dwork)¶
Disable, cancel and drain a delayed work item
Parameters
struct delayed_work *dwork: delayed work item to disable
Description
disable_work_sync() for delayed work items.
- bool enable_delayed_work(struct delayed_work *dwork)¶
Enable a delayed work item
Parameters
struct delayed_work *dwork: delayed work item to enable
Description
enable_work() for delayed work items.
- int schedule_on_each_cpu(work_func_t func)¶
execute a function synchronously on each online CPU
Parameters
work_func_t func: the function to call
Description
schedule_on_each_cpu() executes func on each online CPU using the system workqueue and blocks until all CPUs have completed. schedule_on_each_cpu() is very slow.
Return
0 on success, -errno on failure.
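A short sketch (hypothetical flush_local_state() and flush_all_cpus()) of running a function once on every online CPU and waiting for completion:

    #include <linux/workqueue.h>
    #include <linux/smp.h>
    #include <linux/printk.h>

    static void flush_local_state(struct work_struct *unused)
    {
            /* The work item is bound to one CPU by schedule_on_each_cpu(). */
            pr_info("flushing on CPU %d\n", smp_processor_id());
    }

    static int flush_all_cpus(void)
    {
            /* Blocks until every online CPU has run flush_local_state(). */
            return schedule_on_each_cpu(flush_local_state);
    }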
- int execute_in_process_context(work_func_t fn, struct execute_work *ew)¶
reliably execute the routine with user context
Parameters
work_func_t fn: the function to execute
struct execute_work *ew: guaranteed storage for the execute work structure (must be available when the work executes)
Description
Executes the function immediately if process context is available, otherwise schedules the function for delayed execution.
Return
0 - function was executed; 1 - function was scheduled for execution.
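A sketch, assuming a hypothetical struct widget whose release may be requested from atomic context but whose cleanup must run in process context; note the embedded execute_work must stay valid until the deferred work runs:

    #include <linux/workqueue.h>
    #include <linux/slab.h>
    #include <linux/printk.h>

    struct widget {
            struct execute_work ew;         /* storage for deferred execution */
            /* ... */
    };

    static void widget_cleanup_fn(struct work_struct *work)
    {
            struct widget *w = container_of(work, struct widget, ew.work);

            kfree(w);                       /* process context, may sleep */
    }

    static void widget_release(struct widget *w)
    {
            /* Runs widget_cleanup_fn() immediately when already in process
             * context, otherwise defers it to a workqueue. */
            if (execute_in_process_context(widget_cleanup_fn, &w->ew))
                    pr_debug("widget cleanup deferred\n");
    }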
- void free_workqueue_attrs(struct workqueue_attrs *attrs)¶
free a workqueue_attrs
- struct workqueue_attrs *alloc_workqueue_attrs(void)¶
allocate a workqueue_attrs
Parameters
void: no arguments
Description
Allocate a new workqueue_attrs, initialize with default settings and return it.
Return
The allocated new workqueue_attrs on success. NULL on failure.
- int init_worker_pool(struct worker_pool *pool)¶
initialize a newly zalloc'd worker_pool
Parameters
struct worker_pool *pool: worker_pool to initialize
Description
Initialize a newly zalloc'd pool. It also allocates pool->attrs.
Return
0 on success, -errno on failure. Even on failure, all fields inside pool proper are initialized and put_unbound_pool() can be called on pool safely to release it.
- void put_unbound_pool(struct worker_pool *pool)¶
put a worker_pool
Parameters
struct worker_pool *pool: worker_pool to put
Description
Put pool. If its refcnt reaches zero, it gets destroyed in an RCU-safe manner. get_unbound_pool() calls this function on its failure path and this function should be able to release pools which went through, successfully or not, init_worker_pool().
Should be called with wq_pool_mutex held.
- struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)¶
get a worker_pool with the specified attributes
Parameters
const struct workqueue_attrs *attrs: the attributes of the worker_pool to get
Description
Obtain a worker_pool which has the same attributes as attrs, bump the reference count and return it. If there already is a matching worker_pool, it will be used; otherwise, this function attempts to create a new one.
Should be called with wq_pool_mutex held.
Return
On success, a worker_pool with the same attributes as attrs. On failure, NULL.
- void wq_calc_pod_cpumask(struct workqueue_attrs *attrs, int cpu)¶
calculate a wq_attrs' cpumask for a pod
Parameters
struct workqueue_attrs *attrs: the wq_attrs of the default pwq of the target workqueue
int cpu: the target CPU
Description
Calculate the cpumask a workqueue with attrs should use on pod. The result is stored in attrs->__pod_cpumask.
If pod affinity is not enabled, attrs->cpumask is always used. If enabled and pod has online CPUs requested by attrs, the returned cpumask is the intersection of the possible CPUs of pod and attrs->cpumask.
The caller is responsible for ensuring that the cpumask of pod stays stable.
- int apply_workqueue_attrs(struct workqueue_struct *wq, const struct workqueue_attrs *attrs)¶
apply new workqueue_attrs to an unbound workqueue
Parameters
struct workqueue_struct *wq: the target workqueue
const struct workqueue_attrs *attrs: the workqueue_attrs to apply, allocated with alloc_workqueue_attrs()
Description
Apply attrs to an unbound workqueue wq. Unless disabled, this function maps a separate pwq to each CPU pod with possible CPUs in attrs->cpumask so that work items are affine to the pod they were issued on. Older pwqs are released as in-flight work items finish. Note that a work item which repeatedly requeues itself back-to-back will stay on its current pwq.
Performs GFP_KERNEL allocations.
Return
0 on success and -errno on failure.
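A sketch of a kernel-internal caller (hypothetical helper name and nice value; error handling trimmed to the essentials) that allocates attributes, tweaks them and applies them to an unbound workqueue:

    #include <linux/workqueue.h>
    #include <linux/cpumask.h>

    static int lower_wq_nice(struct workqueue_struct *wq)
    {
            struct workqueue_attrs *attrs;
            int ret;

            attrs = alloc_workqueue_attrs();
            if (!attrs)
                    return -ENOMEM;

            attrs->nice = -10;                       /* assumed desired nice level */
            cpumask_copy(attrs->cpumask, cpu_possible_mask);

            ret = apply_workqueue_attrs(wq, attrs);  /* remaps pwqs per CPU pod */
            free_workqueue_attrs(attrs);
            return ret;
    }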
- void unbound_wq_update_pwq(struct workqueue_struct *wq, int cpu)¶
update a pwq slot for CPU hot[un]plug
Parameters
struct workqueue_struct *wq: the target workqueue
int cpu: the CPU to update the pwq slot for
Description
This function is to be called from CPU_DOWN_PREPARE, CPU_ONLINE and CPU_DOWN_FAILED. cpu is in the same pod of the CPU being hot[un]plugged.
If pod affinity can't be adjusted due to memory allocation failure, it falls back to wq->dfl_pwq which may not be optimal but is always correct.
Note that when the last allowed CPU of a pod goes offline for a workqueue with a cpumask spanning multiple pods, the workers which were already executing the work items for the workqueue will lose their CPU affinity and may execute on any CPU. This is similar to how per-cpu workqueues behave on CPU_DOWN. If a workqueue user wants strict affinity, it's the user's responsibility to flush the work item from CPU_DOWN_PREPARE.
- void wq_adjust_max_active(struct workqueue_struct *wq)¶
update a wq's max_active to the current setting
Parameters
struct workqueue_struct *wq: target workqueue
Description
If wq isn't freezing, set wq->max_active to the saved_max_active and activate inactive work items accordingly. If wq is freezing, clear wq->max_active to zero.
- void destroy_workqueue(struct workqueue_struct *wq)¶
safely terminate a workqueue
Parameters
struct workqueue_struct *wq: target workqueue
Description
Safely destroy a workqueue. All work currently pending will be done first.
This function does NOT guarantee that non-pending work that has been submitted with queue_delayed_work() and similar functions will be done before destroying the workqueue. The fundamental problem is that, currently, the workqueue has no way of accessing non-pending delayed_work. delayed_work is only linked on the timer side. All delayed_work must, therefore, be canceled before calling this function.
TODO: It would be better if the problem described above didn't exist and destroy_workqueue() would cleanly cancel all pending and non-pending delayed_work.
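A teardown sketch (hypothetical struct mydev) reflecting the rule above: delayed work is canceled explicitly before the workqueue is destroyed:

    #include <linux/workqueue.h>

    struct mydev {
            struct workqueue_struct *wq;
            struct delayed_work poll_dwork;
    };

    static void mydev_exit(struct mydev *dev)
    {
            /* Required first: destroy_workqueue() cannot see delayed_work
             * still sitting on its timer. */
            cancel_delayed_work_sync(&dev->poll_dwork);
            destroy_workqueue(dev->wq);     /* flushes whatever is pending */
    }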
- void workqueue_set_max_active(struct workqueue_struct *wq, int max_active)¶
adjust max_active of a workqueue
Parameters
struct workqueue_struct *wq: target workqueue
int max_active: new max_active value
Description
Set max_active of wq to max_active. See the alloc_workqueue() function comment.
Context
Don’t call from IRQ context.
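A sketch (hypothetical workqueue name and limits) of creating a workqueue and later lowering its concurrency limit at runtime:

    #include <linux/workqueue.h>

    static struct workqueue_struct *io_wq;  /* hypothetical unbound I/O wq */

    static int io_wq_init(void)
    {
            io_wq = alloc_workqueue("my_io_wq", WQ_UNBOUND, 16);
            return io_wq ? 0 : -ENOMEM;
    }

    static void io_wq_throttle(void)
    {
            /* Lower the concurrency limit; must not be called from IRQ context. */
            workqueue_set_max_active(io_wq, 4);
    }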
- void workqueue_set_min_active(struct workqueue_struct *wq, int min_active)¶
adjust min_active of an unbound workqueue
Parameters
struct workqueue_struct *wq: target unbound workqueue
int min_active: new min_active value
Description
Set min_active of an unbound workqueue. Unlike other types of workqueues, an unbound workqueue is not guaranteed to be able to process max_active interdependent work items. Instead, an unbound workqueue is guaranteed to be able to process min_active number of interdependent work items, which is WQ_DFL_MIN_ACTIVE by default.
Use this function to adjust the min_active value between 0 and the current max_active.
- struct work_struct *current_work(void)¶
retrieve current task's work struct
Parameters
void: no arguments
Description
Determine if current task is a workqueue worker and what it's working on. Useful to find out the context that the current task is running in.
Return
work struct if current task is a workqueue worker, NULL otherwise.
- bool current_is_workqueue_rescuer(void)¶
is current workqueue rescuer?
Parameters
void: no arguments
Description
Determine whether current is a workqueue rescuer. Can be used from work functions to determine whether it's being run off the rescuer task.
Return
true if current is a workqueue rescuer. false otherwise.
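A sketch of a work function (hypothetical my_reclaim_workfn()) that uses both helpers to adapt its behaviour when run from the rescuer, i.e. under memory pressure:

    #include <linux/workqueue.h>

    static void my_reclaim_workfn(struct work_struct *work)
    {
            /* Inside a work function, current_work() is the item being run. */
            WARN_ON_ONCE(current_work() != work);

            if (current_is_workqueue_rescuer()) {
                    /* Memory is tight: take the minimal, non-allocating path. */
                    return;
            }
            /* ... normal processing which may allocate ... */
    }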
- bool workqueue_congested(int cpu, struct workqueue_struct *wq)¶
test whether a workqueue is congested
Parameters
int cpu: CPU in question
struct workqueue_struct *wq: target workqueue
Description
Test whether wq's cpu workqueue for cpu is congested. There is no synchronization around this function and the test result is unreliable and only useful as advisory hints or for debugging.
If cpu is WORK_CPU_UNBOUND, the test is performed on the local CPU.
With the exception of ordered workqueues, all workqueues have per-cpu pool_workqueues, each with its own congested state. A workqueue being congested on one CPU doesn't mean that the workqueue is congested on any other CPUs.
Return
true if congested, false otherwise.
- unsigned int work_busy(struct work_struct *work)¶
test whether a work is currently pending or running
Parameters
struct work_struct *work: the work to be tested
Description
Test whether work is currently pending or running. There is no synchronization around this function and the test result is unreliable and only useful as advisory hints or for debugging.
Return
OR’d bitmask of WORK_BUSY_* bits.
- void set_worker_desc(const char *fmt, ...)¶
set description for the current work item
Parameters
const char *fmt: printf-style format string
...: arguments for the format string
Description
This function can be called by a running work function to describe what the work item is about. If the worker task gets dumped, this information will be printed out together to help debugging. The description can be at most WORKER_DESC_LEN including the trailing '\0'.
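A sketch (hypothetical struct mydisk) that tags the executing kworker with the device it is operating on, so that sysrq/hung-task dumps are easier to read:

    #include <linux/workqueue.h>

    struct mydisk {
            struct work_struct barrier_work;
            const char *name;
    };

    static void mydisk_barrier_workfn(struct work_struct *work)
    {
            struct mydisk *disk = container_of(work, struct mydisk, barrier_work);

            set_worker_desc("mydisk_barrier %s", disk->name);
            /* ... issue the barrier ... */
    }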
- void print_worker_info(const char *log_lvl, struct task_struct *task)¶
print out worker information and description
Parameters
const char *log_lvl: the log level to use when printing
struct task_struct *task: target task
Description
If task is a worker and currently executing a work item, print out the name of the workqueue being serviced and the worker description set with set_worker_desc() by the currently executing work item.
This function can be safely called on any task as long as the task_struct itself is accessible. While safe, this function isn't synchronized and may print out mixups or garbage of limited length.
- void show_one_workqueue(struct workqueue_struct *wq)¶
dump state of specified workqueue
Parameters
struct workqueue_struct *wq: workqueue whose state will be printed
- void show_one_worker_pool(struct worker_pool *pool)¶
dump state of specified worker pool
Parameters
struct worker_pool *pool: worker pool whose state will be printed
- void show_all_workqueues(void)¶
dump workqueue state
Parameters
void: no arguments
Description
Called from a sysrq handler and prints out all busy workqueues and pools.
- void show_freezable_workqueues(void)¶
dump freezable workqueue state
Parameters
void: no arguments
Description
Called from try_to_freeze_tasks() and prints out all freezable workqueues still busy.
- void rebind_workers(struct worker_pool *pool)¶
rebind all workers of a pool to the associated CPU
Parameters
struct worker_pool *pool: pool of interest
Description
pool->cpu is coming online. Rebind all workers to the CPU.
- void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)¶
restore cpumask of unbound workers
Parameters
struct worker_pool *pool: unbound pool of interest
int cpu: the CPU which is coming up
Description
An unbound pool may end up with a cpumask which doesn't have any online CPUs. When a worker of such a pool gets scheduled, the scheduler resets its cpus_allowed. If cpu is in pool's cpumask, which didn't have any online CPU before, cpus_allowed of all its workers should be restored.
- long work_on_cpu_key(int cpu, long (*fn)(void *), void *arg, struct lock_class_key *key)¶
run a function in thread context on a particular cpu
Parameters
int cpu: the cpu to run on
long (*fn)(void *): the function to run
void *arg: the function arg
struct lock_class_key *key: The lock class key for lock debugging purposes
Description
It is up to the caller to ensure that the cpu doesn't go offline. The caller must not hold any locks which would prevent fn from completing.
Return
The value fn returns.
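Callers normally go through the work_on_cpu() wrapper, which supplies the lock_class_key. A sketch (hypothetical read_local_reg() accessor) that runs a function on a specific CPU and returns its result:

    #include <linux/workqueue.h>

    static long read_local_reg(void *arg)
    {
            unsigned long *val = arg;

            *val = 0;       /* ... read a CPU-local resource here ... */
            return 0;
    }

    static long read_reg_on(int cpu, unsigned long *val)
    {
            /* Caller must keep @cpu online, e.g. under cpus_read_lock(). */
            return work_on_cpu(cpu, read_local_reg, val);
    }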
- void freeze_workqueues_begin(void)¶
begin freezing workqueues
Parameters
void: no arguments
Description
Start freezing workqueues. After this function returns, all freezable workqueues will queue new works to their inactive_works list instead of pool->worklist.
Context
Grabs and releases wq_pool_mutex, wq->mutex and pool->lock's.
- bool freeze_workqueues_busy(void)¶
are freezable workqueues still busy?
Parameters
void: no arguments
Description
Check whether freezing is complete. This function must be called between freeze_workqueues_begin() and thaw_workqueues().
Context
Grabs and releases wq_pool_mutex.
Return
true if some freezable workqueues are still busy. false if freezing is complete.
- void thaw_workqueues(void)¶
thaw workqueues
Parameters
void: no arguments
Description
Thaw workqueues. Normal queueing is restored and all collected frozen works are transferred to their respective pool worklists.
Context
Grabs and releases wq_pool_mutex, wq->mutex and pool->lock's.
- int workqueue_unbound_exclude_cpumask(cpumask_var_t exclude_cpumask)¶
Exclude given CPUs from unbound cpumask
Parameters
cpumask_var_t exclude_cpumask: the cpumask to be excluded from wq_unbound_cpumask
Description
This function can be called from cpuset code to provide a set of isolated CPUs that should be excluded from wq_unbound_cpumask.
- int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)¶
Set the low-level unbound cpumask
Parameters
cpumask_var_t cpumask: the cpumask to set
Description
The low-level workqueues cpumask is a global cpumask that limits the affinity of all unbound workqueues. This function checks cpumask, applies it to all unbound workqueues, and updates all their pwqs.
Return
0 - Success; -EINVAL - Invalid cpumask; -ENOMEM - Failed to allocate memory for attrs or pwqs.
- int workqueue_sysfs_register(struct workqueue_struct *wq)¶
make a workqueue visible in sysfs
Parameters
struct workqueue_struct *wq: the workqueue to register
Description
Expose wq in sysfs under /sys/bus/workqueue/devices. alloc_workqueue*() automatically calls this function if WQ_SYSFS is set, which is the preferred method.
Workqueue users should use this function directly iff they want to apply workqueue_attrs before making the workqueue visible in sysfs; otherwise, apply_workqueue_attrs() may race against userland updating the attributes.
Return
0 on success, -errno on failure.
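For most users the WQ_SYSFS allocation flag is the whole story; a sketch (hypothetical workqueue name) of the preferred path described above:

    #include <linux/workqueue.h>

    static struct workqueue_struct *my_unbound_wq;

    static int my_wq_init(void)
    {
            /* WQ_SYSFS makes alloc_workqueue() register the workqueue in
             * sysfs internally; its attributes then appear under
             * /sys/bus/workqueue/devices/my_unbound_wq/. */
            my_unbound_wq = alloc_workqueue("my_unbound_wq",
                                            WQ_UNBOUND | WQ_SYSFS, 0);
            return my_unbound_wq ? 0 : -ENOMEM;
    }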
- void workqueue_sysfs_unregister(struct workqueue_struct *wq)¶
undo workqueue_sysfs_register()
Parameters
struct workqueue_struct *wq: the workqueue to unregister
Description
If wq is registered to sysfs by workqueue_sysfs_register(), unregister.
- void workqueue_init_early(void)¶
early init for workqueue subsystem
Parameters
void: no arguments
Description
This is the first step of three-staged workqueue subsystem initialization and invoked as soon as the bare basics - memory allocation, cpumasks and idr - are up. It sets up all the data structures and system workqueues and allows early boot code to create workqueues and queue/cancel work items. Actual work item execution starts only after kthreads can be created and scheduled, right before early initcalls.
- void workqueue_init(void)¶
bring workqueue subsystem fully online
Parameters
void: no arguments
Description
This is the second step of three-staged workqueue subsystem initialization and invoked as soon as kthreads can be created and scheduled. Workqueues have been created and work items queued on them, but there are no kworkers executing the work items yet. Populate the worker pools with the initial workers and enable future kworker creations.
- void workqueue_init_topology(void)¶
initialize CPU pods for unbound workqueues
Parameters
void: no arguments
Description
This is the third step of three-staged workqueue subsystem initialization and invoked after SMP and topology information are fully initialized. It initializes the unbound CPU pods accordingly.