DEV Community
Satoru Takeuchi


Linux sysctl parameters for the process scheduler

Preface

This article explains the meaning of Linux's sysctl parameters for the process scheduler, along with the background knowledge needed to understand them. I don't intend to explain every parameter, only the essential ones.

For simplicity, the descriptions in this article ignore the following aspects of process scheduling:

  • nice value
  • real-time priority

This article is based on Linux kernel v5.0.

Scheduling Classes

The Linux kernel has a concept called scheduling classes. Every process running on Linux belongs to one of the scheduling classes, and each scheduling class defines how the processes belonging to it are scheduled.

Processes belong to the fair scheduling class by default. In this article, I call these processes normal processes. On the other hand, processes called real-time processes (see later) belong to the realtime scheduling class.

In the following sections, I'll describe the meaning of the sysctl parameters for these two scheduling classes, together with a brief explanation of each class.

The sysctl parameters about the fair scheduling class

The normal processes, which belong to the fair scheduling class, are scheduled by the Completely Fair Scheduler (CFS). What CFS does is explained in the next section.

kernel.sched_latency_ns parameter

If there are two or more runnable processes, CFS divides CPU time among them as fairly as possible. Here, fair means giving each process an equal share of CPU time.

CFS has a concept called the latency target. CFS tries to give a timeslice to every runnable process once per latency target. The timeslice of each process is (latency target) / (the number of runnable processes). For example, if the latency target is 10ms and there are two runnable processes, each gets 5ms per 10ms. If there are four, each gets 2.5ms per 10ms.
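The arithmetic above can be sketched in a few lines of Python. This is only a model of the simplified description in this article, not the kernel's actual code, and the function name timeslice_ns is hypothetical.

```python
def timeslice_ns(latency_target_ns: int, nr_runnable: int) -> float:
    """Per-process timeslice under the simplified model: target / runnable count."""
    if nr_runnable <= 0:
        raise ValueError("nr_runnable must be positive")
    return latency_target_ns / nr_runnable


if __name__ == "__main__":
    latency = 10_000_000  # 10ms expressed in nanoseconds
    for n in (2, 4):
        print(f"{n} runnable processes: {timeslice_ns(latency, n) / 1e6} ms each")
```

Nice values are ignored here, as in the rest of this article, so every process gets the same share.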

kernel.sched_latency_ns defines the latency target of CFS in nanoseconds. If there are multiple CPUs in the system, the latency target becomes kernel.sched_latency_ns * (1 + log2(the number of CPUs)).
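As a rough illustration, the multi-CPU scaling can be written as below. This sketch applies the formula exactly as stated in this article. The name scaled_value_ns is hypothetical, and the kernel itself may round the logarithm or cap the CPU count differently, so treat the result as an approximation.

```python
import math


def scaled_value_ns(base_ns: int, nr_cpus: int) -> float:
    """Scale a base value by 1 + log2(nr_cpus), as described in the text."""
    if nr_cpus < 1:
        raise ValueError("nr_cpus must be at least 1")
    return base_ns * (1 + math.log2(nr_cpus))


if __name__ == "__main__":
    base = 10_000_000  # example base value: 10ms in nanoseconds
    for cpus in (1, 2, 4):
        print(f"{cpus} CPU(s): {scaled_value_ns(base, cpus) / 1e6} ms")
```

With one CPU the factor is 1, so the value is unchanged. With four CPUs the factor is 3.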

kernel.sched_min_granularity_ns parameter

What happens when there are very many runnable processes? For example, if the latency target is 10ms and there are 100 runnable processes, does each process get a timeslice of just 100us? That would be too short, because the cost of context switches becomes too high.

To prevent this problem, the timeslice is guaranteed to be equal to or longer than the value of the kernel.sched_min_granularity_ns parameter. The unit of this parameter is nanoseconds. Please note that when this guarantee takes effect, the latency target effectively becomes kernel.sched_min_granularity_ns * (the number of runnable processes).

Similar to the latency target, if there are multiple CPUs in the system, the guaranteed timeslice becomes kernel.sched_min_granularity_ns * (1 + log2(the number of CPUs)).
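The interaction between the latency target and the minimum granularity can be sketched as follows. This is a simplified model of the rule described above, not the kernel's implementation. The function name effective_schedule and the 1ms minimum granularity in the example are assumptions chosen only for illustration.

```python
def effective_schedule(latency_ns: int, min_granularity_ns: int, nr_runnable: int):
    """Return (timeslice_ns, period_ns) under the simplified model.

    The timeslice never drops below min_granularity_ns, so the period
    stretches beyond latency_ns when there are many runnable processes.
    """
    if nr_runnable <= 0:
        raise ValueError("nr_runnable must be positive")
    slice_ns = max(latency_ns / nr_runnable, min_granularity_ns)
    period_ns = slice_ns * nr_runnable
    return slice_ns, period_ns


if __name__ == "__main__":
    latency = 10_000_000   # 10ms
    min_gran = 1_000_000   # 1ms, a hypothetical value for this example
    for n in (4, 100):
        slice_ns, period_ns = effective_schedule(latency, min_gran, n)
        print(f"{n} runnable: slice {slice_ns / 1e6} ms, period {period_ns / 1e6} ms")
```

With 4 processes the latency target still holds (2.5ms each per 10ms). With 100 processes each one gets the 1ms minimum, so one full round takes 100ms.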

kernel.sched_wakeup_granularity_ns parameter

Processes that are woken up from a sleep state tend to go back to sleep after a short period. So, in many cases, it's efficient to give CPU time to the woken-up process as soon as possible.

A typical example is a terminal emulator, which interacts directly with users through keyboard input. When a user types something, the terminal emulator is woken up and echoes back the input. If the echo takes too long, the user experience becomes bad.

CFS has special logic to shorten the latency of such interactive processes. Explaining the details of this logic is a bit difficult, so I'll only say that if you decrease the kernel.sched_wakeup_granularity_ns parameter, a woken-up process becomes more likely to preempt the running process. As a result, the system's interactivity would get better.

However, please note that there is a tradeoff between interactivity and throughput. If you set a value that is smaller than the default, the number of context switches would increase and the throughput would get worse.

The sysctl parameters about the realtime scheduling class

The realtime scheduling class is for processes that must run ahead of any normal process, in other words, ahead of any process belonging to the fair scheduling class.

As I already described, the processes that belong to the realtime scheduling class are called real-time processes. A real-time process is a process that has the SCHED_FIFO or SCHED_RR scheduling policy. We can set the scheduling policy of a process with the sched_setscheduler() system call.
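For illustration, Python's os module exposes this system call on Linux, so the definition above can be checked with a short script. This is a minimal sketch: the helper name is_realtime_policy is hypothetical, and switching to a real-time policy usually requires root or the CAP_SYS_NICE capability.

```python
import os

# The two policies that define a real-time process in this article.
REALTIME_POLICIES = {os.SCHED_FIFO, os.SCHED_RR}


def is_realtime_policy(policy: int) -> bool:
    """Return True if the policy is SCHED_FIFO or SCHED_RR."""
    # The kernel may OR a reset-on-fork flag into the policy value; mask it off.
    base = policy & ~getattr(os, "SCHED_RESET_ON_FORK", 0)
    return base in REALTIME_POLICIES


if __name__ == "__main__":
    current = os.sched_getscheduler(0)  # pid 0 means the calling process
    print("real-time" if is_realtime_policy(current) else "normal")
    try:
        # Ask for SCHED_FIFO with the lowest real-time priority.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(1))
        print("switched to SCHED_FIFO")
    except PermissionError:
        print("not permitted to switch to a real-time policy")
```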

Let's assume that a real-time process A becomes runnable on a CPU where process B is currently running. If B belongs to the fair scheduling class, A can preempt B at any time by definition. So, what happens if B is also a real-time process? It depends on the scheduling policy of B.

If B's scheduling policy is SCHED_FIFO, A can't preempt B and can run on this CPU only when B exits or goes to sleep. However, if B's scheduling policy is SCHED_RR, B has a predefined timeslice, and A can run after B exhausts its timeslice. If A also belongs to SCHED_RR, A and B then get CPU time in a round-robin manner.
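The difference between the two policies can be seen in a toy simulation. This is a deliberately simplified model written for this explanation: it assumes one CPU, ignores real-time priority and sleeping, and uses hypothetical names (simulate, the "FIFO" and "RR" labels). It is not how the kernel implements these policies.

```python
from collections import deque


def simulate(tasks, rr_timeslice_ms):
    """Run tasks on one CPU and return the timeline as (name, ran_ms) pairs.

    tasks is a list of (name, policy, runtime_ms) in run-queue order,
    where policy is "FIFO" or "RR".
    """
    queue = deque(tasks)
    timeline = []
    while queue:
        name, policy, remaining = queue.popleft()
        if policy == "FIFO":
            ran = remaining  # a FIFO task runs until it finishes
        else:
            ran = min(remaining, rr_timeslice_ms)  # an RR task runs one timeslice
        timeline.append((name, ran))
        remaining -= ran
        if remaining > 0:
            queue.append((name, policy, remaining))  # back to the tail of the queue
    return timeline


if __name__ == "__main__":
    # B is SCHED_FIFO: A has to wait until B finishes.
    print(simulate([("B", "FIFO", 300), ("A", "RR", 100)], rr_timeslice_ms=100))
    # Both are SCHED_RR: they alternate every timeslice.
    print(simulate([("B", "RR", 250), ("A", "RR", 150)], rr_timeslice_ms=100))
```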

kernel.sched_rr_timeslice_ms parameter

This parameter defines the timeslice of real-time processes that use the SCHED_RR scheduling policy. Its unit is milliseconds.
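To check the current value on your own system, a small helper like the following may be handy. It assumes that sysctl parameters are exposed as files under /proc/sys, with the dots in the name mapped to directory separators, which holds for the kernel.* parameters in this article. The helper name read_sysctl_int is hypothetical, and it returns None when the file is missing or unreadable.

```python
from pathlib import Path


def read_sysctl_int(name: str, root: str = "/proc/sys"):
    """Read an integer sysctl such as 'kernel.sched_rr_timeslice_ms'.

    Returns None if the parameter does not exist or cannot be parsed.
    """
    path = Path(root, *name.split("."))
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_sysctl_int("kernel.sched_rr_timeslice_ms")
    if value is None:
        print("kernel.sched_rr_timeslice_ms is not available on this system")
    else:
        print(f"SCHED_RR timeslice: {value} ms")
```

The same helper works for the other integer parameters in this article, as long as your kernel version exposes them under /proc/sys.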

kernel.sched_rt_period_us parameter and kernel.sched_rt_runtime_us parameter

These parameters prevent out-of-control real-time processes from monopolizing CPUs.

If a real-time process keeps running for a long time without sleeping, no normal process can get CPU time at all during this period. This can cause serious problems, such as hanging the whole system. For example, let's assume a system that has only one CPU, with a real-time process A running on it. If A hangs, the system also hangs. In addition, we can't even kill the problematic process, because it also prevents a shell such as bash from getting CPU time.

To prevent this kind of problem, the process scheduler has logic to limit the running time of real-time processes. In short, the total CPU time consumed by real-time processes can't exceed kernel.sched_rt_runtime_us per kernel.sched_rt_period_us. The unit of both parameters is microseconds.
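The limit can be expressed as a fraction of CPU time, as in the sketch below. The function name rt_cpu_share is hypothetical. The example values of 950000 and 1000000 are what I understand to be the usual defaults, and a runtime of -1 is documented by the kernel as disabling the limit. Please check the values on your own system rather than relying on these.

```python
def rt_cpu_share(runtime_us: int, period_us: int) -> float:
    """Fraction of each period that real-time processes may consume."""
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    if runtime_us < 0:
        return 1.0  # -1 means real-time processes are not throttled
    return min(runtime_us / period_us, 1.0)


if __name__ == "__main__":
    share = rt_cpu_share(950_000, 1_000_000)  # assumed default values
    print(f"real-time processes may use up to {share:.0%} of each period")
    print(f"normal processes keep at least {1 - share:.0%}")
```

With these example values, at least 5% of every second is left for normal processes, which is enough to launch a shell and kill a runaway real-time process.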

Conclusion

This article describes some of the Linux scheduler's sysctl parameters and the basic knowledge needed to understand them. If you're interested in this topic, please modify these parameters and run your workload to verify whether the descriptions in this article are correct. For example, the following article would help you.

One of the Rook's maintainers, ex-Linux kernel developer, developing a storage system based on Ceph on top of Kubernetes.
  • Location
    Japan
  • Work
    Software engineer at Cybozu.inc