Commit 2802bf3

msrasmussen authored and Ingo Molnar committed
sched/fair: Add over-utilization/tipping point indicator
Energy-aware scheduling is only meant to be active while the system is
_not_ over-utilized. That is, there are spare cycles available to shift
tasks around based on their actual utilization to get a more
energy-efficient task distribution without depriving any tasks. When
above the tipping point, task placement is done the traditional way based
on load_avg, spreading the tasks across as many cpus as possible based
on priority-scaled load to preserve smp_nice. Below the tipping point we
want to use util_avg instead. We need to define a criterion for when we
make the switch.

The util_avg for each cpu converges towards 100% regardless of how many
additional tasks we may put on it. If we define over-utilized as:

    sum_{cpus}(rq.cfs.avg.util_avg) + margin > sum_{cpus}(rq.capacity)

some individual cpus may be over-utilized running multiple tasks even
when the above condition is false. That should be okay as long as we try
to spread the tasks out to avoid per-cpu over-utilization as much as
possible and if all tasks have the _same_ priority. If the latter isn't
true, we have to consider priority to preserve smp_nice.

For example, we could have n_cpus nice=-10 util_avg=55% tasks and
n_cpus/2 nice=0 util_avg=60% tasks. Balancing based on util_avg we are
likely to end up with nice=-10 tasks sharing cpus and nice=0 tasks
getting their own, as we have 1.5*n_cpus tasks in total and 55%+55% is
less over-utilized than 55%+60% for those cpus that have to be shared.
The system utilization is only 85% of the system capacity, but we are
breaking smp_nice.

To be sure not to break smp_nice, we have defined over-utilization
conservatively as when any cpu in the system is fully utilized at its
highest frequency instead:

    cpu_rq(any).cfs.avg.util_avg + margin > cpu_rq(any).capacity

IOW, as soon as one cpu is (nearly) 100% utilized, we switch to load_avg
to factor in priority and preserve smp_nice.

With this definition, we can skip periodic load-balance as no cpu has an
always-running task when the system is not over-utilized. All tasks will
be periodic and we can balance them at wake-up. This conservative
condition does however mean that some scenarios that could benefit from
energy-aware decisions even if one cpu is fully utilized would not get
those benefits.

For systems where some cpus might have reduced capacity (RT-pressure
and/or big.LITTLE), we want periodic load-balance checks as soon as just
a single cpu is fully utilized, as it might be one of those with reduced
capacity and in that case we want to migrate it.

[ peterz: Added a comment explaining why new tasks are not accounted
  during overutilization detection. ]

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-13-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent 630246a, commit 2802bf3
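
To make the per-CPU tipping-point condition described in the changelog concrete, here is a minimal userspace sketch; it is not part of the patch. It mirrors the comparison added to cpu_overutilized() in the diff below, but the capacity_margin value of 1280 (~20% headroom, the value used by kernel/sched/fair.c around this time) and the per-CPU capacity/utilization figures are assumptions for illustration only.

/*
 * Minimal userspace sketch of the per-CPU tipping-point check.
 * Assumptions (not part of this commit): capacity_margin = 1280 and
 * made-up capacity/utilization numbers for a big.LITTLE system.
 */
#include <stdbool.h>
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024
static const unsigned long capacity_margin = 1280;	/* ~20% margin (assumed) */

/* Same comparison as cpu_overutilized() in the diff below. */
static bool cpu_overutilized(unsigned long capacity, unsigned long util)
{
	return (capacity * SCHED_CAPACITY_SCALE) < (util * capacity_margin);
}

int main(void)
{
	/* Hypothetical CPUs: two LITTLEs (capacity 430) and two bigs (1024). */
	const unsigned long capacity[] = { 430, 430, 1024, 1024 };
	const unsigned long util[]     = { 200, 380,  512,  900 };

	for (int cpu = 0; cpu < 4; cpu++)
		printf("cpu%d: util %lu / capacity %lu -> %s\n",
		       cpu, util[cpu], capacity[cpu],
		       cpu_overutilized(capacity[cpu], util[cpu]) ?
		       "over-utilized (tips the root domain)" : "ok");

	/*
	 * With a 1280 margin a CPU tips over once util exceeds ~80% of its
	 * capacity: here cpu1 (380 > 0.8 * 430) and cpu3 (900 > 0.8 * 1024).
	 * One such CPU is enough to set rd->overutilized and fall back to
	 * load_avg-based balancing.
	 */
	return 0;
}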

File tree

2 files changed: +61 -2 lines changed

kernel/sched/fair.c (57 additions, 2 deletions)

@@ -5082,6 +5082,24 @@ static inline void hrtick_update(struct rq *rq)
 }
 #endif
 
+#ifdef CONFIG_SMP
+static inline unsigned long cpu_util(int cpu);
+static unsigned long capacity_of(int cpu);
+
+static inline bool cpu_overutilized(int cpu)
+{
+	return (capacity_of(cpu) * 1024) < (cpu_util(cpu) * capacity_margin);
+}
+
+static inline void update_overutilized_status(struct rq *rq)
+{
+	if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu))
+		WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
+}
+#else
+static inline void update_overutilized_status(struct rq *rq) { }
+#endif
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -5139,8 +5157,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_group(se);
 	}
 
-	if (!se)
+	if (!se) {
 		add_nr_running(rq, 1);
+		/*
+		 * Since new tasks are assigned an initial util_avg equal to
+		 * half of the spare capacity of their CPU, tiny tasks have the
+		 * ability to cross the overutilized threshold, which will
+		 * result in the load balancer ruining all the task placement
+		 * done by EAS. As a way to mitigate that effect, do not account
+		 * for the first enqueue operation of new tasks during the
+		 * overutilized flag detection.
+		 *
+		 * A better way of solving this problem would be to wait for
+		 * the PELT signals of tasks to converge before taking them
+		 * into account, but that is not straightforward to implement,
+		 * and the following generally works well enough in practice.
+		 */
+		if (flags & ENQUEUE_WAKEUP)
+			update_overutilized_status(rq);
+
+	}
 
 	hrtick_update(rq);
 }
@@ -7940,6 +7976,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		if (nr_running > 1)
 			*sg_status |= SG_OVERLOAD;
 
+		if (cpu_overutilized(i))
+			*sg_status |= SG_OVERUTILIZED;
+
 #ifdef CONFIG_NUMA_BALANCING
 		sgs->nr_numa_running += rq->nr_numa_running;
 		sgs->nr_preferred_running += rq->nr_preferred_running;
@@ -8170,8 +8209,15 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		env->fbq_type = fbq_classify_group(&sds->busiest_stat);
 
 	if (!env->sd->parent) {
+		struct root_domain *rd = env->dst_rq->rd;
+
 		/* update overload indicator if we are at root domain */
-		WRITE_ONCE(env->dst_rq->rd->overload, sg_status & SG_OVERLOAD);
+		WRITE_ONCE(rd->overload, sg_status & SG_OVERLOAD);
+
+		/* Update over-utilization (tipping point, U >= 0) indicator */
+		WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
+	} else if (sg_status & SG_OVERUTILIZED) {
+		WRITE_ONCE(env->dst_rq->rd->overutilized, SG_OVERUTILIZED);
 	}
 }
 
@@ -8398,6 +8444,14 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	 * this level.
 	 */
 	update_sd_lb_stats(env, &sds);
+
+	if (static_branch_unlikely(&sched_energy_present)) {
+		struct root_domain *rd = env->dst_rq->rd;
+
+		if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
+			goto out_balanced;
+	}
+
 	local = &sds.local_stat;
 	busiest = &sds.busiest_stat;
 
@@ -9798,6 +9852,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 		task_tick_numa(rq, curr);
 
 	update_misfit_status(curr, rq);
+	update_overutilized_status(task_rq(curr));
 }
 
 /*
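
The smp_nice scenario from the changelog can also be reproduced with a short userspace sketch (again, not part of the patch; n_cpus = 4 and the task mix are assumed purely for illustration). Greedily placing each task on the least-utilized CPU, the way a purely util_avg-based balance tends to, ends up sharing CPUs between the nice=-10 tasks while each nice=0 task gets a CPU of its own, even though the system is only at 85% of capacity. That is exactly the situation the conservative per-CPU tipping point is meant to avoid.

/*
 * Sketch of the smp_nice example: four nice=-10 tasks at 55% util and
 * two nice=0 tasks at 60% util, packed greedily by utilization onto
 * four CPUs (values assumed for illustration).
 */
#include <stdio.h>

#define NR_CPUS	4

int main(void)
{
	/* 1.5 * n_cpus tasks: util in percent, nice level only for printing */
	const struct { int util; int nice; } tasks[] = {
		{ 60, 0 }, { 60, 0 },					/* nice=0   */
		{ 55, -10 }, { 55, -10 }, { 55, -10 }, { 55, -10 },	/* nice=-10 */
	};
	int cpu_util[NR_CPUS] = { 0 };
	int total = 0;

	for (unsigned int t = 0; t < sizeof(tasks) / sizeof(tasks[0]); t++) {
		int target = 0;

		/* place the task on the least-utilized CPU */
		for (int cpu = 1; cpu < NR_CPUS; cpu++)
			if (cpu_util[cpu] < cpu_util[target])
				target = cpu;

		cpu_util[target] += tasks[t].util;
		total += tasks[t].util;
		printf("task util=%d%% nice=%d -> cpu%d\n",
		       tasks[t].util, tasks[t].nice, target);
	}

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu%d: %d%%\n", cpu, cpu_util[cpu]);
	printf("system: %d%% of %d%%\n", total, NR_CPUS * 100);

	/* Result: cpu0/cpu1 hold one nice=0 task each (60%), while the
	 * nice=-10 tasks share cpu2/cpu3 (110% each) - smp_nice is broken
	 * at only 85% system utilization. */
	return 0;
}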

kernel/sched/sched.h (4 additions, 0 deletions)

@@ -718,6 +718,7 @@ struct perf_domain {
 
 /* Scheduling group status flags */
 #define SG_OVERLOAD		0x1 /* More than one runnable task on a CPU. */
+#define SG_OVERUTILIZED		0x2 /* One or more CPUs are over-utilized. */
 
 /*
  * We add the notion of a root-domain which will be used to define per-domain
@@ -741,6 +742,9 @@ struct root_domain {
 	 */
 	int			overload;
 
+	/* Indicate one or more cpus over-utilized (tipping point) */
+	int			overutilized;
+
 	/*
 	 * The bit corresponding to a CPU gets set here if such CPU has more
 	 * than one runnable -deadline task (as it is below for RT tasks).
