CPU Performance Scaling
- Copyright: © 2017 Intel Corporation
- Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Concept of CPU Performance Scaling
The majority of modern processors are capable of operating in a number of different clock frequency and voltage configurations, often referred to as Operating Performance Points or P-states (in ACPI terminology). As a rule, the higher the clock frequency and the higher the voltage, the more instructions can be retired by the CPU over a unit of time. However, the higher the clock frequency and the higher the voltage, the more energy is also consumed over a unit of time (or the more power is drawn) by the CPU in the given P-state. Therefore there is a natural tradeoff between the CPU capacity (the number of instructions that can be executed over a unit of time) and the power drawn by the CPU.
In some situations it is desirable or even necessary to run the program as fast as possible, and then there is no reason to use any P-states different from the highest one (i.e. the highest-performance frequency/voltage configuration available). In some other cases, however, it may not be necessary to execute instructions so quickly. Maintaining the highest available CPU capacity for a relatively long time without utilizing it entirely may then be regarded as wasteful. It also may not be physically possible to maintain maximum CPU capacity for too long, for thermal or power supply capacity reasons or similar. To cover those cases, there are hardware interfaces allowing CPUs to be switched between different frequency/voltage configurations or (in the ACPI terminology) to be put into different P-states.
Typically, they are used along with algorithms to estimate the required CPU capacity, so as to decide which P-states to put the CPUs into. Of course, since the utilization of the system generally changes over time, that has to be done repeatedly on a regular basis. The activity by which this happens is referred to as CPU performance scaling or CPU frequency scaling (because it involves adjusting the CPU clock frequency).
CPU Performance Scaling in Linux
The Linux kernel supports CPU performance scaling by means of the CPUFreq (CPU Frequency scaling) subsystem. It consists of three layers of code: the core, scaling governors and scaling drivers.
The CPUFreq core provides the common code infrastructure and user space interfaces for all platforms that support CPU performance scaling. It defines the basic framework in which the other components operate.
Scaling governors implement algorithms to estimate the required CPU capacity. As a rule, each governor implements one, possibly parametrized, scaling algorithm.
Scaling drivers talk to the hardware. They provide scaling governors with information on the available P-states (or P-state ranges in some cases). They also access platform-specific hardware interfaces to change CPU P-states as requested by scaling governors.
In principle, all available scaling governors can be used with every scaling driver. That design is based on the observation that the information used by performance scaling algorithms for P-state selection can be represented in a platform-independent form in the majority of cases. It should therefore be possible to use the same performance scaling algorithm, implemented in exactly the same way, regardless of which scaling driver is used. Consequently, the same set of scaling governors should be suitable for every supported platform.
However, that observation may not hold for performance scaling algorithms based on information provided by the hardware itself, for example through feedback registers. That information is typically specific to the hardware interface it comes from and may not be easily represented in an abstract, platform-independent way. For this reason, CPUFreq allows scaling drivers to bypass the governor layer and implement their own performance scaling algorithms. That is done by the intel_pstate scaling driver.
CPUFreq Policy Objects
In some cases the hardware interface for P-state control is shared by multiple CPUs. For example, the same register (or set of registers) may be used to control the P-state of multiple CPUs at the same time, so that writing to it affects all of those CPUs simultaneously.
Sets of CPUs sharing hardware P-state control interfaces are represented by CPUFreq as struct cpufreq_policy objects. For consistency, struct cpufreq_policy is also used when there is only one CPU in the given set.
The CPUFreq core maintains a pointer to a struct cpufreq_policy object for every CPU in the system, including CPUs that are currently offline. If multiple CPUs share the same hardware P-state control interface, all of the pointers corresponding to them point to the same struct cpufreq_policy object.
CPUFreq uses struct cpufreq_policy as its basic data type, and the design of its user space interface is based on the policy concept.
CPU Initialization
First of all, a scaling driver has to be registered for CPUFreq to work. It is only possible to register one scaling driver at a time, so the scaling driver is expected to be able to handle all CPUs in the system.
The scaling driver may be registered before or after CPU registration. If CPUs are registered earlier, the driver core invokes the CPUFreq core to take note of all of the already registered CPUs during the registration of the scaling driver. In turn, if any CPUs are registered after the registration of the scaling driver, the CPUFreq core will be invoked to take note of them at their registration time.
In any case, the CPUFreq core is invoked to take note of any logical CPU it has not seen so far as soon as it is ready to handle that CPU. [Note that the logical CPU may be a physical single-core processor, or a single core in a multicore processor, or a hardware thread in a physical processor or processor core. In what follows “CPU” always means “logical CPU” unless explicitly stated otherwise, and the word “processor” is used to refer to the physical part possibly including multiple logical CPUs.]
Once invoked, the CPUFreq core checks if the policy pointer is already set for the given CPU, and if so, it skips the policy object creation. Otherwise, a new policy object is created and initialized, which involves the creation of a new policy directory in sysfs. The policy pointer corresponding to the given CPU is then set to the new policy object’s address in memory.
Next, the scaling driver’s ->init() callback is invoked with the policy pointer of the new CPU passed to it as the argument. That callback is expected to initialize the performance scaling hardware interface for the given CPU. More precisely, it initializes the interface for the set of CPUs sharing the hardware interface the CPU belongs to, represented by its policy object. If the policy object it has been called for is new, the callback is also expected to set parameters of the policy. These include the minimum and maximum frequencies supported by the hardware, the table of available frequencies (if the set of supported P-states is not a continuous range), and the mask of CPUs that belong to the same policy (including both online and offline CPUs). That mask is then used by the core to populate the policy pointers for all of the CPUs in it.
The next major initialization step for a new policy object is to attach a scaling governor to it. To begin with, that is the default scaling governor determined by the kernel command line or configuration, but it may be changed later via sysfs. First, a pointer to the new policy object is passed to the governor’s ->init() callback. That callback is expected to initialize all of the data structures necessary to handle the given policy and, possibly, to add a governor sysfs interface to it. Next, the governor is started by invoking its ->start() callback.
That callback is expected to register per-CPU utilization update callbacks with the CPU scheduler for all of the online CPUs belonging to the given policy. The utilization update callbacks will be invoked by the CPU scheduler on important events, like task enqueue and dequeue, on every iteration of the scheduler tick, or generally whenever the CPU utilization may change (from the scheduler’s perspective). They are expected to carry out the computations needed to determine the P-state to use for the given policy going forward. They are also expected to invoke the scaling driver to make changes to the hardware in accordance with the P-state selection. The scaling driver may be invoked directly from scheduler context or asynchronously, via a kernel thread or workqueue, depending on the configuration and capabilities of the scaling driver and the governor.
Similar steps are taken for policy objects that are not new, but were “inactive” previously, meaning that all of the CPUs belonging to them were offline. The only practical difference in that case is that the CPUFreq core will attempt to use the scaling governor previously used with the policy that became “inactive” (and is re-initialized now) instead of the default governor.
In turn, if a previously offline CPU is being brought back online, but some other CPUs sharing the policy object with it are online already, there is no need to re-initialize the policy object at all. In that case, it is only necessary to restart the scaling governor so that it can take the new online CPU into account. That is achieved by invoking the governor’s ->stop() and ->start() callbacks, in this order, for the entire policy.
As mentioned before, the intel_pstate scaling driver bypasses the scaling governor layer of CPUFreq and provides its own P-state selection algorithms. Consequently, if intel_pstate is used, scaling governors are not attached to new policy objects. Instead, the driver’s ->setpolicy() callback is invoked to register per-CPU utilization update callbacks for each policy. These callbacks are invoked by the CPU scheduler in the same way as for scaling governors. In the intel_pstate case, however, they both determine the P-state to use and change the hardware configuration accordingly in one go from scheduler context.
The policy objects created during CPU initialization, and other data structures associated with them, are torn down when the scaling driver is unregistered (which happens when the kernel module containing it is unloaded, for example). They are also torn down when the last CPU belonging to the given policy is unregistered.
Policy Interface in sysfs
During the initialization of the kernel, the CPUFreq core creates a sysfs directory (kobject) called cpufreq under /sys/devices/system/cpu/.
That directory contains a policyX subdirectory (where X represents an integer number) for every policy object maintained by the CPUFreq core. Each policyX directory is pointed to by cpufreq symbolic links under /sys/devices/system/cpu/cpuY/ (where Y represents an integer that may be different from the one represented by X) for all of the CPUs associated with (or belonging to) the given policy. The policyX directories in /sys/devices/system/cpu/cpufreq each contain policy-specific attributes (files) to control CPUFreq behavior for the corresponding policy objects (that is, for all of the CPUs associated with them).
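The layout described above can be walked from a shell. The following is a minimal sketch that assumes a POSIX shell. cpu_policy and list_policies are hypothetical helper names. The CPU sysfs root (normally /sys/devices/system/cpu) is passed as an argument, so the functions can also be tried on a copy of the tree.

```sh
# Hypothetical helpers (not part of any kernel tool) for inspecting the
# cpufreq layout. $1 is the CPU sysfs root, normally /sys/devices/system/cpu.

# Print the policy directory name (e.g. "policy0") that cpuY/cpufreq points to.
#   $2: CPU number (the Y in cpuY)
cpu_policy() (
    link="$1/cpu$2/cpufreq"
    # No link: CPUFreq is not handling this CPU.
    [ -e "$link" ] || exit 1
    # cd -P follows the symbolic link; pwd -P prints the resolved path.
    cd -P "$link" && basename "$(pwd -P)"
)

# Print every policyX directory together with its related_cpus list.
list_policies() (
    for dir in "$1"/cpufreq/policy*; do
        [ -d "$dir" ] || continue   # the glob did not match anything
        printf '%s: %s\n' "${dir##*/}" "$(cat "$dir/related_cpus")"
    done
)
```

For example, cpu_policy /sys/devices/system/cpu 3 would print the name of the policyX directory that CPU 3 belongs to.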
Some of those attributes are generic. They are created by the CPUFreq core, and their behavior generally does not depend on what scaling driver is in use and what scaling governor is attached to the given policy. Some scaling drivers also add driver-specific attributes to the policy directories in sysfs to control policy-specific aspects of driver behavior.
The generic attributes under /sys/devices/system/cpu/cpufreq/policyX/ are the following:
affected_cpus
    List of online CPUs belonging to this policy (i.e. sharing the hardware performance scaling interface represented by the policyX policy object).
bios_limit
    If the platform firmware (BIOS) tells the OS to apply an upper limit to CPU frequencies, that limit will be reported through this attribute (if present).
    The existence of the limit may be a result of some (often unintentional) BIOS settings, restrictions coming from a service processor or other BIOS/HW-based mechanisms.
    This does not cover ACPI thermal limitations, which can be discovered through a generic thermal driver.
    This attribute is not present if the scaling driver in use does not support it.
cpuinfo_cur_freq
    Current frequency of the CPUs belonging to this policy as obtained from the hardware (in kHz).
    This is expected to be the frequency the hardware actually runs at. If that frequency cannot be determined, this attribute should not be present.
cpuinfo_avg_freq
    An average frequency (in kHz) of all CPUs belonging to a given policy. It is derived from hardware-provided feedback and reported over a time frame spanning at most a few milliseconds.
    This is expected to be based on the frequency the hardware actually runs at and, as such, might require specialised hardware support (such as the AMU extension on ARM). If it cannot be determined, this attribute should not be present.
    Note that a failed attempt to retrieve the current frequency for a given CPU (or CPUs) will result in an appropriate error, e.g. EAGAIN for a CPU that remains idle (raised on ARM).
cpuinfo_max_freq
    Maximum possible operating frequency the CPUs belonging to this policy can run at (in kHz).
cpuinfo_min_freq
    Minimum possible operating frequency the CPUs belonging to this policy can run at (in kHz).
cpuinfo_transition_latency
    The time it takes to switch the CPUs belonging to this policy from one P-state to another, in nanoseconds.
related_cpus
    List of all (online and offline) CPUs belonging to this policy.
scaling_available_frequencies
    List of available frequencies of the CPUs belonging to this policy (in kHz).
scaling_available_governors
    List of CPUFreq scaling governors present in the kernel that can be attached to this policy or (if the intel_pstate scaling driver is in use) list of scaling algorithms provided by the driver that can be applied to this policy.
    [Note that some governors are modular, and it may be necessary to load a kernel module for the governor held by it to become available and be listed by this attribute.]
scaling_cur_freq
    Current frequency of all of the CPUs belonging to this policy (in kHz).
    In the majority of cases, this is the frequency of the last P-state requested by the scaling driver from the hardware using the scaling interface provided by it. That may or may not reflect the frequency the CPU is actually running at (due to hardware design and other limitations).
    Some architectures (e.g. x86) may attempt to provide information more precisely reflecting the current CPU frequency through this attribute, but that still may not be the exact current CPU frequency as seen by the hardware at the moment. This behavior, though, is only available via the CPUFREQ_ARCH_CUR_FREQ option.
scaling_driver
    The scaling driver currently in use.
scaling_governor
    The scaling governor currently attached to this policy or (if the intel_pstate scaling driver is in use) the scaling algorithm provided by the driver that is currently applied to this policy.
    This attribute is read-write. Writing to it will cause a new scaling governor to be attached to this policy, or (in the intel_pstate case) a new scaling algorithm provided by the scaling driver to be applied to it, as indicated by the string written to this attribute. That string must be one of the names listed by the scaling_available_governors attribute described above.
scaling_max_freq
    Maximum frequency the CPUs belonging to this policy are allowed to be running at (in kHz).
    This attribute is read-write, and writing a string representing an integer to it will cause a new limit to be set (it must not be lower than the value of the scaling_min_freq attribute).
scaling_min_freq
    Minimum frequency the CPUs belonging to this policy are allowed to be running at (in kHz).
    This attribute is read-write, and writing a string representing a non-negative integer to it will cause a new limit to be set (it must not be higher than the value of the scaling_max_freq attribute).
scaling_setspeed
    This attribute is functional only if the userspace scaling governor is attached to the given policy.
    It returns the last frequency requested by the governor (in kHz) or can be written to in order to set a new frequency for the policy.
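The constraints on the read-write limits can be checked before writing. The following is a minimal sketch that assumes a POSIX shell run with root privileges. set_max_freq is a hypothetical helper whose checks only mirror the rules stated in the attribute descriptions above. The kernel still applies its own validation to whatever is written.

```sh
# Hypothetical helper: set scaling_max_freq for one policy.
#   $1: policy directory, e.g. /sys/devices/system/cpu/cpufreq/policy0
#   $2: new upper limit in kHz
set_max_freq() (
    case "$2" in
        ''|*[!0-9]*) echo "not a non-negative integer: $2" >&2; exit 1 ;;
    esac
    min=$(cat "$1/scaling_min_freq") || exit 1
    # scaling_max_freq must not be lower than scaling_min_freq.
    if [ "$2" -lt "$min" ]; then
        echo "$2 kHz is below scaling_min_freq ($min kHz)" >&2
        exit 1
    fi
    echo "$2" > "$1/scaling_max_freq"
)
```

A value read back from scaling_max_freq afterwards may differ from the one written if the kernel adjusts it.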
Generic Scaling Governors
CPUFreq provides generic scaling governors that can be used with all scaling drivers. As stated before, each of them implements a single, possibly parametrized, performance scaling algorithm.
Scaling governors are attached to policy objects, and different policy objects can be handled by different scaling governors at the same time (although that may lead to suboptimal results in some cases).
The scaling governor for a given policy object can be changed at any time with the help of the scaling_governor policy attribute in sysfs.
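Since scaling_governor only accepts names listed in scaling_available_governors, a script changing the governor may check the name first. The following is a minimal sketch that assumes a POSIX shell run with root privileges. set_governor is a hypothetical helper name.

```sh
# Hypothetical helper: attach a governor (or, with intel_pstate, select a
# driver algorithm) after checking scaling_available_governors.
#   $1: policy directory   $2: governor name, e.g. schedutil
set_governor() (
    # Governor names contain no whitespace, so word splitting is safe here.
    for g in $(cat "$1/scaling_available_governors"); do
        if [ "$g" = "$2" ]; then
            echo "$2" > "$1/scaling_governor"
            exit $?
        fi
    done
    echo "governor '$2' is not available for $1" >&2
    exit 1
)
```

A governor built as a kernel module may have to be loaded before its name appears in the list.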
Some governors expose sysfs attributes to control or fine-tune the scaling algorithms implemented by them. Those attributes, referred to as governor tunables, can be either global (system-wide) or per-policy, depending on the scaling driver in use. If the driver requires governor tunables to be per-policy, they are located in a subdirectory of each policy directory. Otherwise, they are located in a subdirectory under /sys/devices/system/cpu/cpufreq/. In either case, the name of the subdirectory containing the governor tunables is the name of the governor providing them.
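The lookup rule for governor tunables can be expressed as a small function. The following is a minimal sketch that assumes a POSIX shell. governor_tunables_dir is a hypothetical name. It checks the per-policy location first and the global one second.

```sh
# Hypothetical helper: print the directory holding a governor's tunables,
# preferring the per-policy subdirectory over the global one.
#   $1: policy directory, e.g. /sys/devices/system/cpu/cpufreq/policy0
#   $2: governor name, e.g. ondemand
governor_tunables_dir() (
    root=$(dirname "$1")    # normally /sys/devices/system/cpu/cpufreq
    if [ -d "$1/$2" ]; then
        echo "$1/$2"        # per-policy tunables
    elif [ -d "$root/$2" ]; then
        echo "$root/$2"     # global (system-wide) tunables
    else
        exit 1              # no tunables directory found for this governor
    fi
)
```

For example, if ondemand tunables are per-policy, governor_tunables_dir /sys/devices/system/cpu/cpufreq/policy0 ondemand would print /sys/devices/system/cpu/cpufreq/policy0/ondemand.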
performance
When attached to a policy object, this governor causes the highest frequency, within the scaling_max_freq policy limit, to be requested for that policy.
The request is made once at the time the governor for the policy is set to performance, and again whenever the scaling_max_freq or scaling_min_freq policy limits change after that.
powersave
When attached to a policy object, this governor causes the lowest frequency, within the scaling_min_freq policy limit, to be requested for that policy.
The request is made once at the time the governor for the policy is set to powersave, and again whenever the scaling_max_freq or scaling_min_freq policy limits change after that.
userspace
This governor does not do anything by itself. Instead, it allows user space to set the CPU frequency for the policy it is attached to by writing to the scaling_setspeed attribute of that policy. Though the intention may be to set an exact frequency for the policy, the actual frequency may vary depending on hardware coordination, thermal and power limits, and other factors.
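Writing to scaling_setspeed only has an effect while this governor is attached, so a script may check that first. The following is a minimal sketch that assumes a POSIX shell run with root privileges. set_speed is a hypothetical helper name.

```sh
# Hypothetical helper: request a frequency through scaling_setspeed, which
# is functional only while the userspace governor is attached.
#   $1: policy directory   $2: frequency in kHz
set_speed() (
    gov=$(cat "$1/scaling_governor") || exit 1
    if [ "$gov" != userspace ]; then
        echo "userspace governor required (current: $gov)" >&2
        exit 1
    fi
    echo "$2" > "$1/scaling_setspeed"
)
```

As noted above, the frequency the CPUs actually end up running at (as reported by cpuinfo_cur_freq, where present) may differ from the requested one.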
schedutil
This governor uses CPU utilization data available from the CPU scheduler. It generally is regarded as a part of the CPU scheduler, so it can access the scheduler’s internal data structures directly.
It runs entirely in scheduler context. In some cases, however, it may need to invoke the scaling driver asynchronously when it decides that the CPU frequency should be changed for a given policy (that depends on whether or not the driver is capable of changing the CPU frequency from scheduler context).
The actions of this governor for a particular CPU depend on the scheduling class invoking its utilization update callback for that CPU. If it is invoked by the RT or deadline scheduling classes, the governor will increase the frequency to the allowed maximum (that is, the scaling_max_freq policy limit). In turn, if it is invoked by the CFS scheduling class, the governor will use the Per-Entity Load Tracking (PELT) metric for the root control group of the given CPU as the CPU utilization estimate (see the Per-entity load tracking LWN.net article [1] for a description of the PELT mechanism). Then, the new CPU frequency to apply is computed in accordance with the formula
$$ f = 1.25 \cdot f_0 \cdot \frac{\mathit{util}}{\mathit{max}} $$
where util is the PELT number, max is the theoretical maximum of util, and f_0 is either the maximum possible CPU frequency for the given policy (if the PELT number is frequency-invariant), or the current CPU frequency (otherwise).
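The selection rule above can be written down with shell integer arithmetic. This is only a sketch. schedutil_freq is a hypothetical name, and the labels rt, dl and cfs are stand-ins for the scheduling classes. The factor 1.25 is expressed as 5/4. Capping the CFS result at scaling_max_freq is an assumption that the formula itself does not state.

```sh
# Hypothetical illustration of the schedutil frequency choice.
#   $1: scheduling class that invoked the callback: rt, dl or cfs
#   $2: util (PELT number)   $3: max (theoretical maximum of util)
#   $4: f_0 in kHz           $5: scaling_max_freq in kHz
schedutil_freq() (
    case "$1" in
        rt|dl)
            echo "$5" ;;                 # RT/deadline: allowed maximum
        cfs)
            # f = 1.25 * f_0 * util / max, computed as 5 * f_0 * util / (4 * max)
            f=$(( 5 * $4 * $2 / (4 * $3) ))
            # Assumption: the result is capped at the policy limit.
            [ "$f" -gt "$5" ] && f=$5
            echo "$f" ;;
        *)
            exit 1 ;;
    esac
)
```

For example, schedutil_freq cfs 512 1024 2000000 3000000 prints 1250000. Half of the maximum utilization therefore maps to 62.5% of f_0, leaving 25% headroom above the utilization itself.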
This governor also employs a mechanism, called “IO-wait boosting”, that allows it to temporarily bump up the CPU frequency for tasks that have been waiting on I/O most recently. That happens when the scheduler passes the SCHED_CPUFREQ_IOWAIT flag to the governor callback. The flag causes the frequency to go up to the allowed maximum immediately and then fall back to the value returned by the above formula over time.
This governor exposes only one tunable:
rate_limit_us
    Minimum time (in microseconds) that has to pass between two consecutive runs of governor computations (default: 1.5 times the scaling driver’s transition latency, up to a maximum of 2 ms).
    The purpose of this tunable is to reduce the scheduler context overhead of the governor, which might be excessive without it.
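The following is a minimal sketch of the rate limiting described above, assuming a POSIX shell. Both function names are hypothetical. The default follows the rule stated for rate_limit_us (1.5 times the transition latency, at most 2 ms). A zero transition latency is not treated specially in this sketch.

```sh
# Hypothetical helper: default rate_limit_us from cpuinfo_transition_latency.
#   $1: transition latency in nanoseconds
default_rate_limit_us() (
    us=$(( $1 * 3 / 2 / 1000 ))    # 1.5x, converted from ns to us
    [ "$us" -gt 2000 ] && us=2000  # capped at 2 ms
    echo "$us"
)

# Succeeds if another run of the governor computations is allowed.
#   $1: time of the last run (us)   $2: current time (us)   $3: rate_limit_us
rate_limit_elapsed() {
    [ $(( $2 - $1 )) -ge "$3" ]
}
```

For example, a transition latency of 500000 ns gives a default of 750 us.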
This governor generally is regarded as a replacement for the older ondemand and conservative governors (described below). It is simpler and more tightly integrated with the CPU scheduler, and its overhead in terms of CPU context switches and similar is less significant. It also uses the scheduler’s own CPU utilization metric, so in principle its decisions should not contradict the decisions made by the other parts of the scheduler.
ondemand
This governor uses CPU load as a CPU frequency selection metric.
In order to estimate the current CPU load, it measures the time elapsed between consecutive invocations of its worker routine and computes the fraction of that time in which the given CPU was not idle. The ratio of the non-idle (active) time to the total CPU time is taken as an estimate of the load.
If this governor is attached to a policy shared by multiple CPUs, the load is estimated for all of them, and the greatest result is taken as the load estimate for the entire policy.
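The load estimate described above reduces to a ratio of time deltas. The following is a minimal sketch. It assumes that the idle and total times of each CPU have been sampled twice in the same unit; how they are sampled is outside this sketch. cpu_load and policy_load are hypothetical names.

```sh
# Hypothetical helper: load of one CPU (percent) between two samples,
# i.e. the share of the elapsed time in which the CPU was not idle.
#   $1, $2: idle time at the first and second sample
#   $3, $4: total time at the first and second sample
cpu_load() (
    idle=$(( $2 - $1 ))
    total=$(( $4 - $3 ))
    if [ "$total" -le 0 ]; then
        echo 0              # no time elapsed: report no load
    else
        echo $(( 100 * (total - idle) / total ))
    fi
)

# Load of a policy shared by several CPUs: the greatest per-CPU load.
#   $@: per-CPU loads (percent)
policy_load() (
    max=0
    for load in "$@"; do
        [ "$load" -gt "$max" ] && max=$load
    done
    echo "$max"
)
```

For example, a CPU that was idle for 30 out of 100 elapsed time units has a load of 70.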
The worker routine of this governor has to run in process context, so it is invoked asynchronously (via a workqueue), and CPU P-states are updated from there if necessary. As a result, the scheduler context overhead from this governor is minimal. However, it causes additional CPU context switches to happen relatively often, and the CPU P-state updates triggered by it can be relatively irregular. Also, it affects its own CPU load metric by running code that reduces the CPU idle time (even though the CPU idle time is only reduced very slightly by it).
It generally selects CPU frequencies proportional to the estimated load, so that the value of the cpuinfo_max_freq policy attribute corresponds to the load of 1 (or 100%), and the value of the cpuinfo_min_freq policy attribute corresponds to the load of 0. The exception is when the load exceeds a (configurable) speedup threshold. In that case it will go straight for the highest frequency it is allowed to use (the scaling_max_freq policy limit).
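The mapping from load to frequency described above can be sketched with integer arithmetic. This is a sketch, assuming a POSIX shell. ondemand_freq is a hypothetical name, and the speedup threshold corresponds to the up_threshold tunable described below. Capping the proportional result at scaling_max_freq is an assumption of this sketch.

```sh
# Hypothetical helper: frequency (kHz) picked by ondemand for a policy load.
#   $1: load (percent)       $2: speedup threshold (percent)
#   $3: cpuinfo_min_freq     $4: cpuinfo_max_freq    $5: scaling_max_freq
ondemand_freq() (
    if [ "$1" -gt "$2" ]; then
        echo "$5"            # above the threshold: go straight to the limit
        exit 0
    fi
    # Load 0 maps to cpuinfo_min_freq and load 100 to cpuinfo_max_freq.
    f=$(( $3 + $1 * ($4 - $3) / 100 ))
    # Assumption: the result is capped at scaling_max_freq.
    [ "$f" -gt "$5" ] && f=$5
    echo "$f"
)
```

For example, a load of 50% with cpuinfo limits of 800000 and 2000000 kHz selects 1400000 kHz.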
This governor exposes the following tunables:
sampling_rate
    This is how often the governor’s worker routine should run, in microseconds.
    Typically, it is set to values of the order of 2000 (2 ms). Its default value is the cpuinfo_transition_latency of each policy this governor is attached to (converted from nanoseconds to microseconds) plus a 50% breathing room. The minimum is typically the length of two scheduler ticks.
    If this tunable is per-policy, the following shell command, run in the policy directory, sets the time represented by it to be 1.5 times as high as the transition latency (the default). The result is divided by 1000 because cpuinfo_transition_latency is expressed in nanoseconds:
        # echo $(($(cat cpuinfo_transition_latency) * 3 / 2 / 1000)) > ondemand/sampling_rate
up_threshold
    If the estimated CPU load is above this value (in percent), the governor will set the frequency to the maximum value allowed for the policy. Otherwise, the selected frequency will be proportional to the estimated CPU load.
ignore_nice_load
    If set to 1 (default 0), it will cause the CPU load estimation code to treat the CPU time spent on executing tasks with “nice” levels greater than 0 as CPU idle time.
    This may be useful if there are tasks in the system that should not be taken into account when deciding what frequency to run the CPUs at. Then, to make that happen, it is sufficient to increase the “nice” level of those tasks above 0 and set this attribute to 1.
sampling_down_factor
    Temporary multiplier, between 1 (default) and 100 inclusive, to apply to the sampling_rate value if the CPU load goes above up_threshold.
    This causes the next execution of the governor’s worker routine (after setting the frequency to the allowed maximum) to be delayed, so the frequency stays at the maximum level for a longer time.
    Frequency fluctuations in some bursty workloads may be avoided this way at the cost of additional energy spent on maintaining the maximum CPU capacity.
powersave_bias
    Reduction factor to apply to the original frequency target of the governor (including the maximum value used when the up_threshold value is exceeded by the estimated CPU load), or sensitivity threshold for the AMD frequency sensitivity powersave bias driver (drivers/cpufreq/amd_freq_sensitivity.c), between 0 and 1000 inclusive.
    If the AMD frequency sensitivity powersave bias driver is not loaded, the effective frequency to apply is given by
    $$ f \cdot \left(1 - \frac{\mathit{powersave\_bias}}{1000}\right) $$
    where f is the governor’s original frequency target. The default value of this attribute is 0 in that case.
    If the AMD frequency sensitivity powersave bias driver is loaded, the value of this attribute is 400 by default and it is used in a different way.
    On Family 16h (and later) AMD processors there is a mechanism to get a measured workload sensitivity, between 0 and 100% inclusive, from the hardware. That value can be used to estimate how the performance of the workload running on a CPU will change in response to frequency changes.
    The performance of a workload with the sensitivity of 0 (memory-bound or IO-bound) is not expected to increase at all as a result of increasing the CPU frequency. Workloads with the sensitivity of 100% (CPU-bound), in contrast, are expected to perform much better if the CPU frequency is increased.
    If the workload sensitivity is less than the threshold represented by the powersave_bias value, the sensitivity powersave bias driver will cause the governor to select a frequency lower than its original target. That avoids over-provisioning workloads that will not benefit from running at higher CPU frequencies.
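Two of the tunables above amount to simple arithmetic, which the following sketch spells out. It assumes a POSIX shell, and both function names are hypothetical. Only the powersave_bias formula for the case without the AMD frequency sensitivity driver is covered. Any rounding of the result to a frequency the hardware supports is not shown.

```sh
# Hypothetical helper: delay (us) before the next worker run.
#   $1: sampling_rate  $2: sampling_down_factor  $3: load  $4: up_threshold
next_sample_delay() (
    if [ "$3" -gt "$4" ]; then
        echo $(( $1 * $2 ))   # stay at the maximum for longer
    else
        echo "$1"
    fi
)

# Hypothetical helper: target with powersave_bias applied when the AMD
# frequency sensitivity driver is not loaded: f * (1 - powersave_bias / 1000).
#   $1: original target f (kHz)   $2: powersave_bias (0..1000)
apply_powersave_bias() (
    echo $(( $1 * (1000 - $2) / 1000 ))
)
```

For example, powersave_bias set to 100 lowers a 2000000 kHz target to 1800000 kHz.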
conservative
This governor uses CPU load as a CPU frequency selection metric.
It estimates the CPU load in the same way as the ondemand governor described above, but the CPU frequency selection algorithm implemented by it is different.
Namely, it avoids changing the frequency significantly over short time intervals, because such changes may not be suitable for systems with limited power supply capacity (e.g. battery-powered). To achieve that, it changes the frequency in relatively small steps, one step at a time, up or down, depending on whether or not a (configurable) threshold has been exceeded by the estimated CPU load.
This governor exposes the following tunables:
freq_step
    Frequency step in percent of the maximum frequency the governor is allowed to set (the scaling_max_freq policy limit), between 0 and 100 (5 by default).
    This is how much the frequency is allowed to change in one go. Setting it to 0 will cause the default frequency step (5 percent) to be used. Setting it to 100 effectively causes the governor to periodically switch the frequency between the scaling_min_freq and scaling_max_freq policy limits.
down_threshold
    Threshold value (in percent, 20 by default) used to determine the frequency change direction.
    If the estimated CPU load is greater than this value, the frequency will go up (by freq_step). If the load is less than this value (and the sampling_down_factor mechanism is not in effect), the frequency will go down. Otherwise, the frequency will not be changed.
sampling_down_factor
    Frequency decrease deferral factor, between 1 (default) and 10 inclusive.
    It effectively causes the frequency to go down sampling_down_factor times slower than it ramps up.
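A single step of the algorithm described above can be sketched as follows, assuming a POSIX shell. conservative_step is a hypothetical name. The sampling_down_factor deferral is left out, and keeping the result within the scaling_min_freq and scaling_max_freq limits is an assumption of this sketch.

```sh
# Hypothetical helper: one conservative step (kHz).
#   $1: current frequency  $2: load (percent)  $3: down_threshold
#   $4: freq_step (percent) $5: scaling_min_freq  $6: scaling_max_freq
conservative_step() (
    pct=$4
    [ "$pct" -eq 0 ] && pct=5            # 0 selects the default 5% step
    step=$(( $6 * pct / 100 ))           # percent of scaling_max_freq
    f=$1
    if [ "$2" -gt "$3" ]; then
        f=$(( f + step ))
        [ "$f" -gt "$6" ] && f=$6        # never above scaling_max_freq
    elif [ "$2" -lt "$3" ]; then
        f=$(( f - step ))
        [ "$f" -lt "$5" ] && f=$5        # never below scaling_min_freq
    fi
    echo "$f"
)
```

With freq_step set to 100, each step lands directly on one of the two policy limits, matching the switching behavior described above.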
Frequency Boost Support
Background
Some processors support a mechanism to temporarily raise the operating frequency of some cores in a multicore package above the sustainable frequency threshold for the whole package. This happens under certain conditions, for example if the whole chip is not fully utilized and is below its intended thermal or power budget.
Different names are used by different vendors to refer to this functionality. For Intel processors it is referred to as “Turbo Boost”, AMD calls it “Turbo-Core” or (in technical documentation) “Core Performance Boost”, and so on. As a rule, it is also implemented differently by different vendors. The simple term “frequency boost” is used here for brevity to refer to all of those implementations.
The frequency boost mechanism may be either hardware-based or software-based. If it is hardware-based (e.g. on x86), the decision to trigger the boosting is made by the hardware (although in general it requires the hardware to be put into a special state in which it can control the CPU frequency within certain limits). If it is software-based (e.g. on ARM), the scaling driver decides whether or not to trigger boosting and when to do that.
The boost File in sysfs
This file is located under /sys/devices/system/cpu/cpufreq/ and controls the “boost” setting for the whole system. It is not present if the underlying scaling driver does not support the frequency boost mechanism. It is also not present if the driver supports the mechanism but provides a driver-specific interface for controlling it, like intel_pstate.
If the value in this file is 1, the frequency boost mechanism is enabled. This means that either the hardware can be put into states in which it is able to trigger boosting (in the hardware-based case), or the software is allowed to trigger boosting (in the software-based case). It does not mean that boosting is actually in use at the moment on any CPUs in the system. It only means a permission to use the frequency boost mechanism (which still may never be used for other reasons).
If the value in this file is 0, the frequency boost mechanism is disabled and cannot be used at all.
The only values that can be written to this file are 0 and 1.
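The rules above can be checked by a script before it writes to the file. The following is a minimal sketch that assumes a POSIX shell run with root privileges. set_boost is a hypothetical helper name.

```sh
# Hypothetical helper: enable (1) or disable (0) frequency boost system-wide.
#   $1: cpufreq directory, normally /sys/devices/system/cpu/cpufreq
#   $2: 0 or 1
set_boost() (
    case "$2" in
        0|1) ;;
        *) echo "boost accepts only 0 or 1" >&2; exit 1 ;;
    esac
    if [ ! -f "$1/boost" ]; then
        # Not supported, or a driver-specific interface is used instead.
        echo "no boost file in $1" >&2
        exit 1
    fi
    echo "$2" > "$1/boost"
)
```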
Rationale for Boost Control Knob
The frequency boost mechanism is generally intended to help achieve optimum CPU performance on time scales below software resolution (e.g. below the scheduler tick interval). It is demonstrably suitable for many workloads, but it may lead to problems in certain situations.
For this reason, many systems make it possible to disable the frequency boost mechanism in the platform firmware (BIOS) setup. However, that requires the system to be restarted for the setting to be adjusted as desired, which may not be practical at least in some cases. For example:
- Boosting means overclocking the processor, although under controlled conditions. Generally, the processor’s energy consumption increases as a result of increasing its frequency and voltage, even temporarily. That may not be desirable on systems that switch to power sources of limited capacity, such as batteries, so the ability to disable the boost mechanism while the system is running may help there (but that depends on the workload too).
- In some situations deterministic behavior is more important than performance or energy consumption (or both), and the ability to disable boosting while the system is running may be useful then.
- To examine the impact of the frequency boost mechanism itself, it is useful to be able to run tests with and without boosting, preferably without restarting the system in the meantime.
- Reproducible results are important when running benchmarks. Since the boosting functionality depends on the load of the whole package, single-thread performance may vary because of it, which may sometimes lead to unreproducible results. That can be avoided by disabling the frequency boost mechanism before running benchmarks sensitive to that issue.
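For the benchmarking case above, a script can disable boosting for the duration of a single run only. The following is a minimal sketch. It assumes a POSIX shell, root privileges and a driver that provides the global boost file. run_without_boost is a hypothetical name. The sketch does not restore the setting if it is interrupted (for example, that could be handled with a trap).

```sh
# Hypothetical helper: run a command with frequency boost disabled and
# restore the previous boost setting afterwards.
#   $1: cpufreq directory, normally /sys/devices/system/cpu/cpufreq
#   remaining arguments: the command to run, e.g. a benchmark
run_without_boost() (
    dir=$1
    shift
    saved=$(cat "$dir/boost") || exit 1   # fails if boost is not supported
    echo 0 > "$dir/boost" || exit 1
    "$@" && status=0 || status=$?         # keep the command's exit status
    echo "$saved" > "$dir/boost"          # restore the original setting
    exit "$status"
)
```

For example, run_without_boost /sys/devices/system/cpu/cpufreq ./benchmark would run a hypothetical ./benchmark program with boosting disabled.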
Legacy AMD cpb Knob
The AMD powernow-k8 scaling driver supports a sysfs knob very similar to the global boost one. It is used for disabling/enabling the “Core Performance Boost” feature of some AMD processors.
If present, that knob is located in every CPUFreq policy directory in sysfs (/sys/devices/system/cpu/cpufreq/policyX/) and is called cpb, which suggests a more fine-grained control interface. The actual implementation, however, works on a system-wide basis, and setting that knob for one policy causes the same value to be set for all of the other policies at the same time.
That knob is still supported on AMD processors that support its underlying hardware feature, but it may be configured out of the kernel (via the CONFIG_X86_ACPI_CPUFREQ_CPB configuration option), and the global boost knob is present regardless. Thus it is always possible to use the boost knob instead of the cpb one. Doing so is highly recommended, as it is more consistent with what all of the other systems do (and the cpb knob may not be supported any more in the future).
The cpb knob is never present for any processors without the underlying hardware feature (e.g. all Intel ones), even if the CONFIG_X86_ACPI_CPUFREQ_CPB configuration option is set.
References
[1] Jonathan Corbet, Per-entity load tracking, https://lwn.net/Articles/531853/