Horizontal Pod Autoscaling

In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling capacity to match demand.

Horizontal scaling means that the response to increased load is to deploy more Pods. This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.

If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.

Horizontal pod autoscaling does not apply to objects that can't be scaled (for example, a DaemonSet).

The HorizontalPodAutoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The horizontal pod autoscaling controller, running within the Kubernetes control plane, periodically adjusts the desired scale of its target (for example, a Deployment) to match observed metrics such as average CPU utilization, average memory utilization, or any other custom metric you specify.

There is a walkthrough example of using horizontal pod autoscaling.

How does a HorizontalPodAutoscaler work?

```mermaid
graph BT
hpa[HorizontalPodAutoscaler] --> scale[Scale]
subgraph rc[Deployment]
    scale
end
scale -.-> pod1[Pod 1]
scale -.-> pod2[Pod 2]
scale -.-> pod3[Pod N]
classDef hpa fill:#D5A6BD,stroke:#1E1E1D,stroke-width:1px,color:#1E1E1D;
classDef rc fill:#F9CB9C,stroke:#1E1E1D,stroke-width:1px,color:#1E1E1D;
classDef scale fill:#B6D7A8,stroke:#1E1E1D,stroke-width:1px,color:#1E1E1D;
classDef pod fill:#9FC5E8,stroke:#1E1E1D,stroke-width:1px,color:#1E1E1D;
class hpa hpa;
class rc rc;
class scale scale;
class pod1,pod2,pod3 pod
```

Figure 1. HorizontalPodAutoscaler controls the scale of a Deployment and its ReplicaSet

Kubernetes implements horizontal pod autoscaling as a control loop that runs intermittently (it is not a continuous process). The interval is set by the --horizontal-pod-autoscaler-sync-period parameter to the kube-controller-manager (and the default interval is 15 seconds).

Once during each period, the controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. The controller manager finds the target resource defined by the scaleTargetRef, then selects the pods based on the target resource's .spec.selector labels, and obtains the metrics from either the resource metrics API (for per-pod resource metrics), or the custom metrics API (for all other metrics).

  • For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.

    Please note that if some of the Pod's containers do not have the relevant resource request set, CPU utilization for the Pod will not be defined and the autoscaler will not take any action for that metric. See the algorithm details section below for more information about how the autoscaling algorithm works.

  • For per-pod custom metrics, the controller functions similarly to per-pod resource metrics, except that it works with raw values, not utilization values.

  • For object metrics and external metrics, a single metric is fetched, which describes the object in question. This metric is compared to the target value, to produce a ratio as above. In the autoscaling/v2 API version, this value can optionally be divided by the number of Pods before the comparison is made, as illustrated in the example below.
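For illustration, an Object metric source in the autoscaling/v2 API might look like the following sketch; the Ingress name and the requests-per-second metric are hypothetical and depend on what your metrics pipeline exposes:

```yaml
type: Object
object:
  metric:
    name: requests-per-second   # hypothetical metric exposed by your metrics adapter
  describedObject:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: main-route            # hypothetical Ingress being described
  target:
    type: Value
    value: 10k
```

Using a target of type AverageValue instead of Value divides the fetched metric by the number of Pods before the comparison is made.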

The common use for HorizontalPodAutoscaler is to configure it to fetch metrics from aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, or external.metrics.k8s.io). The metrics.k8s.io API is usually provided by an add-on named Metrics Server, which needs to be launched separately. For more information about resource metrics, see Metrics Server.

Support for metrics APIs explains the stability guarantees and support status for these different APIs.

The HorizontalPodAutoscaler controller accesses corresponding workload resources that support scaling (such as Deployment and StatefulSet). These resources each have a subresource named scale, an interface that allows you to dynamically set the number of replicas and examine each of their current states. For general information about subresources in the Kubernetes API, see Kubernetes API Concepts.
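As a rough sketch (the workload name and selector are illustrative), the Scale object that this subresource exposes looks like:

```yaml
apiVersion: autoscaling/v1
kind: Scale
metadata:
  name: my-app            # name of the scaled workload (illustrative)
spec:
  replicas: 3             # desired replica count; this is what the HPA controller sets
status:
  replicas: 3             # current replica count reported by the workload
  selector: app=my-app    # string-form label selector used to find the Pods
```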

Algorithm details

From the most basic perspective, the HorizontalPodAutoscaler controller operates on the ratio between desired metric value and current metric value:

$$desiredReplicas = \left\lceil currentReplicas \times \frac{currentMetricValue}{desiredMetricValue} \right\rceil$$

For example, if the current metric value is 200m, and the desired value is 100m, the number of replicas will be doubled, since \( { 200.0 \div 100.0 } = 2.0 \).
If the current value is instead 50m, you'll halve the number of replicas, since \( { 50.0 \div 100.0 } = 0.5 \). The control plane skips any scaling action if the ratio is sufficiently close to 1.0 (within a configurable tolerance, 0.1 by default).
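Concretely, with the default tolerance of 0.1, any ratio in the range

$$0.9 \le \frac{currentMetricValue}{desiredMetricValue} \le 1.1$$

leaves the replica count unchanged.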

When a targetAverageValue or targetAverageUtilization is specified, the currentMetricValue is computed by taking the average of the given metric across all Pods in the HorizontalPodAutoscaler's scale target.

Before checking the tolerance and deciding on the final values, the control plane also considers whether any metrics are missing, and how many Pods are Ready. All Pods with a deletion timestamp set (objects with a deletion timestamp are in the process of being shut down / removed) are ignored, and all failed Pods are discarded.

If a particular Pod is missing metrics, it is set aside for later; Pods with missing metrics will be used to adjust the final scaling amount.

When scaling on CPU, if any pod has yet to become ready (it's still initializing, or possibly is unhealthy) or the most recent metric point for the pod was before it became ready, that pod is set aside as well.

Due to technical constraints, the HorizontalPodAutoscaler controller cannot exactly determine the first time a pod becomes ready when determining whether to set aside certain CPU metrics. Instead, it considers a Pod "not yet ready" if it's unready and transitioned to ready within a short, configurable window of time since it started. This value is configured with the --horizontal-pod-autoscaler-initial-readiness-delay command line option, and its default is 30 seconds. Once a pod has become ready, it considers any transition to ready to be the first if it occurred within a longer, configurable time since it started. This value is configured with the --horizontal-pod-autoscaler-cpu-initialization-period command line option, and its default is 5 minutes.

The \( \frac{currentMetricValue}{desiredMetricValue} \) base scale ratio is then calculated, using the remaining pods not set aside or discarded from above.

If there were any missing metrics, the control plane recomputes the average more conservatively, assuming those pods were consuming 100% of the desired value in case of a scale down, and 0% in case of a scale up. This dampens the magnitude of any potential scale.

Furthermore, if any not-yet-ready pods were present, and the workload would have scaled up without factoring in missing metrics or not-yet-ready pods, the controller conservatively assumes that the not-yet-ready pods are consuming 0% of the desired metric, further dampening the magnitude of a scale up.
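As a worked illustration of this dampening (the numbers are hypothetical): suppose four Pods are targeted with a desired average of 100m CPU, three of them report 200m, and one is missing metrics. Using only the three reporting Pods would give a ratio of 2.0 and a scale up from 4 to 8 replicas. Because this is a scale up, the missing Pod is instead assumed to be consuming 0% of the desired value:

$$currentMetricValue = \frac{200 + 200 + 200 + 0}{4}\,\text{m} = 150\text{m}, \qquad desiredReplicas = \left\lceil 4 \times \frac{150}{100} \right\rceil = 6$$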

After factoring in the not-yet-ready pods and missing metrics, the controller recalculates the usage ratio. If the new ratio reverses the scale direction, or is within the tolerance, the controller doesn't take any scaling action. In other cases, the new ratio is used to decide any change to the number of Pods.

Note that the original value for the average utilization is reported back via the HorizontalPodAutoscaler status, without factoring in the not-yet-ready pods or missing metrics, even when the new usage ratio is used.

If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen. If any of these metrics cannot be converted into a desired replica count (e.g. due to an error fetching the metrics from the metrics APIs) and a scale down is suggested by the metrics which can be fetched, scaling is skipped. This means that the HPA is still capable of scaling up if one or more metrics give a desiredReplicas greater than the current value.

Finally, right before the HPA scales the target, the scale recommendation is recorded. The controller considers all recommendations within a configurable window, choosing the highest recommendation from within that window. You can configure this value using the --horizontal-pod-autoscaler-downscale-stabilization command line option, which defaults to 5 minutes. This means that scale downs will occur gradually, smoothing out the impact of rapidly fluctuating metric values.

Pod readiness and autoscaling metrics

The HorizontalPodAutoscaler (HPA) controller includes two command line options that influence how CPU metrics are collected from Pods during startup:

  1. --horizontal-pod-autoscaler-cpu-initialization-period (default: 5 minutes)

This defines the time window after a Pod starts during which its CPU usage is ignored unless:

  • The Pod is in a Ready state, and
  • the metric sample was taken entirely during the period it was Ready.

This command line option helps exclude misleading high CPU usage from initializing Pods (for example, Java apps warming up) from HPA scaling decisions.

  2. --horizontal-pod-autoscaler-initial-readiness-delay (default: 30 seconds)

This defines a short delay period after a Pod starts during which the HPA controller treats Pods that are currently Unready as still initializing, even if they have previously transitioned to Ready briefly.

It is designed to:

  • Avoid including Pods that rapidly fluctuate between Ready and Unready during startup.
  • Ensure stability in the initial readiness signal before the HPA considers their metrics valid.

You can only set these command line options cluster-wide.

Key behaviors for pod readiness

  • If a Pod is Ready and remains Ready, it can be counted as contributing metrics even within the delay.
  • If a Pod rapidly toggles between Ready and Unready, metrics are ignored until it's considered stably Ready.

Good practice for pod readiness

  • Configure a startupProbe that doesn't pass until the high CPU usage has passed, or
  • Ensure your readinessProbe only reports Ready after the CPU spike subsides, using initialDelaySeconds.

Ideally, also set --horizontal-pod-autoscaler-cpu-initialization-period to cover the startup duration.
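For example, a container spec along the following lines could keep a slow-starting application out of scaling decisions until its startup CPU spike has passed; the endpoints, port, and timings are assumptions that you would tune for your own workload:

```yaml
containers:
- name: application
  image: registry.example/app:1.0   # hypothetical image
  startupProbe:
    httpGet:
      path: /healthz                # hypothetical health endpoint
      port: 8080
    periodSeconds: 10
    failureThreshold: 30            # allows up to 30 x 10s = 5 minutes to start
  readinessProbe:
    httpGet:
      path: /readyz                 # hypothetical readiness endpoint
      port: 8080
    initialDelaySeconds: 60         # don't report Ready until the CPU spike subsides
    periodSeconds: 10
```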

API object

The HorizontalPodAutoscaler is an API kind in the Kubernetes autoscaling API group. The current stable version can be found in the autoscaling/v2 API version, which includes support for scaling on memory and custom metrics. The new fields introduced in autoscaling/v2 are preserved as annotations when working with autoscaling/v1.

When you create a HorizontalPodAutoscaler API object, make sure the name specified is a valid DNS subdomain name. More details about the API object can be found at HorizontalPodAutoscaler Object.
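Putting this together, a minimal autoscaling/v2 manifest might look like the following sketch; the Deployment name and the target values are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                 # must be a valid DNS subdomain name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```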

Stability of workload scale

When managing the scale of a group of replicas using the HorizontalPodAutoscaler, it is possible that the number of replicas keeps fluctuating frequently due to the dynamic nature of the metrics evaluated. This is sometimes referred to as thrashing, or flapping. It's similar to the concept of hysteresis in cybernetics.

Autoscaling during rolling update

Kubernetes lets you perform a rolling update on a Deployment. In that case, the Deployment manages the underlying ReplicaSets for you. When you configure autoscaling for a Deployment, you bind a HorizontalPodAutoscaler to a single Deployment. The HorizontalPodAutoscaler manages the replicas field of the Deployment. The deployment controller is responsible for setting the replicas of the underlying ReplicaSets so that they add up to a suitable number during the rollout and also afterwards.

If you perform a rolling update of a StatefulSet that has an autoscaled number of replicas, the StatefulSet directly manages its set of Pods (there is no intermediate resource similar to ReplicaSet).

Support for resource metrics

Any HPA target can be scaled based on the resource usage of the pods in the scaling target. When defining the pod specification, the resource requests like cpu and memory should be specified. These are used to determine the resource utilization, which the HPA controller uses to scale the target up or down. To use resource utilization based scaling, specify a metric source like this:

```yaml
type: Resource
resource:
  name: cpu
  target:
    type: Utilization
    averageUtilization: 60
```

With this metric the HPA controller will keep the average utilization of the pods in the scaling target at 60%. Utilization is the ratio between the current usage of a resource and the requested resources of the pod. See Algorithm details for more details about how the utilization is calculated and averaged.
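For instance (with hypothetical numbers), if a Pod requests 500m of CPU and its containers are together using 300m, that Pod's utilization is

$$\frac{300\text{m}}{500\text{m}} = 0.6 = 60\%$$

and the HPA averages this value across all Pods in the scale target.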

Note:

Since the resource usage of all the containers is summed, the total pod utilization may not accurately represent the individual container resource usage. This could lead to situations where a single container might be running with high usage, but the HPA will not scale out because the overall pod usage is still within acceptable limits.

Container resource metrics

FEATURE STATE: Kubernetes v1.30 [stable] (enabled by default)

The HorizontalPodAutoscaler API also supports a container metric source where the HPA can track the resource usage of individual containers across a set of Pods, in order to scale the target resource. This lets you configure scaling thresholds for the containers that matter most in a particular Pod. For example, if you have a web application and a sidecar container that provides logging, you can scale based on the resource use of the web application, ignoring the sidecar container and its resource use.

If you revise the target resource to have a new Pod specification with a different set of containers, you should revise the HPA spec if that newly added container should also be used for scaling. If the specified container in the metric source is not present or only present in a subset of the pods, then those pods are ignored and the recommendation is recalculated. See Algorithm details for more details about the calculation. To use container resources for autoscaling, define a metric source as follows:

```yaml
type: ContainerResource
containerResource:
  name: cpu
  container: application
  target:
    type: Utilization
    averageUtilization: 60
```

In the above example, the HPA controller scales the target such that the average utilization of the cpu in the application container of all the pods is 60%.

Note:

If you change the name of a container that a HorizontalPodAutoscaler is tracking, you can make that change in a specific order to ensure scaling remains available and effective whilst the change is being applied. Before you update the resource that defines the container (such as a Deployment), you should update the associated HPA to track both the new and old container names. This way, the HPA is able to calculate a scaling recommendation throughout the update process.

Once you have rolled out the container name change to the workload resource, tidy up by removing the old container name from the HPA specification.

Scaling on custom metrics

FEATURE STATE: Kubernetes v1.23 [stable]

(the autoscaling/v2beta2 API version previously provided this ability as a beta feature)

Provided that you use the autoscaling/v2 API version, you can configure a HorizontalPodAutoscaler to scale based on a custom metric (that is not built in to Kubernetes or any Kubernetes component). The HorizontalPodAutoscaler controller then queries for these custom metrics from the Kubernetes API.

See Support for metrics APIs for the requirements.
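As a sketch, a per-pod custom metric source might look like the following; the packets-per-second metric is hypothetical and assumes your metrics adapter exposes it for the targeted Pods:

```yaml
type: Pods
pods:
  metric:
    name: packets-per-second   # hypothetical metric from your metrics adapter
  target:
    type: AverageValue
    averageValue: 1k
```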

Scaling on multiple metrics

FEATURE STATE: Kubernetes v1.23 [stable]

(the autoscaling/v2beta2 API version previously provided this ability as a beta feature)

Provided that you use the autoscaling/v2 API version, you can specify multiple metrics for a HorizontalPodAutoscaler to scale on. Then, the HorizontalPodAutoscaler controller evaluates each metric, and proposes a new scale based on that metric. The HorizontalPodAutoscaler takes the maximum scale recommended for each metric and sets the workload to that size (provided that this isn't larger than the overall maximum that you configured).
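For example, a metrics list combining CPU utilization with the hypothetical per-pod custom metric sketched earlier might look like this; the controller computes a proposal for each entry and applies the largest:

```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
- type: Pods
  pods:
    metric:
      name: packets-per-second   # hypothetical custom metric
    target:
      type: AverageValue
      averageValue: 1k
```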

Support for metrics APIs

By default, the HorizontalPodAutoscaler controller retrieves metrics from a series of APIs. In order for it to access these APIs, cluster administrators must ensure that:

  • The API aggregation layer is enabled.

  • The corresponding APIs are registered:

    • For resource metrics, this is the metrics.k8s.io API, generally provided by metrics-server. It can be launched as a cluster add-on.

    • For custom metrics, this is the custom.metrics.k8s.io API. It's provided by "adapter" API servers offered by metrics solution vendors. Check with your metrics pipeline to see if there is a Kubernetes metrics adapter available.

    • For external metrics, this is the external.metrics.k8s.io API. It may be provided by the custom metrics adapters provided above.

For more information on these different metrics paths and how they differ, please see the relevant design proposals for the HPA V2, custom.metrics.k8s.io and external.metrics.k8s.io.

For examples of how to use them, see the walkthrough for using custom metrics and the walkthrough for using external metrics.

Configurable scaling behavior

FEATURE STATE: Kubernetes v1.23 [stable]

(the autoscaling/v2beta2 API version previously provided this ability as a beta feature)

If you use the v2 HorizontalPodAutoscaler API, you can use the behavior field (see the API reference) to configure separate scale-up and scale-down behaviors. You specify these behaviors by setting scaleUp and / or scaleDown under the behavior field.

Scaling policies let you control the rate of change of replicas while scaling. In addition, two settings can be used to prevent flapping: you can specify a stabilization window for smoothing replica counts, and a tolerance to ignore minor metric fluctuations below a specified threshold.

Scaling policies

One or more scaling policies can be specified in the behavior section of the spec. When multiple policies are specified, the policy which allows the highest amount of change is selected by default. The following example shows this behavior while scaling down:

```yaml
behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
    - type: Percent
      value: 10
      periodSeconds: 60
```

periodSeconds indicates the length of time in the past for which the policy must hold true. The maximum value that you can set for periodSeconds is 1800 (half an hour). The first policy (Pods) allows at most 4 replicas to be scaled down in one minute. The second policy (Percent) allows at most 10% of the current replicas to be scaled down in one minute.

Since by default the policy which allows the highest amount of change is selected, the second policy will only be used when the number of pod replicas is more than 40. With 40 or fewer replicas, the first policy will be applied. For instance, if there are 80 replicas and the target has to be scaled down to 10 replicas, then during the first step 8 replicas will be reduced. In the next iteration, when the number of replicas is 72, 10% of the pods is 7.2, but the number is rounded up to 8. On each loop of the autoscaler controller, the number of pods to be changed is recalculated based on the number of current replicas. When the number of replicas falls below 40, the first policy (Pods) is applied and 4 replicas will be reduced at a time.

The policy selection can be changed by specifying the selectPolicy field for a scaling direction. Setting the value to Min selects the policy which allows the smallest change in the replica count. Setting the value to Disabled completely disables scaling in that direction.

Stabilization window

The stabilization window is used to restrict the flapping of the replica count when the metrics used for scaling keep fluctuating. The autoscaling algorithm uses this window to infer a previous desired state and avoid unwanted changes to workload scale.

For example, in the following snippet, a stabilization window is specified for scaleDown.

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
```

When the metrics indicate that the target should be scaled down, the algorithm looks into previously computed desired states, and uses the highest value from the specified interval. In the above example, all desired states from the past 5 minutes will be considered.

This approximates a rolling maximum, and avoids having the scaling algorithm frequently remove Pods only to trigger recreating an equivalent Pod just moments later.

Tolerance

FEATURE STATE: Kubernetes v1.35 [beta] (enabled by default)

The tolerance field configures a threshold for metric variations, preventing the autoscaler from scaling for changes below that value.

This tolerance is defined as the amount of variation around the desired metric value under which no scaling will occur. For example, consider a HorizontalPodAutoscaler configured with a target memory consumption of 100MiB and a scale-up tolerance of 5%:

```yaml
behavior:
  scaleUp:
    tolerance: 0.05  # 5% tolerance for scale up
```

With this configuration, the HPA algorithm will only consider scaling up if the memory consumption is higher than 105MiB (that is, 5% above the target).

If you don't set this field, the HPA applies the default cluster-wide tolerance of 10%. This default can be updated for both scale-up and scale-down using the kube-controller-manager --horizontal-pod-autoscaler-tolerance command line argument. (You can't use the Kubernetes API to configure this default value.)

Default behavior

Not all fields have to be specified to customize the scaling behavior. Only values which need to be customized can be specified; these custom values are merged with the default values. The default values match the existing behavior in the HPA algorithm.

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15
    selectPolicy: Max
```

For scaling down, the stabilization window is 300 seconds (or the value of the --horizontal-pod-autoscaler-downscale-stabilization command line option, if provided). There is only a single policy for scaling down, which allows 100% of the currently running replicas to be removed; this means the scaling target can be scaled down to the minimum allowed replicas. For scaling up there is no stabilization window. When the metrics indicate that the target should be scaled up, the target is scaled up immediately. There are 2 policies, where 4 pods or 100% of the currently running replicas may at most be added every 15 seconds until the HPA reaches its steady state.

Example: change downscale stabilization window

To provide a custom downscale stabilization window of 1 minute, the following behavior would be added to the HPA:

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60
```

Example: limit scale down rate

To limit the rate at which pods are removed by the HPA to 10% per minute, the following behavior would be added to the HPA:

```yaml
behavior:
  scaleDown:
    policies:
    - type: Percent
      value: 10
      periodSeconds: 60
```

To ensure that no more than 5 Pods are removed per minute, you can add a second scale-down policy with a fixed size of 5, and set selectPolicy to Min. Setting selectPolicy to Min means that the autoscaler chooses the policy that affects the smallest number of Pods:

```yaml
behavior:
  scaleDown:
    policies:
    - type: Percent
      value: 10
      periodSeconds: 60
    - type: Pods
      value: 5
      periodSeconds: 60
    selectPolicy: Min
```

Example: disable scale down

The selectPolicy value of Disabled turns off scaling in the given direction. So, to prevent downscaling, the following policy would be used:

```yaml
behavior:
  scaleDown:
    selectPolicy: Disabled
```

Support for HorizontalPodAutoscaler in kubectl

HorizontalPodAutoscaler, like every API resource, is supported in a standard way by kubectl. You can create a new autoscaler using the kubectl create command. You can list autoscalers with kubectl get hpa, or get a detailed description with kubectl describe hpa. Finally, you can delete an autoscaler using kubectl delete hpa.

In addition, there is a special kubectl autoscale command for creating a HorizontalPodAutoscaler object. For instance, executing kubectl autoscale rs foo --min=2 --max=5 --cpu-percent=80 will create an autoscaler for ReplicaSet foo, with target CPU utilization set to 80% and the number of replicas between 2 and 5.

Implicit maintenance-mode deactivation

You can implicitly deactivate the HPA for a target without the need to change the HPA configuration itself. If the target's desired replica count is set to 0, and the HPA's minimum replica count is greater than 0, the HPA stops adjusting the target (and sets the ScalingActive condition on itself to false) until you reactivate it by manually adjusting the target's desired replica count or the HPA's minimum replica count.

Migrating Deployments and StatefulSets to horizontal autoscaling

When an HPA is enabled, it is recommended that the value of spec.replicas of the Deployment and / or StatefulSet be removed from their manifest(s). If this isn't done, any time a change to that object is applied, for example via kubectl apply -f deployment.yaml, this will instruct Kubernetes to scale the current number of Pods to the value of the spec.replicas key. This may not be desired and could be troublesome when an HPA is active, resulting in thrashing or flapping behavior.

Keep in mind that the removal of spec.replicas may incur a one-time degradation of Pod counts, as the default value of this key is 1 (reference Deployment Replicas). Upon the update, all Pods except 1 will begin their termination procedures. Any deployment application afterwards will behave as normal and respect a rolling update configuration as desired. You can avoid this degradation by choosing one of the following two methods based on how you are modifying your deployments:

  1. kubectl apply edit-last-applied deployment/<deployment_name>
  2. In the editor, remove spec.replicas. When you save and exit the editor, kubectl applies the update. No changes to Pod counts happen at this step.
  3. You can now remove spec.replicas from the manifest. If you use source code management, also commit your changes or take whatever other steps for revising the source code are appropriate for how you track updates.
  4. From here on out you can run kubectl apply -f deployment.yaml

When using Server-Side Apply, you can follow the transferring ownership guidelines, which cover this exact use case.

What's next

If you configure autoscaling in your cluster, you may also want to consider using node autoscaling to ensure you are running the right number of nodes. You can also read more about vertical Pod autoscaling.

