Configuring horizontal Pod autoscaling
This page shows you how to scale your deployments in Google Kubernetes Engine (GKE) by automatically adjusting your resources using metrics like resource allocation, load balancer traffic, custom metrics, or multiple metrics simultaneously. This page also provides step-by-step instructions for configuring a Horizontal Pod Autoscaler (HPA) profile, including how to view, delete, clean up, and troubleshoot your HPA object. A Deployment is a Kubernetes API object that lets you run multiple replicas of Pods that are distributed among the nodes in a cluster.
This page is for Operators and Developers who manage application scaling in GKE and want to understand how to dynamically optimize performance and maintain cost efficiency through horizontal Pod autoscaling. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.

  Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set the compute/zone property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.
- Ensure that you have an existing Autopilot or Standard cluster. If you need one, create an Autopilot cluster.
API versions for HorizontalPodAutoscaler objects
When you use the Google Cloud console, HorizontalPodAutoscaler objects are created using the autoscaling/v2 API.
When you use kubectl to create or view information about a Horizontal Pod Autoscaler, you can specify either the autoscaling/v1 API or the autoscaling/v2 API.
- apiVersion: autoscaling/v1 is the default, and lets you autoscale based only on CPU utilization. To autoscale based on other metrics, using apiVersion: autoscaling/v2 is recommended. The example in Create the example Deployment uses apiVersion: autoscaling/v1.
- apiVersion: autoscaling/v2 is recommended for creating new HorizontalPodAutoscaler objects. It lets you autoscale based on multiple metrics, including custom or external metrics. All other examples in this page use apiVersion: autoscaling/v2.
To check which API versions are supported, use the kubectl api-versions command.
You can specify which API to use when viewing details about a Horizontal Pod Autoscaler that uses apiVersion: autoscaling/v2.
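For comparison, the CPU-based v1 example used later on this page can be expressed in the autoscaling/v2 API. The following is a sketch of that equivalent manifest (the nginx names match the example Deployment on this page; in v2, the CPU target moves from the targetCPUUtilizationPercentage field into a resource metric entry):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # In autoscaling/v2, CPU targets are expressed as a resource metric
  # instead of the v1 targetCPUUtilizationPercentage field.
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Because v2 takes a list of metrics, this form extends naturally to the multiple-metric example later on this page.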
Create the example Deployment
Before you can create a Horizontal Pod Autoscaler, you must create the workload it monitors. The examples in this page apply different Horizontal Pod Autoscaler configurations to the following nginx Deployment. Separate examples show a Horizontal Pod Autoscaler based on resource utilization, based on a custom or external metric, and based on multiple metrics.
Save the following to a file named nginx.yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        resources:
          # You must specify requests for CPU to autoscale
          # based on CPU utilization
          requests:
            cpu: "250m"
```

This manifest specifies a value for CPU requests. If you want to autoscale based on a resource's utilization as a percentage, you must specify requests for that resource. If you don't specify requests, you can autoscale based only on the absolute value of the resource's utilization, such as milliCPUs for CPU utilization.
To create the Deployment, apply the nginx.yaml manifest:
```shell
kubectl apply -f nginx.yaml
```

The Deployment has spec.replicas set to 3, so three Pods are deployed. You can verify this using the kubectl get deployment nginx command.
Each of the examples in this page applies a different Horizontal Pod Autoscaler to an example nginx Deployment.
Autoscaling based on resource utilization
This example creates a HorizontalPodAutoscaler object to autoscale the nginx Deployment when CPU utilization surpasses 50%, and ensures that there is always a minimum of 1 replica and a maximum of 10 replicas.
You can create a Horizontal Pod Autoscaler that targets CPU using the Google Cloud console, the kubectl apply command, or for average CPU only, the kubectl autoscale command.
Note: This example uses apiVersion: autoscaling/v1. For more information about the available APIs, see API versions for HorizontalPodAutoscaler objects.

Console
Go to the Workloads page in the Google Cloud console.
Click the name of the nginx Deployment.

Click Actions > Autoscale.
Specify the following values:
- Minimum number of replicas: 1
- Maximum number of replicas: 10
- Autoscaling metric: CPU
- Target: 50
- Unit: %
Click Done.

Click Autoscale.
kubectl apply
Save the following YAML manifest as a file named nginx-hpa.yaml:
```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  # Set the minimum and maximum number of replicas the Deployment can scale to.
  minReplicas: 1
  maxReplicas: 10
  # The target average CPU utilization percentage across all Pods.
  targetCPUUtilizationPercentage: 50
```

To create the HPA, apply the manifest using the following command:
```shell
kubectl apply -f nginx-hpa.yaml
```

kubectl autoscale
To create a HorizontalPodAutoscaler object that only targets average CPU utilization, you can use the kubectl autoscale command:
```shell
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10
```

Note: You can add the --dry-run and -o yaml flags to print a YAML manifest for a Horizontal Pod Autoscaler without actually creating it.

To get a list of Horizontal Pod Autoscalers in the cluster, use the following command:
```shell
kubectl get hpa
```

The output is similar to the following:
```
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/50%    1         10        3          61s
```

To get details about the Horizontal Pod Autoscaler, you can use the Google Cloud console or the kubectl command.
Console
Go to the Workloads page in the Google Cloud console.
Click the name of the nginx Deployment.

View the Horizontal Pod Autoscaler configuration in the Autoscaler section.
View more details about autoscaling events in the Events tab.
kubectl get
To get details about the Horizontal Pod Autoscaler, you can use kubectl get hpa with the -o yaml flag. The status field contains information about the current number of replicas and any recent autoscaling events.
```shell
kubectl get hpa nginx -o yaml
```

The output is similar to the following:
```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ScaleDownStabilized","message":"recent recommendations were higher than current one, applying the highest recent recommendation"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ValidMetricFound","message":"the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"DesiredWithinRange","message":"the desired count is within the acceptable range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"0"}}]'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"maxReplicas":10,"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"nginx"},"targetCPUUtilizationPercentage":50}}
  creationTimestamp: "2019-10-30T19:42:43Z"
  name: nginx
  namespace: default
  resourceVersion: "220050"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/nginx
  uid: 70d1067d-fb4d-11e9-8b2a-42010a8e013f
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 3
  desiredReplicas: 3
```

Before following the remaining examples in this page, delete the HPA:
```shell
kubectl delete hpa nginx
```

When you delete a Horizontal Pod Autoscaler, the number of replicas of the Deployment remains the same. A Deployment does not automatically revert to its state before the Horizontal Pod Autoscaler was applied.
You can learn more about deleting a Horizontal Pod Autoscaler.
Autoscaling based on load balancer traffic
Traffic-based autoscaling is a capability of GKE that integrates traffic utilization signals from load balancers to autoscale Pods.
Using traffic as an autoscaling signal might be helpful since traffic is a leading indicator of load that is complementary to CPU and memory. Built-in integration with GKE ensures that the setup is easy and that autoscaling reacts to traffic spikes quickly to meet demand.
Traffic-based autoscaling is enabled by the Gateway controller and its global traffic management capabilities. To learn more, see Traffic-based autoscaling.
Autoscaling based on load balancer traffic is only available for Gateway workloads.
Requirements
Traffic-based autoscaling has the following requirements:
- Supported on GKE versions 1.31 and later.
- Gateway API enabled in your GKE cluster.
- Supported for traffic that goes through load balancers deployed using the Gateway API and either the gke-l7-global-external-managed, gke-l7-regional-external-managed, gke-l7-rilb, or the gke-l7-gxlb GatewayClass.
Limitations
Traffic-based autoscaling has the following limitations:
- Not supported by the multi-cluster GatewayClasses (gke-l7-global-external-managed-mc, gke-l7-regional-external-managed-mc, gke-l7-rilb-mc, and gke-l7-gxlb-mc).
- Not supported for traffic using Services of type LoadBalancer.
- There must be a clear and isolated relationship between the components involved in traffic-based autoscaling. One Horizontal Pod Autoscaler must be dedicated to scaling a single Deployment (or any scalable resource) exposed by a single Service.
- After configuring the capacity of your Service using the maxRatePerEndpoint field, allow sufficient time (usually one minute, but potentially up to 15 minutes in large clusters) for the load balancer to be updated with this change, before configuring the Horizontal Pod Autoscaler with traffic-based metrics. This ensures your Service won't temporarily experience a situation where your cluster tries to autoscale based on metrics emitted by a load balancer still undergoing configuration.
- If traffic-based autoscaling is used on a Service served by multiple load balancers (for example, by both an Ingress and a Gateway, or by two Gateways), the Horizontal Pod Autoscaler might consider the highest traffic value from individual load balancers to make scaling decisions, rather than the sum of traffic values from all load balancers.
Deploy traffic-based autoscaling
The following exercise uses the HorizontalPodAutoscaler to autoscale the store-autoscale Deployment based on the traffic it receives. A Gateway accepts ingress traffic from the internet for the Pods. The autoscaler compares traffic signals from the Gateway with the per-Pod traffic capacity that is configured on the store-autoscale Service resource. By generating traffic to the Gateway, you influence the number of Pods deployed.
The following diagram demonstrates how traffic-based autoscaling works:
To deploy traffic-based autoscaling, perform the following steps:
For Standard clusters, confirm that the GatewayClasses are installed in your cluster. For Autopilot clusters, the GatewayClasses are installed by default.
```shell
kubectl get gatewayclass
```

The output confirms that the GKE GatewayClass resources are ready to use in your cluster:
```
NAME                               CONTROLLER                  ACCEPTED   AGE
gke-l7-global-external-managed     networking.gke.io/gateway   True       16h
gke-l7-regional-external-managed   networking.gke.io/gateway   True       16h
gke-l7-gxlb                        networking.gke.io/gateway   True       16h
gke-l7-rilb                        networking.gke.io/gateway   True       16h
```

If you don't see this output, enable the Gateway API in your GKE cluster.
Deploy the sample application and Gateway load balancer to your cluster:
```shell
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/master/gateway/docs/store-autoscale.yaml
```

The sample application creates:
- A Deployment with 2 replicas.
- A Service with an associated GCPBackendPolicy setting maxRatePerEndpoint set to 10. To learn more about Gateway capabilities, see GatewayClass capabilities.
- An external Gateway for accessing the application on the internet. To learn more about how to use Gateway load balancers, see Deploying Gateways.
- An HTTPRoute that matches all traffic and sends it to the store-autoscale Service.
The Service capacity is a critical element when using traffic-based autoscaling because it determines the amount of per-Pod traffic that triggers an autoscaling event. It is configured using a maxRatePerEndpoint field on a GCPBackendPolicy associated with the Service, which defines the maximum traffic a Service should receive in requests per second, per Pod. Service capacity is specific to your application. For more information, see Determining your Service's capacity.
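As a sketch of how that capacity setting might look for the sample's store-autoscale Service in the default namespace (the field layout follows the GKE Gateway API's GCPBackendPolicy resource; treat the exact values here as illustrative assumptions, not part of the sample manifest):

```yaml
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: store-autoscale
  namespace: default
spec:
  default:
    # Maximum requests per second each Pod (endpoint) should receive.
    # Traffic-based autoscaling compares observed load balancer traffic
    # against this per-Pod capacity.
    maxRatePerEndpoint: 10
  targetRef:
    group: ""
    kind: Service
    name: store-autoscale
```

Remember from the limitations above that after changing this value, the load balancer can take up to 15 minutes in large clusters to pick up the new capacity.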
Save the following manifest as hpa.yaml:

Note: If you previously used the autoscaling.googleapis.com|gclb-capacity-utilization metric name, we recommend that you switch to the autoscaling.googleapis.com|gclb-capacity-fullness metric name instead.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: store-autoscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store-autoscale
  # Set the minimum and maximum number of replicas the Deployment can scale to.
  minReplicas: 1
  maxReplicas: 10
  # This section defines that scaling should be based on the fullness of load balancer
  # capacity, using the following configuration.
  metrics:
  - type: Object
    object:
      describedObject:
        kind: Service
        name: store-autoscale
      metric:
        # The name of the custom metric which measures how "full" a backend is
        # relative to its configured capacity.
        name: "autoscaling.googleapis.com|gclb-capacity-fullness"
      target:
        # The target average value for the metric. The autoscaler adjusts the number
        # of replicas to maintain an average capacity fullness of 70% across all Pods.
        averageValue: 70
        type: AverageValue
```

This manifest describes a HorizontalPodAutoscaler with the following properties:

- minReplicas and maxReplicas: sets the minimum and maximum number of replicas for this Deployment. In this configuration, the number of Pods can scale from 1 to 10 replicas.
- describedObject.name: store-autoscale: the reference to the store-autoscale Service that defines the traffic capacity.
- scaleTargetRef.name: store-autoscale: the reference to the store-autoscale Deployment that defines the resource that is scaled by the Horizontal Pod Autoscaler.
- averageValue: 70: target average value of 70% capacity utilization. This gives the Horizontal Pod Autoscaler a growth margin so that the running Pods can process excess traffic while new Pods are being created.
The Horizontal Pod Autoscaler results in the following traffic behavior:
- The number of Pods is adjusted between 1 and 10 replicas to achieve 70% of the max rate per endpoint. This results in 7 RPS per Pod when maxRatePerEndpoint=10.
- At more than 7 RPS per Pod, Pods are scaled up until they've reached their maximum of 10 replicas or until the average traffic is 7 RPS per Pod.
- If traffic is reduced, Pods scale down to a reasonable rate using the Horizontal Pod Autoscaler algorithm.
You can also deploy a traffic generator to validate traffic-based autoscaling behavior.
At 30 RPS, the Deployment is scaled to 5 replicas so that each replica ideally receives 6 RPS of traffic, which would be 60% utilization per Pod. This is under the 70% target utilization and so the Pods are scaled appropriately. Depending on traffic fluctuations, the number of autoscaled replicas might also fluctuate. For a more detailed description of how the number of replicas is computed, see Autoscaling behavior.
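If scale-down after a traffic spike is too abrupt for your workload, the autoscaling/v2 API also exposes a behavior field that tunes how the algorithm applies its recommendations. The following fragment is a sketch, not part of the exercise above, and the window and rate values are illustrative assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: store-autoscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store-autoscale
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleDown:
      # Only scale down after recommendations have been consistently
      # lower for 5 minutes, smoothing out short traffic dips.
      stabilizationWindowSeconds: 300
      policies:
      # Remove at most one Pod per minute when scaling down.
      - type: Pods
        value: 1
        periodSeconds: 60
```

A longer stabilization window trades slower cost reduction for fewer replica oscillations when traffic fluctuates around the target.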
Autoscaling based on a custom or external metric
To create horizontal Pod autoscalers for custom metrics and external metrics, see Optimize Pod autoscaling based on metrics.
Autoscaling based on multiple metrics
This example creates a Horizontal Pod Autoscaler that autoscales based on CPU utilization and a custom metric named packets_per_second.
If you followed the previous example and still have a Horizontal Pod Autoscaler named nginx, delete it before following this example.
This example requires apiVersion: autoscaling/v2. For more information about the available APIs, see API versions for HorizontalPodAutoscaler objects.
Before you can autoscale based on a custom metric, you must create the custom metric and configure your workload to export the metric to Cloud Monitoring. For this reason, the packets_per_second metric in the following manifest is included for illustration, but commented out. See custom metrics and the Monitoring documentation for creating custom metrics.
Save this YAML manifest as a file named nginx-multiple.yaml:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  # The metrics to base the autoscaling on.
  metrics:
  - type: Resource
    resource:
      name: cpu  # Scale based on CPU utilization.
      target:
        type: Utilization
        # The HPA will scale the replicas to try and maintain an average
        # CPU utilization of 50% across all Pods.
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory  # Scale based on memory usage.
      target:
        type: AverageValue
        # The HPA will scale the replicas to try and maintain an average
        # memory usage of 100 Mebibytes (MiB) across all Pods.
        averageValue: 100Mi
  # Uncomment these lines if you create the custom packets_per_second metric and
  # configure your app to export the metric.
  # - type: Pods
  #   pods:
  #     metric:
  #       name: packets_per_second
  #     target:
  #       type: AverageValue
  #       averageValue: 100
```

Apply the YAML manifest:
```shell
kubectl apply -f nginx-multiple.yaml
```

When created, the Horizontal Pod Autoscaler monitors the nginx Deployment for average CPU utilization, average memory utilization, and (if you uncommented it) the custom packets_per_second metric. The Horizontal Pod Autoscaler autoscales the Deployment based on the metric whose value would create the largest autoscale event.
Configure the Performance HPA profile
The Performance HPA profile improves the reaction time of the Horizontal Pod Autoscaler, enabling it to quickly recalculate a large number of HorizontalPodAutoscaler objects (up to 1,000 objects in minor versions 1.31-1.32 and 5,000 objects in version 1.33 or later).
This profile is automatically enabled on qualifying Autopilot clusters with a control plane running GKE version 1.32 or later. For Standard clusters, the profile is automatically enabled on qualifying clusters with a control plane running GKE version 1.33 or later.
A Standard cluster is exempt from auto-enablement of the Performance HPA profile if it meets all of the following conditions:
- The cluster is upgrading from an earlier version to version 1.33 or later.
- The cluster has at least one node pool with any of the following machine types: e2-micro, e2-custom-micro, g1-small, f1-micro.
- Node auto-provisioning is not enabled.
You can also enable the Performance HPA profile on existing clusters if they meet the requirements.
Requirements
To enable the Performance HPA profile, verify that your Autopilot and Standard clusters meet the following requirements:
- Your control plane is running GKE version 1.31 or later.
- If your control plane is running GKE version 1.31, enable system metric collection.
- The Autoscaling API is enabled in your cluster.
- All node Service Accounts have the roles/autoscaling.metricsWriter role assigned.
- If you use VPC Service Controls, verify that the Autoscaling API is included in your service perimeter.
Enable the Performance HPA profile
To enable the Performance HPA profile in your cluster, use the following command:
```shell
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=performance
```

Replace the following:

- CLUSTER_NAME: the name of the cluster.
- LOCATION: the Compute Engine zone or region (for example, us-central1-a or us-central1) for the cluster.
- PROJECT_ID: your Google Cloud project ID.
Note: Enabling the Performance HPA profile updates the gke-metrics-agent resource requests, and triggers a simultaneous restart of its Pods. This may cause temporary disruption on resource-constrained nodes due to Pod rescheduling.

Disable the Performance HPA profile
To disable the Performance HPA profile in a cluster, use the following command:
```shell
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=none
```

Replace the following:

- CLUSTER_NAME: the name of the cluster.
- LOCATION: the Compute Engine zone or region (for example, us-central1-a or us-central1) for the cluster.
- PROJECT_ID: your Google Cloud project ID.
Viewing details about a Horizontal Pod Autoscaler
To view a Horizontal Pod Autoscaler's configuration and statistics, use the following command:
```shell
kubectl describe hpa HPA_NAME
```

Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.
If the Horizontal Pod Autoscaler uses apiVersion: autoscaling/v2 and is based on multiple metrics, the kubectl describe hpa command only shows the CPU metric. To see all metrics, use the following command instead:
```shell
kubectl describe hpa.v2.autoscaling HPA_NAME
```

Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.
Each Horizontal Pod Autoscaler's current status is shown in the Conditions field, and autoscaling events are listed in the Events field.
If the Performance HPA profile is enabled, the Reason for autoscaling events is listed as HpaProfilePerformance. The output is similar to the following:
```
Name:                nginx
Namespace:           default
Labels:              <none>
Annotations:         kubectl.kubernetes.io/last-applied-configuration:
                       {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"s...
CreationTimestamp:   Tue, 05 May 2020 20:07:11 +0000
Reference:           Deployment/nginx
Metrics:             ( current / target )
  resource memory on pods:                             2220032 / 100Mi
  resource cpu on pods (as a percentage of request):   0% (0) / 50%
Min replicas:        1
Max replicas:        10
Deployment pods:     1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:              <none>
```

Deleting a Horizontal Pod Autoscaler
You can delete a Horizontal Pod Autoscaler using the Google Cloud console or the kubectl delete command.
Console
To delete the nginx Horizontal Pod Autoscaler:
Go to the Workloads page in the Google Cloud console.
Click the name of the nginx Deployment.

Click Actions > Autoscale.
Click Delete.
kubectl delete
To delete the nginx Horizontal Pod Autoscaler, use the following command:
```shell
kubectl delete hpa nginx
```

When you delete a Horizontal Pod Autoscaler, the Deployment (or other scalable object) remains at its existing scale, and does not revert to the number of replicas in the Deployment's original manifest. To manually scale the Deployment back to three Pods, you can use the kubectl scale command:
```shell
kubectl scale deployment nginx --replicas=3
```

Cleaning up
Delete the Horizontal Pod Autoscaler, if you have not done so:
```shell
kubectl delete hpa nginx
```

Delete the nginx Deployment:

```shell
kubectl delete deployment nginx
```

Optionally, delete the cluster.
Troubleshooting
For advice on troubleshooting, see Troubleshoot horizontal Pod autoscaling.
What's next
- Learn more about Horizontal Pod Autoscaling.
- Learn more about Vertical Pod Autoscaling.
- Learn how to optimize Pod autoscaling based on metrics.
- Learn more about autoscaling Deployments with Custom Metrics.
- Learn how to Assign CPU Resources to Containers and Pods.
- Learn how to Assign Memory Resources to Containers and Pods.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-18 UTC.