Scaling based on CPU utilization
The simplest form of autoscaling is to scale a managed instance group (MIG) based on the CPU utilization of its instances.
You can also autoscale a MIG based on the load balancing serving capacity, Monitoring metrics, or schedules.
Before you begin
- Review the autoscaler limitations.
- Read about autoscaler fundamentals.
- If you haven't already, set up authentication. Authentication verifies your identity for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:
Select the tab for how you plan to use the samples on this page:
Console
When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.
gcloud
- Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:
gcloud init
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update.
- Set a default region and zone.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:
gcloud init
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update.
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Scaling based on CPU utilization
You can autoscale based on the average CPU utilization of a managed instance group (MIG). Using this policy tells the autoscaler to collect the CPU utilization of the instances in the group and determine whether it needs to scale. You set the target CPU utilization the autoscaler should maintain and the autoscaler works to maintain that level.
The autoscaler treats the target CPU utilization level as a fraction of the average use of all vCPUs over time in the instance group. If the average utilization of your total vCPUs exceeds the target utilization, the autoscaler adds more VM instances. If the average utilization of your total vCPUs is less than the target utilization, the autoscaler removes instances. For example, setting a 0.75 target utilization tells the autoscaler to maintain an average utilization of 75% among all vCPUs in the instance group.
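To make the relationship between average utilization and the target more concrete, the following Python sketch estimates a group size under a simple proportional model. This model is an assumption for illustration only: the function name estimate_group_size is hypothetical, and the real autoscaler considers more than this single ratio (for example, the initialization period and minimum and maximum sizes).

```python
import math


def estimate_group_size(current_size: int, average_utilization: float,
                        target_utilization: float) -> int:
    """Estimate the group size that returns average utilization to the target.

    Assumption: a simple proportional model, not the autoscaler's actual algorithm.
    """
    if current_size < 1:
        raise ValueError("current_size must be at least 1")
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in the range (0, 1]")
    if average_utilization < 0.0:
        raise ValueError("average_utilization must not be negative")
    # Assume the total CPU work stays constant and spread it over enough VMs
    # that each one averages the target utilization.
    needed = current_size * average_utilization / target_utilization
    # Round first to absorb floating-point noise before taking the ceiling.
    return max(1, math.ceil(round(needed, 9)))


print(estimate_group_size(10, 0.90, 0.75))  # 12: above the target, so VMs are added
print(estimate_group_size(10, 0.45, 0.75))  # 6: below the target, so VMs are removed
```

Under this model, a group of 10 VMs averaging 90% utilization with a 0.75 target would need about 12 VMs, while the same group averaging 45% could shrink to about 6.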
Caution: If your application takes a long time to initialize on new VMs, Google recommends that you do not set a target CPU utilization of 85% or above. In such a case, if your application sees an increase in traffic, your MIG's CPUs might be at risk of getting overloaded while your application slowly initializes on the new VMs that the autoscaler adds.
You can also scale based on forecasted CPU utilization. For more information, and to see if this is suitable for your workload, see Scaling based on predictions.
Enable autoscaling based on CPU utilization
Permissions required for this task
To perform this task, you must have the following permissions:
- compute.autoscalers.create on the project
- compute.instanceGroupManagers.use on the project
Console
In the console, go to the Instance groups page.
If you have an instance group, click the name of the instance group, and then click Edit. On the edit instance group page, do the following:
- Click Group size & autoscaling to expand the section.
- Click Configure autoscaling.
If you don't have an instance group, click Create instance group and do the following:
- In the Name field, specify a name for the group.
- In the Instance template list, select a template.
In the Location section, depending on whether you're creating a zonal or regional MIG, choose an option as follows:
- For a zonal MIG, select Single zone, and then select a region and a zone.
- For a regional MIG, select Multiple zones, and then select a region and zones.
In the Autoscaling section, a CPU utilization autoscaling signal is added by default. You can either use the default values for the signal or do the following:
- Specify the minimum and the maximum numbers of instances that you want the autoscaler to create in this group.
To edit the target CPU utilization, click the CPU utilization signal to expand the section and specify the percentage.
- Under Predictive autoscaling, select Off. To learn more about predictive autoscaling, and whether it is suitable for your workload, see Scaling based on predictions.
Click Done.
You can use the Initialization period to tell the autoscaler how long it takes for your application to initialize. Specifying an accurate initialization period improves autoscaler decisions. For example, when scaling out, the autoscaler ignores data from VMs that are still initializing because those VMs might not yet represent normal usage of your application. The default initialization period is 60 seconds.
Click Save.
gcloud
Use the set-autoscaling sub-command to enable autoscaling for a managed instance group. For example, the following command creates an autoscaler that has a target CPU utilization of 60%. Along with the --target-cpu-utilization parameter, the --max-num-replicas parameter is also required when creating an autoscaler:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
    --max-num-replicas 20 \
    --target-cpu-utilization 0.60 \
    --cool-down-period 90
If an autoscaler already exists, the set-autoscaling command updates the existing autoscaler to the new specifications.
You can use the --cool-down-period flag to set the initialization period, which tells the autoscaler how long it takes for your application to initialize. Specifying an accurate initialization period improves autoscaler decisions. For example, when scaling out, the autoscaler ignores data from VMs that are still initializing because those VMs might not yet represent normal usage of your application. The default initialization period is 60 seconds.
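If you generate the set-autoscaling command from a script, a small helper can check the inputs before anything is sent to gcloud. The following Python sketch only assembles the argument list and does not run it. The helper name build_set_autoscaling_command and its validation rules are assumptions for illustration; the flags are the ones used in the example command in this section.

```python
import shlex


def build_set_autoscaling_command(group: str, max_replicas: int,
                                  target_cpu: float,
                                  init_period_sec: int = 60) -> list[str]:
    """Return the gcloud argument list for CPU-based autoscaling (hypothetical helper)."""
    if max_replicas < 1:
        raise ValueError("max_replicas must be at least 1")
    if not 0.0 < target_cpu <= 1.0:
        raise ValueError("target_cpu is a fraction, for example 0.6 for 60%")
    if init_period_sec < 0:
        raise ValueError("init_period_sec must not be negative")
    return [
        "gcloud", "compute", "instance-groups", "managed", "set-autoscaling", group,
        "--max-num-replicas", str(max_replicas),
        "--target-cpu-utilization", str(target_cpu),
        # --cool-down-period carries the initialization period, in seconds.
        "--cool-down-period", str(init_period_sec),
    ]


command = build_set_autoscaling_command("example-managed-instance-group", 20, 0.6, 90)
# Print a shell-quoted form for review; pass the list to subprocess.run to execute it.
print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids quoting problems if a group name ever contains unusual characters.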
Optionally, you can enable predictive autoscaling to scale out ahead of predicted load. To learn whether predictive autoscaling is suitable for your workload, see Scaling based on predictions.
You can verify that autoscaling is successfully enabled by using the instance-groups managed describe sub-command, which describes the corresponding managed instance group and provides information about any autoscaling features for that instance group:
gcloud compute instance-groups managed describe example-managed-instance-group
For a list of available gcloud commands and flags, see the gcloud reference.
REST
Note: Although autoscaling is a feature of managed instance groups, it is a separate API resource. Keep that in mind when you construct API requests for autoscaling.
To create an autoscaler, use the autoscalers.insert method for a zonal MIG or the regionAutoscalers.insert method for a regional MIG.
The following example creates an autoscaler for a zonal MIG:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/autoscalers/
Your request body must contain the name, target, and autoscalingPolicy fields. autoscalingPolicy must define cpuUtilization and maxNumReplicas.
You can use the coolDownPeriodSec field to set the initialization period, which tells the autoscaler how long it takes for your application to initialize. Specifying an accurate initialization period improves autoscaler decisions. For example, when scaling out, the autoscaler ignores data from VMs that are still initializing because those VMs might not yet represent normal usage of your application. The default initialization period is 60 seconds.
Optionally, you can enable predictive autoscaling to scale out ahead of predicted load. To learn whether predictive autoscaling is suitable for your workload, see Scaling based on predictions.
{
  "name": "example-autoscaler",
  "target": "https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-f/instanceGroupManagers/example-managed-instance-group",
  "autoscalingPolicy": {
    "maxNumReplicas": 10,
    "cpuUtilization": {
      "utilizationTarget": 0.6
    },
    "coolDownPeriodSec": 90
  }
}
For more information about enabling autoscaling based on CPU utilization, complete the tutorial, Using autoscaling for highly scalable apps.
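The request URL and body for the zonal REST example in this section can also be assembled in code. The following Python sketch builds both and prints the JSON without sending a request or handling authentication. The function name build_autoscaler_request is hypothetical, and the URL patterns are copied from the examples in this section.

```python
import json


def build_autoscaler_request(project: str, zone: str, group: str, name: str,
                             max_replicas: int, target_cpu: float,
                             init_period_sec: int = 60) -> tuple[str, dict]:
    """Return (url, body) for a zonal autoscalers.insert call (hypothetical helper)."""
    if not 0.0 < target_cpu <= 1.0:
        raise ValueError("target_cpu is a fraction, for example 0.6 for 60%")
    url = (f"https://compute.googleapis.com/compute/v1/projects/{project}"
           f"/zones/{zone}/autoscalers/")
    body = {
        "name": name,
        # The target is the URL of the managed instance group to scale.
        "target": (f"https://www.googleapis.com/compute/v1/projects/{project}"
                   f"/zones/{zone}/instanceGroupManagers/{group}"),
        "autoscalingPolicy": {
            "maxNumReplicas": max_replicas,
            "cpuUtilization": {"utilizationTarget": target_cpu},
            # coolDownPeriodSec carries the initialization period, in seconds.
            "coolDownPeriodSec": init_period_sec,
        },
    }
    return url, body


url, body = build_autoscaler_request(
    "myproject", "us-central1-f", "example-managed-instance-group",
    "example-autoscaler", max_replicas=10, target_cpu=0.6, init_period_sec=90)
print("POST", url)
print(json.dumps(body, indent=2))
```

Building the body from one function keeps the three required fields (name, target, and autoscalingPolicy) together, so a missing field shows up as a code error rather than a rejected request.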
How autoscaler handles heavy CPU utilization
During periods of heavy CPU utilization, if utilization is close to 100%, the autoscaler estimates that the group might already be heavily overloaded. In these cases, the autoscaler increases the number of virtual machines by 50% at most.
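As a rough worked example of the 50% limit, the following Python sketch caps a single scale-out step. The function name cap_scale_out is hypothetical, and rounding the 50% allowance up is an assumption, because the text does not say how fractional VMs are handled.

```python
import math


def cap_scale_out(current_size: int, desired_size: int,
                  max_growth: float = 0.5) -> int:
    """Limit one scale-out step to a fraction of the current size (illustrative only)."""
    # Assumption: the allowance is rounded up to a whole number of VMs.
    limit = current_size + math.ceil(current_size * max_growth)
    return min(desired_size, limit)


print(cap_scale_out(10, 30))  # 15: growth is capped at 50% of 10 VMs
print(cap_scale_out(10, 12))  # 12: the request is already within the cap
```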
Note: This behavior might change in the future. Google recommends that you not rely on this behavior.
What's next
- Learn how to enable predictive autoscaling.
- Learn about managing autoscalers.
- Learn how autoscalers make decisions.
- Learn how to use multiple autoscaling signals to scale your group.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.