Autoscaling groups of instances
Managed instance groups (MIGs) offer autoscaling capabilities that let you automatically add or delete virtual machine (VM) instances from a MIG based on increases or decreases in load. Autoscaling helps your apps gracefully handle increases in traffic and reduce costs when the need for resources is lower. You define the autoscaling policy and the autoscaler performs automatic scaling based on the measured load and the options you configure.
Autoscaling works by adding more VMs to your MIG when there is more load (scaling out), and deleting VMs when the need for VMs is lowered (scaling in).
Prerequisites
The autoscaler uses the Compute Engine Service Agent to add and remove instances in the group. Google Cloud automatically creates this service account, as well as its IAM policy binding to the Compute Engine Service Agent role, when the Compute Engine API is enabled.
If your project is missing this account (for instance, if you have removed it), you can add it manually:
Console
In the Google Cloud console, go to the IAM page.
Click Grant Access.
In the New principals field, enter service-PROJECT_NUMBER@compute-system.iam.gserviceaccount.com.
Select the Compute Engine Service Agent role.
Click Save.
gcloud
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member serviceAccount:service-PROJECT_NUMBER@compute-system.iam.gserviceaccount.com \
    --role roles/compute.serviceAgent
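The member value used in the console and gcloud steps follows a fixed pattern built from the project number. As a minimal sketch, the helper below assembles that string. The function name `service_agent_member` is hypothetical and not part of any Google Cloud library, and the project number is a made-up placeholder.

```python
def service_agent_member(project_number: int) -> str:
    """Build the IAM member string for the Compute Engine Service Agent.

    The address pattern is the one shown in the steps above; this helper
    is illustrative only and does not call any Google Cloud API.
    """
    return (
        f"serviceAccount:service-{project_number}"
        "@compute-system.iam.gserviceaccount.com"
    )


# Placeholder project number, used only for illustration.
print(service_agent_member(123456789012))
```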
Fundamentals
Autoscaling uses the following fundamental concepts and services.
Managed instance groups
Autoscaling is a feature of managed instance groups (MIGs). A managed instance group is a collection of virtual machine (VM) instances that are created from a common instance template. An autoscaler adds or deletes instances from a managed instance group based on the group's autoscaling policy. Although Compute Engine has both managed and unmanaged instance groups, only managed instance groups can be used with an autoscaler.
To understand the difference between a managed instance group and an unmanaged instance group, see Instance groups.
To learn how to create a managed instance group, see Creating MIGs.
Autoscaling policy
When you define an autoscaling policy for your group, you specify one or more signals that the autoscaler uses to scale the group. When you set multiple signals in a policy, the autoscaler calculates the recommended number of VMs for each signal and sets your group's recommended size to the largest number.
An autoscaling policy must always have at least one scaling signal. When you turn on autoscaling in a MIG, by default, the autoscaler adds a CPU utilization signal. You can edit this default signal, or remove and add other signals in the policy.
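The rule for combining multiple signals can be sketched in a few lines. This is an illustration of the "largest recommendation wins" behavior described above, not the autoscaler's implementation. The function name and the signal labels in the example are hypothetical.

```python
def recommended_group_size(per_signal_recommendations: dict[str, int]) -> int:
    """Return the group's recommended size given one recommendation per signal.

    The group follows the signal that asks for the most VMs.
    """
    if not per_signal_recommendations:
        # A policy must always contain at least one scaling signal.
        raise ValueError("an autoscaling policy needs at least one signal")
    return max(per_signal_recommendations.values())


# Hypothetical per-signal recommendations for one evaluation.
sizes = {"cpu": 4, "load_balancing": 7, "schedule": 5}
print(recommended_group_size(sizes))  # 7
```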
The following sections provide an overview of signals based on target utilization metrics and signals based on schedules.
Target utilization metrics
You can autoscale based on one or more of the following metrics that reflect the load of the instance group:
- Average CPU utilization
- HTTP load balancing serving capacity
- Cloud Monitoring metrics
The autoscaler continuously collects usage information based on the selected utilization metric, compares actual utilization to your desired target utilization, and uses this information to determine whether the group needs to remove instances (scale in) or add instances (scale out).
The target utilization level is the level at which you want to maintain your virtual machine (VM) instances. For example, if you scale based on CPU utilization, you can set your target utilization level at 75% and the autoscaler will maintain the CPU utilization of the specified group of instances at or close to 75%. The utilization level for each metric is interpreted differently based on the autoscaling policy.
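To make the idea of a target utilization concrete, the sketch below uses a simple proportional model: it estimates the total load from the current size and average utilization, then finds the smallest group that keeps utilization at or below the target. This model is an assumption made for illustration. The page does not state the exact formula the autoscaler uses, and the function and parameter names are hypothetical.

```python
import math


def recommend_size(
    current_size: int,
    average_utilization: float,
    target_utilization: float,
    min_size: int,
    max_size: int,
) -> int:
    """Proportional sizing sketch (an assumed model, not the real algorithm)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Total work the group is doing, expressed in "fully busy VM" units.
    total_load = current_size * average_utilization
    # Smallest number of VMs that keeps each one at or below the target.
    wanted = math.ceil(total_load / target_utilization)
    # Respect the configured minimum and maximum number of instances.
    return max(min_size, min(max_size, wanted))


# 4 VMs averaging 90% CPU with a 75% target: the sketch asks for 5 VMs.
print(recommend_size(4, 0.90, 0.75, min_size=1, max_size=10))
```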
If you autoscale based on any of the following, then your MIG cannot scale in to zero instances:
- Average CPU utilization
- HTTP load balancing serving capacity
- Monitoring metrics that come from each instance in the MIG
However, you can use other Monitoring metrics when scaling in to zero instances, provided you set the minimum number of instances (autoscalingPolicy.minNumReplicas) to 0.
For more information about scaling based on target utilization metrics, see the following pages:
- Scaling based on CPU utilization
- Scaling based on load balancing serving capacity
- Scaling based on Cloud Monitoring metrics
Schedules
You can use schedule-based autoscaling to allocate capacity for anticipated loads. You can have up to 128 scaling schedules per instance group. For each scaling schedule, specify the following:
- Capacity: minimum required VM instances
- Schedule: start time, duration, and recurrence (for example, once, daily, weekly, or monthly)
Each scaling schedule is active from its start time and for the configured duration. During this time, the autoscaler scales the group to have at least as many instances as defined by the scaling schedule.
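The effect of active schedules on the group's minimum size can be sketched as follows. The class and function names are hypothetical, and recurrence is left out to keep the example short: each schedule here describes a single occurrence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScalingSchedule:
    """One occurrence of a scaling schedule (recurrence is not modeled)."""
    start: datetime
    duration: timedelta
    min_required_instances: int

    def is_active(self, now: datetime) -> bool:
        # Active from the start time and for the configured duration.
        return self.start <= now < self.start + self.duration


def scheduled_floor(schedules: list[ScalingSchedule], now: datetime) -> int:
    """Minimum size required by the schedules active at `now` (0 if none)."""
    return max(
        (s.min_required_instances for s in schedules if s.is_active(now)),
        default=0,
    )


schedules = [
    ScalingSchedule(datetime(2025, 1, 6, 8, 0), timedelta(hours=2), 10),
    ScalingSchedule(datetime(2025, 1, 6, 9, 0), timedelta(hours=4), 6),
]
print(scheduled_floor(schedules, datetime(2025, 1, 6, 9, 30)))
```

In this sketch, both schedules overlap at 09:30, so the group is held to the larger of the two capacities.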
When using schedules, your MIG can scale in to zero instances if all of the following conditions are met:
- The minimum number of instances (autoscalingPolicy.minNumReplicas) is set to 0.
- The autoscaling policy doesn't contain any active schedules.
- The autoscaling policy doesn't contain signals based on target utilization metrics that prevent scaling in to zero instances.
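The three conditions above can be expressed as a single check. This is a sketch only: the function name and the signal labels in `BLOCKING_SIGNALS` are hypothetical stand-ins for the signal types that this page says prevent scaling in to zero.

```python
# Hypothetical labels for the signals that prevent scaling in to zero:
# average CPU utilization, load balancing serving capacity, and
# Monitoring metrics that come from each instance in the MIG.
BLOCKING_SIGNALS = {
    "cpu_utilization",
    "load_balancing_serving_capacity",
    "per_instance_monitoring_metric",
}


def can_scale_in_to_zero(
    min_num_replicas: int,
    active_schedule_count: int,
    signals: set[str],
) -> bool:
    """Return True only if all three conditions from the list above hold."""
    return (
        min_num_replicas == 0
        and active_schedule_count == 0
        and not (signals & BLOCKING_SIGNALS)
    )


print(can_scale_in_to_zero(0, 0, {"schedule"}))
print(can_scale_in_to_zero(0, 0, {"schedule", "cpu_utilization"}))
```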
For more information, see Scaling based on schedules.
Initialization period
The initialization period, formerly known as the cool down period, is the duration it takes for applications to initialize on your VM instances. While an application is initializing on an instance, the instance's usage data might not reflect normal circumstances. So the autoscaler uses the initialization period for scaling decisions in the following ways:
- For scale-in decisions, the autoscaler considers usage data from all instances, even an instance that is still within its initialization period. The autoscaler recommends removing instances if the average utilization from all instances is less than the target utilization.
- For scale-out decisions, the autoscaler ignores usage data from instances that are still in their initialization period.
- If you enable predictive mode, the initialization period informs the predictive autoscaler to scale out further in advance of anticipated load, so that applications are initialized when the load arrives. For example, if you set the initialization period to 300 seconds, then the predictive autoscaler creates VMs 5 minutes ahead of forecasted load.
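The first two rules above differ only in which instances contribute usage data. The sketch below shows that difference with a plain average. The names are hypothetical, and a plain average is an assumption made for illustration rather than a description of how the autoscaler aggregates data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceSample:
    age_seconds: float   # time since the instance started
    utilization: float   # measured utilization, 0.0 to 1.0


def utilization_for_scale_out(
    samples: list[InstanceSample], initialization_period: float
) -> Optional[float]:
    """Average utilization, ignoring instances that are still initializing."""
    ready = [s.utilization for s in samples if s.age_seconds >= initialization_period]
    return sum(ready) / len(ready) if ready else None


def utilization_for_scale_in(samples: list[InstanceSample]) -> Optional[float]:
    """Average utilization over all instances, initializing ones included."""
    if not samples:
        return None
    return sum(s.utilization for s in samples) / len(samples)


samples = [
    InstanceSample(age_seconds=900, utilization=0.75),
    InstanceSample(age_seconds=900, utilization=0.75),
    InstanceSample(age_seconds=10, utilization=0.0),  # still initializing
]
print(utilization_for_scale_out(samples, initialization_period=60))
print(utilization_for_scale_in(samples))
```

With these sample values, the idle new instance lowers the average used for scale-in but does not affect the average used for scale-out.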
By default, the initialization period is 60 seconds. Actual initialization times vary because of numerous factors. We recommend that you test how long your application takes to initialize. To do this, create an instance and time the startup process from when the instance becomes RUNNING until the application is ready.
If you set an initialization period value that is significantly longer than the time it takes for an instance to initialize, then your autoscaler might ignore legitimate utilization data, and it might underestimate the required size of your group, causing a delay in scaling out.
Stabilization period
Autoscaling signals like CPU utilization are not very stable and can change rapidly. As the load goes up and down, the autoscaler needs to stabilize the signal to avoid continuous VM deletion and creation. The autoscaler stabilizes a signal by keeping sufficient VM capacity in order to serve the peak load that is observed during the stabilization period.
The stabilization period is equal to 10 minutes or to the initialization period that you set, whichever is longer. The stabilization period is used only for scale-in decisions when the autoscaler has to delete VMs.
When the load goes down, the autoscaler does not delete VMs immediately. The autoscaler keeps monitoring capacity needed for the duration of the stabilization period and deletes VMs only when there is sufficient capacity to meet the peak load. This might appear as a delay in scaling in, but it is a built-in feature of autoscaling.
If your application takes longer than 10 minutes to initialize on a new VM, then the autoscaler uses the initialization period instead of the default 10 minutes of stabilization to wait until the VM can be deleted. This ensures that the autoscaler's decision to delete a VM takes into account how long it takes to get back the serving capacity.
When the load goes up, the autoscaler does not use the stabilization period and immediately creates as many VMs as needed to meet the demand.
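The stabilization behavior can be sketched as a peak taken over a sliding window. The function names are hypothetical, and the list of required sizes is assumed to contain one value per evaluation within the stabilization period, most recent last.

```python
DEFAULT_STABILIZATION_SECONDS = 600  # 10 minutes


def stabilization_period(initialization_period_seconds: int) -> int:
    """10 minutes or the initialization period, whichever is longer."""
    return max(DEFAULT_STABILIZATION_SECONDS, initialization_period_seconds)


def stabilized_size(required_sizes_in_window: list[int]) -> int:
    """Size that serves the peak load observed during the stabilization period.

    Because the newest value is part of the window, a rising load is
    followed immediately, while a falling load only takes effect once the
    older, higher values have left the window.
    """
    return max(required_sizes_in_window)


print(stabilization_period(60))    # 600
print(stabilization_period(900))   # 900
print(stabilized_size([8, 6, 4]))  # load fell: hold the peak of 8
print(stabilized_size([4, 6, 9]))  # load rose: scale out to 9 right away
```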
Autoscaling mode
If you need to investigate or configure your group without interference from autoscaler operations, you can temporarily turn off or restrict autoscaling activities. The autoscaler's configuration persists while it is turned off or restricted, and all autoscaling activities resume when you turn it on again or lift the restriction.
Predictive autoscaling
If you enable predictive autoscaling to optimize your MIG for availability, the autoscaler forecasts future load based on historical data and scales out a MIG in advance of predicted load, so that new instances are ready to serve when the load arrives.
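The lead time in predictive mode follows from the initialization period: VMs must start early enough to finish initializing before the forecasted load arrives. The sketch below illustrates only that timing relationship. It does not model the forecast itself, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def vm_creation_time(
    forecasted_load_time: datetime, initialization_period_seconds: int
) -> datetime:
    """Latest time to create VMs so they are initialized when load arrives."""
    return forecasted_load_time - timedelta(seconds=initialization_period_seconds)


# With a 300-second initialization period, VMs are created 5 minutes ahead.
print(vm_creation_time(datetime(2025, 1, 6, 9, 0), 300))
```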
Predictive autoscaling works best if your workload meets the following criteria:
- Your application takes a long time to initialize, for example, if you configure an initialization period of more than 2 minutes.
- Your workload varies predictably with daily or weekly cycles.
For more information, see Scaling based on predictions.
Scale-in controls
If your workloads take many minutes to initialize (for example, due to lengthy installation tasks), you can reduce the risk of response latency caused by abrupt scale-in events by configuring scale-in controls. Specifically, if you expect load spikes to follow soon after declines, you can limit the scale-in rate to prevent autoscaling from reducing a MIG's size by more VM instances than your workload can tolerate.
You don't have to configure scale-in controls if your application initializes quickly enough to pick up load spikes on scale out.
To configure scale-in controls, set the following properties in your autoscaling policy.
Maximum allowed reduction. The number of VM instances that your workload can afford to lose (from its peak size) within the specified trailing time window. Use this parameter to limit how much your group can be scaled in so that you can still serve a likely load spike until more instances start serving. The smaller you set the maximum allowed reduction, the longer it takes for your group to scale in.
Trailing time window. The history within which the autoscaler monitors the peak size required by your workload. The autoscaler will not resize below the maximum allowed reduction subtracted from the peak size observed in this period. You can use this parameter to define how long the autoscaler should wait before removing instances, as defined by the maximum allowed reduction. With a longer trailing time window, the autoscaler considers more historical peaks, making scale-in more conservative and stable.
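The two properties combine into a lower bound on the group size: the peak size in the trailing time window minus the maximum allowed reduction. The sketch below applies that bound to a recommended size. The function and parameter names are hypothetical, and the maximum allowed reduction is treated as a fixed number of VMs.

```python
def size_after_scale_in_controls(
    recommended_size: int,
    sizes_in_trailing_window: list[int],
    max_allowed_reduction: int,
) -> int:
    """Apply scale-in controls to a recommended size (illustrative sketch)."""
    # Peak size required by the workload within the trailing time window.
    peak = max(sizes_in_trailing_window)
    # The group is not resized below the peak minus the allowed reduction.
    floor = peak - max_allowed_reduction
    return max(recommended_size, floor)


# Peak of 10 in the window and at most 2 VMs may be removed: floor is 8.
print(size_after_scale_in_controls(3, [10, 8, 6], max_allowed_reduction=2))
# A recommendation above the floor passes through unchanged.
print(size_after_scale_in_controls(9, [10, 8, 6], max_allowed_reduction=2))
```

As older peaks leave the window, the floor drops and the group can continue scaling in, which is why a longer window makes scale-in slower and more conservative.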
For more information, see Configuring scale-in controls and Understanding autoscaler decisions.
Recommended size
The recommended group size is the autoscaler's recommended number of VMs that the managed instance group should maintain, based on peak load observed during the last 10 minutes. These last 10 minutes are referred to as the stabilization period. The recommended size is recalculated constantly. If you set an autoscaling policy with scale-in controls, then the recommended size is constrained by your scale-in controls.
Limitations
You cannot use autoscaling with the following instance groups, which don't allow the autoscaler to create or delete VMs according to demand:
- Unmanaged instance groups
- MIGs with stateful configuration
- MIGs with VM repairs turned off
You cannot create VM instances with specific names while autoscaling is turned on.
Do not use Compute Engine autoscaling with MIGs that are owned by Google Kubernetes Engine. For Google Kubernetes Engine groups, use cluster autoscaling instead. If you're not sure whether a MIG is part of a GKE cluster, look for the gke prefix in the MIG name. For example, gke-test-1-3-default-pool-eadji9ah.
What happens during autohealing
Autoscaling works independently from autohealing. If you configure autohealing for your group and an instance fails the health check, the MIG attempts to recreate the instance. While an instance is being recreated by the MIG, the number of running instances in the group might be lower than the minimum number of instances specified for the group (autoscalingPolicy.minNumReplicas).
Pricing
There is no additional charge for configuring an autoscaling policy. The autoscaler dynamically adds or deletes VM instances, so you are charged only for the resources that your MIG uses. You can control resource cost by configuring the minimum and maximum number of instances in the autoscaling policy. For Compute Engine pricing information, see Pricing.
What's next
- Learn how autoscaling works in a regional MIG.
- If you don't have an existing MIG, review how to create a managed instance group.
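- Create an autoscaler that scales on CPU utilization, load balancing serving capacity, Cloud Monitoring metrics, schedules, or predictions.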
- Manage your autoscaler, for example, to get information about it, to configure scale-in controls, or to temporarily restrict it.
Last updated 2025-12-15 UTC.