Scaling based on predictions

You can configure autoscaling for a managed instance group (MIG) to automatically add or remove virtual machine (VM) instances based on increases or decreases in load. However, if your application takes a few minutes or more to initialize, adding instances in response to real-time changes might not increase your application's capacity quickly enough. For example, if there's a large increase in load (like when users first wake up in the morning), some users might experience delays while your application is initializing on new instances.

You can use predictive autoscaling to improve response times for applications with long initialization times and whose workloads vary predictably with daily or weekly cycles.

When you enable predictive autoscaling, Compute Engine forecasts future load based on your MIG's history and scales out the MIG in advance of predicted load, so that new instances are ready to serve when the load arrives. Without predictive autoscaling, an autoscaler can only scale a group reactively, based on observed changes in load in real time. With predictive autoscaling enabled, the autoscaler works with real-time data as well as with historical data to cover both the current and forecasted load. For more information, see How predictive autoscaling works and Checking if predictive autoscaling is suitable for your workload.

Before you begin

Pricing

Predictive autoscaling is free of charge. However, if you enable predictive autoscaling to optimize for availability, you pay for the Compute Engine resources that your MIG uses.

Note: Depending on your current setup, enabling predictive autoscaling might increase your costs. Predictive autoscaler starts VM instances earlier, and might run VMs for a longer period of time. Use predictive autoscaling on MIGs where you need to improve availability by keeping utilization below the target.

Limitations

Suitable workloads

Predictive autoscaling works best if your workload meets the following criteria:

  • Your application takes a long time to initialize, for example, if you configure an initialization period of more than 2 minutes.
  • Your workload varies predictably with daily or weekly cycles.

If your service takes a long time to initialize, your users might experience service latency after a scale-out event, that is, while the new VMs are provisioned but not yet serving. Predictive autoscaling takes into account your application's initialization time and scales out in advance of predicted increases in usage, helping to ensure that the number of available serving instances is sufficient for the target utilization.

To preview how predictive autoscaling can affect your group, see Checking if predictive autoscaling is suitable for your workload.

Enabling and disabling predictive autoscaling

You can enable predictive autoscaling when scaling based on CPU utilization. For more information about setting up CPU-based autoscaling, see Scaling based on CPU utilization.

If your MIG has no autoscaler history, it can take 3 days before the predictive algorithm affects the autoscaler. During this time, the group scales based on real-time data only. After 3 days, the group starts to scale using predictions. As more historical load is collected, the predictive autoscaler better understands your load patterns and its forecasts improve. Compute Engine uses up to 3 weeks of your MIG's load history to feed the machine learning model.

Note: Google is continually improving and updating the machine learning algorithm used by predictive autoscaling.

Permissions required for this task

To perform this task, you must have the following permissions:

  • compute.autoscalers.update on the group's autoscaler
  • compute.instanceGroupManagers.use on the group
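
For example, one way to get both permissions is to be granted a role that contains them. The following sketch assumes the predefined roles/compute.instanceAdmin.v1 role, which includes these permissions; your project might use a different predefined or custom role instead:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/compute.instanceAdmin.v1"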

Console

  1. In the console, go to the Instance groups page.

  2. From the list, click the name of an existing MIG to open the group's overview page.

  3. Click Edit.

  4. Click Group size & autoscaling to expand the section and do the following:

    1. If an autoscaling configuration doesn't exist, then configure autoscaling as follows:

      1. In the Autoscaling section, click Configure autoscaling. The CPU utilization signal is added by default.
      2. Specify the minimum and maximum numbers of instances that you want the autoscaler to create in this group.
      3. In the Autoscaling signals section, click the CPU utilization signal.
      4. Modify the Signal type and Target CPU utilization as needed.
      5. In the Predictive autoscaling section, select Optimize for availability to enable predictive autoscaling.

        • Alternatively, if you want to disable the predictive algorithm and use only the real-time autoscaler, select Off.
      6. Click Done.

    2. If autoscaling based on CPU utilization is already configured, then do the following:

      1. In the Autoscaling signals section, click the CPU utilization signal.
      2. In the Predictive autoscaling section, select Optimize for availability to enable predictive autoscaling.
      3. Click Done.

  5. In the Initialization period section, specify how long it takes for your application to initialize on a new instance. This setting informs the predictive autoscaler to scale out further in advance of anticipated load, so that applications are initialized when the load arrives.

  6. Click Save.

gcloud

When setting or updating a MIG's autoscaler, include the --cpu-utilization-predictive-method flag with one of the following values:

  • optimize-availability: to enable the predictive algorithm
  • none (default): to disable the predictive algorithm

Note: To check if a group has an existing autoscaling policy, and to make a note of any settings that you want to replicate when using the set-autoscaling command, review your group's existing configuration.
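
For example, one way to review a group's configuration from the command line is to describe the MIG. This is a sketch assuming a zonal MIG (use --region for a regional MIG); if an autoscaler is attached, its URL appears in the status.autoscaler field of the output:

gcloud compute instance-groups managed describe MIG_NAME \
    --zone=ZONE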

If CPU-based autoscaling is not yet enabled for the group, you must enable it. You can use the set-autoscaling command to configure a group's autoscaling policy from scratch. For example, the following command shows how to configure autoscaling with the following settings:

  • Predictive autoscaling enabled.
  • Target CPU utilization of 75%.
  • The maximum number of instances set to 20.
  • An initialization period (--cool-down-period) set to 5 minutes. This setting informs the predictive autoscaler to scale out 5 minutes in advance of anticipated load, so that applications are initialized when the load arrives.

gcloud compute instance-groups managed set-autoscaling MIG_NAME \
    --cpu-utilization-predictive-method optimize-availability \
    --target-cpu-utilization 0.75 \
    --max-num-replicas 20 \
    --cool-down-period 300

If CPU-based autoscaling is already enabled for the group, use the update-autoscaling command to enable the predictive algorithm:

gcloud compute instance-groups managed update-autoscaling MIG_NAME \
    --cpu-utilization-predictive-method=optimize-availability
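
For example, to disable the predictive algorithm later and return to purely reactive autoscaling, set the flag to none (the default):

gcloud compute instance-groups managed update-autoscaling MIG_NAME \
    --cpu-utilization-predictive-method=none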

REST

When creating or updating an autoscaler, include the predictiveMethod field in the request body with one of the following values:

  • OPTIMIZE_AVAILABILITY: to enable the predictive algorithm
  • NONE (default): to disable the predictive algorithm

Note: To see if a group has an existing autoscaling policy, review your group's existing configuration.

If the group has no existing autoscaling configuration, create an autoscaler for the group by using the autoscalers.insert method (or regionAutoscalers.insert for a regional MIG), and enable CPU-based autoscaling in the request body.

If the group already has an autoscaling configuration, update it by using the autoscalers.patch method (or regionAutoscalers.patch for a regional MIG).
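
For example, the following request sketches how you might create a new regional autoscaler with predictive autoscaling enabled by using the insert method. The field values are placeholders and mirror the patch example that follows:

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION/autoscalers

{
  "name": "AUTOSCALER_NAME",
  "target": "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION/instanceGroupManagers/MIG_NAME",
  "autoscalingPolicy": {
    "cpuUtilization": {
      "utilizationTarget": 0.75,
      "predictiveMethod": "OPTIMIZE_AVAILABILITY"
    },
    "maxNumReplicas": 20,
    "coolDownPeriodSec": 300
  }
}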

For example, the following request patches an existing autoscaler resource to enable CPU-based autoscaling with the following settings:

  • Predictive autoscaling enabled.
  • Target CPU utilization of 75%.
  • The maximum number of instances set to 20.
  • An initialization period (coolDownPeriodSec) set to 5 minutes. This setting informs the predictive autoscaler to scale out 5 minutes in advance of anticipated load, so that applications are initialized when the load arrives.

PATCH https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION/autoscalers

{
  "name": "AUTOSCALER_NAME",
  "target": "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION/instanceGroupManagers/MIG_NAME",
  "autoscalingPolicy": {
    "cpuUtilization": {
      "utilizationTarget": 0.75,
      "predictiveMethod": "OPTIMIZE_AVAILABILITY"
    },
    "maxNumReplicas": 20,
    "coolDownPeriodSec": 300
  }
}

Checking if predictive autoscaler is enabled

To view a MIG's current autoscaling configuration, see Getting a MIG's properties.
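
For example, one way to check directly through the API is to get the autoscaler resource and inspect its predictive method. This sketch assumes a regional autoscaler named AUTOSCALER_NAME (use zones/ZONE for a zonal autoscaler):

GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION/autoscalers/AUTOSCALER_NAME

In the response, predictive autoscaling is enabled if autoscalingPolicy.cpuUtilization.predictiveMethod is set to OPTIMIZE_AVAILABILITY; if the field is NONE or not set, only reactive autoscaling is in effect.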

Configuring predictive autoscaling

For more information about how to configure the target utilization, minimum and maximum number of instances, and the initialization period, see Scaling based on CPU utilization. When you configure these options, the predictive autoscaler works to maintain all instances at the target utilization level that you set, within the minimum and maximum bounds of the group, in the same way that a real-time autoscaler does.

Use the initialization period setting to account for the time it takes for your application to initialize. This setting influences how far in advance the predictive autoscaler starts new instances ahead of a predicted increase in load, so that your application is ready to serve when the load arrives.
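
For example, the following sketch sets all of these options in a single set-autoscaling call. The values are placeholders; in addition to the flags shown earlier on this page, it uses the --min-num-replicas flag to set the group's minimum size:

gcloud compute instance-groups managed set-autoscaling MIG_NAME \
    --cpu-utilization-predictive-method optimize-availability \
    --target-cpu-utilization 0.75 \
    --min-num-replicas 3 \
    --max-num-replicas 20 \
    --cool-down-period 300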

Checking if predictive autoscaling is suitable for your workload

To see if predictive autoscaling might improve your application's availability, you can compare the performance of your group's current CPU-based autoscaling configuration against predictive autoscaling. You don't need to enable predictive autoscaling in order to make the comparison.

For more information about workloads that are suitable for predictive autoscaling, see Suitable workloads.

Checking for overloads

Your autoscaled MIG is overloaded when its average CPU utilization exceeds your target. To check if your autoscaling configuration resulted in overloaded VMs during the last 7 days, and to see if predictive autoscaling can reduce overloads, complete the following steps:

  1. In the console, go to the Instance groups page.

  2. Click an existing MIG for which CPU-based autoscaling is configured. The group's overview page opens.

  3. Click Edit.

  4. Click Group size & autoscaling to expand the section.

  5. In the Autoscaling section, under Autoscaling signals, expand the CPU utilization section, then click See if predictive autoscaling can optimize your availability.

  6. Based on data for the last 7 days, the table shows how many VMs were used per day and how many VMs were overloaded per day for the following rows:

    • Current autoscaling configuration: shows how the autoscaler performed based on the autoscaler's configuration over the last 7 days.
    • With predictive autoscaling set to "Optimize for availability": shows how the autoscaler would have performed if predictive autoscaling was enabled over the last 7 days.

You can use the "Number of VMs used per day" as a proxy for costs. For example, to reduce the daily number of overloaded VMs, the predictive autoscaler might create VMs earlier and run them for longer, which results in additional charges.

Monitoring and simulating predictive autoscaling

You can visualize the historical size of your group using Cloud Monitoring. The monitoring graph shows how your autoscaling configuration scaled your group over time, and it also shows how predictive autoscaling, if enabled, would have scaled your group.

For groups with predictive autoscaling disabled, you can use this tool to simulate predictive autoscaling before enabling it.

  1. In the console, go to the Instance groups page.

  2. Click an existing MIG for which CPU-based autoscaling is configured. The group's overview page opens.

  3. Click Monitoring to see charts related to the group.

  4. In the first chart, click its title and select Predictive autoscaling. This view shows the group's actual size as well as its predicted size.

  5. You can select a different time range to see more history, or zoom in on a period where demand grew to see how predictive autoscaling affects group size ahead of forecasted load.

How predictive autoscaling works

Predictive autoscaler forecasts your scaling metric based on the metric's historical trends. Forecasts are recomputed every few minutes, which lets the autoscaler rapidly adapt its forecast to very recent changes in load. Predictive autoscaler needs at least 3 days of history from which to determine a representative service usage pattern before it can provide predictions. Compute Engine uses up to 3 weeks of your MIG's load history to feed the machine learning model.

Predictive autoscaler calculates the number of VMs needed to achieve your utilization target based on numerous factors, including the following:

  • The predicted future value of the scaling metric
  • The current value of the scaling metric
  • Confidence in past trends, including past variability of the scaling metric
  • The configured application initialization period, also referred to as the cool down period (--cool-down-period or coolDownPeriodSec)

Based on these factors, the predictive autoscaler scales out your group ahead of anticipated demand.
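
As a simplified illustration, not the exact algorithm: if the forecast expects the group's load to grow to the equivalent of 15 fully utilized VMs and the target CPU utilization is 0.75, the autoscaler plans for roughly 15 / 0.75 = 20 VMs, and with a 5-minute initialization period it begins creating those VMs at least 5 minutes before the load is predicted to arrive.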

Figure 1. Comparison of serving VMs with and without predictive autoscaling.

In figure 1, the blue line shows a growing demand for VMs. The black line shows the autoscaler's response: more VMs are added. However, for applications with long initialization times, the grey line shows that the added VMs require additional time before they are ready to serve, which can result in not enough serving VMs to meet the demand. With predictive autoscaling enabled, the predicted increase in demand and the long application initialization time are accounted for: the autoscaler responds by adding VMs earlier, resulting in a sufficient number of serving VMs. You can configure how far in advance new instances are added by setting the initialization period.

Real-time usage data

Predictive autoscaler can't determine a pattern for all future changes in usage based on historical data, so it works seamlessly with real-time data, too. For example, an unexpected news event might contribute to a spike in usage that couldn't have been predicted based on history alone. To handle such unpredictable changes in load, the predictive autoscaler responds as follows:

  • It adapts its predictions: Predictions are recalculated constantly, within minutes, so they adjust to incorporate the latest data. The exact timing of adjustments to new patterns depends on, among other things, how repeatable the new pattern is and how large the difference is between the new pattern and past predictions.
  • It yields to real-time data: The autoscaler's recommended number of instances, based on real-time values of the metric, is always sufficient to meet the group's target utilization. If the current value of a real-time signal is greater than the prediction, the current value of the signal takes priority over the prediction. As a result, MIGs that have predictive autoscaling enabled always have at least as much availability as MIGs that don't.

Figure 2. Two charts show how predictions adapt to actual CPU usage.

In figure 2, the dotted yellow line shows the prediction at t1. But the actual CPU usage, as shown by the solid blue line, is different than predicted. On the left chart, the actual CPU usage is higher than predicted. On the right chart, the actual CPU usage is lower than predicted. The dotted blue line shows the adjusted prediction.

Short, unpredictable spikes

Short, unpredictable peaks are covered in real time. The autoscaler creates at least as many instances as needed to keep utilization at the configured target, based on the current actual value of the metric. However, these instances aren't created in advance, as shown in the following figure.

Figure 3. A short, unpredictable spike causes the autoscaler to react in real time.

In figure 3, the solid blue line shows actual CPU usage. An unexpected spike in CPU usage could not be predicted. Because the autoscaler always monitors real-time data, it adds instances to accommodate the spike. The solid black line illustrates the autoscaler's reactive addition of VMs in response to the spike. The solid grey line shows the number of serving VMs. The grey line lags behind the black line due to the application's initialization time. In this scenario, the group is temporarily overloaded.

Sudden dips

Another type of unpredictable change in usage is a sudden dip, for example, a dip caused by a failure in part of the application stack. When that happens, the number of instances initially follows the forecast. However, over time, the forecast adjusts to the lower-than-forecasted usage, resulting in a scale-in. The exact timing of this adjustment depends on numerous factors, including how often the pattern occurred in the past, how long the dip lasts, and how deep the dip is.

Figure 4. A sudden dip causes the predictive autoscaler to change its forecast.

In figure 4, the dotted yellow line shows the prediction at t1. But the actual CPU usage, as shown by the solid blue line, fell lower than predicted. The dotted blue line shows the updated prediction, which was automatically adjusted after observing lower-than-forecasted usage. This results in the autoscaler removing instances following the standard stabilization period.

Historical data

Predictive autoscaler needs at least 3 days of historical load to start forecasting. If you have a new MIG that lacks historical data, Compute Engine scales your group reactively using real-time data until sufficient historical data becomes available. After 3 days, as Compute Engine collects additional usage data, the predictions improve.

If you update your application by creating a new MIG and deleting the old one (for example, in a blue-green deployment), then your new MIG needs 3 days of historical load data before predictive autoscaling can start generating forecasts again. If you want to preserve load history across MIGs so that forecasts can start immediately when you create a new MIG, contact us to request instructions to join a private preview.

What's next
