Using autoscaling for highly scalable applications

This tutorial explains how to use autoscaling to automatically adjust the number of VM instances that are hosting your application, allowing your application to adapt to varying amounts of traffic.

To use autoscaling, host your application on a managed instance group. A managed instance group is a collection of instances that are all running the same application and can be managed as a single entity. When a managed instance group has autoscaling enabled, the number of VMs in the instance group automatically increases (scales out) or decreases (scales in) according to the target value that you specify for your autoscaling policy.

This tutorial includes detailed steps for launching a web application on a managed instance group, setting up autoscaling, configuring network access, and observing autoscaling by simulating load spikes and drops. Depending on your experience with these features, this tutorial takes about 20 minutes to complete.

Objectives

Launch a demo web application on a managed instance group.
Observe the effects of autoscaling by simulating traffic spikes and drops.

Costs

In this document, you use the following billable components of Google Cloud:

Compute Engine

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Application architecture

The application includes the following Compute Engine components:

Firewall rule: a Google Cloud firewall that lets you allow or deny traffic to your instances.
Instance template: a template used to create each VM instance in the managed instance group.
Regional managed instance group: a group of VM instances running the same application across multiple zones.

Launching the web application

This tutorial uses a web application that is stored on GitHub. If you would like to learn more about how the application was implemented, see the GoogleCloudPlatform/python-docs-samples repository on GitHub.

Launch the web application on every VM in a managed instance group by including a startup script in an instance template. To allow HTTP traffic to the web application, create a firewall rule.

Create a firewall rule

Note: This firewall rule allows ingress HTTP traffic for all instances that are on the default network and have the http-server networking tag.

Create a firewall rule to allow HTTP traffic to the web application:

In the Google Cloud console, go to the Firewalls page.
Go to Firewalls
Click Create firewall rule.
Under Name, enter default-allow-http.
Set Network to default.
Set Targets to Specified target tags.
Under Target Tags, enter http-server.
Set Source filter to IPv4 ranges.
Under Source IPv4 ranges, enter 0.0.0.0/0 to allow access for all IP addresses.
Under Protocols and ports, select Specified protocols and ports. Then, select TCP and enter 80 to allow access for HTTP traffic.
Click Create.
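
The console steps above can also be sketched on the command line. This is a minimal sketch, assuming the gcloud CLI is installed and authenticated (for example, in Cloud Shell). The wrapper function name `create_http_firewall_rule` is hypothetical and exists only so that nothing runs until you call it.

```shell
# Hypothetical helper (not part of the tutorial): creates the same firewall
# rule as the console steps, allowing TCP port 80 from any IPv4 address to
# instances tagged http-server on the default network.
create_http_firewall_rule() {
  gcloud compute firewall-rules create default-allow-http \
    --network=default \
    --direction=INGRESS \
    --allow=tcp:80 \
    --source-ranges=0.0.0.0/0 \
    --target-tags=http-server
}
```

To create the rule, run `create_http_firewall_rule`. Use either the console steps or this command, not both, because the rule name must be unique in the project.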

Create an instance template

Create an instance template that launches the demo web application on startup:

In the Google Cloud console, go to the Instance templates page.
Go to Instance templates
Click Create instance template.
Under Name, enter autoscaling-web-app-template.
Under Machine configuration, set the Machine type to e2-standard-2.
Under Firewall, select the Allow HTTP traffic checkbox. This applies the http-server networking tag to each instance created from this template.
Expand the Advanced options section to see advanced settings.
Expand the Management section.

In the Automation section, enter the following startup script:

sudo apt update && sudo apt -y install git gunicorn3 python3-pip
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
cd python-docs-samples/compute/managed-instances/demo
sudo pip3 install -r requirements.txt
sudo gunicorn3 --bind 0.0.0.0:80 app:app --daemon

This script causes each VM to run the web application during startup.

Click Create.
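
As an alternative to the console, the template can be sketched with the gcloud CLI. This is a sketch under assumptions: the function name `create_web_app_template` and the local file name `startup.sh` are hypothetical, the `#!/bin/bash` line is added here and is not part of the script shown above, and the boot image is left at the gcloud default, which may differ from what the console selects.

```shell
# Hypothetical helper (not part of the tutorial): writes the startup script
# to a local file, then creates the instance template from it.
create_web_app_template() {
  # Quoted 'EOF' keeps the script text literal (no variable expansion).
  cat > startup.sh <<'EOF'
#!/bin/bash
sudo apt update && sudo apt -y install git gunicorn3 python3-pip
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
cd python-docs-samples/compute/managed-instances/demo
sudo pip3 install -r requirements.txt
sudo gunicorn3 --bind 0.0.0.0:80 app:app --daemon
EOF
  # --tags applies the http-server networking tag that the firewall rule targets.
  gcloud compute instance-templates create autoscaling-web-app-template \
    --machine-type=e2-standard-2 \
    --tags=http-server \
    --metadata-from-file=startup-script=startup.sh
}
```

Running `create_web_app_template` leaves `startup.sh` in the current directory, so you can inspect exactly what each VM runs at boot.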

Create a managed instance group

Create a regional instance group to begin running the web application:

In the Google Cloud console, go to the Instance groups page.
Go to Instance groups
Click Create instance group to create a new instance group.
Select New managed instance group (stateless).
For Name, enter autoscaling-web-app-group.
For Instance template, select autoscaling-web-app-template.
For Location, select Multiple zones.
Pro Tip: To ensure your application is available during extreme events, like zonal outages, Compute Engine recommends that you distribute your application across multiple zones.
For Region, select us-central1.
For Zones, select the following zones from the drop-down list:
- us-central1-b
- us-central1-c
- us-central1-f
Configure autoscaling for the instance group:
1. For Autoscaling mode, select On: add and remove instances to the group.
2. Set the Minimum number of instances to 3.
  Pro Tip: When creating a regional managed instance group, Compute Engine recommends that you provision enough instances so that, if all of the instances in any one zone are unavailable, the remaining instances still meet the minimum number of instances that you require. However, provisioning more instances than you need might incur additional costs. For more information, see Selecting instance group size to ensure availability.
3. Set the Maximum number of instances to 6.
4. Set the Initialization period to 120 seconds.
  Pro Tip: The initialization period is the number of seconds after an instance is created that the autoscaler should wait before using information about the instance for scaling decisions. While a VM is initializing, its CPU usage is not reliable for autoscaling. To prevent the autoscaler from scaling based on inaccurate data, make sure the initialization period is longer than the time it takes for the CPU utilization of your VM to initially stabilize. For more information, see Initialization period and Monitoring autoscaling charts and logs.
5. Under Autoscaling Metrics, select CPU utilization as the metric type. To learn more about autoscaling metrics, see Autoscaling policy.
6. Set the Target CPU utilization to 60.
7. Click Done.
Under Autohealing, select No health check from the Health check drop-down list.
Click Create. This redirects you to the Instance groups page.
Note: Wait a few minutes until all of the instances in the group are running and ready to display the web application.
To verify that your instances are running:
1. On the Instance groups page in the Google Cloud console, click autoscaling-web-app-group to see the instances in that group.
2. Under External IP, click an IP address to connect to that instance. A new browser tab opens displaying the demo web application.
  Note: If you are unable to connect to the web application after waiting a few minutes, verify the instance status and network settings:
  - Verify that the instance group is ready. If the application fails to load with an ERR_CONNECTION_REFUSED status, wait a few minutes for the startup script to finish running.
  - Verify that the group's instance template has Allow HTTP traffic enabled. Then, verify that the default-allow-http firewall rule was created correctly.
  When you are done, close the browser tab for the demo web application.
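
The group and its autoscaling policy can also be sketched with the gcloud CLI. This is a minimal sketch, assuming the gcloud CLI is available and the instance template already exists. The function name `create_autoscaled_group` is hypothetical, the initial `--size=3` is an assumption chosen to match the minimum, and `--cool-down-period` is used here as the CLI counterpart of the initialization period.

```shell
# Hypothetical helper (not part of the tutorial): creates the regional
# managed instance group, then attaches a CPU-based autoscaling policy.
create_autoscaled_group() {
  # Listing several zones makes the group regional (us-central1 is inferred).
  gcloud compute instance-groups managed create autoscaling-web-app-group \
    --zones=us-central1-b,us-central1-c,us-central1-f \
    --template=autoscaling-web-app-template \
    --size=3 &&
  # Target CPU utilization is given as a fraction: 0.6 means 60%.
  gcloud compute instance-groups managed set-autoscaling autoscaling-web-app-group \
    --region=us-central1 \
    --mode=on \
    --min-num-replicas=3 \
    --max-num-replicas=6 \
    --target-cpu-utilization=0.6 \
    --cool-down-period=120
}
```

The two commands are chained with `&&` so that the autoscaling policy is only applied if the group was created successfully.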

Observing autoscaling

For more information about autoscaling behaviors, see Understanding autoscaling decisions.

Monitor autoscaling

The instance group you created uses an autoscaling policy based on CPU usage. This means that the autoscaler grows or shrinks the group as needed to maintain the target CPU utilization of 60%.

To monitor the size and CPU utilization of your instance group, use the autoscaling charts in the Google Cloud console:

On the Instance groups page for the autoscaling-web-app-group instance group, click the Monitoring tab.
You can monitor autoscaling from the Group size chart. The graph displays Instances, which represents the number of VM instances in the group over time.
Optional: To monitor autoscaled capacity versus utilization, see the Autoscaler utilization (CPU) chart. The graph displays Utilization, which is the total CPU utilization of VM instances in the group, and Capacity, which is the cumulative target CPU utilization of the group (target CPU utilization multiplied by the number of VM instances).
Autoscaling attempts to make Capacity match Utilization by changing the number of Instances, when possible.
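
The relationship between Capacity, Utilization, and Instances described above can be worked through with a little arithmetic. The following is a simplified illustration, not the autoscaler's actual algorithm: the function names `capacity` and `estimate_size` are hypothetical, and all values are whole-number percentages.

```shell
# Capacity as defined above: target CPU utilization (percent) multiplied
# by the number of VM instances.
# usage: capacity TARGET_PERCENT INSTANCES
capacity() {
  echo $(( $1 * $2 ))
}

# Simplified estimate of the smallest group size whose capacity covers the
# total utilization, clamped to the configured minimum and maximum.
# usage: estimate_size TOTAL_UTILIZATION_PERCENT TARGET_PERCENT MIN MAX
estimate_size() {
  local total=$1 target=$2 min=$3 max=$4
  # Ceiling division: round up so capacity is never below utilization.
  local size=$(( (total + target - 1) / target ))
  if [ "$size" -lt "$min" ]; then size=$min; fi
  if [ "$size" -gt "$max" ]; then size=$max; fi
  echo "$size"
}

capacity 60 3              # prints 180
estimate_size 270 60 3 6   # prints 5
```

With the settings in this tutorial, three VMs give a capacity of 180. If those three VMs each run at 90% CPU (a total of 270), this estimate suggests five instances, and the result can never leave the range of 3 to 6.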

Keep this window open.

Simulate scale out

Scale out occurs when the average CPU utilization of the instance group is significantly higher than the target value. During scale out, the autoscaler gradually increases the size of the instance group until CPU utilization decreases to the target CPU utilization value or until the instance group size equals the Maximum number of instances, which was set to 6.

To trigger scale out, increase the CPU utilization for your instances:

In the Google Cloud console, open Cloud Shell.
Open Cloud Shell
Cloud Shell opens on the bottom of the Google Cloud console. It can take a few seconds for the session to initialize.
Pro Tip:
You can open Cloud Shell from any Google Cloud console page using the Activate Cloud Shell button, which is in the top right corner of every Google Cloud console page.
Create a local bash variable for the project ID:
```
export PROJECT_ID=[PROJECT_ID]
```
where PROJECT_ID is the project ID for your current project, which is displayed on each new line in the Cloud Shell:
```
user@cloudshell:~ ([PROJECT_ID])$
```

Run the following bash script. This script causes the demo web application instances to have an increased load, which increases CPU utilization. After a few minutes, the CPU utilization will surpass the target value, prompting the autoscaler to increase the instance group size.

export MACHINES=$(gcloud --project=$PROJECT_ID compute instances list --format="csv(name,networkInterfaces[0].accessConfigs[0].natIP)" | grep "autoscaling-web-app-group")
for i in $MACHINES; do
  NAME=$(echo "$i" | cut -f1 -d,)
  IP=$(echo "$i" | cut -f2 -d,)
  echo "Simulating high load for instance $NAME"
  curl -q -s "http://$IP/startLoad" >/dev/null --retry 2
done

Open the Monitoring tab in the Google Cloud console.
After a few minutes, the Monitoring tab displays that the CPU Utilization increased, which triggers autoscaling to increase Capacity by increasing the number of Instances.
Note: You might need to refresh the page to see the most recent chart.
You might also notice that 6 instances are now listed under the Overview tab.
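
If you prefer to check the group size from Cloud Shell instead of the Overview tab, the following sketch counts the instances currently in the group. It assumes the gcloud CLI is available; the function name `count_group_instances` is hypothetical.

```shell
# Hypothetical helper (not part of the tutorial): prints the number of
# instances currently in the managed instance group.
count_group_instances() {
  # value(instance) prints one instance URL per line; wc -l counts them.
  gcloud compute instance-groups managed list-instances autoscaling-web-app-group \
    --region=us-central1 \
    --format="value(instance)" | wc -l
}
```

Calling `count_group_instances` a few times during the load test gives a rough view of the group growing, though the Monitoring charts remain the more complete picture.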

Keep both windows open.

Simulate scale in

Scale in occurs when the average CPU utilization of the instance group is significantly lower than the target value. During scale in, the autoscaler gradually decreases the size of the instance group until CPU utilization increases to the target CPU utilization or until the instance group size equals the Minimum number of instances, which was set to 3.

Note: To prevent preemptive scale in, the autoscaler calculates the group's recommended target size based on peak load over the stabilization period. The stabilization period might appear as a delay in scaling in, but it is actually a built-in feature of autoscaling. The delay ensures that the smaller group size will be enough to support peak load observed during the stabilization period. For more information about the stabilization period, see Delays in scaling in.
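
The idea in the note above, sizing for the peak load seen during the stabilization period, can be illustrated with a small sketch. This is an illustration of the concept only, not the autoscaler's implementation: the function name `scale_in_target` is hypothetical, and its arguments stand for the group sizes that would have been needed at each sample within the window.

```shell
# Hypothetical illustration: given the group sizes needed at each sample in
# the stabilization window, the scale-in target is the largest of them.
# usage: scale_in_target SIZE [SIZE...]
scale_in_target() {
  local peak=0 size
  for size in "$@"; do
    if [ "$size" -gt "$peak" ]; then peak=$size; fi
  done
  echo "$peak"
}

scale_in_target 6 5 4 3   # prints 6: the peak is still inside the window
scale_in_target 4 3 3 3   # prints 4: older, higher samples have aged out
```

This is why the group does not shrink the moment load drops: the target only falls as the high-load samples leave the window.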

To trigger scale in, decrease the CPU utilization for your instances:

Run the following bash script. This script causes the demo web application instances to have a decreased load, which decreases CPU utilization. After a few minutes, the CPU utilization will fall below the target value, prompting the autoscaler to decrease the instance group size.

export MACHINES=$(gcloud --project=$PROJECT_ID compute instances list --format="csv(name,networkInterfaces[0].accessConfigs[0].natIP)" | grep "autoscaling-web-app-group")
for i in $MACHINES; do
  NAME=$(echo "$i" | cut -f1 -d,)
  IP=$(echo "$i" | cut -f2 -d,)
  echo "Simulating low load for instance $NAME"
  curl -q -s "http://$IP/stopLoad" >/dev/null --retry 2
done

Open the Monitoring tab in the Google Cloud console.
After a few minutes, the Monitoring tab displays that the CPU Utilization decreased. After the stabilization period, which verifies that the load is consistently lower, autoscaling decreases Capacity by decreasing the number of Instances.
Note: You might need to refresh the page to see the most recent chart.
You might also notice that only 3 instances are listed under the Overview tab.

Close both windows when you have finished.

Clean up

After you finish the tutorial, you can clean up the resources that you created so that they stop using quota and incurring charges. The following sections describe how to delete or turn off these resources.

If you created a separate project for this tutorial, delete the entire project. Otherwise, if the project has resources that you want to keep, only delete the resources created in this tutorial.

Deleting the project

Caution: Deleting a project has the following effects:

Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.
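
The project can also be deleted from Cloud Shell. This is a minimal sketch, assuming the gcloud CLI is available; the function name `delete_tutorial_project` is hypothetical. The same cautions listed above apply, because everything in the project is deleted.

```shell
# Hypothetical helper (not part of the tutorial): deletes the project whose
# ID is passed as the first argument. Refuses to run without an ID.
delete_tutorial_project() {
  if [ -z "$1" ]; then
    echo "usage: delete_tutorial_project PROJECT_ID" >&2
    return 1
  fi
  gcloud projects delete "$1"
}
```

For example, `delete_tutorial_project "$PROJECT_ID"` reuses the variable set earlier in this tutorial. The empty-argument check is there so that an unset variable does not turn into an ambiguous command.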

Deleting specific resources

Deleting the instance group

In the Google Cloud console, go to the Instance groups page.
Go to Instance groups
Select the checkbox for your autoscaling-web-app-group instance group.
To delete the instance group, click Delete.

Deleting the instance template

Note: You must finish deleting the instance group before deleting the instance template. You cannot delete an instance template if a managed instance group is using it.

In the Google Cloud console, go to the Instance templates page.
Go to Instance templates
Click the checkbox next to the autoscaling-web-app-template.
Click Delete at the top of the page. In the new window, click Delete to confirm the deletion.

Deleting the firewall rule

In the Google Cloud console, go to the Firewall rules page.
Go to Firewall rules
Click the checkbox next to the firewall rule named default-allow-http.
Click Delete. In the new window, click Delete to confirm the deletion.
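
The three deletions above can be sketched as one command-line sequence. This is a minimal sketch, assuming the gcloud CLI is available and the resources use the names from this tutorial; the function name `delete_tutorial_resources` is hypothetical.

```shell
# Hypothetical helper (not part of the tutorial): deletes the tutorial's
# resources in dependency order. The instance group goes first because an
# instance template cannot be deleted while a group is using it.
delete_tutorial_resources() {
  gcloud compute instance-groups managed delete autoscaling-web-app-group \
    --region=us-central1 &&
  gcloud compute instance-templates delete autoscaling-web-app-template &&
  gcloud compute firewall-rules delete default-allow-http
}
```

Chaining with `&&` stops the sequence at the first failure, so the template is never attempted while the group still exists.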

What's next

Try another tutorial:
- Using autohealing for highly available applications.
- Using load balancing for highly available applications.
Learn more about Managed Instance Groups.
Learn more about Autoscaling.
Learn more about Designing Robust Systems.
Learn more about Building Scalable and Resilient Web Applications on Google Cloud.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.
