Using autohealing for highly available apps

This interactive tutorial shows how to use autohealing to build highly available apps on Compute Engine.

Highly available apps are designed to serve clients with minimal latency and downtime. Availability is compromised when an app crashes or freezes. Clients of a compromised app can experience high latency or downtime.

Autohealing lets you automatically restart apps that are compromised. It promptly detects failed virtual machine (VM) instances and recreates them automatically, so clients can be served again. With autohealing, you no longer need to manually bring an app back to service after a failure.

Objectives

  • Configure a health check and an autohealing policy.
  • Set up a demo web service on a managed instance group (MIG).
  • Simulate health check failures and witness the autohealing recovery process.

Costs

This tutorial uses billable components of Google Cloud including:

  • Compute Engine

Before you begin

    Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

    In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

    Verify that billing is enabled for your Google Cloud project.

    Enable the Compute Engine API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API

If you prefer to work from the command line, install the Google Cloud CLI.

App architecture

The app includes the following Compute Engine components:

System architecture for a health check and an instance group.

How the health check probes the demo web service

A health check sends probe requests to a VM using a specified protocol, such as HTTP(S), SSL, or TCP. For more information, see how health checks work and health check categories, protocols, and ports.

The health check in this tutorial is an HTTP health check that probes the HTTP path /health on port 80. For an HTTP health check, the probe request passes only if the path returns an HTTP 200 (OK) response. For this tutorial, the demo web server defines the path /health to return an HTTP 200 (OK) response when healthy or an HTTP 500 (Internal Server Error) response when unhealthy. For more information, see success criteria for HTTP, HTTPS, and HTTP/2.
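As a rough illustration of this contract, the following Python sketch serves a /health path that answers 200 or 500 depending on an in-memory flag. It is not the demo app's actual implementation (see the GitHub repository for that). The names HealthState, health_status, DemoHandler, and serve, and the port 8080, are assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthState:
    """In-memory health flag (hypothetical; the real demo app may differ)."""
    healthy = True


def health_status(healthy: bool) -> tuple[int, str]:
    """Map the health flag to the status that an HTTP health check would see."""
    return (200, "OK") if healthy else (500, "Internal Server Error")


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /health is implemented in this sketch; everything else is a 404.
        if self.path == "/health":
            code, reason = health_status(HealthState.healthy)
        else:
            code, reason = 404, "Not Found"
        body = reason.encode()
        self.send_response(code, reason)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the sketch locally. The tutorial's server listens on port 80 instead."""
    HTTPServer(("0.0.0.0", port), DemoHandler).serve_forever()
```

To try it, call serve() and request /health. Setting HealthState.healthy = False makes the same path return a 500, which an HTTP health check counts as a failed probe.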

Create the health check

To set up autohealing, create a custom health check and configure the network firewall to allow health check probes.

In this tutorial, you create a regional health check. For autohealing, you can use either a regional or a global health check. Regional health checks reduce cross-region dependencies and help to achieve data residency. Global health checks are convenient if you want to use the same health check for MIGs in multiple regions.

Pro Tip: Use separate health checks for load balancing and for autohealing. Health checks for load balancing detect unresponsive VMs and direct traffic away from them. Health checks for autohealing detect and recreate failed VMs, so they should be less aggressive than load balancing health checks. Using the same health check for these services would remove the distinction between unresponsive VMs and failed VMs, causing unnecessary latency and unavailability for your users.
For more information, see What makes a good autohealing health check.

Console

  1. Create a health check.

    1. In the Google Cloud console, go to the Create health check page.

      Go to Create health check

    2. In the Name field, enter autohealer-check.

    3. Set the Scope to Regional.

    4. In the Region drop-down, select europe-west1.

    5. For Protocol, select HTTP.

    6. Set Request path to /health. This indicates what HTTP path the health check uses. For this tutorial, the demo web server defines the path /health to return either an HTTP 200 (OK) response when healthy or an HTTP 500 (Internal Server Error) response when unhealthy.

    7. Set the Health criteria:

      1. Set Check interval to 10. This defines the amount of time from the start of one probe to the start of the next one.
      2. Set Timeout to 5. This defines the amount of time that Google Cloud waits for a response to a probe. This value must be less than or equal to the check interval.
      3. Set Healthy threshold to 2. This defines the number of sequential probes that must succeed for the VM to be considered healthy.
      4. Set Unhealthy threshold to 3. This defines the number of sequential probes that must fail for the VM to be considered unhealthy.
    8. Leave default values for the other options.

    9. Click Create at the bottom.

  2. Create a firewall rule to allow health check probes to make HTTP requests.

    1. In the Google Cloud console, go to the Create firewall rule page.

      Go to Create firewall rule

    2. For Name, enter default-allow-http-health-check.

    3. For Network, select default.

    4. For Targets, select All instances in the network.

    5. For Source filter, select IPv4 ranges.

    6. For Source IPv4 ranges, enter 130.211.0.0/22 and 35.191.0.0/16.

      Note: Health check probes come from addresses in the ranges 130.211.0.0/22 and 35.191.0.0/16. For this tutorial, your health check uses the HTTP protocol, so make sure the firewall rule allows connections to port 80. For more information, see setting up health checking and autohealing for managed instance groups.
    7. In Protocols and ports, select TCP and enter 80.

    8. Leave default values for the other options.

    9. Click Create.
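The healthy and unhealthy thresholds set in the health criteria above count sequential probe results. The following Python sketch is a simplified model of that counting, not Google Cloud's implementation. The function name track_health and the sample probe sequence are assumptions for illustration.

```python
def track_health(probe_results, healthy_threshold=2, unhealthy_threshold=3,
                 initial_state="HEALTHY"):
    """Return the modeled health state after each probe.

    probe_results: iterable of booleans, True when a probe succeeded.
    The state flips only after enough sequential probes contradict it.
    """
    state = initial_state
    streak = 0  # sequential probes that contradict the current state
    history = []
    for ok in probe_results:
        contradicts = (state == "HEALTHY") != ok
        streak = streak + 1 if contradicts else 0
        needed = unhealthy_threshold if state == "HEALTHY" else healthy_threshold
        if streak >= needed:
            state = "UNHEALTHY" if state == "HEALTHY" else "HEALTHY"
            streak = 0
        history.append(state)
    return history


# Two isolated failures are tolerated; three failures in a row are not.
probes = [True, False, False, True, False, False, False, True, True]
for ok, state in zip(probes, track_health(probes)):
    print("pass" if ok else "fail", "->", state)
```

In this model, the two failures that are interrupted by a success never mark the VM unhealthy, which is the protection that a threshold above 1 is meant to give.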

gcloud

  1. Create a health check using the health-checks create http command.

    gcloud compute health-checks create http autohealer-check \
        --region europe-west1 \
        --check-interval 10 \
        --timeout 5 \
        --healthy-threshold 2 \
        --unhealthy-threshold 3 \
        --request-path "/health"
    • check-interval defines the amount of time from the start of one probe to the start of the next one.
    • timeout defines the amount of time that Google Cloud waits for a response to a probe. This value must be less than or equal to the check interval.
    • healthy-threshold defines the number of sequential probes that must succeed for the VM to be considered healthy.
    • unhealthy-threshold defines the number of sequential probes that must fail for the VM to be considered unhealthy.
    • request-path indicates what HTTP path the health check uses. For this tutorial, the demo web server defines the path /health to return either an HTTP 200 (OK) response when healthy or an HTTP 500 (Internal Server Error) response when unhealthy.
  2. Create a firewall rule to allow health check probes to make HTTP requests.

    gcloud compute firewall-rules create default-allow-http-health-check \
        --network default \
        --allow tcp:80 \
        --source-ranges 130.211.0.0/22,35.191.0.0/16
    Note: Health check probes come from addresses in the ranges 130.211.0.0/22 and 35.191.0.0/16. For this tutorial, your health check uses the HTTP protocol, so make sure the firewall rule allows connections to port 80. For more information, see setting up health checking and autohealing for managed instance groups.
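If you want to confirm that a request in your server logs came from a health check probe, you can test its source address against the two ranges from the firewall rule. The following Python sketch does that with the standard ipaddress module. The function name is_probe_source and the sample addresses are assumptions for illustration.

```python
import ipaddress

# Probe source ranges, as listed in the firewall rule above.
PROBE_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]


def is_probe_source(ip: str) -> bool:
    """Return True when the address falls inside a health check probe range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in PROBE_RANGES)


# Sample addresses (hypothetical log entries).
for candidate in ["35.191.10.20", "130.211.3.255", "130.211.4.1", "8.8.8.8"]:
    print(candidate, is_probe_source(candidate))
```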

What makes a good autohealing health check

Health checks used for autohealing should be conservative so they don't prematurely delete and recreate your instances. When an autohealer health check is too aggressive, the autohealer might mistake busy instances for failed instances and unnecessarily restart them, reducing availability.

  • unhealthy-threshold. Should be more than 1. Ideally, set this value to 3 or more. This protects against rare failures like a network packet loss.
  • healthy-threshold. A value of 2 is sufficient for most apps.
  • timeout. Set this time value to a generous amount (at least five times the expected response time). The timeout must be less than or equal to the check-interval. This protects against unexpected delays like busy instances or a slow network connection.
  • check-interval. This value should be between 1 second and two times the timeout (not too long nor too short). When a value is too long, a failed instance is not caught soon enough. When a value is too short, the instances and the network can become measurably busy, given the high number of health check probes being sent every second.
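The guidelines above can be turned into a quick review of a proposed configuration. The following Python sketch encodes only the rules stated in this section. The function name review_autohealing_check and the optional expected_response_time parameter are assumptions for illustration, and all time values are in seconds.

```python
def review_autohealing_check(check_interval, timeout, unhealthy_threshold,
                             expected_response_time=None):
    """Return a list of warnings for an autohealing health check configuration."""
    warnings = []
    if timeout > check_interval:
        warnings.append("timeout must be less than or equal to check-interval")
    if unhealthy_threshold <= 1:
        warnings.append("unhealthy-threshold should be more than 1, ideally 3 or more")
    if not 1 <= check_interval <= 2 * timeout:
        warnings.append("check-interval should be between 1 second and 2x the timeout")
    if expected_response_time is not None and timeout < 5 * expected_response_time:
        warnings.append("timeout should be at least 5x the expected response time")
    return warnings


# The values used in this tutorial produce no warnings.
print(review_autohealing_check(check_interval=10, timeout=5, unhealthy_threshold=3))

# An aggressive configuration is flagged.
print(review_autohealing_check(check_interval=1, timeout=1, unhealthy_threshold=1,
                               expected_response_time=0.5))
```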

Set up the web service

This tutorial uses a web app that is stored on GitHub. If you would like to learn more about how the app was implemented, see the GoogleCloudPlatform/python-docs-samples GitHub repository.

To set up the demo web service, create an instance template that launches the demo web server on startup. Then, use this instance template to deploy a managed instance group and enable autohealing.

Console

  1. Create an instance template. Include a startup script that starts up the demo web server.

    1. In the Google Cloud console, go to the Create instance template page.

      Go to Create instance template

    2. Set the Name to webserver-template.

    3. In the Location section, from the Region drop-down, select europe-west1.

    4. In the Machine configuration section, for the Machine type drop-down, select e2-medium.

    5. In the Firewall section, select the Allow HTTP traffic checkbox.

    6. Expand the Advanced options section to reveal advanced settings. Several sub-sections appear.

    7. In the Management section, find Automation and enter the following Startup script:

      apt-get update
      apt-get -y install git python3-pip python3-venv
      git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
      python3 -m venv venv
      ./venv/bin/pip3 install -Ur ./python-docs-samples/compute/managed-instances/demo/requirements.txt
      ./venv/bin/pip3 install gunicorn
      ./venv/bin/gunicorn --bind 0.0.0.0:80 app:app --daemon --chdir ./python-docs-samples/compute/managed-instances/demo

    8. Leave default values for the other options.

    9. Click Create.

  2. Deploy the web server as a managed instance group.

    1. In the Google Cloud console, go to the Create instance group page.

      Go to Create instance group

    2. Set the Name to webserver-group.

    3. For Instance template, select webserver-template.

    4. For Region, select europe-west1.

    5. For Zone, select europe-west1-b.

    6. In the Autoscaling section, for Autoscaling mode, select Off: do not autoscale.

    7. Scroll back to the Number of instances field and set it to 3.

    8. In the Autohealing section, do the following:

      1. In the Health check drop-down, select autohealer-check.
      2. Set Initial delay to 300.

        Note: The initial delay is the number of seconds that a new VM takes to initialize and run its startup script. During a VM's initial delay period, the MIG ignores unsuccessful health checks because the VM might be in the startup process. This prevents the MIG from prematurely recreating a VM. If the health check receives a healthy response during the initial delay, it indicates that the startup process is complete and the VM is ready. The initial delay timer starts when the VM's currentAction field changes to VERIFYING. The timer stops when the set time completes or when a health check succeeds. The value of initial delay must be between 0 and 3600 seconds. In the console, the default value is 300 seconds.
    9. Leave default values for the other options.

    10. Click Create.

  3. Create a firewall rule that allows HTTP requests to the web servers.

    1. In the Google Cloud console, go to the Create firewall rule page.

      Go to Create firewall rule

    2. For Name, enter default-allow-http.

    3. For Network, select default.

    4. For Targets, select Specified target tags.

    5. For Target Tags, enter http-server.

    6. For Source filter, select IPv4 ranges.

    7. For Source IPv4 ranges, enter 0.0.0.0/0 to allow access for all IP addresses.

    8. In Protocols and ports, select TCP and enter 80.

    9. Leave default values for the other options.

    10. Click Create.

gcloud

  1. Create an instance template. Include a startup script that starts the demo web server.

    gcloud compute instance-templates create webserver-template \
        --instance-template-region europe-west1 \
        --machine-type e2-medium \
        --tags http-server \
        --metadata startup-script='
      apt-get update
      apt-get -y install git python3-pip python3-venv
      git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
      python3 -m venv venv
      ./venv/bin/pip3 install -Ur ./python-docs-samples/compute/managed-instances/demo/requirements.txt
      ./venv/bin/pip3 install gunicorn
      ./venv/bin/gunicorn --bind 0.0.0.0:80 app:app --daemon --chdir ./python-docs-samples/compute/managed-instances/demo'
  2. Create a managed instance group.

    gcloud compute instance-groups managed create webserver-group \
        --zone europe-west1-b \
        --template projects/PROJECT_ID/regions/europe-west1/instanceTemplates/webserver-template \
        --size 3 \
        --health-check projects/PROJECT_ID/regions/europe-west1/healthChecks/autohealer-check \
        --initial-delay 300
    Note: The initial delay is the number of seconds that a new VM takes to initialize and run its startup script. During a VM's initial delay period, the MIG ignores unsuccessful health checks because the VM might be in the startup process. This prevents the MIG from prematurely recreating a VM. If the health check receives a healthy response during the initial delay, it indicates that the startup process is complete and the VM is ready. The initial delay timer starts when the VM's currentAction field changes to VERIFYING. The timer stops when the set time completes or when a health check succeeds. The value of initial delay must be between 0 and 3600 seconds. The default value is 0.
  3. Create a firewall rule that allows HTTP requests to the web servers.

    gcloud compute firewall-rules create default-allow-http \
        --network default \
        --allow tcp:80 \
        --target-tags http-server

Wait a few minutes for the managed instance group to create and verify its VMs.

Note: If your health checks fail, your VMs may be taking longer to initialize than expected. Consider adjusting your initial delay:
  1. In the Google Cloud console, go to the Instance groups page.
  2. Click webserver-group.
  3. Click Edit.
  4. Under Initialization period, increase the number of seconds. You may input a number from 0 to 3600.
  5. Click Save and wait for the managed instance group to update.
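The initial delay behavior described in this section (failed health checks are ignored until the set time completes or a health check succeeds) can be modeled in a few lines. The following Python sketch is a simplified model based only on that description, not the MIG's implementation. The function names and the sample probe timings are assumptions for illustration.

```python
def initial_delay_end(initial_delay, probes):
    """Return the second at which failed health checks stop being ignored.

    probes: list of (seconds_since_verifying, succeeded) tuples, sorted by time.
    The timer stops at the first success, or when the set time completes.
    """
    for seconds, succeeded in probes:
        if seconds >= initial_delay:
            break
        if succeeded:
            return seconds
    return initial_delay


def counted_failures(initial_delay, probes):
    """Return the times of failed probes that count toward autohealing."""
    end = initial_delay_end(initial_delay, probes)
    return [seconds for seconds, succeeded in probes
            if not succeeded and seconds >= end]


# The server comes up after 30 seconds, so the failure at 40 seconds counts.
fast_boot = [(10, False), (20, False), (30, True), (40, False)]
print(counted_failures(300, fast_boot))

# The server never responds, so failures count only after the full 300 seconds.
slow_boot = [(100, False), (200, False), (310, False)]
print(counted_failures(300, slow_boot))
```

In the second case, a VM whose startup script needs more than 300 seconds would start failing counted probes before it is ready, which is the situation that the note above suggests fixing by increasing the initial delay.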

Simulate health check failures

To simulate health check failures, the demo web server provides ways for you to force a health check failure.

Console

  1. Navigate to a web server VM.

    1. In the Google Cloud console, go to the VM instances page.

      Go to VM instances

    2. For any webserver-group VM, under the External IP column, click the IP address. A new tab opens in your web browser. If the request times out or the web page is not available, wait a minute to let the server finish setting up and try again.

    The demo web server displays a page similar to the following:

    Demo web page showing green status buttons and blue action buttons.

  2. On the demo web page, click Make unhealthy.

    This causes the web server to fail the health check. Specifically, the web server makes the /health path return an HTTP 500 (Internal Server Error). You can verify this yourself by quickly clicking the Check health button (this stops working after the autohealer has started rebooting the VM).

  3. Wait for the autohealer to take action.

    1. In the Google Cloud console, go to the VM instances page.

      Go to VM instances

    2. Wait for the status of the web server VM to change. The green checkmark next to the VM name should change to a grey square, indicating that the autohealer has started rebooting the unhealthy VM.

    3. Click Refresh at the top of the page periodically to get the most recent status.

    4. The autohealing process is finished when the grey square changes back to a green checkmark, indicating the VM is healthy again.

gcloud

  1. Monitor the status of the managed instance group. (When you have finished, stop by pressing Ctrl+C.)

    while : ; do
      gcloud compute instance-groups managed list-instances webserver-group \
          --zone europe-west1-b
      sleep 5  # Wait for 5 seconds
    done

      NAME: webserver-group-0zx6
      ZONE: europe-west1-b
      STATUS: RUNNING
      HEALTH_STATE: HEALTHY
      ACTION: NONE
      INSTANCE_TEMPLATE: webserver-template
      VERSION_NAME:
      LAST_ERROR:

      NAME: webserver-group-4qbx
      ZONE: europe-west1-b
      STATUS: RUNNING
      HEALTH_STATE: HEALTHY
      ACTION: NONE
      INSTANCE_TEMPLATE: webserver-template
      VERSION_NAME:
      LAST_ERROR:

      NAME: webserver-group-m5v5
      ZONE: europe-west1-b
      STATUS: RUNNING
      HEALTH_STATE: HEALTHY
      ACTION: NONE
      INSTANCE_TEMPLATE: webserver-template
      VERSION_NAME:
      LAST_ERROR:

    All VMs in the group must show STATUS: RUNNING and ACTION: NONE. If not, wait a few minutes to let the VMs finish setting up and try again.

  2. Open a new shell session with the Google Cloud CLI installed.

    Note: If you're using Cloud Shell, you can open multiple sessions.
  3. Get the address of a web server VM.

    gcloud compute instances list --filter webserver-group

    Under the EXTERNAL_IP column, copy the IP address of any web server VM and save it as a local bash variable.

    export IP_ADDRESS=EXTERNAL_IP_ADDRESS
  4. Verify the web server has finished setting up. The server returns an HTTP 200 OK response.

    curl --head $IP_ADDRESS/health
    HTTP/1.1 200 OK
    Server: gunicorn
    ...

    If you get a Connection refused error, wait a minute to let the server finish setting up and try again.

  5. Make the web server unhealthy.

    curl $IP_ADDRESS/makeUnhealthy > /dev/null

    This causes the web server to fail the health check. Specifically, the web server makes the /health path return an HTTP 500 INTERNAL SERVER ERROR. You can verify this yourself by quickly making a request to /health (this stops working after the autohealer has started rebooting the VM).

    curl --head $IP_ADDRESS/health
    HTTP/1.1 500 INTERNAL SERVER ERROR
    Server: gunicorn
    ...
  6. Return to your first shell session to monitor the managed instance group and wait for the autohealer to take action.

    1. When the autohealing process has started, the STATUS and ACTION columns update, indicating that the autohealer has started rebooting the unhealthy VM.

        NAME: webserver-group-0zx6
        ZONE: europe-west1-b
        STATUS: STOPPING
        HEALTH_STATE: UNHEALTHY
        ACTION: RECREATING
        INSTANCE_TEMPLATE: webserver-template
        VERSION_NAME:
        LAST_ERROR:
        ...
    2. The autohealing process has finished when the VM again reports a STATUS of RUNNING and an ACTION of NONE, indicating the VM is successfully restarted.

        NAME: webserver-group-0zx6
        ZONE: europe-west1-b
        STATUS: RUNNING
        HEALTH_STATE: HEALTHY
        ACTION: NONE
        INSTANCE_TEMPLATE: webserver-template
        VERSION_NAME:
        LAST_ERROR:
        ...
    3. When you have finished monitoring the managed instance group, stop by pressing Ctrl+C.

    Note: For more information about possible VM statuses and actions, see Instance life cycle and Current actions on instances.
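If you would rather check for a stable group in a script than read the listing by eye, the same rule (every VM shows a status of RUNNING and an action of NONE) is easy to express in code. The following Python sketch assumes that you export the listing as JSON, for example with the gcloud --format=json flag, and that each entry carries the instanceStatus and currentAction fields of the managed instance resource. Those key names, the function name group_is_stable, and the sample data are assumptions, so verify them against your own output.

```python
import json


def group_is_stable(managed_instances):
    """Return True when every VM is RUNNING with no pending action."""
    return bool(managed_instances) and all(
        vm.get("instanceStatus") == "RUNNING" and vm.get("currentAction") == "NONE"
        for vm in managed_instances
    )


# Hypothetical sample, shaped like the assumed JSON output.
sample = json.loads("""
[
  {"name": "webserver-group-0zx6", "instanceStatus": "STOPPING", "currentAction": "RECREATING"},
  {"name": "webserver-group-4qbx", "instanceStatus": "RUNNING", "currentAction": "NONE"}
]
""")

print(group_is_stable(sample))      # one VM is still being recreated
print(group_is_stable(sample[1:]))  # only the running VM remains
```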

Feel free to repeat this exercise. Here are some ideas:

  • What happens if you make all VMs unhealthy at one time? For more information about autohealing behavior during concurrent failures, see autohealing behavior.

  • Can you update the health check configuration to heal VMs as fast as possible? (In practice, you should set the health check parameters to use conservative values as explained in this tutorial. Otherwise, you may risk VMs being mistakenly deleted and restarted when there is no real problem.)

  • The managed instance group has an initial delay configuration setting. Can you determine the minimum delay needed for this demo web server? (In practice, you should set the delay to somewhat longer (10%–20%) than it takes for a VM to boot and start serving app requests. Otherwise, you risk the VM getting stuck in an autohealing boot loop.)
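For the last idea, the 10%–20% margin is simple arithmetic once you have measured how long a VM takes to boot and start serving. The following Python sketch rounds up and clamps the result to the allowed range of 0 to 3600 seconds. The function name suggest_initial_delay and the sample boot times are assumptions for illustration.

```python
def suggest_initial_delay(measured_boot_seconds: int, margin_percent: int = 20) -> int:
    """Pad a measured boot time by a margin and clamp it to 0..3600 seconds."""
    # Integer arithmetic avoids floating-point surprises when rounding up.
    padded = (measured_boot_seconds * (100 + margin_percent) + 99) // 100
    return max(0, min(padded, 3600))


print(suggest_initial_delay(150))       # 150 seconds plus 20%
print(suggest_initial_delay(100, 10))   # 100 seconds plus 10%
print(suggest_initial_delay(3500))      # clamped to the 3600-second maximum
```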

View autohealer history (optional)

To view a history of autohealer operations, use the following gcloud command:

gcloud compute operations list --filter='operationType~compute.instances.repair.*'

For more information, see viewing historical autohealing operations.
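In the filter above, the ~ operator matches the operationType field against a regular expression, and the match can occur anywhere in the value. The following Python sketch applies the same pattern to a few sample operation types with re.search. The sample values and the function name is_repair_operation are assumptions for illustration, not output from a real project.

```python
import re

# The pattern that follows the ~ operator in the gcloud filter.
REPAIR_PATTERN = re.compile(r"compute.instances.repair.*")


def is_repair_operation(operation_type: str) -> bool:
    """Return True when the operation type matches the autohealing filter."""
    return REPAIR_PATTERN.search(operation_type) is not None


# Hypothetical operation types, used only to exercise the pattern.
sample_types = [
    "compute.instances.repair.recreateInstance",
    "compute.instanceGroupManagers.insert",
    "compute.firewalls.insert",
]
repairs = [t for t in sample_types if is_repair_operation(t)]
print(repairs)
```

The dots in the pattern are unescaped, so each one matches any character. That is harmless here, but you can escape them if you need a stricter match.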

Clean up

After you finish the tutorial, you can clean up the resources that you created so that they stop using quota and incurring charges. The following sections describe how to delete or turn off these resources.

If you created a separate project for this tutorial, delete the entire project. Otherwise, if the project has resources that you want to keep, only delete the specific resources created in this tutorial.

Deleting the project

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting specific resources

If you can't delete the project used for this tutorial, delete the tutorial resources individually.

Deleting the instance group

Console

  1. In the Google Cloud console, go to the Instance groups page.

    Go to Instance groups

  2. Select the checkbox for your webserver-group instance group.
  3. To delete the instance group, click Delete.

gcloud

gcloud compute instance-groups managed delete webserver-group --zone europe-west1-b -q

Deleting the instance template

Note: You must delete the managed instance group before deleting the instance template. You can't delete an instance template if a managed instance group uses it.

Console

  1. In the Google Cloud console, go to the Instance templates page.

    Go to Instance templates

  2. Click the checkbox next to the instance template.

  3. Click Delete at the top of the page. In the new window, click Delete to confirm the deletion.

gcloud

gcloud compute instance-templates delete webserver-template -q \
    --region=europe-west1

Deleting the health check

Note: You must delete the managed instance group before deleting the health check. You can't delete a health check if other resources use it.

Console

  1. In the Google Cloud console, go to the Health checks page.

    Go to Health checks

  2. Click the checkbox next to the health check.

  3. Click Delete at the top of the page. In the new window, click Delete to confirm the deletion.

gcloud

gcloud compute health-checks delete autohealer-check -q \
    --region=europe-west1

Deleting the firewall rules

Console

  1. In the Google Cloud console, go to the Firewall rules page.

    Go to Firewall rules

  2. Click the checkboxes next to the firewall rules named default-allow-http and default-allow-http-health-check.

  3. Click Delete at the top of the page. In the new window, click Delete to confirm the deletion.

gcloud

gcloud compute firewall-rules delete default-allow-http default-allow-http-health-check -q

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.