Set up an application-based health check and autohealing

This document describes how to set up an application-based health check to autoheal VMs in a managed instance group (MIG). It also describes how to do the following: use a health check without autohealing, remove a health check, view the autohealing policy, and check the health state of each VM.

You can configure an application-based health check to verify that your application on a VM is responding as expected. If the health check that you configure detects that your application on a VM isn't responding, then the MIG marks that VM as unhealthy and repairs it by default. Repairing a VM based on an application-based health check is called autohealing.

You can also turn off autohealing in a MIG so that you can use a health check without triggering repairs for unhealthy VMs.

To learn more about repairs in a MIG, see About repairing VMs for high availability.

Before you begin

Pricing

When you set up an application-based health check, Compute Engine by default writes a log entry in Cloud Logging whenever a VM's health state changes. Cloud Logging provides a free allotment per month, after which logging is priced by data volume. To avoid costs, you can disable the health state change logs.

Set up an application-based health check and autohealing

To set up an application-based health check and autohealing in a MIG, you must do the following:

  1. Create a health check, if you haven't already.
  2. Configure an autohealing policy in the MIG to apply the health check.

Tip: Use separate health checks for load balancing and for autohealing. Health checks for load balancing detect unresponsive VMs and direct traffic away from them. Health checks for autohealing detect unhealthy VMs and proactively recreate those VMs, so this health check should be more conservative than a load balancing health check. For more information, see What makes a good autohealing health check.

Create a health check

You can apply a single health check to a maximum of 50 MIGs. If you have more than 50 groups, create multiple health checks.

The following example shows how to create a health check for autohealing. You can create either a regional or a global health check for autohealing in MIGs. In this example, you create a global health check that looks for a web server response on port 80. To enable the health check probes to reach the web server, configure a firewall rule.

Permissions required for this task

To perform this task, you must have the following permissions:

  • compute.healthChecks.create on the project if creating a health check.
  • compute.healthChecks.use on the health check to use.
  • compute.firewalls.create on the project if creating a firewall.
  • compute.networks.updatePolicy on the network if creating a firewall.

The permissions are available in the following preconfigured roles.

  • compute.networkAdmin for creating health checks.
  • compute.securityAdmin for configuring firewall rules to let health checking connect.

Console

  1. Create a health check for autohealing that is more conservative than a load balancing health check.

    For example, create a health check that looks for a response on port 80 and that can tolerate some failure before it marks VMs as UNHEALTHY and causes them to be recreated. In this example, a VM is marked as healthy if the health check returns successfully once. The VM is marked as unhealthy if the health check returns unsuccessfully 3 consecutive times.

    1. In the Google Cloud console, go to the Create a health check page.

      Go to Create a health check

    2. Give the health check a name, such as example-check.

    3. Select a Scope. You can select either Regional or Global. For this example, select Global.

    4. For Protocol, make sure that HTTP is selected.

    5. For Port, enter 80.

    6. In the Health criteria section, provide the following values:

      1. For Check interval, enter 5.
      2. For Timeout, enter 5.
      3. Set a Healthy threshold to determine how many consecutive successful health checks must be returned before an unhealthy VM is marked as healthy. Enter 1 for this example.
      4. Set an Unhealthy threshold to determine how many consecutive unsuccessful health checks must be returned before a healthy VM is marked as unhealthy. Enter 3 for this example.
    7. Click Create to create the health check.

  2. Create a firewall rule to allow health check probes to connect to your app.

    Caution: If health check probes are blocked by firewall rules, they mark your VMs as UNHEALTHY because they cannot connect to the app. This can prompt automatic recreation of VMs that might be healthy.

    Health check probes come from addresses in the ranges 130.211.0.0/22 and 35.191.0.0/16, so make sure your network firewall rules allow the health check to connect. For this example, the MIG uses the default network and its VMs are listening on port 80. If port 80 is not already open on the default network, create a firewall rule.

    1. In the Google Cloud console, go to the Firewall policies page.

      Go to Firewall policies

    2. Click Create firewall rule.

    3. Enter a Name for the firewall rule. For example, allow-health-check.

    4. For Network, select the default network.

    5. For Targets, select All instances in the network.

    6. For Source filter, select IPv4 ranges.

    7. For Source IPv4 ranges, enter 130.211.0.0/22 and 35.191.0.0/16.

    8. In Protocols and ports, select Specified protocols and ports and do the following:

      1. Select TCP.
      2. In the Ports field, enter 80.
    9. Click Create.
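
The threshold behavior used in this example (healthy after 1 success, unhealthy after 3 consecutive failures) can be modeled with a short Python sketch. This is a simplified illustration of consecutive-threshold counting as described above, not the actual health checking implementation; the function name `track_health` and the initial `None` state (no verdict yet) are assumptions made for the sketch.

```python
def track_health(probe_results, healthy_threshold=1, unhealthy_threshold=3):
    """Return the modeled health state after each probe.

    probe_results is an iterable of booleans (True = probe succeeded).
    The state is None until one of the thresholds is first met
    (an assumption of this sketch).
    """
    state = None
    successes = 0
    failures = 0
    states = []
    for ok in probe_results:
        if ok:
            successes += 1
            failures = 0  # a success breaks any run of failures
            if successes >= healthy_threshold:
                state = "HEALTHY"
        else:
            failures += 1
            successes = 0  # a failure breaks any run of successes
            if failures >= unhealthy_threshold:
                state = "UNHEALTHY"
        states.append(state)
    return states


# Two isolated failures are tolerated; three in a row flip the state.
print(track_health([True, False, False, True, False, False, False]))
```

With these thresholds, a single failed probe never triggers recreation on its own, which is the tolerance the example aims for.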

gcloud

  1. Create a health check for autohealing that is more conservative than a load balancing health check.

    For example, create a health check that looks for a response on port 80 and that can tolerate some failure before it marks VMs as UNHEALTHY and causes them to be recreated. In this example, a VM is marked as healthy if it returns successfully once. The VM is marked as unhealthy if it returns unsuccessfully 3 consecutive times. The following command creates a global health check.

    gcloud compute health-checks create http example-check --port 80 \
        --check-interval 30s \
        --healthy-threshold 1 \
        --timeout 10s \
        --unhealthy-threshold 3 \
        --global

    Note: Use newer health checks, which support HTTP, HTTPS, TCP, and SSL (TLS) protocols. Legacy Compute Engine HTTP/HTTPS health checks continue to work.
  2. Create a firewall rule to allow health check probes to connect to your app.

    Caution: If health check probes are blocked by firewall rules, they mark your VMs as UNHEALTHY because they cannot connect to the app. This can prompt automatic recreation of VMs that might be healthy.

    Health check probes come from addresses in the ranges 130.211.0.0/22 and 35.191.0.0/16, so make sure your firewall rules allow the health check to connect. For this example, the MIG uses the default network, and its VMs listen on port 80. If port 80 isn't already open on the default network, create a firewall rule.

    gcloud compute firewall-rules create allow-health-check \
        --allow tcp:80 \
        --source-ranges 130.211.0.0/22,35.191.0.0/16 \
        --network default
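
When reviewing firewall or server logs, it can help to confirm whether a source address belongs to the probe ranges named above. The following Python sketch uses only the standard `ipaddress` module; the function name `is_health_check_probe` and the sample addresses are illustrative assumptions, not part of any Google Cloud tooling.

```python
import ipaddress

# Probe source ranges quoted in this document.
PROBE_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]


def is_health_check_probe(address):
    """Return True if the IPv4 address lies in one of the probe ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PROBE_RANGES)


# Sample addresses chosen for illustration only.
for sample in ("130.211.1.7", "35.191.200.3", "203.0.113.10"):
    print(sample, is_health_check_probe(sample))
```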

Terraform

  1. Create a health check using the google_compute_http_health_check resource.

    For example, create a health check that looks for a response on port 80 and that can tolerate some failure before it marks VMs as UNHEALTHY and causes them to be recreated. In this example, a VM is marked as healthy if it returns successfully once. The VM is marked as unhealthy if it returns unsuccessfully 3 consecutive times. The following configuration creates a global health check.

    resource "google_compute_http_health_check" "default" {
      name                = "example-check"
      timeout_sec         = 10
      check_interval_sec  = 30
      healthy_threshold   = 1
      unhealthy_threshold = 3
      port                = 80
    }

  2. Create a firewall using the google_compute_firewall resource.

    Caution: If health check probes are blocked by firewall rules, they mark your VMs as UNHEALTHY because they cannot connect to the app. This can prompt automatic recreation of VMs that might be healthy.

    Health check probes come from addresses in the ranges 130.211.0.0/22 and 35.191.0.0/16, so make sure your firewall rules allow the health check to connect. For this example, the MIG uses the default network and its VMs are listening on port 80. If port 80 is not already open on the default network, create a firewall rule.

    resource "google_compute_firewall" "default" {
      name          = "allow-health-check"
      network       = "default"
      source_ranges = ["130.211.0.0/22", "35.191.0.0/16"]
      allow {
        protocol = "tcp"
        ports    = [80]
      }
    }

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

REST

  1. Create a health check for autohealing that is more conservative than a load balancing health check.

    For example, create a health check that looks for a response on port 80 and that can tolerate some failure before it marks VMs as UNHEALTHY and causes them to be recreated. In this example, a VM is marked as healthy if it returns successfully once. The VM is marked as unhealthy if it returns unsuccessfully 3 consecutive times. The following request creates a global health check.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/global/healthChecks

    {
      "name": "example-check",
      "type": "http",
      "port": 80,
      "checkIntervalSec": 30,
      "healthyThreshold": 1,
      "timeoutSec": 10,
      "unhealthyThreshold": 3
    }

    Note: Use newer health checks, which support HTTP, HTTPS, TCP, and SSL (TLS) protocols. Legacy Compute Engine HTTP/HTTPS health checks continue to work.
  2. Create a firewall rule to allow health check probes to connect to your app.

    Caution: If health check probes are blocked by firewall rules, they mark your VMs as UNHEALTHY because they cannot connect to the app. This can prompt automatic recreation of VMs that might be healthy.

    Health check probes come from addresses in the ranges 130.211.0.0/22 and 35.191.0.0/16, so make sure your firewall rules allow the health check to connect. For this example, the MIG uses the default network and its VMs are listening on port 80. If port 80 is not already open on the default network, create a firewall rule.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/global/firewalls

    {
      "name": "allow-health-check",
      "network": "https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/networks/default",
      "sourceRanges": [
        "130.211.0.0/22",
        "35.191.0.0/16"
      ],
      "allowed": [
        {
          "ports": [
            "80"
          ],
          "IPProtocol": "tcp"
        }
      ]
    }

    Replace PROJECT_ID with your project ID.

Configure an autohealing policy in a MIG

In a MIG, you can set up only one autohealing policy to apply a health check.

Before configuring an autohealing policy, if you don't have a health check already, then create one. You can use either a regional or a global health check for autohealing in MIGs. A regional health check reduces cross-region dependencies and helps to achieve data residency, whereas a global health check is convenient if you want to use the same health check for MIGs in multiple regions.

If you want to prevent inadvertently triggering autohealing while setting up a new health check or want to use a health check without autohealing, then see Configure health check without autohealing. You can also turn off autohealing after you configure a health check in the MIG.

To configure an autohealing policy, select one of the following options:

Permissions required for this task

To perform this task, you must have the following permissions:

  • compute.instanceGroupManagers.update on the MIG.

The permissions are available in the following preconfigured roles.

  • compute.instanceAdmin.v1 for creating and updating autohealing policies in MIGs.

Console

  1. In the Google Cloud console, go to the Instance groups page.

    Go to Instance groups

  2. Under the Name column of the list, click the name of the MIG in which you want to apply the health check.

  3. Click Edit to modify this MIG.

  4. Click Instance lifecycle and autohealing to expand the section.

    1. In the Autohealing section, for the Health check, select a global or a regional health check.
    2. For the Initial delay, use the default value or modify as needed.

      The initial delay is the number of seconds that a new VM takes to initialize and run its startup script. During a VM's initial delay period, the MIG ignores unsuccessful health checks because the VM might be in the startup process. This prevents the MIG from prematurely recreating a VM. If the health check receives a healthy response during the initial delay, it indicates that the startup process is complete and the VM is ready. The initial delay timer starts when the VM's currentAction field changes to VERIFYING. The value of initial delay must be between 0 and 3600 seconds. In the console, the default value is 300 seconds.

  5. Click Save to apply your changes.

gcloud

To configure an autohealing policy in an existing MIG, use the update command. For example, use the following command to configure an autohealing policy in an existing zonal MIG:

gcloud compute instance-groups managed update MIG_NAME \
    --health-check HEALTH_CHECK_URL \
    --initial-delay INITIAL_DELAY \
    --zone ZONE

To configure an autohealing policy when creating a MIG, use the create command. For example, use the following command to configure an autohealing policy when creating a zonal MIG:

gcloud compute instance-groups managed create MIG_NAME \
    --size SIZE \
    --template INSTANCE_TEMPLATE_URL \
    --health-check HEALTH_CHECK_URL \
    --initial-delay INITIAL_DELAY \
    --zone ZONE

Replace the following:

  • MIG_NAME: The name of the MIG in which you want to set up autohealing.
  • SIZE: The number of VMs in the group.
  • INSTANCE_TEMPLATE_URL: The URL of the instance template that you want to use to create VMs in the MIG. The URL can contain either the ID or name of the instance template. Specify one of the following values:
    • For a regional instance template: projects/PROJECT_ID/regions/REGION/instanceTemplates/INSTANCE_TEMPLATE_ID
    • For a global instance template: INSTANCE_TEMPLATE_ID
  • HEALTH_CHECK_URL: The partial URL of the health check that you want to set up for autohealing. For example:
    • Regional health check: projects/example-project/regions/us-central1/healthChecks/example-health-check.
    • Global health check: projects/example-project/global/healthChecks/example-health-check.
  • INITIAL_DELAY: The number of seconds that a new VM takes to initialize and run its startup script. During a VM's initial delay period, the MIG ignores unsuccessful health checks because the VM might be in the startup process. This prevents the MIG from prematurely recreating a VM. If the health check receives a healthy response during the initial delay, it indicates that the startup process is complete and the VM is ready. The initial delay timer starts when the VM's currentAction field changes to VERIFYING. The value of initial delay must be between 0 and 3600 seconds. The default value is 0.
  • ZONE: The zone where the MIG is located. For a regional MIG, use the --region flag.

Terraform

To configure an autohealing policy in a MIG, use the auto_healing_policies block.

The following sample configures an autohealing policy in a zonal MIG. For more information about the resource used in the sample, see google_compute_instance_group_manager. For a regional MIG, use the google_compute_region_instance_group_manager resource.

resource "google_compute_instance_group_manager" "default" {
  name               = "igm-with-hc"
  base_instance_name = "test"
  target_size        = 3
  zone               = "us-central1-f"
  version {
    instance_template = google_compute_instance_template.default.id
    name              = "primary"
  }
  auto_healing_policies {
    health_check      = google_compute_http_health_check.default.id
    initial_delay_sec = 30
  }
}

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

REST

To configure an autohealing policy in an existing MIG, use the patch method.

For example, make the following call to set up autohealing in an existing zonal MIG:

  PATCH https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers/MIG_NAME

  {
    "autoHealingPolicies": [
      {
        "healthCheck": "HEALTH_CHECK_URL",
        "initialDelaySec": INITIAL_DELAY
      }
    ]
  }

To configure an autohealing policy when creating a MIG, use the insert method.

For example, make the following call to configure an autohealing policy when creating a zonal MIG:

  POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers

  {
    "name": "MIG_NAME",
    "targetSize": SIZE,
    "instanceTemplate": "INSTANCE_TEMPLATE_URL",
    "autoHealingPolicies": [
      {
        "healthCheck": "HEALTH_CHECK_URL",
        "initialDelaySec": INITIAL_DELAY
      }
    ]
  }

Replace the following:

  • PROJECT_ID: Your project ID.
  • MIG_NAME: The name of the MIG in which you want to set up autohealing.
  • SIZE: The number of VMs in the group.
  • INSTANCE_TEMPLATE_URL: The URL of the instance template that you want to use to create VMs in the MIG. The URL can contain either the ID or name of the instance template. Specify one of the following values:
    • For a regional instance template: projects/PROJECT_ID/regions/REGION/instanceTemplates/INSTANCE_TEMPLATE_ID
    • For a global instance template: INSTANCE_TEMPLATE_ID
  • HEALTH_CHECK_URL: The partial URL of the health check that you want to set up for autohealing. For example:
    • Regional health check: projects/example-project/regions/us-central1/healthChecks/example-health-check.
    • Global health check: projects/example-project/global/healthChecks/example-health-check.
  • INITIAL_DELAY: The number of seconds that a new VM takes to initialize and run its startup script. During a VM's initial delay period, the MIG ignores unsuccessful health checks because the VM might be in the startup process. This prevents the MIG from prematurely recreating a VM. If the health check receives a healthy response during the initial delay, it indicates that the startup process is complete and the VM is ready. The initial delay timer starts when the VM's currentAction field changes to VERIFYING. The value of initial delay must be between 0 and 3600 seconds. The default value is 0.
  • ZONE: The zone where the MIG is located. For a regional MIG, use regions/REGION in the URL.
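
As a rough sketch, the patch request above can be assembled in Python before it is handed to whatever HTTP client and credentials you already use. The helper name `autohealing_patch` and its parameter names are hypothetical; the code only builds the URL and JSON body shown in this section, checks the documented 0 to 3600 second range, and does not send anything.

```python
import json


def autohealing_patch(project_id, mig_name, health_check_url,
                      initial_delay_sec, zone=None, region=None):
    """Return (url, body) for a patch call that sets an autohealing policy.

    Exactly one of zone or region must be given; a regional MIG uses
    regions/REGION in the URL instead of zones/ZONE.
    """
    if (zone is None) == (region is None):
        raise ValueError("specify exactly one of zone or region")
    if not 0 <= initial_delay_sec <= 3600:
        raise ValueError("initial delay must be between 0 and 3600 seconds")

    location = f"zones/{zone}" if zone is not None else f"regions/{region}"
    url = (
        "https://compute.googleapis.com/compute/v1/"
        f"projects/{project_id}/{location}/instanceGroupManagers/{mig_name}"
    )
    body = {
        "autoHealingPolicies": [
            {
                "healthCheck": health_check_url,
                "initialDelaySec": initial_delay_sec,
            }
        ]
    }
    return url, body


# Example values are placeholders, not real resources.
url, body = autohealing_patch(
    "example-project",
    "example-mig",
    "projects/example-project/global/healthChecks/example-health-check",
    300,
    zone="us-central1-f",
)
print("PATCH", url)
print(json.dumps(body, indent=2))
```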

After the autohealing setup is complete, it can take 10 minutes before autohealing begins monitoring VMs in the group. After the monitoring begins, Compute Engine begins to mark VMs as healthy (or else recreates them) based on your autohealing configuration. For example, if you configure an initial delay of 5 minutes, a health check interval of 1 minute, and a healthy threshold of 1 check, the timeline looks like the following:

  • 10 minute delay before autohealing begins monitoring VMs in the group
  • + 5 minutes for the configured initial delay
  • + 1 minute for the check interval * healthy threshold (60s * 1)
  • = 16 minutes before the VM is either marked as healthy or is recreated
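
The arithmetic in this timeline can be written as a small Python helper. The function name and the `monitoring_delay_sec` default of 600 seconds (the 10-minute delay quoted above) are illustrative assumptions for this sketch, not values read from any API.

```python
def minutes_until_first_verdict(initial_delay_sec, check_interval_sec,
                                healthy_threshold, monitoring_delay_sec=600):
    """Estimate minutes before a VM is marked healthy or is recreated.

    Sums the monitoring start-up delay, the configured initial delay,
    and check interval * healthy threshold, as in the timeline above.
    """
    total_sec = (
        monitoring_delay_sec
        + initial_delay_sec
        + check_interval_sec * healthy_threshold
    )
    return total_sec / 60


# Initial delay 5 minutes, check interval 1 minute, healthy threshold 1.
print(minutes_until_first_verdict(300, 60, 1))  # 16.0
```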

Configure a health check without autohealing

Preview — Action on failed health check

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

You can turn off autohealing in a MIG and use the configured health check for monitoring your application health, or you can implement your own repair logic. Turning off autohealing in a MIG doesn't affect the functioning of the health check. The health check continues to probe the application and provides the VM health states. However, the MIG will no longer repair unhealthy VMs.

To configure a health check without autohealing, select one of the followingoptions.

Permissions required for this task

To perform this task, you must have the following permissions:

  • compute.instanceGroupManagers.update on the MIG.

The permissions are available in the following preconfigured roles.

  • compute.instanceAdmin.v1 for creating and updating autohealing policies in MIGs.

Console

  1. In the Google Cloud console, go to the Instance groups page.

    Go to Instance groups

  2. Under the Name column of the list, click the name of the MIG in which you want to apply the health check.

  3. Click Edit to modify this MIG.

  4. Click Instance lifecycle and autohealing to expand the section.

    1. In the Autohealing section, for the Health check, select a global or a regional health check.
    2. For the Initial delay, use the default value or modify as needed.

      The initial delay is the number of seconds that a new VM takes to initialize and run its startup script. During a VM's initial delay period, the MIG ignores unsuccessful health checks because the VM might be in the startup process. This prevents the MIG from prematurely recreating a VM. If the health check receives a healthy response during the initial delay, it indicates that the startup process is complete and the VM is ready. The initial delay timer starts when the VM's currentAction field changes to VERIFYING. The value of initial delay must be between 0 and 3600 seconds. In the console, the default value is 300 seconds.

    3. In the On failed health check list, select No action.
  5. Click Save to apply your changes.

gcloud

To configure a health check without autohealing, when you specify the health check configuration you must also set the --action-on-vm-failed-health-check flag to do-nothing as follows:

  • In an existing MIG, use the beta update command.

    For example, use the following command in an existing zonal MIG:

    gcloud beta compute instance-groups managed update MIG_NAME \
        --health-check HEALTH_CHECK_URL \
        --initial-delay INITIAL_DELAY \
        --action-on-vm-failed-health-check do-nothing \
        --zone ZONE

  • When creating a MIG, use the beta create command.

    For example, use the following command when creating a zonal MIG:

    gcloud beta compute instance-groups managed create MIG_NAME \
        --size SIZE \
        --template INSTANCE_TEMPLATE_URL \
        --health-check HEALTH_CHECK_URL \
        --initial-delay INITIAL_DELAY \
        --action-on-vm-failed-health-check do-nothing \
        --zone ZONE

Replace the following:

  • MIG_NAME: The name of the MIG in which you want to set up the health check.
  • SIZE: The number of VMs in the group.
  • INSTANCE_TEMPLATE_URL: The URL of the instance template that you want to use to create VMs in the MIG. The URL can contain either the ID or name of the instance template. Specify one of the following values:
    • For a regional instance template: projects/PROJECT_ID/regions/REGION/instanceTemplates/INSTANCE_TEMPLATE_ID
    • For a global instance template: INSTANCE_TEMPLATE_ID
  • HEALTH_CHECK_URL: The partial URL of the health check that you want to set up. For example:
    • Regional health check: projects/example-project/regions/us-central1/healthChecks/example-health-check.
    • Global health check: projects/example-project/global/healthChecks/example-health-check.
  • INITIAL_DELAY: The number of seconds that a new VM takes to initialize and run its startup script. During a VM's initial delay period, the MIG ignores unsuccessful health checks because the VM might be in the startup process. This prevents the MIG from prematurely recreating a VM. If the health check receives a healthy response during the initial delay, it indicates that the startup process is complete and the VM is ready. The initial delay timer starts when the VM's currentAction field changes to VERIFYING. The value of initial delay must be between 0 and 3600 seconds. The default value is 0.
  • ZONE: The zone where the MIG is located. For a regional MIG, use the --region flag.

REST

To configure a health check without autohealing, when you specify the health check configuration you must also set the onFailedHealthCheck field to DO_NOTHING as follows:

  • In an existing MIG, use the beta patch method.

    For example, make the following call in an existing zonal MIG:

    PATCH https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers/MIG_NAME

    {
      "autoHealingPolicies": [
        {
          "healthCheck": "HEALTH_CHECK_URL",
          "initialDelaySec": INITIAL_DELAY
        }
      ],
      "instanceLifecyclePolicy": {
        "onFailedHealthCheck": "DO_NOTHING"
      }
    }

  • When creating a MIG, use the beta insert method.

    For example, make the following call when creating a zonal MIG:

    POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers

    {
      "name": "MIG_NAME",
      "targetSize": SIZE,
      "instanceTemplate": "INSTANCE_TEMPLATE_URL",
      "autoHealingPolicies": [
        {
          "healthCheck": "HEALTH_CHECK_URL",
          "initialDelaySec": INITIAL_DELAY
        }
      ],
      "instanceLifecyclePolicy": {
        "onFailedHealthCheck": "DO_NOTHING"
      }
    }

Replace the following:

  • PROJECT_ID: Your project ID.
  • MIG_NAME: The name of the MIG in which you want to set up the health check.
  • SIZE: The number of VMs in the group.
  • INSTANCE_TEMPLATE_URL: The URL of the instance template that you want to use to create VMs in the MIG. The URL can contain either the ID or name of the instance template. Specify one of the following values:
    • For a regional instance template: projects/PROJECT_ID/regions/REGION/instanceTemplates/INSTANCE_TEMPLATE_ID
    • For a global instance template: INSTANCE_TEMPLATE_ID
  • HEALTH_CHECK_URL: The partial URL of the health check that you want to set up. For example:
    • Regional health check: projects/example-project/regions/us-central1/healthChecks/example-health-check.
    • Global health check: projects/example-project/global/healthChecks/example-health-check.
  • INITIAL_DELAY: The number of seconds that a new VM takes to initialize and run its startup script. During a VM's initial delay period, the MIG ignores unsuccessful health checks because the VM might be in the startup process. This prevents the MIG from prematurely recreating a VM. If the health check receives a healthy response during the initial delay, it indicates that the startup process is complete and the VM is ready. The initial delay timer starts when the VM's currentAction field changes to VERIFYING. The value of initial delay must be between 0 and 3600 seconds. The default value is 0.
  • ZONE: The zone where the MIG is located. For a regional MIG, use regions/REGION in the URL.

After configuring the health check, you can monitor the VM health states to confirm that the health check is working as expected. If you want the MIG to repair unhealthy VMs, you can turn on autohealing.

Remove a health check

You can remove a health check configured in an autohealing policy as follows:

Console

  1. In the Google Cloud console, go to the Instance groups page.

    Go to Instance groups

  2. Click the name of the MIG from which you want to remove the health check.

  3. Click Edit to modify this MIG.

  4. Click Instance lifecycle and autohealing to expand the section.

  5. In the Autohealing section, for Health check, select No health check.

  6. Click Save to apply the changes.

gcloud

To remove the health check configuration in an autohealing policy, use the --clear-autohealing flag in the update command as follows:

gcloud compute instance-groups managed update MIG_NAME \
    --clear-autohealing

Replace MIG_NAME with the name of a MIG.

REST

To remove the health check configuration in an autohealing policy, set the autohealing policy to an empty value.

For example, to remove the health check in a zonal MIG, make the following request:

PATCH https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers/MIG_NAME

{
  "autoHealingPolicies": [
    {}
  ]
}

Replace the following:

  • PROJECT_ID: Your project ID.
  • MIG_NAME: The name of the MIG from which you want to remove the health check.
  • ZONE: The zone where the MIG is located. For a regional MIG, use regions/REGION.

View autohealing policy in a MIG

You can view the autohealing policy of a MIG as follows:

Console

  1. In the Google Cloud console, go to the Instance groups page.

    Go to Instance groups

  2. Click the name of the MIG whose autohealing policy you want to view.

  3. Go to the Details tab.

    The VM instance lifecycle section displays the health check and the initial delay configured in the autohealing policy.

gcloud

To view the autohealing policy in a MIG, use the following command:

gcloud compute instance-groups managed describe MIG_NAME \
    --format="(autoHealingPolicies)"

Replace MIG_NAME with the name of a MIG.

The following is a sample output:

autoHealingPolicies:
  healthCheck: https://www.googleapis.com/compute/v1/projects/example-project/global/healthChecks/example-health-check
  initialDelaySec: 300

REST

To view the autohealing policy in a MIG, use the get method.

For example, make the following request to view the autohealing policy in a zonal MIG:

GET https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers/MIG_NAME

In the response body, check for the autoHealingPolicies[] object.

The following is a sample response:

{
  ...
  "autoHealingPolicies": [
    {
      "healthCheck": "https://www.googleapis.com/compute/v1/projects/example-project/global/healthChecks/example-health-check",
      "initialDelaySec": 300
    }
  ],
  ...
}

Replace the following:

  • PROJECT_ID: Your project ID.
  • MIG_NAME: The name of the MIG whose autohealing policy you want to view.
  • ZONE: The zone where the MIG is located. For a regional MIG, use regions/REGION.

Check the status

After you set up an application-based health check in a MIG, you can verify that a VM is running and its application is responding in the following ways:

Check whether VMs are healthy

If you have configured an application-based health check in your MIG, you can review the health state of each managed instance.

Inspect your managed instance health states to:

  • Identify unhealthy VMs that are not being repaired. A VM might not be repaired immediately even if it has been diagnosed as unhealthy in the following situations:
    • The VM is still booting, and its initial delay has not passed.
    • A significant share of unhealthy instances is being repaired. The MIG delays further autohealing to ensure that the group keeps running a subset of instances.
  • Detect health check configuration errors. For example, you can detect misconfigured firewall rules or an invalid application health checking endpoint if the instance reports a health state of TIMEOUT.
  • Determine the initial delay value to configure by measuring the amount of time between when the VM transitions to a RUNNING status and when the VM transitions to a HEALTHY health state. You can measure this gap by polling the list-instances method or by observing the time between the instances.insert operation and the first healthy signal received.
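
The last measurement can be sketched in Python. The sketch assumes you have already collected timestamped observations for a single VM by polling (for example, from repeated list-instances calls); the function name `startup_seconds`, the tuple layout, and the sample data are hypothetical and only illustrate the calculation.

```python
from datetime import datetime


def startup_seconds(observations):
    """Return seconds between the first RUNNING status and the first
    HEALTHY health state, or None if either was never observed.

    observations is a time-ordered list of
    (timestamp, status, health_state) tuples for one VM.
    """
    running_at = None
    for timestamp, status, health_state in observations:
        if running_at is None and status == "RUNNING":
            running_at = timestamp
        if running_at is not None and health_state == "HEALTHY":
            return (timestamp - running_at).total_seconds()
    return None


# Made-up polling results for illustration.
samples = [
    (datetime(2024, 1, 1, 12, 0, 0), "STAGING", None),
    (datetime(2024, 1, 1, 12, 0, 40), "RUNNING", "TIMEOUT"),
    (datetime(2024, 1, 1, 12, 2, 10), "RUNNING", "HEALTHY"),
]
print(startup_seconds(samples))  # 90.0
```

Because polling only samples the state at intervals, the measured gap is accurate to roughly one polling interval, so it is reasonable to add some margin when choosing the initial delay.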

Use the console, the gcloud command-line tool, or REST to view health states.

Permissions required for this task

To perform this task, you must have the following permissions:

  • compute.instanceGroupManagers.get on the MIG

Console

  1. In the Google Cloud console, go to the Instance groups page.

    Go to Instance groups

  2. Under the Name column of the list, click the name of the MIG that you want to examine. A page opens with the instance group properties and a list of VMs that are included in the group.

  3. If a VM is unhealthy, you can see its health state in the Health check status column.

gcloud

Use thelist-instancessub-command.

gcloud compute instance-groups managed list-instancesMIG_NAME    --zoneZONE

The command gives an output similar to the following. TheHEALTH_STATEfield shows each VM's health state.

NAME: igm-with-hc-fvz6ZONE: europe-west1-bSTATUS: RUNNINGHEALTH_STATE: HEALTHYACTION: NONEINSTANCE_TEMPLATE: my-templateVERSION_NAME:LAST_ERROR:NAME: igm-with-hc-gtz3ZONE: europe-west1-bSTATUS: RUNNINGHEALTH_STATE: HEALTHYACTION: NONEINSTANCE_TEMPLATE: my-templateVERSION_NAME:LAST_ERROR:

Replace the following:

  • MIG_NAME: The name of the MIG.
  • ZONE: The zone where the MIG is located. For a regional MIG, use --region REGION.

REST

For a regional MIG, construct a POST request to the listManagedInstances method:

POST https://compute.googleapis.com/compute/v1/projects/project-id/regions/region/instanceGroupManagers/MIG_NAME/listManagedInstances

For a zonal MIG, use the zonal MIG listManagedInstances method:

POST https://compute.googleapis.com/compute/v1/projects/project-id/zones/zone/instanceGroupManagers/MIG_NAME/listManagedInstances

The request returns a response similar to the following, which includes an instanceHealth field for each managed instance.

{
  "managedInstances": [
    {
      "instance": "https://www.googleapis.com/compute/v1/projects/project-id/zones/zone/instances/igm-with-hc-fvz6",
      "instanceStatus": "RUNNING",
      "currentAction": "NONE",
      "id": "6159431761228150698",
      "version": {
        "instanceTemplate": "https://www.googleapis.com/compute/v1/projects/project-id/global/instanceTemplates/my-template"
      },
      "instanceHealth": [
        {
          "healthCheck": "https://www.googleapis.com/compute/v1/projects/project-id/global/healthChecks/example-check-01",
          "detailedHealthState": "HEALTHY"
        }
      ],
      "name": "igm-with-hc-fvz6"
    },
    {
      "instance": "https://www.googleapis.com/compute/v1/projects/project-id/zones/zone/instances/igm-with-hc-gtz3",
      "instanceStatus": "RUNNING",
      "currentAction": "NONE",
      "id": "6622324799312181783",
      "version": {
        "instanceTemplate": "https://www.googleapis.com/compute/v1/projects/project-id/global/instanceTemplates/my-template"
      },
      "instanceHealth": [
        {
          "healthCheck": "https://www.googleapis.com/compute/v1/projects/project-id/global/healthChecks/example-check-01",
          "detailedHealthState": "HEALTHY"
        }
      ],
      "name": "igm-with-hc-gtz3"
    }
  ]
}
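
If you process this response programmatically, a small helper can reduce it to the health states per VM. The following Python sketch assumes the response body has already been parsed into a dictionary (for example with json.loads) and has the shape shown in the sample. The function name is hypothetical, and because instanceHealth is a list, the helper keeps every reported state instead of picking one.

```python
def health_by_instance(response):
    """Map each managed instance name to its list of detailed health states.

    `response` is the parsed JSON body of a listManagedInstances call,
    assumed to follow the shape of the sample response.
    """
    result = {}
    for managed in response.get("managedInstances", []):
        # Fall back to the last URL segment if the name field is absent.
        name = managed.get("name") or managed["instance"].rsplit("/", 1)[-1]
        result[name] = [
            entry.get("detailedHealthState", "UNKNOWN")
            for entry in managed.get("instanceHealth", [])
        ]
    return result
```

An empty list for a VM means that the response carried no instanceHealth entries for it. The sketch does not interpret that case further.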

Health states

The following VM health states are available:

  • HEALTHY: The VM is reachable, a connection to the application health checking endpoint can be established, and the response conforms to the requirements defined by the health check.
  • DRAINING: The VM is being drained. Existing connections to the VM have time to complete, but new connections are being refused.
  • UNHEALTHY: The VM is reachable, but does not conform to the requirements defined by the health check.
  • TIMEOUT: The VM is unreachable, a connection to the application health checking endpoint cannot be established, or the server on a VM does not respond within the specified timeout. For example, this may be caused by misconfigured firewall rules or an overloaded server application on a VM.
  • UNKNOWN: The health checking system is not aware of the VM or its health is not known at the moment. It can take 10 minutes for monitoring to begin on new VMs in a MIG.

New VMs return an UNHEALTHY state until they are verified by the health checking system.

Whether a VM is repaired depends on its health state:

  • If a VM has a health state of UNHEALTHY or TIMEOUT, and it has passed its initialization period, then the MIG immediately attempts to repair it.
  • If a VM has a health state of UNKNOWN, then the MIG doesn't repair it immediately. This is to prevent an unnecessary repair of a VM for which the health checking signal is temporarily unavailable.
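
The two rules above can be expressed as a small predicate. The following Python sketch is only a model of the documented behavior for reasoning about your own monitoring, not the MIG's actual implementation. The function and constant names are hypothetical, and the sketch does not model any delay that the MIG might apply to repair attempts.

```python
# Health states that, per the rules above, trigger an immediate repair attempt.
REPAIRABLE_STATES = {"UNHEALTHY", "TIMEOUT"}


def should_repair_now(health_state, past_initial_delay):
    """Return True if the documented rules call for an immediate repair attempt.

    `past_initial_delay` is a boolean that says whether the VM has passed its
    initialization period. UNKNOWN (like HEALTHY and DRAINING) never triggers
    an immediate repair in this model.
    """
    return health_state in REPAIRABLE_STATES and bool(past_initial_delay)
```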

Autohealing attempts can be delayed if:

  • A VM remains unhealthy after multiple consecutive repairs.
  • A significant overall share of unhealthy VMs exists in the group.

We want to learn about your use cases, challenges, or feedback about VM health state values. You can share your feedback with our team at mig-discuss@google.com.

Check current actions on VMs

When a MIG is in the process of creating a VM instance, the MIG sets that instance's read-only currentAction field to CREATING. If an autohealing policy is attached to the group, after the VM is created and running, the MIG sets the instance's current action to VERIFYING and the health checker begins to probe the VM's application. If the application passes this initial health check within the time that it takes for the application to start, then the VM is verified and the MIG changes the VM's currentAction field to NONE.

To check the current actions on VMs, see View current actions on VMs.

Check whether the MIG is stable

At the group level, Compute Engine populates a read-only field called status that contains an isStable flag.

If all VMs in the group are running and healthy (that is, the currentAction field for each managed instance is set to NONE), then the MIG sets the status.isStable field to true. Remember that the stability of a MIG depends on group configurations beyond the autohealing policy; for example, if your group is autoscaled, and if it is being scaled in or out, then the MIG sets the status.isStable field to false due to the autoscaler operation.
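
When a group is not stable, it can help to see which VMs still have an action in progress. The following Python sketch filters a parsed listManagedInstances response (assumed to be a dictionary with a managedInstances list whose entries carry name and currentAction fields) down to the instances whose currentAction is not NONE. The function name is hypothetical.

```python
def instances_with_pending_actions(response):
    """Return {instance name: currentAction} for every VM not at NONE.

    `response` is the parsed JSON body of a listManagedInstances call.
    """
    return {
        managed["name"]: managed["currentAction"]
        for managed in response.get("managedInstances", [])
        if managed["currentAction"] != "NONE"
    }
```

An empty result does not by itself establish that status.isStable is true, because stability also depends on other group operations such as autoscaling. Read the status.isStable field for the authoritative value.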

To check the values of your MIG's status.isStable field, see Check whether a MIG is stable.

View historical autohealing operations

You can use the gcloud CLI or REST to view past autohealing events.

gcloud

Use the gcloud compute operations list command with a filter to see only the autohealing repair events in your project.

gcloud compute operations list --filter='operationType~compute.instances.repair.*'

For more information about a specific repair operation, use the describe command. For example:

gcloud compute operations describe repair-1539070348818-577c6bd6cf650-9752b3f3-1d6945e5 --zone us-east1-b

REST

For regional MIGs, submit a GET request to the regionOperations resource and include a filter to scope the output list to compute.instances.repair.* events.

GET https://compute.googleapis.com/compute/v1/projects/project-id/regions/region/operations?filter=operationType+%3D+%22compute.instances.repair.*%22

For zonal MIGs, use the zoneOperations resource.

GET https://compute.googleapis.com/compute/v1/projects/project-id/zones/zone/operations?filter=operationType+%3D+%22compute.instances.repair.*%22

For more information about a specific repair operation, submit a GET request for that specific operation. For example:

GET https://compute.googleapis.com/compute/v1/projects/project-id/zones/zone/operations/repair-1539070348818-577c6bd6cf650-9752b3f3-1d6945e5
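
If you build these list requests in code, the filter expression has to be URL-encoded. The following Python sketch assembles the list URL with the standard library. The function name and the regional flag are assumptions for this example, and the sketch only constructs the URL; authentication and sending the request are out of scope.

```python
from urllib.parse import urlencode

BASE_URL = "https://compute.googleapis.com/compute/v1"


def repair_operations_url(project, location, regional=False):
    """Build the operations list URL filtered to autohealing repair events.

    `location` is a region name when `regional` is True, otherwise a zone name.
    """
    scope = "regions" if regional else "zones"
    # Keep "*" unescaped so the query matches the form shown in the examples.
    query = urlencode(
        {"filter": 'operationType = "compute.instances.repair.*"'},
        safe="*",
    )
    return f"{BASE_URL}/projects/{project}/{scope}/{location}/operations?{query}"
```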

What makes a good autohealing health check

Health checks used for autohealing should be conservative so they don't preemptively delete and recreate your instances. When an autohealer health check is too aggressive, the autohealer might mistake busy instances for failed instances and unnecessarily restart them, reducing availability.

  • unhealthy-threshold. Should be more than 1. Ideally, set this value to 3 or more. This protects against rare failures like network packet loss.
  • healthy-threshold. A value of 2 is sufficient for most apps.
  • timeout. Set this time value to a generous amount (at least five times the expected response time). This protects against unexpected delays like busy instances or a slow network connection.
  • check-interval. This value should be between 1 second and two times the timeout (neither too long nor too short). When a value is too long, a failed instance is not caught soon enough. When a value is too short, the instances and the network can become measurably busy, given the high number of health check probes being sent every second.
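
As a quick way to apply these guidelines, the following Python sketch checks a proposed set of values and returns a warning for each guideline that is not met. The function name and parameters are hypothetical, and the expected response time is something you measure for your own application. The sketch encodes only the rules of thumb listed above and is not a validation that Compute Engine itself performs.

```python
def review_autohealing_check(unhealthy_threshold, timeout_sec,
                             check_interval_sec, expected_response_sec):
    """Return a list of warnings for settings that look too aggressive or too lax."""
    warnings = []
    # A single failed probe should not be enough to trigger a repair.
    if unhealthy_threshold <= 1:
        warnings.append("unhealthy-threshold should be more than 1 (ideally 3 or more)")
    # The timeout should be at least five times the expected response time.
    if timeout_sec < 5 * expected_response_sec:
        warnings.append("timeout should be at least 5x the expected response time")
    # The interval should fall between 1 second and two times the timeout.
    if not (1 <= check_interval_sec <= 2 * timeout_sec):
        warnings.append("check-interval should be between 1 second and 2x the timeout")
    return warnings
```

For example, with an expected response time of 1 second, the values unhealthy-threshold 3, timeout 5 seconds, and check-interval 10 seconds produce no warnings in this sketch.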

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.