About host events

During the lifespan of a virtual machine (VM) instance or bare metal instance, the host machine that your instance runs on can experience multiple host events. A host event can include the regular maintenance of Compute Engine infrastructure, or, in rare cases, a host error. You can choose how your VM and bare metal instances respond during or after a host event by configuring the host maintenance policy.

By default, most instances are set to live migrate during host events. For all machine series except Z3, you can override this behavior and explicitly set the instances to terminate and optionally restart. Some machine types don't support live migration, such as H4D instances, bare metal instances, instances with attached GPUs, or Z3 instances with more than 18 TiB of attached Titanium SSD. These instances terminate during host events. For more information, see Maintenance and restart behaviors.

Types of host events

There are two types of host events, which are described in more detail in the following sections:

Maintenance events
Host errors

If your instance becomes unresponsive, then this can also trigger a restart or termination of the instance.

Maintenance events

A maintenance event occurs when Compute Engine has to perform a maintenance or repair activity that requires VMs to be moved out of the host server. If you enable the live migration host maintenance policy for a supported instance type, then Compute Engine moves the instance to a new host, and there is minimal disruption to your application.

Compute Engine also applies some lightweight hypervisor and network upgrades in the background nondisruptively by retaining the instance on the same host.

Instance behavior during a maintenance event can vary depending on the tenancy of the instance and the machine type. You can find information about the maintenance behavior for each machine type on the respective machine family page, as follows:

C series:
- C2 and C2D: Compute-optimized machine family
- All other C series: General-purpose machine family
E, N, and T series: General-purpose machine family
H series: Compute-optimized machine family
M and X series: Memory-optimized machine family
Z series: Storage-optimized machine family

For information about the maintenance policies for instances with attached GPUs, see Handle GPU host maintenance events.

For sole-tenant VMs, the approximate frequency of planned host maintenance events is every 4 to 6 weeks. Whether or not live migration is supported depends on the host maintenance policy for the sole-tenant VM.

Host errors

A host error (compute.instances.hostError) means that there was a hardware or software issue on the physical machine or the data center infrastructure hosting your compute instance that caused your instance to crash. A host error involving a total hardware failure or other hardware issues might prevent the live migration of your instance. If your instance is set to automatically restart, which is the default setting, Compute Engine restarts your instance, typically within three minutes from the time the error was detected. Depending on the issue, the restart might take up to 5.5 minutes.

Occasionally, a compute instance might become unresponsive before a host error is signaled. You can reduce the amount of time Compute Engine waits to restart or terminate the instance by setting the host error recovery timeout. For more information, see Set availability policies.

Physical hardware and software failures can happen occasionally but are rare occurrences. To protect your applications and services from these potentially disruptive system events, review the following resources:

Google also offers managed services such as App Engine and the App Engine flexible environment.

Host maintenance policy overview

An instance's host maintenance policy determines how it behaves during the following host events:

Maintenance event
Host error event or instance not responding

You can configure instances to continue running during host maintenance while Compute Engine live migrates them to another host, or you can choose to stop your instances instead.

You can change an instance's host maintenance policy by configuring the following settings:

Maintenance behavior: whether the instance is live migrated or stopped when there is a maintenance event.
Restart behavior: whether Compute Engine restarts or terminates the instance if the instance crashes, experiences a host error, or becomes unresponsive.
Host error detection time: the maximum amount of time that Compute Engine waits to restart or terminate an instance after detecting that the instance is unresponsive.
Local SSD recovery time: the maximum amount of time that Compute Engine spends recovering the data on Local SSD disks after detecting a host error. The Local SSD data is lost if the specified time elapses without a successful recovery.

You can update an instance's host maintenance policy at any time to control how you want your instances to behave.
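
As a rough illustration of how these four settings fit together, the following Python sketch collects them in one place and renders them as a dictionary. The field names automaticRestart and localSsdRecoveryTimeout appear on this page. The HostMaintenancePolicy class, the onHostMaintenance and hostErrorTimeoutSeconds keys, and the {"seconds": ...} encoding of the timeout are assumptions made for this example, so check them against the API reference before relying on them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostMaintenancePolicy:
    """Hypothetical container for the settings listed above."""

    # Maintenance behavior: "MIGRATE" (live migrate) or "TERMINATE" (stop).
    on_host_maintenance: str = "MIGRATE"
    # Restart behavior: restart after a crash, host error, or unresponsiveness.
    automatic_restart: bool = True
    # Host error detection time, in seconds (None keeps the service default).
    host_error_timeout_seconds: Optional[int] = None
    # Local SSD recovery time, in whole hours (None keeps the service default).
    local_ssd_recovery_timeout_hours: Optional[int] = None

    def to_scheduling_body(self) -> dict:
        """Render the settings as a dict. The key names are assumptions."""
        if self.on_host_maintenance not in ("MIGRATE", "TERMINATE"):
            raise ValueError("on_host_maintenance must be MIGRATE or TERMINATE")
        body = {
            "onHostMaintenance": self.on_host_maintenance,
            "automaticRestart": self.automatic_restart,
        }
        if self.host_error_timeout_seconds is not None:
            body["hostErrorTimeoutSeconds"] = self.host_error_timeout_seconds
        if self.local_ssd_recovery_timeout_hours is not None:
            # Assumed encoding: a duration expressed in seconds.
            body["localSsdRecoveryTimeout"] = {
                "seconds": self.local_ssd_recovery_timeout_hours * 3600
            }
        return body


policy = HostMaintenancePolicy(
    on_host_maintenance="TERMINATE",
    local_ssd_recovery_timeout_hours=6,
)
print(policy.to_scheduling_body())
```

Settings left as None are omitted from the output, which mirrors the idea that unspecified values keep their defaults.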

Maintenance and restart behaviors

When a host event occurs, the compute instance can either use live migration, or the instance can be terminated. If an instance is terminated, then you can choose to restart the instance yourself or have Compute Engine automatically restart it.

The following machine series might not support live migration, and instead require termination during host events:

Z3 (including Z3-metal), X4, and H4D instances terminate and restart in place.
Bare metal instances terminate and restart, meaning they might restart on a different host. For more details, see the "Maintenance experience" documentation for the machine series. For example, for C3 bare metal machine types, see Maintenance experience for C3 instances.
Confidential VM instances, except for N2D machine types with AMD EPYC Milan CPU platforms running AMD SEV.
Instances with GPUs
Instances with TPUs

Live migrate

By default, most instance types are set to live migrate, excluding the instance types mentioned in the previous section.

During live migration, Compute Engine automatically migrates your instance away from an infrastructure maintenance event, and your instance remains running during the migration. Your instance might experience a short period of decreased performance, but in general, most instances shouldn't perform noticeably differently. This is ideal for instances that require constant uptime and can tolerate a short period of decreased performance.

When Compute Engine migrates your instance, it reports a system event that is published to the list of zone operations and to the System Events logs. You can review this event by viewing the Compute Engine operations for a specific zone. Live migration events have the following operation type:

compute.instances.migrateOnHostMaintenance

Terminate and restart

If you don't want your instance to live migrate, or if your instance type doesn't support live migration, then you can instead choose to allow Google Cloud to stop the instance when a host event occurs. With this configuration, if a host event occurs, then Compute Engine sends a soft power-off signal to shut down the instance. It then waits 60 seconds for the instance to shut down cleanly, and sets the instance status to TERMINATED. If the instance doesn't shut down cleanly in 60 seconds, then it is forcibly terminated.

This option is ideal if your instances demand constant, maximum performance, and if your overall application is built to handle instance failures or reboots.
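
Because the instance has only 60 seconds to shut down cleanly, an application might want to order its cleanup work so that the most important steps fit inside that window. The following Python sketch shows one possible way to budget cleanup steps. The run_cleanup function, the step names, the per-step time estimates, and the 5-second safety margin are all hypothetical. Wiring the function to the guest operating system's shutdown hook is not shown.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Clean-shutdown window described in the text above.
SHUTDOWN_GRACE_SECONDS = 60.0

# (name, estimated duration in seconds, action to run)
CleanupStep = Tuple[str, float, Callable[[], None]]


def run_cleanup(
    steps: Sequence[CleanupStep],
    budget_seconds: float = SHUTDOWN_GRACE_SECONDS,
    safety_margin: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Run steps in order and stop before one that would overrun the budget.

    Returns the names of the steps that were run.
    """
    deadline = clock() + budget_seconds - safety_margin
    completed: List[str] = []
    for name, estimated_seconds, action in steps:
        if clock() + estimated_seconds > deadline:
            break  # Not enough time left. Skip this and all later steps.
        action()
        completed.append(name)
    return completed


# Hypothetical usage with trivial steps, most important first.
done = run_cleanup([
    ("flush-buffers", 1.0, lambda: None),
    ("write-checkpoint", 10.0, lambda: None),
])
print(done)
```

Ordering the steps by importance means that whatever is skipped when time runs short is the least critical work.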

When Compute Engine stops an instance because of a host event, it reports a system event that is published to the list of zone operations and to the System Events logs. You can review this event by viewing the Compute Engine operations for a specific zone. Instance termination events have the following operation type:

compute.instances.terminateOnHostMaintenance

Automatic restart

If your instance is configured to stop when there is a maintenance event, or if your instance crashes because of an underlying hardware issue, then Compute Engine can automatically restart the instance. The instance is either restarted on the same host server, or moved to another server in the same zone that isn't participating in the maintenance event.

By default, Compute Engine tries to recover instances with attached Local SSD disks for one hour. If the time limit is reached, then Compute Engine attempts to restart the instance on a different host server in the same zone. Z3, X4, and H4D instances have different default wait times. These instance types restart on the same host server after instance termination.

To configure automatic restart, set the host maintenance policy field automaticRestart to true. This setting does not apply if the instance is taken offline due to a zonal outage or through manual operation, such as calling sudo shutdown within the guest OS.

When Compute Engine automatically restarts your instance, it reports a system event that is published to the list of zone operations. You can review this event by viewing the Compute Engine operations for a specific zone. Automatic restart events have the following operation type:

compute.instances.automaticRestart
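
If you export a zone's operations for review, a small script can pick out the host-event entries by operation type. The following Python sketch counts them using the operation types named on this page. The summarize_host_events function and the sample records are hypothetical, and the sketch assumes that each record is a dictionary with an operationType key.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Operation types named on this page, mapped to short labels.
HOST_EVENT_OPERATION_TYPES = {
    "compute.instances.migrateOnHostMaintenance": "live migration",
    "compute.instances.terminateOnHostMaintenance": "termination on host maintenance",
    "compute.instances.automaticRestart": "automatic restart",
    "compute.instances.hostError": "host error",
}


def summarize_host_events(operations: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count host-event operations by label and ignore everything else."""
    counts: Counter = Counter()
    for operation in operations:
        label = HOST_EVENT_OPERATION_TYPES.get(operation.get("operationType", ""))
        if label is not None:
            counts[label] += 1
    return dict(counts)


# Hypothetical sample records. Real records carry many more fields.
sample_operations = [
    {"operationType": "compute.instances.migrateOnHostMaintenance"},
    {"operationType": "compute.instances.automaticRestart"},
    {"operationType": "compute.instances.migrateOnHostMaintenance"},
    {"operationType": "insert"},
]
print(summarize_host_events(sample_operations))
```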

Disk persistence following instance termination

Because Persistent Disk and Hyperdisk are network-attached storage, when your instance restarts, Compute Engine reattaches the boot disk and any secondary disks to the instance. The data on those disks persists through live migration and instance restarts.

Compute Engine preserves the data on Local SSD disks following a host event when possible. However, Compute Engine doesn't guarantee Local SSD data persistence.

Local SSD disks are preserved in the following scenarios:
- You configure your instance for live migration and the instance goes through a host maintenance event.
- A host error occurs and Compute Engine reconnects the instance to the Local SSD disks within the timeout limit.
- A compute instance with attached Local SSD disks that supports only termination and automatic restart undergoes a maintenance event. The instance restarts in place, preserving the Local SSD data, instead of migrating to a new host.

Local SSD disks are not preserved in the following scenarios:
- You shut down the guest operating system and force the instance to stop.
- You configure the instance to stop on host maintenance events and the instance goes through a host maintenance event.
- A host error occurs and Compute Engine can't reconnect the disks to the instance before the timeout expires. In this case, the instance is restarted without recovering the Local SSD disks. When the instance restarts, Compute Engine attaches blank Local SSD disks to the restarted instance. You must format and mount these disks before the instance can use them. The data on the original Local SSD disks is unrecoverable.
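
The scenarios in the two preceding lists can be condensed into a small decision function. The following Python sketch is only a reading aid for those lists: the Event enum, the function name, and its parameters are hypothetical, and a True result means that preservation is expected, not guaranteed.

```python
from enum import Enum


class Event(Enum):
    MAINTENANCE = "maintenance"
    HOST_ERROR = "host_error"
    GUEST_SHUTDOWN_FORCED_STOP = "guest_shutdown_forced_stop"


def local_ssd_data_expected(
    event: Event,
    *,
    live_migrates: bool = False,
    restarts_in_place: bool = False,
    reconnected_before_timeout: bool = False,
) -> bool:
    """Return True if the lists above say Local SSD data is preserved.

    live_migrates: the instance is configured for live migration.
    restarts_in_place: the instance type supports only termination and
        automatic restart, and restarts on the same host.
    reconnected_before_timeout: after a host error, the disks were
        reconnected within the recovery timeout.
    """
    if event is Event.MAINTENANCE:
        return live_migrates or restarts_in_place
    if event is Event.HOST_ERROR:
        return reconnected_before_timeout
    # Guest shutdown followed by a forced stop discards the data.
    return False


print(local_ssd_data_expected(Event.MAINTENANCE, live_migrates=True))
print(local_ssd_data_expected(Event.HOST_ERROR, reconnected_before_timeout=False))
```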

Google Cloud uses a best-effort approach to keep your Local SSD data intact. However, there are cases where data can't be recovered, such as a timeout. For more information about when Local SSD disks are preserved, see Local SSD data persistence.

Local SSD recovery timeout

When a host error occurs, Compute Engine tries to recover any Local SSD disks attached to the instance. You can control how much time Compute Engine spends trying to recover the data with the host policy localSsdRecoveryTimeout setting.

By default, Compute Engine spends 1 hour recovering the data, but valid values for this setting are between 0 and 168, in increments of 1 hour. For Z3 instances, the default value is 6, which means Z3 instances will try to recover the Local SSD data for 6 hours before reaching the timeout limit.

If you set the Local SSD recovery timeout to 0, then Compute Engine doesn't attempt to recover any attached Local SSD disks. The instance is restarted as soon as possible and the Local SSD data is unrecoverable. Use this configuration if resuming the workload is more important than recovering the Local SSD data.

If the recovery timeout is not set to 0, but the time limit is reached before the Local SSD data is recovered, then Compute Engine restarts the instance without the Local SSD disk. Compute Engine attaches new, blank Local SSD disks to the restarted instance. You must format and mount these disks before the instance can use them.

The instance is in a REPAIRING state while Compute Engine attempts to recover the Local SSD disks. The instance and Local SSD disks are unavailable during this time.

If you set the Local SSD recovery timeout to the maximum value of 168, then the instance remains in the REPAIRING state for up to 7 days while Compute Engine attempts to recover the Local SSD disks.

Note: You're not charged for the instance while it is in the REPAIRING state, as described in VM instance lifecycle.
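
The limits described in this section can be checked before a value is submitted. The following Python sketch validates a timeout given in hours and converts it to seconds, using the defaults and range stated above (1 hour by default, 6 hours for Z3, 0 to 168 in whole hours). The recovery_timeout_seconds function and its machine_series parameter are hypothetical names used only for this example.

```python
from typing import Optional

DEFAULT_RECOVERY_HOURS = 1      # default stated above
Z3_DEFAULT_RECOVERY_HOURS = 6   # default for Z3 instances stated above
MAX_RECOVERY_HOURS = 168        # 7 days


def recovery_timeout_seconds(
    hours: Optional[int] = None, machine_series: str = ""
) -> int:
    """Validate a Local SSD recovery timeout and return it in seconds.

    If hours is None, the default for the machine series is used.
    """
    if hours is None:
        is_z3 = machine_series.lower() == "z3"
        hours = Z3_DEFAULT_RECOVERY_HOURS if is_z3 else DEFAULT_RECOVERY_HOURS
    # Only whole hours are valid. bool is excluded because it subclasses int.
    if isinstance(hours, bool) or not isinstance(hours, int):
        raise ValueError("timeout must be a whole number of hours")
    if not 0 <= hours <= MAX_RECOVERY_HOURS:
        raise ValueError(f"timeout must be between 0 and {MAX_RECOVERY_HOURS} hours")
    return hours * 3600


print(recovery_timeout_seconds())                     # default
print(recovery_timeout_seconds(machine_series="Z3"))  # Z3 default
print(recovery_timeout_seconds(0))                    # skip recovery
```

A value of 0 is accepted on purpose: it represents the configuration in which resuming the workload matters more than recovering the data.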

Stop Local SSD disk recovery

You can interrupt the Local SSD disk recovery process before Compute Engine reaches the recovery timeout limit. To do so, use the gcloud compute instances stop command with the --discard-local-ssd=True flag.

This command stops the recovery process, stops the compute instance, and discards the Local SSD data. You can then restart the instance. See Stop an instance with Local SSD for more information.
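
If you drive the gcloud CLI from a script, the stop command described above might be assembled as follows. This Python sketch only builds the argument list and doesn't run anything. The instance name example-instance and the zone us-central1-a are placeholders, and the sketch assumes that the gcloud CLI is installed and authenticated when you run the command yourself.

```python
from typing import List


def build_stop_command(instance: str, zone: str) -> List[str]:
    """Build the gcloud arguments that stop an instance and discard Local SSD data."""
    if not instance or not zone:
        raise ValueError("instance and zone are required")
    return [
        "gcloud", "compute", "instances", "stop", instance,
        f"--zone={zone}",
        "--discard-local-ssd=True",
    ]


# Placeholder values. Replace them with your own instance name and zone.
command = build_stop_command("example-instance", "us-central1-a")
print(" ".join(command))

# To run the command, you could pass the list to subprocess.run, for example:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if an instance name ever contains unexpected characters.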

Note: You can't preserve Local SSD data on suspend with Titanium SSD disks. For Titanium instances, the only supported option is --discard-local-ssd=True.

To set the Local SSD recovery timeout, see Set instance host maintenance policy.

Maintenance scheduling

Google Cloud provides features that allow tighter control around maintenance. By using certain machine families, you can specify maintenance preferences and get notifications of upcoming maintenance events through Cloud Logging, the instance's metadata server, the gcloud CLI compute instances describe command, or the REST instances.describe method. Upon receipt of a notification, you have a period of time in which you can start the scheduled maintenance at a time you choose. If you don't trigger the scheduled maintenance, then the maintenance event occurs at the end of the notification time period, which is the scheduled time listed in the notification.

You can use these features in combination with your host maintenance policy to customize a maintenance schedule that fits your workload.

What's next

Learn more about live migration.
Learn more about setting instance host maintenance policy.
Learn more about getting live migration notices.
Learn more about simulating host maintenance.
Learn more about handling GPU host maintenance events.
Learn more about manually live migrating sole-tenant VMs.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.
