Live migration process during maintenance events

During a planned maintenance event for the underlying hardware of a virtual machine (VM) instance or bare metal instance, the host server is unavailable. To keep an instance running during a host event, Compute Engine performs a live migration of the instance to another host server in the same zone. For more information about host events, see About host events.

Live migration lets Google Cloud perform maintenance without interrupting a workload, rebooting an instance, or modifying any of the instance's properties, such as IP addresses, metadata, block storage data, application state, or network settings.

Live migration keeps instances running during the following situations:

  • Infrastructure maintenance. Infrastructure maintenance includes host hardware, network and power grids in data centers, and the host operating system (OS) and BIOS.

  • Security-related updates and system configuration changes. These include events such as installing security patches and changing the size of the host root partition for storage of the host OS image and packages.

  • Hardware failures. This includes failures in memory, CPUs, network interface cards, and disks. If the failure is detected before there is a complete server failure, then Compute Engine performs a preventative live migration of the instance to a new host server. If the hardware fails completely or otherwise prevents live migration, then the instance terminates and restarts automatically.

Compute Engine only performs a live migration of VMs that have the host maintenance policy set to migrate. For information about how to change the host maintenance policy, see Set VM host maintenance policy.
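The rules above can be condensed into a small decision function. This is a minimal sketch, not Compute Engine code. It assumes the `MIGRATE` and `TERMINATE` values of the API's `onHostMaintenance` scheduling field and a separate automatic-restart setting. `host_event_outcome` and its boolean inputs are hypothetical names used only for illustration.

```python
from enum import Enum


class OnHostMaintenance(Enum):
    """Values of the onHostMaintenance scheduling field."""
    MIGRATE = "MIGRATE"
    TERMINATE = "TERMINATE"


def host_event_outcome(policy: OnHostMaintenance,
                       live_migration_supported: bool = True,
                       host_failed_completely: bool = False,
                       automatic_restart: bool = True) -> str:
    """Return what happens to an instance during a host event.

    Simplified model of the rules on this page. The boolean inputs are
    hypothetical flags, not fields returned by any Compute Engine API.
    """
    if (policy is OnHostMaintenance.MIGRATE
            and live_migration_supported
            and not host_failed_completely):
        # Planned maintenance, or a failure detected early: move to a new host.
        return "live-migrate"
    # The policy is TERMINATE, the machine type can't migrate, or the host
    # failed outright. The instance stops and restarts only if configured to.
    return "terminate-and-restart" if automatic_restart else "terminate"
```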

Live migration process and Local SSD disks

Compute Engine can live migrate instances with Local SSD disks attached (excluding Z3 instances with more than 18 TiB of attached Titanium SSD). Compute Engine moves the VM instances along with their Local SSD data to a new machine in advance of any planned maintenance.

Caution: Instances with a large amount of Local SSD data can experience a longer period of performance degradation or Local SSD data loss during the migration to the new host server.

Limitations

Live migration is not supported for the following VM types:

  • H4D instances with Local SSD.
  • Bare metal instances. Instances created with a bare metal machine type don't support live migration. The maintenance behavior for these instances is set to TERMINATE and RESTART.
  • Most Confidential VM instances. Live migration for Confidential VM instances is only supported on N2D machine types with AMD EPYC Milan CPU platforms running AMD SEV. All other Confidential VM instances don't support live migration, and must be set to stop and optionally restart during a host maintenance event. See Live migration for more details.
  • VMs with GPUs attached. VM instances with GPUs attached must be set to stop and optionally restart. Compute Engine offers a notice before a VM instance with a GPU attached is stopped, depending on the GPU type:

    • For most GPUs, Compute Engine provides a 60-minute notice.
    • For A4X, A4, and A3 Ultra instances, Compute Engine provides a 10-minute notice.

    To learn more about these maintenance event notices, read Query metadata server for maintenance event notices.

    To learn more about handling host maintenance with GPUs, read Handling host maintenance in the GPUs documentation.

  • Cloud TPUs. Cloud TPUs don't support live migration.
  • Storage-optimized VMs. Z3 VMs with more than 18 TiB of attached Titanium SSD don't support live migration. The maintenance behavior for these VMs is set to TERMINATE and RESTART. Compute Engine preserves the data on Titanium SSD during the maintenance event, as described in Disk persistence following instance termination.
  • Compute-optimized VMs. H4D VMs don't support live migration, because live migration is not supported for RDMA-enabled VMs. For HPC applications, live migrating an instance would severely impact application performance, and it's better for applications to restart from a checkpoint. The maintenance behavior for these VMs is set to TERMINATE and RESTART. Compute Engine preserves the data on Titanium SSD during the maintenance event, as described in Disk persistence following instance termination.
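The limitations above can be encoded as a quick eligibility check. This is a minimal sketch under stated assumptions. `InstanceConfig`, `supports_live_migration`, and `gpu_stop_notice_minutes` are hypothetical helpers, and the string values (such as "AMD Milan" or "A3 Ultra") are illustrative labels, not API enum values. The check follows the H4D rule in the last bullet, which covers all H4D VMs.

```python
from dataclasses import dataclass
from typing import Optional

# Machine series that get the shorter 10-minute notice before a GPU
# instance is stopped; other GPU instances get 60 minutes.
SHORT_NOTICE_GPU_SERIES = {"A4X", "A4", "A3 Ultra"}


@dataclass
class InstanceConfig:
    """Hypothetical summary of the properties that affect live migration."""
    machine_series: str                 # for example "N2D", "Z3", "H4D"
    bare_metal: bool = False
    confidential: bool = False
    cpu_platform: str = ""              # for example "AMD Milan"
    confidential_technology: str = ""   # for example "SEV"
    gpu_series: Optional[str] = None    # None when no GPUs are attached
    has_tpu: bool = False
    titanium_ssd_tib: float = 0.0


def supports_live_migration(cfg: InstanceConfig) -> bool:
    """Apply the limitations listed above."""
    if cfg.bare_metal or cfg.has_tpu or cfg.gpu_series is not None:
        return False
    if cfg.machine_series == "H4D":  # RDMA-enabled HPC VMs
        return False
    if cfg.machine_series == "Z3" and cfg.titanium_ssd_tib > 18:
        return False
    if cfg.confidential:
        # Only N2D on AMD EPYC Milan with AMD SEV can live migrate.
        return (cfg.machine_series == "N2D"
                and cfg.cpu_platform == "AMD Milan"
                and cfg.confidential_technology == "SEV")
    return True


def gpu_stop_notice_minutes(gpu_series: str) -> int:
    """Notice given before a GPU instance is stopped for maintenance."""
    return 10 if gpu_series in SHORT_NOTICE_GPU_SERIES else 60
```

For example, `supports_live_migration(InstanceConfig("Z3", titanium_ssd_tib=36))` returns `False`. The same Z3 configuration with 18 TiB or less returns `True`.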

How does the live migration process work?

When a VM is scheduled to live migrate, Compute Engine provides a notification so that you can prepare your workloads and applications for this live migration disruption. Google Cloud keeps the disruption to a minimum: it is typically much less than 1 second. If a VM is not set to live migrate, Compute Engine terminates the VM during host maintenance. VMs that are set to terminate during a host event stop and (optionally) restart.
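A guest can watch for the notification by querying the metadata server, as described in Query metadata server for maintenance event notices. This sketch assumes the following details, which you should verify against that page before relying on them:

- the `instance/maintenance-event` metadata key
- the `Metadata-Flavor: Google` header
- the `wait_for_change` query parameter
- the values `NONE`, `MIGRATE_ON_HOST_MAINTENANCE`, and `TERMINATE_ON_HOST_MAINTENANCE`

The action strings returned by the hypothetical `classify_event` helper are illustrative only.

```python
import urllib.request

# Assumed metadata endpoint; wait_for_change blocks until the value changes.
MAINTENANCE_EVENT_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/maintenance-event?wait_for_change=true"
)


def classify_event(value: str) -> str:
    """Map a raw maintenance-event value to a hypothetical workload action."""
    value = value.strip()
    if value == "MIGRATE_ON_HOST_MAINTENANCE":
        return "prepare-for-live-migration"
    if value == "TERMINATE_ON_HOST_MAINTENANCE":
        return "checkpoint-and-shut-down"
    return "no-action"  # "NONE" or an unrecognized value


def wait_for_maintenance_event(timeout_s: float = 3600) -> str:
    """Block until the maintenance-event value changes (works only on a VM)."""
    req = urllib.request.Request(
        MAINTENANCE_EVENT_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read().decode()
```

On a VM, a loop such as `while True: print(classify_event(wait_for_maintenance_event()))` reacts to each change. Outside Google Cloud, the metadata hostname doesn't resolve and the request fails.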

When Google Cloud migrates a running VM from one host to another, it moves the complete state of the VM from the source to the destination in a way that is transparent to the guest OS and anything communicating with it. There are many components involved in making this work seamlessly, but the high-level steps are shown in the following illustration:

Figure: Migrating a VM and each of its resources to a new host system without requiring the guest operating system to restart.
Live migration components

The process begins with a notification that a VM needs to be moved from its current host machine. The notification might be triggered by a file change indicating that a new BIOS version is available, a hardware operation scheduling maintenance, or an automatic signal from an impending hardware failure.

Google Cloud's cluster management software constantly watches for these events and schedules them based on policies that control the data centers, such as capacity utilization rates and the number of VMs that a single customer can migrate at once.
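The scheduling policies mentioned above can be illustrated with a toy scheduler. This is a minimal sketch built on invented assumptions, not Google's cluster management logic:

- a single zone-wide utilization threshold
- a fixed per-customer cap on concurrent migrations
- predicted hardware failures handled first

`MigrationRequest`, `schedule`, and the trigger names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class MigrationRequest:
    vm: str
    customer: str
    trigger: str  # e.g. "bios-update", "scheduled-maintenance", "predicted-failure"


def schedule(pending: Sequence[MigrationRequest],
             in_flight: Dict[str, int],
             max_per_customer: int,
             utilization: float,
             max_utilization: float) -> List[str]:
    """Return the VMs whose migrations may start now.

    Hypothetical policy: start nothing while the zone is too full to host
    target VMs, handle predicted failures first, and cap the number of
    concurrent migrations per customer.
    """
    if utilization > max_utilization:
        return []
    counts = Counter(in_flight)
    started = []
    # sorted() is stable, so requests keep their order within each group.
    for req in sorted(pending, key=lambda r: r.trigger != "predicted-failure"):
        if counts[req.customer] < max_per_customer:
            counts[req.customer] += 1
            started.append(req.vm)
    return started
```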

After a VM is selected for migration, Google Cloud provides a notification to the guest that a migration is happening soon. After a waiting period, a target host is selected and the host is asked to set up a new, empty "target" VM to receive the migrating "source" VM. Authentication is used to establish a connection between the source and the target.

There are three stages involved in the VM's migration:

  1. Source brownout. The VM is still executing on the source while most state is sent from the source to the target. For example, Google Cloud copies all the guest memory to the target while tracking the pages that have been changed on the source. The time spent in source brownout is a function of the size of the guest memory and the rate at which pages are being changed.

  2. Blackout. For a very brief moment, the VM is not running anywhere: the source VM is paused and all the remaining state required to begin running the VM on the target is sent. The VM enters the blackout stage when sending state changes during the source brownout stage reaches a point of diminishing returns. An algorithm balances the number of bytes of memory being sent against the rate at which the guest VM is making changes.

    During blackout events, the system clock appears to jump forward by up to 5 seconds. If a blackout event exceeds 5 seconds, Google Cloud stops and synchronizes the clock using a daemon that is included as part of the VM guest packages.

  3. Target brownout. The VM executes on the target. The source VM is still present and might provide support for the target VM. For example, until the network fabric has caught up with the new location of the target VM, the source VM forwards packets to and from the target VM.
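The interaction between source brownout and blackout is the classic iterative pre-copy pattern. The following toy model illustrates it. The stopping rule (stop iterating when a round shrinks the backlog by less than half) is a stand-in assumption, not the actual Google Cloud algorithm. All names and parameters are hypothetical, and the target brownout stage isn't modeled.

```python
def simulate_precopy(memory_pages: int,
                     bandwidth_pps: float,
                     dirty_pps: float,
                     stop_pages: int = 1024,
                     max_ratio: float = 0.5,
                     max_rounds: int = 30):
    """Toy model of source brownout (iterative pre-copy) and blackout.

    Returns (brownout_seconds, blackout_seconds, rounds). Each round resends
    the pages dirtied while the previous round was being sent.
    """
    to_send = float(memory_pages)  # round 1 copies all guest memory
    brownout = 0.0
    rounds = 0
    while rounds < max_rounds and to_send > stop_pages:
        duration = to_send / bandwidth_pps
        brownout += duration
        rounds += 1
        # Pages the guest changed while this round was in flight.
        dirtied = min(float(memory_pages), dirty_pps * duration)
        shrank_enough = dirtied < to_send * max_ratio
        to_send = dirtied
        if not shrank_enough:
            break  # diminishing returns: stop iterating, enter blackout
    # Blackout: the VM is paused while the remaining dirty pages are sent.
    blackout = to_send / bandwidth_pps
    return brownout, blackout, rounds
```

Consider 1,000,000 pages, a bandwidth of 100,000 pages per second, and a dirty rate of 10,000 pages per second. The model converges in three rounds (about 11.1 s of brownout and 0.01 s of blackout). At 80,000 dirtied pages per second, it gives up after one round and leaves 8 s of blackout. This shows why both memory size and dirty rate matter. The model doesn't capture any additional measures the real algorithm may use.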

Finally, the migration is complete and the system deletes the source VM. You can see that the migration took place in the Cloud Logging logs for your VM.
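The whole sequence, from the notification to the deletion of the source VM, can be summarized as a simple state model. This is an illustrative sketch only: `Stage`, `RUNS_ON`, and the helper functions are hypothetical names that restate the text, not identifiers from any Google Cloud API.

```python
from enum import Enum
from typing import Dict, Iterator, Optional


class Stage(Enum):
    NOTIFIED = 1         # guest told that a migration is happening soon
    TARGET_READY = 2     # empty target VM set up; authenticated connection open
    SOURCE_BROWNOUT = 3  # VM runs on the source while most state is copied
    BLACKOUT = 4         # VM paused; remaining state sent to the target
    TARGET_BROWNOUT = 5  # VM runs on the target; source forwards packets
    COMPLETE = 6         # source VM deleted


# Where the guest is executing in each stage (None: not running anywhere).
RUNS_ON: Dict[Stage, Optional[str]] = {
    Stage.NOTIFIED: "source",
    Stage.TARGET_READY: "source",
    Stage.SOURCE_BROWNOUT: "source",
    Stage.BLACKOUT: None,
    Stage.TARGET_BROWNOUT: "target",
    Stage.COMPLETE: "target",
}


def stages() -> Iterator[Stage]:
    """Yield the stages in order (Enum iteration follows definition order)."""
    yield from Stage


def source_vm_exists(stage: Stage) -> bool:
    """The source VM is kept until the migration is complete."""
    return stage is not Stage.COMPLETE
```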

Note: During live migration, VMs might experience a short-term decrease in disk, CPU, memory, and network performance.

Live migration of sole-tenant VMs

As your workload runs, you might want to move VMs to a different sole-tenant node or node group. If you move a VM to a node group, Compute Engine determines which node to place it on. For information about sole-tenancy, see Sole-tenancy overview.
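This page doesn't describe how Compute Engine chooses a node within a node group. Purely as an illustration of the idea, the sketch below uses a best-fit heuristic: it picks the node that would have the least spare capacity left. This heuristic is an assumption, not the documented placement algorithm, and `Node` and `pick_node` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    name: str
    free_vcpus: int
    free_memory_gb: int


def pick_node(nodes: Sequence[Node], vcpus: int, memory_gb: int) -> Optional[str]:
    """Best-fit placement: the fitting node with the least capacity left over."""
    fits = [n for n in nodes
            if n.free_vcpus >= vcpus and n.free_memory_gb >= memory_gb]
    if not fits:
        return None  # no node in the group can host the VM
    best = min(fits, key=lambda n: (n.free_vcpus - vcpus,
                                    n.free_memory_gb - memory_gb))
    return best.name
```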

To move sole-tenant VMs to a different node or node group, you can manually initiate a live migration. You can also manually initiate a live migration to move a VM on a multi-tenant host into a sole-tenant node. For more information, see Manually live migrate VMs.


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.