Manage failures for regional disks

Regional Persistent Disk and Hyperdisk Balanced High Availability are storage options that provide synchronous replication of data between two zones in a region. You can use Regional Persistent Disk or Hyperdisk Balanced High Availability as a building block when you implement high availability (HA) services in Compute Engine.

This document describes the scenarios that can disrupt your regional disks and explains how you can manage each scenario.

Before you begin

Required roles

To get the permissions that you need to migrate regional disk data using a replica recovery checkpoint, ask your administrator to grant you the following IAM roles:

  • To migrate regional disk data using a replica recovery checkpoint: Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) on the project
  • To view regional disk metrics (one of the following):

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to migrate regional disk data using a replica recovery checkpoint. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to migrate regional disk data using a replica recovery checkpoint:

  • To create a standard snapshot from the replica recovery checkpoint:
    • compute.snapshots.create on the project
    • compute.disks.createSnapshot on the disk
  • To create a new regional disk from the standard snapshot: compute.disks.create on the project where you want to create the new disk
  • To migrate VMs to the new disk:
    • compute.instances.attachDisk on the VM instance
    • compute.disks.use permission on the newly created disk

You might also be able to get these permissions with custom roles or other predefined roles.

Limitations

You can't use force-attach operations on disks that are in multi-writer mode.

Failure scenarios

With regional disks, when the device is fully replicated, data is automatically replicated to two zones in a region. A write is acknowledged back to a compute instance when it is durably persisted in both replicas.

If replication to one zone fails or is very slow for a sustained period, the disk replication status switches to degraded. In this mode, a write is acknowledged after it is durably persisted in one replica.

If and when Compute Engine detects that replication can be resumed, data that was written to one replica after the other replica entered the degraded state is synced to both zones, and the disk returns to a fully replicated state. This transition is fully automated.

RPO and RTO are undefined while a device is in a degraded state. To minimize data and availability loss in the event of a failure of a disk operating in a degraded state, we recommend that you back up your regional disks regularly using standard snapshots. You can recover a disk by restoring the snapshot.
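As an illustration of that recommendation, the following sketch sets up a daily snapshot schedule for a regional disk with the gcloud CLI. The policy name, region, disk name, and schedule values are hypothetical, and the commands are printed rather than executed so that you can review them first.

```shell
# Hypothetical policy, region, and disk names; commands are printed, not executed.
POLICY="daily-regional-backup"
REGION="us-central1"
DISK="my-regional-disk"

# Create a schedule that takes a standard snapshot daily and keeps each for 14 days.
echo gcloud compute resource-policies create snapshot-schedule "$POLICY" \
    --region="$REGION" \
    --daily-schedule \
    --start-time=04:00 \
    --max-retention-days=14

# Attach the schedule to the regional disk.
echo gcloud compute disks add-resource-policies "$DISK" \
    --region="$REGION" \
    --resource-policies="$POLICY"
```

Remove the leading echo on each command to run it against your project.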

Zonal failures

A replicated disk, or regional disk, is synchronously replicated to disk replicas in the primary and secondary zones. Zonal failures happen when a zonal replica becomes unavailable. Zonal failures can happen in either the primary or secondary zone due to one of the following reasons:

  • There is a zonal outage.
  • The replica experiences excessive slowness in write operations.

The following table provides the various zonal failure scenarios that you might encounter for regional disks and the recommended action for each scenario. In each of these scenarios, it is assumed that your primary zonal replica is healthy and synced during the initial state.

Initial state of the disk | Failure in | New state of the disk | Consequences of failure | Action to take

Primary replica: Synced

Secondary replica: Synced

Disk status: Fully replicated

Disk attached in: primary zone

Primary zone

Primary replica: Out of sync or unavailable

Secondary replica: Synced

Disk status: Degraded

Disk attached in: primary zone

  • The replica in the secondary zone remains healthy and has the latest disk data.
  • The replica in the primary zone is unhealthy and is not guaranteed to have all the disk data.
Fail over the disk by force-attaching it to a VM in the healthy secondary zone.

Primary replica: Synced

Secondary replica: Synced

Disk status: Fully replicated

Disk attached in: primary zone

Secondary zone

Primary replica: Synced

Secondary replica: Out of sync or unavailable

Disk status: Degraded

Disk attached in: primary zone

  • The replica in the primary zone remains healthy and has the latest disk data.
  • The replica in the secondary zone is unhealthy and is not guaranteed to have all the disk data.
No action needed. Compute Engine brings the unhealthy replica in the secondary zone back into sync after it is available again.

Primary replica: Synced

Secondary replica: Out of sync and unavailable

Disk status: Degraded

Disk attached in: primary zone

Primary zone

Primary replica: Synced but unavailable

Secondary replica: Out of sync

Disk status: Unavailable

Disk attached in: primary zone

  • Both zonal replicas are unavailable and cannot serve traffic. The disk becomes unavailable.
  • If the zonal outage or replica failure is temporary, then no data is lost.
  • If the zonal outage or replica failure is permanent, then any data written to the healthy replica while the disk was degraded is permanently lost.
Google recommends that you use an existing standard snapshot and create a new disk to recover your data. As a best practice, back up the regional disks regularly using standard snapshots.

Primary replica: Synced

Secondary replica: Catching up but available

Disk status: Catching up

Disk attached in: primary zone

Primary zone

Primary replica: Unavailable

Secondary replica: Catching up but available

Disk status: Unavailable

Disk attached in: primary zone

  • Both zonal replicas cannot serve traffic. The disk becomes unavailable.
  • If the zonal outage or replica failure is temporary, then your disk resumes operations after the primary replica is available again.
  • If the zonal outage or replica failure is permanent, your disk becomes unusable.

Primary replica: Synced

Secondary replica: Out of sync but available

Disk status: Degraded

Disk attached in: primary zone

Primary zone

Primary replica: Unavailable

Secondary replica: Out of sync but available

Disk status: Unavailable

Disk attached in: primary zone

  • Both zonal replicas cannot serve traffic. The disk becomes unavailable.
  • If the zonal outage or replica failure is temporary, then your disk resumes operations after the primary replica is available again.
  • If the zonal outage or replica failure is permanent, your disk becomes unusable.

Application and VM failures

In the event of outages caused by VM misconfiguration, an unsuccessful OS upgrade, or other application failures, you can force-attach your regional disk to a compute instance in the same zone as the healthy replica.

Failure category and (probability) | Failure types | Action
Application failure (High)
  • Unresponsive applications
  • Failure due to application administrative actions (for example, upgrade)
  • Human error (for example, misconfiguration of parameters such as SSL certificate or ACLs)
Application control plane can trigger failover based on health check thresholds.
VM failure (Medium)
  • Infrastructure or hardware failure
  • VM unresponsive due to CPU contention, intermediate network interruption
VMs are usually autohealed. The application control plane can trigger failover based on health check thresholds.
Application corruption (Low-Medium)
  • Application data corruption (for example, due to application bugs or an unsuccessful OS upgrade)
Application recovery:

Failover a regional disk using force-attach

In the event that the primary zone fails, you can fail over your Regional Persistent Disk or Hyperdisk Balanced High Availability volume to a compute instance in another zone by using a force-attach operation.

When there's a failure in the primary zone, you might not be able to detach the disk from the instance because the instance can't be reached to perform the detach operation. Force-attach lets you attach a Regional Persistent Disk or Hyperdisk Balanced High Availability volume to a compute instance even if that volume is attached to another instance.

After you complete the force-attach operation, Compute Engine prevents the original instance from writing to the regional disk. Using the force-attach operation lets you safely regain access to your data and recover your service. You also have the option to manually shut down the VM instance after you perform the force-attach operation.

To force attach an existing disk to a compute instance, select one of the following tasks:

Console

  1. Go to the VM instances page.


  2. Select your project.

  3. Click the name of the instance that you want to change.

  4. On the details page, click Edit.

  5. In the Additional disks section, click Attach additional disk.

  6. Select the regional, or synchronously replicated, disk from the drop-down list.

  7. To force attach the disk, select the Force-attach disk checkbox.

  8. Click Done, and then click Save.

You can perform the same steps to force-attach a disk to the original compute instance after the failure is resolved.

gcloud

In the gcloud CLI, use the instances attach-disk command to attach the replica disk to a compute instance. Include the --disk-scope flag and set it to regional.

gcloud compute instances attach-disk VM_NAME \
    --disk DISK_NAME \
    --disk-scope regional \
    --force-attach

Replace the following:

  • VM_NAME: the name of the new compute instance in the region
  • DISK_NAME: the name of the regional disk

After you force-attach the disk, mount the file systems on the disk, if necessary. The compute instance can use the force-attached disk to continue read and write operations to the disk.
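Putting the command together, the following sketch shows a failover with hypothetical names: the disk is force-attached to a standby VM in the zone of the healthy replica, and then mounted. The commands are printed rather than executed, and the device path under /dev/disk/by-id/ is an assumption, so confirm the actual device name with lsblk on your VM.

```shell
# Hypothetical VM, disk, and zone names; commands are printed, not executed.
VM="standby-vm"              # VM in the zone of the healthy replica
DISK="regional-data-disk"
ZONE="us-central1-b"

# Force-attach the regional disk to the standby VM.
echo gcloud compute instances attach-disk "$VM" \
    --disk "$DISK" --disk-scope regional \
    --zone "$ZONE" --force-attach

# On the VM, mount the file system. The device path is an assumption;
# confirm the actual device name with lsblk before mounting.
echo sudo mount "/dev/disk/by-id/google-$DISK" /mnt/disks/data
```

Remove the leading echo on each command to run it for real.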

REST

Construct a POST request to the compute.instances.attachDisk method, and include the URL to the regional disk that you just created. To attach the disk to the new compute instance, the forceAttach=true query parameter is required if the primary compute instance still has the disk attached.

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/VM_NAME/attachDisk?forceAttach=true

{
  "source": "projects/PROJECT_ID/regions/REGION/disks/DISK_NAME"
}

Replace the following:

  • PROJECT_ID: your project ID
  • ZONE: the location of your compute instance
  • VM_NAME: the name of the compute instance where you are adding the regional disk
  • REGION: the region where your regional disk is located
  • DISK_NAME: the name of the regional disk

After you attach the regional disk, mount the file systems on the disk, if necessary. The compute instance can use the replica disk to continue read and write operations to the disk.

Note: You can't attach the disk if doing so exceeds the Hyperdisk size and attachment limits for the compute instance.

Failover a boot disk to a secondary instance

You can have only one boot disk attached to a compute instance. When failing over a regional boot disk, use one of the following methods, depending on whether the secondary compute instance already exists:

  • If you don't have an active standby VM, then create a new instance in the secondary zone. When you create the second instance, use the regional disk for the boot disk, as described in Create a new VM with a regional boot disk.

  • If you have a standby VM in the secondary zone, then replace the boot disk of the standby VM with the regional boot disk, as described in Attach a regional boot disk to a VM.

Use replica recovery checkpoint to recover regional disks

A replica recovery checkpoint represents the most recent crash-consistent point in time of a fully replicated Regional Persistent Disk or Hyperdisk Balanced High Availability volume. Compute Engine lets you create standard snapshots from the replica recovery checkpoint for degraded regional disks.

In rare scenarios, when your disk is degraded, the zonal replica that is synced with the latest disk data can also fail before the out-of-sync replica catches up. In that case, you can't force-attach your disk to compute instances in either zone. Your replicated disk becomes unavailable, and you must migrate the data to a new disk. If you don't have any existing standard snapshots for your disk, you might still be able to recover your disk data from the incomplete replica by using a standard snapshot created from the replica recovery checkpoint. See Procedure to migrate and recover disk data for detailed steps.

Important: In case of an unavailable disk, Google recommends that you always use any existing standard snapshots to create a new Regional Persistent Disk or Hyperdisk Balanced High Availability volume and recover disk data. Create standard snapshots from a checkpoint only if you don't have any existing standard snapshots available.

Procedure to migrate and recover disk data

To recover and migrate the data of a regional disk by using the replica recovery checkpoint, perform the following steps:

  1. Create a standard snapshot of the impacted Regional Persistent Disk or Hyperdisk Balanced High Availability volume from its replica recovery checkpoint.

    You can create the standard snapshot for a disk from its replica recovery checkpoint only by using the gcloud CLI or REST.

    gcloud

    To create a snapshot using the replica recovery checkpoint, use the gcloud compute snapshots create command. Include the --source-disk-for-recovery-checkpoint flag to specify that you want to create the snapshot using a replica recovery checkpoint. Exclude the --source-disk and --source-disk-region parameters.

    gcloud compute snapshots create SNAPSHOT_NAME \
        --source-disk-for-recovery-checkpoint=SOURCE_DISK \
        --source-disk-for-recovery-checkpoint-region=SOURCE_REGION \
        --storage-location=STORAGE_LOCATION \
        --snapshot-type=SNAPSHOT_TYPE

    Replace the following:

    • SNAPSHOT_NAME: A name for the snapshot.
    • SOURCE_DISK: The name or full path of the source disk that you want to use to create the snapshot. To specify the full path of a source disk, use the following syntax:
        projects/SOURCE_PROJECT_ID/regions/SOURCE_REGION/disks/SOURCE_DISK_NAME

      If you specify the full path to the source disk, you can exclude the --source-disk-for-recovery-checkpoint-region flag. If you specify only the disk's name, then you must include this flag.

      To create a snapshot from the recovery checkpoint of a source disk in a different project, you must specify the full path to the source disk.

    • SOURCE_PROJECT_ID: The project ID of the source disk whose checkpoint you want to use to create the snapshot.
    • SOURCE_REGION: The region of the source disk whose checkpoint you want to use to create the snapshot.
    • SOURCE_DISK_NAME: The name of the source disk whose checkpoint you want to use to create the snapshot.
    • STORAGE_LOCATION: Optional: the Cloud Storage multi-region or the Cloud Storage region where you want to store your snapshot. You can specify only one storage location.
      Use the --storage-location flag only if you want to override the predefined or customized default storage location configured in your snapshot settings.
    • SNAPSHOT_TYPE: The snapshot type, either STANDARD or ARCHIVE. If a snapshot type is not specified, a STANDARD snapshot is created.

    You can use a replica recovery checkpoint to create a snapshot only on degraded disks. If you try to create a snapshot from a replica recovery checkpoint when the device is fully replicated, you see the following error message:

    The device is fully replicated and should not create snapshots out of a recovery checkpoint. Please create regular snapshots instead.

    REST

    To create a snapshot using the replica recovery checkpoint, make a POST request to the snapshots.insert method. Exclude the sourceDisk parameter and instead include the sourceDiskForRecoveryCheckpoint parameter to specify that you want to create the snapshot using the checkpoint.

    POST https://compute.googleapis.com/compute/v1/projects/DESTINATION_PROJECT_ID/global/snapshots

    {
      "name": "SNAPSHOT_NAME",
      "sourceDiskForRecoveryCheckpoint": "projects/SOURCE_PROJECT_ID/regions/SOURCE_REGION/disks/SOURCE_DISK_NAME",
      "storageLocations": "STORAGE_LOCATION",
      "snapshotType": "SNAPSHOT_TYPE"
    }

    Replace the following:

    • DESTINATION_PROJECT_ID: The ID of the project in which you want to create the snapshot.
    • SNAPSHOT_NAME: A name for the snapshot.

    • SOURCE_PROJECT_ID: The project ID of the source disk whose checkpoint you want to use to create the snapshot.
    • SOURCE_REGION: The region of the source disk whose checkpoint you want to use to create the snapshot.
    • SOURCE_DISK_NAME: The name of the source disk whose checkpoint you want to use to create the snapshot.
    • STORAGE_LOCATION: Optional: the Cloud Storage multi-region or the Cloud Storage region where you want to store your snapshot. You can specify only one storage location.
      Use the storageLocations parameter only if you want to override the predefined or customized default storage location configured in your snapshot settings.
    • SNAPSHOT_TYPE: The snapshot type, either STANDARD or ARCHIVE. If a snapshot type is not specified, a STANDARD snapshot is created.

    You can use a replica recovery checkpoint to create a snapshot only on degraded disks. If you try to create a snapshot from a replica recovery checkpoint when the device is fully replicated, you see the following error message:

    The device is fully replicated and should not create snapshots out of a recovery checkpoint. Please create regular snapshots instead.

  2. Create a new Regional Persistent Disk or Hyperdisk Balanced High Availability disk using this snapshot. When you create the new disk, you recover all the data from the most recent replica recovery checkpoint by restoring the data to the new disk from the snapshot. For detailed steps, see Create a new instance with a regional boot disk.

  3. Migrate all the VM workloads to the newly created disk and validate that these VM workloads are running correctly. For more information, see Move a VM across zones or regions.

After you recover and migrate your disk data and VMs to the newly created Regional Persistent Disk or Hyperdisk Balanced High Availability disk, you can resume your operations.
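The three steps above can be sketched as gcloud commands. All names, zones, and the region are hypothetical, and the commands are printed rather than executed so you can adapt them first.

```shell
# Hypothetical names and locations; commands are printed, not executed.
SNAPSHOT="checkpoint-recovery-snap"
SOURCE_DISK="degraded-regional-disk"
NEW_DISK="recovered-regional-disk"
REGION="us-central1"

# 1. Snapshot the degraded disk from its replica recovery checkpoint.
echo gcloud compute snapshots create "$SNAPSHOT" \
    --source-disk-for-recovery-checkpoint="$SOURCE_DISK" \
    --source-disk-for-recovery-checkpoint-region="$REGION"

# 2. Create a new regional disk from that snapshot.
echo gcloud compute disks create "$NEW_DISK" \
    --region="$REGION" \
    --replica-zones="us-central1-a,us-central1-b" \
    --source-snapshot="$SNAPSHOT"

# 3. Attach the new disk to the VM that runs the workload.
echo gcloud compute instances attach-disk standby-vm \
    --disk "$NEW_DISK" --disk-scope regional --zone us-central1-a
```

Remove the leading echo on each command to run it against your project.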

Determine the RPO provided by replica recovery checkpoint

This section explains how to determine the RPO provided by the latest replica recovery checkpoint of a Regional Persistent Disk or Hyperdisk Balanced High Availability volume.

Zonal replicas are fully synced

Compute Engine refreshes the replica recovery checkpoint of your Regional Persistent Disk or Hyperdisk Balanced High Availability volume approximately every 15 minutes. As a result, when your zonal replicas are fully synced, the RPO is approximately 15 minutes.

Zonal replicas are out of sync

You can't view the exact creation and refresh timestamps of a replica recovery checkpoint. However, you can estimate the approximate RPO that your latest checkpoint provides by using the following data:

  • Most recent timestamp of the fully replicated disk state: You can get this information by using the Cloud Monitoring data for the replica_state metric of the regional disk. Check the replica_state metric data for the out-of-sync replica to determine when the replica went out of sync. Because Compute Engine refreshes the disk's checkpoint every 15 minutes, the most recent checkpoint refresh could have been approximately 15 minutes before this timestamp.
  • Most recent write operation timestamp: You can get this information by using the Cloud Monitoring data for the write_ops_count metric of the regional disk. Check the write_ops_count metric data to determine the most recent write operation for the disk.

After you determine these timestamps, use the following formula to calculate the approximate RPO provided by the replica recovery checkpoint of your disk. If the calculated value is less than zero, then the RPO is effectively zero.

Approximate RPO provided by the latest checkpoint = (Most recent write operation timestamp - (Most recent timestamp of the fully replicated disk state - 15 minutes))
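As a worked example, the following shell sketch applies the formula to two hypothetical timestamps read from Cloud Monitoring (it assumes GNU date for timestamp parsing). With a last write at 12:40 and a last fully replicated state at 12:30, the checkpoint could be up to 15 minutes older than 12:30, so the approximate RPO is 25 minutes.

```shell
# Hypothetical timestamps; replace with values read from Cloud Monitoring.
# Requires GNU date (date -d) for timestamp parsing.
last_write=$(date -u -d "2025-01-10 12:40:00" +%s)   # most recent write_ops_count activity
last_synced=$(date -u -d "2025-01-10 12:30:00" +%s)  # last fully replicated state (replica_state)
checkpoint_interval=$((15 * 60))                     # checkpoints refresh ~every 15 minutes

# RPO = last write - (last fully replicated timestamp - 15 minutes), floored at zero.
rpo=$(( last_write - (last_synced - checkpoint_interval) ))
if (( rpo < 0 )); then rpo=0; fi
echo "Approximate RPO: $(( rpo / 60 )) minutes"
```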

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.