About synchronous disk replication
Regional Persistent Disk and Hyperdisk Balanced High Availability volumes are designed for workloads that require a lower Recovery Point Objective (RPO) and Recovery Time Objective (RTO). To learn more about RPO and RTO, see Basics of disaster recovery planning.
Regional Persistent Disk and Hyperdisk Balanced High Availability volumes are designed to work with regional managed instance groups. This document provides an overview of how to build HA services with Regional Persistent Disk and Hyperdisk Balanced High Availability volumes.
When you decide to use Regional Persistent Disk or Hyperdisk Balanced High Availability, make sure that you compare the different options for increasing service availability and the cost, performance, and resiliency of different service architectures.
About synchronous disk replication
A Regional Persistent Disk or Hyperdisk Balanced High Availability volume, also referred to as a regional disk, or synchronously replicated disk, has a primary and a secondary zone within its region where it stores disk data:
- Primary zone is the same zone where the compute instance that you attach the disk to is located.
- Secondary zone is an alternate zone of your choice within the same region.
Compute Engine maintains replicas of your disk in both these zones. When you write data to your disk, Compute Engine synchronously replicates that data to the disk replicas in both zones to ensure HA. The data of each zonal replica is spread across multiple physical machines within the zone to ensure durability. Zonal replicas ensure that the data of the disk remains available and provide protection against temporary outages in one of the disk zones.
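For example, you can create a synchronously replicated disk with the gcloud CLI by specifying a region and two replica zones instead of a single zone. The following sketch uses illustrative names, sizes, and zones; replace them with your own values:

```shell
# Create a balanced Regional Persistent Disk whose data is
# synchronously replicated between two zones of us-central1.
# (Disk name, size, region, and zones here are illustrative.)
gcloud compute disks create my-regional-disk \
    --type=pd-balanced \
    --size=200GB \
    --region=us-central1 \
    --replica-zones=us-central1-a,us-central1-b
```

The zone that you later attach the disk's instance in becomes the primary zone; the other replica zone serves as the secondary zone.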
Replica state for zonal replicas
Disk replica state for Regional Persistent Disk or Hyperdisk Balanced High Availability shows you the state of a zonal replica in comparison to the content of the disk. Zonal replicas for your disks are in one of the following disk replica states at all times:
- Synced: The replica is available, synchronously receives all the writes performed to the disk, and is up to date with all the data on the disk.
- Catching up: The replica is available but is still catching up with the data on the disk from the other replica.
- Out of sync: The replica is temporarily unavailable and out of sync with the data on the disk.
To learn how to check and track the replica states of your zonal replicas, see Monitor the disk replica states.
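As one way to inspect a regional disk, you can describe it with the gcloud CLI; replica state information appears in the command output (the exact field names are covered in Monitor the disk replica states). The disk name and region below are illustrative:

```shell
# Describe the regional disk; the output includes the disk status,
# the replica zones, and per-replica state information.
# (Disk name and region are illustrative.)
gcloud compute disks describe my-regional-disk \
    --region=us-central1
```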
Replication states for regional disks
Depending on the state of the individual zonal replicas, your Regional Persistent Disk or Hyperdisk Balanced High Availability volume can be in one of the following replication states:
- Fully replicated: Replicas in both zones are available and are synced with the latest disk data.
- Catching up: Your zonal replicas are available, but one of the zonal replicas is catching up with the latest disk data.
- Degraded: One of the zonal replicas has a status of out of sync due to a failure or an outage.
If the disk replication status is catching up or degraded, then one of the zonal replicas is not up to date with all the disk data. Any outage during this time in the zone of the healthy replica makes the disk unavailable until the healthy replica's zone is restored.
When your Regional Persistent Disk or Hyperdisk Balanced High Availability volume is catching up, Google Cloud starts healing the zonal replica that is catching up. Google recommends that you wait for the affected zonal replica to catch up with the data on the disk, at which point its status changes to Synced. After the zonal replica moves to the synced state, the regional disk status changes back to the Fully replicated state.
If the regional disk has a status of catching up or degraded for a prolonged period of time and does not meet your organization's RPO requirements, we recommend that you take snapshots of the primary replica in either of the following ways:
- Enable scheduled snapshots.
- Create a manual snapshot of your Regional Persistent Disk or Hyperdisk Balanced High Availability disk.
After you create a snapshot, you can create a new Regional Persistent Disk or Hyperdisk Balanced High Availability disk by using that snapshot as the source. This restores the snapshot to the new disk. Your new disk also starts in a fully replicated state with healthy data replication.
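A manual snapshot-and-restore cycle can be sketched with the gcloud CLI as follows; all names, the region, and the replica zones are illustrative:

```shell
# Take a manual snapshot of the regional disk.
# (Disk, snapshot, and region names are illustrative.)
gcloud compute disks snapshot my-regional-disk \
    --region=us-central1 \
    --snapshot-names=my-disk-snapshot

# Restore the snapshot to a new regional disk. The new disk starts
# in a fully replicated state across the two replica zones.
gcloud compute disks create my-restored-disk \
    --type=pd-balanced \
    --region=us-central1 \
    --replica-zones=us-central1-a,us-central1-b \
    --source-snapshot=my-disk-snapshot
```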
To learn how to check the replication state of your Regional Persistent Disk or Hyperdisk Balanced High Availability disk, see Determine the replication state of disks.
Replica recovery checkpoint
A replica recovery checkpoint is a disk attribute that represents the most recent crash-consistent point in time of a fully replicated disk. Compute Engine automatically creates and maintains a single replica recovery checkpoint for each regional disk. When a disk is fully replicated, Compute Engine refreshes its checkpoint approximately every 15 minutes to ensure that the checkpoint remains up to date. When the disk replication status is degraded, Compute Engine lets you create a standard snapshot from the replica recovery checkpoint of that disk. The resulting standard snapshot captures the data from the most recent crash-consistent version of the fully replicated disk.
In rare scenarios, when your disk is degraded, the zonal replica that is synced with the latest disk data can also fail before the out-of-sync replica catches up. You won't be able to force-attach your disk to compute instances in either zone. Your replicated disk becomes unavailable and you must migrate the data to a new disk. In such scenarios, if you don't have any existing standard snapshots available for your disk, you might still be able to recover your disk data from the incomplete replica by using a standard snapshot created from the replica recovery checkpoint.
Compute Engine automatically creates replica recovery checkpoints for each mounted Regional Persistent Disk or Hyperdisk Balanced High Availability disk. You don't incur any additional charges for the creation of these checkpoints. However, you do incur any applicable storage charges for the creation of snapshots and compute instances when you use these checkpoints to migrate your regional disk to functioning zones.
Learn more about how to recover your regional disk data using a replica recovery checkpoint.
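Because checkpoint-based snapshots are created through the Compute Engine API rather than the console or gcloud, the call looks roughly like the following curl sketch. The `sourceDiskForRecoveryCheckpoint` request field is my assumption based on the Compute Engine API reference; check the current API documentation before relying on it, and replace PROJECT, REGION, and the resource names with your own values:

```shell
# Sketch: create a standard snapshot from the replica recovery
# checkpoint of a degraded regional disk by calling the
# Compute Engine API directly.
# ASSUMPTION: the sourceDiskForRecoveryCheckpoint field; verify it
# against the current API reference. PROJECT/REGION are placeholders.
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://compute.googleapis.com/compute/v1/projects/PROJECT/global/snapshots" \
    -d '{
          "name": "checkpoint-snapshot",
          "sourceDiskForRecoveryCheckpoint": "projects/PROJECT/regions/REGION/disks/my-regional-disk"
        }'
```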
Regional disk failover
In the event of an outage in a zone, the zone becomes inaccessible and the compute instance in that zone can't perform read or write operations on its disk. To allow the instance to keep performing read and write operations for the regional disk, Compute Engine allows migration of disk data to the other zone where the disk has a replica. This process is called failover.
The failover process involves detaching the zonal replica from the instance in the affected zone and then attaching the zonal replica to a new instance in the secondary zone. Compute Engine synchronously replicates the data on your disk to the secondary zone to ensure a quick failover in case of a single replica failure.
Failover by application-specific regional control plane
The application-specific regional control plane is not a Google Cloud service. When you design HA service architectures, you must build your own application-specific regional control plane. This application control plane decides which instance must have the regional disk attached and which instance is the current primary instance.
When a failure is detected in the primary instance or database of the regional disk, the application-specific regional control plane of your HA service architecture can automatically initiate failover to the standby instance in the secondary zone. During the failover, the application-specific regional control plane reattaches the regional disk to the standby instance in the secondary zone. Compute Engine then directs all traffic to that instance based on health check signals.
The overall failover latency, excluding failure-detection time, is the sum of the following latencies:
- Less than 1 minute to attach a regional disk to a standby instance
- Time required for application initialization and crash recovery
For more information, see Understanding the application-specific regional control plane.
The Disaster Recovery Building Blocks page covers the building blocks available on Compute Engine.
Failover by force-attach
One of the benefits of Regional Persistent Disk and Hyperdisk Balanced High Availability is that in the unlikely event of a zonal outage, you can manually fail over your workload to another zone. When the original zone has an outage, you can't complete the disk detach operation until that zonal replica is restored. In this scenario, you might need to attach the secondary zonal replica to a new compute instance without detaching the primary zonal replica from your primary instance. This process is called force-attach.
When your compute instance in the primary zone becomes unavailable, you can force-attach your disk to an instance in the secondary zone. To perform this task, you must do one of the following:
- Start another compute instance in the same zone as the regional disk replica that you are force attaching.
- Maintain a hot standby compute instance in that zone. A hot standby is a running instance that is identical to the one in the primary zone. The two instances have the same data.
Compute Engine executes the force-attach operation in less than one minute. The total recovery time objective (RTO) depends not only on the storage failover (the force attachment of the regional disk), but also on other factors, including the following:
- Whether you must first create a secondary instance
- The length of time that it takes the underlying file system to detect a hot-attached disk
- The recovery time of the corresponding applications
For more information about how to fail over your compute instance using force-attach, see Failover your regional disk using force-attach.
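A force-attach can be sketched with the gcloud CLI as follows, assuming a standby instance already exists in the secondary zone; the instance, disk, and zone names are illustrative:

```shell
# Force-attach the regional disk to a standby instance in the
# secondary zone, without first detaching it from the unreachable
# primary instance. (Instance, disk, and zone names are illustrative.)
gcloud compute instances attach-disk standby-instance \
    --zone=us-central1-b \
    --disk=my-regional-disk \
    --disk-scope=regional \
    --force-attach
```

After the disk is attached, the file system still has to detect the disk and your application has to complete crash recovery, which contributes to the total RTO described above.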
Limitations
The following sections list the limitations that apply for Regional Persistent Disk and Hyperdisk Balanced High Availability.
General limitations for regional disks
- You can attach regional Persistent Disk only to VMs that use E2, N1, N2, and N2D machine types.
- You can attach Hyperdisk Balanced High Availability only to supported machine types.
- You can't create a regional Persistent Disk from an OS image, or from a disk that was created from an OS image.
- You can't create a Hyperdisk Balanced High Availability disk by cloning a zonal disk. To create a Hyperdisk Balanced High Availability disk from a zonal disk, complete the steps in Change a zonal disk to a Hyperdisk Balanced High Availability disk.
- When using read-only mode, you can attach a regional balanced Persistent Disk to a maximum of 10 VM instances.
- The minimum size of a regional standard Persistent Disk is 200 GiB.
- You can only increase the size of a regional Persistent Disk or Hyperdisk Balanced High Availability volume; you can't decrease its size.
- Regional Persistent Disk and Hyperdisk Balanced High Availability volumes have different performance characteristics than their corresponding zonal disks. For more information, see About Persistent Disk performance and Hyperdisk Balanced High Availability performance limits.
- You can't use a Hyperdisk Balanced High Availability volume that's in multi-writer mode as a boot disk.
- If you create a replicated disk by cloning a zonal disk, then the two zonal replicas aren't fully in sync at the time of creation. After creation, you can use the regional disk clone within 3 minutes, on average. However, you might need to wait for tens of minutes before the disk reaches a fully replicated state and the recovery point objective (RPO) is close to zero. Learn how to check if your replicated disk is fully replicated.
Limitations for replica recovery checkpoints
- A replica recovery checkpoint is part of the device metadata and doesn't show you any disk data by itself. You can only use the checkpoint as a mechanism to create a snapshot of your degraded disk. After you create the snapshot by using the checkpoint, you can use the snapshot to restore your data.
- You can create snapshots from a replica recovery checkpoint only when your disk is degraded.
- Compute Engine refreshes the replica recovery checkpoint of your disk only when the disk is fully replicated.
- Compute Engine maintains only one replica recovery checkpoint for a disk and only maintains the latest version of that checkpoint.
- You can't view the exact creation and refresh timestamps of a replica recovery checkpoint.
- You can create a snapshot from your replica recovery checkpoint only by using the Compute Engine API.
What's next
- Learn how to build high availability services using regional disks.
- Review the disaster recovery planning guide.
- Learn about disk pricing.
- Learn how to create and manage regional disks.
- Learn how to monitor the replica states of disks.
- Learn how to determine the replication state of a disk.
- Learn how to manage failures for regional disks.
Last updated 2025-12-15 UTC.