Architecting disaster recovery for locality-restricted workloads

Last reviewed 2024-07-20 UTC

This document discusses how you can use Google Cloud to architect for disaster recovery (DR) to meet location-specific requirements. For some regulated industries, workloads must adhere to these requirements. In this scenario, one or more of the following requirements apply:

  • Data at rest must be restricted to a specified location.
  • Data must be processed in the location where it resides.
  • Workloads are accessible only from predefined locations.
  • Data must be encrypted by using keys that the customer manages.
  • If you are using cloud services, each cloud service must provide a minimum of two locations that are redundant to each other. For an example of location redundancy requirements, see the Cloud Computing Compliance Criteria Catalogue (C5).

Terminology

Before you begin architecting for DR for locality-restricted workloads, it's a good idea to review locality terminology used in Google Cloud.

Google Cloud provides services in regions throughout the Americas, Europe and the Middle East, and Asia Pacific. For example, London (europe-west2) is a region in Europe, and Oregon (us-west1) is a region in North America. Some Google Cloud products group multiple regions into a specific multi-region location, which is accessible in the same way that you would use a region.

Regions are further divided into zones where you deploy certain Google Cloud resources such as virtual machines, Kubernetes clusters, or Cloud SQL databases. Resources on Google Cloud are multi-regional, regional, or zonal. Some resources and products that are by default designated as multi-regional can also be restricted to a region. The different types of resources are explained as follows:

  • Multi-regional resources are designed by Google Cloud to be redundant and distributed in and across regions. Multi-regional resources are resilient to the failure of a single region.
  • Regional resources are redundantly deployed across multiple zones in a region, and are resilient to the failure of a zone within the region.

    Note: For more information about region-specific considerations, see Geography and regions.
  • Zonal resources operate in a single zone. If a zone becomes unavailable, all zonal resources in that zone are unavailable until service is restored. Consider a zone as a single-failure domain. You need to architect your applications to mitigate the effects of a single zone becoming unavailable.

For more information, see Geography and regions.
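The following is a minimal sketch of the terminology above, not an official API. It assumes that a zone name is its region name followed by a hyphen and a zone suffix (as in europe-west1-b), and it encodes the failure tolerance that this section describes for each resource type. The names Scope, region_of_zone, and survives are hypothetical.

```python
from enum import Enum


class Scope(Enum):
    """Resource scopes as described in the terminology section."""
    ZONAL = "zonal"
    REGIONAL = "regional"
    MULTI_REGIONAL = "multi-regional"


def region_of_zone(zone: str) -> str:
    """Return the region part of a zone name such as 'europe-west1-b'."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def survives(scope: Scope, failure: str) -> bool:
    """Whether a resource of this scope is designed to tolerate the failure.

    failure is either 'zone' or 'region'.
    """
    if failure == "zone":
        return scope in (Scope.REGIONAL, Scope.MULTI_REGIONAL)
    if failure == "region":
        return scope is Scope.MULTI_REGIONAL
    raise ValueError(f"unknown failure type: {failure!r}")
```

A helper like this can be useful when you review an inventory of resources and want to flag every zonal resource as a single failure domain.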

Planning for DR for locality-restricted workloads

The approach you take to designing your application depends on the type of workload and the locality requirements you must meet. Also consider why you must meet those requirements because what you decide directly influences your DR architecture.

Start by reading the Google Cloud disaster recovery planning guide. As you consider locality-restricted workloads, focus on the requirements discussed in this planning section.

Define your locality requirements

Before you start your design, define your locality requirements by answering these questions:

  • Where is the data at rest? The answer dictates what services you can use and the high availability (HA) and DR methods you can employ to achieve your RTO/RPO values. Use the Cloud locations page to determine what products are in scope.
  • Can you use encryption techniques to mitigate the requirement? If you are able to mitigate locality requirements by employing encryption techniques using Cloud External Key Manager and Cloud Key Management Service, you can use multi-regional and dual-regional services and follow the standard HA/DR techniques outlined in Disaster recovery scenarios for data.
  • Can data be processed outside of where it rests? You can use products such as Google Distributed Cloud to help create a hybrid environment to address your requirements or implement product-specific controls such as load-balancing Compute Engine instances across multiple zones in a region. Use the Organization policy Resource Location constraint to restrict where resources can be deployed.

    If data can be processed outside of where it needs to be at rest, you can design the "processing" parts of your application by following the guidance in Disaster recovery building blocks and Disaster recovery scenarios for applications.

    Configure a VPC Service Controls perimeter to control who can access the data and to restrict what resources can process the data.

  • Can you use more than one region? If you can use more than one region, you can use many of the techniques outlined in the Disaster Recovery series. Check the multi-region and region constraints for Google Cloud products.

  • Do you need to restrict who can access your application? Google Cloud has several products and features that help you restrict who can access your applications:

    • Identity-Aware Proxy (IAP). Verifies a user's identity and then determines whether that user should be permitted to access an application.
    • Organization policy. Uses the domain-restricted sharing constraint to define the allowed Cloud Identity or Google Workspace IDs that are permitted in Identity and Access Management (IAM) policies.
    • Product-specific locality controls. Refer to each product you want to use in your architecture for appropriate locality constraints. For example, if you're using Cloud Storage, create buckets in specified regions.
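One way to make the answers to these questions explicit is to record them in a small data structure and derive candidate DR approaches from them. The sketch below is a simplification for illustration only. The class, field names, and option strings are hypothetical, and the mapping reflects only the guidance in the preceding list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LocalityRequirements:
    """Answers to the planning questions (all names are illustrative)."""
    encryption_mitigates_locality: bool
    more_than_one_region_allowed: bool
    processing_allowed_outside_rest_location: bool


def candidate_dr_options(req: LocalityRequirements) -> List[str]:
    """Return candidate DR approaches that the answers leave open."""
    options = ["zonal redundancy within the required region"]
    if req.encryption_mitigates_locality or req.more_than_one_region_allowed:
        # Standard HA/DR techniques across regions become available.
        options.append("dual-region or multi-region services")
    else:
        # Redundancy has to come from outside Google Cloud.
        options.append("hybrid or multi-cloud copy in the same geographic area")
    if req.processing_allowed_outside_rest_location:
        options.append("application processing in a separate DR location")
    return options
```

For example, a workload that is restricted to one region and cannot rely on encryption as a mitigation is left with zonal redundancy plus a hybrid or multi-cloud copy.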

Identify the services that you can use

Identify what services can be used based on your locality and regional granularity requirements. Designing applications that are subject to locality restrictions requires understanding what products can be restricted to what region and what controls can be applied to enforce location restriction requirements.

Identify the regional granularity for your application and data

Identify the regional granularity for your application and data by answering these questions:

  • Can you use multi-regional services in your design? By using multi-regional services, you can create highly available resilient architectures.
  • Does access to your application have location restrictions? Use Google Cloud products to help enforce where your applications can be accessed from.
  • Is your data at rest restricted to a specific region? If you use managed services, ensure that the services you are using can be configured so that your data stored in the service is restricted to a specific region. For example, use BigQuery locality restrictions to dictate where your datasets are stored and backed up to.
  • What regions do you need to restrict your application to? Some Google Cloud products do not have regional restrictions. Use the Cloud locations page and the product-specific pages to validate what regions you can use the product in and what mitigating features, if any, are available to restrict your application to a specific region.

Meeting locality restrictions using Google Cloud products

This section details features and mitigating techniques for using Google Cloud products as part of your DR strategy for locality-restricted workloads. We recommend reading this section along with Disaster recovery building blocks.

Organization policies

The Organization Policy Service gives you centralized control over your Google Cloud resources. Using organization policies, you can configure restrictions across your entire resource hierarchy. Consider the following policy constraints when architecting for locality-restricted workloads:

  • Domain-restricted sharing: By default, all user identities are allowed to be added to IAM policies. The allowed_values list must specify one or more Cloud Identity or Google Workspace customer identities. If this constraint is active, only identities in the allowed list are eligible to be added to IAM policies.

  • Location-restricted resources: This constraint refers to the set of locations where location-based Google Cloud resources can be created. Policies for this constraint can specify as allowed or denied locations any of the following: multi-regions such as Asia and Europe, regions such as us-east1 or europe-west1, or individual zones such as europe-west1-b. For a list of supported services, see Resource locations supported services.
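In addition to enforcing the location constraint through organization policy, you might want a local pre-deployment check in your own tooling. The sketch below is a hypothetical helper, not the evaluation logic of the Organization Policy Service. It assumes that each planned resource declares a region or zone, that zone names have three hyphen-separated parts, and that your approved locations are expressed as a set of regions.

```python
from typing import Dict, List, Set


def to_region(location: str) -> str:
    """Map a zone such as 'europe-west1-b' to its region; leave other values as-is."""
    parts = location.split("-")
    return "-".join(parts[:2]) if len(parts) == 3 else location


def location_violations(planned: Dict[str, str], approved_regions: Set[str]) -> List[str]:
    """Return the names of planned resources whose region is not approved.

    planned maps a resource name (illustrative) to its region or zone.
    """
    return sorted(
        name
        for name, location in planned.items()
        if to_region(location) not in approved_regions
    )


# Example inventory; the resource names are made up for illustration.
planned_resources = {
    "vm-1": "europe-west1-b",
    "bucket-1": "europe-west1",
    "db-1": "us-east1",
}
```

Running location_violations(planned_resources, {"europe-west1"}) flags only db-1, because its region is outside the approved set.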

Encryption

If your data locality requirements concern restricting who can access the data, then implementing encryption methods might be an applicable strategy. By using external key management systems to manage keys that you supply outside of Google Cloud, you might be able to deploy a multi-region architecture to meet your locality requirements. Without the keys available, the data cannot be decrypted.

Google Cloud has two products that let you use keys that you manage:

  • Cloud External Key Manager (Cloud EKM): Cloud EKM lets you encrypt data in BigQuery and Compute Engine with encryption keys that are stored and managed in a third-party key management system that's deployed outside Google's infrastructure.
  • Customer-supplied encryption keys (CSEK): You can use CSEK with Cloud Storage and Compute Engine. Google uses your key to protect the Google-generated keys that are used to encrypt and decrypt your data.

    If you provide a customer-supplied encryption key, Google does not permanently store your key on Google's servers or otherwise manage your key. Instead, you provide your key for each operation, and your key is purged from Google's servers after the operation is complete.

When managing your own key infrastructure, you must carefully consider latency and reliability issues and ensure that you implement appropriate HA and recovery processes for your external key manager. You must also understand your RTO requirements. The keys are integral to writing the data, so RPO isn't the critical concern because no data can be safely written without the keys. The real concern is RTO because without your keys you cannot decrypt or safely write data.
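The dependency on the key manager can be expressed as simple arithmetic. The following sketch assumes that the workload and the external key manager recover in parallel, so the workload is usable only when both are back. Under that assumption, the effective RTO is at least the larger of the two recovery times. If recovery is sequential in your environment, the sum is the more appropriate figure. The function names are hypothetical.

```python
from datetime import timedelta


def effective_rto(workload_rto: timedelta, key_manager_rto: timedelta) -> timedelta:
    """Lower bound on recovery time when data is unusable without the keys.

    Assumes the workload and the key manager are recovered in parallel.
    """
    return max(workload_rto, key_manager_rto)


def meets_target(target: timedelta, workload_rto: timedelta,
                 key_manager_rto: timedelta) -> bool:
    """Check the effective RTO against the target RTO."""
    return effective_rto(workload_rto, key_manager_rto) <= target
```

For example, a workload that can fail over in 30 minutes but depends on a key manager that takes 2 hours to recover cannot meet a 1-hour RTO target.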

Storage

When architecting DR for locality-restricted workloads, you must ensure that data at rest is located in the region you require. You can configure Google Cloud object and file store services to meet your requirements.

Cloud Storage

You can create Cloud Storage buckets that meet locality restrictions.

Beyond the features discussed in the Cloud Storage section of the Disaster Recovery Building Blocks article, when you architect for DR for locality-restricted workloads, consider whether redundancy across regions is a requirement: objects stored in multi-regions and dual-regions are stored in at least two geographically separate areas, regardless of their storage class. This redundancy ensures maximum availability of your data, even during large-scale disruptions, such as natural disasters. Dual-regions achieve this redundancy by using a pair of regions that you choose. Multi-regions achieve this redundancy by using any combination of data centers in the specified multi-region, which might include data centers that are not explicitly listed as available regions.

Data synchronization between the buckets occurs asynchronously. If you need a high degree of confidence that the data has been written to an alternative region to meet your RTO and RPO values, one strategy is to use two single-region buckets. You can then either dual-write the object or write to one bucket and have Cloud Storage copy it to the second bucket.
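The dual-write option can be sketched as follows. This is an illustration of the pattern only: the writers are plain callables so that the example runs without any cloud dependency, and the in-memory dictionaries stand in for two single-region buckets. In a real system, each writer would wrap an upload call from your storage client. All names are hypothetical.

```python
from typing import Callable, Dict

# A writer takes an object name and its bytes and stores them in one bucket.
Writer = Callable[[str, bytes], None]


class DualWriteError(Exception):
    """Raised when the object reached the primary bucket but not the secondary."""


def dual_write(name: str, data: bytes, primary: Writer, secondary: Writer) -> None:
    """Write to both buckets and report success only if both writes succeed."""
    primary(name, data)
    try:
        secondary(name, data)
    except Exception as exc:
        # The caller decides whether to retry or to roll back the primary copy.
        raise DualWriteError(f"{name!r} was written to the primary bucket only") from exc


# In-memory stand-ins for two single-region buckets.
primary_bucket: Dict[str, bytes] = {}
secondary_bucket: Dict[str, bytes] = {}

dual_write("report.csv", b"a,b\n",
           primary_bucket.__setitem__, secondary_bucket.__setitem__)
```

Acknowledging the write to the application only after both copies exist is what gives you confidence about your RPO. The trade-off is higher write latency and the need to handle partial failures.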

Single-region mitigation strategies when using Cloud Storage

If your requirements restrict you to using a single region, then you can't implement an architecture that is redundant across geographic locations using Google Cloud alone. In this scenario, consider using one or more of the following techniques:

  • Adopt a multi-cloud or hybrid strategy. This approach lets you choose another cloud or on-premises solution in the same geographic area as your Google Cloud region. You can store copies of the data that is in your Cloud Storage buckets on-premises, or alternatively, use Cloud Storage as the target for your backup data.

    To use this approach, do the following:

    • Ensure that distance requirements are met.
    • If you are using AWS as your other cloud provider, refer to the Cloud Storage interoperability guide for how to configure access to Amazon S3 using Google Cloud tools.
    • For other clouds and on-premises solutions, consider open source solutions such as MinIO and Ceph to provide an on-premises object store.
    • Consider using Cloud Composer with the gcloud storage command-line utility to transfer data from an on-premises object store to Cloud Storage.
    • Use the Transfer service for on-premises data to copy data stored on-premises to Cloud Storage.
  • Implement encryption techniques. If your locality requirements permit using encryption techniques as a workaround, you can then use multi-region or dual-region buckets.

Filestore

Filestore provides managed file storage that you can deploy in regions and zones according to your locality restriction requirements.

Managed databases

Disaster recovery scenarios for data describes methods for implementing backup and recovery strategies for Google Cloud managed database services. In addition to using these methods, you must also consider locality restrictions for each managed database service that you use in your architecture. For example:

  • Bigtable is available in zonal locations in a region. Production instances have a minimum of two clusters, which must be in unique zones in the region. Replication between clusters in a Bigtable instance is automatically managed by Google. Bigtable synchronizes your data between the clusters, creating a separate, independent copy of your data in each zone where your instance has a cluster. Replication makes it possible for incoming traffic to fail over to another cluster in the same instance.

  • BigQuery has locality restrictions that dictate where your datasets are stored. Dataset locations can be regional or multi-regional. To provide resilience during a regional disaster, you need to back up data to another geographic location. In the case of BigQuery multi-regions, we recommend that you avoid backing up to regions within the scope of the multi-region. If you select the EU multi-region, you exclude Zürich and London from being part of the multi-region configuration. For guidance on implementing a DR solution for BigQuery that addresses the unlikely event of a physical regional loss, see Loss of region.

    To understand the implications of adopting single-region or multi-region BigQuery configurations, see the BigQuery documentation.

  • You can use Firestore to store your Firestore data in either a multi-region location or a regional location. Data in a multi-region location operates in a multi-zone and multi-region replicated configuration. Select a multi-region location if your locality restriction requirements permit it and you want to maximize the availability and durability of your database. Multi-region locations can withstand loss of entire regions and maintain availability without data loss. Data in a regional location operates in a multi-zone replicated configuration.

  • You can configure Cloud SQL for high availability. A Cloud SQL instance configured for HA is also called a regional instance and is located in a primary and secondary zone in the configured region. In a regional instance, the configuration is made up of a primary instance and a standby instance. Ensure that you understand the typical failover time from the primary to the standby instance.

    If your requirements permit, you can configure Cloud SQL with cross-region replicas. If a disaster occurs, the read replica in a different region can be promoted. Because read replicas can be configured for HA in advance, they don't need to go through additional changes after that promotion for HA. You can also configure read replicas to have their own cross-region replicas that can offer immediate protection from regional failures after replica promotion.

  • You can configure Spanner as either regional or multi-region. For any regional configuration, Spanner maintains three read-write replicas, each in a different Google Cloud zone in that region. Each read-write replica contains a full copy of your operational database that is able to serve read-write and read-only requests.

    Spanner uses replicas in different zones so that if a single-zone failure occurs, your database remains available. A Spanner multi-region deployment provides a consistent environment across multiple regions, including two read-write regions and one witness region containing a witness replica. You must validate that the locations of all the regions meet your locality restriction requirements.
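The reason that three replicas in three zones tolerate a single-zone failure follows from majority quorum arithmetic, which is how Spanner's Paxos-based replication commits writes. The sketch below shows that arithmetic only. The count of five voting replicas used for the multi-region case is an illustrative assumption, because the actual number depends on the configuration that you choose.

```python
def quorum_size(voting_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if voting_replicas < 1:
        raise ValueError("at least one voting replica is required")
    return voting_replicas // 2 + 1


def tolerated_replica_failures(voting_replicas: int) -> int:
    """How many voting replicas can be lost while a majority remains."""
    return voting_replicas - quorum_size(voting_replicas)
```

With three voting replicas, one per zone, tolerated_replica_failures(3) is 1, which matches the single-zone failure tolerance described for regional configurations. With an assumed five voting replicas, two can be lost.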

Compute Engine

Compute Engine resources are global, regional, or zonal. Compute Engine resources such as virtual machine instances or zonal persistent disks are referred to as zonal resources. Other resources, such as static external IP addresses, are regional. Regional resources can be used by any resources in that region, regardless of zone, while zonal resources can only be used by other resources in the same zone.

Putting resources in different zones in a region isolates those resources from most types of physical infrastructure failure and infrastructure software-service failures. Also, putting resources in different regions provides an even higher degree of failure independence. This approach lets you design robust systems with resources spread across different failure domains.

For more information, see regions and zones.
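As an illustration of spreading zonal resources across failure domains, the following sketch assigns instances to zones in round-robin order and reports which instances would be lost if one zone failed. The instance names and helper functions are hypothetical. In practice, a regional managed instance group can distribute instances across zones for you.

```python
from itertools import cycle
from typing import Dict, List, Sequence


def assign_zones(instance_names: Sequence[str], zones: Sequence[str]) -> Dict[str, str]:
    """Assign each instance to a zone in round-robin order."""
    if not zones:
        raise ValueError("at least one zone is required")
    return dict(zip(instance_names, cycle(zones)))


def lost_if_zone_fails(assignment: Dict[str, str], failed_zone: str) -> List[str]:
    """Return the instances that run in the failed zone."""
    return sorted(name for name, zone in assignment.items() if zone == failed_zone)


placement = assign_zones(
    ["web-0", "web-1", "web-2", "web-3"],
    ["us-east1-b", "us-east1-c", "us-east1-d"],
)
```

With this placement, the loss of us-east1-c affects only web-1, and the remaining three instances keep serving from the other two zones.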

Using on-premises or another cloud as a production site

You might be using a Google Cloud region that prevents you from using dual or multi-region combinations for your DR architecture. To meet locality restrictions in this case, consider using your own data center or another cloud as the production site or as the failover site.

This section discusses Google Cloud products that are optimized for hybrid workloads. DR architectures that use on-premises and Google Cloud are discussed in Disaster recovery scenarios for applications.

Google Kubernetes Engine

With products like Google Kubernetes Engine (GKE), Google Distributed Cloud, and GKE attached clusters, you can securely run your container-based workloads anywhere. These products enable consistency between on-premises and cloud environments, letting you have a consistent operating model.

As part of your DR strategy, GKE simplifies the configuration and operation of HA and failover architectures across dissimilar environments (between Google Cloud and on-premises or another cloud). You can run your production GKE clusters on-premises and if a disaster occurs, you can fail over to run the same workloads on GKE clusters in Google Cloud.

GKE has three types of clusters:

  • Single-zone cluster. A single-zone cluster has a single control plane running in one zone. This control plane manages workloads on nodes that are running in the same zone.
  • Multi-zonal cluster. A multi-zonal cluster has a single replica of the control plane running in a single zone, and has nodes running in multiple zones.
  • Regional cluster. Regional clusters replicate cluster primaries and nodes across multiple zones in a single region. For example, a regional cluster in the us-east1 region creates replicas of the control plane and nodes in three us-east1 zones: us-east1-b, us-east1-c, and us-east1-d.

Regional clusters are the most resilient to zonal outages.
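The difference between the three cluster types during a zonal outage can be modeled with a small sketch. The model is a simplification that considers only where the control plane replicas and the nodes run, as described in the preceding list. The Cluster class and after_zone_outage function are hypothetical and don't call any GKE API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Cluster:
    """Where a cluster's control plane replicas and nodes run."""
    kind: str  # "single-zone", "multi-zonal", or "regional"
    control_plane_zones: Tuple[str, ...]
    node_zones: Tuple[str, ...]


def after_zone_outage(cluster: Cluster, failed_zone: str) -> Dict[str, bool]:
    """Report what remains available when one zone fails."""
    return {
        "control_plane_available": any(
            zone != failed_zone for zone in cluster.control_plane_zones
        ),
        "nodes_available": any(zone != failed_zone for zone in cluster.node_zones),
    }


zones = ("us-east1-b", "us-east1-c", "us-east1-d")
single_zone = Cluster("single-zone", ("us-east1-b",), ("us-east1-b",))
multi_zonal = Cluster("multi-zonal", ("us-east1-b",), zones)
regional = Cluster("regional", zones, zones)
```

If us-east1-b fails, the single-zone cluster loses both its control plane and its nodes, the multi-zonal cluster keeps nodes in other zones but loses its control plane, and the regional cluster keeps both.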

Note: For more information about region-specific considerations, see Geography and regions.

Google Cloud VMware Engine

Google Cloud VMware Engine lets you run VMware workloads in the cloud. If your on-premises workloads are VMware based, you can architect your DR solution to run on the same virtualization solution that you are running on-premises. You can select the region that meets your locality requirements.

Networking

When your DR plan is based on moving data from on-premises to Google Cloud or from another cloud provider to Google Cloud, then you must address your networking strategy. For more information, see the Transferring data to and from Google Cloud section of the "Disaster recovery building blocks" document.

VPC Service Controls

When planning your DR strategy, you must ensure that the security controls that apply to your production environment also extend to your failover environment. By using VPC Service Controls, you can define a security perimeter from on-premises networks to your projects in Google Cloud.

VPC Service Controls enables a context-aware access approach to controlling your cloud resources. You can create granular access control policies in Google Cloud based on attributes like user identity and IP address. These policies help ensure that the appropriate security controls are in place in your on-premises and Google Cloud environments.
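To make the idea of attribute-based access concrete, the following sketch evaluates a request against an allowed set of identities and source networks. It is a local model for illustration and not the VPC Service Controls or Access Context Manager API. The AccessLevel class, the member string, and the IP range (taken from the documentation-reserved 203.0.113.0/24 block) are assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class AccessLevel:
    """Conditions that a request must satisfy (illustrative model)."""
    allowed_members: FrozenSet[str]
    allowed_networks: Tuple[Network, ...]


def is_allowed(level: AccessLevel, member: str, source_ip: str) -> bool:
    """Allow the request only if both the identity and the source IP match."""
    ip = ipaddress.ip_address(source_ip)
    in_network = any(ip in network for network in level.allowed_networks)
    return member in level.allowed_members and in_network


on_prem_only = AccessLevel(
    allowed_members=frozenset({"user:alice@example.com"}),
    allowed_networks=(ipaddress.ip_network("203.0.113.0/24"),),
)
```

Applying the same conditions to both the production and the failover environment is what keeps access controls consistent after a failover.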

Last updated 2024-07-20 UTC.