Global deployment with Compute Engine and Spanner

Last reviewed 2025-08-12 UTC

This document provides a reference architecture for a multi-tier application that runs on Compute Engine VMs and Spanner in a global topology in Google Cloud. The document also provides guidance to help you build an architecture that uses other Google Cloud infrastructure services. It describes the design factors that you should consider when you build a global architecture for your cloud applications. The intended audience for this document is cloud architects.

This architecture is aligned with the global deployment archetype. We recommend this archetype for applications that serve users across the world and need high availability and robustness against outages in multiple regions. This architecture supports elastic scaling at the network, application, and database levels. It lets you align costs with usage without having to compromise on performance, availability, or scalability.

Architecture

The following diagram shows an architecture for an application that runs on infrastructure that's distributed globally across multiple Google Cloud regions.

Global deployment architecture using Compute Engine and Spanner.

In this architecture, a global load balancer distributes incoming requests to web servers in appropriate regions based on their availability, capacity, and proximity to the source of the traffic. A cross-regional internal load balancing layer handles distribution of traffic from the web servers to the appropriate application servers based on their availability and capacity. The application servers write data to, and read from, a synchronously replicated database that's available in all the regions.
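The routing criteria described above (availability, capacity, and proximity) can be illustrated with a minimal sketch. This is not how Cloud Load Balancing is implemented. The `RegionalBackend` type, the `pick_backend` function, the region names, and the use of latency as a stand-in for proximity are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionalBackend:
    """Hypothetical view of one regional backend as seen by the load balancer."""
    region: str
    healthy: bool         # availability
    spare_capacity: int   # requests per second the backend can still absorb
    latency_ms: float     # stand-in for proximity to the traffic source


def pick_backend(backends: list[RegionalBackend]) -> Optional[RegionalBackend]:
    """Return the closest backend that is healthy and still has capacity."""
    candidates = [b for b in backends if b.healthy and b.spare_capacity > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.latency_ms)


backends = [
    RegionalBackend("us-central1", healthy=True, spare_capacity=0, latency_ms=20.0),
    RegionalBackend("europe-west1", healthy=True, spare_capacity=500, latency_ms=95.0),
    RegionalBackend("asia-east1", healthy=False, spare_capacity=900, latency_ms=60.0),
]
chosen = pick_backend(backends)
print(chosen.region if chosen else "no backend available")
```

In this example, the nearest region is full and the next-nearest region is unhealthy, so the request goes to the remaining region even though it's farther away.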

The architecture includes the following Google Cloud resources:

Component and purpose
Global external load balancer

The global external load balancer receives and distributes user requests to the application. The global external load balancer advertises a single anycast IP address, but the load balancer is implemented as a large number of proxies on Google Front Ends (GFEs). Client requests are directed to the GFE that's closest to the client.

Depending on your requirements, you can use a global external Application Load Balancer or a global external proxy Network Load Balancer. For more information, see Choose a load balancer.

To protect your application against threats like distributed denial-of-service (DDoS) attacks and cross-site scripting (XSS), you can use Google Cloud Armor security policies.

Regional managed instance groups (MIGs) for the web tier

The web tier of the application is deployed on Compute Engine VMs that are part of regional MIGs. These MIGs are the backends for the global load balancer.

Each MIG contains Compute Engine VMs in three different zones. Each of these VMs hosts an independent instance of the web tier of the application.

Cross-region internal load balancing layer

Internal load balancers with cross-regional backends handle the distribution of traffic from the web tier VMs in any region to the application tier VMs across all the regions.

Depending on your requirements, you can use a cross-region internal Application Load Balancer or a cross-region internal proxy Network Load Balancer. For more information, see Choose a load balancer.

Regional MIGs for the application tier

The application tier is deployed on Compute Engine VMs that are part of regional MIGs. These MIGs are the backends for the internal load balancing layer.

Each MIG contains Compute Engine VMs in three different zones. Each VM hosts an independent instance of the application tier.

Spanner multi-region instance

The application writes data to and reads from a multi-region Spanner instance. The multi-region configuration in this architecture includes the following replicas:

  • Four read-write replicas in separate zones across two regions.
  • A witness replica in a third region.

Virtual Private Cloud (VPC) network and subnets

All the resources in the architecture use a single VPC network. The VPC network has the following subnets:

  • A subnet in each region for the web server VMs.
  • A subnet in each region for the application server VMs.
  • (Not shown in the architecture diagram) A proxy-only subnet in each region for the cross-region internal load balancer.

Instead of using a single VPC network, you can create a separate VPC network in each region and connect the networks by using Network Connectivity Center.
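To make the subnet layout concrete, the following sketch carves one web subnet, one application subnet, and one proxy-only subnet per region out of a single address block by using Python's standard `ipaddress` module. The `10.0.0.0/16` block, the `/24` subnet size, the region names, and the `plan_subnets` helper are illustrative assumptions. They aren't values from the reference architecture.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, regions: list[str],
                 new_prefix: int = 24) -> dict[str, dict[str, str]]:
    """Assign non-overlapping subnets for each role in each region."""
    roles = ("web", "app", "proxy-only")
    # subnets() yields consecutive, non-overlapping blocks of the given size.
    pool = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan: dict[str, dict[str, str]] = {}
    for region in regions:
        plan[region] = {role: str(next(pool)) for role in roles}
    return plan


plan = plan_subnets("10.0.0.0/16", ["us-central1", "europe-west1", "asia-east1"])
for region, subnets in plan.items():
    print(region, subnets)
```

Planning the ranges up front helps you avoid overlapping CIDR blocks if you later split the design into one VPC network per region.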

Products used

This reference architecture uses the following Google Cloud products:

  • Compute Engine: A secure and customizable compute service that lets you create and run VMs on Google's infrastructure.
  • Cloud Load Balancing: A portfolio of high performance, scalable, global and regional load balancers.
  • Spanner: A highly scalable, globally consistent, relational database service.

Design considerations

This section provides guidance to help you use this reference architecture to develop an architecture that meets your specific requirements for system design, security and compliance, reliability, cost, operational efficiency, and performance.

Note: The guidance in this section isn't exhaustive. Depending on the specific requirements of your application and the Google Cloud products and features that you use, there might be additional design factors and trade-offs that you should consider.

System design

This section provides guidance to help you to choose Google Cloud regions for your global deployment and to select appropriate Google Cloud services.

Region selection

When you choose the Google Cloud regions where your applications must be deployed, consider the following factors and requirements:

Some of these factors and requirements might involve trade-offs. For example, the most cost-efficient region might not have the lowest carbon footprint. For more information, see Best practices for Compute Engine regions selection.

Compute infrastructure

The reference architecture in this document uses Compute Engine VMs for certain tiers of the application. Depending on the requirements of your application, you can choose from other Google Cloud compute services:

  • Containers: You can run containerized applications in Google Kubernetes Engine (GKE) clusters. GKE is a container-orchestration engine that automates deploying, scaling, and managing containerized applications.
  • Serverless: If you prefer to focus your IT efforts on your data and applications instead of setting up and operating infrastructure resources, then you can use serverless services like Cloud Run.

The decision of whether to use VMs, containers, or serverless services involves a trade-off between configuration flexibility and management effort. VMs and containers provide more configuration flexibility, but you're responsible for managing the resources. In a serverless architecture, you deploy workloads to a preconfigured platform that requires minimal management effort. For more information about choosing appropriate compute services for your workloads in Google Cloud, see Hosting Applications on Google Cloud.

Storage services

The architecture shown in this document uses regional Persistent Disk volumes for the VMs. Regional Persistent Disk volumes provide synchronous replication of data across two zones within a region. Data in Persistent Disk volumes is not replicated across regions.

Google Cloud Hyperdisk provides better performance, flexibility, and efficiency than Persistent Disk. With Hyperdisk Balanced, you can provision IOPS and throughput separately and dynamically, which lets you tune the volume to a wide variety of workloads.

For low-cost storage that's replicated across multiple locations, you can use Cloud Storage regional, dual-region, or multi-region buckets.

  • Data in regional buckets is replicated synchronously across the zones in the region.
  • Data in dual-region or multi-region buckets is stored redundantly in at least two separate geographic locations. Metadata is written synchronously across regions, and data is replicated asynchronously. For dual-region buckets, you can use turbo replication, which ensures that objects are replicated across region pairs, with a recovery point objective (RPO) of 15 minutes. For more information, see Data availability and durability.

To store data that's shared across multiple VMs in a region, such as across all the VMs in the web tier or application tier, you can use a Filestore regional instance. The data that you store in a Filestore regional instance is replicated synchronously across three zones within the region. This replication ensures high availability and robustness against zone outages. You can store shared configuration files, common tools and utilities, and centralized logs in the Filestore instance, and mount the instance on multiple VMs. For robustness against region outages, you can replicate a Filestore instance to a different region. For more information, see Instance replication.

If your database is Microsoft SQL Server, we recommend using Cloud SQL for SQL Server. In scenarios when Cloud SQL doesn't support your configuration requirements, or if you need access to the operating system, you can deploy a Microsoft SQL Server failover cluster instance (FCI). In this scenario, you can use the fully managed Google Cloud NetApp Volumes to provide continuous availability (CA) SMB storage for the database.

When you design storage for your workloads, consider the functional characteristics, resilience requirements, performance expectations, and cost goals. For more information, see Design an optimal storage strategy for your cloud workload.

Database services

The reference architecture in this document uses Spanner, a fully managed, horizontally scalable, globally distributed, and synchronously replicated database. We recommend a multi-regional Spanner configuration for mission-critical deployments that require strong cross-region consistency. Spanner supports synchronous cross-region replication without downtime for failover, maintenance, or resizing.

For information about other managed database services that you can choose from based on your requirements, see Google Cloud databases. When you choose and configure the database for a multi-regional deployment, consider your application's requirements for cross-region data consistency, and be aware of the performance and cost trade-offs.

Network design

Choose a network design that meets your business and technical requirements. You can use a single VPC network or multiple VPC networks. For more information, see the following documentation:

External load balancing options

An architecture that uses a global external load balancer, such as the architecture in this document, supports certain features that help you to enhance the reliability of your deployments. For example, if you use the global external Application Load Balancer, you can implement edge caching by using Cloud CDN.

If your application requires Transport Layer Security (TLS) to be terminated in a specific region, or if you need the ability to serve content from specific regions, you can use regional load balancers with Cloud DNS to route traffic to different regions. For information about the differences between regional and global load balancers, see the following documentation:

Security, privacy, and compliance

This section describes factors that you should consider when you use this reference architecture to design and build a global topology in Google Cloud that meets the security, privacy, and compliance requirements of your workloads.

Protection against external threats

To protect your application against threats like distributed denial-of-service (DDoS) attacks and cross-site scripting (XSS), you can use Google Cloud Armor security policies. Each policy is a set of rules that specifies certain conditions that should be evaluated and actions to take when the conditions are met. For example, a rule could specify that if the source IP address of the incoming traffic matches a specific IP address or CIDR range, then the traffic must be denied. You can also apply preconfigured web application firewall (WAF) rules. For more information, see Security policy overview.
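The source-IP example in the preceding paragraph can be sketched as a small rule evaluator. This is an illustration of the concept only. It doesn't use the Google Cloud Armor API. The `Rule` type, the `evaluate` function, the priority values, and the documentation-reserved IP ranges are assumptions, and the sketch assumes that rules are evaluated in priority order (lowest number first) with a catch-all default rule at the end.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical security-policy rule: match on source ranges, then act."""
    priority: int                 # lower number is evaluated first
    src_ranges: tuple[str, ...]   # IP addresses or CIDR ranges; "*" matches all
    action: str                   # "allow" or "deny"


def evaluate(rules: list[Rule], source_ip: str) -> str:
    """Return the action of the first rule, by priority, that matches the IP."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        for src in rule.src_ranges:
            if src == "*" or ip in ipaddress.ip_network(src, strict=False):
                return rule.action
    raise LookupError("no rule matched; add a catch-all default rule")


rules = [
    Rule(priority=1000, src_ranges=("198.51.100.0/24",), action="deny"),
    Rule(priority=2147483647, src_ranges=("*",), action="allow"),  # default rule
]
print(evaluate(rules, "198.51.100.25"))
print(evaluate(rules, "203.0.113.7"))
```

The first address falls inside the denied CIDR range, so the deny rule applies. The second address matches only the default rule.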

External access for VMs

In the reference architecture that this document describes, the Compute Engine VMs don't need inbound access from the internet. Don't assign external IP addresses to the VMs. Google Cloud resources that have only a private, internal IP address can still access certain Google APIs and services by using Private Service Connect or Private Google Access. For more information, see Private access options for services.

To enable secure outbound connections from Google Cloud resources that have only private IP addresses, like the Compute Engine VMs in this reference architecture, you can use Secure Web Proxy or Cloud NAT.

Service account privileges

For the Compute Engine VMs in the architecture, instead of using the default service accounts, we recommend that you create dedicated service accounts and specify the resources that the service account can access. The default service account has a broad range of permissions, including some that might not be necessary. You can tailor dedicated service accounts to have only the essential permissions. For more information, see Limit service account privileges.

SSH security

To enhance the security of SSH connections to the Compute Engine VMs in your architecture, implement Identity-Aware Proxy (IAP) and Cloud OS Login API. IAP lets you control network access based on user identity and Identity and Access Management (IAM) policies. Cloud OS Login API lets you control Linux SSH access based on user identity and IAM policies. For more information about managing network access, see Best practices for controlling SSH login access.

More security considerations

When you build the architecture for your workload, consider the platform-level security best practices and recommendations that are provided in the Enterprise foundations blueprint and Google Cloud Well-Architected Framework: Security, privacy, and compliance.

Reliability

This section describes design factors that you should consider when you use this reference architecture to build and operate reliable infrastructure for a global deployment in Google Cloud.

MIG autoscaling

When you run your application on multiple regional MIGs, the application remains available during isolated zone outages or region outages. The autoscaling capability of stateless MIGs lets you maintain application availability and performance at predictable levels.

To control the autoscaling behavior of your stateless MIGs, you can specify target utilization metrics, such as average CPU utilization. You can also configure schedule-based autoscaling for stateless MIGs. Stateful MIGs can't be autoscaled. For more information, see Autoscaling groups of instances.
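To show how a target utilization metric translates into a group size, the following sketch uses a simplified proportional model: scale the group so that the observed load would land at the target utilization, and then clamp the result to the configured bounds. This model is an assumption for illustration. It isn't the actual algorithm of the Compute Engine autoscaler, and the `recommended_size` function and its parameters are hypothetical.

```python
def recommended_size(current_size: int, observed_pct: int, target_pct: int,
                     min_size: int, max_size: int) -> int:
    """Estimate the VM count that brings average utilization to the target.

    Utilization values are whole percentages (for example, 60 for 60%).
    """
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in the range 1-100")
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-(current_size * observed_pct) // target_pct)
    return max(min_size, min(max_size, needed))


# Six VMs at 90% average CPU with a 60% target: scale out.
print(recommended_size(6, 90, 60, min_size=3, max_size=12))
# Six VMs at 20% average CPU: scale in, but not below the minimum.
print(recommended_size(6, 20, 60, min_size=3, max_size=12))
```

The minimum and maximum bounds matter as much as the target: the minimum protects availability during quiet periods, and the maximum caps cost during traffic spikes.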

MIG size limit

When you decide the size of your MIGs, consider the default and maximum limits on the number of VMs that can be created in a MIG. For more information, see Add and remove VMs from a MIG.

VM autohealing

Sometimes the VMs that host your application might be running and available, but there might be issues with the application itself. The application might freeze, crash, or not have sufficient memory. To verify whether an application is responding as expected, you can configure application-based health checks as part of the autohealing policy of your MIGs. If the application on a particular VM isn't responding, the MIG autoheals (repairs) the VM. For more information about configuring autohealing, see About repairing VMs for high availability.
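The decision that an application-based health check drives can be sketched as follows, assuming that a VM is considered unhealthy only after a configured number of consecutive failed probes. The `needs_repair` function and the probe history format are hypothetical. Real health checks also involve probe intervals and timeouts, which this sketch omits.

```python
def needs_repair(probe_results: list[bool], unhealthy_threshold: int) -> bool:
    """Return True if the most recent probes all failed.

    probe_results is ordered oldest to newest; True means the application
    responded as expected.
    """
    if unhealthy_threshold < 1:
        raise ValueError("unhealthy_threshold must be at least 1")
    recent = probe_results[-unhealthy_threshold:]
    return len(recent) == unhealthy_threshold and not any(recent)


# Three consecutive failures with a threshold of 3: repair the VM.
print(needs_repair([True, False, False, False], unhealthy_threshold=3))
# A single recent success resets the streak: leave the VM alone.
print(needs_repair([False, False, True], unhealthy_threshold=3))
```

Requiring several consecutive failures helps avoid recreating a VM because of one slow response.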

VM placement

In the architecture that this document describes, the application tier and web tier run on Compute Engine VMs that are distributed across multiple zones. This distribution ensures that your application is robust against zone outages.

To improve the robustness of the architecture, you can create a spread placement policy and apply it to the MIG template. When the MIG creates VMs, it places the VMs within each zone on different physical servers (called hosts), so your VMs are robust against failures of individual hosts. For more information, see Create and apply spread placement policies to VMs.

VM capacity planning

To make sure that capacity for Compute Engine VMs is available when VMs need to be provisioned, you can create reservations. A reservation provides assured capacity in a specific zone for a specified number of VMs of a machine type that you choose. A reservation can be specific to a project, or shared across multiple projects. For more information about reservations, see Choose a reservation type.

Stateful storage

A best practice in application design is to avoid the need for stateful local disks. But if the requirement exists, you can configure your persistent disks to be stateful to ensure that the data is preserved when the VMs are repaired or recreated. However, we recommend that you keep the boot disks stateless, so that you can update them to the latest images with new versions and security patches. For more information, see Configuring stateful persistent disks in MIGs.

Data durability

You can use Backup and DR to create, store, and manage backups of the Compute Engine VMs. Backup and DR stores backup data in its original, application-readable format. When required, you can restore your workloads to production by directly using data from long-term backup storage and avoid the need to prepare or move data.

Compute Engine provides the following options to help you to ensure the durability of data that's stored in Persistent Disk volumes:

Database reliability

Data that's stored in a multi-region Spanner instance is replicated synchronously across multiple regions. The Spanner configuration that's shown in the preceding architecture diagram includes the following replicas:

  • Four read-write replicas in separate zones across two regions.
  • A witness replica in a third region.

A write operation to a multi-region Spanner instance is acknowledged after at least three replicas (in separate zones across two regions) have committed the operation. If a zone or region failure occurs, Spanner has access to all of the data, including data from the latest write operations, and it continues to serve read and write requests.
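The quorum arithmetic behind this behavior can be checked with a short sketch. It assumes that the four read-write replicas are split evenly across two regions, that the witness replica votes on writes, and that a write needs a majority of the five voting replicas (three, which matches the number stated above). The region names and the `has_write_quorum` function are hypothetical.

```python
# Voting replicas per region: two read-write replicas in each of two
# regions, plus one witness replica in a third region (assumed layout).
LAYOUT = {"region-a": 2, "region-b": 2, "region-c": 1}


def has_write_quorum(layout: dict[str, int], failed_regions: set[str]) -> bool:
    """Return True if the surviving replicas still form a majority."""
    total = sum(layout.values())
    alive = sum(n for region, n in layout.items() if region not in failed_regions)
    return alive >= total // 2 + 1


for region in LAYOUT:
    print(f"lose {region}: quorum intact = {has_write_quorum(LAYOUT, {region})}")
print("lose two regions:", has_write_quorum(LAYOUT, {"region-a", "region-b"}))
```

Under these assumptions, the loss of any single region leaves at least three voting replicas, so writes can still commit. The simultaneous loss of two regions breaks the majority.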

Spanner uses disaggregated storage where the compute and storage resources are decoupled. You don't have to move data when you add compute capacity for HA or scaling. The new compute resources get data when they need it from the closest Colossus node. This makes failover and scaling faster and less risky.

Spanner provides external consistency, which is a stricter property than serializability for transaction-processing systems. For more information, see the following:

More reliability considerations

When you build the cloud architecture for your workload, review the reliability-related best practices and recommendations that are provided in the following documentation:

Cost optimization

This section provides guidance to optimize the cost of setting up and operating a global Google Cloud topology that you build by using this reference architecture.

VM machine types

To help you optimize the resource utilization of your VM instances, Compute Engine provides machine type recommendations. Use the recommendations to choose machine types that match your workload's compute requirements. For workloads with predictable resource requirements, you can customize the machine type to your needs and save money by using custom machine types.

VM provisioning model

If your application is fault tolerant, then Spot VMs can help to reduce your Compute Engine costs for the VMs in the application and web tiers. The cost of Spot VMs is significantly lower than regular VMs. However, Compute Engine might preemptively stop or delete Spot VMs to reclaim capacity.

Spot VMs are suitable for batch jobs that can tolerate preemption and don't have high availability requirements. Spot VMs offer the same machine types, options, and performance as regular VMs. However, when the resource capacity in a zone is limited, MIGs might not be able to scale out (that is, create VMs) automatically to the specified target size until the required capacity becomes available again.

VM resource utilization

The autoscaling capability of stateless MIGs enables your application to handle increases in traffic gracefully, and it helps you to reduce cost when the need for resources is low. Stateful MIGs can't be autoscaled.

Database cost

Spanner helps ensure that your database costs are predictable. The compute capacity that you specify (number of nodes or processing units) determines the storage capacity. The read and write throughputs scale linearly with compute capacity. You pay for only what you use. When you need to align costs with the needs of your workload, you can adjust the size of your Spanner instance.

Third-party licensing

When you migrate third-party workloads to Google Cloud, you might be able to reduce cost by bringing your own licenses (BYOL). For example, to deploy Microsoft Windows Server VMs, instead of using a premium image that incurs additional cost for the third-party license, you can create and use a custom Windows BYOL image. You then pay only for the VM infrastructure that you use on Google Cloud. This strategy helps you continue to realize value from your existing investments in third-party licenses.

If you decide to use the BYOL approach, then the following recommendations might help to reduce cost:

  • Provision the required number of compute CPU cores independently of memory by using custom machine types. By doing this, you limit the third-party licensing cost to the number of CPU cores that you need.
  • Reduce the number of vCPUs per core from 2 to 1 by disabling simultaneous multithreading (SMT).
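The effect of the second recommendation is simple arithmetic, sketched below for a hypothetical VM with eight physical cores. Whether a vendor counts licenses per vCPU or per physical core varies, so treat the per-vCPU licensing in this example as an assumption and check the vendor's licensing terms. The `visible_vcpus` function is illustrative.

```python
def visible_vcpus(physical_cores: int, threads_per_core: int) -> int:
    """Return the number of vCPUs that the guest OS sees."""
    if threads_per_core not in (1, 2):
        raise ValueError("threads_per_core must be 1 (SMT off) or 2 (SMT on)")
    return physical_cores * threads_per_core


cores = 8  # hypothetical VM size
with_smt = visible_vcpus(cores, threads_per_core=2)
without_smt = visible_vcpus(cores, threads_per_core=1)
print(f"SMT on:  {with_smt} vCPUs to license")
print(f"SMT off: {without_smt} vCPUs to license")
```

For software that's licensed per vCPU, disabling SMT in this example halves the number of licensed units while the workload keeps the same eight physical cores.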

If you deploy a third-party database like Microsoft SQL Server on Compute Engine VMs, then you must consider the license costs for the third-party software. When you use a managed database service like Cloud SQL, the database license costs are included in the charges for the service.

More cost considerations

When you build the architecture for your workload, also consider the general best practices and recommendations that are provided in Google Cloud Well-Architected Framework: Cost optimization.

Operational efficiency

This section describes the factors that you should consider when you use this reference architecture to design and build a global Google Cloud topology that you can operate efficiently.

VM configuration updates

To update the configuration of the VMs in a MIG (such as the machine type or boot-disk image), you create a new instance template with the required configuration and then apply the new template to the MIG. The MIG updates the VMs by using the update method that you choose: automatic or selective. Choose an appropriate method based on your requirements for availability and operational efficiency. For more information about these MIG update methods, see Apply new VM configurations in a MIG.

VM images

For your VMs, instead of using Google-provided public images, we recommend that you create and use custom OS images that contain the configurations and software that your applications require. You can group your custom images into a custom image family. An image family always points to the most recent image in that family, so your instance templates and scripts can use that image without you having to update references to a specific image version. You must regularly update your custom images to include the security updates and patches that are provided by the OS vendor.
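The way that an image family reference resolves to a concrete image can be modeled with a short sketch. The `Image` type, the `resolve_family` function, and the image names and dates are hypothetical. The sketch also assumes that deprecated images are skipped during resolution. It models the concept and doesn't call the Compute Engine API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Image:
    """Hypothetical record for a custom OS image."""
    name: str
    family: str
    created: datetime
    deprecated: bool = False


def resolve_family(images: list[Image], family: str) -> Optional[Image]:
    """Return the newest image in the family that isn't deprecated."""
    candidates = [i for i in images if i.family == family and not i.deprecated]
    return max(candidates, key=lambda i: i.created, default=None)


images = [
    Image("web-tier-v1", "web-tier", datetime(2025, 1, 10)),
    Image("web-tier-v2", "web-tier", datetime(2025, 3, 5)),
    Image("web-tier-v3", "web-tier", datetime(2025, 6, 1), deprecated=True),
]
latest = resolve_family(images, "web-tier")
print(latest.name if latest else "no usable image")
```

Because templates refer to the family instead of a specific image, publishing a patched image (or deprecating a bad one) changes what new VMs boot from without any template edits.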

Deterministic instance templates

If the instance templates that you use for your MIGs include startup scripts to install third-party software, make sure that the scripts explicitly specify software-installation parameters such as the software version. Otherwise, when the MIG creates the VMs, the software that's installed on the VMs might not be consistent. For example, if your instance template includes a startup script to install Apache HTTP Server (the apache2 package), then make sure that the script specifies the exact apache2 version that should be installed, such as version 2.4.53. For more information, see Deterministic instance templates.
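One way to enforce this practice is to lint startup scripts before you publish an instance template. The following sketch flags `apt-get install` arguments that lack a `package=version` pin. The `unpinned_packages` function, the sample scripts, and the `APACHE2_VERSION` variable are hypothetical, and the regular expression handles only simple one-line install commands.

```python
import re

# Matches the arguments of a simple, single-line `apt-get install` command.
INSTALL_RE = re.compile(r"apt-get\s+install\s+(?P<args>[^\n;&|]+)")


def unpinned_packages(script: str) -> list[str]:
    """Return package arguments that don't pin a version with `name=version`."""
    unpinned = []
    for match in INSTALL_RE.finditer(script):
        for token in match.group("args").split():
            if token.startswith("-"):
                continue  # command-line option such as -y
            if "=" not in token:
                unpinned.append(token)
    return unpinned


loose_script = "apt-get update\napt-get install -y apache2\n"
# APACHE2_VERSION is a placeholder that you would set to the exact
# package version string available in your OS image's repositories.
pinned_script = "apt-get update\napt-get install -y apache2=${APACHE2_VERSION}\n"

print(unpinned_packages(loose_script))
print(unpinned_packages(pinned_script))
```

A check like this can run in a build pipeline so that an unpinned install fails review instead of producing inconsistent VMs later.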

Migration to Spanner

You can migrate your data to Spanner from other databases like MySQL, SQL Server, and Oracle Database. The migration process depends on factors like the source database, the size of your data, downtime constraints, and complexity of the application code. To help you plan and implement the migration to Spanner efficiently, we provide a range of Google Cloud and third-party tools. For more information, see Migration overview.

Database administration

With Spanner, you don't need to configure or monitor replication or failover. Synchronous replication and automatic failover are built-in. Your application experiences zero downtime for database maintenance and failover. To further reduce operational complexity, you can configure autoscaling. With autoscaling enabled, you don't need to monitor and scale the instance size manually.

More operational considerations

When you build the architecture for your workload, consider the general best practices and recommendations for operational efficiency that are described in Google Cloud Well-Architected Framework: Operational excellence.

Performance optimization

This section describes the factors that you should consider when you use this reference architecture to design and build a global topology in Google Cloud that meets the performance requirements of your workloads.

Network performance

For workloads that need low inter-VM network latency within the application and web tiers, you can create a compact placement policy and apply it to the MIG template that's used for those tiers. When the MIG creates VMs, it places the VMs on physical servers that are close to each other. While a compact placement policy helps improve inter-VM network performance, a spread placement policy can help improve VM availability as described earlier. To achieve an optimal balance between network performance and availability, when you create a compact placement policy, you can specify how far apart the VMs must be placed. For more information, see Placement policies overview.

Compute Engine has a per-VM limit for egress network bandwidth. This limit depends on the VM's machine type and whether traffic is routed through the same VPC network as the source VM. For VMs with certain machine types, to improve network performance, you can get a higher maximum egress bandwidth by enabling Tier_1 networking.

Compute performance

Compute Engine offers a wide range of predefined and customizable machine types for the workloads that you run on VMs. Choose an appropriate machine type based on your performance requirements. For more information, see Machine families resource and comparison guide.

VM multithreading

Each virtual CPU (vCPU) that you allocate to a Compute Engine VM is implemented as a single hardware multithread. By default, two vCPUs share a physical CPU core. For applications that involve highly parallel operations or that perform floating point calculations (such as genetic sequence analysis and financial risk modeling), you can improve performance by reducing the number of threads that run on each physical CPU core. For more information, see Set the number of threads per core.

VM multithreading might have licensing implications for some third-party software, like databases. For more information, read the licensing documentation for the third-party software.

Network Service Tiers

Network Service Tiers lets you optimize the network cost and performance of your workloads. You can choose Premium Tier or Standard Tier. Premium Tier delivers traffic on Google's global backbone to achieve minimal packet loss and low latency. Standard Tier delivers traffic using peering, internet service providers (ISP), or transit networks at an edge point of presence (PoP) that's closest to the region where your Google Cloud workload runs. To optimize performance, we recommend using Premium Tier. To optimize cost, we recommend using Standard Tier.

The architecture in this document uses a global external load balancer with an external IP address and backends in multiple regions. This architecture requires you to use Premium Tier, which uses Google's highly reliable global backbone to help you achieve minimal packet loss and latency.

If you use regional external load balancers and route traffic to regions by using Cloud DNS, then you can choose Premium Tier or Standard Tier depending on your requirements. The pricing for Standard Tier is lower than Premium Tier. Standard Tier is suitable for traffic that isn't sensitive to packet loss and that doesn't have low latency requirements.

Spanner performance

When you provision a Spanner instance, you specify the compute capacity of the instance in terms of the number of nodes or processing units. Monitor the resource utilization of your Spanner instance, and scale the capacity based on the expected load and your application's performance requirements. You can scale the capacity of a Spanner instance manually or automatically. For more information, see Autoscaling overview.
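When you plan capacity changes, it can help to convert between the two units and round to a size that you can provision. The following sketch assumes that one node corresponds to 1,000 processing units, and that instances are sized in increments of 100 processing units up to 1,000 and in increments of 1,000 beyond that. Verify both assumptions against the current Spanner documentation. The function names are hypothetical.

```python
PU_PER_NODE = 1000  # assumed conversion: 1 node = 1,000 processing units


def valid_processing_units(required_pu: int) -> int:
    """Round a required capacity up to the next provisionable size."""
    if required_pu <= 0:
        raise ValueError("required_pu must be positive")
    step = 100 if required_pu <= PU_PER_NODE else PU_PER_NODE
    return -(-required_pu // step) * step  # integer ceiling to the step


def to_nodes(processing_units: int) -> float:
    """Express a processing-unit count in nodes."""
    return processing_units / PU_PER_NODE


for required in (350, 1000, 2500):
    sized = valid_processing_units(required)
    print(f"need {required} PU -> provision {sized} PU ({to_nodes(sized)} nodes)")
```

Rounding up instead of down keeps headroom for load spikes, at the cost of paying for the unused fraction.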

With a multi-region configuration, Spanner replicates data synchronously across multiple regions. This replication enables low-latency read operations from multiple locations. The trade-off is higher latency for write operations, because the quorum replicas are spread across multiple regions. To minimize the latency for read-write transactions in a multi-region configuration, Spanner uses leader-aware routing (enabled by default).

For recommendations to optimize the performance of your Spanner instance and databases, see the following documentation:

Caching

If your application serves static website assets and if your architecture includes a global external Application Load Balancer, then you can use Cloud CDN to cache regularly accessed static content closer to your users. Cloud CDN can help to improve performance for your users, reduce your infrastructure resource usage in the backend, and reduce your network delivery costs. For more information, see Faster web performance and improved web protection for load balancing.

More performance considerations

When you build the architecture for your workload, consider the general best practices and recommendations that are provided in Google Cloud Well-Architected Framework: Performance optimization.


Last updated 2025-08-12 UTC.