Best practices for Compute Engine region selection

This article describes criteria to consider when choosing which Google Cloud regions to use for your Compute Engine resources, a decision that is typically made by cloud architects or engineering management. This document primarily focuses on the latency aspect of the selection process and is intended for apps accessed by consumers, such as mobile or web apps or games, but many of the concepts can apply to other use cases.

Google offers multiple regions worldwide to deploy your Compute Engine resources. Several factors play a role in choosing your regions:

  • Region-specific restrictions
  • User latency by region
  • Latency requirements of your app
  • Amount of control over latency
  • Balance between low latency and simplicity

Terminology

region
An independent geographic area where you run your resources. Each region consists of zones, typically at least three zones.
zone
A deployment area for Google Cloud resources within a region. Putting resources in different zones in a region reduces the risk of an infrastructure outage affecting all resources simultaneously.
Compute Engine resources
Resources in Compute Engine, such as virtual machine instances, are deployed in a zone within a region. Other products, such as Google Kubernetes Engine and Dataflow, use Compute Engine resources and can therefore be deployed in the same regions and zones.
round-trip time (RTT)
The time it takes to send an IP packet and to receive the acknowledgment.

When to choose your Compute Engine regions

Early in the architecture phase of an app, decide how many and which Compute Engine regions to use. Your choice might affect your app, for example:

  • The architecture of your app might change if you sync some data between copies, because the same users could connect through different regions at different times.
  • Price differs by region.
  • Moving an app and its data between regions is cumbersome, and sometimes costly, so it should be avoided once the app is live.
Note: The availability of regions and the user profile of apps might change after launch, in which case this document can still provide guidance.

Factors to consider when selecting regions

It's common for people to deploy in a region where they're located, but they fail to consider whether this provides the best user experience. Suppose that you're located in Europe with a global user base and want to deploy in a single region. Most people would consider deploying in a region in Europe, but it is usually better to host this app in one of the US regions, because the US is the most connected to other regions.

Multiple factors affect where you decide to deploy your app.

Latency

The main factor to consider is the latency your user experiences. However, thisis a complex problem because user latency is affected by multiple aspects, suchas caching and load-balancing mechanisms.

In enterprise use cases, latency to on-premises systems or latency for a certain subset of users or partners is more critical. For example, choosing the closest region to your developers or to on-premises database services interconnected with Google Cloud might be the deciding factor.

Pricing

Google Cloud resource costs differ by region. Use the Compute Engine pricing documentation and the Google Cloud pricing calculator to estimate the price in the regions you're considering.

If you decide to deploy in multiple regions, be aware that there are data transfer charges for data synced between regions.

Colocation with other Google Cloud services

Colocate your Compute Engine resources with other Google Cloud services wherever possible. While most latency-sensitive services are available in every region, some services are available only in specific locations.

Machine-type availability

Not all CPU platforms and machine types are available in every region. The availability of specific CPU platforms or specific instance types differs by region and even by zone. If you want to deploy resources using certain machine types, find out about the zonal availability of these resources.
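As a quick check, the following sketch lists the zones that offer a given machine type. It assumes the google-cloud-compute Python client library; the project ID and machine type are hypothetical placeholders.

```python
# Sketch: list the zones that offer a given machine type, assuming the
# google-cloud-compute client library (pip install google-cloud-compute).
# The project ID and machine type below are hypothetical.
from google.cloud import compute_v1

PROJECT = "my-project-id"
MACHINE_TYPE = "n2-standard-8"

client = compute_v1.MachineTypesClient()

# aggregated_list yields (scope, scoped_list) pairs, one per zone,
# where scope looks like "zones/us-central1-a".
for scope, scoped_list in client.aggregated_list(project=PROJECT):
    for machine_type in scoped_list.machine_types:
        if machine_type.name == MACHINE_TYPE:
            print(scope.removeprefix("zones/"))
```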

Resource quotas

Your ability to deploy Compute Engine resources is limited by regional resource quotas, so make sure that you request sufficient quota for the regions you plan to deploy in. If you are planning an especially large deployment, work with the sales team early to discuss your region selection choices and to ensure that you have sufficient quota capacity.
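For example, the following sketch flags regional quotas that are close to their limits. It assumes the google-cloud-compute Python client library and a hypothetical project ID.

```python
# Sketch: flag regional quotas that are close to their limit, assuming the
# google-cloud-compute client library and a hypothetical project ID.
from google.cloud import compute_v1

PROJECT = "my-project-id"
REGION = "us-central1"

region = compute_v1.RegionsClient().get(project=PROJECT, region=REGION)

# Each quota entry reports a metric name, the current usage, and the limit.
for quota in region.quotas:
    if quota.limit and quota.usage / quota.limit > 0.8:
        print(f"{quota.metric}: {quota.usage:.0f} of {quota.limit:.0f} used")
```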

Carbon-free energy percentage

To power each Google Cloud region, Google uses electricity from the grid where the region is located. This electricity generates more or less carbon emissions, depending on the type of power plants generating electricity for that grid and on when Google consumes it. Google recently set the goal that by 2030, carbon-free electricity will power your applications in the time and the place that you need them, 24 hours a day, in every Google Cloud region.

Until that goal is achieved, each Google Cloud region will be supplied by a mix of carbon-based and carbon-free energy sources every hour. We call this metric our carbon-free energy percentage (CFE%), and we publish the CFE% for Google Cloud regions. For new applications on Google Cloud, you can use this data to begin incorporating carbon impact into your architecture decisions. Choosing a region with a higher CFE% means that, on average, your application will be powered with carbon-free energy for a higher percentage of the hours that it runs, reducing the gross carbon emissions of that application.

Evaluate latency requirements

Latency is often the key consideration for your region selection because high user latency can lead to an inferior user experience. You can affect some aspects of latency, but some are outside of your control.

When optimizing for latency, many system architects consider only network latency or the distance between the user's ISP and the virtual machine instance. However, this is only one of many factors affecting user latency, as you can see in the following diagram.

Evaluate latency in Compute Engine region selection

As an app architect, you can optimize the region selection and app latency, but you have no control over the users' last mile and latency to the closest Google edge point of presence (POP).

Region selection can only affect the latency to the Compute Engine region and not the entirety of the latency. Depending on the use case, this might be only a small part of overall latency. For example, if your users are primarily using cellular networks, it might not be valuable to try to optimize your regions, as this hardly affects total user latency.

Last mile latency

The latency of this segment differs depending on the technology used to access the internet. For example, the typical latency to reach an ISP is 1-10 ms on modern networks. Conversely, typical latencies on a 3G cellular network are 100-500 ms. The latency range for DSL and cable providers is roughly 10-60 ms.

Google frontend and edge POP latency

Depending on your deployment model, the latency to Google's network edge is also important. This is where global load-balancing products terminate TCP and SSL sessions and from where Cloud CDN delivers cached results. Based on the content served, many round trips might already end here because only part of the data needs to be retrieved the whole way. This latency might be significantly higher if you use the Standard network service tier.

Compute Engine region latency

The user request enters Google's network at the edge POP. The Compute Engine region is where the Google Cloud resources handling requests are located. This segment is the latency between the edge POP and the Compute Engine region, and it sits wholly within Google's global network.

App latency

This is the latency of the app responding to requests, including the time the app needs to process the request.

Different apps have different latency requirements. Depending on the app, users are more forgiving of latency issues. Apps that interact asynchronously, or mobile apps with a high latency threshold (100 milliseconds or more), can be deployed in a single region without degrading the user experience. However, for apps such as real-time games, a few milliseconds of latency can have a greater effect on user experience. Deploy these types of apps in multiple regions close to the users.

Global deployment patterns

This section explains how various deployment models affect latency factors.

Single region deployment

The following image illustrates a single region deployment.

Latency of single frontend deployment

Even if your app serves a global user base, in many cases a single region is still the best choice. The lower latency benefits might not outweigh the added complexity of a multi-region deployment. Even with a single region deployment, you can still use optimizations, such as Cloud CDN and global load balancing, to reduce user latency. You can choose to use a second region for backup and disaster recovery reasons, but this does not affect the app's serving path and therefore won't affect user latency.

Distributed frontend in multiple regions and backend in a single region

The following diagram shows a deployment model where you distribute the frontend across multiple regions but limit the backend to a single region. This model gives you the benefit of lower latency to the frontend servers while not having to sync data across multiple regions.

Latency of distributed frontend deployment

This deployment model provides low user latency in scenarios where the average user request involves no data requests, or involves just a few data requests to the central backend, before the app can produce a result. An example is an app that deploys an intelligent caching layer on the frontend or that handles data writes asynchronously. An app that makes many requests requiring a full round trip to the backend might not benefit from this model.

Distributed frontend and backend in multiple regions

A deployment model where you distribute the frontend and backend in multiple regions lets you minimize user latency because the app can fully answer any request locally. However, this model comes with added complexity because all data needs to be stored and accessible locally. To answer all user requests, data needs to be fully replicated across all regions.

Latency of distributed multi deployment

Spanner, the globally consistent managed database offering, has a three-continent multi-regional option where, in addition to read-write replicas in the US, two read replicas are situated in Europe and Asia. This option provides low-latency read access to the data for compute instances situated in the US, Europe, or Asia. If your service is targeting the US, a multi-regional option with replication within the US also exists.

If you decide to run your own database service on Compute Engine, you replicate the data yourself. This replication is a significant undertaking because keeping data consistently synced globally is difficult. It is easier to manage if writes go to a single region and the other regions host read-only replicas of the database that are updated asynchronously.

Replicating databases across regions is difficult, and we recommend engaging a strong partner with experience in this area, such as Datastax for Cassandra replication.

Multiple parallel apps

Depending on the nature of your app, a variation of the previous approach lets you preserve low user latency while reducing the need for constant data replication. As illustrated in the following image, there are multiple parallel apps, all consisting of a frontend and backend, and users are directed to the correct app. Only a small fraction of data is shared between the sites.

Latency of parallel apps

For example, when running a retail business, you might serve users in different regions through different country domains and run parallel sites in all those regions, only syncing product and user data when necessary. Local sites maintain their local stock availability, and users connect to a locally hosted site by selecting a country domain. When a user visits a different country domain, they are redirected to the correct domain.

Another example is real-time games. You might only have a global lobby service where users choose a game room or world close to their location, and those rooms or worlds do not share data with each other.

A third example is offering Software-as-a-Service (SaaS) in different regions, where the data location is selected upon account creation, either based on user location or the user's choice. After they log in, the user is redirected to a location-specific subdomain and uses the service regionally.

Optimize latency between users and regions

Regardless of your deployment model, you can combine optimization methods to reduce the visible latency to the end user. Some of these methods are Google Cloud features, while others require you to change your app.

Use Premium Tier networking

Google offers Premium (default) and Standard Network Service Tiers. Standard Tier traffic is delivered over transit ISPs from Google Cloud regions, while Premium Tier offers lower latency by delivering the traffic through Google's global private network. Premium Tier networking reduces user latency and should be used for all parts of the app in the serving path. Premium Tier networking is also necessary to use Google's global load-balancing products.
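As an illustration, the following partial sketch shows how an instance's external access config could request Premium Tier. It assumes the google-cloud-compute Python client library, and only the network-interface portion of an instance insert request is shown.

```python
# Partial sketch: request Premium Tier for a VM's external address, assuming
# the google-cloud-compute client library. Disks, machine type, and the rest
# of the instance definition are omitted.
from google.cloud import compute_v1

access_config = compute_v1.AccessConfig(
    name="External NAT",
    type_="ONE_TO_ONE_NAT",
    network_tier="PREMIUM",  # "STANDARD" would route over transit ISPs instead
)

network_interface = compute_v1.NetworkInterface(
    network="global/networks/default",
    access_configs=[access_config],
)

# Attach network_interface to a compute_v1.Instance and create it with
# compute_v1.InstancesClient().insert(...) as usual.
```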

Use Cloud Load Balancing and Cloud CDN

Cloud Load Balancing products, such as HTTP(S) Load Balancing and TCP and SSL proxy load balancing, let you automatically direct users to the closest region where there are backends with available capacity.

Even if your app is only in a single region, using Cloud Load Balancing still provides lower user latency because TCP and SSL sessions are terminated at the network edge, and it also lets you terminate user traffic with HTTP/2 and Quick UDP Internet Connections (QUIC). You can also integrate Cloud CDN with HTTP(S) load balancing to deliver static assets directly from the network edge, further reducing user latency.

Cache locally

When your frontend locations are different from your backend locations, make sure to cache answers from backend services whenever possible. Even when the frontend and backend are in the same region, caching reduces app latency because it avoids time-consuming queries. Memorystore for Redis is a fully managed in-memory data store that you can use for this purpose.
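A minimal read-through caching sketch, assuming the redis-py client, a Memorystore for Redis instance reachable at a private IP, and a hypothetical fetch_from_backend() call to the central backend:

```python
# Sketch: a read-through cache in front of a backend query, assuming the
# redis-py client and a Memorystore for Redis instance at REDIS_HOST.
# fetch_from_backend() is a hypothetical function that calls the remote backend.
import json
import redis

REDIS_HOST = "10.0.0.3"   # hypothetical Memorystore private IP
CACHE_TTL_SECONDS = 300   # how long a cached answer stays fresh

cache = redis.Redis(host=REDIS_HOST, port=6379)

def get_product(product_id: str) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)              # served locally, no cross-region hop
    product = fetch_from_backend(product_id)   # slow call to the central backend
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(product))
    return product
```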

Optimize your app client or web frontend

You can use techniques on the client side, either in a mobile app or in the web frontend, to optimize user latency. For example, preload some assets or cache data within the app.

You can also optimize the way your app fetches information by reducing the number of requests and by retrieving information in parallel whenever possible.
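For example, the following sketch fetches several independent resources in parallel with a thread pool instead of sequentially; the URLs are hypothetical.

```python
# Sketch: fetch independent resources in parallel, using the standard library
# plus the requests package. The URLs below are hypothetical.
from concurrent.futures import ThreadPoolExecutor
import requests

URLS = [
    "https://api.example.com/profile",
    "https://api.example.com/recommendations",
    "https://api.example.com/notifications",
]

def fetch(url: str) -> bytes:
    return requests.get(url, timeout=5).content

# Total wait time approaches the slowest single request rather than the sum.
with ThreadPoolExecutor(max_workers=len(URLS)) as pool:
    responses = list(pool.map(fetch, URLS))
```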

Measure user latency

Once you establish a baseline of your latency requirements, look at your user base to decide the best placement of your Google Cloud resources. Depending on whether this is a new or existing app, there are different strategies to employ.

Use the following strategies to measure latency to partners that you access during app serving, or to measure latency to your on-premises network that might be interconnected to your Google Cloud project using Cloud VPN or Dedicated Interconnect.

Estimate latency for new workloads

If you don't have an existing app with a similar user base to your new app, estimate latency from various Google Cloud regions based on the rough location distribution of your expected user base.

Estimate 1 ms of round-trip latency for every 100 km traveled. Because networks do not follow an ideal path from source to destination, you can usually assume that the actual distance is around 1.5 to 2 times the distance measured on a map. Of course, in some less densely populated regions, networks might follow an even less ideal path. The latency added through active equipment within ISP networks is usually negligible when looking at cross-regional distances.
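The following sketch applies these rules of thumb: it takes the great-circle distance between a user location and a region, inflates it by a path factor, and converts it to an approximate round-trip time. The coordinates are illustrative.

```python
# Sketch: rough round-trip estimate from great-circle distance, using the
# rules of thumb above (about 1 ms RTT per 100 km, path 1.5-2x map distance).
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def estimated_rtt_ms(map_distance_km, path_factor=1.75):
    """Apply the 1 ms per 100 km rule to an inflated path length."""
    return map_distance_km * path_factor / 100

# Example: a user in Berlin (52.5, 13.4) to a region near Iowa (41.2, -95.9).
distance = haversine_km(52.5, 13.4, 41.2, -95.9)
print(f"~{distance:.0f} km map distance, ~{estimated_rtt_ms(distance):.0f} ms RTT")
```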

These numbers can help you estimate latency to edge POPs and Cloud CDN nodes, as well as to Compute Engine regions around the globe as listed on the network map.

Measure latency to existing users

If you already have an existing app with a similar user base, there are several tools that you can use to better estimate latencies.

  • Representative users: If you have users or partners that represent a cross-section of your geographic distribution and that are willing to work with you, or employees in those countries, ask them to measure the latency to various Google Cloud regions. Third-party websites such as Google Cloud ping can help you get some measurements.
  • Access logs: If you have an active app hosted outside of Google Cloud, use data from the access logs to get a rough cross-section of users. Your logs might provide country or city information, which also lets you estimate latencies.
  • IP address: If you have access to your users' IP addresses, create scripts to test reachability and latencies from various Google Cloud regions, as in the sketch after this list. If a firewall blocks your probes, try to randomize the last IP octet to get a response from another device with similar latency to your app.
  • Latency information from Google Cloud: If you have an existing app in Google Cloud, there are several ways to collect latency information.
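A simple probing sketch that measures TCP connect time from a test host to candidate endpoints; the endpoint addresses are placeholders that you would replace with test VMs in candidate regions or user addresses you are allowed to probe.

```python
# Sketch: measure TCP connect time to candidate endpoints. The addresses
# below are documentation placeholders, not real probe targets.
import socket
import statistics
import time

ENDPOINTS = {
    "us-central1": ("203.0.113.10", 443),
    "europe-west1": ("203.0.113.20", 443),
}

def connect_time_ms(host: str, port: int) -> float:
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=3):
        pass
    return (time.monotonic() - start) * 1000

for region, (host, port) in ENDPOINTS.items():
    samples = [connect_time_ms(host, port) for _ in range(5)]
    print(f"{region}: median {statistics.median(samples):.1f} ms")
```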

Global connectivity

When estimating latency, keep the topology of Google's global network in mind.

  • POPs: Where user traffic enters the network.
  • Cloud CDN nodes: Where traffic is cached.
  • Regions: Where your resources can be located.
  • Connectivity: Between the POPs and regions.

Find a list of locations where Google interconnects with other ISPs in PeeringDB.

Make sure to take interregional topology into consideration when deciding which regions to deploy in. For example, if you want to deploy an app with a global user base in a single region, it is usually best to have this app hosted in one of the US regions, because the US is connected to most other regions. Although there is direct connectivity between many continents, there are cases where it is missing, for instance between Europe and Asia, so traffic between Europe and Asia flows through the US.

If your app is hosted across multiple regions and you need to synchronize data, be aware of the latency between those regions. While this latency can change over time, it is usually stable. Either measure latency yourself by bringing up test instances in all potential regions, or use third-party websites to get an idea of current latencies between regions.

Put it all together

Now that you have considered latency requirements, potential deployment models, and the geographic distribution of your user base, you understand how these factors affect latency to certain regions. It is time to decide which regions to launch your app in.

Although there isn't a single right way to weigh the different factors, the following step-by-step methodology might help you decide:

  1. See if there are non-latency-related factors that block you from deploying in specific regions, such as price or colocation. Remove them from your list of regions.
  2. Choose a deployment model based on the latency requirements and the general architecture of the app. For most mobile and other non-latency-critical apps, a single region deployment with Cloud CDN delivery of cacheable content and SSL termination at the edge might be the optimal choice.
  3. Based on your deployment model, choose regions based on the geographic distribution of your user base and your latency measurements:

    • For a single region deployment:

      • If you need low-latency access to your corporate premises, deploy in the region closest to this location.
      • If your users are primarily from one country or region, deploy in a region closest to your representative users.
      • For a global user base, deploy in a region in the US.
    • For a multi-region deployment:

      • Choose regions close to your users based on their geographic distribution and the app's latency requirement. Depending on your app, optimize for a specific median latency or make sure that 95-99% of users are served within a specific target latency. Users in certain geographical locations often have a higher tolerance for latency because of their infrastructure limitations.
  4. If user latency is similar in multiple regions, pricing might be the deciding factor.

When selecting Compute Engine regions, latency is one of the biggest factors to consider. Evaluate and measure latency requirements to deliver a quality user experience, and repeat the process if the geographic distribution of your user base changes.
