Optimize application latency with load balancing
Options for load balancing
Depending on the type of traffic sent to your application, you have several options for external load balancing. The following table summarizes your options:
| Option | Description | Traffic flow | Scope |
|---|---|---|---|
| External Application Load Balancer | Supports HTTP(S) traffic and advanced features, such as URL mapping and SSL offloading. Use an external proxy Network Load Balancer for non-HTTP traffic on specific ports. | The TCP or SSL (TLS) session is terminated at Google Front Ends (GFEs), at Google's network edge, and traffic is proxied to the backends. | Global |
| External passthrough Network Load Balancer | Allows TCP/UDP traffic using any port to pass through the load balancer. | Delivered using Google's Maglev technology to distribute the traffic to the backends. | Regional |
Because the internal load balancers and Cloud Service Mesh don't support user-facing traffic, they are out of scope for this article.
This article's measurements use the Premium Tier in Network Service Tiers because global load balancing requires this service tier.
Measuring latency
When accessing a website that is hosted in us-central1, a user in Germany uses the following methods to test latency:
- Ping: While ICMP ping is a common way to measure server reachability, ICMP ping doesn't measure end-user latency. For more information, see Additional latency effects of the external Application Load Balancer.
- Curl: Curl measures Time To First Byte (TTFB). Issue a curl command repeatedly to the server, as shown in the sketch after this list.
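Both measurements combined look like the following sketch, where `compute-engine-vm` is a placeholder for your server's DNS name or IP address:

```bash
TARGET=compute-engine-vm  # placeholder; replace with your server's name or IP

# Reachability and network round-trip time (not end-user latency):
ping -c 5 "$TARGET"

# Time To First Byte (TTFB), sampled 500 times:
for ((i = 0; i < 500; i++)); do
  curl -w "%{time_starttransfer}\n" -o /dev/null -s "http://$TARGET/"
done
```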
When comparing results, be aware that latency on fiber links is constrained by the distance and the speed of light in fiber, which is roughly 200,000 km per second (or 124,274 miles per second).
The distance between Frankfurt, Germany and Council Bluffs, Iowa (the us-central1 region), is roughly 7,500 km. With straight fiber between the locations, round-trip latency is the following:
7,500 km * 2 / 200,000 km/s * 1000 ms/s = 75 milliseconds (ms)
Fiber optic cable doesn't follow a straight path between the user and the data center. Light on the fiber cable passes through active and passive equipment along its path. An observed latency of approximately 1.5 times the ideal, or 112.5 ms, indicates a near-ideal configuration.
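As a quick check, the same arithmetic in a one-liner (the 1.5x factor is the rule of thumb from above):

```bash
awk 'BEGIN {
  rtt = 7500 * 2 / 200000 * 1000            # ideal round trip: 75 ms
  printf "ideal: %.1f ms, near-ideal: %.1f ms\n", rtt, rtt * 1.5
}'
```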
Comparing latency
This section compares load balancing in the following configurations:
- No load balancing
- External passthrough Network Load Balancer
- External Application Load Balancer or External proxy Network Load Balancer
In this scenario, the application consists of a regional managed instance group of HTTP web servers. Because the application relies on low-latency calls to a central database, the web servers must be hosted in one location. The application is deployed in the us-central1 region, and users are distributed across the globe. The latency that the user in Germany observes in this scenario illustrates what users worldwide might experience.
No load balancing
When a user makes an HTTP request, unless load balancing is configured, the traffic flows directly from the user's network to the virtual machine (VM) hosted on Compute Engine. For Premium Tier, traffic enters Google's network at an edge point of presence (PoP) close to the user's location. For Standard Tier, the user traffic enters Google's network at a PoP close to the destination region. For more information, see the Network Service Tiers documentation.
The following table shows the results when the user in Germany tested latency of a system with no load balancing:
| Method | Result | Minimum latency |
|---|---|---|
| Ping the VM IP address (Response is directly from web server) | ping -c 5 compute-engine-vm PING compute-engine-vm (xxx.xxx.xxx.xxx) 56(84) bytes of data. 64 bytes from compute-engine-vm (xxx.xxx.xxx.xxx): icmp_seq=1 ttl=56 time=111 ms 64 bytes from compute-engine-vm (xxx.xxx.xxx.xxx): icmp_seq=2 ttl=56 time=110 ms [...] --- compute-engine-vm ping statistics --- 5 packets transmitted, 5 received, 0% packet loss, time 4004ms rtt min/avg/max/mdev = 110.818/110.944/111.265/0.451 ms | 110 ms |
| TTFB | for ((i=0; i < 500; i++)); do curl -w "%{time_starttransfer}\n" -o /dev/null -s compute-engine-vm; done 0.230 0.230 0.231 0.231 0.230 [...] 0.232 0.231 0.231 | 230 ms |
The TTFB latency is stable, as shown in the following graph of the first500 requests:

When pinging the VM IP address, the response comes directly from the web server. The response time from the web server is minimal compared to the network latency (TTFB). This is because a new TCP connection is opened for every HTTP request, and an initial three-way handshake is needed before the HTTP response is sent. Therefore, the observed latency is close to double the ping latency.
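You can see the handshake cost directly with curl's standard write-out timing variables; this is a sketch against the same placeholder hostname:

```bash
# Times are cumulative seconds from the start of the request. On a fresh
# plain-HTTP connection, time_connect is one round trip (TCP handshake) and
# time_starttransfer is roughly two (handshake, then request and response).
curl -o /dev/null -s \
  -w 'connect: %{time_connect}\nttfb:    %{time_starttransfer}\n' \
  http://compute-engine-vm/
```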
External passthrough Network Load Balancer
With external passthrough Network Load Balancers, user requests still enter the Google network at the closest edge PoP (in Premium Tier). In the region where the project's VMs are located, traffic flows first through an external passthrough Network Load Balancer. It is then forwarded without changes to the target backend VM. The external passthrough Network Load Balancer distributes traffic based on a stable hashing algorithm. The algorithm uses a combination of source and destination port, IP address, and protocol. The VMs listen on the load balancer IP address and accept the traffic unaltered.
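For reference, the following is one minimal way to create such a load balancer with gcloud, using a target pool; all resource names are placeholders, and the web server VMs are assumed to already exist:

```bash
# Reserve a regional external IP address for the load balancer.
gcloud compute addresses create net-lb-ip --region=us-central1

# Group the backend VMs into a target pool.
gcloud compute target-pools create net-lb-pool --region=us-central1
gcloud compute target-pools add-instances net-lb-pool \
  --instances=web-server-1,web-server-2 \
  --instances-zone=us-central1-a

# The forwarding rule passes TCP traffic through unchanged; the VMs receive
# packets addressed to the load balancer IP.
gcloud compute forwarding-rules create net-lb \
  --region=us-central1 \
  --ports=80 \
  --address=net-lb-ip \
  --target-pool=net-lb-pool
```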
The following table shows the results when the user in Germany tested latency for the network-load-balancing option.
| Method | Result | Minimum latency |
|---|---|---|
| Ping the external passthrough Network Load Balancer | ping -c 5 net-lb PING net-lb (xxx.xxx.xxx.xxx) 56(84) bytes of data. 64 bytes from net-lb (xxx.xxx.xxx.xxx): icmp_seq=1 ttl=44 time=110 ms 64 bytes from net-lb (xxx.xxx.xxx.xxx): icmp_seq=2 ttl=44 time=110 ms [...] 64 bytes from net-lb (xxx.xxx.xxx.xxx): icmp_seq=5 ttl=44 time=110 ms --- net-lb ping statistics --- 5 packets transmitted, 5 received, 0% packet loss, time 4007ms rtt min/avg/max/mdev = 110.658/110.705/110.756/0.299 ms | 110 ms |
| TTFB | for ((i=0; i < 500; i++)); do curl -w "%{time_starttransfer}\n" -o /dev/null -s net-lb; done 0.231 0.232 0.230 0.230 0.232 [...] 0.232 0.231 | 230 ms |
Because load balancing takes place within a region and traffic is only forwarded, there is no significant latency impact compared with having no load balancer.
External Application Load Balancer
With external Application Load Balancers, GFEs proxy traffic. These GFEs are at the edge of Google's global network. The GFE terminates the TCP session and connects to a backend in the closest region that can serve the traffic.
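As a rough sketch, a global external Application Load Balancer in front of a managed instance group involves resources like the following; the names are placeholders, and an instance group `web-mig` in us-central1 is assumed to exist:

```bash
# Health check and global backend service for the instance group.
gcloud compute health-checks create http http-basic-check --port=80
gcloud compute backend-services create web-backend \
  --protocol=HTTP \
  --health-checks=http-basic-check \
  --global
gcloud compute backend-services add-backend web-backend \
  --instance-group=web-mig \
  --instance-group-region=us-central1 \
  --global

# URL map and proxy: this is where GFEs terminate the client session.
gcloud compute url-maps create web-map --default-service=web-backend
gcloud compute target-http-proxies create http-lb-proxy --url-map=web-map

# A single global forwarding rule; clients connect to the nearest GFE.
gcloud compute forwarding-rules create http-lb \
  --global \
  --target-http-proxy=http-lb-proxy \
  --ports=80
```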
The following table shows the results when the user in Germany tested latency for the HTTP-load-balancing option.
| Method | Result | Minimum latency |
|---|---|---|
| Ping the external Application Load Balancer | ping -c 5 http-lb PING http-lb (xxx.xxx.xxx.xxx) 56(84) bytes of data. 64 bytes from http-lb (xxx.xxx.xxx.xxx): icmp_seq=1 ttl=56 time=1.22 ms 64 bytes from http-lb (xxx.xxx.xxx.xxx): icmp_seq=2 ttl=56 time=1.20 ms 64 bytes from http-lb (xxx.xxx.xxx.xxx): icmp_seq=3 ttl=56 time=1.16 ms 64 bytes from http-lb (xxx.xxx.xxx.xxx): icmp_seq=4 ttl=56 time=1.17 ms 64 bytes from http-lb (xxx.xxx.xxx.xxx): icmp_seq=5 ttl=56 time=1.20 ms --- http-lb ping statistics --- 5 packets transmitted, 5 received, 0% packet loss, time 4005ms rtt min/avg/max/mdev = 1.163/1.195/1.229/0.039 ms | 1 ms |
| TTFB | for ((i=0; i < 500; i++)); do curl -w "%{time_starttransfer}\n" -o /dev/null -s http-lb; done 0.309 0.230 0.229 0.233 0.230 [...] 0.123 0.124 0.126 | 123 ms |
The results for the external Application Load Balancer are significantly different. When pinging the external Application Load Balancer, the round-trip latency is slightly over 1 ms. This result represents latency to the closest GFE, which is located in the same city as the user. It doesn't reflect the actual latency that the user experiences when accessing the application hosted in the us-central1 region. Measurements that use a protocol (ICMP) different from your application's communication protocol (HTTP) can be misleading.
When measuring TTFB, the initial requests show response latency similar to the earlier scenarios. Some requests achieve the lower minimum latency of 123 ms, as shown in the following graph:

Even over straight fiber, two round trips between the client and the VM would take more than 123 ms. The latency is lower because GFEs proxy the traffic: GFEs maintain persistent connections to the backend VMs, so only the first request from a specific GFE to a specific backend requires a three-way handshake.
Each location has multiple GFEs. The first time that traffic reaches a particular GFE-backend pair, the GFE must establish a new connection to that backend. The latency graph therefore shows multiple fluctuating spikes, which reflect differing request hashes. Subsequent requests show lower latency.
These scenarios demonstrate the reduced latency that users can experience in aproduction environment. The following table summarizes the results:
| Option | Ping | TTFB |
|---|---|---|
| No load balancing | 110 ms to the web server | 230 ms |
| External passthrough Network Load Balancer | 110 ms to the in-region external passthrough Network Load Balancer | 230 ms |
| External Application Load Balancer | 1 ms to the closest GFE | 123 ms |
When a healthy application is serving users in a specific region, GFEs in that region maintain a persistent connection open to all serving backends. Because of this, users in that region notice reduced latency on their first HTTP request if they are far from the application backend. If users are near the application backend, they don't notice a latency improvement.
For subsequent requests, such as clicking a page link, there is no latency improvement because modern browsers maintain a persistent connection to the service. This differs from a curl command issued from the command line, which opens a new connection on every invocation.
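One way to approximate browser behavior from the command line is to let a single curl invocation fetch several URLs, which reuses one connection; the hostname is a placeholder:

```bash
# For the second and third requests the connection is reused, so
# time_connect drops to ~0 and TTFB loses a full round trip.
curl -s \
  -o /dev/null -o /dev/null -o /dev/null \
  -w 'connect: %{time_connect}  ttfb: %{time_starttransfer}\n' \
  http://http-lb/ http://http-lb/ http://http-lb/
```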
Additional latency effects of the external Application Load Balancer
Additional observable effects with the external Application Load Balancer depend on traffic patterns.
The external Application Load Balancer has lower latency for complex assets than the external passthrough Network Load Balancer because fewer round trips are needed before a response completes. For example, when the user in Germany measures latency over the same connection by repeatedly downloading a 10 MB file, the average latency for the external passthrough Network Load Balancer is 1911 ms. With the external Application Load Balancer, the average latency is 1341 ms, a saving of approximately 5 round trips per request. Persistent connections between GFEs and serving backends reduce the effects of TCP slow start.
The external Application Load Balancer significantly reduces the additional latency for a TLS handshake (typically 1–2 extra round trips). This is because the external Application Load Balancer uses SSL offloading, and only the latency to the edge PoP is relevant. For the user in Germany, the minimum observed latency is 201 ms using the external Application Load Balancer, versus 525 ms using HTTP(S) through the external passthrough Network Load Balancer.
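curl can isolate this handshake cost. In the following sketch (placeholder hostname, HTTPS assumed to be configured), the standard `time_appconnect` write-out variable marks the end of the TLS handshake:

```bash
# With SSL offloading, the TLS handshake terminates at the nearby GFE, so
# time_appconnect stays small even though the backend is far away.
curl -o /dev/null -s \
  -w 'tls done: %{time_appconnect}\nttfb:     %{time_starttransfer}\n' \
  https://http-lb/
```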
The external Application Load Balancer also allows an automatic upgrade of the user-facing session to HTTP/2. HTTP/2 can reduce the number of packets needed by using a binary protocol, header compression, and connection multiplexing. These improvements can reduce the observed latency even further than switching to the external Application Load Balancer alone. HTTP/2 is supported by current browsers that use SSL/TLS. For the user in Germany, minimum latency decreased further, from 201 ms to 145 ms, when using HTTP/2 instead of HTTP/1.1 over HTTPS.
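You can confirm which protocol version was negotiated with curl's `http_version` write-out variable (placeholder hostname; recent curl builds negotiate HTTP/2 over TLS via ALPN by default):

```bash
# Prints "2" when the load balancer upgrades the session to HTTP/2.
curl -o /dev/null -s -w '%{http_version}\n' https://http-lb/
```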
Optimizing external Application Load Balancers
You can optimize latency for your application by using the external Application Load Balancer as follows:
- If some of the traffic you serve is cacheable, you can integrate with Cloud CDN. Cloud CDN reduces latency by serving assets directly at Google's network edge. Cloud CDN also uses the TCP and HTTP optimizations from HTTP/2 mentioned in the Additional latency effects of the external Application Load Balancer section. A setup sketch follows this list.
- You can use any CDN partner with Google Cloud. By using one of Google's CDN Interconnect partners, you benefit from discounted data transfer costs.
- If content is static, you can reduce the load on the web servers by serving content directly from Cloud Storage through the external Application Load Balancer. This option can be combined with Cloud CDN.
- Deploying your web servers in multiple regions close to your users can reduce latency because the load balancer automatically directs users to the closest region. However, if your application is partly centralized, design it to decrease the number of inter-regional round trips.
- To reduce latency inside your applications, examine any remote procedure calls (RPCs) that communicate between VMs. This latency typically occurs when applications communicate between tiers or services. Tools such as Cloud Trace can help you decrease latency caused by application-serving requests.
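A sketch of the first and third suggestions, reusing the hypothetical resource names from the earlier sketches (`web-backend`, `web-map`); the bucket name is also a placeholder:

```bash
# Enable Cloud CDN on an existing global backend service.
gcloud compute backend-services update web-backend --enable-cdn --global

# Serve static content from Cloud Storage through the load balancer.
gcloud compute backend-buckets create static-assets \
  --gcs-bucket-name=my-static-assets \
  --enable-cdn
gcloud compute url-maps add-path-matcher web-map \
  --path-matcher-name=static \
  --default-service=web-backend \
  --backend-bucket-path-rules='/static/*=static-assets'
```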
Because external proxy Network Load Balancers are based on GFEs, the effect on latency is the same as observed with the external Application Load Balancer. Because the external Application Load Balancer has more features than the external proxy Network Load Balancer, we recommend using external Application Load Balancers for HTTP(S) traffic.
Next steps
We recommend that you deploy your application close to most of your users. For more information about the different load balancing options in Google Cloud, see the following documents:
- External passthrough Network Load Balancer
- External Application Load Balancer
- External proxy Network Load Balancer