Request distribution for external Application Load Balancers

This document describes how external Application Load Balancers handle connections, route traffic, and maintain session affinity.

How connections work

Google Cloud's external Application Load Balancers, both global and regional, route traffic through distributed proxies: Google Front Ends (GFEs) for the global mode and Envoy proxies running in a proxy-only subnet for the regional mode. With configurable timeouts, TLS termination, and built-in security features, they provide scalable application delivery worldwide or within a region.

Global external Application Load Balancer connections

The global external Application Load Balancers are implemented by many proxies called Google Front Ends (GFEs). There isn't just a single proxy. In Premium Tier, the same global external IP address is advertised from various points of presence, and client requests are directed to the client's nearest GFE.

Depending on where your clients are, multiple GFEs can initiate HTTP(S) connections to your backends. Packets sent from GFEs have source IP addresses from the same range used by health check probers: 35.191.0.0/16 and 130.211.0.0/22.
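For example, a minimal ingress allow rule that permits traffic from these ranges to backends serving on port 80 might look like the following sketch (the rule and network names are placeholders; adjust the port to match your backend service):

    # Allow traffic from GFEs and Google health check probers (illustrative values).
    gcloud compute firewall-rules create allow-gfe-and-health-checks \
        --network=NETWORK_NAME \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges=130.211.0.0/22,35.191.0.0/16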

Note: For internet NEGs, requests from the load balancer come from different IP ranges depending on the type of NEG (global or regional). For more information, see Authenticating requests.

Depending on the backend service configuration, the protocol used by each GFE to connect to your backends can be HTTP, HTTPS, or HTTP/2. For HTTP or HTTPS connections, the HTTP version used is HTTP 1.1.

HTTP keepalive is enabled by default, as specified in the HTTP 1.1 specification. HTTP keepalives attempt to efficiently use the same TCP session; however, there's no guarantee. The GFE uses a client HTTP keepalive timeout of 610 seconds and a default backend keepalive timeout value of 600 seconds. You can update the client HTTP keepalive timeout, but the backend keepalive timeout value is fixed. You can configure the request and response timeout by setting the backend service timeout. Though closely related, an HTTP keepalive and a TCP idle timeout are not the same thing. For more information, see timeouts and retries.

To ensure that traffic is load balanced evenly, the load balancer might cleanly close a TCP connection either by sending a FIN ACK packet after completing a response that included a Connection: close header, or it might issue an HTTP/2 GOAWAY frame after completing a response. This behavior does not interfere with any active requests or responses.

The numbers of HTTP connections and TCP sessions vary depending on the number of GFEs connecting, the number of clients connecting to the GFEs, the protocol to the backends, and where backends are deployed.

For more information, see How external Application Load Balancers work in the solutions guide: Application Capacity Optimizations with Global Load Balancing.

Regional external Application Load Balancer connections

The regional external Application Load Balancer is a managed service implemented on the Envoy proxy. The regional external Application Load Balancer uses a shared subnet called a proxy-only subnet to provision a set of IP addresses that Google uses to run Envoy proxies on your behalf. The --purpose flag for this proxy-only subnet is set to REGIONAL_MANAGED_PROXY. All regional Envoy-based load balancers in a particular network and region share this subnet.
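For illustration, a proxy-only subnet might be created as follows. This is a sketch with placeholder values for the subnet name, region, network, and address range:

    gcloud compute networks subnets create proxy-only-subnet \
        --purpose=REGIONAL_MANAGED_PROXY \
        --role=ACTIVE \
        --region=us-central1 \
        --network=NETWORK_NAME \
        --range=10.129.0.0/23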

Clients use the load balancer's IP address and port to connect to the load balancer. Client requests are directed to the proxy-only subnet in the same region as the client. The load balancer terminates client requests and then opens new connections from the proxy-only subnet to your backends. Therefore, packets sent from the load balancer have source IP addresses from the proxy-only subnet.

Depending on the backend service configuration, the protocol used by Envoy proxies to connect to your backends can be HTTP, HTTPS, or HTTP/2. If HTTP or HTTPS, the HTTP version is HTTP 1.1. HTTP keepalive is enabled by default, as specified in the HTTP 1.1 specification. The Envoy proxy sets both the client HTTP keepalive timeout and the backend keepalive timeout to a default value of 600 seconds each. You can update the client HTTP keepalive timeout, but the backend keepalive timeout value is fixed. You can configure the request/response timeout by setting the backend service timeout. For more information, see timeouts and retries.

Client communications with the load balancer

  • Clients can communicate with the load balancer by using the HTTP/1.0, HTTP/1.1, HTTP/2, or HTTP/3 protocol.
  • When HTTPS is used, modern clients default to HTTP/2. This is controlled on the client, not on the HTTPS load balancer.
  • You cannot disable HTTP/2 by making a configuration change on the load balancer. However, you can configure some clients to use HTTP 1.1 instead of HTTP/2. For example, with curl, use the --http1.1 parameter, as shown in the example after this list.
  • External Application Load Balancers support the HTTP/1.1 100 Continue response.
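The following curl invocations (with a placeholder load balancer IP address) illustrate that the protocol choice is made on the client side:

    # Modern curl versions negotiate HTTP/2 over HTTPS when the server supports it.
    curl -v https://LOAD_BALANCER_IP/
    # Force HTTP 1.1 instead of HTTP/2.
    curl -v --http1.1 https://LOAD_BALANCER_IP/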

For the complete list of protocols supported by external Application Load Balancer forwarding rules in each mode, see Load balancer features.

Source IP addresses for client packets

The source IP address for packets, as seen by the backends, is not the Google Cloud external IP address of the load balancer. In other words, there are two TCP connections.

For the global external Application Load Balancers:
  • Connection 1, from original client to the load balancer (GFE):

    • Source IP address: the original client (or external IP address if the client is behind a NAT gateway or a forward proxy).
    • Destination IP address: your load balancer's IP address.
  • Connection 2, from the load balancer (GFE) to the backend VM or endpoint:

    • Source IP address: an IP address in one of the ranges specified in Firewall rules.

    • Destination IP address: the internal IP address of the backend VM or container in the VPC network.

Note: For internet NEGs, requests from the load balancer come from different IP ranges. For more information, see Authenticating requests.

For the regional external Application Load Balancers:
  • Connection 1, from original client to the load balancer (proxy-only subnet):

    • Source IP address: the original client (or external IP address if the client is behind a NAT gateway or a forward proxy).
    • Destination IP address: your load balancer's IP address.
  • Connection 2, from the load balancer (proxy-only subnet) to the backend VM or endpoint:

    • Source IP address: an IP address in the proxy-only subnet that is shared among all the Envoy-based load balancers deployed in the same region and network as the load balancer.

    • Destination IP address: the internal IP address of the backend VM or container in the VPC network.

Special routing paths

Google Cloud uses special routes not defined in your VPC network to route packets for the following types of traffic:

Google Cloud uses subnet routes for proxy-only subnets to route packets for the following types of traffic:

  • When using distributed Envoy health checks.

For regional external Application Load Balancers, Google Cloud uses open-source Envoy proxies to terminate client requests to the load balancer. The load balancer terminates the TCP session and opens a new TCP session from the region's proxy-only subnet to your backend. Routes defined within your VPC network facilitate communication from Envoy proxies to your backends and from your backends to the Envoy proxies.

Open ports

GFEs have several open ports to support other Google services that run on the same architecture. When you run a port scan, you might see other open ports for other Google services running on GFEs.

Both GFE-based load balancers—global external Application Load Balancers and classic Application Load Balancers—always show ports 80 and 443 as open (along with any other port you've configured in your load balancer's forwarding rules). However, if you haven't configured a forwarding rule for port 80 or for port 443, any connections sent to those ports are refused. Conversely, regional external Application Load Balancers are implemented using Envoy proxies and don't show extra open ports during a scan.

Running a port scan on the IP address of a GFE-based load balancer isn't useful from an auditing perspective for the following reasons:

  • A port scan (for example, with nmap) generally expects no response packet or a TCP RST packet when performing TCP SYN probing. GFEs send SYN-ACK packets in response to SYN probes only for ports on which you have configured a forwarding rule. GFEs only send packets to your backends in response to packets sent to your load balancer's IP address and the destination port configured on its forwarding rule. Packets that are sent to a different IP address or port aren't sent to your backends.

    GFEs implement security features such as Google Cloud Armor. With Cloud Armor Standard, GFEs provide always-on protection from volumetric and protocol-based DDoS attacks and SYN floods. This protection is available even if you haven't explicitly configured Cloud Armor. You are charged only if you configure security policies or if you enroll in Managed Protection Plus.

  • Packets sent to the IP address of your load balancer can be answered by any GFE in Google's fleet; however, scanning a load balancer IP address and destination port combination only interrogates a single GFE per TCP connection. The IP address of your load balancer isn't assigned to a single device or system. Thus, scanning the IP address of a GFE-based load balancer doesn't scan all the GFEs in Google's fleet.

With that in mind, the following are some more effective ways to audit the security of your backend instances:

  • A security auditor should inspect the forwarding rules in the load balancer's configuration. The forwarding rules define the destination port for which your load balancer accepts packets and forwards them to the backends. For GFE-based load balancers, each external forwarding rule can only reference a single destination TCP port. For a load balancer using TCP port 443, UDP port 443 is used when the connection is upgraded to QUIC (HTTP/3). Example inspection commands follow this list.

  • A security auditor should inspect the firewall rule configuration applicable to backend VMs. The firewall rules that you set block traffic from the GFEs to the backend VMs but don't block incoming traffic to the GFEs. For best practices, see the firewall rules section.
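As a sketch, an auditor might list the relevant configuration with commands like the following (the forwarding rule name is a placeholder):

    # List forwarding rules to see which IP address and port combinations the load balancer accepts.
    gcloud compute forwarding-rules list
    # Describe a specific global forwarding rule to check its target proxy and port range.
    gcloud compute forwarding-rules describe FORWARDING_RULE_NAME --global
    # Review the firewall rules that apply to the backend VMs.
    gcloud compute firewall-rules list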

TLS termination

The following list summarizes how TLS termination is handled by external Application Load Balancers:

  • Global external Application Load Balancer: TLS is terminated on a GFE, which can be anywhere in the world.
  • Classic Application Load Balancer: TLS is terminated on a GFE, which could be anywhere in the world.
  • Regional external Application Load Balancer: TLS is terminated on Envoy proxies located in a proxy-only subnet in a region chosen by the user. Use this load balancer mode if you need geographic control over the region where TLS is terminated.

Timeouts and retries

External Application Load Balancers support the following types of timeouts for HTTP or HTTPS traffic:

Backend service timeout¹

A request and response timeout. Represents the maximum amount of time allowed between the load balancer sending the first byte of a request to the backend and the backend returning the last byte of the HTTP response to the load balancer. If the backend hasn't returned the entire HTTP response to the load balancer within this time limit, the remaining response data is dropped.

Default values:
  • For serverless NEGs on a backend service: 60 minutes
  • For all other backend types on a backend service: 30 seconds
  • For backend buckets: 24 hours (86,400 seconds)

Connection setup timeout

The maximum time allowed for the proxy to establish a connection to the backend. A successful setup includes completing the TCP three-way handshake, and, if the backend protocol is HTTPS, completing the TLS handshake.

  • For global external Application Load Balancers and classic Application Load Balancers, the load balancer's proxy is a second-layer GFE.
  • For regional external Application Load Balancers, the load balancer's proxy is Envoy software.

Default value: 4.5 seconds

Client TLS timeout

The maximum amount of time that the load balancer's proxy waits for the TLS handshake to complete.

  • For global external Application Load Balancers and classic Application Load Balancers, the load balancer's proxy is a first-layer GFE.
  • For regional external Application Load Balancers, the load balancer's proxy is Envoy software.

Default value: 10 seconds

Client HTTP keepalive timeout

The maximum amount of time that the TCP connection between a client and the load balancer's proxy can be idle. (The same TCP connection might be used for multiple HTTP requests.)

  • For global external Application Load Balancers and classic Application Load Balancers, the load balancer's proxy is a first-layer GFE.
  • For regional external Application Load Balancers, the load balancer's proxy is Envoy software.

Default value: 610 seconds

Backend HTTP keepalive timeout

The maximum amount of time that the TCP connection between the load balancer's proxy and a backend can be idle. (The same TCP connection might be used for multiple HTTP requests.)

  • For global external Application Load Balancers and classic Application Load Balancers, the load balancer's proxy is a second-layer GFE.
  • For regional external Application Load Balancers, the load balancer's proxy is Envoy software.

Default values:
  • For backend services: 10 minutes (600 seconds)
  • For backend buckets: 6 minutes (360 seconds)

QUIC session idle timeout

The maximum amount of time that a QUIC session can be idle between the (downstream) client and the GFE of a global external Application Load Balancer or a classic Application Load Balancer.

For global external Application Load Balancers and classic Application Load Balancers, the QUIC session idle timeout is the minimum of either the client idle timeout or the GFE idle timeout (300 seconds). The GFE idle timeout is fixed at 300 seconds. The client idle timeout can be configured.

¹ Not configurable for serverless NEG backends. Not configurable for backend buckets.

Backend service timeout

The configurable backend service timeout represents the maximum amount of time that the load balancer waits for your backend to process an HTTP request and return the corresponding HTTP response. Except for serverless NEGs, the default value for the backend service timeout is 30 seconds.

For example, if you want to download a 500-MB file, and the value of the backend service timeout is 90 seconds, the load balancer expects the backend to deliver the entire 500-MB file within 90 seconds. It is possible to configure the backend service timeout to be insufficient for the backend to send its complete HTTP response. In this situation, if the load balancer has at least received HTTP response headers from the backend, the load balancer returns the complete response headers and as much of the response body as it could obtain within the backend service timeout.

We recommend that you set the backend service timeout to the longest amount of time that you expect your backend to need in order to process an HTTP request and return its response. If the software running on your backend needs more time to process an HTTP request and return its entire response, we recommend that you increase the backend service timeout. For example, we recommend that you increase the timeout if you see HTTP 408 status code responses with jsonPayload.statusDetail client_timed_out errors.

The backend service timeout accepts values between 1 and 2,147,483,647 seconds; however, larger values aren't practical configuration options. Google Cloud also doesn't guarantee that an underlying TCP connection can remain open for the entirety of the backend service timeout. For global and classic Application Load Balancers, GFEs impose an effective maximum backend service timeout of 86,400 seconds (1 day). Client systems must implement retry logic instead of relying on a TCP connection to be open for long periods of time.

To configure the backend service timeout, use one of the following methods:

Console

Modify the Timeout field of the load balancer's backend service.

gcloud

Use the gcloud compute backend-services update command to modify the --timeout parameter of the backend service resource.
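For example, to set a 90-second backend service timeout (the service name and the value are placeholders):

    # For a regional external Application Load Balancer, use --region=REGION instead of --global.
    gcloud compute backend-services update BACKEND_SERVICE_NAME \
        --global \
        --timeout=90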

API

For a global external Application Load Balancer or a classic Application Load Balancer, modify the timeoutSec parameter for the global backendServices resource.

For a regional external Application Load Balancer, modify the timeoutSec parameter for the regionBackendServices resource.
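As an illustrative sketch, the timeoutSec field of a global backend service can be patched through the REST API as follows (the project ID, backend service name, and timeout value are placeholders):

    curl -X PATCH \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        -d '{"timeoutSec": 90}' \
        "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/global/backendServices/BACKEND_SERVICE_NAME"

For a regional backend service, the URL path uses projects/PROJECT_ID/regions/REGION/backendServices/BACKEND_SERVICE_NAME instead.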

Websocket connection timeouts aren't always the same as backend service timeouts. Websocket connection timeouts depend on the type of load balancer:

Global external Application Load Balancer (default backend service timeout: 30 seconds)

Active websocket connections don't use the configured backend service timeout of the load balancer. The connections are automatically closed after 24 hours (86,400 seconds). This 24-hour limit is fixed and overrides the backend service timeout if it is greater than 24 hours.

Idle websocket connections are closed after the backend service times out.

We don't recommend backend service timeout values greater than 24 hours (86,400 seconds) because Google Cloud periodically restarts GFEs for software updates and other routine maintenance. Your backend service timeout value doesn't delay the maintenance activities. The longer the backend service timeout value, the more likely it is that Google Cloud terminates TCP connections for maintenance.

Classic Application Load Balancer (default backend service timeout: 30 seconds)

Websocket connections, whether idle or active, automatically close after the backend service times out.

We don't recommend backend service timeout values greater than 24 hours (86,400 seconds) because Google Cloud periodically restarts GFEs for software updates and other routine maintenance. Your backend service timeout value doesn't delay the maintenance activities. The longer the backend service timeout value, the more likely it is that Google Cloud terminates TCP connections for maintenance.

Regional external Application Load Balancer (default backend service timeout: 30 seconds)

Active websocket connections don't use the backend service timeout of the load balancer.

Idle websocket connections are closed after the backend service times out.

Google Cloud periodically restarts or changes the number of serving Envoy software tasks. The longer the backend service timeout value is, the more likely it is that Envoy tasks restart or terminate TCP connections.

Regional external Application Load Balancers use the routeActions.timeout parameter configured in the URL map as the request and response timeout and ignore the backend service timeout. When routeActions.timeout isn't configured, the value of the backend service timeout is used instead.

Client HTTP keepalive timeout

The client HTTP keepalive timeout represents the maximum amount of time that a TCP connection can be idle between the (downstream) client and one of the following types of proxies:

  • For a global external Application Load Balancer or a classic Application Load Balancer: a first-layer Google Front End
  • For a regional external Application Load Balancer: an Envoy proxy

The client HTTP keepalive timeout represents the TCP idle timeout for the underlying TCP connections. The client HTTP keepalive timeout doesn't apply to websockets.

The default value for the client HTTP keepalive timeout is 610 seconds. For global and regional external Application Load Balancers, you can configure the client HTTP keepalive timeout with a value between 5 and 1200 seconds.

To configure the client HTTP keepalive timeout, use one of the following methods:

Console

Modify the HTTP keepalive timeout field of the load balancer's frontend configuration.

gcloud

For global external Application Load Balancers, use the gcloud compute target-http-proxies update command or the gcloud compute target-https-proxies update command to modify the --http-keep-alive-timeout-sec parameter of the target HTTP proxy or the target HTTPS proxy resource.
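For example, to raise the client HTTP keepalive timeout of a global target HTTPS proxy to 700 seconds (the proxy name and the value are placeholders):

    gcloud compute target-https-proxies update TARGET_HTTPS_PROXY_NAME \
        --http-keep-alive-timeout-sec=700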

For a regional external Application Load Balancer, you cannot update the keepalive timeout parameter of a regional target HTTP(S) proxy directly. To update the keepalive timeout parameter of a regional target proxy, you need to do the following:

  1. Create a new target proxy with the intended timeout settings.
  2. Mirror all other settings from the current target proxy on the new one. For target HTTPS proxies, this includes linking any SSL certificates or certificate maps to the new target proxy.
  3. Update the forwarding rules to point to the new target proxy.
  4. Delete the previous target proxy.

API

For global external Application Load Balancers, modify the httpKeepAliveTimeoutSec parameter for the targetHttpProxies resource or the targetHttpsProxies resource.

For a regional external Application Load Balancer, you cannot update the keepalive timeout parameter of a regional target HTTP(S) proxy directly. To update the keepalive timeout parameter of a regional target proxy, you need to do the following:

  1. Create a new target proxy with the intended timeout settings.
  2. Mirror all other settings from the current target proxy on the new one. For target HTTPS proxies, this includes linking any SSL certificates or certificate maps to the new target proxy.
  3. Update the forwarding rules to point to the new target proxy.
  4. Delete the previous target proxy.

The load balancer's client HTTP keepalive timeout must be greater than the HTTP keepalive (TCP idle) timeout used by downstream clients or proxies. If a downstream client has a greater HTTP keepalive (TCP idle) timeout than the load balancer's client HTTP keepalive timeout, it's possible for a race condition to occur. From the perspective of a downstream client, an established TCP connection is permitted to be idle for longer than permitted by the load balancer. This means that the downstream client can send packets after the load balancer considers the TCP connection to be closed. When that happens, the load balancer responds with a TCP reset (RST) packet.

When the client HTTP keepalive timeout expires, either the GFE or the Envoy proxy sends a TCP FIN to the client to gracefully close the connection.

Backend HTTP keepalive timeout

External Application Load Balancers are proxies that use at least two TCP connections:

  • For a global external Application Load Balancer or a classic Application Load Balancer, a first TCP connection exists between the (downstream) client and a first-layer GFE. First-layer GFEs connect to second-layer GFEs, and then the second-layer GFEs open a second TCP connection to your backends.
  • For a regional external Application Load Balancer, a first TCP connection exists between the (downstream) client and an Envoy proxy. The Envoy proxy then opens a second TCP connection to your backends.

The load balancer's secondary TCP connections might not get closed after each request; they can stay open to handle multiple HTTP requests and responses. The backend HTTP keepalive timeout defines the TCP idle timeout between the load balancer and your backends. The backend HTTP keepalive timeout doesn't apply to websockets.

The backend keepalive timeout is fixed at 10 minutes (600 seconds) and cannot be changed. This helps ensure that the load balancer maintains idle connections for at least 10 minutes. After this period, the load balancer can send termination packets to the backend at any time.

The load balancer's backend keepalive timeout must be less than the keepalive timeout used by software running on your backends. This avoids a race condition where the operating system of your backends might close TCP connections with a TCP reset (RST). Because the backend keepalive timeout for the load balancer isn't configurable, you must configure your backend software so that its HTTP keepalive (TCP idle) timeout value is greater than 600 seconds.

When the backend HTTP keepalive timeout expires, either the GFE or the Envoy proxy sends a TCP FIN to the backend VM to gracefully close the connection.

The following list shows the changes necessary to modify keepalive timeout values for common web server software.

  • Apache
    • Parameter: KeepAliveTimeout
    • Default setting: KeepAliveTimeout 5
    • Recommended setting: KeepAliveTimeout 620
  • nginx
    • Parameter: keepalive_timeout
    • Default setting: keepalive_timeout 75s;
    • Recommended setting: keepalive_timeout 620s;
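For example, in an nginx configuration the recommended value would be applied in the http block, as in this minimal sketch:

    # /etc/nginx/nginx.conf (excerpt)
    http {
        # Keep idle connections open longer than the load balancer's fixed
        # 600-second backend keepalive timeout.
        keepalive_timeout 620s;
    }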

QUIC session idle timeout

The QUIC session idle timeout represents the maximum amount of time that a QUIC session can be idle between the client and the GFE of a global external Application Load Balancer or a classic Application Load Balancer.

The QUIC session idle timeout value is defined as the minimum of either the client idle timeout or the GFE idle timeout (300 seconds). The GFE idle timeout is fixed at 300 seconds. The client idle timeout can be configured.

Retries

Support for retry logic depends on the mode of the external Application Load Balancer.

Global external Application Load Balancer

Configurable by using a retry policy in the URL map. The maximum number of retries (numRetries) that can be configured by using the retry policy is 25. The maximum configurable perTryTimeout is 24 hours. An example of configuring a retry policy in a URL map follows these mode descriptions.

If you want to disable retries, you must explicitly set numRetries to 1.

Without a retry policy, unsuccessful requests that have no HTTP body (for example, GET requests) that result in HTTP 502, 503, or 504 responses (retryConditions=["gateway-error"]) are retried once.

HTTP POST requests aren't retried.

Retried requests only generate one log entry for the final response.

Classic Application Load Balancer

The retry policy cannot be changed for connection retries.

HTTP POST requests aren't retried.

HTTP GET requests are always retried once as long as 80% or more of the backends are healthy. If there is a single backend instance in a group and the connection to that backend instance fails, the percentage of unhealthy backend instances is 100%, so the GFE doesn't retry the request.

The load balancer retries a failed GET request if the first request failed before receiving response headers from the backend instance.

Retried requests only generate one log entry for the final response. For more information, see External Application Load Balancer logging and monitoring.

Unsuccessful requests result in the load balancer synthesizing an HTTP 502 response.

Regional external Application Load Balancer

Configurable by using a retry policy in the URL map. The default number of retries (numRetries) is 1. The maximum number of retries that can be configured by using the retry policy is 25. The maximum configurable perTryTimeout is 24 hours.

Without a retry policy, unsuccessful requests that have no HTTP body (for example, GET requests) that result in HTTP 502, 503, or 504 responses are retried once.

HTTP POST requests aren't retried.

Retried requests only generate one log entry for the final response.

The WebSocket protocol is supported with GKE Ingress.
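For illustration, a retry policy can be added to a global URL map by exporting it, editing the YAML, and importing it again. The URL map name and the retry values below are placeholders:

    gcloud compute url-maps export URL_MAP_NAME --global --destination=url-map.yaml
    # Add a retry policy under the route action in url-map.yaml, for example:
    #   defaultRouteAction:
    #     retryPolicy:
    #       numRetries: 3
    #       perTryTimeout:
    #         seconds: 10
    #       retryConditions:
    #       - gateway-error
    gcloud compute url-maps import URL_MAP_NAME --global --source=url-map.yaml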

Illegal request and response handling

The load balancer blocks both client requests and backend responses from reaching the backend or the client, respectively, for a number of reasons. Some reasons are strictly for HTTP/1.1 compliance and others are to avoid unexpected data being passed to or from the backends. None of the checks can be disabled.

The load balancer blocks the following requests for HTTP/1.1 compliance:

  • It cannot parse the first line of the request.
  • A header is missing the colon (:) delimiter.
  • Headers or the first line contain invalid characters.
  • The content length is not a valid number, or there are multiple content length headers.
  • There are multiple transfer encoding keys, or there are unrecognized transfer encoding values.
  • There's a non-chunked body and no content length specified.
  • Body chunks are unparseable. This is the only case where some data reaches the backend. The load balancer closes the connections to the client and backend when it receives an unparseable chunk.

Request handling

The load balancer blocks the request if any of the following are true:

  • The total size of request headers and the request URL exceeds the limit for the maximum request header size for external Application Load Balancers.
  • The request method does not allow a body, but the request has one.
  • The request contains an Upgrade header, and the Upgrade header is not used to enable WebSocket connections.
  • The HTTP version is unknown.

Response handling

The load balancer blocks the backend's response if any of the following are true:

  • The total size of response headers exceeds the limit for maximum response header size for external Application Load Balancers.
  • The HTTP version is unknown.

When handling both the request and response, the load balancer might remove or overwrite hop-by-hop headers in HTTP/1.1 before forwarding them to the intended destination.

Traffic distribution

When you add a backend instance group or NEG to a backend service, you specify a balancing mode, which defines a method of measuring backend load and a target capacity. External Application Load Balancers support two balancing modes:

  • RATE, for instance groups or NEGs, is the target maximum number of requests (queries) per second (RPS, QPS). The target maximum RPS/QPS can be exceeded if all backends are at or above capacity.

  • UTILIZATION is the backend utilization of VMs in an instance group.

How traffic is distributed among backends depends on the mode of the load balancer.

Global external Application Load Balancer

Before a Google Front End (GFE) sends requests to backend instances, the GFE estimates which backend instances have capacity to receive requests. This capacity estimation is made proactively, not at the same time that requests arrive. The GFEs receive periodic information about the available capacity and distribute incoming requests accordingly.

What capacity means depends in part on the balancing mode. For the RATE mode, it is relatively simple: a GFE determines exactly how many requests it can assign per second. UTILIZATION-based load balancing is more complex: the load balancer checks the instances' current utilization and then estimates a query load that each instance can handle. This estimate changes over time as instance utilization and traffic patterns change.

Both factors—the capacity estimation and the proactive assignment—influence the distribution among instances. Thus, Cloud Load Balancing behaves differently from a simple round-robin load balancer that spreads requests exactly 50:50 between two instances. Instead, Google Cloud load balancing attempts to optimize the backend instance selection for each request.

For the global external Application Load Balancer, load balancing is two-tiered. The balancing mode determines the weighting or fraction of traffic to send to each backend (instance group or NEG). Then, the load balancing policy (LocalityLbPolicy) determines how traffic is distributed to instances or endpoints within the group. For more information, see the Load balancing locality policy (backend service API documentation).

For the classic Application Load Balancer, the balancing mode is used to select the most favorable backend (instance group or NEG). Traffic is then distributed in a round-robin fashion among instances or endpoints within the backend.

How requests are distributed

GFE-based external Application Load Balancers use the following process to distribute incoming requests:

  1. From client to first-layer GFE. Edge routers advertise the forwarding rule's external IP address at the borders of Google's network. Each advertisement lists a next hop to a Layer 3 and Layer 4 load balancing system (Maglev). The Maglev systems route traffic to a first-layer Google Front End (GFE).
    • When using Premium Tier, Google advertises your load balancer's IP address from all points of presence, worldwide. Each load balancer IP address is global anycast.
    • When using Standard Tier, Google advertises your load balancer's IP address from points of presence associated with the forwarding rule's region. The load balancer uses a regional external IP address. Using a Standard Tier forwarding rule limits you to instance group and zonal NEG backends in the same region as the load balancer's forwarding rule.
  2. From first-layer GFE to second-layer GFE. The first-layer GFE terminates TLS if required and then routes traffic to second-layer GFEs according to the following process:
    • First-layer GFEs parse the URL map and select a backend service or backend bucket.
    • For backend services with internet NEGs, the first-layer GFEs select a second-layer external forwarding gateway colocated with the first-layer GFE. The forwarding gateway sends requests to the internet NEG endpoint. This concludes the request distribution process for internet NEGs.
    • For backend services with serverless NEGs and Private Service Connect (PSC) NEGs, and single-region backend buckets, first-layer GFEs select a second-layer GFE in the region matching the region of the NEG or bucket. For multi-region Cloud Storage buckets, first-layer GFEs select second-layer GFEs either in the region of the bucket, or a region as close as possible to the multi-region bucket (defined by network round trip time).
    • For backend services with instance groups, zonal NEGs with GCE_VM_IP_PORT endpoints, and hybrid NEGs, Google's capacity management system informs first-layer GFEs about the used and configured capacity for each backend. The configured capacity for a backend is defined by the balancing mode, the target capacity of the balancing mode, and the capacity scaler.
      • Standard Tier: first-layer GFEs select a second-layer GFE in the region containing the backends.
      • Premium Tier: first-layer GFEs select second-layer GFEs from a set of applicable regions. Applicable regions are all regions where backends have been configured, excluding those regions with configured backends having zero capacity. First-layer GFEs select the closest second-layer GFE in an applicable region (defined by network round-trip time). If backends are configured in two or more regions, first-layer GFEs can spill requests over to other applicable regions if a first-choice region is full. Spillover to other regions is possible when all backends in the first-choice region are at capacity.
  3. Second-layer GFEs select backends. Second-layer GFEs are located in zones of a region. They use the following process to select a backend:
    • For backend services with serverless NEGs, Private Service Connect NEGs, and backend buckets, second-layer GFEs forward requests to Google's production systems. This concludes the request distribution process for these backends.
    • For backend services with instance groups, zonal NEGs with GCE_VM_IP_PORT endpoints, and hybrid NEGs, Google's health check probe systems inform second-layer GFEs about the health check status of the backend instances or endpoints.

      Premium Tier only: If the second-layer GFE has no healthy backend instances or endpoints in its region, it might send requests to another second-layer GFE in a different applicable region with configured backends. Spillover between second-layer GFEs in different regions doesn't exhaust all possible region-to-region combinations. If you need to direct traffic away from backends in a particular region, instead of configuring backends to fail health checks, set the capacity scaler of the backend to zero so that the first-layer GFE excludes the region during the previous step.

    The second-layer GFE then directs requests to backend instances or endpoints in zones within its region as discussed in the next step.

  4. Second-layer GFE selects a zone. By default, second-layer GFEs use the WATERFALL_BY_REGION algorithm where each second-layer GFE prefers to select backend instances or endpoints in the same zone as the zone that contains the second-layer GFE. Because WATERFALL_BY_REGION minimizes traffic between zones, at low request rates, each second-layer GFE might exclusively send requests to backends in the same zone as the second-layer GFE itself.

    For global external Application Load Balancers only, second-layer GFEs can be configured to use one of the following alternative algorithms by using a serviceLbPolicy:

    • SPRAY_TO_REGION: Second-layer GFEs don't prefer selecting backend instances or endpoints in the same zone as the second-layer GFE. Second-layer GFEs attempt to distribute traffic to all backend instances or endpoints in all zones of the region. This can lead to more even distribution of load at the expense of increased traffic between zones.
    • WATERFALL_BY_ZONE: Second-layer GFEs strongly prefer selecting backend instances or endpoints in the same zone as the second-layer GFE. Second-layer GFEs only direct requests to backends in different zones after all backends in the current zone have reached their configured capacities.
  5. Second-layer GFE selects instances or endpoints within the zone. By default, a second-layer GFE distributes requests among backends in a round-robin fashion. For global external Application Load Balancers only, you can change this by using a load balancing locality policy (localityLbPolicy). The load balancing locality policy applies only to backends within the selected zone discussed in the previous step.

Regional external Application Load Balancer

For regional external Application Load Balancers, traffic distribution is based on the load balancing mode and the load balancing locality policy.

The balancing mode determines the weight and fraction of traffic to send to each group (instance group or NEG). The load balancing locality policy (LocalityLbPolicy) determines how backends within the group are load balanced.

When a backend service receives traffic, it first directs traffic to a backend (instance group or NEG) according to the backend's balancing mode. After a backend is selected, traffic is then distributed among instances or endpoints in that backend group according to the load balancing locality policy.


Session affinity

Session affinity, configured on the backend service of Application Load Balancers, provides a best-effort attempt to send requests from a particular client to the same backend as long as the number of healthy backend instances or endpoints remains constant, and as long as the previously selected backend instance or endpoint is not at capacity. The target capacity of the balancing mode determines when the backend is at capacity.

The following table outlines the different types of session affinity options supported for the different Application Load Balancers. In the section that follows, Types of session affinity, each session affinity type is discussed in further detail.

Table: Supported session affinity settings

Global external Application Load Balancer and regional external Application Load Balancer:

  • None (NONE)
  • Client IP (CLIENT_IP)
  • Generated cookie (GENERATED_COOKIE)
  • Header field (HEADER_FIELD)
  • HTTP cookie (HTTP_COOKIE)
  • Stateful cookie-based affinity (STRONG_COOKIE_AFFINITY)

Also note:

  • The effective default value of the load balancing locality policy (localityLbPolicy) changes according to your session affinity settings. If session affinity is not configured—that is, if session affinity remains at the default value of NONE—then the default value for localityLbPolicy is ROUND_ROBIN. If session affinity is set to a value other than NONE, then the default value for localityLbPolicy is MAGLEV.
  • For the global external Application Load Balancer, don't configure session affinity if you're using weighted traffic splitting. If you do, the weighted traffic splitting configuration takes precedence.
Classic Application Load Balancer:
  • None (NONE)
  • Client IP (CLIENT_IP)
  • Generated cookie (GENERATED_COOKIE)

Keep the following in mind when configuring session affinity:

  • Don't rely on session affinity for authentication or security purposes. Session affinity, except for stateful cookie-based session affinity, can break whenever the number of serving and healthy backends changes. For more details, see Losing session affinity.

  • The default values of the --session-affinity and --subsetting-policy flags are both NONE, and only one of them at a time can be set to a different value.
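For example, client IP affinity can be enabled on a backend service with a command like the following sketch (the service name is a placeholder; use --region for a regional backend service):

    gcloud compute backend-services update BACKEND_SERVICE_NAME \
        --global \
        --session-affinity=CLIENT_IP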

Types of session affinity

The session affinity for external Application Load Balancers can be classified into one of the following categories:
  • Hash-based session affinity (NONE, CLIENT_IP)
  • HTTP header-based session affinity (HEADER_FIELD)
  • Cookie-based session affinity (GENERATED_COOKIE, HTTP_COOKIE, STRONG_COOKIE_AFFINITY)

Hash-based session affinity

For hash-based session affinity, the load balancer uses the consistent hashing algorithm to select an eligible backend. The session affinity setting determines which fields from the IP header are used to calculate the hash.

Hash-based session affinity can be of the following types:

None

A session affinity setting of NONE does not mean that there is no session affinity. It means that no session affinity option is explicitly configured.

Hashing is always performed to select a backend. A session affinity setting of NONE means that the load balancer uses a 5-tuple hash to select a backend. The 5-tuple hash consists of the source IP address, the source port, the protocol, the destination IP address, and the destination port.

A session affinity of NONE is the default value.

Client IP affinity

Client IP session affinity (CLIENT_IP) is a 2-tuple hash created from the source and destination IP addresses of the packet. Client IP affinity forwards all requests from the same client IP address to the same backend, as long as that backend has capacity and remains healthy.

When you use client IP affinity, keep the following in mind:

  • The packet destination IP address is only the same as the load balancer forwarding rule's IP address if the packet is sent directly to the load balancer.
  • The packet source IP address might not match an IP address associated with the original client if the packet is processed by an intermediate NAT or proxy system before being delivered to a Google Cloud load balancer. In situations where many clients share the same effective source IP address, some backend VMs might receive more connections or requests than others.

HTTP header-based session affinity

With header field affinity (HEADER_FIELD), requests are routed to the backends based on the value of the HTTP header in the consistentHash.httpHeaderName field of the backend service. To distribute requests across all available backends, each client needs to use a different HTTP header value.

Header field affinity is supported when the following conditions are true:

  • The load balancing locality policy is RING_HASH or MAGLEV.
  • The backend service's consistentHash specifies the name of the HTTP header (httpHeaderName).
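A minimal sketch of the corresponding backend service fields, patched through the REST API (the project ID, backend service name, and header name are placeholders):

    curl -X PATCH \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        -d '{
              "sessionAffinity": "HEADER_FIELD",
              "localityLbPolicy": "RING_HASH",
              "consistentHash": {"httpHeaderName": "X-Session-Id"}
            }' \
        "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/global/backendServices/BACKEND_SERVICE_NAME"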

Cookie-based session affinity

Cookie-based session affinity can be of the following types:

Generated cookie affinity

When you use generated cookie-based affinity (GENERATED_COOKIE), the load balancer includes an HTTP cookie in the Set-Cookie header in response to the initial HTTP request.

The name of the generated cookie varies depending on the type of the load balancer.

  • Global external Application Load Balancers: GCLB
  • Classic Application Load Balancers: GCLB
  • Regional external Application Load Balancers: GCILB

The generated cookie's path attribute is always a forward slash (/), so it applies to all backend services on the same URL map, provided that the other backend services also use generated cookie affinity.

You can configure the cookie's time to live (TTL) value between 0 and 1,209,600 seconds (inclusive) by using the affinityCookieTtlSec backend service parameter. If affinityCookieTtlSec isn't specified, the default TTL value is 0.
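For example, generated cookie affinity with a one-hour cookie TTL might be configured as follows (the service name and TTL are placeholders; use --region for a regional backend service):

    gcloud compute backend-services update BACKEND_SERVICE_NAME \
        --global \
        --session-affinity=GENERATED_COOKIE \
        --affinity-cookie-ttl=3600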

When the client includes the generated session affinity cookie in the Cookie request header of HTTP requests, the load balancer directs those requests to the same backend instance or endpoint, as long as the session affinity cookie remains valid. This is done by mapping the cookie value to an index that references a specific backend instance or an endpoint, and by making sure that the generated cookie session affinity requirements are met.

To use generated cookie affinity, configure the following balancing mode and localityLbPolicy settings:

  • For backend instance groups, use the RATE balancing mode.
  • For the localityLbPolicy of the backend service, use either RING_HASH or MAGLEV. If you don't explicitly set the localityLbPolicy, the load balancer uses MAGLEV as an implied default.

For more information, see losing session affinity.

HTTP cookie affinity

When you use HTTP cookie-based affinity (HTTP_COOKIE), the load balancer includes an HTTP cookie in the Set-Cookie header in response to the initial HTTP request. You specify the name, path, and time to live (TTL) for the cookie.

All Application Load Balancers support HTTP cookie-based affinity.

You can configure the cookie's TTL values using seconds, fractions of a second (as nanoseconds), or both seconds plus fractions of a second (as nanoseconds) using the following backend service parameters and valid values:

  • consistentHash.httpCookie.ttl.seconds can be set to a value between 0 and 315576000000 (inclusive).
  • consistentHash.httpCookie.ttl.nanos can be set to a value between 0 and 999999999 (inclusive). Because the units are nanoseconds, 999999999 means .999999999 seconds.

If both consistentHash.httpCookie.ttl.seconds and consistentHash.httpCookie.ttl.nanos aren't specified, the value of the affinityCookieTtlSec backend service parameter is used instead. If affinityCookieTtlSec isn't specified, the default TTL value is 0.
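As an illustrative sketch, the corresponding backend service fields could be set through the REST API as follows (the cookie name, path, TTL, and resource names are placeholders):

    curl -X PATCH \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        -d '{
              "sessionAffinity": "HTTP_COOKIE",
              "localityLbPolicy": "MAGLEV",
              "consistentHash": {
                "httpCookie": {
                  "name": "my-session-cookie",
                  "path": "/",
                  "ttl": {"seconds": 3600, "nanos": 0}
                }
              }
            }' \
        "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/global/backendServices/BACKEND_SERVICE_NAME"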

When the client includes the HTTP session affinity cookie in the Cookie request header of HTTP requests, the load balancer directs those requests to the same backend instance or endpoint, as long as the session affinity cookie remains valid. This is done by mapping the cookie value to an index that references a specific backend instance or an endpoint, and by making sure that the HTTP cookie session affinity requirements are met.

To use HTTP cookie affinity, configure the following balancing mode and localityLbPolicy settings:

  • For backend instance groups, use the RATE balancing mode.
  • For the localityLbPolicy of the backend service, use either RING_HASH or MAGLEV. If you don't explicitly set the localityLbPolicy, the load balancer uses MAGLEV as an implied default.

For more information, see losing session affinity.

Stateful cookie-based session affinity

When you use stateful cookie-based affinity (STRONG_COOKIE_AFFINITY), the load balancer includes an HTTP cookie in the Set-Cookie header in response to the initial HTTP request. You specify the name, path, and time to live (TTL) for the cookie.

All Application Load Balancers, except for classic Application Load Balancers, support stateful cookie-based affinity.

You can configure the cookie's TTL values using seconds, fractions of a second (as nanoseconds), or both seconds plus fractions of a second (as nanoseconds). The duration represented by strongSessionAffinityCookie.ttl cannot be set to a value representing more than two weeks (1,209,600 seconds).

The value of the cookie identifies a selected backend instance or endpoint by encoding the selected instance or endpoint in the value itself. For as long as the cookie is valid, if the client includes the session affinity cookie in the Cookie request header of subsequent HTTP requests, the load balancer directs those requests to the selected backend instance or endpoint.

Unlike other session affinity methods:

  • Stateful cookie-based affinity has no specific requirements for the balancing mode or for the load balancing locality policy (localityLbPolicy).

  • Stateful cookie-based affinity is not affected when autoscaling adds a new instance to a managed instance group.

  • Stateful cookie-based affinity is not affected when autoscaling removes an instance from a managed instance group unless the selected instance is removed.

  • Stateful cookie-based affinity is not affected when autohealing removes an instance from a managed instance group unless the selected instance is removed.

For more information, see losing session affinity.

Meaning of zero TTL for cookie-based affinities

All cookie-based session affinities, such as generated cookie affinity, HTTP cookie affinity, and stateful cookie-based affinity, have a TTL attribute.

A TTL of zero seconds means the load balancer does not assign an Expires attribute to the cookie. In this case, the client treats the cookie as a session cookie. The definition of a session varies depending on the client:

  • Some clients, like web browsers, retain the cookie for the entire browsing session. This means that the cookie persists across multiple requests until the application is closed.

  • Other clients treat a session as a single HTTP request, discarding the cookie immediately after.

Losing session affinity

All session affinity options require the following:

  • The selected backend instance or endpoint must remain configured as a backend. Session affinity can break when one of the following events occurs:
    • You remove the selected instance from its instance group.
    • Managed instance group autoscaling or autohealing removes the selected instance from its managed instance group.
    • You remove the selected endpoint from its NEG.
    • You remove the instance group or NEG that contains the selected instance or endpoint from the backend service.
  • The selected backend instance or endpoint must remain healthy. Session affinity can break when the selected instance or endpoint fails health checks.
  • For global external Application Load Balancers and classic Application Load Balancers, session affinity can break if a different first-layer Google Front End (GFE) is used for subsequent requests or connections after a change in the routing path. A different first-layer GFE might be selected if the routing path from a client on the internet to Google changes between requests or connections.
Except for stateful cookie-based session affinity, all session affinity options have the following additional requirements:
  • The instance group or NEG that contains the selected instance or endpoint must not be full as defined by its target capacity. (For regional managed instance groups, the zonal component of the instance group that contains the selected instance must not be full.) Session affinity can break when the instance group or NEG is full and other instance groups or NEGs are not. Because fullness can change in unpredictable ways when using the UTILIZATION balancing mode, you should use the RATE or CONNECTION balancing mode to minimize situations when session affinity can break.

  • The total number of configured backend instances or endpoints must remain constant. When at least one of the following events occurs, the number of configured backend instances or endpoints changes, and session affinity can break:

    • Adding new instances or endpoints:

      • You add instances to an existing instance group on the backend service.
      • Managed instance group autoscaling adds instances to a managed instance group on the backend service.
      • You add endpoints to an existing NEG on the backend service.
      • You add non-empty instance groups or NEGs to the backend service.
    • Removing any instance or endpoint, not just the selected instance or endpoint:

      • You remove any instance from an instance group backend.
      • Managed instance group autoscaling or autohealing removes any instance from a managed instance group backend.
      • You remove any endpoint from a NEG backend.
      • You remove any existing, non-empty backend instance group or NEG from the backend service.
  • The total number of healthy backend instances or endpoints must remain constant. When at least one of the following events occurs, the number of healthy backend instances or endpoints changes, and session affinity can break:

    • Any instance or endpoint passes its health check, transitioning from unhealthy to healthy.
    • Any instance or endpoint fails its health check, transitioning from healthy to unhealthy or timeout.
