Backend services overview
A backend service defines how Cloud Load Balancing distributes traffic. The backend service configuration contains a set of values, such as the protocol used to connect to backends, various distribution and session settings, health checks, and timeouts. These settings provide fine-grained control over how your load balancer behaves. To get you started, most of the settings have default values that allow for fast configuration. A backend service is either global or regional in scope.
Load balancers, Envoy proxies, and proxyless gRPC clients use the configuration information in the backend service resource to do the following:
- Direct traffic to the correct backends, which are instance groups or network endpoint groups (NEGs).
- Distribute traffic according to a balancing mode, which is a setting for each backend.
- Determine which health check is monitoring the health of the backends.
- Specify session affinity.
- Determine whether other services are enabled, including the following services that are only available for certain load balancers:
  - Cloud CDN
  - Google Cloud Armor security policies
  - Identity-Aware Proxy
- Designate global and regional backend services as a service in App Hub applications.
You set these values when you create a backend service or add a backend to the backend service.
Note: If you're using either the global external Application Load Balancer or the classic Application Load Balancer, and your backends serve static content, consider using backend buckets instead of backend services. See backend buckets for global external Application Load Balancer or backend buckets for classic Application Load Balancer.
The following table summarizes which load balancers use backend services. The product that you are using also determines the maximum number of backend services, the scope of a backend service, the type of backends supported, and the backend service's load balancing scheme. The load balancing scheme is an identifier that Google uses to classify forwarding rules and backend services. Each load balancing product uses one load balancing scheme for its forwarding rules and backend services. Some schemes are shared among products.
| Product | Maximum number of backend services | Scope of backend service | Supported backend types | Load balancing scheme |
|---|---|---|---|---|
| Global external Application Load Balancer | Multiple | Global | Each backend service supports one of the following backend combinations: | EXTERNAL_MANAGED |
| Classic Application Load Balancer | Multiple | Global‡ | Each backend service supports one of the following backend combinations: | EXTERNAL# |
| Regional external Application Load Balancer | Multiple | Regional | Each backend service supports one of the following backend combinations: | EXTERNAL_MANAGED |
| Cross-region internal Application Load Balancer | Multiple | Global | Each backend service supports one of the following backend combinations: | INTERNAL_MANAGED |
| Regional internal Application Load Balancer | Multiple | Regional | Each backend service supports one of the following backend combinations: | INTERNAL_MANAGED |
| Global external proxy Network Load Balancer | 1 | Global‡ | The backend service supports one of the following backend combinations: | EXTERNAL_MANAGED |
| Classic proxy Network Load Balancer | 1 | Global‡ | The backend service supports one of the following backend combinations: | EXTERNAL |
| Regional external proxy Network Load Balancer | 1 | Regional | The backend service supports one of the following backend combinations: | EXTERNAL_MANAGED |
| Regional internal proxy Network Load Balancer | 1 | Regional | The backend service supports one of the following backend combinations: | INTERNAL_MANAGED |
| Cross-region internal proxy Network Load Balancer | Multiple | Global | The backend service supports one of the following backend combinations: | INTERNAL_MANAGED |
| External passthrough Network Load Balancer | 1 | Regional | The backend service supports one of the following backend combinations: | EXTERNAL |
| Internal passthrough Network Load Balancer | 1 | Regional, but configurable to be globally accessible | The backend service supports one of the following backend combinations: | INTERNAL |
| Cloud Service Mesh | Multiple | Global | Each backend service supports one of the following backend combinations: | INTERNAL_SELF_MANAGED |
GCE_VM_IP_PORT type endpoints.
- The forwarding rule and its external IP address are regional.
- All backends connected to the backend service must be located in the same region as the forwarding rule.
You can attach EXTERNAL_MANAGED backend services to EXTERNAL forwarding rules. However, EXTERNAL backend services cannot be attached to EXTERNAL_MANAGED forwarding rules. To take advantage of new features available only with the global external Application Load Balancer, we recommend that you migrate your existing EXTERNAL resources to EXTERNAL_MANAGED by using the migration process described at Migrate resources from classic to global external Application Load Balancer.
Load balancer naming
For proxy Network Load Balancers and passthrough Network Load Balancers, the name of the load balancer is always the same as the name of the backend service. The behavior for each Google Cloud interface is as follows:
- Google Cloud console. If you create either a proxy Network Load Balancer or a passthrough Network Load Balancer by using the Google Cloud console, the backend service is automatically assigned the same name that you entered for the load balancer name.
- Google Cloud CLI or API. If you create either a proxy Network Load Balancer or a passthrough Network Load Balancer by using the gcloud CLI or the API, you enter a name of your choice while creating the backend service. This backend service name is then reflected in the Google Cloud console as the name of the load balancer.
To learn about how naming works for Application Load Balancers, see URL maps overview: Load balancer naming.
Backends
A backend is one or more endpoints that receive traffic from a Google Cloud load balancer, a Cloud Service Mesh-configured Envoy proxy, or a proxyless gRPC client. There are several types of backends:
- Instance group containing virtual machine (VM) instances. An instance group can be a managed instance group (MIG), with or without autoscaling, or it can be an unmanaged instance group. More than one backend service can reference an instance group, but all backend services that reference the instance group must use the same balancing mode.
- Zonal NEG
- Serverless NEG
- Private Service Connect NEG
- Internet NEG
- Hybrid connectivity NEG
- Port mapping NEG
- Service Directory service bindings
You cannot delete a backend instance group or NEG that is associated with a backend service. Before you delete an instance group or NEG, you must first remove it as a backend from all backend services that reference it.
Instance groups
This section discusses how instance groups work with the backend service.
Backend VMs and external IP addresses
Backend VMs in backend services don't need external IP addresses:
- For global external Application Load Balancers and external proxy Network Load Balancers: Clients communicate with a Google Front End (GFE), which hosts your load balancer's external IP address. GFEs communicate with backend VMs or endpoints by sending packets to an internal address created by joining an identifier for the backend's VPC network with the internal IPv4 address of the backend. Communication between GFEs and backend VMs or endpoints is facilitated through special routes.
  - For instance group backends, the internal IPv4 address is always the primary internal IPv4 address that corresponds to the nic0 interface of the VM.
  - For GCE_VM_IP_PORT endpoints in a zonal NEG, you can specify the endpoint's IP address as either the primary IPv4 address associated with any network interface of a VM or any IPv4 address from an alias IP address range associated with any network interface of a VM.
- For regional external Application Load Balancers: Clients communicate with an Envoy proxy, which hosts your load balancer's external IP address. Envoy proxies communicate with backend VMs or endpoints by sending packets to an internal address created by joining an identifier for the backend's VPC network with the internal IPv4 address of the backend.
  - For instance group backends, the internal IPv4 address is always the primary internal IPv4 address that corresponds to the nic0 interface of the VM, and nic0 must be in the same network as the load balancer.
  - For GCE_VM_IP_PORT endpoints in a zonal NEG, you can specify the endpoint's IP address as either the primary IPv4 address associated with any network interface of a VM or any IPv4 address from an alias IP address range associated with any network interface of a VM, as long as the network interface is in the same network as the load balancer.
- For external passthrough Network Load Balancers: Clients communicate directly with backends by way of Google's Maglev pass-through load balancing infrastructure. Packets are routed and delivered to backends with the original source and destination IP addresses preserved. Backends respond to clients using direct server return. The methods used to select a backend and to track connections are configurable.
  - For instance group backends, packets are always delivered to the nic0 interface of the VM.
  - For GCE_VM_IP endpoints in a zonal NEG, packets are delivered to the VM's network interface that is in the subnetwork associated with the NEG.
Named ports
The backend service's named port attribute is only applicable to proxy-based load balancers (Application Load Balancers and proxy Network Load Balancers) using instance group backends. The named port defines the destination port used for the TCP connection between the proxy (GFE or Envoy) and the backend instance.
Named ports are configured as follows:
- On each instance group backend, you must configure one or more named ports using key-value pairs. The key represents a meaningful port name that you choose, and the value represents the port number you assign to the name. The mapping of names to numbers is done individually for each instance group backend.
- On the backend service, you specify a single named port using just the port name (--port-name).
On a per-instance group backend basis, the backend service translates the port name to a port number. When an instance group's named port matches the backend service's --port-name, the backend service uses this port number for communication with the instance group's VMs.
For example, you might set the named port on an instance group with the name my-service-name and the port 8888:
gcloud compute instance-groups unmanaged set-named-ports my-unmanaged-ig \
    --named-ports=my-service-name:8888
Then you refer to the named port in the backend service configuration with the --port-name on the backend service set to my-service-name:
gcloud compute backend-services update my-backend-service \
    --port-name=my-service-name
A backend service can use a different port number when communicating with VMs in different instance groups if each instance group specifies a different port number for the same port name.
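The per-group translation from port name to port number can be pictured with a small Python model. This is only an illustrative sketch, not how Google Cloud implements it: the dictionary layout, the second instance group `my-other-ig`, its ports, and the helper `resolve_backend_port` are all assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical model: each instance group keeps its own name -> port mapping.
# "my-other-ig" and its ports are invented for illustration.
instance_group_named_ports: Dict[str, Dict[str, int]] = {
    "my-unmanaged-ig": {"my-service-name": 8888},
    "my-other-ig": {"my-service-name": 9999, "admin": 8081},
}


def resolve_backend_port(port_name: str, named_ports: Dict[str, int]) -> Optional[int]:
    """Return the port number that one group assigns to port_name, or None."""
    return named_ports.get(port_name)


# The backend service stores only the port name (the --port-name value).
backend_service_port_name = "my-service-name"

# The same name resolves independently for every instance group backend.
resolved = {
    group: resolve_backend_port(backend_service_port_name, ports)
    for group, ports in instance_group_named_ports.items()
}
print(resolved)
```

In this model the same backend service reaches one group on port 8888 and the other on port 9999, because each group maps the shared name to its own number.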
The resolved port number used by the proxy load balancer's backend service doesn't need to match the port number used by the load balancer's forwarding rules. A proxy load balancer listens for TCP connections sent to the IP address and destination port of its forwarding rules. Because the proxy opens a second TCP connection to its backends, the second TCP connection's destination port can be different.
Named ports are only applicable to instance group backends. Zonal NEGs with GCE_VM_IP_PORT endpoints, hybrid NEGs with NON_GCP_PRIVATE_IP_PORT endpoints, and internet NEGs define ports using a different mechanism, namely, on the endpoints themselves. Serverless NEGs reference Google services and PSC NEGs reference service attachments using abstractions that don't involve specifying a destination port.
Internal passthrough Network Load Balancers and external passthrough Network Load Balancers don't use named ports. This is because they are pass-through load balancers that route connections directly to backends instead of creating new connections. Packets are delivered to the backends preserving the destination IP address and port of the load balancer's forwarding rule.
To learn how to create named ports, see the following instructions:
- Unmanaged instance groups: Working with named ports
- Managed instance groups: Assigning named ports to managed instance groups
Restrictions and guidance for instance groups
Keep the following restrictions and guidance in mind when you create instance groups for your load balancers:
Don't put a VM in more than one load-balanced instance group. If a VM is a member of two or more unmanaged instance groups, or a member of one managed instance group and one or more unmanaged instance groups, Google Cloud limits you to only using one of those instance groups at a time as a backend for a particular backend service.
If you need a VM to participate in multiple load balancers, you must use the same instance group as a backend on each of the backend services.
For proxy load balancers, when you want to balance traffic to different ports, specify the required named ports on one instance group and have each backend service subscribe to a unique named port.
You can use the same instance group as a backend for more than one backend service. In this situation, the backends must use compatible balancing modes. Compatible means that the balancing modes must be the same, or they must be a combination of compatible balancing modes, for example, CONNECTION and RATE.
Incompatible balancing mode combinations are as follows:
- CONNECTION with UTILIZATION
- RATE with UTILIZATION
- CUSTOM_METRICS with UTILIZATION
- CUSTOM_METRICS with RATE
- CUSTOM_METRICS with CONNECTION
Consider the following example:
- You have two backend services: external-https-backend-service for an external Application Load Balancer and internal-tcp-backend-service for an internal passthrough Network Load Balancer.
- You're using an instance group called instance-group-a in internal-tcp-backend-service.
- In internal-tcp-backend-service, you must apply the CONNECTION balancing mode because internal passthrough Network Load Balancers only support the CONNECTION balancing mode.
- You can also use instance-group-a in external-https-backend-service if you apply the RATE balancing mode in external-https-backend-service.
- You cannot also use instance-group-a in external-https-backend-service with the UTILIZATION balancing mode.
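The compatibility rule can be expressed as a small lookup. The following Python sketch encodes only the incompatible pairs listed above; the function name `modes_compatible` is hypothetical, and the check is a local illustration rather than a call to any Google Cloud API.

```python
from itertools import combinations
from typing import Iterable

# Incompatible balancing mode pairs, taken from the list above.
INCOMPATIBLE_PAIRS = {
    frozenset({"CONNECTION", "UTILIZATION"}),
    frozenset({"RATE", "UTILIZATION"}),
    frozenset({"CUSTOM_METRICS", "UTILIZATION"}),
    frozenset({"CUSTOM_METRICS", "RATE"}),
    frozenset({"CUSTOM_METRICS", "CONNECTION"}),
}


def modes_compatible(modes: Iterable[str]) -> bool:
    """Return True if every pair of distinct modes is allowed on one instance group."""
    distinct = set(modes)
    return all(
        frozenset(pair) not in INCOMPATIBLE_PAIRS
        for pair in combinations(distinct, 2)
    )


# instance-group-a used with CONNECTION (internal passthrough) and RATE (external HTTPS).
print(modes_compatible(["CONNECTION", "RATE"]))         # True
# The same group with CONNECTION and UTILIZATION is rejected.
print(modes_compatible(["CONNECTION", "UTILIZATION"]))  # False
```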
To change the balancing mode for an instance group serving as a backend for multiple backend services:
- Remove the instance group from all backend services except for one.
- Change the balancing mode for the backend on the one remaining backend service.
- Re-add the instance group as a backend to the remaining backend services, if they support the new balancing mode.
If your instance group is associated with several backend services, each backend service can reference the same named port or a different named port on the instance group.
We recommend not adding an autoscaled managed instance group to more than one backend service. Doing so might cause unpredictable and unnecessary scaling of instances in the group, especially if you use the HTTP Load Balancing Utilization autoscaling metric.
- While not recommended, this scenario might work if the autoscaling metric is either CPU Utilization or a Cloud Monitoring Metric that is unrelated to the load balancer's serving capacity. Using one of these autoscaling metrics might prevent erratic scaling.
Zonal network endpoint groups
Network endpoints represent services by their IP address or an IP address and port combination, rather than referring to a VM in an instance group. A network endpoint group (NEG) is a logical grouping of network endpoints.
Zonal network endpoint groups (NEGs) are zonal resources that represent collections of either IP addresses or IP address and port combinations for Google Cloud resources within a single subnet.
A backend service that uses zonal NEGs as its backends distributes traffic among applications or containers running within VMs.
There are two types of network endpoints available for zonal NEGs:
- GCE_VM_IP endpoints (supported only with internal passthrough Network Load Balancers and backend service-based external passthrough Network Load Balancers).
- GCE_VM_IP_PORT endpoints.
To see which products support zonal NEG backends, see Table: Backend services and supported backend types.
For details, see Zonal NEGs overview.
Internet network endpoint groups
Internet NEGs are resources that define external backends. An external backend is a backend that is hosted within on-premises infrastructure or on infrastructure provided by third parties.
An internet NEG is a combination of a hostname or an IP address, plus an optional port. There are two types of network endpoints available for internet NEGs: INTERNET_FQDN_PORT and INTERNET_IP_PORT.
For details, see Internet network endpoint group overview.
Serverless network endpoint groups
A network endpoint group (NEG) specifies a group of backend endpoints for a load balancer. A serverless NEG is a backend that points to a Cloud Run, App Engine, Cloud Run functions, or API Gateway resource.
A serverless NEG can represent one of the following:
- A Cloud Run resource or a group of resources.
- A Cloud Run function or group of functions (formerly Cloud Run functions 2nd gen).
- A Cloud Run function (1st gen) or group of functions.
- An App Engine standard environment or App Engine flexible environment app, a specific service within an app, a specific version of an app, or a group of services.
- An API Gateway that provides access to your services through a REST API consistent across all services, regardless of service implementation. This capability is in Preview.
To set up a serverless NEG for serverless applications that share a URL pattern, you use a URL mask. A URL mask is a template of your URL schema (for example, example.com/<service>). The serverless NEG will use this template to extract the <service> name from the incoming request's URL and route the request to the matching Cloud Run, Cloud Run functions, or App Engine service with the same name.
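To make the template idea concrete, here is a simplified Python sketch that turns a mask such as `example.com/<service>` into a regular expression and extracts the placeholder value from a request URL. It is an assumption-laden illustration: the helper names `compile_url_mask` and `extract_placeholders` are hypothetical, and the real URL mask matching rules in Google Cloud may differ from this simple model.

```python
import re
from typing import Dict, Optional


def compile_url_mask(mask: str) -> re.Pattern:
    """Translate a mask like 'example.com/<service>' into a regex with named groups."""
    # re.split with one capture group alternates literal text and placeholder names.
    parts = re.split(r"<(\w+)>", mask)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)            # literal text from the mask
        else:
            pattern += f"(?P<{part}>[^/?#]+)"      # one path segment per placeholder
    return re.compile("^" + pattern)


def extract_placeholders(mask: str, url: str) -> Optional[Dict[str, str]]:
    """Return placeholder values found in url, or None if the URL does not fit the mask."""
    match = compile_url_mask(mask).match(url)
    return match.groupdict() if match else None


print(extract_placeholders("example.com/<service>", "example.com/cart/checkout"))
```

In this sketch a request for `example.com/cart/checkout` yields the service name `cart`, which is the name the request would then be routed by.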
To see which load balancers support serverless NEG backends, see Table: Backend services and supported backend types.
For more information about serverless NEGs, see the Serverless network endpoint groups overview.
Service bindings
A service binding is a backend that establishes a connection between a backend service in Cloud Service Mesh and a service registered in Service Directory. A backend service can reference several service bindings. A backend service with a service binding cannot reference any other type of backend.
Mixed backends
The following usage considerations apply when you add different types of backends to a single backend service:
- A single backend service cannot simultaneously use both instance groups and zonal NEGs.
- You can use a combination of different types of instance groups on the same backend service. For example, a single backend service can reference a combination of both managed and unmanaged instance groups. For complete information about which backends are compatible with which backend services, see the table in the previous section.
- With certain proxy load balancers, you can use a combination of zonal NEGs (with GCE_VM_IP_PORT endpoints) and hybrid connectivity NEGs (with NON_GCP_PRIVATE_IP_PORT endpoints) to configure hybrid load balancing. To see which load balancers have this capability, refer to Table: Backend services and supported backend types.
Protocol to the backends
When you create a backend service, you must specify the protocol used to communicate with the backends. You can specify only one protocol per backend service; you cannot specify a secondary protocol to use as a fallback.
Which protocols are valid depends on the type of load balancer or whether you are using Cloud Service Mesh.
| Product | Backend service protocol options |
|---|---|
| Application Load Balancer | HTTP, HTTPS, HTTP/2 |
| Proxy Network Load Balancer | TCP or SSL. The regional proxy Network Load Balancers support only TCP. |
| Passthrough Network Load Balancer | TCP, UDP, or UNSPECIFIED |
| Cloud Service Mesh | HTTP, HTTPS, HTTP/2, gRPC, TCP |
Changing a backend service's protocol makes the backends inaccessible through load balancers for a few minutes.
IP address selection policy
This field is applicable to proxy load balancers. You must use the IP address selection policy to specify the traffic type that is sent from the backend service to your backends.
When you select the IP address selection policy, ensure that your backends support the selected traffic type. For more information, see Table: Backend services and supported backend types.
IP address selection policy is used when you want to convert your load balancer backend service to support a different traffic type. For more information, see Convert from single-stack to dual-stack.
You can specify the following values for the IP address selection policy:
| IP address selection policy | Description |
|---|---|
| Only IPv4 | Only send IPv4 traffic to the backends of the backend service, regardless of traffic from the client to the GFE. Only IPv4 health checks are used to check the health of the backends. |
| Prefer IPv6 | Prioritize the backend's IPv6 connection over the IPv4 connection (provided there is a healthy backend with IPv6 addresses). The health checks periodically monitor the backends' IPv6 and IPv4 connections. The GFE first attempts the IPv6 connection; if the IPv6 connection is broken or slow, the GFE uses happy eyeballs to fall back and connect to IPv4. Even if one of the IPv6 or IPv4 connections is unhealthy, the backend is still treated as healthy, and both connections can be tried by the GFE, with happy eyeballs ultimately selecting which one to use. |
| Only IPv6 | Only send IPv6 traffic to the backends of the backend service, regardless of traffic from the client to the proxy. Only IPv6 health checks are used to check the health of the backends. There is no validation to check if the backend traffic type matches the IP address selection policy. For example, if you have IPv4-only backends and select |
Encryption between the load balancer and backends
For information about encryption between the load balancer and backends, see Encryption to the backends.
Traffic distribution
The values of the following fields in the backend services resource determine some aspects of the backend's behavior:
- A balancing mode defines how the load balancer measures backend readiness for new requests or connections.
- A target capacity defines a target maximum number of connections, a target maximum rate, or target maximum CPU utilization.
- A capacity scaler adjusts overall available capacity without modifying the target capacity.
Balancing mode
The balancing mode determines whether the backends of a load balancer or Cloud Service Mesh can handle additional traffic or are fully loaded.
Google Cloud has four balancing modes:
- CONNECTION: Determines how the load is spread based on the total number of connections that the backend can handle.
- RATE: The target maximum number of requests (queries) per second (RPS, QPS). The target maximum RPS/QPS can be exceeded if all backends are at or above capacity.
- UTILIZATION: Determines how the load is spread based on the utilization of instances in an instance group.
- CUSTOM_METRICS: Determines how the load is spread based on user-defined custom metrics.
Balancing modes available for each load balancer
You set the balancing mode when you add a backend to the backend service. The balancing modes available to a load balancer depend on the type of load balancer and the type of backends.
Passthrough Network Load Balancers require the CONNECTION balancing mode but don't support setting any target capacity.
Application Load Balancers support either RATE, UTILIZATION, or CUSTOM_METRICS balancing modes for instance group backends, and RATE or CUSTOM_METRICS balancing modes for zonal NEGs (GCE_VM_IP_PORT endpoints) and hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints). For any other type of supported backend, balancing mode must be omitted.
- For classic Application Load Balancers, a region is selected based on the location of the client and whether the region has available capacity, based on the load balancing mode's target capacity. Then, within a region, the balancing mode's target capacity is used to compute proportions for how many requests should go to each backend in the region. Requests or connections are then distributed in a round robin fashion among instances or endpoints within the backend.
- For global external Application Load Balancers, a region is selected based on the location of the client and whether the region has available capacity, based on the load balancing mode's target capacity. Within a region, the balancing mode's target capacity is used to compute proportions for how many requests should go to each backend (instance group or NEG) in the region. You can use the service load balancing policy (serviceLbPolicy) and the preferred backend setting to influence the selection of any specific backends within a region. Furthermore, within each instance group or NEG, the load balancing policy (LocalityLbPolicy) determines how traffic is distributed to instances or endpoints within the group.
- For cross-region internal Application Load Balancers, regional external Application Load Balancers, and regional internal Application Load Balancers, the balancing mode's target capacity is used to compute proportions for how many requests should go to each backend (instance group or NEG) in the region. Within each instance group or NEG, the load balancing policy (LocalityLbPolicy) determines how traffic is distributed to instances or endpoints within the group. Only the cross-region internal Application Load Balancer supports the use of the service load balancing policy (serviceLbPolicy) and the preferred backend settings to influence the selection of any specific backends within a region.
Proxy Network Load Balancers support either CONNECTION or UTILIZATION balancing modes for VM instance group backends, CONNECTION balancing mode for zonal NEGs with GCE_VM_IP_PORT endpoints, and CONNECTION balancing mode for hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints). For any other type of supported backend, balancing mode must be omitted.
- For global external proxy Network Load Balancers, a region is selected based on the location of the client and whether the region has available capacity, based on the load balancing mode's target capacity. Within a region, the balancing mode's target capacity is used to compute proportions for how many requests should go to each backend (instance group or NEG) in the region. You can use the service load balancing policy (serviceLbPolicy) and the preferred backend setting to influence the selection of any specific backends within a region. Furthermore, within each instance group or NEG, the load balancing policy (LocalityLbPolicy) determines how traffic is distributed to instances or endpoints within the group.
- For cross-region internal proxy Network Load Balancers, the configured region is selected first. Within a region, the balancing mode's target capacity is used to compute proportions for how many requests should go to each backend (instance group or NEG) in the region. You can use the service load balancing policy (serviceLbPolicy) and the preferred backend setting to influence the selection of any specific backends within a region. Furthermore, within each instance group or NEG, the load balancing policy (LocalityLbPolicy) determines how traffic is distributed to instances or endpoints within the group.
- For classic proxy Network Load Balancers, a region is selected based on the location of the client and whether the region has available capacity based on the load balancing mode's target capacity. Then, within a region, the load balancing mode's target capacity is used to compute proportions for how many requests or connections should go to each backend (instance group or NEG) in the region. After the load balancer has selected a backend, requests or connections are then distributed in a round robin fashion among VM instances or network endpoints within each individual backend.
- For regional external proxy Network Load Balancers and regional internal proxy Network Load Balancers, the load balancing mode's target capacity is used to compute proportions for how many requests should go to each backend (instance group or NEG). Within each instance group or NEG, the load balancing policy (localityLbPolicy) determines how traffic is distributed to instances or endpoints within the group.
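As a rough intuition for "target capacity is used to compute proportions", the following Python sketch splits traffic between backends in proportion to their target capacities and then rotates through the members of one backend in round robin order, as described for the classic load balancers. It is an idealized model under stated assumptions: the backend names, capacities, and helper functions are hypothetical, and the real load balancers also account for region selection, health, current load, and policy settings that are not modeled here.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def traffic_shares(target_capacities: Dict[str, float]) -> Dict[str, float]:
    """Split traffic between backends in proportion to their target capacity."""
    total = sum(target_capacities.values())
    if total <= 0:
        raise ValueError("at least one backend needs a positive target capacity")
    return {name: capacity / total for name, capacity in target_capacities.items()}


def round_robin(members: List[str]) -> Iterator[str]:
    """Yield the members of one backend in an endless round robin order."""
    return cycle(members)


# Two hypothetical instance groups in one region with target capacities 300 and 100.
shares = traffic_shares({"ig-a": 300, "ig-b": 100})

# Within one backend, requests rotate across its VMs.
picker = round_robin(["vm-1", "vm-2"])
first_three = [next(picker) for _ in range(3)]
print(shares, first_three)
```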
The following table summarizes the load balancing modes available for each load balancer and backend combination.
| Load balancer | Backends | Balancing modes available |
|---|---|---|
| Application Load Balancer | Instance groups | RATE, UTILIZATION, or CUSTOM_METRICS |
| Application Load Balancer | Zonal NEGs (GCE_VM_IP_PORT endpoints) | RATE or CUSTOM_METRICS |
| Application Load Balancer | Hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints) | RATE or CUSTOM_METRICS |
| Proxy Network Load Balancer | Instance groups | CONNECTION or UTILIZATION |
| Proxy Network Load Balancer | Zonal NEGs (GCE_VM_IP_PORT endpoints) | CONNECTION |
| Proxy Network Load Balancer | Hybrid NEGs (NON_GCP_PRIVATE_IP_PORT endpoints) | CONNECTION |
| Passthrough Network Load Balancer | Instance groups | CONNECTION |
| Passthrough Network Load Balancer | Zonal NEGs (GCE_VM_IP endpoints) | CONNECTION |
If you observe poor distribution of traffic while using the UTILIZATION balancing mode, we recommend using RATE instead.
The UTILIZATION balancing mode depends on VM instance or CPU utilization along with other factors. When these factors fluctuate, the load balancer calculates capacities ineffectively, which frequently leads to poor distribution of traffic between backend groups. In contrast, for RATE balancing mode, the load balancer sends requests to the backend group with the lowest average latency over recent requests, or for HTTP/2 and HTTP/3, requests are sent to the backend group with the fewest outstanding requests.
If the average utilization of all VMs that are associated with a backend service is less than 10%, Google Cloud might prefer specific zones. This can happen when you use regional managed instance groups, zonal managed instance groups in different zones, and zonal unmanaged instance groups. This zonal imbalance automatically resolves as more traffic is sent to the load balancer.
For more information, see gcloud compute backend-services add-backend.
Target capacity
Each balancing mode has a corresponding target capacity, which defines one of the following target maximums:
- Number of connections
- Rate
- CPU utilization
For every balancing mode, the target capacity is not a circuit breaker. A load balancer can exceed the maximum under certain conditions, for example, if all backend VMs or endpoints have reached the maximum.
Connection balancing mode
For CONNECTION balancing mode, the target capacity defines a target maximum number of open connections. Except for internal passthrough Network Load Balancers and external passthrough Network Load Balancers, you must use one of the following settings to specify a target maximum number of connections:
- max-connections-per-instance (per VM): Target average number of connections for a single VM.
- max-connections-per-endpoint (per endpoint in a zonal NEG): Target average number of connections for a single endpoint.
- max-connections (per zonal NEGs and for zonal instance groups): Target average number of connections for the whole NEG or instance group. For regional managed instance groups, use max-connections-per-instance instead.
The following table shows how the target capacity parameter defines the following:
- The target capacity for the whole backend
- The expected target capacity for each instance or endpoint
| Backend type | If you specify | Whole backend capacity | Expected per instance or per endpoint capacity |
|---|---|---|---|
| Instance group, N instances, H healthy | max-connections-per-instance=X | X × N | (X × N)/H |
| Zonal NEG, N endpoints, H healthy | max-connections-per-endpoint=X | X × N | (X × N)/H |
| Instance groups (except regional managed instance groups), H healthy instances | max-connections=Y | Y | Y/H |
As illustrated, the max-connections-per-instance and max-connections-per-endpoint settings are proxies for calculating a target maximum number of connections for the whole VM instance group or whole zonal NEG:
- In a VM instance group with N instances, setting max-connections-per-instance=X has the same meaning as setting max-connections=X × N.
- In a zonal NEG with N endpoints, setting max-connections-per-endpoint=X has the same meaning as setting max-connections=X × N.
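The arithmetic in the preceding table can be sketched in a few lines of Python. This is only an illustration of the X × N and Y/H relationships; the Backend class and both function names are hypothetical and are not part of any Google Cloud API.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    total: int    # N: instances or endpoints in the backend
    healthy: int  # H: instances or endpoints that are currently healthy


def whole_backend_capacity(backend, per_member=None, whole=None):
    """Target capacity for the whole backend: X * N, or Y if set directly."""
    if (per_member is None) == (whole is None):
        raise ValueError("specify exactly one of per_member or whole")
    if per_member is not None:
        return per_member * backend.total
    return whole


def expected_per_member_capacity(backend, per_member=None, whole=None):
    """Expected capacity for each healthy instance or endpoint."""
    if backend.healthy == 0:
        raise ValueError("no healthy instances or endpoints")
    return whole_backend_capacity(backend, per_member, whole) / backend.healthy


group = Backend(total=10, healthy=8)
print(whole_backend_capacity(group, per_member=100))        # 1000
print(expected_per_member_capacity(group, per_member=100))  # 125.0
print(expected_per_member_capacity(group, whole=400))       # 50.0
```

With 10 instances of which 8 are healthy, a per-instance target of 100 connections gives a whole-backend target of 1000, so each healthy instance is expected to carry 125 connections.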
Rate balancing mode
For the RATE balancing mode, you must define the target capacity using one of the following parameters:
- max-rate-per-instance (per VM): Provide a target average HTTP request rate for a single VM.
- max-rate-per-endpoint (per endpoint in a zonal NEG): Provide a target average HTTP request rate for a single endpoint.
- max-rate (per zonal NEGs and for zonal instance groups): Provide a target average HTTP request rate for the whole NEG or instance group. For regional managed instance groups, use max-rate-per-instance instead.
The following table shows how the target capacity parameter defines the following:
- The target capacity for the whole backend
- The expected target capacity for each instance or endpoint
| Backend type | If you specify | Whole backend capacity | Expected per instance or per endpoint capacity |
|---|---|---|---|
| Instance group, N instances, H healthy | max-rate-per-instance=X | X × N | (X × N)/H |
| Zonal NEG, N endpoints, H healthy | max-rate-per-endpoint=X | X × N | (X × N)/H |
| Instance groups (except regional managed instance groups), H healthy instances | max-rate=Y | Y | Y/H |
As illustrated, the max-rate-per-instance and max-rate-per-endpoint settings are proxies for calculating a target maximum rate of HTTP requests for the whole instance group or whole zonal NEG:
- In an instance group with N instances, setting max-rate-per-instance=X has the same meaning as setting max-rate=X × N.
- In a zonal NEG with N endpoints, setting max-rate-per-endpoint=X has the same meaning as setting max-rate=X × N.
Utilization balancing mode
The UTILIZATION balancing mode has no mandatory target capacity. The max-utilization target capacity can only be specified per instance group and cannot be applied to a particular VM in the group.
When you use the Google Cloud console to add a backend instance group to a backend service, the Google Cloud console sets the value of max-utilization to 0.8 (80%) if the UTILIZATION balancing mode is selected. In addition to max-utilization, the UTILIZATION balancing mode supports more complex target capacities that depend on the type of backend, as summarized in the table in the following section.
Custom metrics balancing mode
The CUSTOM_METRICS balancing mode lets you define your own custom metrics that can be used to determine how the load is spread. Custom metrics let you configure your load balancer's traffic distribution behavior to be based on metrics specific to your application or infrastructure requirements, rather than Google Cloud's standard utilization or rate-based metrics.
For more information, see Custom metrics for Application Load Balancers.
Changing the balancing mode of a load balancer
For some load balancers or load balancer configurations, you cannot change the balancing mode because the backend service has only one possible balancing mode. For others, depending on the backend used, you can change the balancing mode because more than one mode is available to those backend services.
To see which balancing modes are supported for each load balancer, refer to the table Balancing modes available for each load balancer.
Balancing modes and target capacity settings
For products that support a target capacity specification, the target capacity is not a circuit breaker. When the configured target capacity maximum is reached in a given zone, new requests or connections are distributed to other zones that aren't processing requests or connections at target capacity. If all zones have reached target capacity, new requests or connections are distributed by overfilling.
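The following Python sketch illustrates the "not a circuit breaker" behavior described above. It is a simplified model, not the load balancer's actual algorithm: the function name is hypothetical, and two assumptions are made for illustration only, namely that zones are filled in the order given and that overfill is spread in proportion to each zone's target capacity.

```python
def distribute(load, zone_capacity):
    """Spread `load` across zones; overfill instead of rejecting traffic.

    zone_capacity maps a zone name to its target capacity. Zones are
    filled in dictionary order (an assumption for this sketch).
    """
    assigned = {zone: 0.0 for zone in zone_capacity}
    remaining = float(load)

    # First pass: fill each zone up to its target capacity.
    for zone, capacity in zone_capacity.items():
        take = min(remaining, capacity)
        assigned[zone] = take
        remaining -= take

    # All zones are at target capacity: the surplus is still served.
    # Proportional overfill is an assumption, not documented behavior.
    total = sum(zone_capacity.values())
    if remaining > 0 and total > 0:
        for zone, capacity in zone_capacity.items():
            assigned[zone] += remaining * capacity / total

    return assigned


print(distribute(150, {"zone-a": 100, "zone-b": 100}))
print(distribute(250, {"zone-a": 100, "zone-b": 100}))
```

In the first call, zone-a reaches its target of 100 and the remaining 50 goes to zone-b. In the second call, both zones exceed their targets because the total load is greater than the combined target capacity.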
Application Load Balancers and Cloud Service Mesh
This table lists the available balancing mode and target capacity combinations for Application Load Balancers and Cloud Service Mesh.
| Backend type | Balancing mode | Target capacity specification |
|---|---|---|
| Instance groups | RATE | You must specify one of the following: max-rate (zonal instance groups) or max-rate-per-instance. |
| Instance groups | UTILIZATION | You can optionally specify a target capacity, such as max-utilization. |
| Instance groups | CUSTOM_METRICS | You can optionally specify a target capacity. max-utilization isn't supported. |
| Zonal NEGs, Hybrid NEGs | RATE | You must specify one of the following: max-rate or max-rate-per-endpoint. |
| Zonal NEGs, Hybrid NEGs | CUSTOM_METRICS | You can optionally specify a target capacity. max-utilization isn't supported. |
Proxy Network Load Balancers
This table lists the available balancing mode and target capacity combinations for Proxy Network Load Balancers.
| Backend type | Balancing mode | Target capacity specification |
|---|---|---|
| Instance groups | CONNECTION | You must specify one of the following: max-connections (zonal instance groups) or max-connections-per-instance. |
| Instance groups | UTILIZATION | You can optionally specify a target capacity, such as max-utilization. |
| Zonal NEGs, Hybrid NEGs | CONNECTION | You must specify one of the following: max-connections or max-connections-per-endpoint. |
Passthrough Network Load Balancers
This table lists the available balancing mode and target capacity combinations for Passthrough Network Load Balancers.
| Backend type | Balancing mode | Target capacity specification |
|---|---|---|
| Instance groups | CONNECTION | You cannot specify a target maximum number of connections. |
| Zonal NEGs | CONNECTION | You cannot specify a target maximum number of connections. |
Capacity scaler
Use the capacity scaler to scale the effective target capacity (max utilization, max rate, or max connections) without changing the configured target capacity.
For the Google Cloud reference documentation, see the following:
- Google Cloud CLI: capacity-scaler
- API:
You can adjust the capacity scaler to scale the effective target capacity without explicitly changing one of the --max-* parameters.
You can set the capacity scaler to any of these values:
- The default value is 1, which means the group serves up to 100% of its configured capacity (depending on balancingMode).
- A value of 0 means the group is completely drained, offering 0% of its available capacity. You cannot configure a setting of 0 when there is only one backend attached to the backend service.
- A value from 0.1 (10%) to 1.0 (100%).
The following examples demonstrate how the capacity scaler works in conjunction with the target capacity setting:
- If the balancing mode is RATE, the max-rate is set to 80 RPS, and the capacity scaler is 1.0, the available capacity is also 80 RPS.
- If the balancing mode is RATE, the max-rate is set to 80 RPS, and the capacity scaler is 0.5, the available capacity is 40 RPS (0.5 times 80).
- If the balancing mode is RATE, the max-rate is set to 80 RPS, and the capacity scaler is 0.0, the available capacity is zero (0).
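The same examples can be reproduced with a short Python sketch. The function name and the backend_count parameter are hypothetical; the validation mirrors the allowed values listed earlier (0, or 0.1 through 1.0, with 0 disallowed when only one backend is attached).

```python
def available_capacity(target_capacity, capacity_scaler=1.0, backend_count=2):
    """Return the effective capacity after applying the capacity scaler."""
    if capacity_scaler == 0:
        if backend_count == 1:
            raise ValueError("cannot drain the only backend of a backend service")
    elif not 0.1 <= capacity_scaler <= 1.0:
        raise ValueError("capacity scaler must be 0 or between 0.1 and 1.0")
    return target_capacity * capacity_scaler


print(available_capacity(80, 1.0))  # 80.0
print(available_capacity(80, 0.5))  # 40.0
print(available_capacity(80, 0.0))  # 0.0
```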
Service load balancing policy
A service load balancing policy (serviceLbPolicy) is a resource associated with the load balancer's backend service. It lets you customize the parameters that influence how traffic is distributed within the backends associated with a backend service:
- Customize the load balancing algorithm used to determine how traffic is distributed among regions or zones.
- Enable auto-capacity draining so that the load balancer can quickly drain traffic from unhealthy backends.
Additionally, you can designate specific backends as preferred backends. These backends must be used to capacity (that is, the target capacity specified by the backend's balancing mode) before requests are sent to the remaining backends.
To learn more, see Advanced load balancing optimizations with a service load balancing policy.
Load balancing locality policy
For a backend service, traffic distribution is based on a balancing mode and a load balancing locality policy. The balancing mode determines the fraction of traffic that should be sent to each backend (instance group or NEG). The load balancing locality policy (LocalityLbPolicy) then determines how traffic is distributed across instances or endpoints within each zone. For regional managed instance groups, the locality policy applies to each constituent zone.
The load balancing locality policy is configured per backend service. The following settings are available:
- ROUND_ROBIN (default): This is the default load balancing locality policy setting in which the load balancer selects a healthy backend in round robin order.
- WEIGHTED_ROUND_ROBIN: The load balancer uses user-defined custom metrics to select the optimal instance or endpoint within the backend to serve the request.
- LEAST_REQUEST: An O(1) algorithm in which the load balancer selects two random healthy hosts and picks the host which has fewer active requests.
- RING_HASH: This algorithm implements consistent hashing to backends. The algorithm has the property that the addition or removal of a host from a set of N hosts only affects 1/N of the requests.
- RANDOM: The load balancer selects a random healthy host.
- ORIGINAL_DESTINATION: The load balancer selects a backend based on the client connection metadata. Connections are opened to the original destination IP address specified in the incoming client request, before the request was redirected to the load balancer. ORIGINAL_DESTINATION is not supported for global and regional external Application Load Balancers.
- MAGLEV: Implements consistent hashing to backends and can be used as a replacement for the RING_HASH policy. Maglev is not as stable as RING_HASH but has faster table lookup build times and host selection times. For more information about Maglev, see the Maglev whitepaper.
- WEIGHTED_MAGLEV: Implements per-instance weighted load balancing by using weights reported by health checks. If this policy is used, the backend service must configure a non-legacy HTTP-based health check, and health check replies are expected to contain the non-standard HTTP response header field, X-Load-Balancing-Endpoint-Weight, to specify the per-instance weights. Load balancing decisions are made based on the per-instance weights reported in the last processed health check replies, as long as every instance reports a valid weight or reports UNAVAILABLE_WEIGHT. Otherwise, load balancing will remain equal-weight. WEIGHTED_MAGLEV is supported only for external passthrough Network Load Balancers. For an example, see Set up weighted load balancing for external passthrough Network Load Balancers.
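As an illustration of the LEAST_REQUEST description above, the following Python sketch picks two random healthy hosts and returns the one with fewer active requests. It is a minimal model of the technique, not the load balancer's implementation; the function name and data shapes are hypothetical.

```python
import random


def pick_least_request(active_requests, healthy_hosts, rng=random):
    """Choose between two randomly sampled healthy hosts.

    active_requests maps a host name to its number of in-flight requests.
    The work done is constant regardless of how many hosts exist.
    """
    if not healthy_hosts:
        raise ValueError("no healthy hosts")
    if len(healthy_hosts) == 1:
        return healthy_hosts[0]
    first, second = rng.sample(healthy_hosts, 2)
    if active_requests[first] <= active_requests[second]:
        return first
    return second


load = {"vm-1": 12, "vm-2": 3, "vm-3": 7}
print(pick_least_request(load, ["vm-1", "vm-2", "vm-3"]))
```

Because only two hosts are compared per request, the most heavily loaded host is never chosen when at least two hosts are healthy and their loads differ, while the selection still avoids scanning every host.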
Configuring a load balancing locality policy is supported only on backendservices used with the following load balancers:
- Global external Application Load Balancer
- Regional external Application Load Balancer
- Cross-region internal Application Load Balancer
- Regional internal Application Load Balancer
- Global external proxy Network Load Balancer
- Regional external proxy Network Load Balancer
- Cross-region internal proxy Network Load Balancer
- Regional internal proxy Network Load Balancer
- External passthrough Network Load Balancer
Note that the effective default value of the load balancing locality policy (localityLbPolicy) changes according to your session affinity settings. If session affinity is not configured (that is, if session affinity remains at the default value of NONE), then the default value for localityLbPolicy is ROUND_ROBIN. If session affinity is set to a value other than NONE, then the default value for localityLbPolicy is MAGLEV.
To configure a load balancing locality policy, you can use the Google Cloud console, gcloud (--locality-lb-policy), or the API (localityLbPolicy).
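The default-selection rule can be written as a one-line decision. The function below is a hypothetical helper for illustration only; CLIENT_IP is used as an example of a session affinity value other than NONE.

```python
def default_locality_lb_policy(session_affinity="NONE"):
    """Effective default localityLbPolicy for a given session affinity."""
    return "ROUND_ROBIN" if session_affinity == "NONE" else "MAGLEV"


print(default_locality_lb_policy())             # ROUND_ROBIN
print(default_locality_lb_policy("CLIENT_IP"))  # MAGLEV
```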
Cloud Service Mesh and traffic distribution
Cloud Service Mesh also uses backend service resources. Specifically, Cloud Service Mesh uses backend services whose load balancing scheme is INTERNAL_SELF_MANAGED. For an internal self-managed backend service, traffic distribution is based on the combination of a load balancing mode and a load balancing policy. The backend service directs traffic to a backend according to the backend's balancing mode. Then Cloud Service Mesh distributes traffic according to a load balancing policy.
Internal self-managed backend services support the following balancing modes:
- UTILIZATION, if all the backends are instance groups
- RATE, if all the backends are either instance groups or zonal NEGs
If you choose RATE balancing mode, you must specify a maximum rate, maximum rate per instance, or maximum rate per endpoint.
For more information about Cloud Service Mesh, see Cloud Service Mesh concepts.
Backend subsetting
Backend subsetting is an optional feature that improves performance and scalability by assigning a subset of backends to each of the proxy instances.
Backend subsetting is supported for the following:
- Regional internal Application Load Balancer
- Internal passthrough Network Load Balancer
Backend subsetting for regional internal Application Load Balancers
Preview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
The cross-region internal Application Load Balancer doesn't support backend subsetting. For regional internal Application Load Balancers, backend subsetting automatically assigns only a subset of the backends within the regional backend service to each proxy instance. By default, each proxy instance opens connections to all the backends within a backend service. When the number of proxy instances and the backends are both large, opening connections to all the backends can lead to performance issues.
By enabling subsetting, each proxy only opens connections to a subset of the backends, reducing the number of connections which are kept open to each backend. Reducing the number of simultaneously open connections to each backend can improve performance for both the backends and the proxies.
The following diagram shows a load balancer with two proxies. Without backend subsetting, traffic from both proxies is distributed to all the backends in the backend service 1. With backend subsetting enabled, traffic from each proxy is distributed to a subset of the backends. Traffic from proxy 1 is distributed to backends 1 and 2, and traffic from proxy 2 is distributed to backends 3 and 4.
You can additionally refine the load balancing traffic to the backends by setting the localityLbPolicy policy. For more information, see Traffic policies.
To read about setting up backend subsetting for internal Application Load Balancers, see Configure backend subsetting.
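The two-proxy scenario can be modeled with a small Python sketch that also counts open proxy-to-backend connections. The assignment scheme here (consecutive slices that wrap around) is an assumption chosen to reproduce the example; the source doesn't describe how subsets are actually chosen, and all names are hypothetical.

```python
def assign_subsets(proxies, backends, subset_size):
    """Give each proxy a consecutive, wrapping slice of the backends."""
    if not 0 < subset_size <= len(backends):
        raise ValueError("subset_size must be between 1 and len(backends)")
    subsets = {}
    for i, proxy in enumerate(proxies):
        start = (i * subset_size) % len(backends)
        subsets[proxy] = [
            backends[(start + j) % len(backends)] for j in range(subset_size)
        ]
    return subsets


def open_connections(subsets):
    """Count proxy-to-backend pairs, one connection per pair."""
    return sum(len(assigned) for assigned in subsets.values())


proxies = ["proxy-1", "proxy-2"]
backends = ["backend-1", "backend-2", "backend-3", "backend-4"]

without = assign_subsets(proxies, backends, subset_size=len(backends))
with_subsetting = assign_subsets(proxies, backends, subset_size=2)

print(with_subsetting)
print(open_connections(without), open_connections(with_subsetting))  # 8 4
```

In this model, subsetting halves the number of open proxy-to-backend pairs from 8 to 4, and each backend is connected to one proxy instead of two.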
Caveats related to backend subsetting for internal Application Load Balancer
- Although backend subsetting is designed to ensure that all backend instances remain well utilized, it can introduce some bias in the amount of traffic that each backend receives. Setting the localityLbPolicy to LEAST_REQUEST is recommended for backend services that are sensitive to the balance of backend load.
- Enabling or disabling subsetting breaks existing connections.
- Backend subsetting requires that the session affinity is NONE (a 5-tuple hash). Other session affinity options can only be used if backend subsetting is disabled. The default values of the --subsetting-policy and --session-affinity flags are both NONE, and only one of them at a time can be set to a different value.
Backend subsetting for internal passthrough Network Load Balancer
Backend subsetting for internal passthrough Network Load Balancers lets you scale your internal passthrough Network Load Balancer to support a larger number of backend VM instances per internal backend service.
For information about how subsetting affects this limit, see the "Backend services" section of Load balancing resource quotas and limits.
By default, subsetting is disabled, which limits the backend service to distributing to up to 250 backend instances or endpoints. If your backend service needs to support more than 250 backends, you can enable subsetting. When subsetting is enabled, a subset of backend instances is selected for each client connection.
The following diagram shows a scaled-down model of the difference between these two modes of operation.
Without subsetting, the complete set of healthy backends is better utilized, and new client connections are distributed among all healthy backends according to traffic distribution. Subsetting imposes load balancing restrictions but allows the load balancer to support more than 250 backends.
For configuration instructions, see Subsetting.
Caveats related to backend subsetting for internal passthrough Network Load Balancer
- When subsetting is enabled, not all backends will receive traffic from a given sender even when the number of backends is small.
- For the maximum number of backend instances when subsetting is enabled, see the quotas page.
- Only 5-tuple session affinity is supported with subsetting.
- Packet Mirroring is not supported with subsetting.
- Enabling or disabling subsetting breaks existing connections.
- If on-premises clients need to access an internal passthrough Network Load Balancer, subsetting can substantially reduce the number of backends that receive connections from your on-premises clients. This is because the region of the Cloud VPN tunnel or Cloud Interconnect VLAN attachment determines the subset of the load balancer's backends. All Cloud VPN and Cloud Interconnect endpoints in a specific region use the same subset. Different subsets are used in different regions.
Backend subsetting pricing
There is no charge for using backend subsetting. For more information, see All networking pricing.
Session affinity
Session affinity lets you control how the load balancer selects backends for new connections in a predictable way as long as the number of healthy backends remains constant. This is useful for applications that need multiple requests from a given user to be directed to the same backend or endpoint. Such applications usually include stateful servers used by ads serving, games, or services with heavy internal caching.
Google Cloud load balancers provide session affinity on a best-effort basis. Factors such as changing backend health check states, adding or removing backends, changes in backend weights (including enabling or disabling weighted balancing), or changes to backend fullness, as measured by the balancing mode, can break session affinity.
Load balancing with session affinity works well when there is a reasonably large distribution of unique connections. Reasonably large means at least several times the number of backends. Testing a load balancer with a small number of connections won't result in an accurate representation of the distribution of client connections among backends.
By default, all Google Cloud load balancers select backends by using a five-tuple hash (--session-affinity=NONE), as follows:
- Packet's source IP address
- Packet's source port (if present in the packet's header)
- Packet's destination IP address
- Packet's destination port (if present in the packet's header)
- Packet's protocol
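To make the five-tuple idea concrete, the following Python sketch hashes the five fields and maps the result onto the list of healthy backends. It is only a conceptual model: the SHA-256 hash and the modulo mapping are assumptions for illustration, and the function name is hypothetical. The load balancers' actual hashing is not described here.

```python
import hashlib


def select_backend(src_ip, src_port, dst_ip, dst_port, protocol, healthy_backends):
    """Map a connection's five-tuple to one of the healthy backends."""
    if not healthy_backends:
        raise ValueError("no healthy backends")
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    index = int.from_bytes(digest[:8], "big") % len(healthy_backends)
    return healthy_backends[index]


backends = ["vm-a", "vm-b", "vm-c"]
first = select_backend("203.0.113.7", 51000, "198.51.100.10", 443, "TCP", backends)
again = select_backend("203.0.113.7", 51000, "198.51.100.10", 443, "TCP", backends)
print(first == again)  # True
```

The same five-tuple always maps to the same backend while the list of healthy backends is unchanged. When the list changes, the mapping in this sketch can change too, which mirrors why affinity is described as best-effort.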
To learn more about session affinity for passthrough Network Load Balancers, see the following documents:
- Traffic distribution for external passthrough Network Load Balancers
- Traffic distribution for internal passthrough Network Load Balancers
To learn more about session affinity for Application Load Balancers, see the following documents:
- Session affinity for external Application Load Balancers
- Session affinity for internal Application Load Balancers
To learn more about session affinity for proxy Network Load Balancers, see the following documents:
- Session affinity for external proxy Network Load Balancers
- Session affinity for internal proxy Network Load Balancers
Backend service timeout
Most Google Cloud load balancers have a backend service timeout. The default value is 30 seconds. The full range of timeout values allowed is 1 to 2,147,483,647 seconds.
- For external Application Load Balancers and internal Application Load Balancers using the HTTP, HTTPS, or HTTP/2 protocol, the backend service timeout is a request and response timeout for HTTP(S) traffic.
  For more details about the backend service timeout for each load balancer, see the following:
  - For global external Application Load Balancers and regional external Application Load Balancers, see Timeouts and retries.
  - For internal Application Load Balancers, see Timeouts and retries.
- For external proxy Network Load Balancers and internal proxy Network Load Balancers, the configured backend service timeout is the length of time the load balancer keeps the TCP connection open in the absence of any data transmitted from either the client or the backend. After this time has passed without any data transmitted, the proxy closes the connection.
  - Default value: 30 seconds
  - Configurable range: 1 to 2,147,483,647 seconds
- For internal passthrough Network Load Balancers and external passthrough Network Load Balancers, you can set the value of the backend service timeout using gcloud or the API, but the value is ignored. Backend service timeout has no meaning for these passthrough load balancers.
- For Cloud Service Mesh, the backend service timeout field (specified using timeoutSec) is not supported with proxyless gRPC services. For such services, configure the backend service timeout using the maxStreamDuration field. This is because gRPC does not support the semantics of timeoutSec that specifies the amount of time to wait for a backend to return a full response after the request is sent. gRPC's timeout specifies the amount of time to wait from the beginning of the stream until the response has been completely processed, including all retries.
Health checks
Each backend service whose backends are instance groups or zonal NEGs must have an associated health check. Backend services using a serverless NEG or a global internet NEG as a backend must not reference a health check.
When you create a load balancer using the Google Cloud console, you can create the health check, if it is required, when you create the load balancer, or you can reference an existing health check.
When you create a backend service using either instance group or zonal NEG backends using the Google Cloud CLI or the API, you must reference an existing health check. Refer to the load balancer guide in the Health checks overview for details about the type and scope of health check required.
Additional features enabled on the backend service resource
The following optional features are supported by some backend services.
Cloud CDN
Cloud CDN uses Google's global edge network to serve content closer to users, which accelerates your websites and applications. Cloud CDN is enabled on backend services used by global external Application Load Balancers. The load balancer provides the frontend IP addresses and ports that receive requests, and the backends that respond to the requests.
For more details, see the Cloud CDN documentation.
Cloud CDN is incompatible with IAP. They can't be enabled on the same backend service.
Cloud Armor
If you use one of the following load balancers, you can add additional protection to your applications by enabling Cloud Armor on the backend service during load balancer creation:
- Global external Application Load Balancer
- Classic Application Load Balancer
- Global external proxy Network Load Balancer
- Classic proxy Network Load Balancer
If you use the Google Cloud console, you can do one of the following:
- Select an existing Cloud Armor security policy.
- Accept the configuration of a default Cloud Armor rate-limiting security policy with a customizable name, request count, interval, key, and rate limiting parameters. If you use Cloud Armor with an upstream proxy service, such as a CDN provider, Enforce_on_key should be set as an XFF IP address.
- Choose to opt out of Cloud Armor protection by selecting None.
IAP
IAP lets you establish a central authorization layer for applications accessed by HTTPS, so you can use an application-level access control model instead of relying on network-level firewalls. IAP is supported by certain Application Load Balancers.
IAP is incompatible with Cloud CDN. They can't be enabled on the same backend service.
Advanced traffic management features
To learn about advanced traffic management features that are configured on the backend services and URL maps associated with load balancers, see the following:
- Traffic management overview for internal Application Load Balancers
- Traffic management overview for global external Application Load Balancers
- Traffic management overview for regional external Application Load Balancers
API and gcloud reference
For more information about the properties of the backend service resource, see the following references:
- Global backend service API resource
- Regional backend service API resource
- gcloud compute backend-services page, for both global and regional backend services
What's next
For related documentation and information about how backend services are used in load balancing, review the following:
- Create custom headers
- Create an external Application Load Balancer
- External Application Load Balancer overview
- Enable connection draining
- Encryption in transit in Google Cloud
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.