Internal Application Load Balancer overview

This document introduces the concepts that you need to understand to configureinternal Application Load Balancers.

A Google Cloud internal Application Load Balancer is a proxy-based Layer 7 load balancer that enablesyou to run and scale your services behind a single internal IP address. Theinternal Application Load Balancer distributes HTTP and HTTPS traffic to backends hosted on a varietyof Google Cloud platforms such as Compute Engine, Google Kubernetes Engine (GKE), andCloud Run. For details, seeUse cases.

Modes of operation

You can configure an internal Application Load Balancer in the following modes:

Cross-region internal Application Load Balancer. This is a multi-region load balancer thatis implemented as a managed service based on the open-sourceEnvoyproxy. The cross-region modeenables you to load balance traffic to backend services that are globallydistributed, including traffic management that helps ensure that traffic isdirected to the closest backend. This load balancer also enables highavailability.Placing backends in multiple regions helps avoid failures in asingle region. If one region's backends are down, traffic can fail over toanother region.

Regional internal Application Load Balancer. This is a regional load balancer that isimplemented as a managed service based on the open-sourceEnvoyproxy. The regional mode requires that backendsbe in a single Google Cloud region. Clients can be limited to thatregion or can be in any region, based on whether global access is disabled orenabled on the forwarding rule. This load balancer is enabled with richtraffic control capabilities based on HTTP or HTTPS parameters. After the loadbalancer is configured, it automatically allocates Envoy proxies to meet yourtraffic needs.

The following table describes the important differences between cross-regionand regional modes:

Load balancer mode	Feature
Load balancer mode	Virtual IP address (VIP) of the load balancer	Client access	Load balanced backends	High availability and failover
Cross-region internal Application Load Balancer	Allocated from a subnet in a specific Google Cloud region. VIP addresses from multiple regions can share the same global backend service. You can configure DNS-based global load balancing by using DNS routing policies to route client requests to the closest VIP address.	Always globally accessible. Clients from any Google Cloud region in a VPC can send traffic to the load balancer.	Global backends. Load balancer can send traffic to backends in any region.	Automatic failover to healthy backends in the same or different regions.
Regional internal Application Load Balancer	Allocated from a subnet in a specific Google Cloud region.	Not globally accessible by default. You can optionally enable global access.	Regional backends. Load balancer can only send traffic to backends that are in the same region as the proxy of the load balancer.	Automatic failover to healthy backends in the same region.

Identify the mode

Console

In the Google Cloud console, go to theLoad balancing page.
Go to Load balancing
On theLoad Balancers tab, you can see the load balancer type,protocol, and region. If the region is blank, then the load balanceris in the cross-region mode.The following table summarizes how to identify the mode of the load balancer.
Load balancer mode Load balancer type Access type Region
Cross-region internal Application Load Balancer Application Internal
Regional internal Application Load Balancer Application Internal Specifies a region

Load balancer mode	Load balancer type	Access type	Region
Cross-region internal Application Load Balancer	Application	Internal
Regional internal Application Load Balancer	Application	Internal	Specifies a region

gcloud

To determine the mode of a load balancer, run the following command:
```
gcloud compute forwarding-rules describeFORWARDING_RULE_NAME
```
In the command output, check the load balancing scheme, region, and networktier. The following table summarizes how to identify the mode of the loadbalancer.
Load balancer mode Load balancing scheme Forwarding rule
Cross-region internal Application Load Balancer INTERNAL_MANAGED Global
Regional internal Application Load Balancer INTERNAL_MANAGED Regional

Load balancer mode	Load balancing scheme	Forwarding rule
Cross-region internal Application Load Balancer	INTERNAL_MANAGED	Global
Regional internal Application Load Balancer	INTERNAL_MANAGED	Regional

Important: After you create a load balancer, you can't edit its mode. Instead,you must delete the load balancer and create a new one.

Architecture and resources

The following diagram shows the Google Cloud resources required forinternal Application Load Balancers:

Cross-region internal Application Load Balancer

This diagram shows the components of a cross-region internal Application Load Balancerdeployment in Premium Tier within the sameVPC network. Each global forwarding rule uses a regional IPaddress that the clients use to connect.

Regional internal Application Load Balancer

This diagram shows the components of a regional internal Application Load Balancerdeployment in Premium Tier.

The following resources are required for an internal Application Load Balancer deployment:

Proxy-only subnet

In the previous diagram, theproxy-only subnet provides a set of IP addressesthat Google uses to run Envoy proxies on your behalf. You must create aproxy-only subnet in each region of a VPC network where you useinternal Application Load Balancers.

The following table describes the differences between proxy-only subnets in thecross-region and regional modes. Cross-region and regional load balancers cannotshare the same subnets.

Load balancer mode Value of the proxy-only subnet--purpose flag

Cross-region internal Application Load Balancer

Load balancer mode	Value of the proxy-only subnet`--purpose` flag
Cross-region internal Application Load Balancer	GLOBAL_MANAGED_PROXY The cross-region Envoy-based load balancer must have a proxy-only subnet in each region in which the load balancer is configured. Cross-region load balancer proxies in the same region and network share the same proxy-only subnet.
Regional internal Application Load Balancer	REGIONAL_MANAGED_PROXY All the regional Envoy-based load balancers in a region and VPC network share the same proxy-only subnet.

GLOBAL_MANAGED_PROXY

The cross-region Envoy-based load balancer must have a proxy-only subnet in each region in which the load balancer is configured. Cross-region load balancer proxies in the same region and network share the same proxy-only subnet.

Regional internal Application Load Balancer

REGIONAL_MANAGED_PROXY

All the regional Envoy-based load balancers in a region and VPC network share the same proxy-only subnet.

Further:

Proxy-only subnets areonly used for Envoy proxies, not your backends.
Backend VMs or endpoints of all internal Application Load Balancers in a region andVPC network receive connections from the proxy-only subnet.
The virtual IP address of an internal Application Load Balancer isnot located in the proxy-onlysubnet. The load balancer's IP address is defined by its internal managedforwarding rule, which is described below.

Forwarding rule and IP address

Forwarding rules route trafficby IP address, port, and protocol to a load balancing configuration that consistsof a target proxy and a backend service.

IP address specification. Each forwarding rule references a single regionalIP address that you can use in DNS records for your application. You can eitherreserve a static IP address that you can use or let Cloud Load Balancing assignone for you. We recommend that you reserve a static IP address; otherwise, youmust update your DNS record with the newly assigned ephemeral IP addresswhenever you delete a forwarding rule and create a new one.

Clients use the IP address and port to connect to the load balancer's Envoyproxies—the forwarding rule's IP address is the IP address of the load balancer(sometimes called a virtual IP address or VIP). Clients connecting to a loadbalancer must use HTTP version 1.1 or later. For the complete list of supportedprotocols, seeLoad balancer feature comparison.

The internal IP address associated with the forwarding rule can come from asubnet in the same network and region as your backends.

Port specification. Each forwarding rule for an Application Load Balancer canreference asingle port from1-65535. Tosupport multiple ports, you must configure multiple forwarding rules. You canconfigure multiple forwarding rules to use the same internal IP address (VIP)and to reference the same target HTTP or HTTPS proxy as long as the overallcombination of IP address, port, and protocol is unique for each forwardingrule. This way, you can use a single load balancer with a shared URL map as aproxy for multiple applications.

The type of forwarding rule, IP address, and load balancing scheme used byinternal Application Load Balancers depends on the mode of the load balancer.

Cross-region internal Application Load Balancer
	Forwarding rule	`globalForwardingRules.insert` method
	Regional IP address	`addresses.insert` method
	Load balancing scheme	`INTERNAL_MANAGED`
	IP address (optional)	`SHARED_LOADBALANCER_VIP`
	Routing from the client to the load balancer's frontend	Global access is enabled by default to allow clients from any region in a VPC to access your load balancer. Backends can be in multiple regions.
Regional internal Application Load Balancer
	Forwarding rule	`forwardingRules.insert` method
	Regional IP address	`addresses.insert` method
	Load balancing scheme	`INTERNAL_MANAGED`
	IP address (optional)	`SHARED_LOADBALANCER_VIP`
	Routing from the client to the load balancer's frontend	You can enable global access to allow clients from any region in a VPC to access your load balancer. Backends must also be in the same region as the load balancer.

Forwarding rules and VPC networks

This section describes how forwarding rules used by internal Application Load Balancers areassociated with VPC networks.

Load balancer mode	VPC network association
Cross-region internal Application Load Balancer Regional internal Application Load Balancer	Regional internal IPv4 addresses always exist inside VPC networks. When you create the forwarding rule, you're required to specify the subnet from which the internal IP address is taken. This subnet must be in the same region and VPC network where a proxy-only subnet has been created. Thus, there is an implied network association.

Target proxy

A target HTTP or HTTPS proxy terminates HTTP(S) connections from clients.The HTTP(S) proxy consults the URL map to determine how to route traffic tobackends. A target HTTPS proxy uses an SSL certificate to authenticate itself toclients.

The load balancer preserves the Host header of the original client request. Theload balancer also appends two IP addresses to theX-Forwarded-For header:

The IP address of the client that connects to the load balancer
The IP address of the load balancer's forwarding rule

If there is noX-Forwarded-For header on the incoming request, these two IPaddresses are the entire header value. If the request has anX-Forwarded-For header, other information, such as the IP addresses recordedby proxies on the way to the load balancer, are preserved before the two IPaddresses. The load balancer doesn't verify any IP addresses that precede thelast two IP addresses in this header.

If you are running a proxy as the backend server, this proxy typically appendsmore information to theX-Forwarded-For header, and your software might need totake that into account. The proxied requests from the load balancer come from anIP address in the proxy-only subnet, and your proxy on the backend instancemight record this address as well as the backend instance's own IP address.

Depending on the type of traffic your application needs to handle, you canconfigure a load balancer with either a target HTTP proxy or a target HTTPS proxy.

The following table shows the target proxy APIs required by internal Application Load Balancers:

Load balancer mode	Target proxy
Cross-region internal Application Load Balancer	`targetHttpProxies` `targetHttpsProxies`
Regional internal Application Load Balancer	`regionTargetHttpProxies` `regionTargetHttpsProxies`

SSL certificates

Internal Application Load Balancers using target HTTPS proxies require private keys andSSL certificates as part of the load balancer configuration.

The following table specifies the type of SSL certificates required byinternal Application Load Balancers in each mode:

Load balancer mode	SSL certificate type
Cross-region internal Application Load Balancer	Certificate Managerself-managed certificates and Google-managed certificates. The following types of Google-managed certificates are supported with Certificate Manager: DNS Authorization with Public DNS. For more information, seeDeploy a global Google-managed certificate with DNS authorization. Private Certificate Authority Service. For more information, seeCreate a Google-managed certificate issued by your CA Service instance. Google-managed certificates with load balancer authorization aren't supported. Compute Engine SSL certificates aren't supported.
Regional internal Application Load Balancer	Compute Engine regional SSL certificates Certificate Managerregional self-managed certificates and Google-managed certificates. The following types of Google-managed certificates are supported with Certificate Manager: Regional Google-managed certificates with per-project DNS authorization. For more information, seeDeploy a regional Google-managed certificate. Regional Google-managed certificates with private Certificate Authority Service. For more information, seeDeploy a regional Google-managed certificate with CA Service. Google-managed certificates with load balancer authorization aren't supported.

Load balancer mode

SSL certificate type

Cross-region internal Application Load Balancer

Certificate Managerself-managed certificates and Google-managed certificates.

The following types of Google-managed certificates are supported with Certificate Manager:

DNS Authorization with Public DNS. For more information, seeDeploy a global Google-managed certificate with DNS authorization.
Private Certificate Authority Service. For more information, seeCreate a Google-managed certificate issued by your CA Service instance.

Google-managed certificates with load balancer authorization aren't supported.

Compute Engine SSL certificates aren't supported.

Regional internal Application Load Balancer

Compute Engine regional SSL certificates

Certificate Managerregional self-managed certificates and Google-managed certificates.

The following types of Google-managed certificates are supported with Certificate Manager:

Regional Google-managed certificates with per-project DNS authorization. For more information, seeDeploy a regional Google-managed certificate.
Regional Google-managed certificates with private Certificate Authority Service. For more information, seeDeploy a regional Google-managed certificate with CA Service.

Google-managed certificates with load balancer authorization aren't supported.

URL maps

The target HTTP(S) proxy usesURL mapsto make a routing determination based on HTTP attributes (such as the request path,cookies, or headers). Based on the routing decision, the proxy forwards clientrequests to specific backend services. The URL map can specify additional actionsto take such as rewriting headers, sending redirects to clients, and configuringtimeout policies (among others).

The following table specifies the type of URL map required byinternal Application Load Balancers in each mode:

Load balancer mode	URL map type
Cross-region internal Application Load Balancer	`urlMaps`
Regional internal Application Load Balancer	`regionUrlMaps`

Backend service

A backend service provides configuration information to the load balancer sothat it can direct requests to its backends—for example,Compute Engine instance groups or network endpoint groups (NEGs). Formore information about backend services, seeBackend servicesoverview.

Backend service scope

The following table indicates which backend service resource and scope is usedby internal Application Load Balancers:

Load balancer mode	Backend service resource
Cross-region internal Application Load Balancer	`backendServices`
Regional internal Application Load Balancer	`regionBackendServices`

Protocol to the backends

Backend services for Application Load Balancers must use one of the followingprotocols to send requests to backends:

HTTP, which uses HTTP/1.1 and no TLS
HTTPS, which uses HTTP/1.1 and TLS
HTTP/2, which uses HTTP/2 and TLS (HTTP/2 without encryption isn'tsupported.)
H2C, which uses HTTP/2 over TCP. TLS isn't required. H2C isn't supportedfor classic Application Load Balancers.

The load balancer only uses the backend service protocol that you specify tocommunicate with its backends. The load balancer doesn't fall back to adifferent protocol if it is unable to communicate with backends using thespecified backend service protocol.

The backend service protocol doesn't need to match the protocol used by clientsto communicate with the load balancer. For example, clients can send requeststo the load balancer using HTTP/2, but the load balancer can communicate withbackends using HTTP/1.1 (HTTP or HTTPS).

Backends

The following table specifies the backend features supported by internal Application Load Balancersin each mode.

Load balancer mode	Supported backends on a backend service¹
	Instance groups²	Zonal NEGs³	Internet NEGs	Serverless NEGs	Hybrid NEGs	Private Service Connect NEGs
Cross-region internal Application Load Balancer				Cloud Run
Regional internal Application Load Balancer				Cloud Run

¹ Backends on a backend service must be the same type: all instancegroups or all the same type of NEG. An exception to this rule is that bothGCE_VM_IP_PORT zonal NEGs and hybrid NEGs can be used on the samebackend service to support ahybrid architecture.

² Combinations of zonal unmanaged, zonal managed, and regionalmanaged instance groups are supported on the same backend service. When usingautoscaling for a managed instance group that's a backend for two or morebackend services, configure the instance group's autoscaling policy to use multiple signals.

³ Zonal NEGs must useGCE_VM_IP_PORT endpoints.

Backends and VPC networks

Note: This section describes VPC network association fordeployments where the load balancer's backend service and backends are in thesame project as the forwarding rule. If you also want the backend service to bein a different project from the forwarding rule, you need to configure theload balancer in a Shared VPC environment with cross-project servicereferencing.

The restrictions on where backends can be located depend on the type ofbackend.

For instance groups, zonal NEGs, and hybrid connectivity NEGs, all backendsmust be located in the same project and region as the backend service.However, a load balancer can reference a backend that uses a differentVPC network in the same project as the backend service.Connectivity between the load balancer'sVPC network and the backend VPC networkcan be configured using either VPC Network Peering, Cloud VPNtunnels, Cloud Interconnect VLAN attachments, or a Network Connectivity Centerframework.
Backend network definition
- For zonal NEGs and hybrid NEGs, you explicitly specify theVPC network when you create the NEG.
- For managed instance groups, the VPC network is defined inthe instance template.
- For unmanaged instance groups, the instance group'sVPC network is set to match the VPC networkof thenic0 interface for the first VM added to the instance group.
Backend network requirements
Your backend's network must satisfyone of the following networkrequirements:
- The backend's VPC network must exactly match theforwarding rule's VPC network.
- The backend's VPC network must be connected to theforwarding rule's VPC network usingVPC Network Peering. You must configure subnet route exchanges toallow communication between the proxy-only subnet in the forwarding rule'sVPC network and the subnets used by the backend instancesor endpoints.

Both the backend's VPC network and the forwarding rule'sVPC network must beVPCspokesattached to the sameNetwork Connectivity Centerhub.Import and export filters must allow communication between the proxy-onlysubnet in the forwarding rule's VPC network and the subnetsused by backend instances or endpoints.

For all other backend types, all backends must be located in the sameVPC network and region.

Backends and network interfaces

If you use instance group backends, packets are always delivered tonic0. Ifyou want to send packets to non-nic0 interfaces (eithervNICs orDynamic Network Interfaces), useNEG backends instead.

If you use zonal NEG backends, packets are sent to whatever network interface isrepresented by the endpoint in the NEG. The NEG endpoints must be in the sameVPC network as the NEG's explicitly defined VPCnetwork.

Backend subsetting

Preview

This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see thelaunch stage descriptions.

Backend subsetting is an optional feature supported by regional internal Application Load Balancersthat improves performance and scalability by assigning a subset of backends toeach of the proxy instances.

By default, backend subsetting is disabled.For information about enabling this feature, seeBackend subsetting for regional internal Application Load Balancers.

Health checks

Each backend service specifies a health check that periodically monitors thebackends' readiness to receive a connection from the load balancer. This reducesthe risk that requests might be sent to backends that can't service the request.Health checks don't check whether the application itself is working.

For the health check probes to succeed, you must create an Ingress allowfirewall rule that allows health check probes to reach your backendinstances. Typically, health check probes originate from Google's centralized healthchecking mechanism. However for hybrid NEGs, health checks originate from theproxy-only subnet instead. For details, seeDistributed Envoy healthchecks.

Health check protocol

Although it isn't required and isn't always possible, it is a best practice touse a health check whose protocol matches theprotocol of the backendservice.For example, an HTTP/2 health check most accurately tests HTTP/2 connectivity tobackends. In contrast, internal Application Load Balancers that use hybrid NEG backends don'tsupport gRPC health checks. For the list of supported health checkprotocols, see the load balancing features in theHealthchecks section.

The following table specifies the scope of health checks supported byinternal Application Load Balancers:

Load balancer mode	Health check type
Cross-region internal Application Load Balancer	`healthChecks`
Regional internal Application Load Balancer	`regionHealthChecks`

For more information about health checks, see the following:

Firewall rules

An internal Application Load Balancer requires the following firewall rules:

An ingress allow rule that permits traffic from Google's central health checkranges. For more information about the specific health check probe IP addressranges and why it's necessary to allow traffic from them, seeProbe IP rangesand firewall rules.
An ingress allow rule that permits traffic from theproxy-onlysubnet.

There are certain exceptions to the firewall rule requirements for these ranges:

Allowing traffic from Google's health check probe ranges isn't required for hybridNEGs. However, if you're using a combination of hybrid and zonal NEGs ina single backend service, you need to allow traffic from theGooglehealth check probe ranges for the zonal NEGs.
For regional internet NEGs, health checks are optional. Traffic from loadbalancers usingregional internet NEGs originates from theproxy-only subnet and is thenNAT-translated (by using Cloud NAT) to either manually or automatically allocatedNAT IP addresses. This traffic includes both health check probes and userrequests from the load balancer to the backends. For details, seeRegional NEGs:Use a Cloud NAT gateway.

Client access

Clients can be in the same network or in a VPC networkconnected by usingVPC Network Peering.

For cross-region internal Application Load Balancers, global access is enabled by default. Clients fromany region in a VPC can access your load balancer.

For regional internal Application Load Balancers, clients must be in the same region as the load balancer by default.You canenable global accessto allow clients from any region in a VPC to access your load balancer.

The following table summarizes client access for regional internal Application Load Balancers:

Global access disabled	Global access enabled
Clients must be in the same region as the load balancer. They also must be in the same VPC network as the load balancer or in a VPC network that is connected to the load balancer's VPC network by using VPC Network Peering.	Clients can be in any region. They still must be in the same VPC network as the load balancer or in a VPC network that's connected to the load balancer's VPC network by using VPC Network Peering.
On-premises clients can access the load balancer throughCloud VPN tunnels or VLAN attachments. These tunnels or attachments must be in the same region as the load balancer.	On-premises clients can access the load balancer through Cloud VPN tunnels or VLAN attachments. These tunnels or attachments can be in any region.

GKE support

GKE uses internal Application Load Balancers in the following ways:

Internal Gateways created using theGKE Gatewaycontroller can use any mode ofan Internal Application Load Balancer. You control the load balancer's mode by choosing aGatewayClass. TheGKE Gateway controller always usesGCE_VM_IP_PORT zonal NEGbackends.
Internal Ingresses created using theGKE Ingresscontroller are alwaysregional internal Application Load Balancers. The GKE Ingress controlleralways usesGCE_VM_IP_PORT zonal NEG backends.

You can useGCE_VM_IP_PORT zonal NEG created and managed byGKE Services as backends for any Application Load Balancer orProxy Network Load Balancer. For more information, seeContainer-native loadbalancing through standalone zonalNEGs.

Shared VPC architectures

Internal Application Load Balancers support networks that use Shared VPC.Shared VPC lets organizations connect resources from multiple projectsto a common VPC network so that they can communicate with eachother securely and efficiently using internal IPs from that network. If you'renot already familiar with Shared VPC, read theShared VPCoverview documentation.

There are many ways to configure an internal Application Load Balancer within aShared VPC network. Regardless of type of deployment, all thecomponents of the load balancer must be in the same organization.

Subnets and IP address	Frontend components	Backend components
Create the required network and subnets (including the proxy-only subnet), in the Shared VPC host project. The load balancer's internal IP address can be defined in either the host project or a service project, but itmust use a subnet in the desired Shared VPC network in the host project. The address itself comes from the primary IP range of the referenced subnet.	The regional internal IP address, the forwarding rule, the target HTTP(S) proxy, and the associated URL map must be defined in the same project. This project can be the host project or a service project.	You can do one of the following: Create backend services andbackends (instance groups, serverless NEGs, or any other supported backend types) in thesame service project as the frontend components. Create backend services and backends (instance groups, serverless NEGs, or any other supported backend types) in as many service projects as required. A single URL map can reference backend services across different projects. This type of deployment is known ascross-project service referencing. Each backend service must be defined in the same project as the backends it references. Health checks associated with backend services must be defined in the same project as the backend service as well.

Subnets and IP address

Frontend components

Backend components

Create the required network and subnets (including the proxy-only subnet), in the Shared VPC host project.

The load balancer's internal IP address can be defined in either the host project or a service project, but itmust use a subnet in the desired Shared VPC network in the host project. The address itself comes from the primary IP range of the referenced subnet.

The regional internal IP address, the forwarding rule, the target HTTP(S) proxy, and the associated URL map must be defined in the same project. This project can be the host project or a service project.

You can do one of the following:

Create backend services andbackends (instance groups, serverless NEGs, or any other supported backend types) in thesame service project as the frontend components.
Create backend services and backends (instance groups, serverless NEGs, or any other supported backend types) in as many service projects as required. A single URL map can reference backend services across different projects. This type of deployment is known ascross-project service referencing.

Each backend service must be defined in the same project as the backends it references. Health checks associated with backend services must be defined in the same project as the backend service as well.

While you can create all the load balancing components and backends in theShared VPC host project, this type of deployment doesn't separatenetwork administration and service development responsibilities.

All load balancer components and backends in a service project

The following architecture diagram shows a standard Shared VPCdeployment where all load balancer components and backends are in a serviceproject. This deployment type is supported by all Application Load Balancers.

The load balancer uses IP addresses and subnets from the host project. Clientscan access an internal Application Load Balancer if they are in the sameShared VPC network and region as the load balancer. Clients can belocated in the host project, or in an attached service project, or anyconnectednetworks.

Internal Application Load Balancer on Shared VPC network. — Internal Application Load Balancer on Shared VPC network (click to enlarge).

Serverless backends in a Shared VPC environment

For an internal Application Load Balancer that is using a serverless NEG backend, the backingCloud Run service must be in the same service project as thethe backend service and the serverless NEG. The load balancer's frontendcomponents (forwarding rule, target proxy, URL map) can be created in either thehost project, the same service project as the backend components, or any otherservice project in the same Shared VPC environment.

Cross-project service referencing

Cross-project service referencing is a deployment model where the loadbalancer's frontend and URL map are in one project and the load balancer'sbackend service and backends are in a different project.

Cross-project service referencing lets organizations configure one centralload balancer and route traffic to hundreds of services distributed acrossmultiple different projects. You can centrally manage all traffic routing rulesand policies in one URL map. You can also associate the load balancer with asingle set of hostnames and SSL certificates. You can therefore optimize thenumber of load balancers needed to deploy your application, and lowermanageability, operational costs, and quota requirements.

By having different projects for each of your functional teams, you can alsoachieve separation of roles within your organization. Service owners can focuson building services in service projects, while network teams can provision andmaintain load balancers in another project, and both can be connected by usingcross-project service referencing.

Service owners can maintain autonomy over the exposure of their services andcontrol which users can access their services by using the load balancer. This isachieved by a special IAM role called theCompute Load Balancer Services User role(roles/compute.loadBalancerServiceUser).

For internal Application Load Balancers, cross-project service referencing is only supported withinShared VPC environments.

To learn how to configure Shared VPC for an internal Application Load Balancer—with andwithout cross-project service referencing—seeSet up an internal Application Load Balancer withShared VPC.

Usage notes for cross-project service referencing

You can't reference a cross-project backend service if the backend service has regional internet NEG backends. All other backend types are supported.
Google Cloud doesn't differentiate between resources (for example, backend services) using the same name across multiple projects. Therefore, when you are using cross-project service referencing, we recommend that you use unique backend service names across projects within your organization.

Example 1: Load balancer frontend and backend in different service projects

Here is an example of a Shared VPC deployment where the load balancer'sfrontend and URL map are created in service project A and the URL map referencesa backend service in service project B.

In this case, Network Admins or Load Balancer Admins in service project Arequire access to backend services in service project B. Service project Badmins grant the Compute Load Balancer Services User role(roles/compute.loadBalancerServiceUser) to Load Balancer Admins inservice project A who want to reference the backendservice in service project B.

Load balancer frontend and URL map in service project. — Load balancer frontend and backend in different service projects (click to enlarge).

Example 2: Load balancer frontend in the host project and backends in service projects

Here is an example of a Shared VPC deployment where the load balancer'sfrontend and URL map are created in the host project and the backend services(and backends) are created in service projects.

In this case, Network Admins or Load Balancer Admins in the host projectrequire access to backend services in the service project. Service projectadmins grant the Compute Load Balancer Services User role(roles/compute.loadBalancerServiceUser) toto Load Balancer Admins in the host project A who want to reference the backendservice in the service project.

Load balancer frontend and URL map in host project. — Load balancer frontend and URL map in host project (click to enlarge).

Timeouts and retries

Internal Application Load Balancers support the following types of timeouts:

Timeout type and description	Default values	Supports custom values
Timeout type and description	Default values	Cross-region	Regional
Backend service timeout A request and response timeout. Represents the maximum amount of time allowed between the load balancer sending the first byte of a request to the backend and the backend returning the last byte of the HTTP response to the load balancer. If the backend hasn't returned the entire HTTP response to the load balancer within this time limit, the remaining response data is dropped.	For serverless NEGs on a backend service: 60 minutes For all other backend types on a backend service: 30 seconds
Client HTTP keepalive timeout The maximum amount of time that the TCP connection between a client and the load balancer's managed Envoy proxy can be idle. (The same TCP connection might be used for multiple HTTP requests.)	610 seconds
Backend HTTP keepalive timeout The maximum amount of time that the TCP connection between the load balancer's managed Envoy proxy and a backend can be idle. (The same TCP connection might be used for multiple HTTP requests.)	10 minutes (600 seconds)

Backend service timeout

The configurablebackend service timeout represents the maximum amount oftime that the load balancer waits for your backend to process an HTTP request andreturn the corresponding HTTP response. Except for serverless NEGs, the defaultvalue for the backend service timeout is 30 seconds.

For example, if you want to download a 500-MB file, and the value of the backendservice timeout is 90 seconds, the load balancer expects the backend to deliverthe entire 500-MB file within 90 seconds. It is possible to configure thebackend service timeout to be insufficient for the backend to send its completeHTTP response. In this situation, if the load balancer has at least receivedHTTP response headers from the backend, the load balancer returns the completeresponse headers and as much of the response body as it could obtain within thebackend service timeout.

We recommend that you set the backend service timeout to the longest amount oftime that you expect your backend to need in order to process an HTTP response.If the software running on your backend needs more time to process an HTTPrequest and return its entire response, we recommend that you increase thebackend service timeout.

The backend service timeout accepts values between1 and2,147,483,647seconds; however, larger values aren't practical configuration options.Google Cloud also doesn't guarantee that an underlying TCP connection canremain open for the entirety of the value of the backend service timeout.Client systems must implement retry logic instead of relying on a TCPconnection to be open for long periods of time.

For websocket connections used with internal Application Load Balancers, active websocketconnections don't follow the backend service timeout. Idle websocket connectionsare closed after the backend service timeout.

Google Cloud periodically restarts or changes the number of serving Envoysoftware tasks. The longer the backend service timeout value, the more likely itis that Envoy task restarts or replacements will terminate TCP connections.

To configure the backend service timeout, use one of the following methods:

Console

Modify theTimeout field of the load balancer's backend service.

gcloud

Use thegcloud compute backend-services update command to modify the--timeout parameter of the backend service resource.

API

Modify thetimeoutSec parameter for theregionBackendServices resource

Client HTTP keepalive timeout

Theclient HTTP keepalive timeout represents the maximum amount of timethat a TCP connection can be idle between the (downstream) client and an Envoyproxy. The default client HTTP keepalive timeout value is 610 seconds. You canconfigure the timeout with a value between 5 and 1200 seconds.

An HTTP keepalive timeout is also called aTCP idle timeout.

The load balancer's client HTTP keepalive timeout must be greater than theHTTP keepalive (TCP idle) timeout used by downstream clients or proxies.If a downstream client has a greater HTTP keepalive (TCP idle) timeout thanthe load balancer's client HTTP keepalive timeout, it's possible for a racecondition to occur. From the perspective of a downstream client, an establishedTCP connection is permitted to be idle for longer than permitted by the loadbalancer. This means that the downstream client can send packets after the loadbalancer considers the TCP connection to be closed. When that happens, the loadbalancer responds with a TCP reset (RST) packet.

When the client HTTP keepalive timeout expires, either the GFE or the Envoyproxy sends a TCP FIN to the client to gracefully close the connection.

Backend HTTP keepalive timeout

Internal Application Load Balancers are proxies that use a first TCP connection between the(downstream) client and an Envoy proxy, and a second TCP connection between theEnvoy proxy and your backends.

The load balancer's secondary TCP connections might not get closed after eachrequest; they can stay open to handle multiple HTTP requests and responses. Thebackend HTTP keepalive timeout defines the TCP idle timeout between theload balancer and your backends. The backend HTTP keepalive timeout doesn'tapply to websockets.

The backend keepalive timeout is fixed at 10 minutes (600 seconds) and cannotbe changed. This helps ensure that the load balancer maintains idle connectionsfor at least 10 minutes. After this period, the load balancer can sendtermination packets to the backend at any time.

The load balancer's backend keepalive timeout must be less than the keepalivetimeout used by software running on your backends. This avoids a race conditionwhere the operating system of your backends might close TCP connections with aTCP reset (RST). Because the backend keepalive timeout for the load balancerisn't configurable, you must configure your backend software so that itsHTTP keepalive (TCP idle) timeout value is greater than 600 seconds.

When the backend HTTP keepalive timeout expires, either the GFE or the Envoyproxy sends a TCP FIN to the backend VM to gracefully close the connection.

The following table lists the changes necessary to modify keepalive timeoutvalues for common web server software.

Web server software	Parameter	Default setting	Recommended setting
Apache	KeepAliveTimeout	`KeepAliveTimeout 5`	`KeepAliveTimeout 620`
nginx	keepalive_timeout	`keepalive_timeout 75s;`	`keepalive_timeout 620s;`

Retries

To configure retries, you can use aretry policy inthe URL map. The default number of retries (numRetries) is 1.The maximum configurableperTryTimeout is 24 hours.

Without a retry policy, unsuccessful requests that have no HTTP body (forexample,GET requests) that result in HTTP502,503,or504 responses are retried once.

HTTPPOST requests aren't retried.

Retried requests only generate one log entry for the final response.

For more information, seeInternal Application Load Balancer logging and monitoring.

Accessing connected networks

Your clients can access an internal Application Load Balancer in your VPC networkfrom a connected network by using the following:

VPC Network Peering
Cloud VPN and Cloud Interconnect

For detailed examples, seeInternal Application Load Balancers and connectednetworks.

Session affinity

Session affinity, configured on the backend service of Application Load Balancers,provides a best-effort attempt to send requests from a particular client to thesame backend as long as the number of healthy backend instances or endpointsremains constant, and as long as the previously selected backend instance orendpoint is not at capacity. Thetarget capacity of the balancingmode determines when thebackend is at capacity.

The following table outlines the different types of session affinity optionssupported for the different Application Load Balancers. In the sectionthat follows,Types of session affinity, each session affinity type is discussed in further detail.

**Table:** Supported session affinity settings
Product	Session affinity options
Cross-region internal Application Load Balancer Regional internal Application Load Balancer	None (`NONE`) Client IP (`CLIENT_IP`) Generated cookie (`GENERATED_COOKIE`) Header field (`HEADER_FIELD`) HTTP cookie (`HTTP_COOKIE`) Stateful cookie-based affinity (`STRONG_COOKIE_AFFINITY`) Also note: The effective default value of the load balancing locality policy(`localityLbPolicy`) changes according to your sessionaffinity settings. If session affinity is not configured—that is, ifsession affinity remains at the default value of`NONE`—thenthe default value for`localityLbPolicy` is`ROUND_ROBIN`.If session affinity is set to a value other than`NONE`, then thedefault value for`localityLbPolicy` is`MAGLEV`. For the internal Application Load Balancer, don't configure session affinity if you're using weighted traffic splitting. If you do, the weighted traffic splitting configuration takes precedence.

Keep the following in mind when configuring session affinity:

Don't rely on session affinity for authentication or security purposes.Session affinity, except forstateful cookie-based sessionaffinity, can break whenever thenumber of serving and healthy backends changes. For more details, seeLosingsession affinity.
The default values of the--session-affinity and--subsetting-policyflags are bothNONE, and only one of them at a time can be set to adifferent value.

Types of session affinity

The session affinity for internal Application Load Balancers can be classified into one ofthe following categories:

Hash-based session affinity (NONE,CLIENT_IP)
HTTP header-based session affinity (HEADER_FIELD)
Cookie-based session affinity (GENERATED_COOKIE,HTTP_COOKIE,STRONG_COOKIE_AFFINITY)

Hash-based session affinity

For hash-based session affinity, the load balancer uses theconsistent hashing algorithm to select an eligible backend. The session affinity setting determines which fields from the IP header are used to calculate the hash.

Hash-based session affinity can be of the following types:

None

A session affinity setting ofNONE doesnot mean that there is nosession affinity. It means that no session affinity option is explicitly configured.

Hashing is always performed to select a backend. And a session affinity setting ofNONE means that the load balancer uses a 5-tuple hash to select a backend. The 5-tuplehash consists of the source IP address, the source port, the protocol, the destination IP address,and the destination port.

A session affinity ofNONE is the default value.

Client IP affinity

Client IP session affinity (CLIENT_IP) is a 2-tuple hash created from thesource and destination IP addresses of the packet. Client IP affinity forwardsall requests from the same client IP address to the same backend, as long asthat backend has capacity and remains healthy.

When you use client IP affinity, keep the following in mind:

The packet destination IP address is only the same as the load balancer forwarding rule's IP address if the packet is sent directly to the load balancer.
The packet source IP address might not match an IP address associated with the original client if the packet is processed by an intermediate NAT or proxy system before being delivered to a Google Cloud load balancer. In situations where many clients share the same effective source IP address, some backend VMs might receive more connections or requests than others.

HTTP header-based session affinity

With header field affinity (HEADER_FIELD), requests are routed to the backends based on the value of the HTTP header in theconsistentHash.httpHeaderName fieldof the backend service. To distribute requests across all available backends,each client needs to use a different HTTP header value.

Header field affinity is supported when the followingconditions are true:

The load balancing locality policy isRING_HASH orMAGLEV.
The backend service'sconsistentHash specifies the name of the HTTP header(httpHeaderName).

Cookie-based session affinity

Cookie-based session affinity can be of the following types:

Generated cookie affinity

When you use generated cookie-based affinity (GENERATED_COOKIE), the loadbalancer includes an HTTP cookie in theSet-Cookie header in response to theinitial HTTP request.

The name of the generated cookie varies depending on the type of the loadbalancer.

Product	Cookie name
Cross-region internal Application Load Balancers	`GCILB`
Regional internal Application Load Balancers	`GCILB`

The generated cookie's path attribute is always a forward slash (/), so itapplies to all backend services on the same URL map, provided that the otherbackend services also use generated cookie affinity.

You can configure the cookie's time to live (TTL) value between0 and1,209,600 seconds (inclusive) by using theaffinityCookieTtlSec backendservice parameter. IfaffinityCookieTtlSec isn't specified, the default TTLvalue is0.

When the client includes the generated session affinity cookie in theCookierequest header of HTTP requests, the load balancer directs thoserequests to the same backend instance or endpoint, as long as the sessionaffinity cookie remains valid. This is done by mapping the cookie value to anindex that references a specific backend instance or an endpoint,and by making sure that the generated cookie session affinity requirementsare met.

To use generated cookie affinity, configure the following balancingmode andlocalityLbPolicy settings:

For backend instance groups, use theRATE balancing mode.
For thelocalityLbPolicy of the backend service, use eitherRING_HASH orMAGLEV. If you don't explicitly set thelocalityLbPolicy,the load balancer usesMAGLEV as an implied default.

For more information, seelosing session affinity.

HTTP cookie affinity

When you use HTTP cookie-based affinity (HTTP_COOKIE), the load balancerincludes an HTTP cookie in theSet-Cookie header in response to the initialHTTP request. You specify the name, path, and time to live (TTL) for the cookie.

All Application Load Balancers support HTTP cookie-based affinity.

You can configure the cookie's TTL values using seconds, fractions of a second(as nanoseconds), or both seconds plus fractions of a second (as nanoseconds)using the following backend service parameters and valid values:

consistentHash.httpCookie.ttl.seconds can be set to a value between0and315576000000 (inclusive).
consistentHash.httpCookie.ttl.nanos can be set to a value between0and999999999 (inclusive). Because the units are nanoseconds,999999999means.999999999 seconds.

If bothconsistentHash.httpCookie.ttl.seconds andconsistentHash.httpCookie.ttl.nanos aren't specified, the value of theaffinityCookieTtlSec backend service parameter is used instead. IfaffinityCookieTtlSec isn't specified, the default TTL value is0.

When the client includes the HTTP session affinity cookie in theCookierequest header of HTTP requests, the load balancer directs thoserequests to the same backend instance or endpoint, as long as the sessionaffinity cookie remains valid. This is done by mapping the cookie value to anindex that references a specific backend instance or an endpoint,and by making sure that the generated cookie session affinity requirementsare met.

To use HTTP cookie affinity, configure the following balancingmode andlocalityLbPolicy settings:

For backend instance groups, use theRATE balancing mode.
For thelocalityLbPolicy of the backend service, use eitherRING_HASH orMAGLEV. If you don't explicitly set thelocalityLbPolicy,the load balancer usesMAGLEV as an implied default.

For more information, seelosing session affinity.

Stateful cookie-based session affinity

When you use stateful cookie-based affinity (STRONG_COOKIE_AFFINITY), the loadbalancer includes an HTTP cookie in theSet-Cookie header in response to theinitial HTTP request. You specify the name, path, and time to live (TTL) for thecookie.

All Application Load Balancers, except for classic Application Load Balancers, support stateful cookie-based affinity.

You can configure the cookie's TTL values using seconds, fractions of a second(as nanoseconds), or both seconds plus fractions of a second (as nanoseconds).The duration represented bystrongSessionAffinityCookie.ttl cannot be set to avalue representing more than two weeks (1,209,600 seconds).

The value of the cookie identifies a selected backend instance or endpoint byencoding the selected instance or endpoint in the value itself. For as longas the cookie is valid, if the client includes the session affinity cookie intheCookie request header of subsequent HTTP requests, the load balancerdirects those requests to selected backend instance or endpoint.

Unlike other session affinity methods:

Stateful cookie-based affinity has no specific requirements for the balancingmode or for the load balancing locality policy (localityLbPolicy).
Stateful cookie-based affinity is not affected when autoscaling adds a newinstance to a managed instance group.
Stateful cookie-based affinity is not affected when autoscaling removes aninstance from a managed instance groupunless the selected instance isremoved.
Stateful cookie-based affinity is not affected when autohealing removes aninstance from a managed instance groupunless the selected instance isremoved.

For more information, seelosing session affinity.

Meaning of zero TTL for cookie-based affinities

All cookie-based session affinities, such as generated cookie affinity, HTTP cookie affinity, and stateful cookie-based affinity, have a TTL attribute.

A TTL of zero seconds means the load balancer does not assign anExpiresattribute to the cookie. In this case, the client treats the cookie as a sessioncookie. The definition of asession varies depending on the client:

Some clients, like web browsers, retain the cookie for the entire browsingsession. This means that the cookie persists across multiple requests untilthe application is closed.
Other clients treat a session as a single HTTP request, discarding the cookieimmediately after.

Losing session affinity

All session affinity options require the following:

The selected backend instance or endpoint must remain configured as a backend. Session affinity can break when one of the following events occurs:
- You remove the selected instance from its instance group.
- Managed instance group autoscaling or autohealing removes the selected instance from its managed instance group.
- You remove the selected endpoint from its NEG.
- You remove the instance group or NEG that contains the selected instance or endpoint from the backend service.
The selected backend instance or endpoint must remain healthy. Session affinity can break when the selected instance or endpoint fails health checks.

Except forstateful cookie-based session affinity,all session affinity options have the following additional requirements:

The instance group or NEG that contains the selected instance or endpointmust not be full as defined by itstarget capacity. (Forregional managed instance groups, the zonal component of the instance groupthat contains the selected instance must not be full.) Session affinity canbreak when the instance group or NEG is full and other instance groups orNEGs are not. Because fullness can change in unpredictable ways when usingtheUTILIZATION balancing mode, you should use theRATE orCONNECTIONbalancing mode to minimize situations when session affinity can break.
Thetotal number of configured backend instances or endpoints must remainconstant. When at least one of the following events occurs, the number ofconfigured backend instances or endpoints changes, and session affinity canbreak:
- Adding new instances or endpoints:
  - You add instances to an existing instance group on the backend service.
  - Managed instance group autoscaling adds instances to a managed instancegroup on the backend service.
  - You add endpoints to an existing NEG on the backend service.
  - You add non-empty instance groups or NEGs to the backend service.
- Removing any instance or endpoint,not just the selected instance orendpoint:
  - You remove any instance from an instance group backend.
  - Managed instance group autoscaling or autohealing removes any instancefrom a managed instance group backend.
  - You remove any endpoint from a NEG backend.
  - You remove any existing, non-empty backend instance group or NEG fromthe backend service.
Thetotal number of healthy backend instances or endpoints must remainconstant. When at least one of the following events occurs, the number ofhealthy backend instances or endpoints changes, and session affinity canbreak:
- Any instance or endpoint passes its health check, transitioning fromunhealthy to healthy.
- Any instance or endpoint fails its health check, transitioning fromhealthy to unhealthy or timeout.

Failover

If a backend becomes unhealthy, traffic is automatically redirected to healthybackends.

The following table describes the failover behavior in each mode:

Load balancer mode Failover behavior Behavior when all backends are unhealthy

Cross-region internal Application Load Balancer

Load balancer mode	Failover behavior	Behavior when all backends are unhealthy
Cross-region internal Application Load Balancer	Automatic failover to healthy backends in the same region or other regions. Traffic is distributed among healthy backends spanning multiple regions based on the configured traffic distribution.	Returns HTTP`503`
Regional internal Application Load Balancer	Automatic failover to healthy backends in the same region. Envoy proxy sends traffic to healthy backends in a region based on the configured traffic distribution.	Returns HTTP`503`

Automatic failover to healthy backends in the same region or other regions.

Traffic is distributed among healthy backends spanning multiple regions based on the configured traffic distribution.

Returns HTTP503

Regional internal Application Load Balancer

Automatic failover to healthy backends in the same region.

Envoy proxy sends traffic to healthy backends in a region based on the configured traffic distribution.

Returns HTTP503

High availability and cross-region failover

For regional internal Application Load Balancers

To achieve high availability, deploy multiple individualregional internal Application Load Balancers in regions that best support your application'straffic. You then use a Cloud DNSgeolocation routingpolicy to detect whether a load balancer is responding during a regionaloutage. A geolocation policy routes traffic to the next closest available regionbased on the origin of the client request. Health checking is available bydefault for internal Application Load Balancers.

For cross-region internal Application Load Balancers

You can set up a cross-region internal Application Load Balancer in multiple regions to get the followingbenefits:

If the cross-region internal Application Load Balancer in a region fails, the DNS routing policiesroute traffic to a cross-region internal Application Load Balancer in another region.
The high availability deployment example shows the following:
- A cross-region internal Application Load Balancer with frontend virtual IP address (VIP) in theRegionA andRegionB regions in your VPC network. Yourclients are located in theRegionA region.
- You can make the load balancer accessible by using frontend VIPs from tworegions, and use DNS routing policies to return the optimal VIP to yourclients. UseGeolocation routingpoliciesif you want your clients to use the VIP that is geographically closest.
- DNS routing policies can detect whether a VIP isn't responding duringa regional outage, and return the next most optimal VIP to your clients,ensuring that your application stays up even during regional outages.
Cross-region internal Application Load Balancer with high availability deployment (click to enlarge).
If backends in a particular region are down, the cross-region internal Application Load Balancertraffic fails over to the backends in another region gracefully.
The cross-region failover deployment example shows the following:
- A cross-region internal Application Load Balancer with a frontend VIP address in theRegionAregion of your VPC network. Your clients are also located intheRegionA region.
- A global backend service that references the backends in theRegionA andRegionB Google Cloud regions.
- When the backends inRegionA region are down, trafficfails over to theRegionB region.
Cross-region internal Application Load Balancer with a cross-region failover deployment (click to enlarge).

WebSocket support

Google Cloud HTTP(S)-based load balancers support the websocket protocolwhen you use HTTP or HTTPS as the protocol to the backend.The load balancer doesn't require any configuration to proxy websocketconnections.

The websocket protocol provides a full-duplex communication channel betweenclients and the load balancer. Formore information, seeRFC 6455.

The websocket protocol works as follows:

The load balancer recognizes a websocketUpgrade request froman HTTP or HTTPS client. The request contains theConnection: Upgrade andUpgrade: websocket headers, followed by other relevant websocket relatedrequest headers.
Backend sends a websocketUpgrade response. The backend instance sends a101 switching protocol response withConnection: Upgrade andUpgrade: websocket headers and other other websocket relatedresponse headers.
The load balancer proxies bidirectional traffic for the duration of thecurrent connection.

If the backend instance returns a status code426 or502,the load balancer closes the connection.

Session affinity for websockets works the same as for any other request.For more information, seeSessionaffinity.

HTTP/2 support

HTTP/2 is a major revision of the HTTP/1 protocol. There are 2 modes of HTTP/2support:

HTTP/2 over TLS
Cleartext HTTP/2 over TCP

HTTP/2 over TLS

HTTP/2 over TLS is supported for connections between clients and theexternal Application Load Balancer, and for connections between the load balancer and its backends.

The load balancer automatically negotiates HTTP/2 with clients as part of theTLS handshake by using the ALPN TLS extension. Even if a load balancer isconfigured to use HTTPS, modern clients default to HTTP/2. This is controlledon the client, not on the load balancer.

If a client doesn't support HTTP/2 and the load balancer is configured to useHTTP/2 between the load balancer and the backend instances, the load balancermight still negotiate an HTTPS connection or accept unsecured HTTP requests.Those HTTPS or HTTP requests are then transformed by the load balancer to proxythe requests over HTTP/2 to the backend instances.

To use HTTP/2 over TLS, you must enable TLS on your backends and set thebackend service protocol toHTTP2. Formore information, seeEncryption from the load balancer to thebackends.

HTTP/2 max concurrent streams

The HTTP/2SETTINGS_MAX_CONCURRENT_STREAMSsetting describes the maximum number of streams that an endpoint accepts,initiated by the peer. The value advertised by an HTTP/2 client to aGoogle Cloud load balancer is effectively meaningless because the loadbalancer doesn't initiate streams to the client.

In cases where the load balancer uses HTTP/2 to communicate with a server thatis running on a VM, the load balancer respects theSETTINGS_MAX_CONCURRENT_STREAMS value advertised by the server, up to amaximum value of100. In the request direction (Google Cloud loadbalancer → gRPC server), the load balancer uses the initialSETTINGS framefrom the gRPC server to determine how many streams per connection can be in usesimultaneously. If the server advertises a value higher than100, the loadbalancer uses 100 as the maximum number of concurrent streams. If a value ofzero is advertised, the load balancer can't forward requests to the server, andthis might result in errors.

HTTP/2 dynamic header table size

HTTP/2 significantly improves upon HTTP/1.1 with features like multiplexingand HPACK header compression. HPACK uses a dynamic table that enhances headercompression, making everything faster. To understand the impact of dynamicheader table size changes in HTTP/2, how this feature can improve performanceand how a specific bug in a various HTTP client libraries could cause issuesin HPACK header compression, refer to thecommunityarticle.

HTTP/2 limitations

HTTP/2 between the load balancer and the instance can require significantlymore TCP connections to the instance than HTTP or HTTPS. Connection pooling,an optimization that reduces the number of these connections with HTTP orHTTPS, isn't available with HTTP/2. As a result, you might see high backendlatencies because backend connections are made more frequently.
HTTP/2 between the load balancer and the backend doesn't support runningthe WebSocket Protocol over a single stream of an HTTP/2 connection (RFC8441).
HTTP/2 between the load balancer and the backend doesn't support serverpush.
The gRPC error rate and request volume aren't visible in theGoogle Cloud API or the Google Cloud console. If the gRPC endpointreturns an error, the load balancer logs and the monitoring data report the200 OK HTTP status code.

HTTP/2 over cleartext TCP

HTTP/2 over cleartext TCP, represented by the string "h2c" perRFC 7540,lets you use HTTP/2 without TLS encryption. It is supported for the followingconnections:

Client to load balancer: Supported automatically; no special configurationis required.
Important: gRPC over H2C is not supported for this connection.
Load balancer to its backends: Supported by setting the backend service protocol toH2C.

H2C support is also available for load balancers created using theGKE Gateway controller and Cloud Service Mesh,but isn't supported for classic Application Load Balancers.

gRPC support

gRPC is an open-source frameworkfor remote procedure calls. It is based on the HTTP/2 standard. Use cases forgRPC include the following:

Low-latency, highly scalable, distributed systems
Developing mobile clients that communicate with a cloud server
Designing new protocols that must be accurate, efficient, andlanguage-independent
Layered design to enable extension, authentication, and logging

To use gRPC with your Google Cloud applications, you must proxy requestsend-to-end over HTTP/2. To do this, you create an Application Load Balancer withone of the following configurations:

HTTP/2 over TLS between the client and the load balancerand H2C between the load balancer and the backend: you create an HTTPS loadbalancer (configured with a target HTTPS proxy and SSL certificate).Additionally, you configurethe load balancer to use HTTP/2 for unencrypted connections between the loadbalancer and its backends by setting the backend service protocol toH2C.
End-to-end encrypted traffic using HTTP/2 over TLS: you create an HTTPS loadbalancer (configured with a target HTTPS proxy and SSL certificate). The loadbalancer negotiates HTTP/2 with clients as part of the SSL handshake by usingthe ALPN TLS extension.
Additionally, you must make sure that thebackends can handle TLStraffic andconfigure the load balancer to use HTTP/2 for encrypted connections betweenthe load balancer and its backends by setting thebackend serviceprotocol toHTTP2.
The load balancer might still negotiate HTTPS with some clients oraccept unsecured HTTP requests on a load balancer that is configured to useHTTP/2 between the load balancer and the backend instances. Those HTTP orHTTPS requests are transformed by the load balancer to proxy the requests overHTTP/2 to the backend instances.

TLS support

By default, an HTTPS target proxy accepts only TLS 1.0, 1.1, 1.2, and 1.3 whenterminating client SSL requests.

When the internal Application Load Balancer uses HTTPS as the backend service protocol, it cannegotiate TLS 1.2 or 1.3 to the backend.

Mutual TLS support

Mutual TLS, or mTLS, is an industry standard protocol for mutual authenticationbetween a client and a server. mTLS helps ensure that both the client and serverauthenticate each other by verifying that each holds a valid certificate issuedby a trusted certificate authority (CA). Unlike standard TLS, where only theserver is authenticated, mTLS requires both the client and server to presentcertificates, confirming the identities of both parties before communication isestablished.

All of the Application Load Balancers support mTLS. With mTLS, the load balancerrequests that the client send a certificate to authenticate itself during theTLS handshake with the load balancer. You can configure aCertificate Manager truststore that the load balancer then uses to validate the client certificate'schain of trust.

For more information about mTLS, seeMutual TLSauthentication.

Limitations

There's no guarantee that a request from a client in one zone of the regionis sent to a backend that's in the same zone as the client.Session affinitydoesn't reduce communication between zones.
Internal Application Load Balancers aren't compatible with the followingfeatures:
- Cloud CDN
- Compute Engine Google-managed SSLcertificates (Certificate Manager Google-managedcertificates aresupported)
To use Certificate Manager certificates with internal Application Load Balancers, youmust use either the API or the gcloud CLI. TheGoogle Cloud console doesn't support Certificate Managercertificates.
An internal Application Load Balancer supports HTTP/2 only over TLS.
Clients connecting to an internal Application Load Balancer must use HTTP version 1.1or later. HTTP 1.0 isn't supported.
Google Cloud doesn't warn you if yourproxy-onlysubnet runs out of IP addresses.
The internal forwarding rule that your internal Application Load Balancer uses must haveexactly one port.
When using an internal Application Load Balancer withCloud Run in a Shared VPC environment,standalone VPC networks in service projectscan send traffic to any other Cloud Run servicesdeployed in any other service projects within the same Shared VPCenvironment. This is a known issue.
Google Cloud doesn't guarantee that an underlying TCP connection canremain open for the entirety of the value of the backend service timeout.Client systems must implement retry logic instead of relying on a TCPconnection to be open for long periods of time.

Internal Application Load Balancers don't support Cloud Functions 1st gen andApp Engine. For more information, seeServerless NEGs overview:Supported loadbalancers.
Internal Application Load Balancers don't supportCloud Trace.

What's next

To configure load balancing on a Shared VPC setup,seeSet up an internal Application Load Balancer withShared VPC.
To configure load balancing for your services running in GKEpods, seeDeploying GKE Gateways,Container-native load balancing with standaloneNEGsand theAttaching an internal Application Load Balancer to standaloneNEGs section.
To manage the proxy-only subnet resource, seeProxy-onlysubnets for Envoy-based load balancers.
To configure backend subsetting on regional internal Application Load Balancers, seeBackend subsetting.

To configure an regional internal Application Load Balancer withPrivate Service Connect, seeAccess regional Google APIsthrough backends.
To insert custom logic into the load balancing data path, configureCloud Load Balancing extensions.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.

Movatterモバイル変換

Internal Application Load Balancer overview Stay organized with collections Save and categorize content based on your preferences.

Modes of operation

Identify the mode

Console

gcloud

Architecture and resources

Proxy-only subnet

Forwarding rule and IP address

Forwarding rules and VPC networks

Target proxy

SSL certificates

URL maps

Backend service

Backend service scope

Protocol to the backends

Backends

Backends and VPC networks

Backends and network interfaces

Backend subsetting

Health checks

Health check protocol

Firewall rules

Client access

GKE support

Shared VPC architectures

All load balancer components and backends in a service project

Serverless backends in a Shared VPC environment

Cross-project service referencing

Usage notes for cross-project service referencing

Example 1: Load balancer frontend and backend in different service projects

Example 2: Load balancer frontend in the host project and backends in service projects

Timeouts and retries

Backend service timeout

Console

gcloud

API

Client HTTP keepalive timeout

Backend HTTP keepalive timeout

Retries

Accessing connected networks

Session affinity

Types of session affinity

Hash-based session affinity

None

Client IP affinity

HTTP header-based session affinity

Cookie-based session affinity

Generated cookie affinity

HTTP cookie affinity

Stateful cookie-based session affinity

Meaning of zero TTL for cookie-based affinities

Losing session affinity

Failover

High availability and cross-region failover

For regional internal Application Load Balancers

For cross-region internal Application Load Balancers

WebSocket support

HTTP/2 support

HTTP/2 over TLS

HTTP/2 max concurrent streams

HTTP/2 dynamic header table size

HTTP/2 limitations

HTTP/2 over cleartext TCP

gRPC support

TLS support

Mutual TLS support

Limitations

What's next

Internal Application Load Balancer overview