Failover for external passthrough Network Load Balancer overview
You can configure a backend service-based external passthrough Network Load Balancer to distribute connections among virtual machine (VM) instances in primary backends, and then switch, if needed, to using failover backends. Failover provides one method of increasing availability, while also giving you greater control over how to manage your workload when your primary backend VMs aren't healthy.
This page describes concepts and requirements specific to failover for external passthrough Network Load Balancers. Before you configure failover for your external passthrough Network Load Balancer, make sure that you are familiar with the general conceptual information about how these load balancers work.
These concepts are important to understand because configuring failover modifies the load balancer's standard traffic distribution algorithm.
By default, when you add a backend to an external passthrough Network Load Balancer's backend service, that backend is a primary backend. You can designate a backend to be a failover backend when you add it to the load balancer's backend service, or by editing the backend service later. Failover backends only receive connections from the load balancer after a configurable ratio of primary VMs don't pass health checks.
Supported backends
Instance groups (managed and unmanaged) and zonal NEGs (with GCE_VM_IP endpoints) are supported as backends. For simplicity, the examples on this page show unmanaged instance groups.
Using managed instance groups with autoscaling and failover might cause the active pool to repeatedly fail over and fail back between the primary and failover backends. Google Cloud doesn't prevent you from configuring failover with managed instance groups because your deployment might benefit from this setup.
Architecture
The following example depicts an external passthrough Network Load Balancer with one primary backend and one failover backend.
- The primary backend is an unmanaged instance group in us-west1-a.
- The failover backend is a different unmanaged instance group in us-west1-c.
The next example depicts an external passthrough Network Load Balancer with two primary backends and two failover backends, both distributed between two zones in the us-west1 region. This configuration increases reliability because it doesn't depend on a single zone for all primary or all failover backends.
- Primary backends are unmanaged instance groups ig-a and ig-d.
- Failover backends are unmanaged instance groups ig-b and ig-c.
During failover, both primary backends become inactive, while the healthy VMs in both failover backends become active. For a full explanation of how failover works in this example, see the Failover example.
Backend instance groups and VMs
The instance groups in external passthrough Network Load Balancers are either primary backends or failover backends. You can designate a backend to be a failover backend at the time that you add it to the backend service or by editing the backend after you add it. Otherwise, instance groups are primary by default.
You can configure multiple primary backends and multiple failover backends in a single external passthrough Network Load Balancer by adding them to the load balancer's backend service.
A primary VM is a member of an instance group that you've defined to be a primary backend. The VMs in a primary backend participate in the load balancer's active pool (described in the next section), unless the load balancer switches to using its failover backends.
A backup VM is a member of an instance group that you've defined to be a failover backend. The VMs in a failover backend participate in the load balancer's active pool when primary VMs become unhealthy. The number of unhealthy primary VMs that triggers failover is a configurable percentage.
Limits
- Instance groups. You can have up to 50 primary backend instance groups and up to 50 failover backend instance groups.
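The role assignment and the limit above can be modeled in a few lines. The following is a minimal sketch, not Google Cloud code: the `Backend` class and the `split_by_role` helper are hypothetical names introduced here for illustration, and the instance group names are borrowed from the architecture example on this page.

```python
from dataclasses import dataclass

# Limit stated on this page: up to 50 instance groups per role.
MAX_GROUPS_PER_ROLE = 50


@dataclass(frozen=True)
class Backend:
    """Hypothetical stand-in for one backend entry of a backend service."""
    group: str
    failover: bool = False  # backends are primary unless marked as failover


def split_by_role(backends):
    """Return (primary, failover) lists and enforce the per-role limit."""
    primary = [b for b in backends if not b.failover]
    failover = [b for b in backends if b.failover]
    for role, members in (("primary", primary), ("failover", failover)):
        if len(members) > MAX_GROUPS_PER_ROLE:
            raise ValueError(f"too many {role} backends: {len(members)}")
    return primary, failover


backends = [
    Backend("ig-a"),
    Backend("ig-b", failover=True),
    Backend("ig-c", failover=True),
    Backend("ig-d"),
]
primary, failover = split_by_role(backends)
print([b.group for b in primary])   # ['ig-a', 'ig-d']
print([b.group for b in failover])  # ['ig-b', 'ig-c']
```

Keeping the role as a single boolean per backend mirrors the description above: a backend is primary by default and becomes a failover backend only when you explicitly designate it.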
Active pool
The active pool is the collection of backend VMs to which an external passthrough Network Load Balancer sends new connections. Membership of backend VMs in the active pool is computed automatically based on which backends are healthy and conditions that you can specify, as described in Failover policy.
The active pool never combines primary VMs and backup VMs. The following examples clarify the membership possibilities. During failover, the active pool contains only backup VMs. During normal operation (failback), the active pool contains only primary VMs.
Failover and failback
Failover and failback are the automatic processes that switch backend VMs into or out of the load balancer's active pool. When Google Cloud removes primary VMs from the active pool and adds healthy failover VMs to the active pool, the process is called failover. When Google Cloud reverses this, the process is called failback.
Failover policy
A failover policy is a collection of parameters that Google Cloud uses for failover and failback. Each external passthrough Network Load Balancer has one failover policy that has multiple settings:
- Failover ratio
- Dropping traffic when all backend VMs are unhealthy
- Connection draining on failover and failback
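As a rough illustration of how these three settings fit together, the sketch below groups them into one object. The class `FailoverPolicy` and its attribute names are hypothetical. The keys produced by `to_api_dict` follow the `failoverPolicy` field names of the Compute Engine backend service resource as far as I know them; treat them as an assumption and verify them against the API reference before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverPolicy:
    """Illustrative container for the three failover policy settings."""
    failover_ratio: float = 0.0               # default when no ratio is set
    drop_traffic_if_unhealthy: bool = False   # default: primaries as last resort
    connection_draining: bool = True          # default for TCP: draining enabled

    def __post_init__(self):
        # The ratio must be between 0.0 and 1.0, inclusive.
        if not 0.0 <= self.failover_ratio <= 1.0:
            raise ValueError("failover_ratio must be between 0.0 and 1.0")

    def to_api_dict(self):
        """Render the settings with assumed Compute Engine API field names."""
        return {
            "failoverRatio": self.failover_ratio,
            "dropTrafficIfUnhealthy": self.drop_traffic_if_unhealthy,
            "disableConnectionDrainOnFailover": not self.connection_draining,
        }


policy = FailoverPolicy(failover_ratio=0.5)
print(policy.to_api_dict())
```

Note that the draining setting is inverted in the rendered dictionary: the sketch stores whether draining is enabled, while the assumed API field expresses whether it is disabled.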
Failover ratio
A configurable failover ratio determines when Google Cloud performs a failover or failback, changing membership in the active pool. The ratio can be from 0.0 to 1.0, inclusive. If you don't specify a failover ratio, Google Cloud uses a default value of 0.0. It's a best practice to set your failover ratio to a number that works for your use case rather than relying on this default.
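| Conditions | VMs in active pool |
|---|---|
| The proportion of healthy primary VMs is at least the failover ratio. With a failover ratio of 0.0, at least one primary VM is healthy. | All healthy primary VMs |
| At least one backup VM is healthy and the proportion of healthy primary VMs is below the failover ratio. With a failover ratio of 0.0, no primary VM is healthy. | All healthy backup VMs |
| All primary VMs and all backup VMs are unhealthy, and you haven't configured your load balancer to drop traffic in this situation | All primary VMs, as a last resort |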
The following examples clarify membership in the active pool. For an example with calculations, see the Failover example.
- A failover ratio of 1.0 requires that all primary VMs be healthy. When at least one primary VM becomes unhealthy, Google Cloud performs a failover, moving the backup VMs into the active pool.
- A failover ratio of 0.1 requires that at least 10% of the primary VMs be healthy; otherwise, Google Cloud performs a failover.
- A failover ratio of 0.0 means that Google Cloud performs a failover only when all the primary VMs are unhealthy. Failover doesn't happen if at least one primary VM is healthy.
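The membership rules above can be expressed as a small decision function. This is a sketch of the rules as described on this page, not Google Cloud's implementation; the function name `active_pool` and its parameters are hypothetical. One case is not described on this page (too few healthy primary VMs, no healthy backup VM, but some primary VMs still healthy), so the sketch makes a labeled assumption there.

```python
def active_pool(primary, backup, failover_ratio=0.0, drop_if_unhealthy=False):
    """Return the VM names that would receive new connections.

    primary and backup map a VM name to True when the VM passes health checks.
    """
    healthy_primary = [vm for vm, ok in primary.items() if ok]
    healthy_backup = [vm for vm, ok in backup.items() if ok]

    if failover_ratio == 0.0:
        # Ratio 0.0: primaries stay active while at least one is healthy.
        primaries_ok = len(healthy_primary) > 0
    else:
        # Otherwise the healthy share must be at least the failover ratio.
        primaries_ok = len(healthy_primary) >= failover_ratio * len(primary)

    if primaries_ok:
        return healthy_primary
    if healthy_backup:
        return healthy_backup  # failover: only backup VMs are active
    if not healthy_primary:
        # Everything is unhealthy: drop, or use all primaries as a last resort.
        return [] if drop_if_unhealthy else list(primary)
    # ASSUMPTION (not stated on this page): keep the healthy primaries.
    return healthy_primary


primary = {"vm-a1": True, "vm-a2": False, "vm-d1": True, "vm-d2": True}
backup = {"vm-b1": True, "vm-b2": True}
print(active_pool(primary, backup, failover_ratio=1.0))  # ['vm-b1', 'vm-b2']
print(active_pool(primary, backup, failover_ratio=0.0))  # ['vm-a1', 'vm-d1', 'vm-d2']
```

With a ratio of 1.0, a single unhealthy primary VM is enough to move the pool to the backup VMs, while the default of 0.0 keeps the three healthy primary VMs active.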
An external passthrough Network Load Balancer distributes connections among VMs in the active pool according to the traffic distribution algorithm.
Dropping traffic when all backend VMs are unhealthy
By default, when all primary and backup VMs are unhealthy, Google Cloud distributes new connections among all primary VMs. It does so as a last resort.
If you prefer, you can configure your external passthrough Network Load Balancer to drop new connections when all primary and backup VMs are unhealthy.
Connection draining on failover and failback
When connection draining is enabled for the failover policy, established connections to instances in either the primary or failover instance groups continue to be sent to the instances with which they have been established, even after failover or failback, thus preventing connection breakage. When connection draining is disabled for the failover policy, any existing connections are terminated immediately during failover or failback.
If the protocol for your load balancer is TCP, the following is true:
- By default, connection draining is enabled. Existing TCP sessions can persist on their current backend VMs even if the backend VM isn't in the load balancer's active pool.
- You can disable connection draining during failover and failback events. Disabling connection draining during failover and failback ensures that all TCP sessions, including established ones, are quickly terminated. Connections to backend VMs might be closed with a TCP reset packet.
Disabling connection draining on failover and failback is useful for scenarios such as the following:
- Patching backend VMs. Prior to patching, configure your primary VMs to fail health checks so that the load balancer performs a failover. Disabling connection draining ensures that all connections are moved to the backup VMs quickly and in a planned fashion. This lets you install updates and restart the primary VMs without existing connections persisting. After patching, Google Cloud can perform a failback when a sufficient number of primary VMs (as defined by the failover ratio) pass their health checks.
- Single backend VM for data consistency. If you need to ensure that only one VM is the destination for all connections, disable connection draining so that switching from a primary to a backup VM does not allow existing connections to persist on both. This reduces the possibility of data inconsistencies by keeping just one backend VM active at any given time.
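The effect of the draining setting on established connections can be sketched with a simplified model. This is an illustration only: `surviving_connections` is a hypothetical helper, connections are reduced to an ID mapped to a backend VM name, and the model assumes that with draining disabled every connection to a VM that left the active pool is terminated at the moment of failover or failback.

```python
def surviving_connections(connections, new_active_pool, draining_enabled=True):
    """Return the established connections that persist after a pool switch.

    connections maps a connection ID to the backend VM that serves it.
    new_active_pool is the set of VM names active after failover or failback.
    """
    if draining_enabled:
        # Established connections stay on their current VMs, even when
        # those VMs are no longer in the active pool.
        return dict(connections)
    # Draining disabled: connections to VMs outside the new pool are cut.
    return {cid: vm for cid, vm in connections.items() if vm in new_active_pool}


established = {"conn-1": "vm-a1", "conn-2": "vm-d2"}
pool_after_failover = {"vm-b1", "vm-b2", "vm-c1", "vm-c2"}

print(surviving_connections(established, pool_after_failover))
# {'conn-1': 'vm-a1', 'conn-2': 'vm-d2'}
print(surviving_connections(established, pool_after_failover, draining_enabled=False))
# {}
```

In the single-VM scenario above, the second call shows why disabling draining helps: after the switch, no connection remains on a primary VM, so only the backup side handles traffic.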
Failover example
The following example describes failover behavior for the multi-zone external passthrough Network Load Balancer example presented in the architecture section.
The primary backends for this load balancer are the unmanaged instance groups ig-a in us-west1-a and ig-d in us-west1-c. Each instance group contains two VMs. All four VMs from both instance groups are primary VMs:
- vm-a1 in ig-a
- vm-a2 in ig-a
- vm-d1 in ig-d
- vm-d2 in ig-d
The failover backends for this load balancer are the unmanaged instance groups ig-b in us-west1-a and ig-c in us-west1-c. Each instance group contains two VMs. All four VMs from both instance groups are backup VMs:
- vm-b1 in ig-b
- vm-b2 in ig-b
- vm-c1 in ig-c
- vm-c2 in ig-c
Suppose you want to configure a failover policy for this load balancer such that new connections are delivered to backup VMs when the number of healthy primary VMs is fewer than two. To accomplish this, set the failover ratio to 0.5 (50%). Google Cloud uses the failover ratio to calculate the minimum number of primary VMs that must be healthy by multiplying the failover ratio by the number of primary VMs: 4 × 0.5 = 2
When all four primary VMs are healthy, Google Cloud distributes new connections to all of them. When primary VMs fail health checks:
- If vm-a1 and vm-d1 become unhealthy, Google Cloud distributes new connections between the remaining two healthy primary VMs, vm-a2 and vm-d2, because the number of healthy primary VMs is at least the minimum.
- If vm-a2 also fails health checks, leaving only one healthy primary VM, vm-d2, Google Cloud recognizes that the number of healthy primary VMs is fewer than the minimum, so it performs a failover. The active pool is set to the four healthy backup VMs, and new connections are distributed among those four (in instance groups ig-b and ig-c). Even though vm-d2 remains healthy, it is removed from the active pool and does not receive new connections.
- If vm-a2 recovers and passes its health check, Google Cloud recognizes that the number of healthy primary VMs is at least the minimum of two, so it performs a failback. The active pool is set to the two healthy primary VMs, vm-a2 and vm-d2, and new connections are distributed between them. All backup VMs are removed from the active pool.
- As other primary VMs recover and pass their health checks, Google Cloud adds them to the active pool. For example, if vm-a1 becomes healthy, Google Cloud sets the active pool to the three healthy primary VMs, vm-a1, vm-a2, and vm-d2, and distributes new connections among them.
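The sequence above can be replayed with a short simulation. This is a sketch under simplifying assumptions: the helper `pool` is hypothetical, it only compares the healthy primary count with the computed minimum, and it does not model the case where every VM is unhealthy.

```python
PRIMARY = ["vm-a1", "vm-a2", "vm-d1", "vm-d2"]
BACKUP = ["vm-b1", "vm-b2", "vm-c1", "vm-c2"]
FAILOVER_RATIO = 0.5
# Minimum number of healthy primary VMs: 4 x 0.5 = 2
MIN_HEALTHY = FAILOVER_RATIO * len(PRIMARY)


def pool(unhealthy):
    """Return the active pool for a given set of unhealthy VM names."""
    healthy_primary = [vm for vm in PRIMARY if vm not in unhealthy]
    if len(healthy_primary) >= MIN_HEALTHY:
        return healthy_primary  # normal operation or failback
    return [vm for vm in BACKUP if vm not in unhealthy]  # failover


steps = [
    ("all primary VMs healthy", set()),
    ("vm-a1 and vm-d1 fail", {"vm-a1", "vm-d1"}),
    ("vm-a2 also fails", {"vm-a1", "vm-d1", "vm-a2"}),
    ("vm-a2 recovers", {"vm-a1", "vm-d1"}),
    ("vm-a1 recovers", {"vm-d1"}),
]
for label, unhealthy in steps:
    print(f"{label}: {pool(unhealthy)}")
```

The third step is the only one that yields the four backup VMs. Every other step keeps the pool on the healthy primary VMs, matching the walkthrough.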
What's next
- To configure and test an external passthrough Network Load Balancer that uses failover, see Configuring failover for external passthrough Network Load Balancers.
- To configure and test an external passthrough Network Load Balancer with a backend service, see Set up an external passthrough Network Load Balancer.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2025-12-15 UTC.