Troubleshoot internal passthrough Network Load Balancers

This guide describes how to troubleshoot configuration issues for a Google Cloud internal passthrough Network Load Balancer. Before investigating issues, familiarize yourself with the internal passthrough Network Load Balancer overview and setup documentation.

Troubleshoot common issues with Network Analyzer

Network Analyzer automatically monitors your VPC network configuration and detects both suboptimal configurations and misconfigurations. It identifies network failures, provides root cause information, and suggests possible resolutions. To learn about the different misconfiguration scenarios that are automatically detected by Network Analyzer, see Load balancer insights in the Network Analyzer documentation.

Network Analyzer is available in the Google Cloud console as a part of Network Intelligence Center.


Backends have incompatible balancing modes

When creating a load balancer, you might see the error:

Validation failed for instance group INSTANCE_GROUP: backend services 1 and 2 point to the same instance group but the backends have incompatible balancing_mode. Values should be the same.

This happens when you try to use the same backend in two different load balancers, and the backends don't have compatible balancing modes.

For more information, see the balancing modes documentation for backend services.

Troubleshoot general connectivity issues

If you can't connect to your internal passthrough Network Load Balancer, check for the following common issues.

Verify firewall rules

  • Ensure that ingress allow firewall rules are defined to permit health checks to backend VMs.
  • Ensure that ingress allow firewall rules allow traffic to the backend VMs from clients.
  • Ensure that relevant firewall rules exist to allow traffic to reach the backend VMs on the ports being used by the load balancer.
  • If you're using target tags for the firewall rules, make sure that the load balancer's backend VMs are tagged appropriately.

To learn how to configure firewall rules required by your internal passthrough Network Load Balancer, see Configuring firewall rules.
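As a sketch, the following commands create the two kinds of ingress allow rules described above. The network name (lb-network), target tag (allow-lb-backends), port (80), and client source range (10.0.0.0/8) are illustrative placeholders; the ranges 35.191.0.0/16 and 130.211.0.0/22 are the documented Google Cloud health check probe ranges.

```shell
# Allow Google Cloud health check probes to reach backend VMs.
# Network, tag, and port are placeholders; adjust for your deployment.
gcloud compute firewall-rules create fw-allow-health-checks \
    --network=lb-network \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80 \
    --source-ranges=35.191.0.0/16,130.211.0.0/22 \
    --target-tags=allow-lb-backends

# Allow client traffic to reach backend VMs on the load balancer's port.
gcloud compute firewall-rules create fw-allow-client-traffic \
    --network=lb-network \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80 \
    --source-ranges=10.0.0.0/8 \
    --target-tags=allow-lb-backends
```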

Verify that the Guest environment is running on the backend VM

If you can connect to a healthy backend VM, but cannot connect to the load balancer, it might be that the Guest environment (formerly, the Windows Guest Environment or Linux Guest Environment) on the VM is either not running or is unable to communicate with the metadata server (metadata.google.internal, 169.254.169.254).

Check for the following:

  • Ensure that the Guest environment is installed and running on the backend VM.
  • Ensure that the firewall rules within the guest operating system of the backend VM (iptables or Windows Firewall) don't block access to the metadata server.
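On a Linux backend VM, a quick sketch of these checks might look like the following. This assumes a systemd-based Google Cloud image where the agent service is named google-guest-agent; the commands only work from within a Compute Engine VM.

```shell
# Check that the Guest environment's agent is running.
sudo systemctl status google-guest-agent

# Confirm that the VM can reach the metadata server.
curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/name"

# Look for in-guest firewall rules that could block the metadata server.
sudo iptables -L OUTPUT -n | grep 169.254.169.254
```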

Verify that backend VMs accept packets sent to the load balancer

Each backend VM must be configured to accept packets sent to the load balancer. That is, the destination of packets delivered to the backend VMs is the IP address of the load balancer. Under most circumstances, this is done with a local route.

For VMs created from Google Cloud images, the Guest agent installs the local route for the load balancer's IP address. Google Kubernetes Engine instances based on Container-Optimized OS implement this by using iptables instead.

On a Linux backend VM, you can verify the presence of the local route by running the following command. Replace LOAD_BALANCER_IP with the load balancer's IP address:

sudo ip route list table local | grep LOAD_BALANCER_IP

Verify service IP address and port binding on the backend VMs

Packets sent to an internal passthrough Network Load Balancer arrive at backend VMs with the destination IP address of the load balancer itself. This type of load balancer is not a proxy, and this is expected behavior.

The software running on the backend VM must be doing the following:

  • Listening on (bound to) the load balancer's IP address or any IP address (0.0.0.0 or ::)
  • Listening on (bound to) a port that's included in the load balancer's forwarding rule

To test this, connect to a backend VM using either SSH or RDP. Then perform the following tests using curl, telnet, or a similar tool:

  • Attempt to reach the service by contacting it using the internal IP address of the backend VM itself, 127.0.0.1, or localhost.
  • Attempt to reach the service by contacting it using the IP address of the load balancer's forwarding rule.
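For example, from an SSH session on a backend VM, the two tests above might look like the following. This sketch assumes an HTTP service on port 80; LOAD_BALANCER_IP stands for the forwarding rule's address.

```shell
# Test the service against the VM's own loopback address.
curl http://127.0.0.1:80/

# Test the service against the load balancer's forwarding-rule IP.
# This succeeds only if the service is bound to that address (or to
# 0.0.0.0 / ::) and the local route for the address is present.
curl http://LOAD_BALANCER_IP:80/
```

If the first test succeeds but the second fails, the service is likely bound only to the VM's own addresses rather than to 0.0.0.0 or the load balancer's IP.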

Check if the client VM is in the same region as the load balancer

If the client connecting to the load balancer is in another region, make sure that global access is enabled.
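Global access is a property of the load balancer's forwarding rule. As a sketch, with placeholder forwarding rule name and region:

```shell
# Check whether global access is enabled on the forwarding rule.
gcloud compute forwarding-rules describe my-forwarding-rule \
    --region=us-central1 \
    --format="value(allowGlobalAccess)"

# Enable global access so clients in other regions can connect.
gcloud compute forwarding-rules update my-forwarding-rule \
    --region=us-central1 \
    --allow-global-access
```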

Verify that health check traffic can reach backend VMs

To verify that health check traffic reaches your backend VMs, enable health check logging and search for successful log entries.
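Health check logging is enabled on the health check resource itself. A sketch for a TCP health check (the name my-tcp-health-check is a placeholder; add --region for a regional health check):

```shell
# Enable logging on an existing TCP health check.
gcloud compute health-checks update tcp my-tcp-health-check \
    --enable-logging
```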

You can also verify that load balancer functionality is healthy by viewing the "Healthy" state for the backend.

If there are no healthy instances in the backend, make sure that the appropriate health check is configured and each VM in the backend is listening on the configured health check ports.

From a client in the same VPC network, run the following command to verify that the backend VM is listening on a specific TCP port:

telnet SERVER_IP_ADDRESS PORT

Replace the following:

  • SERVER_IP_ADDRESS: The IP address of the backend VM.
  • PORT: The port that you configured for your health check. By default, the health check port is 80.

Alternatively, you can use SSH to connect to the backend VM and run the following command:

curl localhost:PORT

Again, replace PORT with the port that you configured for your health check.

Another way to perform this test is to run the following command:

netstat -npal | grep LISTEN

In the output, check for the following:

  • LB_IP_ADDRESS:PORT
  • 0.0.0.0:PORT
  • :::PORT

This does not determine whether routing is set up correctly to respond to the load balancer's IP address. That's a separate problem with a similar symptom. For routing, run the ip route list table local command and verify that the load balancer's IP address is listed, as described in Verify that backend VMs accept packets sent to the load balancer.

Troubleshoot performance issues

If you are noticing performance issues and increased latency, check for the following common issues.

Verify server functionality

If all of the backend servers are responding to health checks, verify that requests from the client are working properly when issued on the server directly. For example, if the client is sending HTTP requests to the server through the load balancer and there is no response, or the response is substantially slower than normal, issue the same HTTP request on each of the backend servers and observe the behavior.

If any of the individual backend servers are not behaving correctly when the request is issued from within the server itself, you can conclude that the server application stack is not working properly. You can focus further troubleshooting on the application itself. If all of the servers are behaving correctly, the next step is to look at the client side and the network.

Verify network connectivity and latency

If all of the backend servers are responding to requests properly, verify network latency. From a client VM, issue a constant ping command to each of the servers, as follows:

ping SERVER_IP_ADDRESS

This test shows the built-in network latency and whether the network is dropping packets. In some cases, firewall rules might be blocking ICMP traffic. If so, this test fails to produce any result. Verify with your firewall rules administrator whether this is the case.
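If VPC firewall rules are what's blocking ICMP, an ingress rule permitting it between the client and the servers lets the ping test run. A sketch with placeholder network name, source range, and target tag:

```shell
# Allow ICMP from the client subnet to the backend VMs
# (network, range, and tag are placeholders).
gcloud compute firewall-rules create fw-allow-icmp \
    --network=lb-network \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=icmp \
    --source-ranges=10.0.0.0/8 \
    --target-tags=allow-lb-backends
```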

If the ping command shows significantly higher latency than normal or significant packet loss, open a Google Cloud support case to investigate further.

Identify problematic client-server combinations

If the network ping test suggests low latency and no packet loss, the next step is to identify which specific client-server combination, if any, produces problematic results. You can do this by reducing the number of backend servers by half until the number of servers reaches 1, while simultaneously reproducing the problematic behavior (for example, high latency or no responses).

If you identify one or more problematic client-server combinations, perform traffic capture and analysis.

If no problematic client-server combination is identified, skip to performance testing.

Perform traffic capture and analysis

If you identify a specific problematic client-server combination, you can use packet capture to pinpoint the part of the communication that is causing delay or breakage. Packet capture can be done with tcpdump as follows:

  1. Install tcpdump on the server.
  2. Start tcpdump capture on the server.
  3. From a client, issue a sample request, such as the following:

    curl URL
  4. Analyze the tcpdump output to identify the problem.
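The capture and analysis steps might look like the following on the server. The interface choice, file path, client address placeholder, and port 80 are assumptions; the resulting file can also be opened in Wireshark.

```shell
# On the server: capture traffic to and from the client on the
# service port, writing packets to a file for later analysis.
sudo tcpdump -i any -w /tmp/capture.pcap host CLIENT_IP and port 80

# After reproducing the request, read the capture back and look for
# retransmissions, resets, or long gaps between request and response.
sudo tcpdump -r /tmp/capture.pcap -nn -ttt
```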

Do performance testing

If you don't identify any problematic client-server combinations and the aggregate performance of all clients and servers together is lower than expected, consider the following tests:

  1. One client and one server, without load balancing.
  2. One client and one server, with load balancing.

    Result: The combination of results from tests [1] and [2] identifies whether the load balancer is causing the issue.

  3. One client and multiple servers, with load balancing.

    Result: Identify the performance limit of one client.

  4. Multiple clients and one server, with load balancing.

    Result: Identify the performance limit of one server.

  5. Multiple clients and multiple servers, without load balancing.

    Result: Identify the performance limit of the network.

When running a stress test with multiple clients and servers, client or server resources (CPU, memory, I/O) might become bottlenecks and reduce the aggregate results. Degraded aggregate results can happen even if each client and server is behaving correctly.

Troubleshoot Shared VPC issues

If you are using Shared VPC and you cannot create a new internal passthrough Network Load Balancer in a particular subnet, an organization policy might be the cause. In the organization policy, add the subnet to the list of allowed subnets or contact your organization administrator. For more information, refer to the constraints/compute.restrictSharedVpcSubnetworks constraint.
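You can inspect the policy that applies to your organization with the Resource Manager commands; a sketch, with ORGANIZATION_ID as a placeholder:

```shell
# View the effective policy for the restricted-subnetworks constraint.
gcloud resource-manager org-policies describe \
    compute.restrictSharedVpcSubnetworks \
    --organization=ORGANIZATION_ID \
    --effective
```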

Troubleshoot failover issues

If you've configured failover for an internal passthrough Network Load Balancer, the following sections describe the issues that can occur.

Connectivity

  • Make sure that you've designated at least one failover backend.
  • Verify your failover policy settings:
    • Failover ratio
    • Dropping traffic when all backend VMs are unhealthy
    • Disabling connection draining on failover

Issues with managed instance groups and failover

  • Symptom: The active pool is changing back and forth (flapping) between the primary and failover backends.
  • Possible reason: Using managed instance groups with autoscaling and failover might cause the active pool to repeatedly fail over and fail back between the primary and failover backends. Google Cloud doesn't prevent you from configuring failover with managed instance groups, because your deployment might benefit from this setup.

Disable connection draining restriction for failover groups

Disabling connection draining only works if the backend service is set up with protocol TCP.

The following error message appears if you create a backend service with UDP while connection draining is disabled:

gcloud compute backend-services create my-failover-bs \
    --global-health-checks \
    --load-balancing-scheme=internal \
    --health-checks=my-tcp-health-check \
    --region=us-central1 \
    --no-connection-drain-on-failover \
    --drop-traffic-if-unhealthy \
    --failover-ratio=0.5 \
    --protocol=UDP

ERROR: (gcloud.compute.backend-services.create) Invalid value for [--protocol]: can only specify --connection-drain-on-failover if the protocol is TCP.

Traffic is sent to unexpected backend VMs

First check the following: If the client VM is also a backend VM of the load balancer, it's expected behavior that connections sent to the IP address of the load balancer's forwarding rule are always answered by the backend VM itself. For more information, refer to testing connections from a single client and sending requests from load balanced VMs.

If the client VM is not a backend VM of the load balancer:

  • For requests from a single client, refer to testing connections from a single client so that you understand the limitations of this method.

  • Ensure that you have configured ingress allow firewall rules to allow health checks.

  • For a failover configuration, make sure that you understand how membership in the active pool works, and when Google Cloud performs failover and failback. Inspect your load balancer's configuration:

    • Use the Google Cloud console to check for the number of healthy backend VMs in each backend instance group. The Google Cloud console also shows you which VMs are in the active pool.

    • Make sure that your load balancer's failover ratio is set appropriately. For example, if you have ten primary VMs and a failover ratio set to 0.2, this means Google Cloud performs a failover when fewer than two (10 × 0.2 = 2) primary VMs are healthy. A failover ratio of 0.0 has a special meaning: Google Cloud performs a failover when no primary VMs are healthy.

Existing connections are terminated during failover or failback

Edit your backend service's failover policy. Ensure that connection draining on failover is enabled.
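A sketch of the corresponding gcloud update, with placeholder backend service name and region:

```shell
# Re-enable connection draining on failover for the backend service.
gcloud compute backend-services update my-failover-bs \
    --region=us-central1 \
    --connection-drain-on-failover
```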

Troubleshoot load balancer as next hop issues

When you set an internal passthrough Network Load Balancer to be a next hop of a custom static route, the following issues might occur:

Connectivity issues

  • If you cannot ping an IP address in the destination range of a route whose next hop is a forwarding rule for an internal passthrough Network Load Balancer, note that a route using this type of next hop might not process ICMP traffic depending on when the route was created. If the route was created before May 15, 2021, it only processes TCP and UDP traffic until August 16, 2021. Starting August 16, 2021, all routes automatically forward all protocol traffic (TCP, UDP, and ICMP) regardless of when a route was created. If you don't want to wait until then, you can enable ping functionality now by creating new routes and deleting the old ones.

  • When using an internal passthrough Network Load Balancer as a next hop for a custom static route, all traffic is delivered to the load balancer's healthy backend VMs, regardless of the protocol configured for the load balancer's internal backend service, and regardless of the port or ports configured on the load balancer's internal forwarding rule.

  • Ensure that you have created ingress allow firewall rules that correctly identify sources of traffic that should be delivered to backend VMs via the custom static route's next hop. Packets that arrive on backend VMs preserve their source IP addresses, even when delivered by way of a custom static route.
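For reference, replacing an old route (for example, to pick up ICMP forwarding) means creating a new route that points --next-hop-ilb at the load balancer's forwarding rule and then deleting the old one. Route, network, destination range, forwarding rule name, and region below are placeholders:

```shell
# Create a custom static route whose next hop is the load balancer's
# forwarding rule (all names and ranges are illustrative).
gcloud compute routes create ilb-next-hop-route \
    --network=lb-network \
    --destination-range=192.168.100.0/24 \
    --next-hop-ilb=my-forwarding-rule \
    --next-hop-ilb-region=us-central1
```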

Invalid value for destination range

The destination range of a custom static route can't be more specific than any subnet route in your VPC network. If you receive the following error message when creating a custom static route:

Invalid value for field 'resource.destRange': [ROUTE_DESTINATION]. [ROUTE_DESTINATION] hides the address space of the network .... Cannot change the routing of packets destined for the network.
  • You cannot create a custom static route with a destination that exactly matches or is more specific (with a longer mask) than a subnet route. Refer to applicability and order for further information.

  • If packets go to an unexpected destination, remove other routes in your VPC network with more specific destinations. Review the routing order to understand Google Cloud route selection.

Troubleshoot logging issues

If you configure logging for an internal passthrough Network Load Balancer, the following issues might occur:

  • Measurements such as RTT might be missing in some of the logs if not enough packets are sampled to capture RTT. This is more likely to happen for low-volume connections.
  • RTT values are available only for TCP flows.
  • Some packets are sent with no payload. If header-only packets are sampled, the bytes value is 0.

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.