Troubleshoot internal connectivity between VMs
This document provides troubleshooting steps for connectivity issues between Compute Engine VMs that are in the same Virtual Private Cloud (VPC) network (either Shared VPC or standalone) or in two VPC networks connected with VPC Network Peering. It assumes that the VMs are communicating using the internal IP addresses of their respective virtual network interface controllers (vNICs).
The steps in this guide apply to both Compute Engine VMs and Google Kubernetes Engine nodes.
If you would like to see specific additional troubleshooting scenarios, click the Send feedback link at the bottom of the page and let us know.
The following VM and VPC configurations are applicable to this guide:
- VM-to-VM connections using internal IP addresses in a single VPC network.
- VM-to-VM connections using internal IP addresses within a Shared VPC network.
- VM-to-VM connections using internal IP addresses in different VPC networks peered using VPC Network Peering.
Commands used in this guide are available on all Google-provided OS images. If you are using your own OS image, you might have to install the tools.
Note: When troubleshooting, it's useful to record the commands you run and the results you get in a document. You can use this document to check your own work and to let others know what you've researched. In addition, if you do need to open a support ticket, the document can speed up resolving your issue.
Quantify the problem
- If you think you have complete packet loss, go to Troubleshoot complete connection failure.
- If you are experiencing latency, only partial packet loss, or timeouts occurring mid-connection, go to Troubleshoot network latency or loss causing throughput issues.
Troubleshoot complete connection failure
The following sections provide steps for troubleshooting complete internal connectivity failure between VMs. If you are instead experiencing increased latency or intermittent connection timeouts, skip to Troubleshoot network latency or loss causing throughput issues.
Determine connection values
First gather the following information:
- From the VM instances page, gather the following for both VMs:
  - VM names
  - VM zones
  - Internal IP addresses for the vNICs that are communicating
- From the configuration of the destination server software, gather the following information:
  - Layer 4 protocol
  - Destination port
  For example, if your destination is an HTTPS server, the protocol is TCP and the port is usually 443, but your specific configuration might use a different port.
If you're seeing issues with multiple VMs, pick a single source and single destination VM that are experiencing issues and use those values. In general, you shouldn't need the source port of the connection.
Once you have this information, proceed to Investigate issues with the underlying Google network.
Investigate issues with the underlying Google network
If your setup is an existing one that hasn't changed recently, then the issue might be with the underlying Google network. Check the Network Intelligence Center Performance Dashboard for packet loss between the VM zones. If there is an increase in packet loss between the zones during the timeframe when you experienced network timeouts, it might indicate that the problem was with the physical network underlying your virtual network. Check the Google Cloud Status Dashboard for known issues before filing a support case.
If the issue does not seem to be with the underlying Google network, proceed to Check for misconfigured firewall rules in Google Cloud.
Check for misconfigured firewall rules in Google Cloud
Note: This section uses Connectivity Tests, which can incur charges. For pricing details, see Network Intelligence Center pricing.
Connectivity Tests analyzes the VPC network path configuration between two VMs and shows whether the programmed configuration should allow the traffic or not. If the traffic is not allowed, the results show whether a Google Cloud egress or ingress firewall rule is blocking the traffic or if a route isn't available.
Connectivity Tests might also dynamically test the path by sending packets between the hypervisors of the VMs. If these tests are performed, then the results of those tests are displayed.
Connectivity Tests examines the configuration of the VPC network only. It does not test the operating system firewall, the operating system routes, or the server software on the VM.
The following procedure runs Connectivity Tests from the Google Cloud console. For other ways to run tests, see Running Connectivity Tests.
Use the following procedure to create and run a test:
In the Google Cloud console, go to the Connectivity Tests page.
In the project pull-down menu, confirm you are in the correct project or specify the correct one.
Click Create connectivity test.
Give the test a name.
Specify the following:
- Protocol
- Source endpoint IP address
- Source project and VPC network
- Destination endpoint IP address
- Destination project and VPC network
- Destination port
Click Create.
The test runs immediately. To see the result diagram, click View in the Result details column.
- If the results say the connection is dropped by a Google Cloud firewall rule, determine if your intended security setup should allow the connection. You might have to ask your security or network administrator for details. If the traffic should be allowed, then check the following:
  - Check the Always blocked traffic list. If the traffic is blocked by Google Cloud as described in the always blocked traffic list, then your existing configuration won't work.
  - Go to the Firewall policies page and review your firewall rules. If the firewall is misconfigured, create or modify a firewall rule to allow the connection. This rule can be a VPC firewall rule or a hierarchical firewall policy rule.
  - If there is a correctly configured firewall rule that blocks this traffic, check with your security or network administrator. If the security requirements of your organization mean that the VMs shouldn't reach each other, you might need to redesign your setup.
- If the results indicate that there are no issues with the VPC connectivity path, then the issue might be one of the following:
  - Issues with the guest OS configuration, such as issues with firewall software.
  - Issues with the client or server applications, such as the application being frozen or configured to listen on the wrong port.
Subsequent steps walk you through examining each of these possibilities. Continue with Test TCP connectivity from inside the VM.
Test TCP connectivity from inside the VM
If your VM-to-VM Connectivity Test did not detect a VPC configuration issue, start testing OS-to-OS connectivity. The following steps help you determine the following:
- If a TCP server is listening at the indicated port
- If the server-side firewall software is allowing connections to that port from the client VM
- If the client-side firewall software is allowing connections to that port on the server
- If the server-side route table is correctly configured to forward packets
- If the client-side route table is correctly configured to forward packets
You can test the TCP handshake using curl with Linux or Windows 2019, or using the New-Object System.Net.Sockets.TcpClient command with Windows PowerShell. The workflow in this section should result in one of the following outcomes: connection success, connection timeout, or connection reset.
- Success: If the TCP handshake completes successfully, then an OS firewall rule is not blocking the connection, the OS is correctly forwarding packets, and a server of some kind is listening on the destination port. If this is the case, then the issue might be with the application itself. To check, see Check server logging for information about server behavior.
- Timeout: If your connection times out, it usually means one of the following:
  - There's no machine at that IP address
  - There's a firewall somewhere silently discarding your packets
  - OS packet routing is sending the packets to a destination that can't process them, or asymmetric routing is sending the return packet on an invalid path
- Reset: If the connection is being reset, it means that the destination IP is receiving packets, but an OS or an application is rejecting the packets. This can mean one of the following:
  - The packets are arriving at the wrong machine and it is not configured to respond to that protocol on that port
  - The packets are arriving at the correct machine, but no server is listening on that port
  - The packets are arriving at the correct machine and port, but higher-level protocols (such as SSL) aren't completing their handshake
  - A firewall is resetting the connection. This is less likely than a firewall silently discarding the packets, but it can happen.
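The three outcomes above can also be reproduced with a short script when curl or PowerShell is inconvenient. The following is a minimal sketch, assuming Python 3 is installed on the client VM; the function name probe_tcp and its return labels are illustrative and are not part of any Google Cloud tooling.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP handshake and classify the outcome.

    Returns "success", "timeout", "reset", or "error: <details>".
    """
    try:
        # create_connection performs the full TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "success"
    except socket.timeout:
        # No reply arrived within the timeout: packets were likely dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The destination answered the SYN with a RST.
        return "reset"
    except OSError as exc:
        # For example "Network is unreachable" when no route exists.
        return f"error: {exc}"
```

For example, probe_tcp("192.168.0.4", 443) would return one of the labels above for the destination used in this guide's sample output. The script tests only the TCP handshake, so a "success" result says nothing about TLS or application behavior.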
Linux
In the Google Cloud console, go to the Firewall policies page.
Ensure that there is a firewall rule that allows SSH connections from IAP to your VM, or create a new one.
In the Google Cloud console, go to the VM instances page.
Find your source VM.
Click SSH in the Connect column for that VM.
From the client machine command line, run the following command. Replace DEST_IP:DEST_PORT with your destination IP address and port.
curl -vso /dev/null --connect-timeout 5 DEST_IP:DEST_PORT
Windows
In the Google Cloud console, go to the VM instances page.
Find your source VM.
Use one of the methods described in Connecting to Windows VMs to connect to your VM.
From the client machine command line, run the following:
- Windows 2019:
curl -vso /dev/null --connect-timeout 5 DEST_IP:DEST_PORT
- Windows 2012 or Windows 2016 PowerShell:
PS C:\> New-Object System.Net.Sockets.TcpClient('DEST_IP',DEST_PORT)
Connection success
The following results indicate a successful TCP handshake. If the TCP handshake completes successfully, then the issue is not related to TCP connection timeout or reset. Instead, the timeout issue is occurring within the application layers. If you get a successful connection, proceed to Check server logging for information about server behavior.
Linux and Windows 2019
$ curl -vso /dev/null --connect-timeout 5 192.168.0.4:443
The "Connected to" line indicates a successful TCP handshake.
Expire in 0 ms for 6 (transfer 0x558b3289ffb0)
Expire in 5000 ms for 2 (transfer 0x558b3289ffb0)
Trying 192.168.0.4...
TCP_NODELAY set
Expire in 200 ms for 4 (transfer 0x558b3289ffb0)
Connected to 192.168.0.4 (192.168.0.4) port 443 (#0)
> GET / HTTP/1.1
> Host: 192.168.0.4:443
> User-Agent: curl/7.64.0
> Accept: */*
>
Empty reply from server
Connection #0 to host 192.168.0.4 left intact
Windows 2012 and 2016
PS C:\> New-Object System.Net.Sockets.TcpClient('DEST_IP_ADDRESS',PORT)
Connection successful result. The "Connected: True" line is relevant.
Available : 0
Client : System.Net.Sockets.Socket
Connected : True
ExclusiveAddressUse : False
ReceiveBufferSize : 131072
SendBufferSize : 131072
ReceiveTimeout : 0
SendTimeout : 0
LingerState : System.Net.Sockets.LingerOption
NoDelay : False
Connection timeout
The following results indicate that the connection has timed out. If your connection is timing out, proceed to Verify server IP address and port.
Linux and Windows 2019
$ curl -vso /dev/null --connect-timeout 5 DEST_IP_ADDRESS:PORT
Connection timeout result:
Trying 192.168.0.4:443...
Connection timed out after 5000 milliseconds
Closing connection 0
Windows 2012 and 2016
PS C:\> New-Object System.Net.Sockets.TcpClient('DEST_IP_ADDRESS',PORT)
Connection timeout result:
New-Object: Exception calling ".ctor" with "2" argument(s): "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. 192.168.0.4:443"
Connection reset
A reset is when a device sends a RST packet back to the client, informing the client that the connection has been terminated. The connection might be reset for one of the following reasons:
- The receiving server was not configured to accept connections for that protocol on that port. This could be because the packet was sent to the wrong server or the wrong port, or the server software was misconfigured.
- Firewall software rejected the connection attempt.
If the connection was reset, proceed to Verify server IP address and port.
Linux and Windows 2019
$ curl -vso /dev/null --connect-timeout 5 DEST_IP_ADDRESS:PORT
Connection reset result:
Trying 192.168.0.4:443...
connect to 192.168.0.4 port 443 failed: Connection refused
Failed to connect to 192.168.0.4 port 443: Connection refused
Closing connection 0
Windows 2012 and 2016
PS C:\> New-Object System.Net.Sockets.TcpClient('DEST_IP_ADDRESS',PORT)
Connection reset result:
New-Object: Exception calling ".ctor" with "2" argument(s): "No connection could be made because the target machine actively refused it. 192.168.0.4:443"
Verify server IP address and port
Run one of the following commands on your server. They indicate if there is a server listening on the necessary port.
Linux
$ sudo netstat -ltuvnp
The output shows that a TCP server is listening to any destination IP address (0.0.0.0) at port 22, accepting connections from any source address (0.0.0.0) and any source port (*). The PID/Program name column specifies the executable bound to the socket.
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      588/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      588/sshd
udp        0      0 0.0.0.0:68              0.0.0.0:*                           334/dhclient
udp        0      0 127.0.0.1:323           0.0.0.0:*                           429/chronyd
udp6       0      0 ::1:323                 :::*                                429/chronyd
Windows
Note: If you are using a UDP-based server, Windows also offers the Get-NetUdpEndpoint command.
PS C:\> Get-NetTcpConnection -State "LISTEN" -LocalPort DEST_PORT
The following output shows the results of the command run with DEST_PORT set to 443. A TCP server is listening to any address (0.0.0.0) at port 443, accepting connections from any source address (0.0.0.0) and any source port (0). The OwningProcess column indicates the process ID of the process listening to the socket.
LocalAddress LocalPort RemoteAddress RemotePort State  AppliedSetting OwningProcess
------------ --------- ------------- ---------- -----  -------------- -------------
::           443       ::            0          Listen                928
0.0.0.0      443       0.0.0.0       0          Listen                928
If you see that the server is not bound to the correct port or IP, or that the remote prefix does not match your client, consult the server's documentation or vendor to resolve the issue. The server must be bound to the IP address of a particular interface or to 0.0.0.0, and it must accept connections from the correct client IP prefix or 0.0.0.0.
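The binding rule above can be expressed as a small check. This is a minimal sketch with a hypothetical helper name, listener_accepts; it compares a listener's local address from the netstat or Get-NetTcpConnection output with the destination IP address the client uses. As a simplification, it treats IPv4 and IPv6 separately and ignores dual-stack sockets.

```python
import ipaddress


def listener_accepts(bind_address: str, dest_ip: str) -> bool:
    """Return True if a socket bound to bind_address can receive
    connections addressed to dest_ip."""
    bound = ipaddress.ip_address(bind_address)
    dest = ipaddress.ip_address(dest_ip)
    # Simplification: an IPv6 listener is not assumed to accept IPv4.
    if bound.version != dest.version:
        return False
    # 0.0.0.0 (or ::) means "any local address"; otherwise the
    # listener must be bound to exactly the destination address.
    return bound.is_unspecified or bound == dest
```

A server bound only to 127.0.0.1, for instance, is reachable from the VM itself but not from another VM, which this check reports as False for any non-loopback destination address.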
If the application server is bound to the correct IP address and port, it might be that the client is accessing the wrong port, that a higher-level protocol (frequently TLS) is actively refusing the connection, or that there is a firewall rejecting the connection.
Check that the client and server support a common TLS version and cipher suite.
Check that your client is accessing the correct port.
If the preceding steps don't resolve the problem, proceed to Check firewall on client and server for packet discards.
Check firewall on client and server for packet discards
If the server is unreachable from the client VM but is listening on the correct port, one of the VMs might be running firewall software that is discarding packets associated with the connection. Check the firewall on both the client and server VMs using the following commands.
If a rule is blocking your traffic, you can update the firewall software to allow the traffic. If you do update the firewall, proceed cautiously as you prepare and execute the commands because a misconfigured firewall can block unexpected traffic. Consider setting up VM Serial Console access before proceeding.
Linux iptables
Check packet counts for the number of packets processed for each installed iptables chain and rule. Determine which DROP rules are being matched by comparing source and destination IP addresses and ports with the prefixes and ports specified by iptables rules.
If a matched rule is showing increasing discards with connection timeouts, consult the iptables documentation to apply the correct allow rule to the appropriate connections.
$ sudo iptables -L -n -v -x
This example INPUT chain shows that packets from any IP address to any IP address using destination TCP port 5000 will be discarded at the firewall. The pkts column indicates that the rule has dropped 10342 packets. As a test, if you create connections that are discarded by this rule, you will see the pkts counter increase, confirming the behavior.
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination
   10342    2078513 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:5000
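Comparing counters by eye is error-prone when a chain has many rules. The following is a minimal sketch, assuming you have saved the text output of the iptables listing command shown above; the function name find_drop_rules is hypothetical, and the parser assumes every rule line has a target column, as in the example.

```python
from typing import List, Tuple


def find_drop_rules(iptables_output: str, port: int) -> List[Tuple[int, str]]:
    """Return (packet_count, rule_text) for DROP rules that match a
    destination port in `iptables -L -n -v -x` output."""
    matches = []
    for line in iptables_output.splitlines():
        fields = line.split()
        # Rule lines begin with numeric pkts and bytes counters;
        # chain titles and column headers do not.
        if len(fields) < 9 or not fields[0].isdigit():
            continue
        pkts, target = int(fields[0]), fields[2]
        # Columns 0-8 are fixed; match options such as "dpt:5000" follow.
        if target == "DROP" and f"dpt:{port}" in fields[9:]:
            matches.append((pkts, line.strip()))
    return matches
```

Running the function on two captures taken a few seconds apart, while the client retries the connection, shows whether the packet count for a DROP rule on your destination port is increasing.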
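You can add an ingress or egress rule to iptables with the following commands. Because iptables evaluates the rules in a chain in order, a rule appended with -A takes effect only if no earlier rule already drops the traffic. To place the rule ahead of an existing DROP rule, use -I instead of -A.
Ingress rule:
$ sudo iptables -A INPUT -p tcp -s SOURCE_IP_PREFIX --dport SERVER_PORT -j ACCEPT
Egress rule:
$ sudo iptables -A OUTPUT -p tcp -d DEST_IP_PREFIX --dport DEST_PORT -j ACCEPT
Windows Firewall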
Check in Windows Firewall that the connection is permitted to egress from the client and ingress to the server. If a rule is blocking your traffic, make the needed corrections in Windows Firewall to allow the connections. You can also enable Windows Firewall Logging.
The default DENY behavior of Windows Firewall is to silently discard denied packets, resulting in timeouts.
This command checks the server. To check the egress rules on the client VM, change the Direction -match value to Outbound.
PS C:\> Get-NetFirewallPortFilter | `
>> Where-Object LocalPort -match "PORT" | `
>> Get-NetFirewallRule | `
>> Where-Object {$_.Direction -match "Inbound" -and $_.Profile -match "Any"}
Name : {80D79988-C7A5-4391-902D-382369B4E4A3}
DisplayName : iperf3 udp
Description :
DisplayGroup :
Group :
Enabled : True
Profile : Any
Platform : {}
Direction : Inbound
Action : Allow
EdgeTraversalPolicy : Block
LooseSourceMapping : False
LocalOnlyMapping : False
Owner :
PrimaryStatus : OK
Status : The rule was parsed successfully from the store. (65536)
EnforcementStatus : NotApplicable
PolicyStoreSource : PersistentStore
PolicyStoreSourceType : Local
You can add new firewall rules to Windows with the following commands.
Egress rule:
PS C:\> netsh advfirewall firewall add rule name="My Firewall Rule" dir=out action=allow protocol=TCP remoteport=DEST_PORT
Ingress rule:
PS C:\> netsh advfirewall firewall add rule name="My Firewall Rule" dir=in action=allow protocol=TCP localport=PORT
Third-party software
Third-party application firewalls or antivirus software can also drop or reject connections. Consult the documentation provided by your vendor.
If you find a problem with firewall rules and correct it, retest your connectivity. If firewall rules don't seem to be the problem, proceed to Check OS routing configuration.
Check OS routing configuration
Operating system routing issues can come from one of the following situations:
- Routing issues are most common on VMs with multiple network interfaces because of the additional routing complexity
- On a VM created in Google Cloud with a single network interface, routing issues normally only happen if someone has manually modified the default routing table
- On a VM that was migrated from on-premises, the VM might carry over routing or MTU settings that were needed on premises but which are causing problems in the VPC network
If you are using a VM with multiple network interfaces, routes must be configured to egress to the correct vNIC and subnet. For example, a VM might have routes configured so that traffic intended for internal subnets is sent to one vNIC, but the default gateway (destination 0.0.0.0/0) is configured on another vNIC which has an external IP address or access to Cloud NAT.
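To see why a packet leaves through one vNIC rather than another, it helps to recall that the OS chooses the matching route with the longest prefix. The following is a minimal sketch of that selection; the function name select_route and the interface names ens4 and ens5 are illustrative, and the sketch ignores route metrics and policy routing, which a real OS also considers.

```python
import ipaddress
from typing import List, Optional, Tuple


def select_route(
    routes: List[Tuple[str, str]], dest_ip: str
) -> Optional[Tuple[str, str]]:
    """Return the (prefix, interface) route with the longest prefix
    that contains dest_ip, or None if no route matches."""
    dest = ipaddress.ip_address(dest_ip)
    best = None
    best_len = -1
    for prefix, interface in routes:
        network = ipaddress.ip_network(prefix)
        if network.version != dest.version or dest not in network:
            continue
        # A more specific (longer) prefix wins over a shorter one.
        if network.prefixlen > best_len:
            best, best_len = (prefix, interface), network.prefixlen
    return best


# Hypothetical two-vNIC table: internal subnet on ens4, default on ens5.
example_routes = [("0.0.0.0/0", "ens5"), ("10.3.0.0/24", "ens4")]
```

With example_routes, traffic to 10.3.0.34 is selected for ens4, while traffic to any address outside 10.3.0.0/24 falls through to the default route on ens5. If the internal route were missing, internal traffic would also leave through ens5.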
You can review routes by checking individual routes one at a time or by looking at the entire VM routing table. If either approach reveals issues with the routing table, consult the steps in Update routing tables for instructions.
Review all routes
List all your routes to understand what routes already exist on your VM.
Linux
$ ip route show table all
default via 10.3.0.1 dev ens4
10.3.0.1 dev ens4 scope link
local 10.3.0.19 dev ens4 table local proto kernel scope host src 10.3.0.19
broadcast 10.3.0.19 dev ens4 table local proto kernel scope link src 10.3.0.19
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
::1 dev lo proto kernel metric 256 pref medium
fe80::/64 dev ens4 proto kernel metric 256 pref medium
local ::1 dev lo table local proto kernel metric 0 pref medium
local fe80::4001:aff:fe03:13 dev ens4 table local proto kernel metric 0 pref medium
multicast ff00::/8 dev ens4 table local proto kernel metric 256 pref medium
Windows
PS C:\> Get-NetRoute
ifIndex DestinationPrefix              NextHop  RouteMetric ifMetric PolicyStore
------- -----------------              -------  ----------- -------- -----------
4       255.255.255.255/32             0.0.0.0  256         5        ActiveStore
1       255.255.255.255/32             0.0.0.0  256         75       ActiveStore
4       224.0.0.0/4                    0.0.0.0  256         5        ActiveStore
1       224.0.0.0/4                    0.0.0.0  256         75       ActiveStore
4       169.254.169.254/32             0.0.0.0  1           5        ActiveStore
1       127.255.255.255/32             0.0.0.0  256         75       ActiveStore
1       127.0.0.1/32                   0.0.0.0  256         75       ActiveStore
1       127.0.0.0/8                    0.0.0.0  256         75       ActiveStore
4       10.3.0.255/32                  0.0.0.0  256         5        ActiveStore
4       10.3.0.31/32                   0.0.0.0  256         5        ActiveStore
4       10.3.0.1/32                    0.0.0.0  1           5        ActiveStore
4       10.3.0.0/24                    0.0.0.0  256         5        ActiveStore
4       0.0.0.0/0                      10.3.0.1 0           5        ActiveStore
4       ff00::/8                       ::       256         5        ActiveStore
1       ff00::/8                       ::       256         75       ActiveStore
4       fe80::b991:6a71:ca62:f23f/128  ::       256         5        ActiveStore
4       fe80::/64                      ::       256         5        ActiveStore
1       ::1/128                        ::       256         75       ActiveStore
Check individual routes
If a particular IP prefix seems to be the problem, check that proper routes exist for the source and destination IPs within the client and server VMs.
Linux
$ ip route get DEST_IP
Good result:
A valid route is shown. In this case, the packets egress from interface ens4.
10.3.0.34 via 10.3.0.1 dev ens4 src 10.3.0.26 uid 1000
    cache
Bad result:
This result confirms that packets are being discarded because there is no pathway to the destination network. Confirm that your route table contains a path to the correct egress interface.
RTNETLINK answers: Network is unreachable
Windows
PS C:\> Find-NetRoute -RemoteIpAddress "DEST_IP"
Good result:
IPAddress : 192.168.0.2
InterfaceIndex : 4
InterfaceAlias : Ethernet
AddressFamily : IPv4
Type : Unicast
PrefixLength : 24
PrefixOrigin : Dhcp
SuffixOrigin : Dhcp
AddressState : Preferred
ValidLifetime : 12:53:13
PreferredLifetime : 12:53:13
SkipAsSource : False
PolicyStore : ActiveStore

Caption :
Description :
ElementName :
InstanceID : ;:8=8:8:9<>55>55:8:8:8:55;
AdminDistance :
DestinationAddress :
IsStatic :
RouteMetric : 256
TypeOfRoute : 3
AddressFamily : IPv4
CompartmentId : 1
DestinationPrefix : 192.168.0.0/24
InterfaceAlias : Ethernet
InterfaceIndex : 4
InterfaceMetric : 5
NextHop : 0.0.0.0
PreferredLifetime : 10675199.02:48:05.4775807
Protocol : Local
Publish : No
State : Alive
Store : ActiveStore
ValidLifetime : 10675199.02:48:05.4775807
PSComputerName :
ifIndex : 4
Bad result:
Find-NetRoute : The network location cannot be reached. For information about network troubleshooting, see Windows Help.
At line:1 char:1
+ Find-NetRoute -RemoteIpAddress "192.168.0.4"
+ ----------------------------------------
+ CategoryInfo : NotSpecified: (MSFT_NetRoute:ROOT/StandardCimv2/MSFT_NetRoute) [Find-NetRoute], CimException
+ FullyQualifiedErrorId : Windows System Error 1231,Find-NetRoute
This result confirms that packets are being discarded because there is no pathway to the destination IP address. Check that you have a default gateway, and that the gateway is applied to the correct vNIC and network.
Update routing tables
If needed, you can add a route to your operating system's route table. Before running a command to update the VM's routing table, we recommend that you familiarize yourself with the commands and develop an understanding of the possible implications. Improper use of route update commands might cause unexpected problems or disconnect you from the VM. Consider setting up VM Serial Console access before proceeding.
Consult your operating system documentation for instructions on updating routes.
If you find a problem with routes and correct it, retest your connectivity. If routes don't seem to be the problem, proceed to Check MTU.
Check MTU
A VM's interface MTU should match the MTU of the VPC network it is attached to. Ideally, VMs that are communicating with each other also have matching MTUs. Mismatched MTUs are normally not an issue for TCP, but can be for UDP.
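UDP is more sensitive to MTU mismatches because, unlike TCP, it has no handshake in which the two sides agree on a segment size. As a rough illustration, the following sketch computes the largest UDP payload that fits in one unfragmented IPv4 packet. It assumes a 20-byte IPv4 header without options and an 8-byte UDP header; the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, assuming no IP options
UDP_HEADER_BYTES = 8


def max_udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits in one IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES


def datagram_fits(payload_bytes: int, sender_mtu: int, receiver_mtu: int) -> bool:
    """True if the payload fits within the smaller of the two MTUs."""
    return payload_bytes <= max_udp_payload(min(sender_mtu, receiver_mtu))
```

For example, max_udp_payload(1460) is 1432 bytes, so a 1472-byte datagram that fits an interface with an MTU of 1500 does not fit when the other side uses 1460.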
Check the MTU of the VPC network. If the VMs are in two different networks, check both networks.
gcloud compute networks describe NET_NAME --format="table(name,mtu)"
Check the MTU configuration for your client and server network interfaces.
Linux
$ netstat -i
The lo (loopback) interface always has an MTU of 65536 and can be ignored for this step.
Kernel Interface table
Iface   MTU   RX-OK    RX-ERR RX-DRP RX-OVR TX-OK     TX-ERR TX-DRP TX-OVR Flg
ens4    1460  8720854  0      0      0      18270406  0      0      0      BMRU
lo      65536 53       0      0      0      53        0      0      0      LRU
Windows
PS C:\> Get-NetIpInterface
Loopback Pseudo-Interfaces always have an MTU of 4294967295 and can be ignored for this step.
ifIndex InterfaceAlias              AddressFamily NlMtu(Bytes) InterfaceMetric Dhcp     ConnectionState PolicyStore
------- --------------              ------------- ------------ --------------- ----     --------------- -----------
4       Ethernet                    IPv6          1500         5               Enabled  Connected       ActiveStore
1       Loopback Pseudo-Interface 1 IPv6          4294967295   75              Disabled Connected       ActiveStore
4       Ethernet                    IPv4          1460         5               Enabled  Connected       ActiveStore
1       Loopback Pseudo-Interface 1 IPv4          4294967295   75              Disabled Connected       Active
If the interface and network MTUs don't match, you can reconfigure the interface MTU. For more information, see VMs and MTU settings. If they do match, and if you have followed the troubleshooting steps this far, then the issue is likely with the server itself. For guidance on troubleshooting server issues, proceed to Check server logging for information about server behavior.
Check server logging for information about server behavior
If the preceding steps don't resolve an issue, the application might be causing the timeouts. Check server and application logs for behavior that would explain what you're seeing.
Log sources to check:
- Cloud Logging for the VM
- VM Serial Logs
- Linux syslog and kern.log, or Windows Event Viewer
If you're still having issues
If you're still having issues, see Getting support for next steps. It's useful to have the output from the preceding troubleshooting steps available to share with other collaborators.
Troubleshoot network latency or loss causing throughput issues
Network latency or loss issues are typically caused by resource exhaustion or bottlenecks within a VM or network path. Occasionally, network loss can cause intermittent connection timeouts. Causes like vCPU exhaustion or vNIC saturation result in increased latency and packet loss, leading to a reduction in network performance.
The following instructions assume that connections are not consistently timing out and you are instead seeing issues of limited capacity or performance. If you are seeing complete packet loss, see Troubleshoot complete connection failure.
Small variations in latency, such as latencies varying by a few milliseconds, are normal. Latencies vary because of network load or queuing inside the VM.
Determine connection values
First gather the following information:
- From the VM instances page, gather the following for both VMs:
  - VM names
  - VM zones
  - Internal IP addresses for the vNICs that are communicating
- From the configuration of the destination server software, gather the following information:
  - Layer 4 protocol
  - Destination port
If you're seeing issues with multiple VMs, pick a single source and single destination VM that are experiencing issues and use those values. In general, you shouldn't need the source port of the connection.
Once you have this information, proceed to Investigate issues with the underlying Google network.
Investigate issues with the underlying Google network
If your setup is an existing one that hasn't changed recently, then the issue might be with the underlying Google network. Check the Network Intelligence Center Performance Dashboard for packet loss between the VM zones. If there is an increase in packet loss between the zones during the timeframe when you experienced network timeouts, it might indicate that the problem is with the physical network underlying your virtual network. Check the Google Cloud Status Dashboard for known issues before filing a support case.
If the issue does not seem to be with the underlying Google network, proceed to Check handshake latency.
Check handshake latency
All connection-based protocols incur some latency while they do their connection setup handshake. Each protocol handshake adds to the overhead. For SSL/TLS connections, for example, the TCP handshake has to complete before the SSL/TLS handshake can start, then the TLS handshake has to complete before data can be transmitted.
Handshake latency in the same Google Cloud zone is usually negligible, but handshakes to globally distant locations might add greater delays at connection initiation. If you have resources in distant regions, you can check to see if the latency you're seeing is due to protocol handshake.
Linux and Windows 2019
$ curl -o /dev/null -Lvs -w 'tcp_handshake: %{time_connect}s, application_handshake: %{time_appconnect}s' DEST_IP:PORT
tcp_handshake: 0.035489s, application_handshake: 0.051321s
- tcp_handshake is the duration from when the client sends the initial SYN packet to when the client sends the ACK of the TCP handshake.
- application_handshake is the time from the first SYN packet of the TCP handshake to the completion of the TLS (typically) handshake.
- additional handshake time = application_handshake - tcp_handshake
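If you collect these timings repeatedly, a small parser saves the manual subtraction. The following is a minimal sketch that assumes the exact output format produced by the curl command above; the function name handshake_times is hypothetical. To my understanding, curl reports 0 for time_appconnect on connections without TLS, so the sketch clamps the difference at zero.

```python
import re


def handshake_times(curl_output: str) -> dict:
    """Parse the timing line written by the curl -w format used above."""
    match = re.search(
        r"tcp_handshake:\s*([\d.]+)s,\s*application_handshake:\s*([\d.]+)s",
        curl_output,
    )
    if match is None:
        raise ValueError("timing line not found in curl output")
    tcp = float(match.group(1))
    app = float(match.group(2))
    return {
        "tcp_handshake": tcp,
        "application_handshake": app,
        # Clamp at zero for non-TLS connections (assumed app time of 0).
        "additional": max(app - tcp, 0.0),
    }
```

For the sample output in this section, the additional handshake time works out to 0.051321 - 0.035489 = 0.015832 seconds.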
Windows 2012 and 2016
Not available with default OS tooling. ICMP round-trip time can be used as a reference if firewall rules allow.
If the latency is more than the handshakes would account for, proceed to Determine the maximum throughput of your VM type.
Determine the maximum throughput of your VM type
VM network egress throughput is limited by the VM CPU architecture and vCPU count. Determine the potential egress bandwidth of your VM by consulting the Network bandwidth page.
If your VM is not capable of meeting your egress requirements, consider upgrading to a VM with greater capacity. For instructions, see Changing the machine type of an instance.
If your machine type should allow sufficient egress bandwidth, then investigate whether Persistent Disk usage is interfering with your network egress. Persistent Disk operations are allowed to occupy up to 60% of the total network throughput of your VM. To determine if Persistent Disk operations might be interfering with network throughput, see Check Persistent Disk performance.
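The 60% figure lends itself to a quick back-of-the-envelope estimate. The following sketch is a simplified model, not a description of how Google Cloud schedules traffic: it assumes Persistent Disk traffic is capped at 60% of the VM's egress limit and that whatever remains is available to other network traffic. The function name and the 10 Gbps example value are hypothetical.

```python
PD_MAX_SHARE = 0.60  # from the text: up to 60% of total network throughput


def bandwidth_left_for_network(egress_limit_gbps: float, pd_usage_gbps: float) -> float:
    """Estimate egress bandwidth left after Persistent Disk traffic.

    Simplified model: disk traffic is capped at PD_MAX_SHARE of the
    limit, and the remainder is available to other network traffic.
    """
    pd_capped = min(pd_usage_gbps, egress_limit_gbps * PD_MAX_SHARE)
    return egress_limit_gbps - pd_capped
```

Under this model, a VM with a hypothetical 10 Gbps egress limit and heavy disk activity could be left with roughly 4 Gbps for other traffic, which might explain throughput well below the documented limit for the machine type.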
Network ingress to a VM is not limited by the VPC network or the VM instance type. Instead, it is determined by the packet queuing and processing performance of the VM operating system or application. If your egress bandwidth is adequate but you're seeing ingress issues, see Check server logging for information about server behavior.
Check interface MTU
The MTU of a VPC network is configurable. The MTU of the interface on the VM should match the MTU value for the VPC network it is attached to. In a VPC Network Peering situation, VMs in different networks can have different MTUs. When this scenario occurs, apply the smaller MTU value to the associated interfaces. MTU mismatches are normally not an issue for TCP, but can be for UDP.
Check the MTU of the VPC network. If the VMs are in two different networks, check both networks.
gcloud compute networks describe NET_NAME --format="table(name,mtu)"
Check the MTU configuration for your network interface.
Linux
The lo (loopback) interface always has an MTU of 65536 and can be ignored for this step.
$ netstat -i
Kernel Interface table
Iface   MTU   RX-OK    RX-ERR RX-DRP RX-OVR TX-OK     TX-ERR TX-DRP TX-OVR Flg
ens4    1460  8720854  0      0      0      18270406  0      0      0      BMRU
lo      65536 53       0      0      0      53        0      0      0      LRU
Windows
Loopback Pseudo-Interfaces always have an MTU of 4294967295 and can be ignored for this step.
PS C:\> Get-NetIpInterface
ifIndex InterfaceAlias              AddressFamily NlMtu(Bytes) InterfaceMetric Dhcp     ConnectionState PolicyStore
------- --------------              ------------- ------------ --------------- ----     --------------- -----------
4       Ethernet                    IPv6                  1500               5 Enabled  Connected       ActiveStore
1       Loopback Pseudo-Interface 1 IPv6            4294967295              75 Disabled Connected       ActiveStore
4       Ethernet                    IPv4                  1460               5 Enabled  Connected       ActiveStore
1       Loopback Pseudo-Interface 1 IPv4            4294967295              75 Disabled Connected       Active
If the interface and network MTUs don't match, you can reconfigure the interface MTU. For instructions on updating MTU for Windows VMs, see VMs and MTU settings. If they do match, then the issue might be server availability. The next step is to Check logs to see if a VM was rebooted, stopped, or live migrated to determine whether anything happened to your VM during the relevant time.
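The comparison between interface MTU and network MTU can be scripted. The following Python sketch parses Linux netstat -i output and reports interfaces whose MTU differs from the VPC network MTU. The function names and the sample text are illustrative assumptions, and the parser only expects the column layout shown in this guide.

```python
from typing import Dict, Iterable

# Sample text in the layout of the `netstat -i` output shown in this guide.
SAMPLE_NETSTAT_I = """\
Kernel Interface table
Iface   MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
ens4   1460  8720854      0      0      0 18270406      0      0      0 BMRU
lo    65536       53      0      0      0       53      0      0      0 LRU
"""


def parse_netstat_i(output: str) -> Dict[str, int]:
    """Map each interface name to its MTU."""
    mtus = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have an interface name followed by a numeric MTU;
        # the two header lines do not, so they are skipped.
        if len(fields) >= 2 and fields[1].isdigit():
            mtus[fields[0]] = int(fields[1])
    return mtus


def mismatched_interfaces(mtus: Dict[str, int], network_mtu: int,
                          ignore: Iterable[str] = ("lo",)) -> Dict[str, int]:
    """Return interfaces (except ignored ones) whose MTU differs."""
    skipped = set(ignore)
    return {name: mtu for name, mtu in mtus.items()
            if name not in skipped and mtu != network_mtu}


if __name__ == "__main__":
    # 1460 stands in for the value reported by `gcloud compute networks describe`.
    print(mismatched_interfaces(parse_netstat_i(SAMPLE_NETSTAT_I), 1460))
```

An empty result means every non-loopback interface matches the network MTU you passed in.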
Check logs to see if a VM was rebooted, stopped, or live migrated
During the lifecycle of a VM, a VM can be user-rebooted, live-migrated for Google Cloud maintenance, or, in rare circumstances, lost and recreated if there is a failure within the physical host containing your VM. These events might cause a brief increase in latency or connection timeouts. If any of these events happens to the VM, the event is logged.
To view logs for your VM, do the following:
In the Google Cloud console, go to the Logging page.
Choose the timeframe of when the latency occurred.
Use the following Logging query to determine if a VM event occurred near the timeframe when the latency occurred:
resource.labels.instance_id:"INSTANCE_NAME"
resource.type="gce_instance"
(
  protoPayload.methodName:"compute.instances.hostError" OR
  protoPayload.methodName:"compute.instances.OnHostMaintenance" OR
  protoPayload.methodName:"compute.instances.migrateOnHostMaintenance" OR
  protoPayload.methodName:"compute.instances.terminateOnHostMaintenance" OR
  protoPayload.methodName:"compute.instances.stop" OR
  protoPayload.methodName:"compute.instances.reset" OR
  protoPayload.methodName:"compute.instances.automaticRestart" OR
  protoPayload.methodName:"compute.instances.guestTerminate" OR
  protoPayload.methodName:"compute.instances.instanceManagerHaltForRestart" OR
  protoPayload.methodName:"compute.instances.preempted"
)
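If you check several VMs, it can help to generate this query rather than edit it by hand. The following Python sketch assembles the same filter text from a list of method names. The function and constant names are hypothetical; the resulting string is meant to be pasted into the Logging query editor or passed as the filter argument to gcloud logging read.

```python
from typing import Sequence

# Method names copied from the Logging query in this guide.
VM_EVENT_METHODS = (
    "compute.instances.hostError",
    "compute.instances.OnHostMaintenance",
    "compute.instances.migrateOnHostMaintenance",
    "compute.instances.terminateOnHostMaintenance",
    "compute.instances.stop",
    "compute.instances.reset",
    "compute.instances.automaticRestart",
    "compute.instances.guestTerminate",
    "compute.instances.instanceManagerHaltForRestart",
    "compute.instances.preempted",
)


def build_vm_event_filter(instance: str,
                          methods: Sequence[str] = VM_EVENT_METHODS) -> str:
    """Build the Logging filter text for VM lifecycle events on one instance."""
    if not methods:
        raise ValueError("at least one method name is required")
    clauses = " OR\n  ".join(
        f'protoPayload.methodName:"{name}"' for name in methods
    )
    return (
        f'resource.labels.instance_id:"{instance}"\n'
        'resource.type="gce_instance"\n'
        f"(\n  {clauses}\n)"
    )


if __name__ == "__main__":
    # INSTANCE_NAME is the same placeholder used in the query above.
    print(build_vm_event_filter("INSTANCE_NAME"))
```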
If VMs didn't restart or migrate during the relevant time, the issue might be with resource exhaustion. To check, proceed to Check network and OS statistics for packet discards due to resource exhaustion.
Check network and OS statistics for packet discards due to resource exhaustion
Resource exhaustion is a general term that means that some resource on the VM, such as egress bandwidth, is being asked to handle more than it can. Resource exhaustion can result in the periodic discards of packets, which causes connection latency or timeouts. These timeouts might not be visible at client or server startup, but might appear over time as a system exhausts resources.
The following is a list of commands which display packet counters and statistics. Some of these commands duplicate the results of other commands. In such cases, you can use whichever command works better for you. See the notes within each section to better understand the intended outcome of running the command. It can be useful to run the commands at different times to see if discards or errors are occurring at the same time as the issue.
Linux
Use the netstat command to view network statistics.
$ netstat -s
TcpExt:
    341976 packets pruned from receive queue because of socket buffer overrun
    6 ICMP packets dropped because they were out-of-window
    45675 TCP sockets finished time wait in fast timer
    3380 packets rejected in established connections because of timestamp
    50065 delayed acks sent
The netstat command outputs network statistics containing values for discarded packets by protocol. Discarded packets might be the result of resource exhaustion by the application or network interface. View the counter reason for indication of why a counter was incremented.
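Because these counters are cumulative, the change between two runs is usually more informative than a single reading. The following Python sketch compares two netstat -s snapshots and reports only the counters that changed. It is a simplified parser with hypothetical function names: it handles only lines that start with a number, as in the sample output, and it ignores section headers, so identical descriptions in different sections would overwrite each other.

```python
import re
from typing import Dict

# Matches lines such as "    50065 delayed acks sent".
_COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_counters(output: str) -> Dict[str, int]:
    """Map each counter description to its value."""
    counters = {}
    for line in output.splitlines():
        match = _COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def counter_deltas(before: Dict[str, int],
                   after: Dict[str, int]) -> Dict[str, int]:
    """Return the counters whose value changed between two snapshots."""
    deltas = {}
    for name, value in after.items():
        change = value - before.get(name, 0)
        if change != 0:
            deltas[name] = change
    return deltas


if __name__ == "__main__":
    # Two made-up snapshots; in practice, capture `netstat -s` output
    # before and after reproducing the issue.
    first = "TcpExt:\n    100 delayed acks sent\n    5 packets pruned\n"
    second = "TcpExt:\n    130 delayed acks sent\n    5 packets pruned\n"
    print(counter_deltas(parse_counters(first), parse_counters(second)))
```

A discard-related counter that increases only while the issue is occurring is a stronger signal than a large but static total.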
Check kern.log for logs matching nf_conntrack: table full, dropping packet.
Debian:
cat /var/log/kern.log | grep "dropping packet"
CentOS:
sudo cat /var/log/dmesg | grep "dropping packet"
This log indicates that the connection tracking table for the VM has reached the maximum number of connections that can be tracked. Further connections to and from this VM might time out. If conntrack has been enabled, the maximum connection count can be found with:
sudo sysctl net.netfilter.nf_conntrack_max
You can increase the value for maximum tracked connections by modifying sysctl net.netfilter.nf_conntrack_max, or you can spread the VM's workload across multiple VMs to reduce load.
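To see how close a VM is to the limit before packets are dropped, you can compare the current number of tracked connections with the maximum. The following Python sketch assumes a Linux VM with the conntrack module loaded, where the kernel exposes nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the conntrack sysctl files on Linux.
PROC_DIR = Path("/proc/sys/net/netfilter")


def conntrack_utilization(count: int, maximum: int) -> float:
    """Return the fraction of the connection tracking table in use."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return count / maximum


def read_conntrack_utilization(proc_dir: Path = PROC_DIR) -> Optional[float]:
    """Read the live values, or return None if they are not available."""
    try:
        count = int((proc_dir / "nf_conntrack_count").read_text())
        maximum = int((proc_dir / "nf_conntrack_max").read_text())
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        # conntrack is not loaded, or the files are not readable.
        return None
    return conntrack_utilization(count, maximum)


if __name__ == "__main__":
    usage = read_conntrack_utilization()
    if usage is None:
        print("conntrack values not available on this system")
    else:
        print(f"conntrack table is {usage:.1%} full")
```

A value that approaches 100% around the time of the timeouts suggests the table limit is a likely contributor.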
Windows UI
Perfmon
- Using the Windows menu, search for "perfmon" and open the program.
- On the left-menu, select Performance > Monitoring Tools > Performance Monitor.
- In the main view, click the green plus "+" to add performance counters to the monitoring graph. The following counters are of interest:
  - Network Adapter
    - Output Queue Length
    - Packets Outbound Discarded
    - Packets Outbound Errors
    - Packets Received Discarded
    - Packets Received Errors
    - Packets Received Unknown
  - Network Interface
    - Output Queue Length
    - Packets Outbound Discarded
    - Packets Outbound Errors
    - Packets Received Discarded
    - Packets Received Errors
    - Packets Received Unknown
  - Per Processor Network Interface Card Activity
    - Low Resource Receive Indications per sec
    - Low Resource Received Packets per sec
  - Processor
    - % Interrupt Time
    - % Privileged Time
    - % Processor Time
    - % User Time
Perfmon lets you plot the preceding counters on a time series graph. This can be beneficial to watch when testing is occurring or a server is impacted. Spikes in CPU-related counters such as Interrupt Time and Privileged Time can indicate saturation issues as the VM reaches CPU throughput limitations. Packet discards and errors can occur when the CPU is saturated, which forces packets to be lost before being processed by the client or server sockets. Finally, Output Queue Length will also grow during CPU saturation as more packets are queued for processing.
Windows PowerShell
PS C:\> netstat -s
IPv4 Statistics
  Packets Received                   = 56183
  Received Header Errors             = 0
  Received Address Errors            = 0
  Datagrams Forwarded                = 0
  Unknown Protocols Received         = 0
  Received Packets Discarded         = 25
  Received Packets Delivered         = 56297
  Output Requests                    = 47994
  Routing Discards                   = 0
  Discarded Output Packets           = 0
  Output Packet No Route             = 0
  Reassembly Required                = 0
  Reassembly Successful              = 0
  Reassembly Failures                = 0
  Datagrams Successfully Fragmented  = 0
  Datagrams Failing Fragmentation    = 0
  Fragments Created                  = 0
The netstat command outputs network statistics containing values for discarded packets by protocol. Discarded packets might be the result of resource exhaustion by the application or network interface.
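To put the discard counters in proportion, you can relate them to the total number of packets received. The following Python sketch parses the "name = value" lines of one section of Windows netstat -s output and computes the share of received packets that were discarded. The function names are hypothetical, and the sketch assumes you pass in a single section (for example, only IPv4 Statistics), because the same counter names repeat across sections.

```python
from typing import Dict


def parse_windows_netstat_section(output: str) -> Dict[str, int]:
    """Map counter names to values for one `netstat -s` section."""
    stats = {}
    for line in output.splitlines():
        name, separator, value = line.partition("=")
        value = value.strip()
        # Skip section titles and any line without a numeric value.
        if separator and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def receive_discard_ratio(stats: Dict[str, int]) -> float:
    """Return discarded packets as a fraction of packets received."""
    received = stats.get("Packets Received", 0)
    if received == 0:
        return 0.0
    return stats.get("Received Packets Discarded", 0) / received


if __name__ == "__main__":
    # Values taken from the sample output in this guide.
    sample = (
        "IPv4 Statistics\n"
        "  Packets Received                   = 56183\n"
        "  Received Packets Discarded         = 25\n"
    )
    ratio = receive_discard_ratio(parse_windows_netstat_section(sample))
    print(f"Receive discard ratio: {ratio:.4%}")
```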
If you are seeing resource exhaustion, you can try spreading your workload across more instances, upgrading the VM to one with more resources, tuning the OS or application for specific performance needs, entering the error message into a search engine to look for possible solutions, or asking for help using one of the methods described in If you're still having issues.
If resource exhaustion doesn't seem to be the problem, the issue might be with the server software itself. For guidance on troubleshooting server software issues, proceed to Check server logging for information about server behavior.
Check server logging for information about server behavior
If the preceding steps don't reveal an issue, the timeouts might be caused by application behavior such as processing stalls caused by vCPU exhaustion. Check the server and application logs for indications of the behavior you are experiencing.
As an example, a server experiencing increased latency due to an upstream system, such as a database under load, might queue an excessive number of requests, which can cause increased memory usage and CPU wait times. These factors might result in failed connections or socket buffer overrun.
TCP connections occasionally lose a packet, but selective acknowledgement and packet retransmission usually recover lost packets, avoiding connection timeout. Instead, consider that timeouts might have been the result of the application server failing or being redeployed, causing a momentary failure for connections.
If your server application relies on a connection to a database or other service, confirm that coupled services are not performing poorly. Your application might track these metrics.
If you're still having issues
If you're still having issues, see Getting support for next steps. It's useful to have the output from the troubleshooting steps available to share with other collaborators.
What's next
- If you are still having trouble, see the Resources page.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.