Troubleshoot network isolation in GKE
Incorrect network isolation configurations in Google Kubernetes Engine (GKE) can cause problems such as cluster creation timeouts, nodes failing to register, control plane unreachability, or inability to pull images.
Use this document for guidance on troubleshooting problems such as control plane access, CIDR range overlaps, image pull errors from public repositories, and issues related to VPC Network Peering or Private Service Connect.
This information is important for Platform admins and operators and Network administrators who configure and manage network-isolated GKE clusters to meet security and compliance requirements. For more information about the common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.
GKE cluster not running
Deleting the firewall rules that allow ingress traffic from the cluster control plane to nodes on port 10250, or deleting the default route to the default internet gateway, causes a cluster to stop functioning. If you delete the default route, you must ensure traffic to necessary Google Cloud services is routed. For more information, see custom routing.
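If the ingress firewall rule was deleted, you can recreate an equivalent rule. The following is a minimal sketch; NETWORK, CONTROL_PLANE_CIDR, and NODE_TAG are placeholders for your VPC network, your cluster's control plane CIDR range, and your nodes' network tag:
```
# Recreate an ingress rule that lets the control plane reach kubelets on nodes.
# NETWORK, CONTROL_PLANE_CIDR, and NODE_TAG are placeholders for your values.
gcloud compute firewall-rules create allow-control-plane-to-kubelet \
    --network=NETWORK \
    --direction=INGRESS \
    --source-ranges=CONTROL_PLANE_CIDR \
    --allow=tcp:10250 \
    --target-tags=NODE_TAG
```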
Timeout when creating a cluster
- Symptoms
- Clusters created in version 1.28 or earlier with private nodes require a peering route between VPCs. However, only one peering operation can happen at a time. If you attempt to create multiple clusters with the preceding characteristics at the same time, cluster creation may time out.
- Resolution
Use one of the following solutions:
Create clusters in version 1.28 or earlier serially so that the VPC peering routes already exist for each subsequent cluster without an external endpoint. Attempting to create a single cluster may also time out if there are operations running on your VPC.
Create clusters in version 1.29 or later.
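Before retrying, you can check whether another operation is still running on your VPC. This is a diagnostic sketch; inspect the output for peering-related operations on your network:
```
# List Compute Engine operations that are still running in the project.
gcloud compute operations list \
    --filter="status=RUNNING"
```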
VPC Network Peering connection is accidentally deleted
- Symptoms
When you accidentally delete a VPC Network Peering connection, the cluster goes into a repair state and all nodes show an `UNKNOWN` status. You won't be able to perform any operation on the cluster because reachability to the control plane is disconnected. When you inspect the control plane, logs display an error similar to the following:
```
error checking if node NODE_NAME is shutdown: unimplemented
```
- Potential causes
You accidentally deleted the VPC Network Peering connection.
- Resolution
- Create a new GKE cluster with a version that predates the PSC switch and its specific configurations. This action is necessary to force the re-creation of the VPC peering connection, which will restore the old cluster to its normal operation.
- Use the following specific configurations for the new cluster (see the example command after this list):
  - Release channel: extended
  - Cluster version: a version that's earlier than 1.29, such as 1.28.15-gke.2403000
  - Master IPv4 CIDR: a specific IP address range, such as `--master-ipv4-cidr=172.16.0.192/28`
- Monitor the original cluster status. After the new cluster is created (and thus the VPC peering is re-established), the original cluster should recover from its repair state, and its nodes should return to a `Ready` status.
- After the original cluster is fully restored and operates normally, delete the temporarily created GKE cluster.
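The following command sketches the first two steps of this procedure. The cluster name is hypothetical, and you might need additional network and subnetwork flags for your environment:
```
# Create a temporary pre-1.29 cluster to force re-creation of the VPC peering.
# 'peering-restore-cluster' is a hypothetical name; COMPUTE_LOCATION is your
# cluster's Compute Engine location.
gcloud container clusters create peering-restore-cluster \
    --location=COMPUTE_LOCATION \
    --release-channel=extended \
    --cluster-version=1.28.15-gke.2403000 \
    --enable-ip-alias \
    --enable-private-nodes \
    --master-ipv4-cidr=172.16.0.192/28
```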
Private Service Connect endpoint and forwarding rule are accidentally deleted
- Symptoms
When you accidentally delete a Private Service Connect endpoint or forwarding rule, the cluster goes into a repair state and all nodes show an `UNKNOWN` status. You won't be able to perform any operation on the cluster because access to the control plane is disconnected. When you inspect the control plane, logs display an error similar to the following:
```
error checking if node NODE_NAME is shutdown: unimplemented
```
- Potential causes
You accidentally deleted the Private Service Connect endpoint or forwarding rule. Both resources are named `gke-[cluster-name]-[cluster-hash:8]-[uuid:8]-pe` and permit the control plane and nodes to connect privately.
- Resolution
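To confirm whether the forwarding rule still exists, you can list forwarding rules whose names match the naming pattern above. This is a diagnostic sketch; the filter expression is an assumption based on that naming convention:
```
# Look for the Private Service Connect forwarding rule created by GKE.
# REGION is the region of your cluster.
gcloud compute forwarding-rules list \
    --regions=REGION \
    --filter="name~^gke-.*-pe$"
```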
Cluster overlaps with active peer
- Symptoms
Attempting to create a cluster without an external endpoint returns an error similar to the following:
```
Google Compute Engine: An IP range in the peer network overlaps with an IP range in an active peer of the local network.
```
- Potential causes
You chose an overlapping control plane CIDR.
- Resolution
Use one of the following solutions:
- Delete and recreate the cluster using a different control plane CIDR.
- Recreate the cluster in version 1.29 or later and include the `--enable-private-nodes` flag.
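For the second solution, a minimal sketch of the create command, assuming placeholder values for the cluster name and location:
```
# Recreate the cluster in version 1.29 or later with private nodes.
gcloud container clusters create CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --cluster-version=1.29 \
    --enable-ip-alias \
    --enable-private-nodes
```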
Can't reach control plane of a cluster with no external endpoint
Increase the likelihood that your cluster control plane is reachable by implementing any of the cluster endpoint access configurations. For more information, see access to cluster endpoints.
- Symptoms
After creating a cluster with no external endpoint, attempting to run `kubectl` commands against the cluster returns an error similar to one of the following:
```
Unable to connect to the server: dial tcp [IP_ADDRESS]: connect: connection timed out.
Unable to connect to the server: dial tcp [IP_ADDRESS]: i/o timeout.
```
- Potential causes
`kubectl` is unable to talk to the cluster control plane.
- Resolution
Use one of the following solutions:
Enable DNS access for a simplified way of securely accessing your cluster. For more information, see DNS-based endpoint.
Verify that credentials for the cluster have been generated for kubeconfig and that the correct context is activated. For more information about setting the cluster credentials, see generate kubeconfig entry.
Verify that accessing the control plane using its external IP address is permitted. Disabling external access to the cluster control plane isolates the cluster from the internet. With this configuration, only authorized internal network CIDR ranges or reserved networks have access to the control plane.
Verify the origin IP address is authorized to reach the control plane:
```
gcloud container clusters describe CLUSTER_NAME \
    --format="value(controlPlaneEndpointsConfig.ipEndpointsConfig.authorizedNetworksConfig)" \
    --location=COMPUTE_LOCATION
```
Replace the following:
- CLUSTER_NAME: the name of your cluster.
- COMPUTE_LOCATION: the Compute Engine location for the cluster.

If your origin IP address is not authorized, the output may return an empty result (only curly braces) or CIDR ranges that don't include your origin IP address:
```
cidrBlocks:
- cidrBlock: 10.XXX.X.XX/32
  displayName: jumphost
- cidrBlock: 35.XXX.XXX.XX/32
  displayName: cloud shell
enabled: true
```
Add authorized networks to access the control plane.
If you run the `kubectl` command from an on-premises environment or a region different from the cluster's location, ensure that control plane private endpoint global access is enabled. For more information, see Access using the control plane's internal IP address from any region. Describe the cluster to see the control access config response:
```
gcloud container clusters describe CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --flatten="controlPlaneEndpointsConfig.ipEndpointsConfig.globalAccess"
```
Replace the following:
- CLUSTER_NAME: the name of your cluster.
- COMPUTE_LOCATION: the Compute Engine location for the cluster.
A successful output is similar to the following:
```
enabled: true
```
If `null` is returned, enable access using the control plane's internal IP address from any region.
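If global access isn't enabled, you can turn it on with a command similar to the following sketch, assuming placeholder values:
```
# Enable access to the control plane's internal IP address from any region.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --enable-master-global-access
```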
Can't create cluster due to overlapping IPv4 CIDR block
- Symptoms
`gcloud container clusters create` returns an error similar to the following:
```
The given master_ipv4_cidr 10.128.0.0/28 overlaps with an existing network 10.128.0.0/20.
```
- Potential causes
You specified a control plane CIDR block that overlaps with an existing subnet in your VPC.
- Resolution
Specify a CIDR block for `--master-ipv4-cidr` that does not overlap with an existing subnet.
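To find a free range, it can help to list the CIDR ranges already used by subnets in your VPC. A minimal sketch, where NETWORK is your VPC network name:
```
# List existing subnet ranges to pick a non-overlapping --master-ipv4-cidr.
gcloud compute networks subnets list \
    --network=NETWORK \
    --format="table(name, region, ipCidrRange)"
```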
Can't create cluster due to services range already in use by another cluster
- Symptoms
Attempting to create a cluster returns an error similar to the following:
```
Services range [ALIAS_IP_RANGE] in network [VPC_NETWORK], subnetwork [SUBNET_NAME] is already used by another cluster.
```
- Potential causes
The following configurations might cause this error:
- You chose a services range that is still in use by another cluster, or the cluster was not deleted.
- There was a cluster using that services range that was deleted, but the secondary range metadata was not properly cleaned up. Secondary ranges for a GKE cluster are saved in the Compute Engine metadata and should be removed when the cluster is deleted. Even when a cluster is successfully deleted, the metadata might not be removed.
- Resolution
Follow these steps:
- Check if the services range is in use by an existing cluster. You can use the `gcloud container clusters list` command with the `filter` flag to search for the cluster (see the sketch after this list). If there is an existing cluster using the services range, you must delete that cluster or create a new services range.
- If the services range is not in use by an existing cluster, then manually remove the metadata entry that matches the services range you want to use.
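A sketch of the first step, assuming that the services secondary range name is exposed in the `ipAllocationPolicy` field of the cluster resource (adjust the filter for your range name):
```
# Search for clusters that use a specific services secondary range.
# SERVICES_RANGE_NAME is a placeholder for the range you want to reuse.
gcloud container clusters list \
    --filter="ipAllocationPolicy.servicesSecondaryRangeName=SERVICES_RANGE_NAME" \
    --format="table(name, location)"
```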
Can't create a subnet
- Symptoms
When you attempt to create a cluster with an automatic subnet, or to create a custom subnet, you might encounter any of the following errors:
```
An IP range in the peer network overlaps with an IP range in one of the active peers of the local network.
```
```
Error: Error waiting for creating GKE cluster: Invalid value for field PrivateClusterConfig.MasterIpv4CidrBlock: x.x.x.x/28 conflicts with an existing subnet in one of the peered VPCs.
```
- Potential causes
The control plane CIDR range you specified overlaps with another IP range in the cluster. This subnet creation error can also occur if you're attempting to reuse the `master-ipv4-cidr` CIDRs used in a recently deleted cluster.
- Resolution
Try using a different CIDR range.
Can't pull image from public Docker Hub
- Symptoms
A Pod running in your cluster displays a warning in `kubectl describe`:
```
Failed to pull image: rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
```
- Potential causes
Nodes with private IP addresses only need additional configuration to meet the internet access requirements. However, the nodes can access Google Cloud APIs and services, including Artifact Registry, if you have enabled Private Google Access and met its network requirements.
- Resolution
Use one of the following solutions:
Copy the images in your cluster from Docker Hub to Artifact Registry. See Migrating containers from a third-party registry for more information.
GKE automatically checks `mirror.gcr.io` for cached copies of frequently accessed Docker Hub images. If you must pull images from Docker Hub or another public repository, use Cloud NAT or an instance-based proxy that is the target for a static `0.0.0.0/0` route. For a Cloud NAT example, see the sketch that follows.
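A minimal Cloud NAT sketch for this scenario; the router and NAT names are hypothetical, and NETWORK and REGION are placeholders for your values:
```
# Create a Cloud Router and a Cloud NAT gateway so that nodes without
# external IP addresses can reach public registries such as Docker Hub.
gcloud compute routers create nat-router \
    --network=NETWORK \
    --region=REGION

gcloud compute routers nats create nat-config \
    --router=nat-router \
    --region=REGION \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
```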
API request that triggers admission webhook timing out
- Symptoms
An API request that triggers an admission webhook configured to use a service with a `targetPort` other than 443 times out, causing the request to fail:
```
Error from server (Timeout): request did not complete within requested timeout 30s
```
- Potential causes
By default, the firewall does not allow TCP connections to nodes except on ports 443 (HTTPS) and 10250 (kubelet). An admission webhook attempting to communicate with a Pod on a port other than 443 fails if there is no custom firewall rule that permits the traffic.
- Resolution
Add a firewall rule for your specific use case, as shown in the sketch that follows.
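For example, the following sketch allows the control plane to reach webhook Pods on a non-default port. WEBHOOK_PORT, CONTROL_PLANE_CIDR, NETWORK, and NODE_TAG are placeholders for your values:
```
# Allow ingress from the control plane to nodes on the webhook's target port.
gcloud compute firewall-rules create allow-webhook-port \
    --network=NETWORK \
    --direction=INGRESS \
    --source-ranges=CONTROL_PLANE_CIDR \
    --allow=tcp:WEBHOOK_PORT \
    --target-tags=NODE_TAG
```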
Can't create cluster due to health check failing
- Symptoms
After creating a Standard cluster with private node pools, it gets stuck at the health check step and reports an error similar to one of the following:
```
All cluster resources were brought up, but only 0 of 2 have registered.
```
```
All cluster resources were brought up, but: 3 nodes out of 4 are unhealthy
```
- Potential causes
The following configurations might cause this error:
- Cluster nodes can't download required binaries from the Cloud Storage API (`storage.googleapis.com`).
- Firewall rules restrict egress traffic.
- Shared VPC IAM permissions are incorrect.
- Private Google Access requires you to configure DNS for `*.gcr.io`.
- Resolution
Use one of the following solutions:
- Enable Private Google Access on the subnet for node network access to `storage.googleapis.com`, or enable Cloud NAT to allow nodes to communicate with `storage.googleapis.com` endpoints (see the example after this list).
- For node read access to `storage.googleapis.com`, confirm that the service account assigned to the cluster nodes has storage read access.
- Ensure that you have either a Google Cloud firewall rule to allow all egress traffic, or configure a firewall rule to allow egress traffic from nodes to the cluster control plane and `*.googleapis.com`.
- Create the DNS configuration for `*.gcr.io`.
- If you have a non-default firewall or route setup, configure Private Google Access.
- If you use VPC Service Controls, set up Container Registry or Artifact Registry for GKE clusters.
- Ensure you have not deleted or modified the automatically created firewall rules for Ingress.
- If using Shared VPC, ensure you have configured the required IAM permissions.
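For example, to enable Private Google Access on the node subnet (the first solution in this list), you can run a command similar to the following sketch, with placeholder subnet and region values:
```
# Enable Private Google Access so nodes with internal IP addresses can reach
# storage.googleapis.com and other Google APIs.
gcloud compute networks subnets update SUBNET_NAME \
    --region=REGION \
    --enable-private-ip-google-access
```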
kubelet Failed to create pod sandbox
- Symptoms
After creating a cluster with private nodes, it reports an error similar to one of the following:
```
Warning  FailedCreatePodSandBox  12s (x9 over 4m)  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = Error response from daemon: Get https://registry.k8s.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
```
```
NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
```
- Potential causes
The `calico-node` or `netd` Pod can't reach `*.gcr.io`.
- Resolution
Ensure you have completed the required setup for Container Registry or Artifact Registry.
Private nodes created but not joining the cluster
For clusters using nodes with private IP addresses only, often when using custom routing and third-party network appliances on the VPC, the default route (0.0.0.0/0) is redirected to the appliance instead of the default internet gateway. In addition to the control plane connectivity, you need to ensure that the following destinations are reachable:
- *.googleapis.com
- *.gcr.io
- gcr.io
Configure Private Google Access for all three domains, as shown in the sketch after this paragraph. This best practice allows the new nodes to start up and join the cluster while keeping internet-bound traffic restricted.
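A sketch of a Cloud DNS private zone for the googleapis.com domain, using the documented private.googleapis.com VIP range (199.36.153.8/30). The zone name is hypothetical, NETWORK is a placeholder, and you would create an analogous zone for gcr.io:
```
# Create a private zone that resolves *.googleapis.com to the
# private.googleapis.com VIPs, reachable through Private Google Access.
gcloud dns managed-zones create googleapis-private-zone \
    --visibility=private \
    --networks=NETWORK \
    --dns-name=googleapis.com \
    --description="Private Google Access zone"

gcloud dns record-sets create "private.googleapis.com." \
    --zone=googleapis-private-zone \
    --type=A \
    --ttl=300 \
    --rrdatas="199.36.153.8,199.36.153.9,199.36.153.10,199.36.153.11"

gcloud dns record-sets create "*.googleapis.com." \
    --zone=googleapis-private-zone \
    --type=CNAME \
    --ttl=300 \
    --rrdatas="private.googleapis.com."
```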
Workloads on GKE clusters unable to access internet
Pods running on nodes with private IP addresses can't access the internet. For example, after running the `apt update` command from the Pod exec shell, it reports an error similar to the following:
```
0% [Connecting to deb.debian.org (199.232.98.132)] [Connecting to security.debian.org (151.101.130.132)]
```
If the subnet secondary IP address range used for Pods in the cluster is not configured on the Cloud NAT gateway, the Pods can't connect to the internet because they don't have an external IP address configured for the Cloud NAT gateway.
Ensure that you configure the Cloud NAT gateway to apply at least the following subnet IP address ranges for the subnet that your cluster uses:
- Subnet primary IP address range (used by nodes)
- Subnet secondary IP address range used for Pods in the cluster
- Subnet secondary IP address range used for Services in the cluster
To learn more, see how to add a secondary subnet IP range used for Pods. For an example, see the sketch that follows.
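A sketch that updates an existing NAT gateway to cover the node and Pod ranges explicitly; NAT_NAME, ROUTER_NAME, SUBNET_NAME, and the secondary range names are placeholders:
```
# Apply the subnet's primary range (nodes) and secondary ranges (Pods and
# Services) to the NAT configuration.
gcloud compute routers nats update NAT_NAME \
    --router=ROUTER_NAME \
    --region=REGION \
    --nat-custom-subnet-ip-ranges=SUBNET_NAME,SUBNET_NAME:PODS_RANGE_NAME,SUBNET_NAME:SERVICES_RANGE_NAME
```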
Direct IP access can't be disabled for public clusters
- Symptoms
After disabling the IP address endpoint, you see an error message similar to the following:
```
Direct IP access can't be disabled for public clusters
```
- Potential causes
Your cluster uses a legacy network.
- Resolution
Migrate your clusters to Private Service Connect. For more information about the status of the migration, contact support.
Direct IP access can't be disabled for clusters in the middle of PSC migration
- Symptoms
After disabling the IP address endpoint, you see an error message similar to the following:
```
Direct IP access can't be disabled for public clusters
```
- Potential causes
Your cluster is in the middle of migrating to Private Service Connect.
- Resolution
Use one of the following solutions:
- Manually recreate all node pools in a different version.
- Wait until GKE automatically upgrades the node pools during a maintenance event.
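A sketch of the first solution, which recreates a node pool's nodes by upgrading the pool to a different version (all values are placeholders):
```
# Upgrading a node pool recreates its nodes on the specified version.
gcloud container clusters upgrade CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --node-pool=NODE_POOL_NAME \
    --cluster-version=VERSION
```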
Control plane internal endpoint can't be enabled
- Symptoms
When attempting to enable the internal endpoint of your cluster's control plane, you see error messages similar to the following:
```
private_endpoint_enforcement_enabled can't be enabled when envoy is disabled
private_endpoint_enforcement_enabled is unsupported. Please upgrade to the minimum support version
```
- Potential causes
Your cluster needs an IP address rotation or a version update.
- Resolution
Use one of the following solutions:
- Rotate your control plane IP address to enable Envoy.
- Upgrade your cluster to version 1.28.10-gke.1058000 or later.
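Hedged sketches of both solutions, assuming placeholder cluster names and locations:
```
# Option 1: rotate the control plane IP address.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --start-ip-rotation
# After all nodes have been recreated, finish the rotation:
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --complete-ip-rotation

# Option 2: upgrade the control plane to a supported version.
gcloud container clusters upgrade CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --master \
    --cluster-version=1.28.10-gke.1058000
```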
Cluster creation fails when organization policies are defined
- Symptoms
When attempting to create a cluster, you see an error message similar to the following:
```
compute.disablePrivateServiceConnectCreationForConsumers violated for projects
```
- Potential causes
The cluster endpoint or backend is blocked by a consumer organization policy.
- Resolution
Allow instances to create endpoints with the `compute.restrictPrivateServiceConnectProducer` constraint by completing the steps in Consumer-side organization policies.
The Private Service Connect endpoint might leak during cluster deletion
- Symptoms
After creating a cluster, you might see one of the following symptoms:
You can't see a connected endpoint under Private Service Connect in your Private Service Connect-based cluster.

You can't delete the subnet or VPC network allocated for the internal endpoint in a cluster that uses Private Service Connect. An error message similar to the following appears:
```
projects/<PROJECT_ID>/regions/<REGION>/subnetworks/<SUBNET_NAME> is already being used by projects/<PROJECT_ID>/regions/<REGION>/addresses/gk3-<ID>
```
- Potential causes
On GKE clusters that use Private Service Connect, GKE deploys a Private Service Connect endpoint by using a forwarding rule that allocates an internal IP address to access the cluster's control plane in a control plane's network. To protect the communication between the control plane and the nodes by using Private Service Connect, GKE keeps the endpoint invisible, and you can't see it in the Google Cloud console or gcloud CLI.
- Resolution
To prevent leaking the Private Service Connect endpoint before cluster deletion, complete the following steps:
- Assign the Kubernetes Engine Service Agent role to the GKE service account (see the sketch after this list).
- Ensure that the `compute.forwardingRules.*` and `compute.addresses.*` permissions are not explicitly denied for the GKE service account.

If you see the Private Service Connect endpoint leaked, contact support.
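A sketch of the first step, granting the Kubernetes Engine Service Agent role to the GKE service agent; PROJECT_ID and PROJECT_NUMBER are placeholders for your project:
```
# Grant the Kubernetes Engine Service Agent role to the GKE service agent.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
    --role="roles/container.serviceAgent"
```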
Unable to parse the cluster's authorized network
- Symptoms
You can't create a cluster in version 1.29 or later. An error message similar to the following appears:
```
Unable to parse cluster.master_ipv4_cidr "" into a valid IP address and mask
```
- Potential causes
Your Google Cloud project uses private IP-based webhooks. Therefore, you are unable to create a cluster with Private Service Connect. Instead, your cluster uses VPC Network Peering, which parses the `master-ipv4-cidr` flag.
- Resolution
Use one of the following solutions:
Continue to create your VPC Network Peering cluster and include the `master-ipv4-cidr` flag to define valid CIDRs. This solution has the following limitations:
- The `master-ipv4-cidr` flag has been deprecated in the Google Cloud console. To update this flag, you can only use the Google Cloud CLI or Terraform.
- VPC Network Peering is deprecated in GKE version 1.29 or later.
Migrate your private IP-based webhooks by completing the steps in Private Service Connect limitations. Then, contact support to opt in to use clusters with Private Service Connect.
Unable to define internal IP address range in clusters with public nodes
- Symptoms
You can't define an internal IP address range by using the `--master-ipv4-cidr` flag. An error message similar to the following appears:
```
ERROR: (gcloud.container.clusters.create) Cannot specify --master-ipv4-cidr without --enable-private-nodes
```
- Potential causes
You are defining an internal IP address range for the control plane with the `master-ipv4-cidr` flag in a cluster without the `enable-private-nodes` flag enabled. To create a cluster with `master-ipv4-cidr` defined, you must configure your cluster to provision nodes with internal IP addresses (private nodes) by using the `enable-private-nodes` flag.
- Resolution
Use one of the following solutions:
Create a cluster with the following command:
```
gcloud container clusters create-auto CLUSTER_NAME \
    --enable-private-nodes \
    --master-ipv4-cidr CP_IP_RANGE
```
Replace the following:
- CLUSTER_NAME: the name of your cluster.
- CP_IP_RANGE: the internal IP address range for the control plane.

Update your cluster to provision nodes with internal IP addresses only. To learn more, see Configure your cluster.
Unable to schedule public workloads on Autopilot clusters
- Symptoms
- On Autopilot clusters, if your cluster uses private nodes only, you can't schedule workloads in public Pods using the `cloud.google.com/private-node=false` nodeSelector.
- Potential causes
- The configuration of the `private-node` flag set as `false` in the Pod's nodeSelector is only available in clusters in version 1.30.3 or later.
- Resolution
- Upgrade your cluster to version 1.30.3 or later.
Access to the DNS-based endpoint is disabled
- Symptoms
Attempting to run kubectl commands against the cluster returns an error similar to the following:
```
couldn't get current server API group list: control_plane_endpoints_config.dns_endpoint_config.allow_external_traffic is disabled
```
- Potential causes
DNS-based access has been disabled on your cluster.
- Resolution
Enable access to the control plane by using the DNS-based endpoint of the control plane. To learn more, see Modify the control plane access.
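A minimal sketch that enables the DNS-based endpoint, assuming the `--enable-dns-access` flag is available in your gcloud CLI version:
```
# Enable access to the control plane through its DNS-based endpoint.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --enable-dns-access
```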
Nodes fail to allocate IP address during scaling
- Symptoms
Attempting to add the subnet's primary IP address range to the list of authorized networks returns an error similar to the following:
```
authorized networks fields cannot be mutated if direct IP access is disabled
```
- Potential causes
You have disabled the cluster's IP-based endpoint.
- Resolution
Disable and re-enable the cluster's IP-based endpoint by using the `enable-ip-access` flag, as shown in the sketch that follows.
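A sketch of the disable-and-re-enable sequence, assuming placeholder cluster name and location:
```
# Disable, then re-enable, the cluster's IP-based endpoint.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --no-enable-ip-access

gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --enable-ip-access
```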
Too many CIDR blocks
gcloud returns the following error when attempting to create or update a cluster with more than 50 CIDR blocks:
```
ERROR: (gcloud.container.clusters.update) argument --master-authorized-networks: too many args
```
To resolve this issue, try the following:
- If your cluster doesn't use Private Service Connect or VPC Network Peering, ensure that you specify no more than 50 CIDR blocks.
- If your cluster uses Private Service Connect or VPC Network Peering, specify no more than 100 CIDR blocks.
Unable to connect to the server
kubectl commands time out due to incorrectly configured CIDR blocks:
```
Unable to connect to the server: dial tcp MASTER_IP: getsockopt: connection timed out
```
When you create or update a cluster, ensure that you specify the correct CIDR blocks.
Nodes can access public container images despite network isolation
- Symptoms
You might observe that in a GKE cluster configured for network isolation, pulling a common public image like `redis` works, but pulling a less common or private image fails. This behavior is expected due to GKE's default configuration and doesn't indicate that GKE has bypassed your network isolation.
- Potential causes
This behavior occurs because of two features working together:
- Private Google Access: this feature lets nodes with internal IP addresses connect to Google Cloud APIs and services without needing public IP addresses. Private Google Access is activated on the cluster's subnet within the VPC that's used by the nodes in the cluster. When a cluster or node pool is created or updated with the `--enable-private-nodes` flag, GKE automatically enables Private Google Access on this subnet. The only exception is if you use a Shared VPC, where you must manually enable Private Google Access.
- Google's image mirror (`mirror.gcr.io`): by default, GKE configures its nodes to first try pulling images from `mirror.gcr.io`, a Google-managed Artifact Registry that caches frequently requested public container images.

When you try to pull an image like `redis`, your node uses the private path from Private Google Access to connect to `mirror.gcr.io`. Because `redis` is a very common image, it exists in the cache, and the pull succeeds. However, if you request an image that isn't in this public cache, the pull fails because your isolated node has no other way to reach its original source.
- Resolution
If an image that you need isn't available in the `mirror.gcr.io` cache, host it in your own private Artifact Registry repository. Your network-isolated nodes can access this repository using Private Google Access.
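A sketch of copying a public image into a private Artifact Registry repository; REGION, PROJECT_ID, and REPO_NAME are placeholders, and `redis:7` is only an example tag:
```
# Copy a public image into your own Artifact Registry repository so that
# network-isolated nodes can pull it through Private Google Access.
docker pull redis:7
docker tag redis:7 REGION-docker.pkg.dev/PROJECT_ID/REPO_NAME/redis:7
docker push REGION-docker.pkg.dev/PROJECT_ID/REPO_NAME/redis:7
```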
What's next
If you can't find a solution to your problem in the documentation, see Get support for further help, including advice on the following topics:
- Opening a support case by contacting Cloud Customer Care.
- Getting support from the community by asking questions on StackOverflow and using the `google-kubernetes-engine` tag to search for similar issues. You can also join the `#kubernetes-engine` Slack channel for more community support.
- Opening bugs or feature requests by using the public issue tracker.