This document lists troubleshooting documents for common issues that you might encounter when using Google Kubernetes Engine (GKE). Whether you're diagnosing workload errors like ImagePullBackOff and CrashLoopBackOff, debugging cluster autoscaling behavior, resolving PersistentVolume issues, or troubleshooting node registration problems, the documents listed here can help.
This document is for Admins and architects, Security specialists, Networking specialists, or Storage specialists who troubleshoot GKE configurations. To learn more about GKE roles, see Common GKE user roles and tasks.
Troubleshoot the kubectl command-line tool in GKE, including issues with authentication and authorization. This page also includes advice on how to troubleshoot the Konnectivity proxy to check if it's causing the kubectl logs, attach, exec, or port-forward commands to stop responding.
Troubleshoot GKE Standard node pools, including issues with node pool creation, best-effort provisioning, corrupted instance metadata, and migrating workloads to new node pools.
Learn how to diagnose and resolve the node NotReady status in GKE by troubleshooting common causes such as resource shortages, network issues, and component failures.
Troubleshoot issues that occur when adding nodes to your GKE Standard cluster, such as node registration failures and missing prerequisites for successful node registration.
Diagnose and resolve common reasons your cluster isn't removing underutilized nodes. Learn how to check for issues like restrictive PodDisruptionBudgets, Pods with local storage, or specific annotations (for example, "cluster-autoscaler.kubernetes.io/safe-to-evict": "false") that prevent node eviction.
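As a rough illustration of two of the checks mentioned above, the following Python sketch inspects a single Pod manifest (as a dictionary, for example parsed from the JSON output of kubectl get pod) for the safe-to-evict annotation and for local storage. The function name scale_down_blockers is hypothetical. Treating emptyDir and hostPath volumes as local storage, and letting a "true" annotation override the local storage check, are assumptions; the autoscaler's actual rules depend on its version and configuration. PodDisruptionBudgets are not covered here.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def scale_down_blockers(pod: dict) -> list:
    """Return human-readable reasons this Pod might block node scale-down.

    Illustrative only: this does not reproduce the cluster autoscaler's logic.
    """
    reasons = []
    annotations = pod.get("metadata", {}).get("annotations") or {}
    safe_to_evict = annotations.get(SAFE_TO_EVICT)

    if safe_to_evict == "false":
        reasons.append("safe-to-evict annotation is set to false")

    # Assumption: an explicit "true" annotation permits eviction even when
    # the Pod uses local storage, so skip the volume check in that case.
    if safe_to_evict != "true":
        for volume in pod.get("spec", {}).get("volumes") or []:
            # Assumption: emptyDir and hostPath count as local storage.
            if "emptyDir" in volume or "hostPath" in volume:
                reasons.append(
                    "volume %r uses local storage" % volume.get("name")
                )
    return reasons
```

An empty list means that neither of these two conditions was found; it does not guarantee that the node can be removed.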
Learn why the cluster autoscaler isn't adding new nodes to meet demand. Check for unschedulable Pods, verify that you haven't hit cluster or node pool size limits, and identify potential resource quota or regional VM availability issues.
Troubleshoot problems with the Horizontal Pod Autoscaler not scaling your application's Pod replicas. Resolve common issues, such as misconfigured HorizontalPodAutoscaler objects or problems with the metrics pipeline.
If your cluster's root Certificate Authority (CA) is expiring soon, learn how to perform a credential rotation to prevent normal cluster operations from being interrupted.
Troubleshoot image pulls. Learn what causes statuses like ImagePullBackOff and ErrImagePull and how to resolve these statuses by fixing common issues like authentication and network connectivity.
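To spot these statuses across many Pods, one option is to scan the JSON output of kubectl get pods. The following Python sketch assumes that output has already been parsed into a dictionary and looks for containers whose waiting reason is ImagePullBackOff or ErrImagePull. The function name find_image_pull_failures is hypothetical, and the sketch only reports the status; it doesn't diagnose the cause.

```python
IMAGE_PULL_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def find_image_pull_failures(pod_list: dict) -> list:
    """Return (pod name, container name, reason) for image pull failures.

    Expects the structure of a Kubernetes Pod list, where each item carries
    status.containerStatuses[].state.waiting.reason.
    """
    failures = []
    for pod in pod_list.get("items") or []:
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for container in statuses:
            waiting = (container.get("state") or {}).get("waiting") or {}
            reason = waiting.get("reason")
            if reason in IMAGE_PULL_REASONS:
                failures.append((pod_name, container.get("name"), reason))
    return failures
```

For example, you could save the output of kubectl get pods -o json to a file, load it with json.load, and pass the result to this function. Init containers report their status under a separate field and are not checked in this sketch.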
Troubleshoot Kubernetes Out of Memory (OOM) events. Identify causes, distinguish event types, and apply effective solutions for both container- and node-level OOM kills.
Troubleshoot and resolve GKE cluster and node upgrade issues, including long or incomplete upgrades, unexpected auto-upgrades, failures, and post-upgrade problems.
Troubleshoot some of the 400, 401, 403, and 404 errors that you might encounter when using GKE. This page also includes information on how to troubleshoot "missing edit permissions on account" errors.
Last updated 2026-02-18 UTC.