Troubleshoot cluster upgrades

If your Google Kubernetes Engine (GKE) control plane or node pool upgrade fails, gets stuck, or causes unexpected workload behavior, you might need to troubleshoot the process. Keeping your control plane and node pools up-to-date is essential for security and performance, and resolving any issues helps ensure that your environment remains stable.

To resolve common upgrade issues, a good first step is to monitor the cluster upgrade process. You can then find advice on resolving your issue in the sections that follow.

This information is important for Platform admins and operators who want to diagnose the root causes of stuck or failed upgrades, manage maintenance policies, and resolve version incompatibilities. Application developers can find guidance on resolving post-upgrade workload issues and understand how workload configurations, such as PodDisruptionBudgets, can affect upgrade duration. For more information about the common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Monitor the cluster upgrade process

To resolve upgrade issues more effectively, start by understanding what happened during the upgrade process. GKE provides several tools that give you visibility into this process.

In the Google Cloud console, the upgrade dashboard offers a project-wide view of all ongoing cluster upgrades, a timeline of recent events, and warnings about potential blockers like active maintenance exclusions or upcoming version deprecations. For command-line or automated checks, you can use the gcloud container operations list command to get the status of specific upgrade operations. For more information, see Get visibility into cluster upgrades.

For a more detailed investigation, Cloud Logging is your primary source of information. GKE records detailed information about control plane and node pool upgrade processes within Cloud Logging. This includes high-level audit logs that track the main upgrade operations, as well as more granular logs such as Kubernetes Events and logs from node components, which can show you more information about specific errors.

The following sections explain how to query these logs by using either Logs Explorer or the gcloud CLI. For more information, see Check upgrade logs.

Identify the upgrade operation with audit logs

If you don't know which upgrade operation failed, you can use GKE audit logs. Audit logs track administrative actions and provide an authoritative record of when an upgrade was initiated and its final status. Use the following queries in the Logs Explorer to find the relevant operation.

Control plane auto-upgrade:

resource.type="gke_cluster"
protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"
log_id("cloudaudit.googleapis.com/activity")
protoPayload.metadata.operationType="UPGRADE_MASTER"
resource.labels.cluster_name="CLUSTER_NAME"

Replace CLUSTER_NAME with the name of the cluster that you want to investigate. This query shows the target control plane version and the previous control plane version.

Control plane manual upgrade:

resource.type="gke_cluster"
log_id("cloudaudit.googleapis.com/activity")
protoPayload.response.operationType="UPGRADE_MASTER"
resource.labels.cluster_name="CLUSTER_NAME"

Node pool auto-upgrade (target version only):

resource.type="gke_nodepool"
protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"
log_id("cloudaudit.googleapis.com/activity")
protoPayload.metadata.operationType="UPGRADE_NODES"
resource.labels.cluster_name="CLUSTER_NAME"
resource.labels.nodepool_name="NODEPOOL_NAME"

Replace NODEPOOL_NAME with the name of the node pool that belongs to the cluster.

Node pool manual upgrade:

resource.type="gke_nodepool"
protoPayload.methodName="google.container.v1.ClusterManager.UpdateNodePool"
log_id("cloudaudit.googleapis.com/activity")
protoPayload.response.operationType="UPGRADE_NODES"
resource.labels.cluster_name="CLUSTER_NAME"
resource.labels.nodepool_name="NODEPOOL_NAME"

To find the previous node pool version, check the Kubernetes API logs:

resource.type="k8s_cluster"
resource.labels.cluster_name="CLUSTER_NAME"
protoPayload.methodName="nodes.patch"

Find detailed error messages in GKE logs

After the audit log shows you which operation failed and when, you can search for more detailed error messages from GKE components around the same time. These logs can contain the specific reasons for an upgrade failure, such as a misconfigured PodDisruptionBudget object.

For example, after finding a failed UPGRADE_NODES operation in the audit logs, you can use its timestamp to narrow your search. In Logs Explorer, enter the following query and then use the time-range selector to focus on the time when the failure occurred:

resource.type="k8s_node"resource.labels.cluster_name="CLUSTER_NAME"resource.labels.node_name="NODE_NAME"severity=ERROR

Replace the following:

  • CLUSTER_NAME: the name of your cluster.
  • NODE_NAME: the name of the node within the cluster that you want to check for errors.
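
If you prefer the gcloud CLI, you can run the same query and restrict it to the failure window with a timestamp filter. The following command is a minimal sketch; the timestamps are example values that you should replace with the window around the failed operation, and PROJECT_ID is a placeholder:

# Read node error logs for a one-hour window around the failed operation.
gcloud logging read \
    'resource.type="k8s_node" AND resource.labels.cluster_name="CLUSTER_NAME" AND resource.labels.node_name="NODE_NAME" AND severity=ERROR AND timestamp>="2025-05-30T07:00:00Z" AND timestamp<="2025-05-30T08:00:00Z"' \
    --project=PROJECT_ID \
    --limit=20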

Use the gcloud CLI to view upgrade events

In addition to Logs Explorer, you can use gcloud CLI commands to review upgrade events.

To look for control plane upgrades, run the following command:

gcloud container operations list --filter="TYPE=UPGRADE_MASTER"

The output is similar to the following:

NAME: operation-1748588803271-cfd407a2-bfe7-4b9d-8686-9f1ff33a2a96
TYPE: UPGRADE_MASTER
LOCATION: LOCATION
TARGET: CLUSTER_NAME
STATUS_MESSAGE:
STATUS: DONE
START_TIME: 2025-05-30T07:06:43.271089972Z
END_TIME: 2025-05-30T07:18:02.639579287Z

This output includes the following values:

  • LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster.
  • CLUSTER_NAME: the name of your cluster.

To look for node pool upgrades, run the following command:

gcloud container operations list --filter="TYPE=UPGRADE_NODES"

The output is similar to the following:

NAME: operation-1748588803271-cfd407a2-bfe7-4b9d-8686-9f1ff33a2a96
TYPE: UPGRADE_NODES
LOCATION: LOCATION
TARGET: CLUSTER_NAME
STATUS_MESSAGE:
STATUS: DONE
START_TIME: 2025-05-30T07:06:43.271089972Z
END_TIME: 2025-05-30T07:18:02.639579287Z
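
After you identify an operation in either list, you can view its full details, including any error message, by describing it. The following command is a minimal sketch; OPERATION_NAME is a placeholder for the NAME value from the previous output:

# Show the full details and status message of a single upgrade operation.
gcloud container operations describe OPERATION_NAME \
    --location LOCATION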

Example: Use logs to troubleshoot control plane upgrades

The following example shows you how to use logs to troubleshoot an unsuccessful control plane upgrade:

  1. In the Google Cloud console, go to the Logs Explorer page.

    Go to Logs Explorer

  2. In the query pane, filter for control plane upgrade logs by entering the following query:

    resource.type="gke_cluster"protoPayload.metadata.operationType=~"(UPDATE_CLUSTER|UPGRADE_MASTER)"resource.labels.cluster_name="CLUSTER_NAME"

    Replace CLUSTER_NAME with the name of the cluster that you want to investigate.

  3. Click Run query.

  4. Review the log output for the following information:

    • Confirm that the upgrade started: look for recent UPGRADE_MASTER events around the time that you initiated the upgrade. The presence of these events confirms that either you or GKE triggered the upgrade process.

    • Verify the versions: check the following fields to confirm the previous and target versions:

      • protoPayload.metadata.previousMasterVersion: shows the control plane version before the upgrade.
      • protoPayload.metadata.currentMasterVersion: shows the version to which GKE attempted to upgrade the control plane.

        For example, if you intended to upgrade to version 1.30.1-gke.1234 but accidentally specified 1.30.2-gke.4321 (a newer, potentially incompatible version for your workloads), reviewing these two fields would highlight this discrepancy. Alternatively, if the currentMasterVersion field still displays the earlier version after an extended period, this finding indicates that the upgrade failed to apply the new version.

    • Look for errors: check for repeated UPGRADE_MASTER events or other error messages. If the operation log stops without indicating completion or failure, this finding indicates a problem.

After you identify a specific error or behavior from the logs, you can use that information to find the appropriate solution in this guide.
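
If you prefer the gcloud CLI for this check, the following command is a minimal sketch that extracts only the version fields from the most recent control plane upgrade entries; the --limit value and the format expression are examples, and PROJECT_ID is a placeholder:

# Print the previous and target control plane versions from recent audit entries.
gcloud logging read \
    'resource.type="gke_cluster" AND protoPayload.metadata.operationType="UPGRADE_MASTER" AND resource.labels.cluster_name="CLUSTER_NAME"' \
    --project=PROJECT_ID \
    --limit=5 \
    --format='value(protoPayload.metadata.previousMasterVersion, protoPayload.metadata.currentMasterVersion)'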

Troubleshoot node pool upgrades taking longer than usual

If your node pool upgrade is taking longer than expected, try the following solutions:

  1. Check the value of terminationGracePeriodSeconds in the manifest of your Pods. This value defines the maximum time that Kubernetes waits for a Pod to shut down gracefully. A high value (for example, a few minutes) can significantly extend upgrade durations because Kubernetes waits for the full period for each Pod. If this delay is causing issues, consider reducing the value. To find Pods with long grace periods, see the example command after this list.
  2. Check your PodDisruptionBudget objects. When a node is being drained, GKE waits for at most one hour per node to gracefully evict its workloads. If your PodDisruptionBudget object is too restrictive, it can prevent a graceful eviction from ever succeeding. In this scenario, GKE uses the entire one-hour grace period to try and drain the node before it finally times out and forces the upgrade to proceed. This delay, when repeated across multiple nodes, is a common cause of a slow overall cluster upgrade. To confirm if a restrictive PodDisruptionBudget object is the cause of your slow upgrades, use Logs Explorer:

    1. In the Google Cloud console, go to the Logs Explorer page.

      Go to Logs Explorer

    2. In the query pane, enter the following query:

      resource.type=("gke_cluster"OR"k8s_cluster")resource.labels.cluster_name="CLUSTER_NAME"protoPayload.response.message="Cannot evict pod as it would violate the pod's disruption budget."log_id("cloudaudit.googleapis.com/activity")
    3. Click Run query.

    4. Review the log output. If the PodDisruptionBudget object is the cause of your issue, the output is similar to the following:

      resourceName: "core/v1/namespaces/istio-system/pods/POD_NAME/eviction"response: {  @type: "core.k8s.io/v1.Status"  apiVersion: "v1"  code: 429  details: {  causes: [    0: {    message: "The disruption budget istio-egressgateway needs 1 healthy pods and has 1 currently"    reason: "DisruptionBudget"    }  ]  }  kind: "Status"  message: "Cannot evict pod as it would violate the pod's disruption budget."  metadata: {  }  reason: "TooManyRequests"  status: "Failure"}
    5. After you've confirmed that a PodDisruptionBudget object is the cause, list all PodDisruptionBudget objects and make sure that the settings are appropriate:

      kubectl get pdb --all-namespaces

      The output is similar to the following:

      NAMESPACE        NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
      example-app-one  one_pdb       3               0                 1                     12d

      In this example, the PodDisruptionBudget named one_pdb requires a minimum of three available Pods. Because evicting a Pod during the upgrade would leave only two Pods available, the action violates the budget and causes the upgrade to stall.

      If your PodDisruptionBudget object is working the way you want, you don't need to take any action. If it's not, consider relaxing the PodDisruptionBudget settings during the upgrade window.

  3. Check your node affinities. Restrictive rules can slow down upgrades by preventing Pods from being rescheduled onto available nodes if those nodes don't match the required labels. This issue is especially problematic during surge upgrades because node affinities can limit how many nodes can be upgraded simultaneously if nodes with the correct labels don't have enough cluster capacity to host the new Pods.

  4. Check if you use the short-lived upgrade strategy. GKE uses the short-lived upgrade strategy for flex-start nodes and for nodes that use only queued provisioning on clusters running GKE version 1.32.2-gke.1652000 or later. If you use this upgrade strategy, the upgrade operation can take up to seven days.

  5. Check if you use extended duration Pods (available for Autopilot clusters). During an upgrade, GKE must drain all Pods from a node before the process can complete. However, during a GKE-initiated upgrade, GKE doesn't evict extended duration Pods for up to seven days. This protection prevents the node from draining. GKE forcibly terminates the Pod only after this period ends, and this significant, multi-day delay for a single node can delay more node upgrades in the Autopilot cluster.

  6. Check whether your nodes have attached Persistent Volumes. These volumes can cause an upgrade process to take longer than usual because of the time it takes to manage their lifecycle.

  7. Check the cluster auto-upgrade status. If the reason is SYSTEM_CONFIG, automatic upgrades are temporarily paused for technical or business reasons. If you see this reason, we recommend not performing a manual upgrade unless it's required.
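
To follow up on the terminationGracePeriodSeconds check in step 1, the following command is a minimal sketch that lists the grace period for every Pod so that you can spot large values. The 120-second threshold is an arbitrary example, not a GKE recommendation:

# List every Pod's termination grace period and keep only values above 120 seconds.
kubectl get pods --all-namespaces \
    -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,GRACE_SECONDS:.spec.terminationGracePeriodSeconds' \
    | awk 'NR==1 || $3 > 120'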

Troubleshoot incomplete node pool upgrades

Occasionally, GKE can't complete a node pool upgrade, leaving the node pool partially upgraded. Incomplete upgrades can happen for several reasons:

  • The upgrade was manually cancelled.
  • The upgrade failed due to an issue such as new nodes failing to register, IP address exhaustion, or insufficient resource quotas.
  • GKE paused the upgrade. This pause can occur, for example, to prevent an upgrade to a version with known issues or during certain Google-initiated maintenance periods.
  • If you use auto-upgrades, a maintenance window ended before the upgrade could complete. Alternatively, a maintenance exclusion period started before the upgrade could complete. For more information, see Maintenance window preventing node update completion.

When a node pool is partially upgraded, the nodes run on different versions. To resolve this issue and verify that all nodes in the node pool run on the same version, either resume the upgrade or roll back the upgrade.
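
The following commands are a minimal sketch of both options with the gcloud CLI. Re-running the upgrade command with the target version resumes an incomplete upgrade, and the rollback command reverts nodes that were already upgraded; TARGET_VERSION and the other values are placeholders:

# Resume the node pool upgrade by requesting the target version again.
gcloud container clusters upgrade CLUSTER_NAME \
    --location LOCATION \
    --node-pool NODE_POOL_NAME \
    --cluster-version TARGET_VERSION

# Or roll back the nodes that were already upgraded.
gcloud container node-pools rollback NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location LOCATION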

How node upgrade strategies work with maintenance windows

The surge upgrades strategy and the blue-green upgrades strategy interact with maintenance windows differently:

  • Surge upgrades: the upgrade operation is paused if it runs beyond the maintenance window. The upgrade is automatically resumed during the next scheduled maintenance window.
  • Blue-green upgrades: the upgrade operation continues until completion, even if it exceeds the maintenance window. Blue-green upgrades offer granular control over the upgrade pace with features like batch and node pool soak times, and the additional node pool helps ensure workloads remain operational.

For more information about how specific operations work with maintenance policies, see the Respects maintenance policies columns in the tables in Types of changes to a GKE cluster.

Troubleshoot unexpected auto-upgrade behavior

Sometimes, cluster auto-upgrades don't happen the way that you might expect. The following sections help you to resolve these issues.

Clusters fail to upgrade when node auto-upgrade is enabled

If you haven't disabled node auto-upgrade, but an upgrade doesn't occur, try the following solutions:

  1. If you use a release channel, verify that node auto-upgrades aren't blocked. For clusters enrolled in a release channel, your maintenancePolicy is the primary way to control automated upgrades. It can prevent an upgrade from starting or interrupt one that is already in progress. An active maintenance exclusion can block an upgrade completely, and the timing of a maintenance window can cause an interruption. Review your maintenancePolicy to determine if either of these settings is the cause:

    gcloud container clusters describe CLUSTER_NAME \
        --project PROJECT_ID \
        --location LOCATION

    Replace the following:

    • CLUSTER_NAME: the name of the cluster of the node pool to describe.
    • PROJECT_ID: the project ID of the cluster.
    • LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster.

    The output is similar to the following:

    …
    maintenancePolicy:
      maintenanceExclusions:
      - exclusionName: critical-event-q4-2025
        startTime: '2025-12-20T00:00:00Z'
        endTime: '2026-01-05T00:00:00Z'
        scope:
          noUpgrades: true # This exclusion blocks all upgrades
      window:
        dailyMaintenanceWindow:
          startTime: 03:00 # Daily window at 03:00 UTC
    …

    In the output, review the maintenancePolicy section for the following two conditions:

    • To see if an upgrade is blocked: look for an active maintenanceExclusion with a NO_MINOR_OR_NODE_UPGRADES scope. This setting generally prevents GKE from initiating a new upgrade.
    • To see if an upgrade was interrupted: check the schedule for your dailyMaintenanceWindow or maintenanceExclusions. If an upgrade runs beyond the scheduled window, GKE pauses the upgrade, resulting in a partial upgrade. For more information about partial upgrades, see the Troubleshoot incomplete node pool upgrades section.

    To resolve these issues, you can wait for an exclusion to end, remove it, or adjust your maintenance windows to allow more time for upgrades to complete. To remove an exclusion from the command line, see the example commands after this list.

  2. If you don't use a release channel, verify that auto-upgrade is still enabled for the node pool:

    gcloud container node-pools describe NODE_POOL_NAME \
        --cluster CLUSTER_NAME \
        --location LOCATION

    Replace NODE_POOL_NAME with the name of the node pool to describe.

    If node pool auto-upgrades are enabled for this node pool, the output in the autoUpgrade field is the following:

    management:
      autoUpgrade: true

    If autoUpgrade is set to false, or the field isn't present, enable auto-upgrades (see the example commands after this list).

  3. The upgrade might not have rolled out to the region or zone where your cluster is located, even if the upgrade was mentioned in the release notes. GKE upgrades are rolled out progressively over multiple days (typically four or more). After the upgrade reaches your region or zone, the upgrade only starts during approved maintenance windows. For example, a rollout could reach your cluster's zone on Day One of the rollout, but the cluster's next maintenance window isn't until Day Seven. In this scenario, GKE won't upgrade the cluster until Day Seven. For more information, see GKE release schedule.
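
The following commands are a minimal sketch of the remediations mentioned in steps 1 and 2: removing a maintenance exclusion and re-enabling node auto-upgrades. The exclusion name comes from the earlier example output; replace all placeholders with your own values:

# Step 1: remove the maintenance exclusion that blocks upgrades.
gcloud container clusters update CLUSTER_NAME \
    --location LOCATION \
    --remove-maintenance-exclusion critical-event-q4-2025

# Step 2: re-enable auto-upgrades for the node pool.
gcloud container node-pools update NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location LOCATION \
    --enable-autoupgrade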

Clusters upgrade automatically when auto-upgrade is not enabled

To help maintain the reliability, availability, security, and performance of your GKE environment, GKE might automatically upgrade your clusters, even if you don't use auto-upgrades.

GKE might bypass your maintenance windows, exclusions, or disabled node pool auto-upgrades to perform necessary upgrades for several critical reasons, such as the following:

  • Clusters whose control planes are running a GKE version that has reached its end of support date. To confirm that your cluster is nearing its end of support date, see Estimated schedule for release channels.
  • Nodes within a cluster that are running a GKE version that has reached its end of support date.
  • Clusters that are in a running state, but show no activity for an extended period. For example, GKE might consider a cluster with no API calls, no network traffic, and no active use of subnets to be abandoned.
  • Clusters that exhibit persistent instability and repeatedly cycle through operational states. For example, clusters that loop from running to degraded, repairing, or suspended and back to running without a resolution.

If you observe an unexpected automatic upgrade and have concerns about the effect that this upgrade might have on your cluster, contact Cloud Customer Care for assistance.

Troubleshoot failed upgrades

When your upgrade fails, GKE produces error messages. The following sections explain the causes of, and resolutions for, these errors.

Error: kube-apiserver is unhealthy

Sometimes, you might see the following error message when you start a manual control plane upgrade of your cluster's GKE version:

FAILED: All cluster resources were brought up, but: component "KubeApiserverReady" from endpoint "readyz of kube apiserver is not successful" is unhealthy.

This message appears in the gcloud CLI and in the gke_cluster and gke_nodepool resource type log entries.

This issue occurs when some user-deployed admission webhooks block system components from creating the permissive RBAC roles that are required to function correctly.

During a control plane upgrade, GKE re-creates the Kubernetes API server (kube-apiserver) component. If a webhook blocks the RBAC role for the API server component, the API server won't start and the cluster upgrade won't complete. Even if a webhook is working correctly, it can cause the cluster upgrade to fail because the newly created control plane might be unable to reach the webhook.

Kubernetes auto-reconciles the default system RBAC roles with the default policies in the latest minor version. The default policies for system roles sometimes change in new Kubernetes versions.

To perform this reconciliation, GKE creates or updates the ClusterRoles and ClusterRoleBindings in the cluster. If you have a webhook that intercepts and rejects the create or update requests because of the scope of permissions that the default RBAC policies use, the API server can't function on the new minor version.

To identify the failing webhook, check your GKE audit logs for RBAC calls with the following information:

protoPayload.resourceName="RBAC_RULE"
protoPayload.authenticationInfo.principalEmail="system:apiserver"

In this output, RBAC_RULE is the full name of the RBAC role, such as rbac.authorization.k8s.io/v1/clusterroles/system:controller:horizontal-pod-autoscaler.

The name of the failing webhook is displayed in the log with the following format:

admission webhook WEBHOOK_NAME denied the request

To resolve this issue, try the following solutions:

  1. Review your ClusterRoles to ensure that they are not overly restrictive. Your policies shouldn't block GKE's requests to create or update the ClusterRoles with the default system: prefix.
  2. Adjust your webhook to not intercept requests for creating and updating system RBAC roles (see the example after this list for how to inspect your webhook configurations).
  3. Disable the webhook.
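
To inspect the webhook identified in the audit logs, you can start with the following kubectl commands. This is a minimal sketch; WEBHOOK_NAME is a placeholder for the webhook name from the log message, and the webhook might be a mutating rather than a validating configuration:

# List all admission webhook configurations in the cluster.
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations

# Inspect the rules and selectors of the webhook that denied the request.
kubectl get validatingwebhookconfiguration WEBHOOK_NAME -o yaml

In the output, review the rules, objectSelector, and namespaceSelector fields. One common approach, which is an assumption rather than a GKE requirement, is to scope the webhook so that it doesn't match ClusterRole and ClusterRoleBinding objects in the rbac.authorization.k8s.io API group.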

Error: DeployPatch failed

Sometimes, the cluster upgrade operation fails with the following error:

DeployPatch failed

This error can happen if the Kubernetes control plane remains unhealthy for over 20 minutes.

This error is often transient because the control plane retries the operation until it succeeds. If the upgrade continues to fail with this error, contact Cloud Customer Care.

Troubleshoot issues after a completed upgrade

If you encounter unexpected behavior after your upgrade has completed, the following sections offer troubleshooting guidance for these common problems.

Unexpected behavior due to breaking changes

If the upgrade completed successfully but you notice unexpected behavior afterward, check the GKE release notes for information about bugs and breaking changes related to the version to which the cluster upgraded.

Workloads evicted after Standard cluster upgrade

Your workloads might be at risk of eviction after a cluster upgrade if all of the following conditions are true:

  • The system workloads require more space when the cluster's control plane is running the new GKE version.
  • Your existing nodes don't have enough resources to run the new system workloads and your existing workloads.
  • Cluster autoscaler is disabled for the cluster.

To resolve this issue, try the following solutions (example commands follow the list):

  1. Enable autoscaling for existing node pools.
  2. Enable node auto-provisioning.
  3. Create a new node pool.
  4. Scale up an existing node pool.
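
The following commands are a minimal sketch of options 1 and 4; the node counts are arbitrary examples, and all other values are placeholders:

# Option 1: enable cluster autoscaling on an existing node pool.
gcloud container clusters update CLUSTER_NAME \
    --location LOCATION \
    --node-pool NODE_POOL_NAME \
    --enable-autoscaling \
    --min-nodes 1 \
    --max-nodes 5

# Option 4: manually scale up an existing node pool.
gcloud container clusters resize CLUSTER_NAME \
    --location LOCATION \
    --node-pool NODE_POOL_NAME \
    --num-nodes 5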

Pods stuck in Pending state after configuring Node Allocatable

If you've configured Node Allocatable, a node version upgrade can sometimes cause Pods that had a Running state to become stuck in a Pending state. This change typically occurs because the upgraded node consumes slightly different system resources, or because Pods that were rescheduled must now fit within Node Allocatable limits on the new or modified nodes, potentially under stricter conditions.

If your Pods have a status of Pending after an upgrade, try the following solutions:

  1. Verify that the CPU and memory requests for your Pods don't exceed their peak usage. With GKE reserving CPU and memory for overhead, Pods cannot request these resources. Pods that request more CPU or memory than they use prevent other Pods from requesting these resources, and might leave the cluster underutilized. For more information, see How Pods with resource requests are scheduled in the Kubernetes documentation. To compare node allocatable resources with pending Pods, see the example commands after this list.
  2. Consider increasing the size of your cluster.
  3. To verify if the upgrade is the cause of this issue, revert the upgrade by downgrading your node pools.
  4. Configure your cluster to send Kubernetes scheduler metrics to Cloud Monitoring and view scheduler metrics. By monitoring these metrics, you can determine if there are enough resources for the Pods to run.
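
For step 1, the following commands are a minimal sketch for comparing node allocatable resources with the Pods that are stuck in Pending; POD_NAME and NAMESPACE are placeholders:

# Show the allocatable CPU, memory, and Pod capacity of each node.
kubectl describe nodes | grep -A 7 "Allocatable:"

# List all Pods that are currently stuck in Pending.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Inspect one pending Pod; the Events section explains why it can't be scheduled.
kubectl describe pod POD_NAME -n NAMESPACE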

Troubleshoot version and compatibility issues

Maintaining supported and compatible versions for all of your cluster's components is essential for stability and performance. The following sections provide guidance about how to identify and resolve versioning and compatibility issues that can affect the upgrade process.

Check for control plane and node version incompatibility

Version skew between your control plane and nodes can cause cluster instability. The GKE version skew policy states that a control plane is only compatible with nodes up to two minor versions earlier. For example, a 1.19 control plane works with 1.19, 1.18, and 1.17 nodes.

If your nodes fall outside this supported window, you risk running into critical compatibility problems. These issues are often API-related; for example, a workload on an older node might use an API version that has been deprecated or removed in the newer control plane. This incompatibility can also lead to more severe failures, like a broken networking path that prevents nodes from registering with the cluster if an incompatible workload disrupts communication.

Periodically, the GKE team performs upgrades of the cluster control plane on your behalf. Control planes are upgraded to newer stable versions of Kubernetes. To ensure your nodes remain compatible with the upgraded control plane, they must also be kept up-to-date. By default, GKE handles this upgrade because a cluster's nodes have auto-upgrade enabled, and we recommend that you don't disable it. If auto-upgrade is disabled for a cluster's nodes, and you don't manually upgrade them, your control plane eventually becomes incompatible with your nodes.

To confirm if your control plane and node versions are incompatible, check what version of Kubernetes your cluster's control plane and node pools are running:

gcloud container clusters describe CLUSTER_NAME \
    --project PROJECT_ID \
    --location LOCATION

Replace the following:

  • CLUSTER_NAME: the name of the cluster of the node pool to describe.
  • PROJECT_ID: the project ID of the cluster.
  • LOCATION: the Compute Engine region or zone (for example, us-central1 or us-central1-a) for the cluster.

The output is similar to the following:

…
currentMasterVersion: 1.32.3-gke.1785003
…
currentNodeVersion: 1.26.15-gke.1090000
…

In this example, the control plane version and the node pool version are incompatible.

To resolve this issue, manually upgrade the node pool version to a version that is compatible with the control plane.

If you're concerned about the upgrade process causing disruption to workloads running on the affected nodes, then complete the following steps to migrate your workloads to a new node pool:

  1. Create a new node pool with a compatible version.
  2. Cordon the nodes of the existing node pool (see the example commands after this list).
  3. Optional: update your workloads running on the existing node pool to add a nodeSelector for the label cloud.google.com/gke-nodepool: NEW_NODE_POOL_NAME. Replace NEW_NODE_POOL_NAME with the name of the new node pool. This action ensures that GKE places those workloads on nodes in the new node pool.
  4. Drain the existing node pool.
  5. Check that the workloads are running successfully in the new node pool. If they are, you can delete the old node pool. If you notice workload disruptions, reschedule the workloads on the existing nodes by uncordoning the nodes in the existing node pool and draining the new nodes.
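
The following commands are a minimal sketch of steps 2 and 4, assuming the existing node pool is named EXISTING_NODE_POOL_NAME; adjust the drain flags to match your workloads:

# Step 2: cordon every node in the existing node pool so that no new Pods land there.
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=EXISTING_NODE_POOL_NAME -o name); do
  kubectl cordon "$node"
done

# Step 4: drain the existing nodes; DaemonSet Pods are ignored and emptyDir data is deleted.
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=EXISTING_NODE_POOL_NAME -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done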

Node CPU usage is higher than expected

You might encounter an issue where some nodes use more CPU than you expect based on the running Pods.

This issue can occur if you use manual upgrades and your clusters or nodes haven't been upgraded to run a supported version. Review the release notes to ensure the versions you use are available and supported.

