Auto-repair nodes
This page explains how node auto-repair works and how to use the feature for Standard Google Kubernetes Engine (GKE) clusters.
Node auto-repair helps keep the nodes in your GKE cluster in a healthy, running state. When enabled, GKE makes periodic checks on the health state of each node in your cluster. If a node fails consecutive health checks over an extended time period, GKE initiates a repair process for that node.
Settings for Autopilot and Standard
Autopilot clusters always automatically repair nodes. You can't disable this setting.
In Standard clusters, node auto-repair is enabled by default for new node pools. You can disable auto-repair for an existing node pool; however, we recommend keeping the default configuration.
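The same setting can also be made explicit when you create a node pool. The following is a minimal sketch, assuming a new pool named NEW_POOL_NAME in an existing Standard cluster (the other placeholders match those used later on this page); because auto-repair is the default for new node pools, the --enable-autorepair flag here only makes the default explicit:

gcloud container node-pools create NEW_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --enable-autorepair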
Repair criteria
GKE uses the node's health status to determine if a node needs to be repaired. A node reporting a Ready status is considered healthy. GKE triggers a repair action if a node reports an unhealthy status on consecutive checks over a given time threshold. An unhealthy status can mean:
- A node reports a NotReady status on consecutive checks over the given time threshold (approximately 10 minutes).
- A node does not report any status at all over the given time threshold (approximately 10 minutes).
- A node's boot disk is out of disk space for an extended time period (approximately 30 minutes).
- A node in an Autopilot cluster is cordoned for longer than the given time threshold (approximately 10 minutes).
You can manually check your node's health signals at any time by using the kubectl get nodes command.
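For example, you can list node status and then inspect an individual node's conditions (NODE_NAME is a placeholder for one of the listed node names):

kubectl get nodes

kubectl describe node NODE_NAME

The STATUS column of kubectl get nodes shows Ready or NotReady, and kubectl describe node lists the underlying conditions, such as Ready, DiskPressure, and MemoryPressure.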
Node repair process
If GKE detects that a node requires repair, the node is drained and re-created. This process preserves the original name of the node. GKE waits one hour for the drain to complete. If the drain doesn't complete, the node is shut down and a new node is created.
If multiple nodes require repair, GKE might repair nodes in parallel. GKE balances the number of repairs depending on the size of the cluster and the number of broken nodes. GKE will repair more nodes in parallel on a larger cluster, but fewer nodes as the number of unhealthy nodes grows.
If you disable node auto-repair at any time during the repair process, in-progress repairs are not canceled and continue for any node under repair.
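If you want to observe a repair in progress, one option (standard kubectl behavior, not specific to GKE) is to watch node status from your workstation; a node that is being drained is cordoned, which kubectl reports as SchedulingDisabled in the STATUS column:

kubectl get nodes --watch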
Note: Modifications on the boot disk of a node VM don't persist across node re-creations. To preserve modifications across node re-creation, use a DaemonSet.

Note: Node auto-repair uses a set of signals, including signals from the Node Problem Detector. The Node Problem Detector is enabled by default on nodes that use Container-Optimized OS and Ubuntu images.
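The DaemonSet approach mentioned in the first note works because a DaemonSet schedules a Pod onto every node, including nodes that auto-repair re-creates, so the customization is re-applied automatically. The following is a minimal sketch of that pattern, assuming you want to re-apply a kernel setting (vm.max_map_count) on every node; the DaemonSet name, namespace, image, and the particular setting are illustrative assumptions, not requirements of node auto-repair:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-setup          # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-setup
  template:
    metadata:
      labels:
        app: node-setup
    spec:
      containers:
      - name: node-setup
        image: busybox:1.36
        securityContext:
          privileged: true   # needed to write host-level kernel settings
        # Apply the setting, then keep the Pod running on the node.
        command: ["sh", "-c", "sysctl -w vm.max_map_count=262144 && while true; do sleep 3600; done"]
EOF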
Node auto-repair in TPU slice nodes

If a TPU slice node in a multi-host TPU slice node pool is unhealthy and requires auto-repair, the entire node pool is recreated. To learn more about the TPU slice node conditions, see TPU slice node auto-repair.
Enable auto-repair for an existing Standard node pool
You enable node auto-repair on a per-node pool basis.
If auto-repair is disabled on an existing node pool in a Standard cluster, use the following instructions to enable it:
Console
Go to the Google Kubernetes Engine page in the Google Cloud console.
In the cluster list, click the name of the cluster you want to modify.
Click the Nodes tab.
Under Node Pools, click the name of the node pool you want to modify.
On the Node pool details page, click Edit.
Under Management, select the Enable auto-repair checkbox.
Click Save.
gcloud
gcloud container node-pools update POOL_NAME \
    --cluster CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --enable-autorepair

Replace the following:
- POOL_NAME: the name of your node pool.
- CLUSTER_NAME: the name of your Standard cluster.
- CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
Verify node auto-repair is enabled for a Standard node pool
Node auto-repair is enabled on a per-node pool basis. You can verify that a node pool in your cluster has node auto-repair enabled with the Google Cloud CLI or the Google Cloud console.
Console
Go to the Google Kubernetes Engine page in the Google Cloud console.
On the Google Kubernetes Engine page, click the name of the cluster of the node pool you want to inspect.
Click the Nodes tab.
Under Node Pools, click the name of the node pool you want to inspect.
Under Management, in the Auto-repair field, verify that auto-repair is enabled.
gcloud
Describe the node pool:
gcloud container node-pools describe NODE_POOL_NAME \
    --cluster=CLUSTER_NAME

If node auto-repair is enabled, the output of the command includes these lines:
management:
  ...
  autoRepair: true
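If you only want the single field rather than scanning the full output, gcloud's --format flag can project it; management.autoRepair is the field shown in the output above:

gcloud container node-pools describe NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --format="value(management.autoRepair)"

The command prints the field's value, True, when auto-repair is enabled.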
Disable node auto-repair

You can disable node auto-repair for an existing node pool in a Standard cluster by using the gcloud CLI or the Google Cloud console.
Note: You can only disable auto-repair with the gcloud CLI for a node pool in a Standard cluster enrolled in a release channel.

Console
Go to the Google Kubernetes Engine page in the Google Cloud console.
In the cluster list, click the name of the cluster you want to modify.
Click the Nodes tab.
Under Node Pools, click the name of the node pool you want to modify.
On the Node pool details page, click Edit.
Under Management, clear the Enable auto-repair checkbox.
Click Save.
gcloud
gcloud container node-pools update POOL_NAME \
    --cluster CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --no-enable-autorepair

Replace the following:
- POOL_NAME: the name of your node pool.
- CLUSTER_NAME: the name of your Standard cluster.
- CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
Get information about recent automated repair events
GKE generates a log entry for automated repair events. You can check the logs by running the following commands:
List the operations:
gcloud container operations list \
    --location=CONTROL_PLANE_LOCATION

Replace CONTROL_PLANE_LOCATION with the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.

Find the reason why the node auto-repair operation was triggered by running the following command:
gcloud container operations describe OPERATION_NAME \
    --location=CONTROL_PLANE_LOCATION

Replace OPERATION_NAME with the name of an operation listed in the output from the previous command.
In the output from the command, check the operationReason field for the reason why the repair operation was triggered. For example, AUTO_REPAIR_LONG_UNHEALTHY means that the node auto-repair was triggered because the node was unhealthy for 10 minutes.
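If the operation list is long, you can narrow it to repair events with gcloud's --filter flag; the sketch below assumes the AUTO_REPAIR_NODES operation type that GKE reports for automated node repairs:

gcloud container operations list \
    --location=CONTROL_PLANE_LOCATION \
    --filter="operationType:AUTO_REPAIR_NODES"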