Deploy TPU workloads in GKE Standard
This page provides a foundation for learning how to accelerate machine learning (ML) workloads using TPUs in Google Kubernetes Engine (GKE). TPUs are designed for matrix multiplication processing, such as large-scale deep learning model training. TPUs are optimized to handle the enormous datasets and complex models of ML, so their performance makes them more cost-effective and energy efficient for ML workloads. In this guide, you learn how to deploy ML workloads by using Cloud TPU accelerators, configure quotas for TPUs, configure upgrades for node pools that run TPUs, and monitor TPU workload metrics.
This tutorial is intended for machine learning (ML) engineers and platform admins and operators who are interested in using Kubernetes container orchestration to manage large-scale model training, tuning, and inference workloads using TPUs. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.
Before reading this page, ensure that you're familiar with the following:
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set compute/zone instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.
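For example, to set a default region before running the commands in this guide (a minimal sketch; us-west4 is a placeholder, substitute the region that you use):

gcloud config set compute/region us-west4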
Plan your TPU configuration
Plan your TPU configuration based on your model and how much memory it requires. Before you use this guide to deploy your workloads on TPUs, complete the planning steps in Plan your TPU configuration.
Ensure that you have TPU quota
The following sections help you ensure that you have enough quota when using TPUs in GKE.
Quota for on-demand or Spot VMs
If you are creating a TPU slice node pool with on-demand or Spot VMs, you must have sufficient TPU quota available in the region that you want to use.
Creating a TPU slice node pool that consumes a TPU reservation does not require any TPU quota.1 You can safely skip this section for reserved TPUs.
Creating an on-demand or Spot TPU slice node pool in GKE requires Compute Engine API quota. Compute Engine API quota (compute.googleapis.com) is not the same as Cloud TPU API quota (tpu.googleapis.com), which is needed when creating TPUs with the Cloud TPU API.
To check the limit and current usage of your Compute Engine API quota for TPUs, follow these steps:
Go to the Quotas page in the Google Cloud console:
In the Filter box, do the following:
Use the following table to select and copy the property of the quota based on the TPU version and machine type. For example, if you plan to create on-demand TPU v5e nodes whose machine type begins with ct5lp-, enter Name: TPU v5 Lite PodSlice chips.
- TPU v3, machine type begins with ct3-: on-demand quota property is Dimensions (e.g. location): tpu_family:CT3; Spot2 quota is not applicable.
- TPU v3, ct3p-: on-demand quota property is Dimensions (e.g. location): tpu_family:CT3P; Spot quota is not applicable.
- TPU v4, ct4p-: on-demand quota name is TPU v4 PodSlice chips; Spot quota name is Preemptible TPU v4 PodSlice chips.
- TPU v5e, ct5lp-: on-demand quota name is TPU v5 Lite PodSlice chips; Spot quota name is Preemptible TPU v5 Lite PodSlice chips.
- TPU v5p, ct5p-: on-demand quota name is TPU v5p chips; Spot quota name is Preemptible TPU v5p chips.
- TPU Trillium, ct6e-: on-demand quota property is Dimensions (e.g. location): tpu_family:CT6E; Spot quota name is Preemptible TPU slices v6e.
- Ironwood (TPU7x) (Preview), tpu7x-standard-4t: on-demand quota property is Dimensions (e.g. location): tpu_family:tpu7x; Spot quota name is Preemptible TPU slices tpu7x.
Select the Dimensions (e.g. locations) property and enter region: followed by the name of the region in which you plan to create TPUs in GKE. For example, enter region:us-west4 if you plan to create TPU slice nodes in the zone us-west4-a. TPU quota is regional, so all zones within the same region consume the same TPU quota.
If no quotas match the filter you entered, then the project has not been granted any of the specified quota for the region that you need, and you must request a TPU quota adjustment.
When a TPU reservation is created, both the limit and current use values for the corresponding quota increase by the number of chips in the TPU reservation. For example, when a reservation is created for 16 TPU v5e chips whose machine type begins with ct5lp-, then both the Limit and Current usage for the TPU v5 Lite PodSlice chips quota in the relevant region increase by 16.
1. When creating a TPU slice node pool, use the --reservation and --reservation-affinity=specific flags to create a reserved instance. TPU reservations are available when purchasing a commitment. ↩
2. When creating a TPU slice node pool, use the --spot flag to create a Spot instance. ↩
Quotas for additional GKE resources
You may need to increase the following GKE-related quotas in the regions where GKE creates your resources.
- Persistent Disk SSD (GB) quota: The boot disk of each Kubernetes node requires 100GB by default. Therefore, this quota should be set at least as high as the product of the maximum number of GKE nodes you anticipate creating and 100GB (nodes * 100GB).
- In-use IP addresses quota: Each Kubernetes node consumes one IP address. Therefore, this quota should be set at least as high as the maximum number of GKE nodes you anticipate creating.
- Ensure that max-pods-per-node aligns with the subnet range: Each Kubernetes node uses secondary IP ranges for Pods. For example, a max-pods-per-node of 32 requires 64 IP addresses, which translates to a /26 subnet per node. Note that this range shouldn't be shared with any other cluster. To avoid exhausting the IP address range, use the --max-pods-per-node flag to limit the number of Pods allowed to be scheduled on a node. The quota for max-pods-per-node should be set at least as high as the maximum number of GKE nodes you anticipate creating. See the worked example after this list.
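The following sketch works through these calculations for a hypothetical cluster of 64 TPU slice nodes with max-pods-per-node set to 32; the numbers are illustrative only:

# Hypothetical sizing: 64 nodes, 32 Pods per node
NODES=64
MAX_PODS_PER_NODE=32

# Persistent Disk SSD (GB) quota: each node boot disk needs 100 GB
echo "Required PD SSD quota (GB): $((NODES * 100))"          # 6400

# In-use IP addresses quota: one address per node
echo "Required in-use IP addresses: ${NODES}"                 # 64

# Pod secondary ranges: 32 Pods per node needs 64 addresses,
# which corresponds to a /26 range per node
echo "Pod addresses per node: $((MAX_PODS_PER_NODE * 2))"     # 64 (a /26 per node)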
To request an increase in quota, see Request a quota adjustment.
Ensure reservation availability
To create a TPU slice node pool using a reservation, the reservation must have sufficient available TPU chips at the time of node pool creation.
To see which reservations exist within a project and how many TPU chips within a TPU reservation are available, view a list of your reservations.
Create a cluster
You can create a cluster that uses TPUs by using the Google Cloud CLI or the Accelerated Processing Kit (XPK).
- Use the Google Cloud CLI to manually create your GKE cluster for precise customization or expansion of existing production GKE environments.
- Use XPK to quickly create GKE clusters and run workloads for proof-of-concept and testing. For more information and instructions, see the XPK README.
The following document describes how to configure TPUs using the Google Cloud CLI.
Create a GKE cluster in Standard mode in a region with available TPUs.
Use regional clusters, which provide high availability of the Kubernetes control plane.

gcloud container clusters create CLUSTER_NAME \
    --location LOCATION \
    --cluster-version VERSION

Replace the following:
- CLUSTER_NAME: the name of the new cluster.
- LOCATION: the region with your available TPU capacity.
- VERSION: the GKE version, which must support the machine type that you want to use. Note that the default GKE version might not have availability for your target TPU. To learn about the minimum GKE versions available by TPU machine type, see TPU availability in GKE.
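For example, the following command creates a regional Standard cluster. The values are hypothetical; substitute your own cluster name, a region with TPU capacity, and a GKE version that supports your target TPU:

gcloud container clusters create tpu-cluster \
    --location us-west4 \
    --cluster-version 1.33.0-gke.1712000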
Provision TPUs
To provision TPUs in GKE, you have the following configuration options:
- Manually create a node pool: you can create a node pool with a specific TPU version and topology.
- Use GKE node auto-provisioning: you can enable node auto-provisioning at the cluster level and then, in your Pod's manifest, use a nodeSelector to specify the TPU version and topology. When a pending Pod matches these selectors, GKE automatically creates a new node pool that meets the request. This method requires you to set cluster-level resource limits for TPUs.
- Define custom ComputeClasses: you can request TPUs by using custom ComputeClasses. Custom ComputeClasses let platform administrators define a hierarchy of node configurations for GKE to prioritize during node scaling decisions, so that workloads run on your selected hardware.
Manually create a node pool
You can create a single or multi-host TPU slice node pool.
Create a single-host TPU slice node pool
You can create a single-host TPU slice node pool using the Google Cloud CLI, Terraform, or the Google Cloud console.
gcloud
gcloud container node-pools create NODE_POOL_NAME \
    --location=LOCATION \
    --cluster=CLUSTER_NAME \
    --node-locations=NODE_ZONES \
    --machine-type=MACHINE_TYPE \
    [--sandbox=type=gvisor]

Replace the following:
- NODE_POOL_NAME: the name of the new node pool.
- LOCATION: the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE.
- CLUSTER_NAME: the name of the cluster.
- NODE_ZONES: the comma-separated list of one or more zones where GKE creates the node pool.
- MACHINE_TYPE: the TPU version and type. For example, use tpu7x-standard-4t for Ironwood (TPU7x).
Optionally, you can also use the following flags:
- --num-nodes=NUM_NODES: The initial number of nodes in the node pool in each zone. Best practice: If you use the --enable-autoscaling flag for the node pool, set num-nodes to 0 so that the autoscaler provisions additional nodes as soon as your workloads demand them.
- --reservation=RESERVATION_NAME: The name of the reservation GKE uses when creating the node pool. If you omit this flag, GKE uses available TPUs. To learn more about TPU reservations, see About Cloud TPU reservations.
- --node-labels cloud.google.com/gke-workload-type=HIGH_AVAILABILITY: Tells GKE that the single-host TPU slice node pool is part of a collection. Use this flag if the following conditions apply:
  - The node pool runs inference workloads.
  - The node pool uses TPU Trillium.
  - The node pool doesn't use Spot VMs.
  To learn more about collection scheduling management, see Manage collection scheduling in single-host TPU slices.
- --enable-autoscaling: Create a node pool with autoscaling enabled. Requires the following additional flags:
  - --total-min-nodes=TOTAL_MIN_NODES: Minimum number of all nodes in the node pool.
  - --total-max-nodes=TOTAL_MAX_NODES: Maximum number of all nodes in the node pool.
  - --location-policy=ANY: Prioritize usage of unused reservations and reduce the preemption risk of Spot VMs.
- --spot: Sets the node pool to use Spot VMs for the nodes in the node pool. This can't be changed after node pool creation.
- --flex-start: Sets the node pool to use flex-start VMs. Flex-start VMs are created by using the flex-start consumption option. For more information, see Run a small batch workload with TPUs and flex-start VMs.
For a full list of all the flags that you can specify, see the gcloud container node-pools create reference.
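As an illustration only (hypothetical names and values; the machine type and zone must match a TPU configuration available to your project), the following command creates an autoscaled single-host TPU v5e node pool that uses Spot VMs:

gcloud container node-pools create tpu-v5e-pool \
    --location=us-west4 \
    --cluster=tpu-cluster \
    --node-locations=us-west4-a \
    --machine-type=ct5lp-hightpu-4t \
    --num-nodes=0 \
    --enable-autoscaling \
    --total-min-nodes=0 \
    --total-max-nodes=4 \
    --location-policy=ANY \
    --spot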
Terraform
- Ensure that you use version 4.84.0 or later of the google provider.
- Add the following block to your Terraform configuration:

resource "google_container_node_pool" "NODE_POOL_RESOURCE_NAME" {
  provider       = google
  project        = PROJECT_ID
  cluster        = CLUSTER_NAME
  name           = POOL_NAME
  location       = CLUSTER_LOCATION
  node_locations = [NODE_ZONES]

  node_config {
    machine_type = MACHINE_TYPE
    reservation_affinity {
      consume_reservation_type = "SPECIFIC_RESERVATION"
      key                      = "compute.googleapis.com/reservation-name"
      values                   = [RESERVATION_LABEL_VALUES]
    }
    spot       = true
    flex_start = false
  }
}

Replace the following:
- NODE_POOL_RESOURCE_NAME: The name of the node pool resource in the Terraform template.
- PROJECT_ID: Your project ID.
- CLUSTER_NAME: The name of the existing cluster.
- POOL_NAME: The name of the node pool to create.
- CLUSTER_LOCATION: The compute zone(s) of the cluster. Specify the region where the TPU version is available. To learn more, see Select a TPU version and topology.
- NODE_ZONES: The comma-separated list of one or more zones where GKE creates the node pool.
- MACHINE_TYPE: The type of TPU machine to use. To see TPU-compatible machine types, use the table in Choose the TPU version.
Optionally, you can also use the following variables:
- autoscaling: Create a node pool with autoscaling enabled. For a single-host TPU slice, GKE scales between the TOTAL_MIN_NODES and TOTAL_MAX_NODES values.
  - TOTAL_MIN_NODES: Minimum number of all nodes in the node pool. This field is optional unless autoscaling is also specified.
  - TOTAL_MAX_NODES: Maximum number of all nodes in the node pool. This field is optional unless autoscaling is also specified.
- RESERVATION_NAME: If you use a TPU reservation, this is the list of labels of the reservation resources to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider.
- spot: Sets the node pool to use Spot VMs for the TPU nodes. This can't be changed after node pool creation. For more information, see Spot VMs.
- flex_start: Sets the node pool to use the flex-start consumption option. Can't be set to true if spot is enabled. Flex-start is supported in GKE version 1.33.0-gke.1712000 or later.
Console
To create a node pool with TPUs:
Go to the Google Kubernetes Engine page in the Google Cloud console.
In the cluster list, click the name of the cluster you want to modify.
Click Add node pool.
In the Node pool details section, check the Specify node locations box.
Select the zone based on the TPU version you want to use. To identify an available zone, seeTPU availability in GKE.
From the navigation pane, click Nodes.
In the Machine Configuration section, select TPUs.
In the Series drop-down menu, select one of the following:
- CT3: TPU v3, single-host device
- CT3P: TPU v3, multi-host pod slice
- CT4P: TPU v4
- CT5LP: TPU v5e
- CT5P: TPU v5p
- CT6E: TPU Trillium (v6e)
In the Machine type drop-down menu, select the name of the machine to use for nodes. Use the Choose the TPU version table to learn how to define the machine type and TPU topology that create a single-host TPU slice node pool.
In the TPU Topology drop-down menu, select the physical topology for the TPU slice.
In the Changes needed dialog, click Make changes.
Ensure that Boot disk type is either Standard persistent disk or SSD persistent disk.
Optionally, select the Enable nodes on spot VMs checkbox to use Spot VMs for the nodes in the node pool.
Click Create.
Create a multi-host TPU slice node pool
The steps to create a multi-host TPU slice node pool differ depending on whether you use Ironwood (TPU7x) or an earlier TPU version.
Ironwood (TPU7x)
Preview — Ironwood (TPU7x): This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
You can create a multi-host TPU slice node pool in version Ironwood (TPU7x) by using the Google Cloud CLI or Terraform:
gcloud
To create a multi-host TPU slice node pool with Ironwood (TPU7x), you must first create a workload policy.
Note: You don't need to create a new workload policy for every node pool. A workload policy is unique per project, per region, and per topology. You can reuse the same workload policy for multiple node pools that share these characteristics. To see the list of workload policies, use the gcloud compute resource-policies list --filter="region:REGION" command.
Create a workload policy:
gcloud compute resource-policies create workload-policy WORKLOAD_POLICY_NAME \
    --type=HIGH_THROUGHPUT \
    --accelerator-topology=TPU_TOPOLOGY \
    --project=PROJECT_ID \
    --region=REGION

Replace the following:
- WORKLOAD_POLICY_NAME: a name for your workload policy.
- TPU_TOPOLOGY: the TPU Ironwood (TPU7x) topology. For example, 2x2x2. To see all supported Ironwood (TPU7x) topologies, see the topology section.
- PROJECT_ID: your Google Cloud project ID.
- REGION: the region for the workload policy. A workload policy is a regional resource and can be reused across node pools that share the same topology.
Create the node pool with the workload policy:
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --machine-type=tpu7x-standard-4t \
    --placement-policy=WORKLOAD_POLICY_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --node-locations=NODE_ZONE \
    --project=PROJECT_ID \
    --reservation=RESERVATION_NAME \
    --reservation-affinity=specific

Replace the following:
- NODE_POOL_NAME: the name for your new node pool.
- CLUSTER_NAME: the name of your GKE cluster.
- WORKLOAD_POLICY_NAME: the name of the workload policy you created.
- CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
- NODE_ZONE: the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE.
- PROJECT_ID: your Google Cloud project ID.
- RESERVATION_NAME: the name of the reservation to use.
In this command, the --tpu-topology flag has been replaced by the --placement-policy flag.
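For example, with hypothetical names and a 2x2x2 topology (adjust the project, locations, and reservation to match your own capacity):

gcloud compute resource-policies create workload-policy ironwood-policy \
    --type=HIGH_THROUGHPUT \
    --accelerator-topology=2x2x2 \
    --project=my-project \
    --region=us-central1

gcloud container node-pools create tpu7x-pool \
    --cluster=tpu-cluster \
    --machine-type=tpu7x-standard-4t \
    --placement-policy=ironwood-policy \
    --location=us-central1 \
    --node-locations=us-central1-a \
    --project=my-project \
    --reservation=my-tpu7x-reservation \
    --reservation-affinity=specific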
Terraform
- Ensure that you use version 4.84.0 or later of the google provider.
Create a workload policy:

resource "google_compute_resource_policy" "WORKLOAD_POLICY_NAME" {
  name   = "WORKLOAD_POLICY_NAME"
  region = CLUSTER_LOCATION
  workload_policy {
    type                 = "HIGH_THROUGHPUT"
    accelerator_topology = "TPU_TOPOLOGY"
  }
}

Replace the following:
- WORKLOAD_POLICY_NAME: a name for your workload policy.
- CLUSTER_LOCATION: Compute location for the cluster. We recommend having a regional cluster for higher reliability of the Kubernetes control plane. You can also use a zonal cluster. For more information, see Select a TPU version and topology.
- TPU_TOPOLOGY: the TPU Ironwood (TPU7x) topology. For example, 2x2x2. To see all supported Ironwood (TPU7x) topologies, see Plan TPUs.
For more information about the google_compute_resource_policy resource, see the Terraform Provider reference.
In your Terraform configuration, add the following block:
resource "google_container_node_pool" "NODE_POOL_RESOURCE_NAME" {
  provider           = google
  project            = PROJECT_ID
  cluster            = CLUSTER_NAME
  name               = POOL_NAME
  location           = CLUSTER_LOCATION
  node_locations     = [NODE_ZONES]
  initial_node_count = NUM_NODES

  autoscaling {
    max_node_count  = MAX_NODES
    location_policy = "ANY"
  }

  node_config {
    machine_type = MACHINE_TYPE
    reservation_affinity {
      consume_reservation_type = "SPECIFIC_RESERVATION"
      key                      = "compute.googleapis.com/reservation-name"
      values                   = [RESERVATION_LABEL_VALUES]
    }
    flex_start = false
    spot       = true
  }

  placement_policy {
    policy_name = WORKLOAD_POLICY_NAME
  }
}

Replace the following:
- NODE_POOL_RESOURCE_NAME: the name of the node pool resource in the Terraform template.
- PROJECT_ID: your project ID.
- CLUSTER_NAME: the name of the existing cluster to add the node pool to.
- POOL_NAME: the name of the node pool to create.
- NODE_ZONES: the comma-separated list of one or more zones where GKE creates the node pool.
- NUM_NODES: the number of nodes in the node pool. It must be zero or the product of the number of TPU chips divided by four, because in multi-host TPU slices each TPU slice node has four chips. For example, if TPU_TOPOLOGY is 4x8, then there are 32 chips, which means NUM_NODES must be 8. To learn more about TPU topologies, use the table in Choose the TPU version.
- TPU_TOPOLOGY: this indicates the selected physical topology for the TPU slice. The format of the topology depends on the TPU version you are using. To learn more about TPU topologies, use the table in Choose a topology.
Optionally, you can also use the following variables:
- RESERVATION_NAME: if you use a TPU reservation, provide a list of reservation-resource labels to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider.
- autoscaling: create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.
  - MAX_NODES: the maximum size of the node pool. The value must be equal to the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM. For example, if TPU_TOPOLOGY is 2x2x2, the product is 8. Since each VM in tpu7x-standard-4t has 4 chips, the number of nodes is 2.
- spot: sets the node pool to use Spot VMs for the TPU slice nodes. This setting can't be changed after the node pool is created. For more information, see Spot VMs.
- flex_start: sets the node pool to use the flex-start consumption option. This setting can't be set to true if spot is enabled.
Other TPU versions
You can create a multi-host TPU slice node pool in versions v3, v4, v5p, v5e, and Trillium (v6e) by using the Google Cloud CLI, Terraform, or the Google Cloud console.
gcloud
gcloud container node-pools create POOL_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --cluster=CLUSTER_NAME \
    --node-locations=NODE_ZONES \
    --machine-type=MACHINE_TYPE \
    --tpu-topology=TPU_TOPOLOGY \
    [--num-nodes=NUM_NODES] \
    [--spot] \
    [--flex-start] \
    [--enable-autoscaling --max-nodes MAX_NODES] \
    [--reservation-affinity=specific --reservation=RESERVATION_NAME] \
    [--node-labels cloud.google.com/gke-nodepool-group-name=COLLECTION_NAME,cloud.google.com/gke-workload-type=HIGH_AVAILABILITY] \
    [--placement-type=COMPACT]

Replace the following:
- POOL_NAME: the name of the new node pool.
- CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
- CLUSTER_NAME: the name of the cluster.
- NODE_ZONES: the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE.
- MACHINE_TYPE: the type of machine to use for nodes. To learn more about the available machine types, see Choose the TPU version.
- TPU_TOPOLOGY: the physical topology for the TPU slice. The format of the topology depends on the TPU version. For more information about TPU topologies, use the table in Choose a topology. For more information, see Topology.
Optionally, you can also use the following flags:
- NUM_NODES: the number of nodes in the node pool. It must be zero or the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM. For multi-host TPU v4 and TPU v5e, the number of chips in each VM is four. Therefore, if your TPU_TOPOLOGY is 2x4x4 (TPU v4 with four chips in each VM), then NUM_NODES is 32/4, which equals 8. If you omit this flag, the number of nodes is calculated and defaulted based on the topology and machine type.
- RESERVATION_NAME: the name of the reservation GKE uses when creating the node pool. If you omit this flag, GKE uses available TPU slice node pools. For more information about TPU reservations, see TPU reservation.
- --spot: sets the node pool to use Spot VMs for the TPU slice nodes. This can't be changed after node pool creation. For more information, see Spot VMs.
- --flex-start: sets the node pool to use flex-start VMs. Flex-start VMs are created by using the flex-start consumption option, which is supported in GKE version 1.33.0-gke.1712000 or later.
- --enable-autoscaling: create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.
  - MAX_NODES: the maximum size of the node pool. The --max-nodes flag is required if --enable-autoscaling is supplied and must be equal to the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM.
- --node-labels=cloud.google.com/gke-nodepool-group-name=COLLECTION_NAME,cloud.google.com/gke-workload-type=HIGH_AVAILABILITY: tells GKE that the multi-host TPU slice node pool is a collection. Use this flag if the following conditions apply:
  - The node pool runs inference workloads.
  - The node pool uses TPU Trillium.
  - The node pool doesn't use Spot VMs, because Spot VMs don't support collection scheduling.
  For more information about collection scheduling management, see Manage collection scheduling in multi-host TPU slices.
- --placement-type=COMPACT: create a node pool with compact placement enabled. This option must be used with the --tpu-topology flag. For more information, see Create a compact placement policy and TPU Topology.
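For example, the following command creates a multi-host TPU v4 slice node pool that consumes a reservation. The values are hypothetical; a 2x2x4 topology is 16 chips, which at four chips per VM is four nodes:

gcloud container node-pools create tpu-v4-pool \
    --location=us-central2 \
    --cluster=tpu-cluster \
    --node-locations=us-central2-b \
    --machine-type=ct4p-hightpu-4t \
    --tpu-topology=2x2x4 \
    --num-nodes=4 \
    --reservation-affinity=specific \
    --reservation=my-tpu-reservation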
Terraform
- Ensure that you use version 4.84.0 or later of the google provider.
Add the following block to your Terraform configuration:

resource "google_container_node_pool" "NODE_POOL_RESOURCE_NAME" {
  provider           = google
  project            = PROJECT_ID
  cluster            = CLUSTER_NAME
  name               = POOL_NAME
  location           = CLUSTER_LOCATION
  node_locations     = [NODE_ZONES]
  initial_node_count = NUM_NODES

  autoscaling {
    max_node_count  = MAX_NODES
    location_policy = "ANY"
  }

  node_config {
    machine_type = MACHINE_TYPE
    reservation_affinity {
      consume_reservation_type = "SPECIFIC_RESERVATION"
      key                      = "compute.googleapis.com/reservation-name"
      values                   = [RESERVATION_LABEL_VALUES]
    }
    flex_start = false
    spot       = true
  }

  placement_policy {
    type         = "COMPACT"
    tpu_topology = TPU_TOPOLOGY
  }
}

Replace the following:
- NODE_POOL_RESOURCE_NAME: the name of the node pool resource in the Terraform template.
- PROJECT_ID: your project ID.
- CLUSTER_NAME: the name of the existing cluster to add the node pool to.
- POOL_NAME: the name of the node pool to create.
- CLUSTER_LOCATION: compute location for the cluster. We recommend having a regional cluster for higher reliability of the Kubernetes control plane. You can also use a zonal cluster. To learn more, see Select a TPU version and topology.
- NODE_ZONES: the comma-separated list of one or more zones where GKE creates the node pool.
- NUM_NODES: the number of nodes in the node pool. It must be zero or the product of the number of TPU chips divided by four, because in multi-host TPU slices each TPU slice node has four chips. For example, if TPU_TOPOLOGY is 4x8, then there are 32 chips, which means NUM_NODES must be 8. To learn more about TPU topologies, use the table in Choose the TPU version.
- TPU_TOPOLOGY: this indicates the physical topology for the TPU slice. The format of the topology depends on the TPU version you are using. To learn more about TPU topologies, use the table in Choose a topology.
Optionally, you can also use the following variables:
- RESERVATION_NAME: if you use a TPU reservation, this is the list of labels of the reservation resources to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider.
- autoscaling: create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.
  - MAX_NODES: the maximum size of the node pool. It must be equal to the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM.
- spot: sets the node pool to use Spot VMs for the TPU slice nodes. This can't be changed after node pool creation. For more information, see Spot VMs.
- flex_start: sets the node pool to use the flex-start consumption option. Can't be set to true if spot is enabled.
Console
To create a node pool with TPUs:
Go to the Google Kubernetes Engine page in the Google Cloud console.
In the cluster list, click the name of the cluster you want to modify.
Click Add node pool.
In the Node pool details section, check the Specify node locations box.
Select the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE.
From the navigation pane, click Nodes.
In the Machine Configuration section, select TPUs.
In the Series drop-down menu, select one of the following:
- CT3: TPU v3, single-host device
- CT3P: TPU v3, multi-host pod slice
- CT4P: TPU v4
- CT5LP: TPU v5e
- CT5P: TPU v5p
- CT6E: TPU Trillium (v6e)
In the Machine type drop-down menu, select the name of the machine to use for nodes. Use the Choose the TPU version table to learn how to define the machine type and TPU topology that create a multi-host TPU slice node pool.
In the TPU Topology drop-down menu, select the physical topology for the TPU slice.
In the Changes needed dialog, click Make changes.
Ensure that Boot disk type is either Standard persistent disk or SSD persistent disk.
Optionally, select the Enable nodes on spot VMs checkbox to use Spot VMs for the nodes in the node pool.
Click Create.
Use GKE node auto-provisioning
You can configure GKE to automatically create and delete node pools to meet the resource demands of your TPU workloads.
To enable node pool auto-provisioning, edit your cluster TPU resource limits:
gcloud container clusters update CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --enable-autoprovisioning \
    --min-cpu=MINIMUM_CPU \
    --min-memory=MINIMUM_MEMORY \
    --max-cpu=MAXIMUM_CPU \
    --max-memory=MAXIMUM_MEMORY \
    --min-accelerator=type=TPU_TYPE,count=MINIMUM_TPU_COUNT \
    --max-accelerator=type=TPU_TYPE,count=MAXIMUM_TPU_COUNT

Replace the following:
- TPU_TYPE: the TPU type. For example, use tpu7x-standard-4t for Ironwood (TPU7x).
- MINIMUM_TPU_COUNT: the minimum number of TPU chips of the specified type that the cluster can have. If the value that you specify is larger than the number of TPU chips in a multi-host TPU slice, GKE removes all nodes in the slice. Multi-host node pools scale between 0 and the number of nodes in the slice, with no intermediate values.
- MAXIMUM_TPU_COUNT: the maximum number of TPU chips of the specified type that the cluster can have. For multi-host TPU slices, specify a value that's greater than the number of chips in each slice so that GKE can scale the slice atomically. The number of chips in a slice is the product of the TPU topology. For example, if the topology is 2x2x2, the number of chips in the slice is 8, which means that the value of MAXIMUM_TPU_COUNT must be greater than 8.
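For example (hypothetical limits, using the tpu7x-standard-4t type named above; adjust the CPU, memory, and TPU chip limits to your own topology and capacity):

gcloud container clusters update tpu-cluster \
    --location=us-central1 \
    --enable-autoprovisioning \
    --min-cpu=1 \
    --min-memory=1 \
    --max-cpu=1000 \
    --max-memory=4096 \
    --min-accelerator=type=tpu7x-standard-4t,count=0 \
    --max-accelerator=type=tpu7x-standard-4t,count=16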
Define custom ComputeClasses
You can also configure GKE to request TPUs during scaling operations that create new nodes by using custom ComputeClasses.
You can specify TPU configuration options in your custom ComputeClass specification. When a GKE workload uses that custom ComputeClass, GKE attempts to provision TPUs that use your specified configuration when scaling up.
The following sections show you how to create a custom ComputeClass and then create a Job that consumes the TPUs defined in the ComputeClass.
Create a custom ComputeClass
The steps to create a custom ComputeClass that follows the TPU rules differ depending on whether you use Ironwood (TPU7x) or an earlier TPU version.
Ironwood (TPU7x)
Preview — Ironwood (TPU7x): This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Create a workload policy. This step is required only if you are creating a multi-host node pool, which depends on the topology you choose. If you use a single-host node pool, skip this step.
gcloud compute resource-policies create workload-policy WORKLOAD_POLICY_NAME \
    --type=HIGH_THROUGHPUT \
    --accelerator-topology=TPU_TOPOLOGY \
    --project=PROJECT_ID \
    --region=REGION

Replace the following:
- WORKLOAD_POLICY_NAME: a name for your workload policy.
- TPU_TOPOLOGY: the TPU Ironwood (TPU7x) topology. For example, use 2x2x2. For more information about all supported Ironwood (TPU7x) topologies, see the topology section.
- PROJECT_ID: your Google Cloud project ID.
- REGION: the region for the workload policy. A workload policy is a regional resource and you can use it across node pools.
Save the following manifest as tpu-compute-class.yaml:

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: tpu-class
spec:
  priorities:
  - tpu:
      type: tpu7x
      topology: TPU_TOPOLOGY
      count: 4
    placement:
      policyName: WORKLOAD_POLICY_NAME
  nodePoolAutoCreation:
    enabled: true

(Optional) You can consume a specific reservation or sub-block. For example, you can add the following specs to your ComputeClass manifest:

reservations:
  affinity: Specific
  specific:
  - name: RESERVATION_NAME
    reservationBlock:
      name: RESERVATION_BLOCK_NAME
    reservationSubBlock:
      name: RESERVATION_SUB_BLOCK_NAME

Replace the following:
- RESERVATION_NAME: the name of the Compute Engine capacity reservation.
- RESERVATION_BLOCK_NAME: the name of the Compute Engine capacity reservation block.
- RESERVATION_SUB_BLOCK_NAME: the name of the Compute Engine capacity reservation sub-block.
For more information, seeConsuming reserved zonal resources.
Other TPU versions
To provision v3, v4, v5p, v5e, or v6e (Trillium) TPUs by using a custom ComputeClass configured for TPUs, complete the following steps:
Save the following manifest as tpu-compute-class.yaml:

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: tpu-class
spec:
  priorities:
  - tpu:
      type: TPU_TYPE
      count: NUMBER_OF_CHIPS
      topology: TOPOLOGY
  - spot: true
    tpu:
      type: TPU_TYPE
      count: NUMBER_OF_CHIPS
      topology: TOPOLOGY
  - flexStart:
      enabled: true
    tpu:
      type: TPU_TYPE
      count: NUMBER_OF_CHIPS
      topology: TOPOLOGY
  nodePoolAutoCreation:
    enabled: true

Replace the following:
- TPU_TYPE: the TPU type to use, like tpu-v4-podslice. Must be a value supported by GKE.
- TOPOLOGY: the arrangement of TPU chips in the slice, like 2x2x4. Must be a supported topology for the selected TPU type.
- NUMBER_OF_CHIPS: the number of TPU chips for the container to use. Must be the same value for limits and requests.
Deploy the ComputeClass:
kubectl apply -f tpu-compute-class.yaml

For more information about custom ComputeClasses and TPUs, see TPU configuration.
Create a Job that consumes TPUs
Save the following manifest as
tpu-job.yaml:

apiVersion: v1
kind: Service
metadata:
  name: headless-svc
spec:
  clusterIP: None
  selector:
    job-name: tpu-job
---
apiVersion: batch/v1
kind: Job
metadata:
  name: tpu-job
spec:
  backoffLimit: 0
  completions: 4
  parallelism: 4
  completionMode: Indexed
  template:
    spec:
      subdomain: headless-svc
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/compute-class: tpu-class
      containers:
      - name: tpu-job
        image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
        ports:
        - containerPort: 8471 # Default port over which TPU VMs communicate.
        - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
        command:
        - bash
        - -c
        - |
          python -c 'import jax; print("TPU cores:", jax.device_count())'
        resources:
          requests:
            cpu: 10
            memory: MEMORY_SIZE
            google.com/tpu: NUMBER_OF_CHIPS
          limits:
            cpu: 10
            memory: MEMORY_SIZE
            google.com/tpu: NUMBER_OF_CHIPS

Replace the following:
- NUMBER_OF_CHIPS: the number of TPU chips for the container to use. Must be the same value for limits and requests, equal to the count value (NUMBER_OF_CHIPS) in the selected custom ComputeClass.
- MEMORY_SIZE: the maximum amount of memory that the TPU uses. Memory limits depend on the TPU version and topology that you use. To learn more, see Minimums and maximums for accelerators.
Deploy the Job:
kubectl create -f tpu-job.yaml

When you create this Job, GKE automatically does the following:
- Provisions nodes to run the Pods. Depending on the TPU type, topology, and resource requests that you specified, these nodes are either single-host slices or multi-host slices. Depending on the availability of TPU resources in the top priority, GKE might fall back to lower priorities to maximize obtainability.
- Adds taints to the nodes and tolerations to the Pods to prevent any of your other workloads from running on the same nodes as TPU workloads.
To learn more, seeAbout custom ComputeClasses.
When you finish this section, you can avoid continued billing by deleting the resources you created:

kubectl delete -f tpu-job.yaml
Prepare your workloads
TPU workloads have the following preparation requirements.
- Frameworks like JAX, PyTorch, and TensorFlow access TPU VMs using the libtpu shared library. libtpu includes the XLA compiler, TPU runtime software, and the TPU driver. Each release of PyTorch and JAX requires a certain libtpu.so version. To avoid package version conflicts, we recommend using a JAX AI image. To use TPUs in GKE, ensure that you use the following versions:
  - Ironwood (TPU7x) (Preview), TPU type tpu7x:
    - Recommended JAX AI image: jax0.8.1-rev1 or later
    - Recommended jax[tpu] version: v0.8.1
  - TPU Trillium (v6e), TPU type tpu-v6e-slice:
    - Recommended JAX AI image: jax0.4.35-rev1 or later
    - Recommended jax[tpu] version: v0.4.9 or later
    - Recommended torchxla[tpuvm] version: v2.1.0 or later
  - TPU v5e, TPU type tpu-v5-lite-podslice:
    - Recommended JAX AI image: jax0.4.35-rev1 or later
    - Recommended jax[tpu] version: v0.4.9 or later
    - Recommended torchxla[tpuvm] version: v2.1.0 or later
  - TPU v5p, TPU type tpu-v5p-slice:
    - Recommended JAX AI image: jax0.4.35-rev1 or later
    - Recommended jax[tpu] version: 0.4.19 or later
    - Recommended torchxla[tpuvm] version: suggested to use a nightly version build on October 23, 2023
  - TPU v4, TPU type tpu-v4-podslice:
    - Recommended JAX AI image: jax0.4.35-rev1 or later
    - Recommended jax[tpu]: v0.4.4 or later
    - Recommended torchxla[tpuvm]: v2.0.0 or later
  - TPU v3, TPU types tpu-v3-slice and tpu-v3-device:
    - Recommended JAX AI image: jax0.4.35-rev1 or later
    - Recommended jax[tpu]: v0.4.4 or later
    - Recommended torchxla[tpuvm]: v2.0.0 or later
- In your workload manifest, add Kubernetes node selectors to ensure that GKE schedules your TPU workload on the TPU machine type and TPU topology you defined:

  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: TPU_ACCELERATOR
    cloud.google.com/gke-tpu-topology: TPU_TOPOLOGY
    cloud.google.com/placement-policy-name: WORKLOAD_POLICY # Required only for Ironwood (TPU7x)
Replace the following:
- TPU_ACCELERATOR: the name of the TPU accelerator. For example, use tpu7x-standard-4t.
- TPU_TOPOLOGY: the physical topology for the TPU slice. The format of the topology depends on the TPU version. For example, use 2x2x2. To learn more, see Plan TPUs in GKE.
- WORKLOAD_POLICY: the name of the workload policy that you want to use to place your TPU Pods. This node selector is required only for Ironwood (TPU7x).
After you complete the workload preparation, you can run a Job that uses TPUs.
The following sections show examples of how to run a Job that performs basic computation with TPUs.
Run your workload on TPU slice nodes
This section explains how to prepare your workloads and provides examples of how you can run them.
Example 1: Run a Deployment that requests TPUs in the Pod specification
GKE uses the configuration in your Pod or ComputeClass to determine the configuration of your TPU nodes. The following manifest is an example of a Deployment specification that requests TPUs in the Pod specification. If the cluster-level node auto-provisioning setting is enabled, this Deployment triggers node pool auto-creation. When you create this example Deployment, GKE creates a node pool that contains a TPU v4 slice with a 2x2x2 topology and two ct4p-hightpu-4t machines.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tpu-workload
  labels:
    app: tpu-workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tpu-workload
  template:
    metadata:
      labels:
        app: tpu-workload
    spec:
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
        cloud.google.com/gke-tpu-topology: 2x2x2
      containers:
      - name: tpu-job
        image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
        ports:
        - containerPort: 80
        - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
        securityContext:
          privileged: true # Required for GKE versions earlier than 1.28 to access TPUs.
        command:
        - bash
        - -c
        - |
          python -c 'import jax; print("Total TPU chips:", jax.device_count())'
        resources:
          requests:
            google.com/tpu: 4
          limits:
            google.com/tpu: 4

In this manifest, the following fields define the TPU configuration:
- cloud.google.com/gke-tpu-accelerator: the TPU version and type. For example, use tpu7x-standard-4t for Ironwood (TPU7x).
- cloud.google.com/gke-tpu-topology: the topology with the number and physical arrangement of TPU chips within a TPU slice. For example, use 2x2x2.
- limits.google.com/tpu: the number of TPU chips per VM. For example, if you use tpu7x-standard-4t, the number of TPU chips per VM is 4.
Example 2: Run a workload that displays the number of available TPU chips in a TPU slice node pool
The following workload returns the number of TPU chips across all of the nodes in a multi-host TPU slice. To create a multi-host slice, the workload has the following parameters:
- TPU version: TPU v4
- Topology: 2x2x4
This version and topology selection results in a multi-host slice.
- Save the following manifest as
available-chips-multihost.yaml:

apiVersion: v1
kind: Service
metadata:
  name: headless-svc
spec:
  clusterIP: None
  selector:
    job-name: tpu-available-chips
---
apiVersion: batch/v1
kind: Job
metadata:
  name: tpu-available-chips
spec:
  backoffLimit: 0
  completions: 4
  parallelism: 4
  completionMode: Indexed
  template:
    spec:
      subdomain: headless-svc
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice # Node selector to target TPU v4 slice nodes.
        cloud.google.com/gke-tpu-topology: 2x2x4 # Specifies the physical topology for the TPU slice.
      containers:
      - name: tpu-job
        image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
        ports:
        - containerPort: 8471 # Default port over which TPU VMs communicate.
        - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
        securityContext:
          privileged: true # Required for GKE versions earlier than 1.28 to access TPUs.
        command:
        - bash
        - -c
        - |
          python -c 'import jax; print("TPU cores:", jax.device_count())' # Python command to count available TPU chips.
        resources:
          requests:
            cpu: 10
            memory: 407Gi
            google.com/tpu: 4 # Request 4 TPU chips for this workload.
          limits:
            cpu: 10
            memory: 407Gi
            google.com/tpu: 4 # Limit to 4 TPU chips for this workload.
- Deploy the manifest:
kubectl create -f available-chips-multihost.yaml
GKE runs a TPU v4 slice with four VMs (multi-host TPU slice). The slice has 16 interconnected TPU chips.
- Verify that the Job created four Pods:
kubectl get pods
The output is similar to the following:
NAME                       READY   STATUS      RESTARTS   AGE
tpu-job-podslice-0-5cd8r   0/1     Completed   0          97s
tpu-job-podslice-1-lqqxt   0/1     Completed   0          97s
tpu-job-podslice-2-f6kwh   0/1     Completed   0          97s
tpu-job-podslice-3-m8b5c   0/1     Completed   0          97s
- Get the logs of one of the Pods:
kubectl logs POD_NAME

Replace POD_NAME with the name of one of the created Pods. For example, tpu-job-podslice-0-5cd8r.

The output is similar to the following:
TPU cores: 16
- Optional: Remove the workload:
kubectl delete -f available-chips-multihost.yaml
Example 3: Run a workload that displays the number of available TPU chips in the TPU slice
The following workload is a static Pod that displays the number of TPU chips that are attached to a specific node. To create a single-host node, the workload has the following parameters:
- TPU version: TPU v5e
- Topology: 2x4
This version and topology selection results in a single-host slice.
- Save the following manifest as
available-chips-singlehost.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: tpu-job-jax-v5
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice # Node selector to target TPU v5e slice nodes.
    cloud.google.com/gke-tpu-topology: 2x4 # Specify the physical topology for the TPU slice.
  containers:
  - name: tpu-job
    image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
    ports:
    - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
    securityContext:
      privileged: true # Required for GKE versions earlier than 1.28 to access TPUs.
    command:
    - bash
    - -c
    - |
      python -c 'import jax; print("Total TPU chips:", jax.device_count())'
    resources:
      requests:
        google.com/tpu: 8 # Request 8 TPU chips for this container.
      limits:
        google.com/tpu: 8 # Limit to 8 TPU chips for this container.
- Deploy the manifest:
kubectl create -f available-chips-singlehost.yaml
GKE provisions nodes with eight single-host TPU slices that use TPU v5e. Each TPU node has eight TPU chips (single-host TPU slice).
- Get the logs of the Pod:
kubectl logs tpu-job-jax-v5
The output is similar to the following:
Total TPU chips: 8
- Optional: Remove the workload:
kubectl delete -f available-chips-singlehost.yaml
Upgrade node pools using accelerators (GPUs and TPUs)
GKE automatically upgrades Standard clusters, including node pools. You can also manually upgrade node pools if you want your nodes on a later version sooner. To control how upgrades work for your cluster, use release channels, maintenance windows and exclusions, and rollout sequencing.
You can also configure a node upgrade strategy for your node pool, such as surge upgrades, blue-green upgrades, or short-lived upgrades. By configuring these strategies, you can ensure that the node pools are upgraded in a way that achieves the optimal balance between speed and disruption for your environment. For multi-host TPU slice node pools, instead of using the configured node upgrade strategy, GKE atomically recreates the entire node pool in a single step. To learn more, see the definition of atomicity in Terminology related to TPU in GKE.
Using a node upgrade strategy temporarily requires GKE to provision additional resources, depending on the configuration. If Google Cloud has limited capacity for your node pool's resources (for example, you're seeing resource availability errors when trying to create more nodes with GPUs or TPUs), see Upgrade in a resource-constrained environment.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this guide, consider deleting the TPU slice node pools that no longer have scheduled workloads. If running workloads must be gracefully terminated, use kubectl drain to clean up the workloads before you delete the node, as shown in the following example.
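For example, to drain a TPU slice node before deleting its node pool (a minimal sketch; NODE_NAME is a placeholder for the node that still runs workloads):

kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data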
Delete a TPU slice node pool:
gcloud container node-pools delete POOL_NAME \
    --location=LOCATION \
    --cluster=CLUSTER_NAME

Replace the following:
- POOL_NAME: The name of the node pool.
- CLUSTER_NAME: The name of the cluster.
- LOCATION: The compute location of the cluster.
Configure additional settings
The following sections describe the additional configurations you can apply to your TPU workloads.
Use Multislice
You can aggregate smaller slices together in a Multislice to handle larger training workloads. For more information, see Multislice TPUs in GKE.
Migrate your TPU reservation
If you have existing TPU reservations, you must first migrate your TPU reservation to a new Compute Engine-based reservation system. You can also create a Compute Engine-based reservation directly, in which case no migration is needed. To learn how to migrate your TPU reservations, see TPU reservation.
Enable logging
Logs emitted by containers running on GKE nodes, including TPU VMs, are collected by the GKE logging agent, sent to Logging, and are visible in Logging.
Configure auto repair for TPU slice nodes
If a TPU slice node in a multi-host TPU slice node pool is unhealthy, the entire node pool is recreated. In a single-host TPU slice node pool, only the unhealthy TPU node is auto-repaired.
Conditions that result in unhealthy TPU slice nodes include the following:
- Any TPU slice node with common node conditions.
- Any TPU slice node with an unallocatable TPU count larger than zero.
- Any VM instance in a TPU slice that is stopped (due to preemption) or is terminated.
- Node maintenance: If any TPU slice node within a multi-host TPU slice node pool goes down for host maintenance, GKE recreates the entire TPU slice node pool.
You can see the repair status (including the failure reason) in the operation history. If the failure is caused by insufficient quota, contact your Google Cloud account representative to increase the corresponding quota.
Configure graceful termination for TPU slice nodes
In GKE clusters with the control plane running 1.29.1-gke.1425000 or later, TPU slice nodes support SIGTERM signals that alert the node of an imminent shutdown. The imminent shutdown notification is configurable up to five minutes in TPU nodes.
To configure GKE to terminate your workloads gracefully within this notification timeframe, follow the steps in Manage GKE node disruption for GPUs and TPUs.
Run containers without privileged mode
Containers running on nodes in GKE version 1.28 or later don't need to have privileged mode enabled to access TPUs. Nodes in GKE versions earlier than 1.28 require privileged mode.
If your TPU slice node is running a version earlier than 1.28, read the following section:
A container running on a VM in a TPU slice needs access to higher limits on locked memory so the driver can communicate with the TPU chips over direct memory access (DMA). To enable this, you must configure a higher ulimit. If you want to reduce the permission scope on your container, complete the following steps:
Edit the securityContext to include the following fields:

securityContext:
  capabilities:
    add: ["SYS_RESOURCE"]

Increase ulimit by running the following command inside the container before setting up your workloads to use TPU resources:

ulimit -l 68719476736
For TPU v5e, running containers without privileged mode is available in clusters in version 1.27.4-gke.900 and later.
Observability and metrics
Dashboard
Node pool observability in the Google Cloud console is generally available. To view the status of your TPU multi-host node pools on GKE, go to the GKE TPU Node Pool Status dashboard provided by Cloud Monitoring:
Go to GKE TPU Node Pool Status
This dashboard gives you comprehensive insights into the health of your multi-host TPU node pools. For more information, see Monitor health metrics for TPU nodes and node pools.
In the Kubernetes Clusters page in the Google Cloud console, the Observability tab also displays TPU observability metrics, such as TPU usage, under the Accelerators > TPU heading. For more information, see View observability metrics.
The TPU dashboard is populated only if you have system metrics enabled in your GKE cluster.
Runtime metrics
In GKE version 1.27.4-gke.900 or later, TPU workloads that both use JAX version 0.4.14 or later and specify containerPort: 8431 export TPU utilization metrics as GKE system metrics. The following metrics are available in Cloud Monitoring to monitor your TPU workload's runtime performance:
- Duty cycle: percentage of time over the past sampling period (60 seconds) during which the TensorCores were actively processing on a TPU chip. A larger percentage means better TPU utilization.
- Memory used: amount of accelerator memory allocated in bytes. Sampled every 60 seconds.
- Memory total: total accelerator memory in bytes. Sampled every 60 seconds.
These metrics are located in the Kubernetes node (k8s_node) and Kubernetescontainer (k8s_container) schema.
Kubernetes container:
- kubernetes.io/container/accelerator/duty_cycle
- kubernetes.io/container/accelerator/memory_used
- kubernetes.io/container/accelerator/memory_total
Kubernetes node:
- kubernetes.io/node/accelerator/duty_cycle
- kubernetes.io/node/accelerator/memory_used
- kubernetes.io/node/accelerator/memory_total
Monitor health metrics for TPU nodes and node pools
When a training job has an error or terminates in failure, you can check metrics related to the underlying infrastructure to figure out if the interruption was caused by an issue with the underlying node or node pool.
Node status
In GKE version 1.32.1-gke.1357001 or later, the following GKE system metric exposes the condition of a GKE node:
kubernetes.io/node/status_condition
The condition field reports conditions on the node, such as Ready, DiskPressure, and MemoryPressure. The status field shows the reported status of the condition, which can be True, False, or Unknown. This is a metric with the k8s_node monitored resource type.
This PromQL query shows if a particular node is Ready:

kubernetes_io:node_status_condition{monitored_resource="k8s_node",cluster_name="CLUSTER_NAME",node_name="NODE_NAME",condition="Ready",status="True"}

To help troubleshoot issues in a cluster, you might want to look at nodes that have exhibited other conditions:

kubernetes_io:node_status_condition{monitored_resource="k8s_node",cluster_name="CLUSTER_NAME",condition!="Ready",status="True"}

You might want to specifically look at nodes that aren't Ready:

kubernetes_io:node_status_condition{monitored_resource="k8s_node",cluster_name="CLUSTER_NAME",condition="Ready",status="False"}

If there is no data, then the nodes are ready. The status condition is sampled every 60 seconds.
You can use the following query to understand the node status across the fleet:
avg by (condition, status)(avg_over_time(kubernetes_io:node_status_condition{monitored_resource="k8s_node"}[${__interval}]))

Node pool status
The following GKE system metric for the k8s_node_pool monitored resource exposes the status of a GKE node pool:
kubernetes.io/node_pool/status
This metric is reported only for multi-host TPU node pools.
The status field reports the status of the node pool, such as Provisioning, Running, Error, Reconciling, or Stopping. Status updates happen after GKE API operations complete.
To verify if a particular node pool has Running status, use the following PromQL query:

kubernetes_io:node_pool_status{monitored_resource="k8s_node_pool",cluster_name="CLUSTER_NAME",node_pool_name="NODE_POOL_NAME",status="Running"}

To monitor the number of node pools in your project grouped by their status, use the following PromQL query:

count by (status)(count_over_time(kubernetes_io:node_pool_status{monitored_resource="k8s_node_pool"}[${__interval}]))

Node pool availability
The following GKE system metric shows whether a multi-host TPU node pool is available:
kubernetes.io/node_pool/multi_host/available
The metric has a value of True if all of the nodes in the node pool are available, and False otherwise. The metric is sampled every 60 seconds.
To check the availability of multi-host TPU node pools in your project, use the following PromQL query:

avg by (node_pool_name)(avg_over_time(kubernetes_io:node_pool_multi_host_available{monitored_resource="k8s_node_pool",cluster_name="CLUSTER_NAME"}[${__interval}]))

Node interruption count
The following GKE system metric reports the count of interruptions for a GKE node since the last sample (the metric is sampled every 60 seconds):
kubernetes.io/node/interruption_count
The interruption_type (such as TerminationEvent, MaintenanceEvent, or PreemptionEvent) and interruption_reason (like HostError, Eviction, or AutoRepair) fields can help provide the reason why a node was interrupted.
To get a breakdown of the interruptions and their causes in TPU nodes in the clusters in your project, use the following PromQL query:

sum by (interruption_type, interruption_reason)(sum_over_time(kubernetes_io:node_interruption_count{monitored_resource="k8s_node"}[${__interval}]))

To only see the host maintenance events, update the query to filter the HW/SW Maintenance value for the interruption_reason. Use the following PromQL query:

sum by (interruption_type, interruption_reason)(sum_over_time(kubernetes_io:node_interruption_count{monitored_resource="k8s_node",interruption_reason="HW/SW Maintenance"}[${__interval}]))

To see the interruption count aggregated by node pool, use the following PromQL query:

sum by (node_pool_name, interruption_type, interruption_reason)(sum_over_time(kubernetes_io:node_pool_interruption_count{monitored_resource="k8s_node_pool",interruption_reason="HW/SW Maintenance",node_pool_name=NODE_POOL_NAME}[${__interval}]))

Node pool times to recover (TTR)
The following GKE system metric reports the distribution of recovery period durations for GKE multi-host TPU node pools:
kubernetes.io/node_pool/accelerator/times_to_recover
Each sample recorded in this metric indicates a single recovery event for the node pool from a downtime period.
This metric is useful for tracking the multi-host TPU node pool time to recover and time between interruptions.
You can use the following PromQL query to calculate the mean time to recovery (MTTR) for the last 7 days in your cluster:
sum(sum_over_time(kubernetes_io:node_pool_accelerator_times_to_recover_sum{monitored_resource="k8s_node_pool",cluster_name="CLUSTER_NAME"}[7d]))
/
sum(sum_over_time(kubernetes_io:node_pool_accelerator_times_to_recover_count{monitored_resource="k8s_node_pool",cluster_name="CLUSTER_NAME"}[7d]))

Node pool times between interruptions (TBI)
Node pool times between interruptions measures how long your infrastructure runs before experiencing an interruption. It is computed as the average over a window of time, where the numerator measures the total time that your infrastructure was up and the denominator measures the total interruptions to your infrastructure.
The following PromQL example shows the 7-day mean time between interruptions (MTBI) for the given cluster:
sum(count_over_time(kubernetes_io:node_memory_total_bytes{monitored_resource="k8s_node",node_name=~"gke-tpu.*|gk3-tpu.*",cluster_name="CLUSTER_NAME"}[7d]))
/
sum(sum_over_time(kubernetes_io:node_interruption_count{monitored_resource="k8s_node",node_name=~"gke-tpu.*|gk3-tpu.*",cluster_name="CLUSTER_NAME"}[7d]))

Host metrics
In GKE version 1.28.1-gke.1066000 or later, VMs in a TPU slice export TPU utilization metrics as GKE system metrics. The following metrics are available in Cloud Monitoring to monitor your TPU host's performance:
- TensorCore utilization: current percentage of the TensorCore that is utilized. The TensorCore value equals the sum of the matrix-multiply units (MXUs) plus the vector unit. The TensorCore utilization value is the division of the TensorCore operations that were performed over the past sample period (60 seconds) by the supported number of TensorCore operations over the same period. A larger value means better utilization.
- Memory bandwidth utilization: current percentage of the accelerator memory bandwidth that is being used. Computed by dividing the memory bandwidth used over a sample period (60s) by the maximum supported bandwidth over the same sample period.
These metrics are located in the Kubernetes node (k8s_node) and Kubernetes container (k8s_container) schemas.
Kubernetes container:
kubernetes.io/container/accelerator/tensorcore_utilization
kubernetes.io/container/accelerator/memory_bandwidth_utilization
Kubernetes node:
kubernetes.io/node/accelerator/tensorcore_utilization
kubernetes.io/node/accelerator/memory_bandwidth_utilization
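As an example, the following PromQL sketch charts the average TensorCore and memory bandwidth utilization per TPU node over the last hour. The kubernetes_io:node_accelerator_... metric names are assumed to follow the same PromQL naming pattern as the interruption and recovery metrics shown earlier on this page; adjust the names and the CLUSTER_NAME placeholder for your environment.

# Average TensorCore utilization per TPU node over the last hour (assumed PromQL metric name)
avg by (node_name) (avg_over_time(kubernetes_io:node_accelerator_tensorcore_utilization{monitored_resource="k8s_node", cluster_name="CLUSTER_NAME"}[1h]))

# Average memory bandwidth utilization per TPU node over the last hour (assumed PromQL metric name)
avg by (node_name) (avg_over_time(kubernetes_io:node_accelerator_memory_bandwidth_utilization{monitored_resource="k8s_node", cluster_name="CLUSTER_NAME"}[1h]))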
For more information, see Kubernetes metrics and GKE system metrics.
Manage collection scheduling
In TPU Trillium, you can use collection scheduling to group TPU slice nodes. Grouping these TPU slice nodes makes it easier to adjust the number of replicas to meet the workload demand. Google Cloud controls software updates to ensure that sufficient slices within the collection are always available to serve traffic.
TPU Trillium supports collection scheduling for single-host and multi-host node pools that run inference workloads. The following list describes how collection scheduling behavior depends on the type of TPU slice that you use:
- Multi-host TPU slice: GKE groups multi-host TPU slices to form a collection. Each GKE node pool is a replica within this collection. To define a collection, create a multi-host TPU slice and assign a unique name to the collection. To add more TPU slices to the collection, create another multi-host TPU slice node pool with the same collection name and workload type (see the example sketch after this list).
- Single-host TPU slice: GKE considers the entire single-host TPU slice node pool as a collection. To add more TPU slices to the collection, you can resize the single-host TPU slice node pool.
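As an illustration only, the following sketch shows how a multi-host TPU slice node pool could carry the collection labels described on this page by setting them as node labels at creation time. The machine type, TPU topology, node count, and placeholder names are assumptions; replace them with the values from your own TPU configuration planning.

# Illustrative values only: ct6e-standard-4t with a 4x4 topology assumes a TPU Trillium slice spread across 4 hosts.
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --node-locations=NODE_ZONE \
    --machine-type=ct6e-standard-4t \
    --tpu-topology=4x4 \
    --num-nodes=4 \
    --node-labels=cloud.google.com/gke-nodepool-group-name=COLLECTION_NAME,cloud.google.com/gke-workload-type=HIGH_AVAILABILITY

Running the same command with a different node pool name and the same collection name would add another replica (TPU slice) to the collection.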
To manage a collection, perform any of these actions based on the type of node pool that you use.
Manage collection scheduling in multi-host TPU slice node pools
Use the following tasks to manage multi-host TPU slice node pools.
To check if a multi-host TPU slice pool is part of a collection, run the following command:
gcloud container node-pools describe NODE_POOL_NAME \
    --location LOCATION \
    --cluster CLUSTER_NAME \
    --format="json" | jq -r \
    '"nodepool-group-name: \(.config.labels["cloud.google.com/gke-nodepool-group-name"] // "")\ngke-workload-type: \(.config.labels["cloud.google.com/gke-workload-type"] // "")"'

The output is similar to the following:

nodepool-group-name: NODE_POOL_COLLECTION_NAME
gke-workload-type: HIGH_AVAILABILITY

If the multi-host TPU slice pool is part of a collection, the output has the following labels:

cloud.google.com/gke-workload-type: HIGH_AVAILABILITY
cloud.google.com/gke-nodepool-group-name: COLLECTION_NAME
To get the list of collections in the cluster, run the following command:
#!/bin/bash
# Replace with your cluster name, project, and location
CLUSTER_NAME=CLUSTER_NAME
PROJECT=PROJECT_ID
LOCATION=LOCATION

declare -A collection_names

node_pools=$(gcloud container node-pools list --cluster "$CLUSTER_NAME" --project "$PROJECT" --location "$LOCATION" --format="value(name)")

# Iterate over each node pool
for pool in $node_pools; do
  # Describe the node pool and extract the collection label using jq.
  # The "// empty" fallback skips node pools that don't have the label.
  collection_name=$(gcloud container node-pools describe "$pool" \
    --cluster "$CLUSTER_NAME" \
    --project "$PROJECT" \
    --location "$LOCATION" \
    --format="json" | jq -r '.config.labels["cloud.google.com/gke-nodepool-group-name"] // empty')

  # Add the collection name to the associative array if it's not empty
  if [[ -n "$collection_name" ]]; then
    collection_names["$collection_name"]=1
  fi
done

# Print the unique node pool collection names
echo "Unique cloud.google.com/gke-nodepool-group-name values:"
for name in "${!collection_names[@]}"; do
  echo "$name"
done

The output is similar to the following:

Unique cloud.google.com/gke-nodepool-group-name values:
COLLECTION_NAME_1
COLLECTION_NAME_2
COLLECTION_NAME_3

To get a list of node pools that belong to a collection, run the following command:
#!/bin/bash
TARGET_COLLECTION_NAME=COLLECTION_NAME
CLUSTER_NAME=CLUSTER_NAME
PROJECT=PROJECT_ID
LOCATION=LOCATION

matching_node_pools=()

# Get the list of all node pools in the cluster
node_pools=$(gcloud container node-pools list --cluster "$CLUSTER_NAME" --project "$PROJECT" --location "$LOCATION" --format="value(name)")

# Iterate over each node pool
for pool in $node_pools; do
  # Get the value of the cloud.google.com/gke-nodepool-group-name label
  collection_name=$(gcloud container node-pools describe "$pool" \
    --cluster "$CLUSTER_NAME" \
    --project "$PROJECT" \
    --location "$LOCATION" \
    --format="json" | jq -r '.config.labels["cloud.google.com/gke-nodepool-group-name"]')

  # Check if the collection name matches the target value
  if [[ "$collection_name" == "$TARGET_COLLECTION_NAME" ]]; then
    matching_node_pools+=("$pool")
  fi
done

# Print the list of matching node pools
echo "Node pools with collection name '$TARGET_COLLECTION_NAME':"
for pool in "${matching_node_pools[@]}"; do
  echo "$pool"
done

The output is similar to the following:
Node pools with collection name 'COLLECTION_NAME':
NODE_POOL_NAME_1
NODE_POOL_NAME_2
NODE_POOL_NAME_3

To scale up the collection, create another multi-host TPU slice node pool and add the cloud.google.com/gke-workload-type and cloud.google.com/gke-nodepool-group-name labels. Use the same collection name in cloud.google.com/gke-nodepool-group-name and run the same workload type. If node auto-provisioning is enabled on the cluster, GKE automatically creates node pools based on workload demands.

To scale down the collection, delete the node pool.
To delete the collection, remove all of the attached node pools (see the example below). You can delete the node pool or delete the cluster. Deleting the cluster removes all of the collections in it.
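For reference, a minimal sketch of removing one attached node pool with the gcloud CLI; NODE_POOL_NAME, CLUSTER_NAME, and LOCATION are placeholders for your own values:

# Deletes a single node pool; repeat for each node pool that carries the collection's label
gcloud container node-pools delete NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location LOCATION

Repeat the command for every node pool that carries the collection's cloud.google.com/gke-nodepool-group-name label to remove the collection entirely.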
Manage collection scheduling in single-host TPU slice node pools
Use the following tasks to manage single-host TPU slice node pools.
To check if a single-host TPU slice pool has collection scheduling enabled, run the following command:
gcloud container node-pools describe NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --project PROJECT_NAME \
    --location LOCATION \
    --format="json" | jq -r '.config.labels["cloud.google.com/gke-workload-type"]'

The output is similar to the following:

HIGH_AVAILABILITY

If the single-host TPU slice pool is part of a collection, the output has the cloud.google.com/gke-workload-type: HIGH_AVAILABILITY label.

To scale up the collection, resize the node pool manually or automatically with node auto-provisioning (see the example sketch after these steps).
To scale down the collection, delete the node pool.

To delete the collection, remove all of the attached node pools. You can delete the node pool or delete the cluster. Deleting the cluster removes all of the collections in it.
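As an illustration, resizing a single-host TPU slice node pool to scale the collection up or down might look like the following. NUM_NODES and the other placeholders are assumptions to replace with your own values, and depending on your gcloud CLI version you might need --zone or --region instead of --location:

# Resizes the single-host TPU slice node pool to the requested node count
gcloud container clusters resize CLUSTER_NAME \
    --node-pool NODE_POOL_NAME \
    --num-nodes NUM_NODES \
    --location LOCATION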
Known issues
- Cluster autoscaler might incorrectly calculate capacity for new TPU slice nodes before those nodes report available TPUs. Cluster autoscaler might then perform additional scale-up and, as a result, create more nodes than needed. Cluster autoscaler scales down the additional nodes, if they are not needed, during the regular scale-down operation.
- Cluster autoscaler cancels scale-up of TPU slice node pools that remain in waiting status for more than 10 hours. Cluster autoscaler retries such scale-up operations later. This behavior might reduce TPU obtainability for customers who don't use reservations.
- Non-TPU workloads that have a toleration for the TPU taint can prevent scale-down of the node pool if they are being recreated during the draining of the TPU slice node pool.
- The memory bandwidth utilization metric is not available for v5e TPUs.
What's next
- Learn more about setting up Ray on GKE with TPUs
- Build large-scale machine learning on Cloud TPUs with GKE
- Serve Large Language Models with KubeRay on TPUs
- Troubleshoot TPUs in GKE
- Learn about sandboxing TPU workloads with GKE Sandbox