Deploy TPU workloads in GKE Standard

This page provides a foundation for learning how to accelerate machine learning (ML) workloads using TPUs in Google Kubernetes Engine (GKE). TPUs are designed for matrix multiplication processing, such as large-scale deep learning model training. TPUs are optimized to handle the enormous datasets and complex models of ML and therefore are more cost-effective and energy efficient for ML workloads due to their superior performance. In this guide, you learn how to deploy ML workloads by using Cloud TPU accelerators, configure quotas for TPUs, configure upgrades for node pools that run TPUs, and monitor TPU workload metrics.

This tutorial is intended for Machine learning (ML) engineers and Platform admins and operators who are interested in using Kubernetes container orchestration to manage large-scale model training, tuning, and inference workloads using TPUs. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.

Before reading this page, ensure that you're familiar with the following:

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document. Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you use primarily zonal clusters, set the compute/zone property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set. An example of setting a default location follows this list.
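For example, you can set a default location for the gcloud CLI as follows. The region and zone names here are only illustrations; use a location that has your TPU capacity:

    # Set a default region for gcloud commands (example value).
    gcloud config set compute/region us-west4

    # Or, if you primarily use zonal clusters, set a default zone instead.
    gcloud config set compute/zone us-west4-a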

Plan your TPU configuration

Plan your TPU configuration based on your model and how much memory it requires. Before you use this guide to deploy your workloads on TPU, complete the planning steps in Plan your TPU configuration.

Ensure that you have TPU quota

The following sections help you ensure that you have enough quota when using TPUs in GKE.

Quota for on-demand or Spot VMs

If you are creating a TPU slice node pool with on-demand or Spot VMs, you must have sufficient TPU quota available in the region that you want to use.

Creating a TPU slice node pool that consumes a TPU reservation does not require any TPU quota.¹ You may safely skip this section for reserved TPUs.

Creating an on-demand or Spot TPU slice node pool in GKE requires Compute Engine API quota. Compute Engine API quota (compute.googleapis.com) is not the same as Cloud TPU API quota (tpu.googleapis.com), which is needed when creating TPUs with the Cloud TPU API.

To check the limit and current usage of your Compute Engine API quota for TPUs, follow these steps:

  1. Go to the Quotas page in the Google Cloud console:

    Go to Quotas

  2. In the Filter box, do the following:

    1. Use the following table to select and copy the property of the quota based on the TPU version and machine type. For example, if you plan to create on-demand TPU v5e nodes whose machine type begins with ct5lp-, enter Name: TPU v5 Lite PodSlice chips.

      | TPU version, machine type begins with | Property and name of the quota for on-demand instances | Property and name of the quota for Spot² instances |
      |---|---|---|
      | TPU v3, ct3- | Dimensions (e.g. location): tpu_family:CT3 | Not applicable |
      | TPU v3, ct3p- | Dimensions (e.g. location): tpu_family:CT3P | Not applicable |
      | TPU v4, ct4p- | Name: TPU v4 PodSlice chips | Name: Preemptible TPU v4 PodSlice chips |
      | TPU v5e, ct5lp- | Name: TPU v5 Lite PodSlice chips | Name: Preemptible TPU v5 Lite PodSlice chips |
      | TPU v5p, ct5p- | Name: TPU v5p chips | Name: Preemptible TPU v5p chips |
      | TPU Trillium, ct6e- | Dimensions (e.g. location): tpu_family:CT6E | Name: Preemptible TPU slices v6e |
      | Ironwood (TPU7x) (Preview), tpu7x-standard-4t | Dimensions (e.g. location): tpu_family:tpu7x | Name: Preemptible TPU slices tpu7x |
    2. Select the Dimensions (e.g. locations) property and enter region: followed by the name of the region in which you plan to create TPUs in GKE. For example, enter region:us-west4 if you plan to create TPU slice nodes in the zone us-west4-a. TPU quota is regional, so all zones within the same region consume the same TPU quota.

If no quotas match the filter you entered, then the project has not been granted any of the specified quota for the region that you need, and you must request a TPU quota adjustment.
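As an alternative to the console, you can inspect a region's Compute Engine quotas from the command line. This is a minimal sketch; the region is an example value, and the exact TPU quota metric names that appear depend on what has been granted to your project:

    # List quota metrics, limits, and usage for a region (example region).
    gcloud compute regions describe us-west4 \
        --flatten="quotas[]" \
        --format="table(quotas.metric,quotas.limit,quotas.usage)"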

When a TPU reservation is created, both the limit and current use values for the corresponding quota increase by the number of chips in the TPU reservation. For example, when a reservation is created for 16 TPU v5e chips whose machine type begins with ct5lp-, then both the Limit and Current usage for the TPU v5 Lite PodSlice chips quota in the relevant region increase by 16.

  1. When creating a TPU slice node pool, use the --reservation and --reservation-affinity=specific flags to create a reserved instance. TPU reservations are available when purchasing a commitment.

  2. When creating a TPU slice node pool, use the --spot flag to create a Spot instance.

Quotas for additional GKE resources

You may need to increase the following GKE-related quotas in the regions where GKE creates your resources.

  • Persistent Disk SSD (GB) quota: The boot disk of each Kubernetes node requires 100 GB by default. Therefore, this quota should be set at least as high as the product of the maximum number of GKE nodes you anticipate creating and 100 GB (nodes * 100 GB).
  • In-use IP addresses quota: Each Kubernetes node consumes one IP address. Therefore, this quota should be set at least as high as the maximum number of GKE nodes you anticipate creating.
  • Ensure that max-pods-per-node aligns with the subnet range: Each Kubernetes node uses secondary IP ranges for Pods. For example, a max-pods-per-node of 32 requires 64 IP addresses, which translates to a /26 subnet per node. Note that this range shouldn't be shared with any other cluster. To avoid exhausting the IP address range, use the --max-pods-per-node flag to limit the number of Pods allowed to be scheduled on a node. Ensure that the subnet's secondary ranges are large enough for the maximum number of GKE nodes you anticipate creating.

To request an increase in quota, see Request a quota adjustment.

Ensure reservation availability

To create a TPU slice node pool using a reservation, the reservation must have sufficient available TPU chips at the time of node pool creation.

To see which reservations exist within a project and how many TPU chips within a TPU reservation are available, view a list of your reservations.
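For example, you can list the Compute Engine reservations in your project and inspect a specific one with the gcloud CLI. The reservation name and zone here are placeholders:

    # List reservations in the project.
    gcloud compute reservations list --project=PROJECT_ID

    # Show details, including the available count, for one reservation.
    gcloud compute reservations describe RESERVATION_NAME \
        --zone=ZONE \
        --project=PROJECT_ID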

Create a cluster

You can create a cluster that uses TPUs by using the Google Cloud CLI or the Accelerated Processing Kit (XPK).

  • Use the Google Cloud CLI to manually create your GKE cluster instance for precise customization or expansion of existing production GKE environments.
  • Use XPK to quickly create GKE clusters and run workloads for proof-of-concept and testing. For more information and instructions, see the XPK README.

The following document describes how to configure TPUs using the Google Cloud CLI.

Create a GKE cluster in Standard mode in a region with available TPUs.

Best practice:

Use regional clusters, which provide high availability of the Kubernetes control plane.

gcloud container clusters create CLUSTER_NAME \
    --location LOCATION \
    --cluster-version VERSION

Replace the following:

  • CLUSTER_NAME: the name of the new cluster.
  • LOCATION: the region with your TPU capacity available.
  • VERSION: the GKE version, which must support the machine type that you want to use. Note that the default GKE version might not have availability for your target TPU. To learn the minimum GKE versions available by TPU machine type, see TPU availability in GKE.
Important: To use Ironwood (TPU7x) (Preview), you must create the cluster in the Rapid release channel. After the cluster is created, you can switch to no channel or use maintenance exclusions to manage upgrades.
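For example, a hedged invocation with sample values (the cluster name and region are illustrations; pick a region that has your TPU capacity, and the --release-channel=rapid flag is only needed if you plan to use Ironwood):

    gcloud container clusters create tpu-cluster \
        --location us-west4 \
        --release-channel rapid

This example omits --cluster-version, so the channel's default version applies.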

Provision TPUs

To provision TPUs in GKE, you have the following configuration options:
  • Manually create a node pool: you can create a node pool with a specific TPU version and topology.
  • Use GKE node auto-provisioning: you can enable node auto-provisioning at the cluster level and then, in your Pod's manifest, use a nodeSelector to specify the TPU version and topology. When a pending Pod matches these selectors, GKE automatically creates a new node pool that meets the request. This method requires you to set cluster-level resource limits for TPUs.
  • Define custom ComputeClasses: you can request TPUs by using custom ComputeClasses. Custom ComputeClasses let platform administrators define a hierarchy of node configurations for GKE to prioritize during node scaling decisions, so that workloads run on your selected hardware.

Manually create a node pool

You can create a single-host or multi-host TPU slice node pool.

Create a single-host TPU slice node pool

You can create a single-host TPU slice node pool using the Google Cloud CLI, Terraform, or the Google Cloud console.

gcloud

gcloud container node-pools create NODE_POOL_NAME \
    --location=LOCATION \
    --cluster=CLUSTER_NAME \
    --node-locations=NODE_ZONES \
    --machine-type=MACHINE_TYPE \
    [--sandbox=type=gvisor]

Replace the following:

  • NODE_POOL_NAME: the name of the new node pool.
  • LOCATION: the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE.
  • CLUSTER_NAME: the name of the cluster.
  • NODE_ZONES: the comma-separated list of one or more zones where GKE creates the node pool.
  • MACHINE_TYPE: the TPU version and type. For example, use tpu7x-standard-4t for Ironwood (TPU7x).

Optionally, you can also use the following flags:

  • --num-nodes=NUM_NODES: The initial number of nodes in the node pool in each zone.

    Best practice:

    If you use the enable-autoscaling flag for the node pool, set num-nodes to 0 so that the autoscaler provisions additional nodes as soon as your workloads demand them.

  • --reservation=RESERVATION_NAME: The name of the reservation GKE uses when creating the node pool. If you omit this flag, GKE uses available TPUs. To learn more about TPU reservations, see About Cloud TPU reservations.

  • --node-labels=cloud.google.com/gke-workload-type=HIGH_AVAILABILITY: Tells GKE that the single-host TPU slice node pool is part of a collection. Use this flag if the following conditions apply:

    • The node pool runs inference workloads.
    • The node pool uses TPU Trillium.
    • The node pool doesn't use Spot VMs.

    To learn more about collection scheduling management, see Manage collection scheduling in single-host TPU slices.

  • --enable-autoscaling: Create a node pool with autoscaling enabled. Requires the following additional flags:

    • --total-min-nodes=TOTAL_MIN_NODES: Minimum number of all nodes in the node pool.
    • --total-max-nodes=TOTAL_MAX_NODES: Maximum number of all nodes in the node pool.
    • --location-policy=ANY: Prioritize usage of unused reservations and reduce the preemption risk of Spot VMs.
  • --spot: Sets the node pool to use Spot VMs for the nodes in the node pool. This cannot be changed after node pool creation.

  • --flex-start: Sets the node pool to use Flex-start VMs. Flex-start VMs are created by using the flex-start consumption option. For more information, see Run a small batch workload with TPUs and Flex-start VMs.

For a full list of all the flags that you can specify, see the gcloud container node-pools create reference.
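For example, a hedged single-host node pool creation with sample values (the names, zone, and machine type are illustrations; confirm the machine type and zone against TPU availability in GKE):

    gcloud container node-pools create tpu-v5e-pool \
        --location=us-west4 \
        --cluster=tpu-cluster \
        --node-locations=us-west4-a \
        --machine-type=ct5lp-hightpu-1t \
        --enable-autoscaling \
        --num-nodes=0 \
        --total-min-nodes=0 \
        --total-max-nodes=4 \
        --location-policy=ANY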

Terraform

  1. Ensure that you use version 4.84.0 or later of the google provider.
  2. Add the following block to your Terraform configuration:
resource"google_container_node_pool""NODE_POOL_RESOURCE_NAME"{provider=googleproject=PROJECT_IDcluster=CLUSTER_NAMEname=POOL_NAMElocation=CLUSTER_LOCATIONnode_locations=[NODE_ZONES]node_config{machine_type=MACHINE_TYPEreservation_affinity{consume_reservation_type="SPECIFIC_RESERVATION"key="compute.googleapis.com/reservation-name"values=[RESERVATION_LABEL_VALUES]}spot=trueflex_start=false}}

Replace the following:

  • NODE_POOL_RESOURCE_NAME: The name of the node pool resource in the Terraform template.
  • PROJECT_ID: Your project ID.
  • CLUSTER_NAME: The name of the existing cluster.
  • POOL_NAME: The name of the node pool to create.
  • CLUSTER_LOCATION: The compute zone(s) of the cluster. Specify the region where the TPU version is available. To learn more, see Select a TPU version and topology.
  • NODE_ZONES: The comma-separated list of one or more zones where GKE creates the node pool.
  • MACHINE_TYPE: The type of TPU machine to use. To see TPU compatible machine types, use the table in Choose the TPU version.

Optionally, you can also use the following variables:

  • autoscaling: Create a node pool with autoscaling enabled. For a single-host TPU slice, GKE scales between the TOTAL_MIN_NODES and TOTAL_MAX_NODES values.
    • TOTAL_MIN_NODES: Minimum number of all nodes in the node pool. This field is optional unless autoscaling is also specified.
    • TOTAL_MAX_NODES: Maximum number of all nodes in the node pool. This field is optional unless autoscaling is also specified.
  • RESERVATION_NAME: If you use a TPU reservation, this is the list of labels of the reservation resources to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider.
  • spot: Sets the node pool to use Spot VMs for the TPU nodes. This cannot be changed after node pool creation. For more information, see Spot VMs.
  • flex_start: Sets the node pool to use the flex-start consumption option. Can't be set to true if spot is enabled. Flex-start is supported in GKE version 1.33.0-gke.1712000 or later.

Console

To create a node pool with TPUs:

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

    Go to Google Kubernetes Engine

  2. In the cluster list, click the name of the cluster you want to modify.

  3. Click Add node pool.

  4. In the Node pool details section, check the Specify node locations box.

  5. Select the zone based on the TPU version you want to use. To identify an available zone, see TPU availability in GKE.

  6. From the navigation pane, click Nodes.

  7. In the Machine Configuration section, select TPUs.

  8. In the Series drop-down menu, select one of the following:

    • CT3: TPU v3, single-host device
    • CT3P: TPU v3, multi-host pod slice
    • CT4P: TPU v4
    • CT5LP: TPU v5e
    • CT5P: TPU v5p
    • CT6E: TPU Trillium (v6e)
  9. In the Machine type drop-down menu, select the name of the machine to use for nodes. Use the Choose the TPU version table to learn how to define the machine type and TPU topology that create a single-host TPU slice node pool.

  10. In the TPU Topology drop-down menu, select the physical topology for the TPU slice.

  11. In the Changes needed dialog, click Make changes.

  12. Ensure that Boot disk type is either Standard persistent disk or SSD persistent disk.

  13. Optionally, select the Enable nodes on spot VMs checkbox to use Spot VMs for the nodes in the node pool.

  14. Click Create.

Create a multi-host TPU slice node pool

The steps to create a multi-host TPU slice node pool differ depending on whether you use Ironwood (TPU7x) or an earlier TPU version.

Ironwood (TPU7x)

Preview — Ironwood (TPU7x)

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

You can create a multi-host TPU slice node pool with Ironwood (TPU7x) by using the Google Cloud CLI or Terraform:

gcloud

To create a multi-host TPU slice node pool with Ironwood (TPU7x), you must first create a workload policy.

Note: You don't need to create a new workload policy for every node pool. A workload policy is unique per project, per region, and per topology. You can reuse the same workload policy for multiple node pools that share these characteristics. To see the list of workload policies, use the gcloud compute resource-policies list --filter="region:REGION" command.
  1. Create a workload policy:

    gcloud compute resource-policies create workload-policy WORKLOAD_POLICY_NAME \
        --type=HIGH_THROUGHPUT \
        --accelerator-topology=TPU_TOPOLOGY \
        --project=PROJECT_ID \
        --region=REGION

    Replace the following:

    • WORKLOAD_POLICY_NAME: a name for your workload policy.
    • TPU_TOPOLOGY: the TPU Ironwood (TPU7x) topology. For example, 2x2x2.
    • PROJECT_ID: your Google Cloud project ID.
    • REGION: the region for the workload policy. A workload policy is a regional resource, and you can use it across node pools.

  2. Create the node pool with the workload policy:

    gcloud container node-pools create NODE_POOL_NAME \
        --cluster=CLUSTER_NAME \
        --machine-type=tpu7x-standard-4t \
        --placement-policy=WORKLOAD_POLICY_NAME \
        --location=CONTROL_PLANE_LOCATION \
        --node-locations=NODE_ZONE \
        --project=PROJECT_ID \
        --reservation=RESERVATION_NAME \
        --reservation-affinity=specific

    Replace the following:

    • NODE_POOL_NAME: the name for your new node pool.
    • CLUSTER_NAME: the name of your GKE cluster.
    • WORKLOAD_POLICY_NAME: the name of the workload policy you created.
    • CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
    • NODE_ZONE: the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE.
    • PROJECT_ID: your Google Cloud project ID.
    • RESERVATION_NAME: the name of the reservation to use.

    In this command, the --tpu-topology flag has been replaced by the --placement-policy flag.
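For example, a hedged end-to-end invocation with sample values (the policy, cluster, project, location, and reservation names are illustrations; Ironwood is in Preview, so confirm availability for your project and region):

    gcloud compute resource-policies create workload-policy tpu7x-policy \
        --type=HIGH_THROUGHPUT \
        --accelerator-topology=2x2x2 \
        --project=my-project \
        --region=us-central1

    gcloud container node-pools create tpu7x-pool \
        --cluster=tpu-cluster \
        --machine-type=tpu7x-standard-4t \
        --placement-policy=tpu7x-policy \
        --location=us-central1 \
        --node-locations=us-central1-a \
        --project=my-project \
        --reservation=my-tpu-reservation \
        --reservation-affinity=specific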

Terraform

  1. Ensure that you use version 4.84.0 or later of the google provider.
  2. Create a workload policy:

    resource"google_compute_resource_policy"{name="WORKLOAD_POLICY_NAME"region=CLUSTER_LOCATIONworkload_policy{type="HIGH_THROUGHPUT"accelerator_topology="TPU_TOPOLOGY"}}

    Replace the following:

    • WORKLOAD_POLICY_NAME: a name for your workload policy.
    • CLUSTER_LOCATION: Compute location for the cluster. We recommend having a regional cluster for higher reliability of the Kubernetes control plane. You can also use a zonal cluster. For more information, see Select a TPU version and topology.
    • TPU_TOPOLOGY: the TPU Ironwood (TPU7x) topology. For example, 2x2x2. To see all supported Ironwood (TPU7x) topologies, see Plan TPUs.

    For more information about the google_compute_resource_policy reference, see Terraform Provider.

  3. In your Terraform configuration, add the following block:

    resource"google_container_node_pool""NODE_POOL_RESOURCE_NAME"{provider=googleproject=PROJECT_IDcluster=CLUSTER_NAMEname=POOL_NAMElocation=CLUSTER_LOCATIONnode_locations=[NODE_ZONES]initial_node_count=NUM_NODESautoscaling{max_node_count=MAX_NODESlocation_policy="ANY"}node_config{machine_type=MACHINE_TYPEreservation_affinity{consume_reservation_type="SPECIFIC_RESERVATION"key="compute.googleapis.com/reservation-name"values=[RESERVATION_LABEL_VALUES]}flex_start=falsespot=true}placement_policy{policy_name=WORKLOAD_POLICY_NAME}}

    Replace the following:

    • NODE_POOL_RESOURCE_NAME: the name of the node pool resource in the Terraform template.
    • PROJECT_ID: your project ID.
    • CLUSTER_NAME: the name of the existing cluster to add the node pool to.
    • POOL_NAME: the name of the node pool to create.
    • NODE_ZONES: the comma-separated list of one or more zones where GKE creates the node pool.
    • NUM_NODES: the number of nodes in the node pool. It must be zero or the number of TPU chips divided by four, because in multi-host TPU slices each TPU slice node has four chips. For example, if TPU_TOPOLOGY is 4x8, then there are 32 chips, which means NUM_NODES must be 8. To learn more about TPU topologies, use the table in Choose the TPU version.
    • TPU_TOPOLOGY: this indicates the selected physical topology for the TPU slice. The format of the topology depends on the TPU version you are using. To learn more about TPU topologies, use the table in Choose a topology.

    Optionally, you can also use the following variables:

    • RESERVATION_NAME: if you use a TPU reservation, provide a list of reservation-resource labels to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider.
    • autoscaling: create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.
      • MAX_NODES: the maximum size of the node pool. The value must be equal to the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM. For example, if TPU_TOPOLOGY is 2x2x2, the product is 8. Since each VM in tpu7x-standard-4t has 4 chips, the number of nodes is 2.
    • spot: sets the node pool to use Spot VMs for the TPU slice nodes. This setting cannot be changed after the node pool is created. For more information, see Spot VMs.
    • flex_start: sets the node pool to use the flex-start consumption option. This setting can't be set to true if spot is enabled.

Other TPU versions

You can create a multi-host TPU slice node pool for TPU v3, v4, v5p, v5e, and Trillium (v6e) by using the Google Cloud CLI, Terraform, or the Google Cloud console.

gcloud

gcloud container node-pools create POOL_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --cluster=CLUSTER_NAME \
    --node-locations=NODE_ZONE \
    --machine-type=MACHINE_TYPE \
    --tpu-topology=TPU_TOPOLOGY \
    [--num-nodes=NUM_NODES] \
    [--spot] \
    [--flex-start] \
    [--enable-autoscaling --max-nodes MAX_NODES] \
    [--reservation-affinity=specific --reservation=RESERVATION_NAME] \
    [--node-labels cloud.google.com/gke-nodepool-group-name=COLLECTION_NAME,cloud.google.com/gke-workload-type=HIGH_AVAILABILITY] \
    [--placement-type=COMPACT]

Replace the following:

  • POOL_NAME: the name of the new node pool.
  • CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
  • CLUSTER_NAME: the name of the cluster.
  • NODE_ZONE: the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE.
  • MACHINE_TYPE: the type of machine to use for nodes. To learn more about the available machine types, see Choose the TPU version.
  • TPU_TOPOLOGY: the physical topology for the TPU slice. The format of the topology depends on the TPU version. For more information about TPU topologies, use the table in Choose a topology.

    For more information, see Topology.

    Optionally, you can also use the following flags:

  • NUM_NODES: the number of nodes in the node pool. It must be zero or the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM. For multi-host TPU v4 and TPU v5e, the number of chips in each VM is four. Therefore, if your TPU_TOPOLOGY is 2x4x4 (TPU v4 with four chips in each VM), then NUM_NODES is 32/4, which equals 8. If you omit this flag, the number of nodes is calculated and defaulted based on the topology and machine type.

  • RESERVATION_NAME: the name of the reservation GKE uses when creating the node pool. If you omit this flag, GKE uses available TPUs. For more information about TPU reservations, see TPU reservation.

  • --spot: sets the node pool to use Spot VMs for the TPU slice nodes. This cannot be changed after node pool creation. For more information, see Spot VMs.

  • --flex-start: sets the node pool to use Flex-start VMs. Flex-start VMs are created by using the flex-start consumption option, which is supported in GKE version 1.33.0-gke.1712000 or later.

  • --enable-autoscaling: Create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.

    • MAX_NODES: the maximum size of the node pool. The --max-nodes flag is required if --enable-autoscaling is supplied and must be equal to the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM.
  • --node-labels=cloud.google.com/gke-nodepool-group-name=COLLECTION_NAME,cloud.google.com/gke-workload-type=HIGH_AVAILABILITY: Tells GKE that the multi-host TPU slice node pool is a collection. Use this flag if the following conditions apply:

    • The node pool runs inference workloads.
    • The node pool uses TPU Trillium.
    • The node pool doesn't use Spot VMs, because Spot VMs don't support collection scheduling.

    For more information about collection scheduling management, see Manage collection scheduling in multi-host TPU slices.

  • --placement-type=COMPACT: Create a node pool with compact placement enabled. This option must be used with the --tpu-topology flag. For more information, see Create a compact placement policy and TPU Topology.
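For example, a hedged multi-host invocation with sample values (the names, zone, and reservation are illustrations; a 2x2x4 TPU v4 topology has 16 chips and 4 chips per VM, so the node pool has 4 nodes):

    gcloud container node-pools create tpu-v4-pool \
        --location=us-central2 \
        --cluster=tpu-cluster \
        --node-locations=us-central2-b \
        --machine-type=ct4p-hightpu-4t \
        --tpu-topology=2x2x4 \
        --num-nodes=4 \
        --reservation-affinity=specific \
        --reservation=my-tpu-reservation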

Terraform

  1. Ensure that you use version 4.84.0 or later of the google provider.
  2. Add the following block to your Terraform configuration:

    resource"google_container_node_pool""NODE_POOL_RESOURCE_NAME"{provider=googleproject=PROJECT_IDcluster=CLUSTER_NAMEname=POOL_NAMElocation=CLUSTER_LOCATIONnode_locations=[NODE_ZONES]initial_node_count=NUM_NODESautoscaling{max_node_count=MAX_NODESlocation_policy="ANY"}node_config{machine_type=MACHINE_TYPEreservation_affinity{consume_reservation_type="SPECIFIC_RESERVATION"key="compute.googleapis.com/reservation-name"values=[RESERVATION_LABEL_VALUES]}flex_start=falsespot=true}placement_policy{type="COMPACT"tpu_topology=TPU_TOPOLOGY}}

    Replace the following:

    • NODE_POOL_RESOURCE_NAME: the name of the node pool resource in the Terraform template.
    • PROJECT_ID: your project ID.
    • CLUSTER_NAME: the name of the existing cluster to add the node pool to.
    • POOL_NAME: the name of the node pool to create.
    • CLUSTER_LOCATION: compute location for the cluster. We recommend having a regional cluster for higher reliability of the Kubernetes control plane. You can also use a zonal cluster. To learn more, see Select a TPU version and topology.
    • NODE_ZONES: the comma-separated list of one or more zones where GKE creates the node pool.
    • NUM_NODES: the number of nodes in the node pool. It must be zero or the number of TPU chips divided by four, because in multi-host TPU slices each TPU slice node has 4 chips. For example, if TPU_TOPOLOGY is 4x8, then there are 32 chips, which means NUM_NODES must be 8. To learn more about TPU topologies, use the table in Choose the TPU version.
    • TPU_TOPOLOGY: this indicates the physical topology for the TPU slice. The format of the topology depends on the TPU version you are using. To learn more about TPU topologies, use the table in Choose a topology.

    Optionally, you can also use the following variables:

    • RESERVATION_NAME: if you use a TPU reservation, this is the list of labels of the reservation resources to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider.
    • autoscaling: Create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.
      • MAX_NODES: the maximum size of the node pool. It must be equal to the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM.
    • spot: sets the node pool to use Spot VMs for the TPU slice nodes. This cannot be changed after node pool creation. For more information, see Spot VMs.
    • flex_start: sets the node pool to use the flex-start consumption option. Can't be set to true if spot is enabled.

Console

To create a node pool with TPUs:

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

    Go to Google Kubernetes Engine

  2. In the cluster list, click the name of the cluster you want to modify.

  3. Click Add node pool.

  4. In the Node pool details section, check the Specify node locations box.

  5. Select the name of the zone based on the TPU version you want to use. To identify an available location, see TPU availability in GKE.

  6. From the navigation pane, click Nodes.

  7. In the Machine Configuration section, select TPUs.

  8. In the Series drop-down menu, select one of the following:

    • CT3: TPU v3, single-host device
    • CT3P: TPU v3, multi-host pod slice
    • CT4P: TPU v4
    • CT5LP: TPU v5e
    • CT5P: TPU v5p
    • CT6E: TPU Trillium (v6e)
  9. In the Machine type drop-down menu, select the name of the machine to use for nodes. Use the Choose the TPU version table to learn how to define the machine type and TPU topology that create a multi-host TPU slice node pool.

  10. In the TPU Topology drop-down menu, select the physical topology for the TPU slice.

  11. In the Changes needed dialog, click Make changes.

  12. Ensure that Boot disk type is either Standard persistent disk or SSD persistent disk.

  13. Optionally, select the Enable nodes on spot VMs checkbox to use Spot VMs for the nodes in the node pool.

  14. Click Create.

Use GKE node auto-provisioning

You can configure GKE to automatically create and delete node pools to meet the resource demands of your TPU workloads.

  1. To enable node pool auto-provisioning, edit your cluster TPU resource limits:

    gcloud container clusters update CLUSTER_NAME \
        --location=CONTROL_PLANE_LOCATION \
        --enable-autoprovisioning \
        --min-cpu=MINIMUM_CPU \
        --min-memory=MINIMUM_MEMORY \
        --max-cpu=MAXIMUM_CPU \
        --max-memory=MAXIMUM_MEMORY \
        --min-accelerator=type=TPU_TYPE,count=MINIMUM_TPU_COUNT \
        --max-accelerator=type=TPU_TYPE,count=MAXIMUM_TPU_COUNT

    Replace the following:

    • TPU_TYPE: the TPU type. For example, use tpu7x-standard-4t for Ironwood (TPU7x).
    • MINIMUM_TPU_COUNT: the minimum number of TPU chips of the specified type that the cluster can have. If the value that you specify is larger than the number of TPU chips in a multi-host TPU slice, GKE removes all nodes in the slice. Multi-host node pools scale between 0 and the number of nodes in the slice, with no intermediate values.
    • MAXIMUM_TPU_COUNT: the maximum number of TPU chips of the specified type that the cluster can have. For multi-host TPU slices, specify a value that's greater than the number of chips in each slice so that GKE can scale the slice atomically. The number of chips in a slice is the product of the TPU topology. For example, if the topology is 2x2x2, the number of chips in the slice is 8, which means that the value of MAXIMUM_TPU_COUNT must be greater than 8.
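For example, a hedged update with sample limits (the cluster name, location, and numbers are illustrations; the TPU_TYPE value follows the tpu7x-standard-4t example above and should match your TPU generation):

    gcloud container clusters update tpu-cluster \
        --location=us-central1 \
        --enable-autoprovisioning \
        --min-cpu=0 \
        --min-memory=0 \
        --max-cpu=1000 \
        --max-memory=4000 \
        --min-accelerator=type=tpu7x-standard-4t,count=0 \
        --max-accelerator=type=tpu7x-standard-4t,count=64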

Define custom ComputeClasses

You can also configure GKE to request TPUs during scaling operations that create new nodes by using custom ComputeClasses.

You can specify TPU configuration options in your custom ComputeClass specification. When a GKE workload uses that custom ComputeClass, GKE attempts to provision TPUs that use your specified configuration when scaling up.

The following sections show you how to create a custom ComputeClass and thencreate a Job that consumes the TPUs defined in the ComputeClass.

Create a custom ComputeClass

The steps to create a custom ComputeClass that follows the TPU rules differ depending on whether you use Ironwood (TPU7x) or an earlier TPU version.

Ironwood (TPU7x)

Preview — Ironwood (TPU7x)

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

  1. Create a workload policy. This step is required only if you are creating a multi-host node pool, which depends on the topology you choose. If you use a single-host node pool, skip this step.

    gcloud compute resource-policies create workload-policy WORKLOAD_POLICY_NAME \
        --type=HIGH_THROUGHPUT \
        --accelerator-topology=TPU_TOPOLOGY \
        --project=PROJECT_ID \
        --region=REGION

    Replace the following:

    • WORKLOAD_POLICY_NAME: a name for your workload policy.
    • TPU_TOPOLOGY: the TPU Ironwood (TPU7x) topology. For example, use 2x2x2. For more information about all supported Ironwood (TPU7x) topologies, see the topology section.
    • PROJECT_ID: Your Google Cloud project ID.
    • REGION: The region for the workload policy. A workload policy is a regional resource and you can use it across node pools.
  2. Save the following manifest as tpu-compute-class.yaml:

    apiVersion: cloud.google.com/v1
    kind: ComputeClass
    metadata:
      name: tpu-class
    spec:
      priorities:
      - tpu:
          type: tpu7x
          topology: TPU_TOPOLOGY
          count: 4
        placement:
          policyName: WORKLOAD_POLICY_NAME
      nodePoolAutoCreation:
        enabled: true
  3. (Optional) You can consume a specific reservation or sub-block. For example, you can add the following specs to your ComputeClass manifest:

    reservations:
      affinity: Specific
      specific:
      - name: RESERVATION_NAME
        reservationBlock:
          name: RESERVATION_BLOCK_NAME
        reservationSubBlock:
          name: RESERVATION_SUB_BLOCK_NAME

    Replace the following:

    • RESERVATION_NAME: the name of the Compute Engine capacity reservation.
    • RESERVATION_BLOCK_NAME: the name of the Compute Engine capacity reservation block.
    • RESERVATION_SUB_BLOCK_NAME: the name of the Compute Engine capacity reservation sub-block.

    For more information, see Consuming reserved zonal resources.

Other TPU versions

To provision v3, v4, v5p, v5e, or v6e (Trillium) TPUs by using a custom ComputeClass configured for TPUs, complete the following steps:

  1. Save the following manifest as tpu-compute-class.yaml:

    apiVersion: cloud.google.com/v1
    kind: ComputeClass
    metadata:
      name: tpu-class
    spec:
      priorities:
      - tpu:
          type: TPU_TYPE
          count: NUMBER_OF_CHIPS
          topology: TOPOLOGY
      - spot: true
        tpu:
          type: TPU_TYPE
          count: NUMBER_OF_CHIPS
          topology: TOPOLOGY
      - flexStart:
          enabled: true
        tpu:
          type: TPU_TYPE
          count: NUMBER_OF_CHIPS
          topology: TOPOLOGY
      nodePoolAutoCreation:
        enabled: true

    Replace the following:

    • TPU_TYPE: the TPU type to use, like tpu-v4-podslice. Must be a value supported by GKE.
    • TOPOLOGY: the arrangement of TPU chips in the slice, like 2x2x4. Must be a supported topology for the selected TPU type.
    • NUMBER_OF_CHIPS: the number of TPU chips for the container to use. Must be the same value for limits and requests.
  2. Deploy the ComputeClass:

    kubectl apply -f tpu-compute-class.yaml

    For more information about custom ComputeClasses and TPUs, see TPU configuration.
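After the apply, you can optionally confirm that the ComputeClass exists in your cluster. A minimal sketch; it assumes the tpu-class name used in the preceding manifests:

    # List custom ComputeClasses and inspect the one you created.
    kubectl get computeclasses
    kubectl describe computeclass tpu-class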

Create a Job that consumes TPUs

  1. Save the following manifest as tpu-job.yaml:

    apiVersion: v1
    kind: Service
    metadata:
      name: headless-svc
    spec:
      clusterIP: None
      selector:
        job-name: tpu-job
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tpu-job
    spec:
      backoffLimit: 0
      completions: 4
      parallelism: 4
      completionMode: Indexed
      template:
        spec:
          subdomain: headless-svc
          restartPolicy: Never
          nodeSelector:
            cloud.google.com/compute-class: tpu-class
          containers:
          - name: tpu-job
            image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
            ports:
            - containerPort: 8471 # Default port using which TPU VMs communicate
            - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
            command:
            - bash
            - -c
            - |
              python -c 'import jax; print("TPU cores:", jax.device_count())'
            resources:
              requests:
                cpu: 10
                memory: MEMORY_SIZE
                google.com/tpu: NUMBER_OF_CHIPS
              limits:
                cpu: 10
                memory: MEMORY_SIZE
                google.com/tpu: NUMBER_OF_CHIPS

    Replace the following:

    • NUMBER_OF_CHIPS: the number of TPU chips for the container to use. Must be the same value for limits and requests, equal to the count value in the selected custom ComputeClass.
    • MEMORY_SIZE: the maximum amount of memory that the TPU uses. Memory limits depend on the TPU version and topology that you use. To learn more, see Minimums and maximums for accelerators.
  2. Deploy the Job:

    kubectl create -f tpu-job.yaml

    When you create this Job, GKE automatically does the following:

    • Provisions nodes to run the Pods. Depending on the TPU type, topology, and resource requests that you specified, these nodes are either single-host slices or multi-host slices. Depending on the availability of TPU resources in the top priority, GKE might fall back to lower priorities to maximize obtainability.
    • Adds taints to the nodes and tolerations to the Pods to prevent any of your other workloads from running on the same nodes as TPU workloads.

    To learn more, see About custom ComputeClasses. To check provisioning and Job progress, see the verification commands after this procedure.

  3. When you finish this section, you can avoid continued billing by deleting the resources you created:

    kubectl delete -f tpu-job.yaml
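Before you delete the resources, you can check that GKE provisioned TPU nodes for the ComputeClass and that the Job completed. A minimal sketch; the label selector and names assume the tpu-class and tpu-job examples above:

    # Nodes created for the ComputeClass (the label value is an assumption).
    kubectl get nodes -l cloud.google.com/compute-class=tpu-class

    # Job status and Pod logs.
    kubectl get job tpu-job
    kubectl logs -l job-name=tpu-job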

Prepare your workloads

TPU workloads have the following preparation requirements.

  1. Frameworks like JAX, PyTorch, and TensorFlow access TPU VMs by using the libtpu shared library. libtpu includes the XLA compiler, TPU runtime software, and the TPU driver. Each release of PyTorch and JAX requires a certain libtpu.so version. To avoid package version conflicts, we recommend using a JAX AI image. To use TPUs in GKE, use the following TPU type values and recommended library versions:

     | TPU type | GKE TPU type value |
     |---|---|
     | Ironwood (TPU7x) (Preview) | tpu7x |
     | TPU Trillium (v6e) | tpu-v6e-slice |
     | TPU v5e | tpu-v5-lite-podslice |
     | TPU v5p | tpu-v5p-slice |
     | TPU v4 | tpu-v4-podslice |
     | TPU v3 | tpu-v3-slice, tpu-v3-device |

     Recommended libtpu.so and library versions:

     • Recommended JAX AI image: jax0.4.35-rev1 or later
     • Recommended jax[tpu] version: 0.4.19 or later
     • Recommended torchxla[tpuvm] version: a nightly version build on October 23, 2023
  2. In your workload manifest, add Kubernetes node selectors to ensure that GKE schedules your TPU workload on the TPU machine type and TPU topology you defined:

      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: TPU_ACCELERATOR
        cloud.google.com/gke-tpu-topology: TPU_TOPOLOGY
        cloud.google.com/placement-policy-name: WORKLOAD_POLICY # Required only for Ironwood (TPU7x)

    Replace the following:

    • TPU_ACCELERATOR: the name of the TPU accelerator. For example, use tpu7x-standard-4t.
    • TPU_TOPOLOGY: the physical topology for the TPU slice. The format of the topology depends on the TPU version. For example, use 2x2x2. To learn more, see Plan TPUs in GKE.
    • WORKLOAD_POLICY: the name of the workload policy that you want to use to place your TPU Pods. This node selector is required only for Ironwood (TPU7x).

After you complete the workload preparation, you can run a Job that uses TPUs.

The following sections show examples of how to run a Job that performs basic computation with TPUs.

Run your workload on TPU slice nodes

This section explains how to prepare your workloads and provides examples of how you can run your workloads.

Example 1: Run a Deployment that requests TPUs in the Pod specification

GKE uses the configuration in your Pod or ComputeClass to determine the configuration of your TPU nodes. The following manifest is an example of a Deployment specification that requests TPUs in the Pod specification. If the cluster-level node auto-provisioning setting is enabled, this Deployment triggers node pool auto-creation. When you create this example Deployment, GKE creates a node pool that contains a TPU v4 slice with a 2x2x2 topology and two ct4p-hightpu-4t machines.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tpu-workload
  labels:
    app: tpu-workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tpu-workload
  template:
    metadata:
      labels:
        app: tpu-workload
    spec:
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
        cloud.google.com/gke-tpu-topology: 2x2x2
      containers:
      - name: tpu-job
        image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
        ports:
        - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
        - containerPort: 80
        securityContext:
          privileged: true # Required for GKE versions earlier than 1.28 to access TPUs.
        command:
        - bash
        - -c
        - |
          python -c 'import jax; print("Total TPU chips:", jax.device_count())'
        resources:
          requests:
            google.com/tpu: 4
          limits:
            google.com/tpu: 4

In this manifest, the following fields define TPU configuration:

  • cloud.google.com/gke-tpu-accelerator: the TPU version and type. For example, use tpu7x-standard-4t for Ironwood (TPU7x).
  • cloud.google.com/gke-tpu-topology: the topology with the number and physical arrangement of TPU chips within a TPU slice. For example, use 2x2x2.
  • limits.google.com/tpu: the number of TPU chips per VM. For example, if you use tpu7x-standard-4t, the number of TPU chips per VM is 4.

Example 2: Run a workload that displays the number of available TPU chips in a TPU slice node pool

The following workload returns the number of TPU chips across all of the nodes in a multi-host TPU slice. To create a multi-host slice, the workload has the following parameters:

  • TPU version: TPU v4
  • Topology: 2x2x4

This version and topology selection results in a multi-host slice.

  1. Save the following manifest as available-chips-multihost.yaml:
    apiVersion: v1
    kind: Service
    metadata:
      name: headless-svc
    spec:
      clusterIP: None
      selector:
        job-name: tpu-available-chips
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tpu-available-chips
    spec:
      backoffLimit: 0
      completions: 4
      parallelism: 4
      completionMode: Indexed
      template:
        spec:
          subdomain: headless-svc
          restartPolicy: Never
          nodeSelector:
            cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice # Node selector to target TPU v4 slice nodes.
            cloud.google.com/gke-tpu-topology: 2x2x4 # Specifies the physical topology for the TPU slice.
          containers:
          - name: tpu-job
            image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
            ports:
            - containerPort: 8471 # Default port using which TPU VMs communicate
            - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
            securityContext:
              privileged: true # Required for GKE versions earlier than 1.28 to access TPUs.
            command:
            - bash
            - -c
            - |
              python -c 'import jax; print("TPU cores:", jax.device_count())' # Python command to count available TPU chips.
            resources:
              requests:
                cpu: 10
                memory: 407Gi
                google.com/tpu: 4 # Request 4 TPU chips for this workload.
              limits:
                cpu: 10
                memory: 407Gi
                google.com/tpu: 4 # Limit to 4 TPU chips for this workload.
  2. Deploy the manifest:
    kubectl create -f available-chips-multihost.yaml

    GKE runs a TPU v4 slice with four VMs (multi-host TPU slice). The slice has 16 interconnected TPU chips.

  3. Verify that the Job created four Pods:
    kubectl get pods

    The output is similar to the following:

    NAME                       READY   STATUS      RESTARTS   AGE
    tpu-job-podslice-0-5cd8r   0/1     Completed   0          97s
    tpu-job-podslice-1-lqqxt   0/1     Completed   0          97s
    tpu-job-podslice-2-f6kwh   0/1     Completed   0          97s
    tpu-job-podslice-3-m8b5c   0/1     Completed   0          97s
  4. Get the logs of one of the Pods:
    kubectl logs POD_NAME

    Replace POD_NAME with the name of one of the created Pods. For example, tpu-job-podslice-0-5cd8r.

    The output is similar to the following:

    TPU cores: 16
  5. Optional: Remove the workload:
    kubectl delete -f available-chips-multihost.yaml

Example 3: Run a workload that displays the number of available TPU chips in the TPU slice

The following workload is a static Pod that displays the number of TPU chips that are attached to a specific node. To create a single-host node, the workload has the following parameters:

  • TPU version: TPU v5e
  • Topology: 2x4

This version and topology selection results in a single-host slice.

  1. Save the following manifest as available-chips-singlehost.yaml:
    apiVersion: v1
    kind: Pod
    metadata:
      name: tpu-job-jax-v5
    spec:
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice # Node selector to target TPU v5e slice nodes.
        cloud.google.com/gke-tpu-topology: 2x4 # Specify the physical topology for the TPU slice.
      containers:
      - name: tpu-job
        image: us-docker.pkg.dev/cloud-tpu-images/jax-ai-image/tpu:latest
        ports:
        - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
        securityContext:
          privileged: true # Required for GKE versions earlier than 1.28 to access TPUs.
        command:
        - bash
        - -c
        - |
          python -c 'import jax; print("Total TPU chips:", jax.device_count())'
        resources:
          requests:
            google.com/tpu: 8 # Request 8 TPU chips for this container.
          limits:
            google.com/tpu: 8 # Limit to 8 TPU chips for this container.
  2. Deploy the manifest:
    kubectl create -f available-chips-singlehost.yaml

    GKE provisions a single-host TPU slice node that uses TPU v5e. The node has eight TPU chips (a single-host TPU slice).

  3. Get the logs of the Pod:
    kubectl logs tpu-job-jax-v5

    The output is similar to the following:

    Total TPU chips: 8
  4. Optional: Remove the workload:
      kubectl delete -f available-chips-singlehost.yaml

Upgrade node pools using accelerators (GPUs and TPUs)

GKE automatically upgrades Standard clusters, including node pools. You can also manually upgrade node pools if you want your nodes on a later version sooner. To control how upgrades work for your cluster, use release channels, maintenance windows and exclusions, and rollout sequencing.

You can also configure a node upgrade strategy for your node pool, such as surge upgrades, blue-green upgrades, or short-lived upgrades. By configuring these strategies, you can ensure that the node pools are upgraded in a way that achieves the optimal balance between speed and disruption for your environment. For multi-host TPU slice node pools, instead of using the configured node upgrade strategy, GKE atomically recreates the entire node pool in a single step. To learn more, see the definition of atomicity in Terminology related to TPU in GKE.

Using a node upgrade strategy temporarily requires GKE to provision additional resources, depending on the configuration. If Google Cloud has limited capacity for your node pool's resources (for example, you're seeing resource availability errors when trying to create more nodes with GPUs or TPUs), see Upgrade in a resource-constrained environment.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this guide, consider deleting the TPU slice node pools that no longer have scheduled workloads. If the running workloads must be gracefully terminated, use kubectl drain to clean up the workloads before you delete the node.

  1. Delete a TPU slice node pool:

    gcloud container node-pools delete POOL_NAME \
        --location=LOCATION \
        --cluster=CLUSTER_NAME

    Replace the following:

    • POOL_NAME: The name of the node pool.
    • CLUSTER_NAME: The name of the cluster.
    • LOCATION: The compute location of the cluster.

Configure additional settings

The following sections describe the additional configurations you can apply to your TPU workloads.

Use Multislice

You can aggregate smaller slices together in a Multislice to handle larger training workloads. For more information, see Multislice TPUs in GKE.

Migrate your TPU reservation

If you have existing TPU reservations, you must first migrate your TPU reservation to the new Compute Engine-based reservation system. You can also create a Compute Engine-based reservation, in which case no migration is needed. To learn how to migrate your TPU reservations, see TPU reservation.

Enable logging

Logs emitted by containers running on GKE nodes, including TPU VMs, are collected by the GKE logging agent, sent to Logging, and are visible in Logging.

Configure auto repair for TPU slice nodes

If a TPU slice node in a multi-host TPU slice node pool is unhealthy, the entire node pool is recreated. In a single-host TPU slice node pool, only the unhealthy TPU node is auto-repaired.

Conditions that result in unhealthy TPU slice nodes include the following:

  • Any TPU slice node with common node conditions.
  • Any TPU slice node with an unallocatable TPU count larger than zero.
  • Any VM instance in a TPU slice that is stopped (due to preemption) or is terminated.
  • Node maintenance: If any TPU slice node within a multi-host TPU slice node pool goes down for host maintenance, GKE recreates the entire TPU slice node pool.

You can see the repair status (including the failure reason) in the operation history. If the failure is caused by insufficient quota, contact your Google Cloud account representative to increase the corresponding quota.
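For example, you can review recent operations for your cluster location from the command line. This is a minimal sketch; the location is a placeholder, and the operation type used in the filter is an assumption, so you can also omit the filter and scan the output:

    # List recent operations; the AUTO_REPAIR_NODES type name is an assumption.
    gcloud container operations list \
        --location=LOCATION \
        --filter="operationType=AUTO_REPAIR_NODES"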

Configure graceful termination for TPU slice nodes

In GKE clusters with the control plane running 1.29.1-gke.1425000 or later, TPU slice nodes support SIGTERM signals that alert the node of an imminent shutdown. The imminent shutdown notification is configurable up to five minutes in TPU nodes.

To configure GKE to terminate your workloads gracefully within this notification timeframe, follow the steps in Manage GKE node disruption for GPUs and TPUs.

Run containers without privileged mode

Containers running in nodes in GKE version 1.28 or later don't need to have privileged mode enabled to access TPUs. Nodes in GKE versions earlier than 1.28 require privileged mode.

If your TPU slice node is running a version earlier than 1.28, read the following section:

A container running on a VM in a TPU slice needs access to higher limits on locked memory so the driver can communicate with the TPU chips over direct memory access (DMA). To enable this, you must configure a higher ulimit. If you want to reduce the permission scope on your container, complete the following steps:

  1. Edit the securityContext to include the following fields:

    securityContext:
      capabilities:
        add: ["SYS_RESOURCE"]
  2. Increase ulimit by running the following command inside the container before setting up your workloads to use TPU resources:

    ulimit -l 68719476736

For TPU v5e, running containers without privileged mode is available in clusters in version 1.27.4-gke.900 and later.

Observability and metrics

Dashboard

Node pool observability in the Google Cloud console is generally available. To view the status of your TPU multi-host node pools on GKE, go to the GKE TPU Node Pool Status dashboard provided by Cloud Monitoring:

Go to GKE TPU Node Pool Status

This dashboard gives you comprehensive insights into the health of your multi-host TPU node pools. For more information, see Monitor health metrics for TPU nodes and node pools.

In the Kubernetes Clusters page in the Google Cloud console, the Observability tab also displays TPU observability metrics, such as TPU usage, under the Accelerators > TPU heading. For more information, see View observability metrics.

The TPU dashboard is populated only if you have system metrics enabled in your GKE cluster.

Runtime metrics

In GKE version 1.27.4-gke.900 or later, TPU workloads that both use JAX version 0.4.14 or later and specify containerPort: 8431 export TPU utilization metrics as GKE system metrics. The following metrics are available in Cloud Monitoring to monitor your TPU workload's runtime performance:

  • Duty cycle: percentage of time over the past sampling period (60 seconds) during which the TensorCores were actively processing on a TPU chip. A larger percentage means better TPU utilization.
  • Memory used: amount of accelerator memory allocated in bytes. Sampled every 60 seconds.
  • Memory total: total accelerator memory in bytes. Sampled every 60 seconds.

These metrics are located in the Kubernetes node (k8s_node) and Kubernetes container (k8s_container) schema.

Kubernetes container:

  • kubernetes.io/container/accelerator/duty_cycle
  • kubernetes.io/container/accelerator/memory_used
  • kubernetes.io/container/accelerator/memory_total

Kubernetes node:

  • kubernetes.io/node/accelerator/duty_cycle
  • kubernetes.io/node/accelerator/memory_used
  • kubernetes.io/node/accelerator/memory_total

Monitor health metrics for TPU nodes and node pools

When a training job has an error or terminates in failure, you can check metrics related to the underlying infrastructure to figure out if the interruption was caused by an issue with the underlying node or node pool.

Node status

In GKE version 1.32.1-gke.1357001 or later, the following GKE system metric exposes the condition of a GKE node:

  • kubernetes.io/node/status_condition

The condition field reports conditions on the node, such as Ready, DiskPressure, and MemoryPressure. The status field shows the reported status of the condition, which can be True, False, or Unknown. This is a metric with the k8s_node monitored resource type.

This PromQL query shows if a particular node is Ready:

kubernetes_io:node_status_condition{monitored_resource="k8s_node",cluster_name="CLUSTER_NAME",node_name="NODE_NAME",condition="Ready",status="True"}

To help troubleshoot issues in a cluster, you might want to look at nodes that have exhibited other conditions:

kubernetes_io:node_status_condition{monitored_resource="k8s_node",cluster_name="CLUSTER_NAME",condition!="Ready",status="True"}

You might want to specifically look at nodes that aren't Ready:

kubernetes_io:node_status_condition{monitored_resource="k8s_node",cluster_name="CLUSTER_NAME",condition="Ready",status="False"}

If there is no data, then the nodes are ready. The status condition is sampledevery 60 seconds.

You can use the following query to understand the node status across the fleet:

avg by (condition, status) (avg_over_time(kubernetes_io:node_status_condition{monitored_resource="k8s_node"}[${__interval}]))

Node pool status

The following GKE system metric for the k8s_node_pool monitored resource exposes the status of a GKE node pool:

  • kubernetes.io/node_pool/status

This metric is reported only for multi-host TPU node pools.

The status field reports the status of the node pool, such as Provisioning, Running, Error, Reconciling, or Stopping. Status updates happen after GKE API operations complete.

To verify if a particular node pool has the Running status, use the following PromQL query:

kubernetes_io:node_pool_status{monitored_resource="k8s_node_pool",cluster_name="CLUSTER_NAME",node_pool_name="NODE_POOL_NAME",status="Running"}

To monitor the number of node pools in your project grouped by their status,use the following PromQL query:

count by (status) (count_over_time(kubernetes_io:node_pool_status{monitored_resource="k8s_node_pool"}[${__interval}]))

Node pool availability

The following GKE system metric shows whether a multi-host TPU node pool is available:

  • kubernetes.io/node_pool/multi_host/available

The metric has a value of True if all of the nodes in the node pool are available, and False otherwise. The metric is sampled every 60 seconds.

To check the availability of multi-host TPU node pools in your project, use the following PromQL query:

avg by (node_pool_name) (avg_over_time(kubernetes_io:node_pool_multi_host_available{monitored_resource="k8s_node_pool",cluster_name="CLUSTER_NAME"}[${__interval}]))

Node interruption count

The following GKE system metric reports the count of interruptions for a GKE node since the last sample (the metric is sampled every 60 seconds):

  • kubernetes.io/node/interruption_count

The interruption_type (such as TerminationEvent, MaintenanceEvent, or PreemptionEvent) and interruption_reason (like HostError, Eviction, or AutoRepair) fields can help provide the reason for why a node was interrupted.

To get a breakdown of the interruptions and their causes in TPU nodes in the clusters in your project, use the following PromQL query:

sum by (interruption_type, interruption_reason) (sum_over_time(kubernetes_io:node_interruption_count{monitored_resource="k8s_node"}[${__interval}]))

To only see the host maintenance events, update the query to filter the HW/SW Maintenance value for the interruption_reason. Use the following PromQL query:

sum by (interruption_type, interruption_reason) (sum_over_time(kubernetes_io:node_interruption_count{monitored_resource="k8s_node",interruption_reason="HW/SW Maintenance"}[${__interval}]))

To see the interruption count aggregated by node pool, use the following PromQL query:

sum by (node_pool_name, interruption_type, interruption_reason) (sum_over_time(kubernetes_io:node_pool_interruption_count{monitored_resource="k8s_node_pool",interruption_reason="HW/SW Maintenance",node_pool_name="NODE_POOL_NAME"}[${__interval}]))

Node pool times to recover (TTR)

The following GKE system metric reports the distribution of recovery period durations for GKE multi-host TPU node pools:

  • kubernetes.io/node_pool/accelerator/times_to_recover

Each sample recorded in this metric indicates a single recovery event for the node pool from a downtime period.

This metric is useful for tracking the multi-host TPU node pool time to recover and time between interruptions.

You can use the following PromQL query to calculate the mean time to recovery (MTTR) for the last 7 days in your cluster:

sum(sum_over_time(kubernetes_io:node_pool_accelerator_times_to_recover_sum{monitored_resource="k8s_node_pool",cluster_name="CLUSTER_NAME"}[7d]))/sum(sum_over_time(kubernetes_io:node_pool_accelerator_times_to_recover_count{monitored_resource="k8s_node_pool",cluster_name="CLUSTER_NAME"}[7d]))

Node pool times between interruptions (TBI)

Node pool times between interruptions measures how long your infrastructure runs before experiencing an interruption. It is computed as the average over a window of time, where the numerator measures the total time that your infrastructure was up and the denominator measures the total interruptions to your infrastructure.

The following PromQL example shows the 7-day mean time between interruptions (MTBI) for the given cluster. The numerator approximates total uptime by counting the 60-second samples of a per-node metric across the TPU nodes, and the denominator sums the node interruptions over the same window:

sum(count_over_time(kubernetes_io:node_memory_total_bytes{monitored_resource="k8s_node",node_name=~"gke-tpu.*|gk3-tpu.*",cluster_name="CLUSTER_NAME"}[7d]))/sum(sum_over_time(kubernetes_io:node_interruption_count{monitored_resource="k8s_node",node_name=~"gke-tpu.*|gk3-tpu.*",cluster_name="CLUSTER_NAME"}[7d]))

Host metrics

In GKE version 1.28.1-gke.1066000 or later, VMs in a TPU slice export TPU utilization metrics as GKE system metrics. The following metrics are available in Cloud Monitoring to monitor your TPU host's performance:

  • TensorCore utilization: current percentage of the TensorCore that is utilized. The TensorCore value equals the sum of the matrix-multiply units (MXUs) plus the vector unit. The TensorCore utilization value is the number of TensorCore operations performed over the past sample period (60 seconds) divided by the supported number of TensorCore operations over the same period. A larger value means better utilization.
  • Memory bandwidth utilization: current percentage of the accelerator memory bandwidth that is being used. Computed by dividing the memory bandwidth used over a sample period (60 seconds) by the maximum supported bandwidth over the same period.

These metrics are located in the Kubernetes node (k8s_node) and Kubernetes container (k8s_container) schemas.

Kubernetes container:

  • kubernetes.io/container/accelerator/tensorcore_utilization
  • kubernetes.io/container/accelerator/memory_bandwidth_utilization

Kubernetes node:

  • kubernetes.io/node/accelerator/tensorcore_utilization
  • kubernetes.io/node/accelerator/memory_bandwidth_utilization

For more information, see Kubernetes metrics and GKE system metrics.
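
To chart or query these host metrics with PromQL, you can use the same approach as for the other metrics on this page. The following is a minimal sketch; the PromQL metric name kubernetes_io:node_accelerator_tensorcore_utilization is inferred from the naming pattern of the other GKE system metrics shown earlier, so confirm the exact name in Metrics Explorer before relying on it.

# Sketch: average node-level TensorCore utilization over the last 10 minutes.
# The PromQL metric name is inferred from the naming pattern of the other
# GKE system metrics on this page; confirm it in Metrics Explorer.
PROJECT_ID=PROJECT_ID
QUERY='avg by (node_name) (avg_over_time(kubernetes_io:node_accelerator_tensorcore_utilization{monitored_resource="k8s_node",cluster_name="CLUSTER_NAME"}[10m]))'

curl -s \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode "query=${QUERY}" \
  "https://monitoring.googleapis.com/v1/projects/${PROJECT_ID}/location/global/prometheus/api/v1/query"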

Manage collection scheduling

In TPU Trillium, you can use collection scheduling to group TPU slice nodes. Grouping these TPU slice nodes makes it easier to adjust the number of replicas to meet the workload demand. Google Cloud controls software updates to ensure that sufficient slices within the collection are always available to serve traffic.

TPU Trillium supports collection scheduling for single-host and multi-host node pools that run inference workloads. The following describes how collection scheduling behavior depends on the type of TPU slice that you use:

  • Multi-host TPU slice: GKE groups multi-host TPU slices to form a collection. Each GKE node pool is a replica within this collection. To define a collection, create a multi-host TPU slice and assign a unique name to the collection (see the sketch after this list). To add more TPU slices to the collection, create another multi-host TPU slice node pool with the same collection name and workload type.
  • Single-host TPU slice: GKE considers the entire single-host TPU slice node pool as a collection. To add more TPU slices to the collection, you can resize the single-host TPU slice node pool.
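
The following is a minimal sketch of defining a collection at node pool creation time by attaching the two collection labels, as referenced in the multi-host item above. The machine type, topology, and node count are illustrative TPU Trillium values only, and NODE_ZONE and the other uppercase names are placeholders; choose values that match the slice you planned earlier.

# Sketch: create a multi-host TPU Trillium slice node pool that defines a
# collection. Machine type, topology, and node count are illustrative values;
# replace all placeholders with your own.
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --node-locations=NODE_ZONE \
    --machine-type=ct6e-standard-4t \
    --tpu-topology=4x4 \
    --num-nodes=4 \
    --node-labels=cloud.google.com/gke-nodepool-group-name=COLLECTION_NAME,cloud.google.com/gke-workload-type=HIGH_AVAILABILITY

To add another replica to the same collection, repeat the command with a new node pool name and the same two label values.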

To manage a collection, perform any of these actions based on the type of node pool that you use.

Manage collection scheduling in multi-host TPU slice node pools

Use the following tasks to manage multi-host TPU slice node pools.

  • To check if a multi-host TPU slice pool is part of a collection, run the following command:

    gcloud container node-pools describe NODE_POOL_NAME \
        --location LOCATION \
        --cluster CLUSTER_NAME \
        --format="json" | jq -r \
        '"nodepool-group-name: \(.config.labels["cloud.google.com/gke-nodepool-group-name"] // "")\ngke-workload-type: \(.config.labels["cloud.google.com/gke-workload-type"] // "")"'

    The output is similar to the following:

    nodepool-group-name: NODE_POOL_COLLECTION_NAME
    gke-workload-type: HIGH_AVAILABILITY

    If the multi-host TPU slice pool is part of a collection, the output includes the following labels:

    • cloud.google.com/gke-workload-type: HIGH_AVAILABILITY
    • cloud.google.com/gke-nodepool-group-name: COLLECTION_NAME
  • To get the list of collections in the cluster, run the following command:

    #!/bin/bash
    # Replace with your cluster name, project, and location
    CLUSTER_NAME=CLUSTER_NAME
    PROJECT=PROJECT_ID
    LOCATION=LOCATION

    declare -A collection_names

    # Get the list of all node pools in the cluster
    node_pools=$(gcloud container node-pools list --cluster "$CLUSTER_NAME" --project "$PROJECT" --location "$LOCATION" --format="value(name)")

    # Iterate over each node pool
    for pool in $node_pools; do
      # Describe the node pool and extract the collection label using jq
      collection_name=$(gcloud container node-pools describe "$pool" \
        --cluster "$CLUSTER_NAME" \
        --project "$PROJECT" \
        --location "$LOCATION" \
        --format="json" | jq -r '.config.labels["cloud.google.com/gke-nodepool-group-name"]')

      # Add the collection name to the associative array if the label is set
      # (jq prints the literal string "null" when the label is missing)
      if [[ -n "$collection_name" && "$collection_name" != "null" ]]; then
        collection_names["$collection_name"]=1
      fi
    done

    # Print the unique node pool collection names
    echo "Unique cloud.google.com/gke-nodepool-group-name values:"
    for name in "${!collection_names[@]}"; do
      echo "$name"
    done

    The output is similar to the following:

    Unique cloud.google.com/gke-nodepool-group-name values: {COLLECTION_NAME_1}, {COLLECTION_NAME_2}, {COLLECTION_NAME_3}
  • To get a list of node pools that belong to a collection, run the following command:

    #!/bin/bash
    TARGET_COLLECTION_NAME=COLLECTION_NAME
    CLUSTER_NAME=CLUSTER_NAME
    PROJECT=PROJECT_ID
    LOCATION=LOCATION

    matching_node_pools=()

    # Get the list of all node pools in the cluster
    node_pools=$(gcloud container node-pools list --cluster "$CLUSTER_NAME" --project "$PROJECT" --location "$LOCATION" --format="value(name)")

    # Iterate over each node pool
    for pool in $node_pools; do
      # Get the value of the cloud.google.com/gke-nodepool-group-name label
      collection_name=$(gcloud container node-pools describe "$pool" \
        --cluster "$CLUSTER_NAME" \
        --project "$PROJECT" \
        --location "$LOCATION" \
        --format="json" | jq -r '.config.labels["cloud.google.com/gke-nodepool-group-name"]')

      # Check if the label matches the target collection name
      if [[ "$collection_name" == "$TARGET_COLLECTION_NAME" ]]; then
        matching_node_pools+=("$pool")
      fi
    done

    # Print the list of matching node pools
    echo "Node pools with collection name '$TARGET_COLLECTION_NAME':"
    for pool in "${matching_node_pools[@]}"; do
      echo "$pool"
    done

    The output is similar to the following:

    Node pools with collection name 'COLLECTION_NAME':
    NODE_POOL_NAME_1
    NODE_POOL_NAME_2
    NODE_POOL_NAME_3
  • To scale up the collection, create another multi-host TPU slice node pool and add the cloud.google.com/gke-workload-type and cloud.google.com/gke-nodepool-group-name labels. Use the same collection name in cloud.google.com/gke-nodepool-group-name and the same workload type. If node auto-provisioning is enabled on the cluster, GKE automatically creates node pools based on workload demand.

  • To scale down the collection, delete the node pool (see the sketch after this list).

  • To delete the collection, remove all of the attached node pools. You can delete the node pool or delete the cluster. Deleting the cluster removes all of the collections in it.
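
For the scale-down and delete steps in the preceding list, removing a replica amounts to deleting its node pool. A minimal sketch:

# Sketch: remove one replica (node pool) from the collection.
# Deleting every node pool that carries the collection labels removes the collection.
gcloud container node-pools delete NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION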

Manage collection scheduling in single-host TPU slice node pools

Use the following tasks to manage single-host TPU slice node pools.

  • To check if a single-host TPU slice pool has collection scheduling enabled, run the following command:

    gcloud container node-pools describe NODE_POOL_NAME \
        --cluster CLUSTER_NAME \
        --project PROJECT_NAME \
        --location LOCATION \
        --format="json" | jq -r '.config.labels["cloud.google.com/gke-workload-type"]'

    The output is similar to the following:

    HIGH_AVAILABILITY

    If the single-host TPU slice pool is part of a collection, the output has the cloud.google.com/gke-workload-type: HIGH_AVAILABILITY label.

  • To scale up the collection, resize the node pool manually (see the sketch after this list) or automatically with node auto-provisioning.

  • To scale down the collection, delete the node pool.

  • To delete the collection, remove all of the attached node pools. You can delete the node pool or delete the cluster. Deleting the cluster removes all of the collections in it.
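
For the manual resize step in the preceding list, the following is a minimal sketch, assuming a regional cluster; for a zonal cluster, use --zone instead of --region.

# Sketch: manually resize a single-host TPU slice node pool to scale the collection.
# For a zonal cluster, replace --region with --zone.
gcloud container clusters resize CLUSTER_NAME \
    --node-pool=NODE_POOL_NAME \
    --num-nodes=NUM_NODES \
    --region=REGION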

Known issues

  • Cluster autoscaler might incorrectly calculate capacity for new TPU slice nodes before those nodes report available TPUs. Cluster autoscaler might then perform additional scale-up and, as a result, create more nodes than needed. Cluster autoscaler removes the extra nodes, if they are not needed, during a regular scale-down operation.
  • Cluster autoscaler cancels scale-up of TPU slice node pools that remain in waiting status for more than 10 hours. Cluster autoscaler retries such scale-up operations later. This behavior might reduce TPU obtainability for customers who don't use reservations.
  • Non-TPU workloads that tolerate the TPU taint can prevent scale-down of the node pool if they are recreated while the TPU slice node pool is being drained.
  • The memory bandwidth utilization metric is not available for v5e TPUs.

What's next
