Run a pipeline with TPUs

This page explains how to run an Apache Beam pipeline on Dataflow with TPUs. Jobs that use TPUs incur charges as specified in the Dataflow pricing page.

For more information about using TPUs with Dataflow, see Dataflow support for TPUs.

Optional: Make a specific reservation to use accelerators

While you can use TPUs on-demand, we strongly recommend that you use Dataflow TPUs with specifically targeted Google Cloud reservations. Using a reservation helps to ensure that you have access to available accelerators and quick worker startup times. Pipelines that consume a TPU reservation don't require additional TPU quota.

If you don't make a reservation and choose to use TPUs on-demand, provision TPU quota before you run your pipeline.

Optional: Provision TPU quota

You can use TPUs in an on-demand capacity or using a reservation. If you want to use TPUs on-demand, you must provision TPU quota before you do. If you use a specifically targeted reservation, you can skip this section.

To use TPUs on-demand without a reservation, check the limit and current usage of your Compute Engine API quota for TPUs as follows:

Console

  1. Go to the Quotas page in the Google Cloud console:

    Go to Quotas

  2. In the Filter box, do the following:

    1. Use the following table to select and copy the property of the quota based on the TPU version and machine type. For example, if you plan to create on-demand TPU v5e nodes whose machine type begins with ct5lp-, enter Name: TPU v5 Lite PodSlice chips.

      | TPU version | Machine type begins with | Property and name of the quota for on-demand instances |
      |---|---|---|
      | TPU v5e | ct5lp- | Name: TPU v5 Lite PodSlice chips |
      | TPU v5p | ct5p- | Name: TPU v5p chips |
      | TPU v6e | ct6e- | Dimensions (e.g. location): tpu_family:CT6E |
    2. Select the Dimensions (e.g. location) property and enter region: followed by the name of the region in which you plan to start your pipeline. For example, enter region:us-west4 if you plan to use the zone us-west4-a. TPU quota is regional, so all zones within the same region consume the same TPU quota.
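Because TPU quota is regional, the region to enter in the quota filter follows directly from the zone you plan to use. The helper below is an illustrative sketch (not part of any Google Cloud SDK) that derives the quota region from a zone name:

```python
def region_from_zone(zone: str) -> str:
    """Derive the quota region from a Compute Engine zone name.

    Zones are named REGION-SUFFIX, for example "us-west4-a" belongs to
    region "us-west4", so dropping the last dash-separated component
    yields the region whose TPU quota the zone consumes.
    """
    return zone.rsplit("-", 1)[0]

print(region_from_zone("us-west4-a"))    # us-west4
print(region_from_zone("us-central1-b")) # us-central1
```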

Configure a custom container image

To interact with TPUs in Dataflow pipelines, you need to provide software that can operate on XLA devices in your pipeline runtime environment. This requires installing TPU libraries based on your pipeline needs and configuring environment variables based on the TPU device you use.

To customize the container image, install Apache Beam into an off-the-shelf base image that has the necessary TPU libraries. Alternatively, install the TPU software into the images published with Apache Beam SDK releases.

To provide a custom container image, use the sdk_container_image pipeline option. For more information, see Use custom containers in Dataflow.

When you use a TPU accelerator, you need to set the following environment variables in the container image.

```dockerfile
ENV TPU_SKIP_MDS_QUERY=1           # Don't query metadata
ENV TPU_HOST_BOUNDS=1,1,1          # There's only one host
ENV TPU_WORKER_HOSTNAMES=localhost
ENV TPU_WORKER_ID=0                # Always 0 for single-host TPUs
```

Depending on the accelerator you use, you must also set the variables in the following table.

| type | topology | Required Dataflow worker_machine_type | Additional environment variables |
|---|---|---|---|
| tpu-v5-lite-podslice | 1x1 | ct5lp-hightpu-1t | TPU_ACCELERATOR_TYPE=v5litepod-1 TPU_CHIPS_PER_HOST_BOUNDS=1,1,1 |
| tpu-v5-lite-podslice | 2x2 | ct5lp-hightpu-4t | TPU_ACCELERATOR_TYPE=v5litepod-4 TPU_CHIPS_PER_HOST_BOUNDS=2,2,1 |
| tpu-v5-lite-podslice | 2x4 | ct5lp-hightpu-8t | TPU_ACCELERATOR_TYPE=v5litepod-8 TPU_CHIPS_PER_HOST_BOUNDS=2,4,1 |
| tpu-v6e-slice | 1x1 | ct6e-standard-1t | TPU_ACCELERATOR_TYPE=v6e-1 TPU_CHIPS_PER_HOST_BOUNDS=1,1,1 |
| tpu-v6e-slice | 2x2 | ct6e-standard-4t | TPU_ACCELERATOR_TYPE=v6e-4 TPU_CHIPS_PER_HOST_BOUNDS=2,2,1 |
| tpu-v6e-slice | 2x4 | ct6e-standard-8t | TPU_ACCELERATOR_TYPE=v6e-8 TPU_CHIPS_PER_HOST_BOUNDS=2,4,1 |
| tpu-v5p-slice | 2x2x1 | ct5p-hightpu-4t | TPU_ACCELERATOR_TYPE=v5p-8 TPU_CHIPS_PER_HOST_BOUNDS=2,2,1 |
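If you build images for several accelerator configurations, the mapping in the table above can be captured in a small lookup, for example when generating the ENV lines of a Dockerfile. This is an illustrative sketch based on the table values, not an official API:

```python
# Required worker machine type and TPU environment variables per
# (accelerator type, topology), taken from the table above.
TPU_CONFIG = {
    ("tpu-v5-lite-podslice", "1x1"): ("ct5lp-hightpu-1t", "v5litepod-1", "1,1,1"),
    ("tpu-v5-lite-podslice", "2x2"): ("ct5lp-hightpu-4t", "v5litepod-4", "2,2,1"),
    ("tpu-v5-lite-podslice", "2x4"): ("ct5lp-hightpu-8t", "v5litepod-8", "2,4,1"),
    ("tpu-v6e-slice", "1x1"): ("ct6e-standard-1t", "v6e-1", "1,1,1"),
    ("tpu-v6e-slice", "2x2"): ("ct6e-standard-4t", "v6e-4", "2,2,1"),
    ("tpu-v6e-slice", "2x4"): ("ct6e-standard-8t", "v6e-8", "2,4,1"),
    ("tpu-v5p-slice", "2x2x1"): ("ct5p-hightpu-4t", "v5p-8", "2,2,1"),
}

def env_lines(tpu_type: str, topology: str) -> list:
    """Return the accelerator-specific Dockerfile ENV lines for a configuration."""
    machine, accel, bounds = TPU_CONFIG[(tpu_type, topology)]
    return [
        f"ENV TPU_ACCELERATOR_TYPE={accel}",
        f"ENV TPU_CHIPS_PER_HOST_BOUNDS={bounds}",
    ]

print(env_lines("tpu-v5-lite-podslice", "1x1"))
```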

A sample Dockerfile for the custom container image might look like the following example:

```dockerfile
FROM python:3.11-slim

COPY --from=apache/beam_python3.11_sdk:2.66.0 /opt/apache/beam /opt/apache/beam

# Configure the environment to access the TPU device.
ENV TPU_SKIP_MDS_QUERY=1
ENV TPU_HOST_BOUNDS=1,1,1
ENV TPU_WORKER_HOSTNAMES=localhost
ENV TPU_WORKER_ID=0

# Configure the environment for the chosen accelerator.
# Adjust according to the accelerator you use.
ENV TPU_ACCELERATOR_TYPE=v5litepod-1
ENV TPU_CHIPS_PER_HOST_BOUNDS=1,1,1

# Install the TPU software stack.
RUN pip install jax[tpu] apache-beam[gcp]==2.66.0 -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

ENTRYPOINT ["/opt/apache/beam/boot"]
```

Run your job with TPUs

The considerations for running a Dataflow job with TPUs include the following:

  • Because TPU containers can be large, increase the default boot disk size to 50 gigabytes, or to an appropriate size as required by your container image, by using the --disk_size_gb pipeline option. This helps you avoid running out of disk space.
  • Limit intra-worker parallelism.

TPUs and worker parallelism

In the default configuration, Dataflow Python pipelines launch one Apache Beam SDK process per VM core. TPU machine types have a large number of vCPU cores, but only one process may perform computations on a TPU device. Additionally, a TPU device might be reserved by a process for the lifetime of the process. Therefore, you must limit intra-worker parallelism when running a Dataflow TPU pipeline. To limit worker parallelism, use the following guidance:

  • If your use case involves running inferences on a model, use the Beam RunInference API. For more information, see Large Language Model Inference in Beam.
  • If you cannot use the Beam RunInference API, use Beam's multi-process shared objects to restrict certain operations to a single process.
  • If you cannot use the preceding recommendations and prefer to launch only one Python process per worker, set the --experiments=no_use_multiple_sdk_containers pipeline option.
  • You can further reduce the number of threads by using the --number_of_worker_harness_threads pipeline option if that achieves better performance.
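As a sketch of the last two recommendations, the two options might be assembled like this when building a pipeline's argument list programmatically. The helper name and default values are illustrative, not part of the Beam API; RunInference or multi-process shared objects remain the preferred approaches:

```python
def parallelism_options(single_process=True, harness_threads=1):
    """Build the pipeline options that limit intra-worker parallelism.

    single_process: launch only one Python SDK process per worker.
    harness_threads: cap the number of worker harness threads
                     (pass None to leave the default).
    """
    opts = []
    if single_process:
        opts.append("--experiments=no_use_multiple_sdk_containers")
    if harness_threads is not None:
        opts.append(f"--number_of_worker_harness_threads={harness_threads}")
    return opts

print(parallelism_options())
# ['--experiments=no_use_multiple_sdk_containers', '--number_of_worker_harness_threads=1']
```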

The following table lists the total compute resources per worker for each TPU configuration.

| TPU type | topology | machine type | TPU chips | vCPU | RAM (GB) |
|---|---|---|---|---|---|
| tpu-v5-lite-podslice | 1x1 | ct5lp-hightpu-1t | 1 | 24 | 48 |
| tpu-v5-lite-podslice | 2x2 | ct5lp-hightpu-4t | 4 | 112 | 192 |
| tpu-v5-lite-podslice | 2x4 | ct5lp-hightpu-8t | 8 | 224 | 384 |
| tpu-v6e-slice | 1x1 | ct6e-standard-1t | 1 | 44 | 176 |
| tpu-v6e-slice | 2x2 | ct6e-standard-4t | 4 | 180 | 720 |
| tpu-v6e-slice | 2x4 | ct6e-standard-8t | 8 | 360 | 1440 |
| tpu-v5p-slice | 2x2x1 | ct5p-hightpu-4t | 4 | 208 | 448 |
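The table shows why the default one-SDK-process-per-vCPU launch model is a poor fit for TPU workers: each worker has many vCPUs per TPU chip. A quick illustrative calculation using rows from the table:

```python
def vcpus_per_chip(vcpus: int, chips: int) -> int:
    """vCPUs that would map onto each TPU chip under the default
    one-SDK-process-per-vCPU launch model."""
    return vcpus // chips

# (machine type, TPU chips, vCPUs) rows taken from the table above.
for machine, chips, vcpus in [
    ("ct5lp-hightpu-1t", 1, 24),
    ("ct5lp-hightpu-4t", 4, 112),
    ("ct6e-standard-8t", 8, 360),
]:
    print(f"{machine}: {vcpus_per_chip(vcpus, chips)} vCPUs per TPU chip")
```

With dozens of processes competing for each chip, and a chip reservable by one process at a time, limiting intra-worker parallelism is essential.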

Run a pipeline with TPUs

To run a Dataflow job with TPUs, use the following command.

```shell
python PIPELINE \
  --runner "DataflowRunner" \
  --project "PROJECT" \
  --temp_location "gs://BUCKET/tmp" \
  --region "REGION" \
  --dataflow_service_options "worker_accelerator=type:TPU_TYPE;topology:TPU_TOPOLOGY" \
  --worker_machine_type "MACHINE_TYPE" \
  --disk_size_gb "DISK_SIZE_GB" \
  --sdk_container_image "IMAGE" \
  --number_of_worker_harness_threads NUMBER_OF_THREADS
```

Replace the following:

  • PIPELINE: Your pipeline source code file.
  • PROJECT: The Google Cloud project name.
  • BUCKET: The Cloud Storage bucket.
  • REGION: A Dataflow region, for example, us-central1.
  • TPU_TYPE: A supported TPU type, for example, tpu-v5-lite-podslice. For a full list of types and topologies, see Supported TPU accelerators.
  • TPU_TOPOLOGY: The TPU topology, for example, 1x1.
  • MACHINE_TYPE: The corresponding machine type, for example, ct5lp-hightpu-1t.
  • DISK_SIZE_GB: The size of the boot disk for each worker VM, for example, 100.
  • IMAGE: The Artifact Registry path for your Docker image.
  • NUMBER_OF_THREADS: Optional. The number of worker harness threads.
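The worker_accelerator service option is a single string of key:value pairs separated by semicolons. If you launch jobs programmatically, a small helper can assemble it; the function below is an illustrative sketch, not part of the Beam or Dataflow APIs:

```python
def worker_accelerator_option(tpu_type: str, topology: str) -> str:
    """Format the Dataflow service option that requests TPU workers,
    matching the shape used in the launch command above."""
    return f"worker_accelerator=type:{tpu_type};topology:{topology}"

opt = worker_accelerator_option("tpu-v5-lite-podslice", "1x1")
print(opt)  # worker_accelerator=type:tpu-v5-lite-podslice;topology:1x1
```

You would pass the resulting string as the value of --dataflow_service_options.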

Verify your Dataflow job

To confirm that the job uses worker VMs with TPUs, follow these steps:

  1. In the Google Cloud console, go to the Dataflow > Jobs page.

    Go to Jobs

  2. Select a job.

  3. Click theJob metrics tab.

  4. In the Autoscaling section, confirm that there's at least one Current workers VM.

  5. In the Job info side pane, check that the machine_type starts with ct, for example, ct6e-standard-1t. This indicates TPU usage.

Troubleshoot your Dataflow job

If you run into problems running your Dataflow job with TPUs, see Troubleshoot your Dataflow TPU job.


Last updated 2026-02-19 UTC.