Troubleshooting JAX - TPU

This guide provides pointers to JAX troubleshooting information to help you identify and resolve problems you might encounter while training JAX models on Cloud TPU.

For a more general guide to getting started with Cloud TPU, see the JAX quickstart.

Note: If you aren't able to resolve your issue using this guide, see Getting Support for further assistance.

General JAX issues

If you run into issues while developing or training your model with JAX, see the JAX FAQ.

For more general programming errors you might encounter when writing a training application with JAX, see JAX Errors.

Profile JAX performance

To understand how your TPU resources are being used, see the tools described in Profiling JAX performance.
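As a minimal sketch, assuming a recent JAX release that provides jax.profiler.trace, you can record a trace of a small jitted computation and then open the output directory in a trace viewer such as TensorBoard with its profile plugin. The function names matmul and capture_trace, the matrix size, and the /tmp/jax-trace directory are illustrative choices and are not part of this guide.

```python
import os

import jax
import jax.numpy as jnp


@jax.jit
def matmul(x, y):
    # Small stand-in workload so the trace records some device activity.
    return jnp.dot(x, y)


def capture_trace(log_dir="/tmp/jax-trace"):
    """Record a profiler trace of one matmul call into log_dir."""
    x = jnp.ones((1024, 1024), dtype=jnp.float32)
    # Run once first so that compilation does not dominate the trace.
    matmul(x, x).block_until_ready()
    with jax.profiler.trace(log_dir):
        # JAX dispatches work asynchronously; block so the computation
        # finishes inside the traced region.
        matmul(x, x).block_until_ready()
    return os.path.abspath(log_dir)
```

In a real job, you would wrap a few training steps in the trace context instead of the toy matmul.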

Troubleshoot memory issues

Note: TPU memory (high bandwidth memory) is shared across the TensorCores on each TPU chip. This enables efficient coordination between the on-chip TensorCores.

You can monitor how memory is used with the JAX device memory profiler, but you cannot directly manage how it is used.

The JAX device memory profiler can be used to:

You cannot specify how TPU memory is allocated for specific operations. For more information on JAX-specific TPU performance issues, see Performance Notes for using TPUs with JAX.
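One way to call the JAX device memory profiler from Python is jax.profiler.save_device_memory_profile, which writes a pprof-format snapshot of the arrays and executables that currently hold device memory. The sketch below is illustrative: the helper name snapshot_device_memory, the four placeholder arrays, and the memory.prof path are assumptions, not part of this guide.

```python
import os

import jax
import jax.numpy as jnp


def snapshot_device_memory(path="memory.prof"):
    """Write a pprof-format snapshot of current device memory usage."""
    # These arrays stay alive (referenced by `arrays`) while the snapshot
    # is taken, so they show up in the profile.
    arrays = [jnp.full((1024, 1024), float(i)) for i in range(4)]
    for a in arrays:
        a.block_until_ready()
    jax.profiler.save_device_memory_profile(path)
    return os.path.abspath(path)
```

You can inspect the resulting file with the separately installed pprof tool, for example pprof --web memory.prof. Taking snapshots at several points in a training loop and comparing them is one way to spot arrays that are never released.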

Troubleshoot TPU issues

The following sections describe how to resolve some common issues you might encounter when you run a JAX program on a TPU.

How can I verify that the TPU is running?

JAX runs your computations on the TPU unless it prints the warning "No GPU/TPU found, falling back to CPU."

You can verify that the TPU is active by calling jax.devices(), which should list several TPU devices, or programmatically with: assert jax.devices()[0].platform == 'tpu'.
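The check above can be wrapped in a small helper that reports the selected backend and fails fast when JAX has fallen back to CPU. This is a sketch; describe_backend and require_tpu are hypothetical helper names. jax.default_backend(), jax.devices(), and the platform and device_kind attributes of each device are part of JAX.

```python
import jax


def describe_backend():
    """Summarize which platform JAX selected and which devices it sees."""
    devices = jax.devices()
    return {
        "default_backend": jax.default_backend(),  # e.g. "tpu" or "cpu"
        "device_count": len(devices),
        "device_kinds": sorted({d.device_kind for d in devices}),
        "on_tpu": devices[0].platform == "tpu",
    }


def require_tpu():
    """Raise instead of silently running on CPU when no TPU is visible."""
    info = describe_backend()
    if not info["on_tpu"]:
        raise RuntimeError(f"Expected TPU devices, found: {jax.devices()}")
    return info
```

Calling require_tpu() at the start of a training script turns a silent CPU fallback into an immediate error.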

RuntimeError: Unable to initialize backend 'tpu': UNAVAILABLE: No TPU Platform available.

This runtime error, or the following message in /tmp/tpu_logs/tpu_driver.WARNING on the TPU VM, can indicate that you are running the wrong TPU VM version:

W1118 17:40:20.985243 23901 tpu_version_flag.cc:57] No hardware is found. Using default TPU version:xxxxxx
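To check for this log message without reading the file by hand, you can use a short standard-library script like the one below. It is a sketch: find_no_hardware_warnings is a hypothetical helper. The log path and the "No hardware is found" text are the ones quoted in this guide.

```python
from pathlib import Path

# Log location and message text as quoted in this guide.
TPU_DRIVER_LOG = Path("/tmp/tpu_logs/tpu_driver.WARNING")
NO_HARDWARE_MARKER = "No hardware is found"


def find_no_hardware_warnings(log_path=TPU_DRIVER_LOG):
    """Return log lines reporting missing TPU hardware (empty if none)."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(errors="replace") as f:
        return [line.rstrip("\n") for line in f if NO_HARDWARE_MARKER in line]


if __name__ == "__main__":
    for line in find_no_hardware_warnings():
        print(line)
```

An empty result only means the message was not logged. It does not by itself confirm that the TPU VM version is correct.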

Verify that you are running the current JAX runtime version and retry.
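Before you compare against the current release, it helps to record what is actually installed. jax.print_environment_info(return_string=True) returns a report that includes the jax, jaxlib, NumPy, and Python versions along with the visible devices. The environment_report wrapper below is a hypothetical helper. Comparing the result against the latest release remains a manual step.

```python
import jax


def environment_report():
    """Return installed jax/jaxlib/Python versions and device info."""
    # With return_string=True, the report is returned instead of printed.
    return jax.print_environment_info(return_string=True)


if __name__ == "__main__":
    print(environment_report())
```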

Troubleshoot TPU and GKE issues

To help with troubleshooting, enable verbose logging by setting the following environment variables in your GKE workload manifest, and then provide the logs to GKE support:

TPU_MIN_LOG_LEVEL=0 TF_CPP_MIN_LOG_LEVEL=0 TPU_STDERR_LOG_LEVEL=0
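The variables listed above can be kept in one place and applied in two ways: rendered as a container env list for the manifest, or set in-process when you reproduce an issue outside GKE. This is a sketch with hypothetical helper names. It assumes the TPU runtime reads these variables when it loads, which is why enable_verbose_tpu_logging should run before import jax.

```python
import json
import os

# The verbose-logging variables from this guide. Environment values are strings.
VERBOSE_TPU_LOGGING = {
    "TPU_MIN_LOG_LEVEL": "0",
    "TF_CPP_MIN_LOG_LEVEL": "0",
    "TPU_STDERR_LOG_LEVEL": "0",
}


def enable_verbose_tpu_logging(env=None):
    """Apply the settings to env (defaults to os.environ) and return it.

    Call this before `import jax` so the TPU runtime sees the values when
    it loads (an assumption about when the variables are read).
    """
    if env is None:
        env = os.environ
    env.update(VERBOSE_TPU_LOGGING)
    return env


def as_container_env():
    """Render the settings as a Kubernetes container `env` list."""
    return [{"name": k, "value": v} for k, v in VERBOSE_TPU_LOGGING.items()]


if __name__ == "__main__":
    # JSON is valid YAML, so this output can go under a container's `env:` key.
    print(json.dumps(as_container_env(), indent=2))
```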

The following sections describe error messages related to TPU and GKE setups, and how to resolve them.

no endpoints available for service 'jobset-webhook-service'

This error means that JobSet wasn't installed properly. Check whether the Pods of the jobset-controller-manager Deployment are running. For more information, see the JobSet troubleshooting documentation.
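The Pod check described above can be scripted around kubectl. This sketch makes assumptions: kubectl is configured for your cluster, and JobSet was installed into the jobset-system namespace (its default, to our understanding). Change the namespace argument if your installation differs. deployment_is_ready and check_jobset_controller are hypothetical helpers.

```python
import json
import subprocess


def deployment_is_ready(deployment):
    """True if every desired replica of a Deployment reports ready."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    # Kubernetes omits status.readyReplicas when no replica is ready.
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    return ready >= desired > 0


def check_jobset_controller(namespace="jobset-system"):
    """Query the jobset-controller-manager Deployment with kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "deployment", "jobset-controller-manager",
         "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return deployment_is_ready(json.loads(out))
```

If check_jobset_controller() returns False or kubectl reports that the Deployment does not exist, JobSet likely needs to be reinstalled.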

TPU initialization failed: Failed to connect

Make sure your GKE node version is 1.30.4-gke.1348000 or later (GKE 1.31 is not supported).
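The version rule above can be checked against node versions such as those reported by kubectl get nodes (for example v1.30.4-gke.1348000). This sketch uses hypothetical helper names and encodes only the two conditions stated in this guide: at least 1.30.4-gke.1348000, and not a 1.31 release. For versions the guide does not mention, consult the GKE release notes.

```python
import re

MIN_VERSION = (1, 30, 4, 1348000)  # 1.30.4-gke.1348000, from this guide
_GKE_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version):
    """Parse strings such as '1.30.4-gke.1348000' or 'v1.30.4-gke.1348000'."""
    match = _GKE_VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"Unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def node_version_supported(version):
    """True if version is >= 1.30.4-gke.1348000 and not a 1.31 release."""
    parsed = parse_gke_version(version)
    # Tuples compare element by element: major, minor, patch, GKE build.
    return parsed >= MIN_VERSION and parsed[:2] != (1, 31)
```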

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.