TPU7x (Ironwood)
Preview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
This page describes the architecture and available configurations for TPU7x, the latest TPU available on Google Cloud. TPU7x is the first release within the Ironwood family, Google Cloud's seventh-generation TPU. The Ironwood generation is designed for large-scale AI training and inference.
With a 9,216-chip footprint per Pod, TPU7x shares many similarities with TPU v5p. TPU7x provides high performance for large-scale dense and MoE models, pre-training, sampling, and decode-heavy inference.
To use TPU7x, you must use Google Kubernetes Engine (GKE). For more information, see About TPUs in GKE.
You can also use TPU7x and GKE with TPU Cluster Director. TPU Cluster Director is available through an All Capacity mode reservation, which gives you full access to all of your reserved capacity (no hold-backs) and full visibility into the TPU hardware topology, utilization status, and health status. For more information, see All Capacity mode overview.
To get access to TPU7x, contact your account team.
Note: You can use the JAX framework on TPU7x. TensorFlow is not supported.

System architecture
Each TPU7x chip contains two TensorCores and four SparseCores. The following table shows the key specifications and their values for TPU7x compared to prior generations.
| Specification | v5p | v6e (Trillium) | TPU7x (Ironwood) |
|---|---|---|---|
| Number of chips per pod | 8960 | 256 | 9216 |
| Peak compute per chip (BF16) (TFLOPs) | 459 | 918 | 2307 |
| Peak compute per chip (FP8) (TFLOPs) | 459 | 918 | 4614 |
| HBM capacity per chip (GiB) | 95 | 32 | 192 |
| HBM bandwidth per chip (GBps) | 2765 | 1638 | 7380 |
| Number of vCPUs (4-chip VM) | 208 | 180 | 224 |
| RAM (GB) (4-chip VM) | 448 | 720 | 960 |
| Number of TensorCores per chip | 2 | 1 | 2 |
| Number of SparseCores per chip | 4 | 2 | 4 |
| Bidirectional inter-chip interconnect (ICI) bandwidth per chip (GBps) | 1200 | 800 | 1200 |
| Data center network (DCN) bandwidth per chip (Gbps) | 50 | 100 | 100 |
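To put the per-chip figures in pod-scale terms, the following back-of-the-envelope Python sketch aggregates the table values across a full 9,216-chip pod. The constants come directly from the table above; the printed totals are illustrative arithmetic, not published specifications.

```python
# Back-of-the-envelope pod-scale totals for a full 9,216-chip TPU7x pod,
# using the per-chip figures from the specification table above.

CHIPS_PER_POD = 9_216
BF16_TFLOPS_PER_CHIP = 2_307
FP8_TFLOPS_PER_CHIP = 4_614
HBM_GIB_PER_CHIP = 192

pod_bf16_eflops = CHIPS_PER_POD * BF16_TFLOPS_PER_CHIP / 1e6  # TFLOPs -> EFLOPs
pod_fp8_eflops = CHIPS_PER_POD * FP8_TFLOPS_PER_CHIP / 1e6
pod_hbm_pib = CHIPS_PER_POD * HBM_GIB_PER_CHIP / 1024 / 1024  # GiB -> PiB

print(f"Pod peak BF16 compute: {pod_bf16_eflops:.1f} EFLOPs")  # ~21.3
print(f"Pod peak FP8 compute:  {pod_fp8_eflops:.1f} EFLOPs")   # ~42.5
print(f"Pod HBM capacity:      {pod_hbm_pib:.2f} PiB")         # ~1.69
```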
The following diagram illustrates the architecture of Ironwood:

[Diagram: Ironwood chip architecture]
Dual-chiplet architecture
The Ironwood programming model lets you access two TPU devices per chip instead of the single logical core (also known as MegaCore) architecture used in previous generations (TPU v4 and v5p). This change improves the cost-effectiveness and efficiency of manufacturing the chip. While this represents an architectural shift, the new design ensures that you can reuse existing software models with minimal changes.
Ironwood TPUs are composed of two distinct chiplets. This is a departure from the unified memory space of the MegaCore architecture.
- Chiplet composition: Each chiplet is a self-contained unit with one TensorCore, two SparseCores, and 96 GB of high-bandwidth memory (HBM).
- High-speed interconnect: The two chiplets are connected by a die-to-die (D2D) interface that is six times faster than a 1D inter-chip interconnect (ICI) link. Inter-chiplet communication is managed using collective operations.
Programming model and framework exposure
The programming model for Ironwood is similar to that of TPU generations earlier than v4, such as TPU v3. The new architecture is exposed in the following ways:
- Two devices per chip: Frameworks like JAX expose each Ironwood chip as two separate "devices," one for each chiplet.
- 4D topology: JAX adds a fourth dimension to the topology to specify which of the two on-chip devices to use. This lets you use existing software models with minimal modification (see the sketch after this list).
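To make the 4D topology concrete, the following minimal JAX sketch builds a device mesh for a hypothetical 2x2x2 TPU7x slice (8 chips, and therefore 16 JAX devices). The axis names and mesh shape are illustrative assumptions, not a prescribed layout.

```python
import jax
from jax.experimental import mesh_utils
from jax.sharding import Mesh

# Sketch: a 4D mesh on an assumed 2x2x2 TPU7x slice (8 chips = 16 JAX
# devices). The first three axes map to the 3D ICI torus; the fourth
# selects which of the two chiplets ("devices") on each chip is used.
devices = mesh_utils.create_device_mesh((2, 2, 2, 2))
mesh = Mesh(devices, axis_names=("x", "y", "z", "core"))

print(mesh.shape)  # {'x': 2, 'y': 2, 'z': 2, 'core': 2}
```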
For more information about achieving optimal performance with the dual-chiplet architecture, see Performance recommendations for Ironwood's dual-chiplet architecture.
Supported configurations
TPU7x chips have a direct connection to their nearest neighboring chips in three dimensions, resulting in a 3D mesh of networking connections. Slices of 64 or more chips are made up of one or more 4x4x4 "cubes" of chips.
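The chip, host, and cube counts in the table that follows can be derived directly from the topology string. Here is a minimal plain-Python sketch; the helper name and return format are illustrative, assuming 4 chips per host VM and 64 chips per cube as described on this page.

```python
# Derive chip, host/VM, and cube counts from a TPU7x topology string.
CHIPS_PER_HOST = 4          # each TPU7x VM is a full host with 4 chips
CHIPS_PER_CUBE = 4 * 4 * 4  # a 4x4x4 cube is 64 chips

def slice_stats(topology: str) -> dict:
    x, y, z = (int(d) for d in topology.split("x"))
    chips = x * y * z
    return {
        "chips": chips,
        "hosts": chips // CHIPS_PER_HOST,
        "cubes": chips / CHIPS_PER_CUBE,  # fractional below 64 chips
    }

print(slice_stats("4x4x8"))  # {'chips': 128, 'hosts': 32, 'cubes': 2.0}
print(slice_stats("2x2x4"))  # {'chips': 16, 'hosts': 4, 'cubes': 0.25}
```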
The following table shows common 3D slice shapes that are supported for TPU7x:
| Topology | TPU chips | Hosts | VMs | Cubes | Scope |
|---|---|---|---|---|---|
| 2x2x1 | 4 | 1 | 1 | 1/16 | Single-host |
| 2x2x2 | 8 | 2 | 2 | 1/8 | Multi-host |
| 2x2x4 | 16 | 4 | 4 | 1/4 | Multi-host |
| 2x4x4 | 32 | 8 | 8 | 1/2 | Multi-host |
| 4x4x4 | 64 | 16 | 16 | 1 | Multi-host |
| 4x4x8 | 128 | 32 | 32 | 2 | Multi-host |
| 4x8x8 | 256 | 64 | 64 | 4 | Multi-host |
| 8x8x8 | 512 | 128 | 128 | 8 | Multi-host |
| 8x8x16 | 1024 | 256 | 256 | 16 | Multi-host |
| 8x16x16 | 2048 | 512 | 512 | 32 | Multi-host |
All of these topologies use the tpu7x-standard-4t machine type.

TPU7x VM
Each TPU7x virtual machine (VM) contains 4 chips. Each VM has access to two NUMA nodes. For more information about NUMA nodes, see Non-uniform memory access on Wikipedia.
All TPU7x slices use full-host, 4-chip VMs. The technical specifications for a TPU7x VM are as follows; a short JAX sanity check appears after the list:
- Number of vCPUs per VM: 224
- RAM per VM: 960 GB
- Number of NUMA nodes per VM: 2
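Because each Ironwood chip is exposed as two JAX devices, a 4-chip TPU7x VM should report 8 local devices. The following is a minimal sanity check, assuming JAX is installed and running on the TPU VM:

```python
import jax

# Each TPU7x chip surfaces as two JAX devices (one per chiplet), so a
# 4-chip tpu7x-standard-4t VM is expected to report 8 local devices.
assert jax.local_device_count() == 8, jax.local_device_count()

# Across a multi-host slice, the global device count is 2 * chip count.
print(f"{jax.process_count()} hosts, {jax.device_count()} devices total")
```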
Hyperdisk
By default, the VM boot disk for TPU7x is Hyperdisk Balanced. You can attach additional Hyperdisk Balanced disks to your TPU VM for more storage.
For more information about Hyperdisk, see Hyperdisk overview. For more information about storage options for Cloud TPU, see Storage options for Cloud TPU data.
What's next
- Use TPU7x with GKE
- Use TPU7x with TPU Cluster Director
- Use the Google Cloud ML Diagnostics platform to optimize and diagnose your workloads
- Run a training workload using a recipe optimized for TPU7x
- Run a TPU7x microbenchmark