TPU7x (Ironwood)
Preview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
This page describes the architecture and available configurations for TPU7x, the latest TPU available on Google Cloud. TPU7x is the first release within the Ironwood family, Google Cloud's seventh-generation TPU. The Ironwood generation is designed for large-scale AI training and inference.
With a 9,216-chip footprint per Pod, TPU7x shares many similarities with TPU v5p. TPU7x provides high performance for large-scale dense and MoE models, pre-training, sampling, and decode-heavy inference.
To use TPU7x, you must use Google Kubernetes Engine (GKE). For more information, see About TPUs in GKE.
You can also use TPU7x and GKE with TPU Cluster Director. TPU Cluster Director is available through an All Capacity mode reservation, which gives you full access to all of your reserved capacity (no hold-backs) and full visibility into the TPU hardware topology, utilization status, and health status. For more information, see All Capacity mode overview.
To get access to TPU7x, contact your account team.
Note: You can use the JAX framework on TPU7x. TensorFlow is not supported.

System architecture
Each TPU7x chip contains two TensorCores and four SparseCores. The following table shows the key specifications and their values for TPU7x compared to prior generations.
| Specification | v5p | v6e (Trillium) | TPU7x (Ironwood) |
|---|---|---|---|
| Number of chips per pod | 8960 | 256 | 9216 |
| Peak compute per chip (BF16) (TFLOPs) | 459 | 918 | 2307 |
| Peak compute per chip (FP8) (TFLOPs) | 459 | 918 | 4614 |
| HBM capacity per chip (GiB) | 95 | 32 | 192 |
| HBM bandwidth per chip (GBps) | 2765 | 1638 | 7380 |
| Number of vCPUs (4-chip VM) | 208 | 180 | 224 |
| RAM (GB) (4-chip VM) | 448 | 720 | 960 |
| Number of TensorCores per chip | 2 | 1 | 2 |
| Number of SparseCores per chip | 4 | 2 | 4 |
| Bidirectional inter-chip interconnect (ICI) bandwidth per chip (GBps) | 1200 | 800 | 1200 |
| Data center network (DCN) bandwidth per chip (Gbps) | 50 | 100 | 100 |
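To put the per-chip figures in pod-scale terms, the following back-of-the-envelope Python sketch aggregates the table values across a full 9,216-chip pod. The constants come directly from the table above; the printed totals are illustrative arithmetic, not published specifications.

```python
# Back-of-the-envelope pod-scale totals for a full 9,216-chip TPU7x pod,
# using the per-chip figures from the specification table above.

CHIPS_PER_POD = 9_216
BF16_TFLOPS_PER_CHIP = 2_307
FP8_TFLOPS_PER_CHIP = 4_614
HBM_GIB_PER_CHIP = 192

pod_bf16_eflops = CHIPS_PER_POD * BF16_TFLOPS_PER_CHIP / 1e6  # TFLOPs -> EFLOPs
pod_fp8_eflops = CHIPS_PER_POD * FP8_TFLOPS_PER_CHIP / 1e6
pod_hbm_pib = CHIPS_PER_POD * HBM_GIB_PER_CHIP / 1024 / 1024  # GiB -> PiB

print(f"Pod peak BF16 compute: {pod_bf16_eflops:.1f} EFLOPs")  # ~21.3
print(f"Pod peak FP8 compute:  {pod_fp8_eflops:.1f} EFLOPs")   # ~42.5
print(f"Pod HBM capacity:      {pod_hbm_pib:.2f} PiB")         # ~1.69
```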
The following diagram illustrates the architecture of Ironwood:

[Diagram: Ironwood chip architecture]
Dual-chiplet architecture
The Ironwood programming model lets you access two TPU devices per chip instead of the single logical core (also known as MegaCore) architecture used in previous generations (TPU v4 and v5p). This change improves the cost-effectiveness and efficiency of manufacturing the chip. While this represents an architectural shift, the new design ensures that you can reuse existing software models with minimal changes.
Ironwood TPUs are composed of two distinct chiplets. This is a departure from the unified memory space of the MegaCore architecture.
- Chiplet composition: Each chiplet is a self-contained unit with one TensorCore, two SparseCores, and 96 GB of high-bandwidth memory (HBM).
- High-speed interconnect: The two chiplets are connected by a die-to-die (D2D) interface that is six times faster than a 1D inter-chip interconnect (ICI) link. Inter-chiplet communication is managed using collective operations.
Programming model and framework exposure
The programming model for Ironwood is similar to that of TPU generations earlier than v4, such as TPU v3. The new architecture is exposed in the following ways:
- Two devices per chip: Frameworks like JAX expose each Ironwood chip as two separate "devices," one for each chiplet.
- 4D topology: JAX adds a fourth dimension to the topology to specify which of the two on-chip devices to use. This lets you use existing software models with minimal modification (see the sketch after this list).
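To make the 4D topology concrete, the following minimal JAX sketch builds a device mesh for a hypothetical 2x2x2 TPU7x slice (8 chips, and therefore 16 JAX devices). The axis names and mesh shape are illustrative assumptions, not a prescribed layout.

```python
import jax
from jax.experimental import mesh_utils
from jax.sharding import Mesh

# Sketch: a 4D mesh on an assumed 2x2x2 TPU7x slice (8 chips = 16 JAX
# devices). The first three axes map to the 3D ICI torus; the fourth
# selects which of the two chiplets ("devices") on each chip is used.
devices = mesh_utils.create_device_mesh((2, 2, 2, 2))
mesh = Mesh(devices, axis_names=("x", "y", "z", "core"))

print(mesh.shape)  # {'x': 2, 'y': 2, 'z': 2, 'core': 2}
```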
For more information about achieving optimal performance with the dual-chiplet architecture, see Performance recommendations for Ironwood's dual-chiplet architecture.
Supported configurations
TPU7x chips have a direct connection to their nearest neighboring chips in three dimensions, resulting in a 3D mesh of networking connections. Slices of 64 or more chips are made up of one or more 4x4x4 "cubes" of chips.
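The chip, host, and cube counts in the table that follows can be derived directly from the topology string. Here is a minimal plain-Python sketch; the helper name and return format are illustrative, assuming 4 chips per host VM and 64 chips per cube as described on this page.

```python
# Derive chip, host/VM, and cube counts from a TPU7x topology string.
CHIPS_PER_HOST = 4          # each TPU7x VM is a full host with 4 chips
CHIPS_PER_CUBE = 4 * 4 * 4  # a 4x4x4 cube is 64 chips

def slice_stats(topology: str) -> dict:
    x, y, z = (int(d) for d in topology.split("x"))
    chips = x * y * z
    return {
        "chips": chips,
        "hosts": chips // CHIPS_PER_HOST,
        "cubes": chips / CHIPS_PER_CUBE,  # fractional below 64 chips
    }

print(slice_stats("4x4x8"))  # {'chips': 128, 'hosts': 32, 'cubes': 2.0}
print(slice_stats("2x2x4"))  # {'chips': 16, 'hosts': 4, 'cubes': 0.25}
```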
The following table shows common 3D slice shapes that are supported for TPU7x:
| Topology | TPU chips | Hosts | VMs | Cubes | Scope |
|---|---|---|---|---|---|
| 2x2x1 | 4 | 1 | 1 | 1/16 | Single-host |
| 2x2x2 | 8 | 2 | 2 | 1/8 | Multi-host |
| 2x2x4 | 16 | 4 | 4 | 1/4 | Multi-host |
| 2x4x4 | 32 | 8 | 8 | 1/2 | Multi-host |
| 4x4x4 | 64 | 16 | 16 | 1 | Multi-host |
| 4x4x8 | 128 | 32 | 32 | 2 | Multi-host |
| 4x8x8 | 256 | 64 | 64 | 4 | Multi-host |
| 8x8x8 | 512 | 128 | 128 | 8 | Multi-host |
| 8x8x16 | 1024 | 256 | 256 | 16 | Multi-host |
| 8x16x16 | 2048 | 512 | 512 | 32 | Multi-host |
All of these topologies use the tpu7x-standard-4t machine type.

TPU7x VM
Each TPU7x virtual machine (VM) contains 4 chips. Each VM has access to two NUMA nodes. For more information about NUMA nodes, see Non-uniform memory access on Wikipedia.
All TPU7x slices use full-host, 4-chip VMs. The technical specifications for a TPU7x VM are as follows; a short JAX sanity check appears after the list:
- Number of vCPUs per VM: 224
- RAM per VM: 960 GB
- Number of NUMA nodes per VM: 2
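Because each Ironwood chip is exposed as two JAX devices, a 4-chip TPU7x VM should report 8 local devices. The following is a minimal sanity check, assuming JAX is installed and running on the TPU VM:

```python
import jax

# Each TPU7x chip surfaces as two JAX devices (one per chiplet), so a
# 4-chip tpu7x-standard-4t VM is expected to report 8 local devices.
assert jax.local_device_count() == 8, jax.local_device_count()

# Across a multi-host slice, the global device count is 2 * chip count.
print(f"{jax.process_count()} hosts, {jax.device_count()} devices total")
```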
Hyperdisk
By default, the VM boot disk for TPU7x is Hyperdisk Balanced. You can attach additional Hyperdisk Balanced disks to your TPU VM for more storage.
For more information about Hyperdisk, see Hyperdisk overview. For more information about storage options for Cloud TPU, see Storage options for Cloud TPU data.
What's next
- Use TPU7x with GKE
- Use TPU7x with TPU Cluster Director
- Use the Google Cloud ML Diagnostics platform to optimize and diagnose your workloads
- Run a training workload using a recipe optimized for TPU7x
- Run a TPU7x microbenchmark