GPU machine types

This document outlines the NVIDIA GPU models available on Compute Engine, which you can use to accelerate machine learning (ML), data processing, and graphics-intensive workloads on your virtual machine (VM) instances. This document also details which GPUs come pre-attached to accelerator-optimized machine series such as A4X, A4, A3, A2, G4, and G2, and which GPUs you can attach to N1 general-purpose instances.

Use this document to compare the performance, memory, and features of different GPU models. For a more detailed overview of the accelerator-optimized machine family, including information on CPU platforms, storage options, and networking capabilities, and to find the specific machine type that matches your workload, see Accelerator-optimized machine family.

For more information about GPUs on Compute Engine, see About GPUs.

To view available regions and zones for GPUs on Compute Engine, see GPU regions and zones availability.

GPU machine types

Compute Engine offers different machine types to support your various workloads.

Some machine types support NVIDIA RTX Virtual Workstations (vWS). When you create an instance that uses an NVIDIA RTX Virtual Workstation, Compute Engine automatically adds a vWS license. For information about pricing for virtual workstations, see the GPU pricing page.

Accelerator-optimized A series machine types are designed for high performance computing (HPC), artificial intelligence (AI), and machine learning (ML) workloads.

The later generation A series machine types are ideal for pre-training and fine-tuning foundation models that involve large clusters of accelerators, while the A2 series can be used for training smaller models and for single-host inference.

For these machine types, the GPU model is automatically attached to the instance.

Accelerator-optimized G series machine types are designed for workloads such as NVIDIA Omniverse simulation workloads, graphics-intensive applications, video transcoding, and virtual desktops. These machine types support NVIDIA RTX Virtual Workstations (vWS).

The G series can also be used for training smaller models and for single-host inference.

For these machine types, the GPU model is automatically attached to the instance.

For N1 general-purpose machine types, except for the N1 shared-core machine types (f1-micro and g1-small), you can attach a select set of GPU models. Some of these GPU models also support NVIDIA RTX Virtual Workstations (vWS).

  • A4X (NVIDIA GB200 Superchips)
    (nvidia-gb200)
  • A4 (NVIDIA B200)
    (nvidia-b200)
  • A3 Ultra (NVIDIA H200)
    (nvidia-h200-141gb)
  • A3 Mega (NVIDIA H100)
    (nvidia-h100-mega-80gb)
  • A3 High (NVIDIA H100)
    (nvidia-h100-80gb)
  • A3 Edge (NVIDIA H100)
    (nvidia-h100-80gb)
  • A2 Ultra (NVIDIA A100 80GB)
    (nvidia-a100-80gb)
  • A2 Standard (NVIDIA A100)
    (nvidia-a100-40gb)
  • G4 (NVIDIA RTX PRO 6000)
    (nvidia-rtx-pro-6000)
    (nvidia-rtx-pro-6000-vws)
  • G2 (NVIDIA L4)
    (nvidia-l4)
    (nvidia-l4-vws)
The following GPU models can be attached to N1 general-purpose machine types:
  • NVIDIA T4
    (nvidia-tesla-t4)
    (nvidia-tesla-t4-vws)
  • NVIDIA P4
    (nvidia-tesla-p4)
    (nvidia-tesla-p4-vws)
  • NVIDIA V100
    (nvidia-tesla-v100)
  • NVIDIA P100
    (nvidia-tesla-p100)
    (nvidia-tesla-p100-vws)
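To make the two attachment models concrete, the following minimal sketch creates an N1 instance with one T4 attached by using the google-cloud-compute Python client. The project ID, zone, instance name, and boot image are placeholder assumptions, not values from this page.

```python
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"  # placeholders

instance = compute_v1.Instance(
    name="gpu-example",
    # For N1, the GPU is a separate attachment. For accelerator-optimized
    # machine types (A4X, A4, A3, A2, G4, G2), setting the machine type
    # alone is enough, because the GPUs come pre-attached.
    machine_type=f"zones/{zone}/machineTypes/n1-standard-8",
    guest_accelerators=[
        compute_v1.AcceleratorConfig(
            accelerator_count=1,
            # Use the -vws variant (nvidia-tesla-t4-vws) to have an
            # NVIDIA RTX Virtual Workstation license added automatically.
            accelerator_type=f"zones/{zone}/acceleratorTypes/nvidia-tesla-t4",
        )
    ],
    # GPU instances can't live-migrate, so host maintenance must terminate.
    scheduling=compute_v1.Scheduling(
        on_host_maintenance="TERMINATE", automatic_restart=True
    ),
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-12"
            ),
        )
    ],
    network_interfaces=[
        compute_v1.NetworkInterface(network="global/networks/default")
    ],
)

operation = compute_v1.InstancesClient().insert(
    project=project, zone=zone, instance_resource=instance
)
operation.result()  # block until the create operation finishes
```

An equivalent gcloud command passes the GPU through the --accelerator flag, for example --accelerator=type=nvidia-tesla-t4,count=1, together with --maintenance-policy=TERMINATE.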

You can also use some GPU machine types on AI Hypercomputer. AI Hypercomputer is a supercomputing system that is optimized to support your artificial intelligence (AI) and machine learning (ML) workloads. This option is recommended for creating a densely allocated, performance-optimized infrastructure that has integrations for Google Kubernetes Engine (GKE) and Slurm schedulers.

A4X machine series

A4X accelerator-optimized machine types use NVIDIA GB200 Grace Blackwell Superchips (nvidia-gb200) and are ideal for foundation model training and serving.

A4X is an exascale platform based on NVIDIA GB200 NVL72. Each machine has two sockets with NVIDIA Grace CPUs with Arm Neoverse V2 cores. These CPUs are connected to four NVIDIA B200 Blackwell GPUs with fast chip-to-chip (NVLink-C2C) communication.

Tip: When provisioning A4X instances, you must reserve capacity to create instances and clusters. You can then create instances that use the features and services available from AI Hypercomputer. For more information, see Deployment options overview in the AI Hypercomputer documentation.
Attached NVIDIA GB200 Grace Blackwell Superchips
Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3e)
a4x-highgpu-4g | 140 | 884 | 12,000 | 6 | 2,000 | 4 | 744

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

A4 machine series

A4 accelerator-optimized machine types have NVIDIA B200 Blackwell GPUs (nvidia-b200) attached and are ideal for foundation model training and serving.

Tip: When provisioning A4 machine types, you must reserve capacity to create instances or clusters, use Spot VMs, use Flex-start VMs, or create a resize request in a MIG. For instructions on how to create A4 instances, see Create an A3 Ultra or A4 instance.
Attached NVIDIA B200 Blackwell GPUs
Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3e)
a4-highgpu-8g | 224 | 3,968 | 12,000 | 10 | 3,600 | 8 | 1,440

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

A3 machine series

A3 accelerator-optimized machine types have NVIDIA H100 SXM or NVIDIA H200 SXM GPUs attached.

A3 Ultra machine type

A3 Ultra machine types have NVIDIA H200 SXM GPUs (nvidia-h200-141gb) attached and provide the highest network performance in the A3 series. A3 Ultra machine types are ideal for foundation model training and serving.

Tip: When provisioning A3 Ultra machine types, you must reserve capacity to create instances or clusters, use Spot VMs, use Flex-start VMs, or create a resize request in a MIG. For more information about the parameters to set when creating an A3 Ultra instance, see Create an A3 Ultra or A4 instance.
Attached NVIDIA H200 GPUs
Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3e)
a3-ultragpu-8g | 224 | 2,952 | 12,000 | 10 | 3,600 | 8 | 1,128

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

A3 Mega, High, and Edge machine types

To use NVIDIA H100 SXM GPUs, you have the following options:

  • A3 Mega: these machine types have H100 SXM GPUs (nvidia-h100-mega-80gb) and are ideal for large-scale training and serving workloads.
  • A3 High: these machine types have H100 SXM GPUs (nvidia-h100-80gb) and are well-suited for both training and serving tasks.
  • A3 Edge: these machine types have H100 SXM GPUs (nvidia-h100-80gb), are designed specifically for serving, and are available in a limited set of regions.

A3 Mega

Attached NVIDIA H100 GPUs
Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3)
a3-megagpu-8g | 208 | 1,872 | 6,000 | 9 | 1,800 | 8 | 640

A3 High

Tip: When provisioning a3-highgpu-1g, a3-highgpu-2g, or a3-highgpu-4g machine types, you must create instances by using Spot VMs or Flex-start VMs. For detailed instructions on these options, see the Spot VMs and Flex-start VMs documentation.
Attached NVIDIA H100 GPUs
Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3)
a3-highgpu-1g | 26 | 234 | 750 | 1 | 25 | 1 | 80
a3-highgpu-2g | 52 | 468 | 1,500 | 1 | 50 | 2 | 160
a3-highgpu-4g | 104 | 936 | 3,000 | 1 | 100 | 4 | 320
a3-highgpu-8g | 208 | 1,872 | 6,000 | 5 | 1,000 | 8 | 640
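As a concrete illustration of the preceding tip, the following sketch sets the Spot provisioning fields for an a3-highgpu-1g instance by using the google-cloud-compute Python client. The names are placeholders; Flex-start VMs use a different workflow that isn't shown here.

```python
from google.cloud import compute_v1

zone = "us-central1-a"  # placeholder

instance = compute_v1.Instance(
    name="a3-spot-example",
    machine_type=f"zones/{zone}/machineTypes/a3-highgpu-1g",
    scheduling=compute_v1.Scheduling(
        # Spot provisioning, as required for a3-highgpu-1g/-2g/-4g.
        provisioning_model="SPOT",
        # Spot VMs can be preempted; pick what happens on preemption.
        instance_termination_action="STOP",
        on_host_maintenance="TERMINATE",
    ),
    # Boot disk and network configuration omitted; see the earlier N1 sketch.
)
```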

A3 Edge

Tip: To get started with A3 Edge instances, see Create an A3 VM with GPUDirect-TCPX enabled.
Attached NVIDIA H100 GPUs
Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM3)
a3-edgegpu-8g | 208 | 1,872 | 6,000 | 5 | 800 for asia-south1 and northamerica-northeast2; 400 for all other A3 Edge regions | 8 | 640

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

A2 machine series

A2 accelerator-optimized machine types have NVIDIA A100 GPUs attached and are ideal for model fine-tuning, and for large-model and cost-optimized inference.

The A2 machine series is available in two types:

  • A2 Ultra: these machine types have A100 80GB GPUs (nvidia-a100-80gb) and Local SSD disks attached.
  • A2 Standard: these machine types have A100 40GB GPUs (nvidia-tesla-a100) attached. You can also add Local SSD disks when creating an A2 Standard instance. For the number of disks you can attach, see Machine types that require you to choose a number of Local SSD disks.

A2 Ultra

Attached NVIDIA A100 80GB GPUs
Machine type | vCPU count¹ | Instance memory (GB) | Attached Local SSD (GiB) | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM2e)
a2-ultragpu-1g | 12 | 170 | 375 | 24 | 1 | 80
a2-ultragpu-2g | 24 | 340 | 750 | 32 | 2 | 160
a2-ultragpu-4g | 48 | 680 | 1,500 | 50 | 4 | 320
a2-ultragpu-8g | 96 | 1,360 | 3,000 | 100 | 8 | 640

A2 Standard

Attached NVIDIA A100 40GB GPUs
Machine type | vCPU count¹ | Instance memory (GB) | Local SSD supported | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB HBM2)
a2-highgpu-1g | 12 | 85 | Yes | 24 | 1 | 40
a2-highgpu-2g | 24 | 170 | Yes | 32 | 2 | 80
a2-highgpu-4g | 48 | 340 | Yes | 50 | 4 | 160
a2-highgpu-8g | 96 | 680 | Yes | 100 | 8 | 320
a2-megagpu-16g | 96 | 1,360 | Yes | 100 | 16 | 640

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

G4 machine series

G4 accelerator-optimized machine types use NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs (nvidia-rtx-pro-6000) and are suitable for NVIDIA Omniverse simulation workloads, graphics-intensive applications, video transcoding, and virtual desktops. Compared with the A series machine types, G4 machine types also provide a low-cost option for single-host inference and model tuning.

A key feature of the G4 series is support for direct GPU peer-to-peer (P2P) communication on multi-GPU machine types (g4-standard-96, g4-standard-192, g4-standard-384). This allows GPUs within the same instance to exchange data directly over the PCIe bus, without involving the CPU host. For more information about G4 GPU peer-to-peer communication, see G4 GPU peer-to-peer communication.
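To illustrate what P2P support means in practice, the following sketch probes peer access between GPUs from PyTorch. This is an assumed validation approach rather than an official G4 procedure, and it requires a multi-GPU instance with CUDA and PyTorch installed.

```python
import torch

# Report which GPU pairs on this instance can reach each other directly.
num_gpus = torch.cuda.device_count()
for src in range(num_gpus):
    for dst in range(num_gpus):
        if src != dst and torch.cuda.can_device_access_peer(src, dst):
            print(f"GPU {src} -> GPU {dst}: P2P available")

# With P2P available, a device-to-device copy moves data over PCIe
# without staging it through host (CPU) memory.
if num_gpus >= 2:
    a = torch.randn(4096, 4096, device="cuda:0")
    b = a.to("cuda:1")
```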

Important: For information on how to get started with G4 machine types, contact your Google account team.
Attached NVIDIA RTX PRO 6000 GPUs
Machine type | vCPU count¹ | Instance memory (GB) | Maximum Titanium SSD supported (GiB)² | Physical NIC count | Maximum network bandwidth (Gbps)³ | GPU count | GPU memory⁴ (GB GDDR7)
g4-standard-48 | 48 | 180 | 1,500 | 1 | 50 | 1 | 96
g4-standard-96 | 96 | 360 | 3,000 | 1 | 100 | 2 | 192
g4-standard-192 | 192 | 720 | 6,000 | 1 | 200 | 4 | 384
g4-standard-384 | 384 | 1,440 | 12,000 | 2 | 400 | 8 | 768

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² You can add Titanium SSD disks when creating a G4 instance. For the number of disks you can attach, see Machine types that require you to choose a number of Local SSD disks.
³ Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
⁴ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

G2 machine series

G2 accelerator-optimized machine types have NVIDIA L4 GPUs attached and are ideal for cost-optimized inference, graphics-intensive workloads, and high performance computing workloads.

Each G2 machine type also has a default memory and a custom memory range. The custom memory range defines the amount of memory that you can allocate to your instance for each machine type. You can also add Local SSD disks when creating a G2 instance. For the number of disks you can attach, see Machine types that require you to choose a number of Local SSD disks.

Attached NVIDIA L4 GPUs
Machine type | vCPU count¹ | Default instance memory (GB) | Custom instance memory range (GB) | Max Local SSD supported (GiB) | Maximum network bandwidth (Gbps)² | GPU count | GPU memory³ (GB GDDR6)
g2-standard-4 | 4 | 16 | 16 to 32 | 375 | 10 | 1 | 24
g2-standard-8 | 8 | 32 | 32 to 54 | 375 | 16 | 1 | 24
g2-standard-12 | 12 | 48 | 48 to 54 | 375 | 16 | 1 | 24
g2-standard-16 | 16 | 64 | 54 to 64 | 375 | 32 | 1 | 24
g2-standard-24 | 24 | 96 | 96 to 108 | 750 | 32 | 2 | 48
g2-standard-32 | 32 | 128 | 96 to 128 | 375 | 32 | 1 | 24
g2-standard-48 | 48 | 192 | 192 to 216 | 1,500 | 50 | 4 | 96
g2-standard-96 | 96 | 384 | 384 to 432 | 3,000 | 100 | 8 | 192

¹ A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
² Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
³ GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

N1 machine series

You can attach the following GPU models to an N1 machine type, with the exception of the N1 shared-core machine types.

Unlike the machine types in the accelerator-optimized machine series, N1 machine types don't come with a set number of attached GPUs. Instead, you specify the number of GPUs to attach when creating the instance.

The number of GPUs attached to an N1 instance limits the maximum number of vCPUs. In general, a higher number of GPUs lets you create instances with more vCPUs and memory.
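Because the valid GPU, vCPU, and memory combinations vary by model and zone, it can be useful to query what a zone offers before creating an instance. The following minimal sketch uses the google-cloud-compute Python client; the project ID and zone are placeholders.

```python
from google.cloud import compute_v1

# List the GPU models you can attach in a zone, with the per-instance limit.
client = compute_v1.AcceleratorTypesClient()
for acc in client.list(project="my-project", zone="us-central1-a"):
    print(f"{acc.name}: up to {acc.maximum_cards_per_instance} cards per instance")
```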

N1+T4 GPUs

You can attach NVIDIA T4 GPUs to N1 general-purpose instances with the following instance configurations.

Accelerator type | GPU count | GPU memory¹ (GB GDDR6) | vCPU count | Instance memory (GB) | Local SSD supported
nvidia-tesla-t4 or nvidia-tesla-t4-vws | 1 | 16 | 1 to 48 | 1 to 312 | Yes
nvidia-tesla-t4 or nvidia-tesla-t4-vws | 2 | 32 | 1 to 48 | 1 to 312 | Yes
nvidia-tesla-t4 or nvidia-tesla-t4-vws | 4 | 64 | 1 to 96 | 1 to 624 | Yes

¹ GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

N1+P4 GPUs

You can attach NVIDIA P4 GPUs to N1 general-purpose instances with the following instance configurations.

Accelerator type | GPU count | GPU memory¹ (GB GDDR5) | vCPU count | Instance memory (GB) | Local SSD supported²
nvidia-tesla-p4 or nvidia-tesla-p4-vws | 1 | 8 | 1 to 24 | 1 to 156 | Yes
nvidia-tesla-p4 or nvidia-tesla-p4-vws | 2 | 16 | 1 to 48 | 1 to 312 | Yes
nvidia-tesla-p4 or nvidia-tesla-p4-vws | 4 | 32 | 1 to 96 | 1 to 624 | Yes

¹ GPU memory is the memory that is available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
² For instances with attached NVIDIA P4 GPUs, Local SSD disks are only supported in zones us-central1-c and northamerica-northeast1-b.

N1+V100 GPUs

You can attach NVIDIA V100 GPUs to N1 general-purpose instances with the following instance configurations.

Accelerator type | GPU count | GPU memory¹ (GB HBM2) | vCPU count | Instance memory (GB) | Local SSD supported²
nvidia-tesla-v100 | 1 | 16 | 1 to 12 | 1 to 78 | Yes
nvidia-tesla-v100 | 2 | 32 | 1 to 24 | 1 to 156 | Yes
nvidia-tesla-v100 | 4 | 64 | 1 to 48 | 1 to 312 | Yes
nvidia-tesla-v100 | 8 | 128 | 1 to 96 | 1 to 624 | Yes

¹ GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
² For instances with attached NVIDIA V100 GPUs, Local SSD disks aren't supported in us-east1-c.

N1+P100 GPUs

You can attach NVIDIA P100 GPUs to N1 general-purpose instances with the following instance configurations.

For NVIDIA P100 GPUs, the maximum vCPU count and memory available for some configurations depend on the zone in which the GPU resource runs.

Accelerator type | GPU count | GPU memory¹ (GB HBM2) | Zone | vCPU count | Instance memory (GB) | Local SSD supported
nvidia-tesla-p100 or nvidia-tesla-p100-vws | 1 | 16 | All P100 zones | 1 to 16 | 1 to 104 | Yes
nvidia-tesla-p100 or nvidia-tesla-p100-vws | 2 | 32 | All P100 zones | 1 to 32 | 1 to 208 | Yes
nvidia-tesla-p100 or nvidia-tesla-p100-vws | 4 | 64 | us-east1-c, europe-west1-d, europe-west1-b | 1 to 64 | 1 to 208 | Yes
nvidia-tesla-p100 or nvidia-tesla-p100-vws | 4 | 64 | All other P100 zones | 1 to 96 | 1 to 624 | Yes

¹ GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.

General comparison chart

The following table describes the GPU memory size, feature availability, and ideal workload types of different GPU models that are available on Compute Engine.

GPU model | GPU memory | Interconnect | NVIDIA RTX Virtual Workstation (vWS) support | Best used for
GB200 | 186 GB HBM3e @ 8 TBps | NVLink Full Mesh @ 1,800 GBps | No | Large-scale distributed training and inference of LLMs, Recommenders, HPC
B200 | 180 GB HBM3e @ 8 TBps | NVLink Full Mesh @ 1,800 GBps | No | Large-scale distributed training and inference of LLMs, Recommenders, HPC
H200 | 141 GB HBM3e @ 4.8 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM
H100 | 80 GB HBM3 @ 3.35 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM
A100 80GB | 80 GB HBM2e @ 1.9 TBps | NVLink Full Mesh @ 600 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM
A100 40GB | 40 GB HBM2 @ 1.6 TBps | NVLink Full Mesh @ 600 GBps | No | ML Training, Inference, HPC
RTX PRO 6000 | 96 GB GDDR7 with ECC @ 1,597 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding, HPC
L4 | 24 GB GDDR6 @ 300 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding, HPC
T4 | 16 GB GDDR6 @ 320 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding
V100 | 16 GB HBM2 @ 900 GBps | NVLink Ring @ 300 GBps | No | ML Training, Inference, HPC
P4 | 8 GB GDDR5 @ 192 GBps | N/A | Yes | Remote Visualization Workstations, ML Inference, and Video Transcoding
P100 | 16 GB HBM2 @ 732 GBps | N/A | Yes | ML Training, Inference, HPC, Remote Visualization Workstations

To compare GPU pricing for the different GPU models and regions that are available on Compute Engine, see GPU pricing.

Performance comparison chart

The following table describes the performance specifications of different GPU models that are available on Compute Engine.

Compute performance

GPU model | FP64 | FP32 | FP16 | INT8
GB200 | 90 TFLOPS | 180 TFLOPS |  |
B200 | 40 TFLOPS | 80 TFLOPS |  |
H200 | 34 TFLOPS | 67 TFLOPS |  |
H100 | 34 TFLOPS | 67 TFLOPS |  |
A100 80GB | 9.7 TFLOPS | 19.5 TFLOPS |  |
A100 40GB | 9.7 TFLOPS | 19.5 TFLOPS |  |
L4 | 0.5 TFLOPS¹ | 30.3 TFLOPS |  |
T4 | 0.25 TFLOPS¹ | 8.1 TFLOPS |  |
V100 | 7.8 TFLOPS | 15.7 TFLOPS |  |
P4 | 0.2 TFLOPS¹ | 5.5 TFLOPS |  | 22 TOPS²
P100 | 4.7 TFLOPS | 9.3 TFLOPS | 18.7 TFLOPS |

¹ To allow FP64 code to work correctly, the T4, L4, and P4 GPU architectures include a small number of FP64 hardware units.
² TeraOperations per Second.

Tensor core performance

GPU model | FP64 | TF32 | Mixed-precision FP16/FP32 | INT8 | INT4 | FP8
GB200 | 90 TFLOPS | 2,500 TFLOPS² | 5,000 TFLOPS¹,² | 10,000 TFLOPS² | 20,000 TFLOPS² | 10,000 TFLOPS²
B200 | 40 TFLOPS | 1,100 TFLOPS² | 4,500 TFLOPS¹,² | 9,000 TFLOPS² |  | 9,000 TFLOPS²
H200 | 67 TFLOPS | 989 TFLOPS² | 1,979 TFLOPS¹,² | 3,958 TOPS² |  | 3,958 TFLOPS²
H100 | 67 TFLOPS | 989 TFLOPS² | 1,979 TFLOPS¹,² | 3,958 TOPS² |  | 3,958 TFLOPS²
A100 80GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS¹ | 624 TOPS | 1,248 TOPS |
A100 40GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS¹ | 624 TOPS | 1,248 TOPS |
L4 |  | 120 TFLOPS² | 242 TFLOPS¹,² | 485 TOPS² |  | 485 TFLOPS²
T4 |  |  | 65 TFLOPS | 130 TOPS | 260 TOPS |
V100 |  |  | 125 TFLOPS |  |  |
P4 |  |  |  |  |  |
P100 |  |  |  |  |  |

¹ For mixed-precision training, NVIDIA GB200, B200, H200, H100, A100, and L4 GPUs also support the bfloat16 data type.
² NVIDIA GB200, B200, H200, H100, and L4 GPUs support structural sparsity. You can use structural sparsity to double the performance of your models. The documented values apply when you use structured sparsity. If you aren't using structured sparsity, the values are halved.
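For context on the halving, 2:4 structured sparsity keeps two nonzero values in every group of four, so the tensor cores process half the data. The following sketch uses PyTorch's semi-structured sparsity support to run a sparse matrix multiply; it assumes PyTorch 2.1 or later with CUDA on a GPU that has sparse tensor cores, and the torch.sparse API shown may change between releases.

```python
import torch
from torch.sparse import to_sparse_semi_structured

# A fp16 weight matrix that already satisfies the 2:4 pattern: in every
# contiguous group of 4 values along a row, 2 are zero.
mask = torch.tensor([1, 1, 0, 0], dtype=torch.float16, device="cuda").tile(128, 32)
dense_weight = torch.randn(128, 128, dtype=torch.float16, device="cuda") * mask

# Compress to the hardware's semi-structured layout (values plus metadata).
sparse_weight = to_sparse_semi_structured(dense_weight)

x = torch.randn(64, 128, dtype=torch.float16, device="cuda")
# The linear op dispatches to sparse tensor-core kernels for sparse weights,
# which is where the doubled throughput in the table above comes from.
out_sparse = torch.nn.functional.linear(x, sparse_weight)
out_dense = torch.nn.functional.linear(x, dense_weight)
print(torch.allclose(out_sparse, out_dense, rtol=1e-2, atol=1e-2))
```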
