About GPU instances

This document describes the features and limitations of GPU virtual machine (VM) instances that run on Compute Engine.

To accelerate specific workloads on Compute Engine, you can either deploy an accelerator-optimized instance that has attached GPUs, or attach GPUs to an N1 general-purpose instance. Compute Engine provides GPUs for your instances in pass-through mode, which gives your instances direct control over the GPUs and their memory.

You can also use some GPU machine types on AI Hypercomputer. AI Hypercomputer is a supercomputing system that is optimized to support your artificial intelligence (AI) and machine learning (ML) workloads. This option is recommended for creating a densely allocated, performance-optimized infrastructure that has integrations for Google Kubernetes Engine (GKE) and Slurm schedulers.

Supported machine types

Compute Engine offers different machine types to support your various workloads.

Some machine types support NVIDIA RTX Virtual Workstations (vWS). When you create an instance that uses an NVIDIA RTX Virtual Workstation, Compute Engine automatically adds a vWS license. For information about pricing for virtual workstations, see the GPU pricing page.

GPU machine types
Accelerator-optimized A series machine types are designed for high performance computing (HPC), artificial intelligence (AI), and machine learning (ML) workloads.

The later generation A series machine types are ideal for pre-training and fine-tuning foundation models on large clusters of accelerators, while the A2 series can be used for training smaller models and for single-host inference.

For these machine types, the GPU model is automatically attached to the instance.

Accelerator-optimized G series machine types are designed for workloads such as NVIDIA Omniverse simulation workloads, graphics-intensive applications, video transcoding, and virtual desktops. These machine types support NVIDIA RTX Virtual Workstations (vWS).

The G series can also be used for training smaller models and for single-host inference.

For these machine types, the GPU model is automatically attached to the instance.
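
As a minimal sketch, the following command creates a G2 instance; because the GPU (one NVIDIA L4 for `g2-standard-4`) is implied by the machine type, no `--accelerator` flag is needed. The instance name, zone, and image are placeholders to adapt to your project:

```shell
# Create a G2 instance; the NVIDIA L4 GPU is attached automatically
# because it is part of the g2-standard-4 machine type definition.
# GPU instances must stop for host maintenance, hence TERMINATE.
gcloud compute instances create my-g2-instance \
    --zone=us-central1-a \
    --machine-type=g2-standard-4 \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --maintenance-policy=TERMINATE
```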

For N1 general-purpose machine types, except for the N1 shared-core types (f1-micro and g1-small), you can attach a select set of GPU models. Some of these GPU models also support NVIDIA RTX Virtual Workstations (vWS).

The following accelerator-optimized machine types are available; the accelerator names for the attached GPU models appear in parentheses:
  • A4X (NVIDIA GB200 Superchips)
    (nvidia-gb200)
  • A4 (NVIDIA B200)
    (nvidia-b200)
  • A3 Ultra (NVIDIA H200)
    (nvidia-h200-141gb)
  • A3 Mega (NVIDIA H100)
    (nvidia-h100-mega-80gb)
  • A3 High (NVIDIA H100)
    (nvidia-h100-80gb)
  • A3 Edge (NVIDIA H100)
    (nvidia-h100-80gb)
  • A2 Ultra (NVIDIA A100 80GB)
    (nvidia-a100-80gb)
  • A2 Standard (NVIDIA A100)
    (nvidia-a100-40gb)
  • G4 (NVIDIA RTX PRO 6000)
    (nvidia-rtx-pro-6000)
    (nvidia-rtx-pro-6000-vws)
  • G2 (NVIDIA L4)
    (nvidia-l4)
    (nvidia-l4-vws)
The following GPU models can be attached to N1 general-purpose machine types:
  • NVIDIA T4
    (nvidia-tesla-t4)
    (nvidia-tesla-t4-vws)
  • NVIDIA P4
    (nvidia-tesla-p4)
    (nvidia-tesla-p4-vws)
  • NVIDIA V100
    (nvidia-tesla-v100)
  • NVIDIA P100
    (nvidia-tesla-p100)
    (nvidia-tesla-p100-vws)
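
For example, on N1 machine types a GPU is attached explicitly with the `--accelerator` flag. This is a sketch with placeholder instance, zone, and image names:

```shell
# Attach one NVIDIA T4 to an N1 instance. Instances with attached
# GPUs can't live-migrate, so set the maintenance policy to TERMINATE.
gcloud compute instances create my-n1-gpu-instance \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --maintenance-policy=TERMINATE
```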

GPUs on Spot VMs

You can add GPUs to your Spot VMs at lower Spot prices for the GPUs. GPUs attached to Spot VMs work like normal GPUs but persist only for the life of the VM. Spot VMs with GPUs follow the same preemption process as all Spot VMs.

Consider requesting dedicated Preemptible GPU quota to use for GPUs on Spot VMs. For more information, see Quotas for Spot VMs.

During maintenance events, Spot VMs with GPUs are preempted by default and can't be automatically restarted. If you want to recreate your VMs after they have been preempted, use a managed instance group. Managed instance groups recreate your VM instances if the vCPU, memory, and GPU resources are available.

If you want a warning before your VMs are preempted, or want to configure your VMs to automatically restart after a maintenance event, use standard VMs with a GPU. For standard VMs with GPUs, Compute Engine provides one hour advance notice before preemption.

Compute Engine does not charge you for GPUs if their VMs are preempted in the first minute after they start running.

To learn how to create Spot VMs with GPUs attached, read Create a VM with attached GPUs and Creating Spot VMs. For an example, see Create an A3 Ultra or A4 instance using Spot VMs.
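
A minimal sketch of creating a Spot VM with an attached GPU; the machine type here implies the GPU, and all names are placeholders:

```shell
# Create a Spot VM with a GPU. The termination action controls what
# happens to the VM when Compute Engine preempts it.
gcloud compute instances create my-spot-gpu-instance \
    --zone=us-central1-a \
    --machine-type=g2-standard-4 \
    --provisioning-model=SPOT \
    --instance-termination-action=DELETE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```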

GPUs on instances with predefined run times

Instances that use the standard provisioning model typically can't use preemptible allocation quotas. Preemptible quotas are for temporary workloads and are usually more available. If your project doesn't have preemptible quota, and you have never requested it, then all instances in your project consume standard allocation quotas.

If you request preemptible allocation quota, then instances that use the standard provisioning model must meet all of the following criteria to consume preemptible allocation quota:

When you consume preemptible allocation quota for time-bound GPU workloads, you can benefit from both uninterrupted run time and the high obtainability of preemptible allocation quota. For more information, see Preemptible quotas.

GPUs and Confidential VM

You can use a GPU with a Confidential VM instance that uses Intel TDX on the A3 machine series. For more information, see Confidential VM supported configurations. To learn how to create a Confidential VM instance with GPUs, see Create a Confidential VM instance with GPU.
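
A sketch of what such a creation command might look like; the machine type, image, and names below are assumptions, so verify the exact supported combinations against the Confidential VM supported configurations page first:

```shell
# Create an A3 instance with Confidential Computing enabled via
# Intel TDX. The H100 GPUs are implied by the machine type.
gcloud compute instances create my-confidential-gpu-instance \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-8g \
    --confidential-compute-type=TDX \
    --maintenance-policy=TERMINATE \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud
```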

GPUs and block storage

When you create an instance by using a GPU machine type, you can add persistent or temporary block storage to the instance. To store non-transient data, use persistent block storage like Hyperdisk or Persistent Disk, because these disks are independent of the instance's lifecycle. Data on persistent storage can be retained even after you delete the instance.

For temporary scratch storage or caches, use temporary block storage by adding Local SSD disks when you create the instance.

Persistent block storage with Persistent Disk and Hyperdisk volumes

You can attach Persistent Disk and select Hyperdisk volumes to GPU-enabled instances.

For machine learning (ML) and serving workloads, use Hyperdisk ML volumes, which offer high throughput and shorter data load times. Hyperdisk ML is a more cost-effective option for ML workloads because it offers lower GPU idle times.

Hyperdisk ML volumes provide read-only multi-attach support, so you can attach the same disk to multiple instances, giving each instance access to the same data.
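
As a sketch of this pattern, the commands below create a Hyperdisk ML volume and attach it read-only to two instances. The `--access-mode` value and all names are assumptions to verify against the Hyperdisk documentation:

```shell
# Create a Hyperdisk ML volume marked for read-only multi-attach.
gcloud compute disks create training-data-disk \
    --zone=us-central1-a \
    --type=hyperdisk-ml \
    --size=1TB \
    --access-mode=READ_ONLY_MANY

# Attach the same volume, read-only, to more than one GPU instance.
gcloud compute instances attach-disk gpu-instance-1 \
    --zone=us-central1-a \
    --disk=training-data-disk \
    --mode=ro
gcloud compute instances attach-disk gpu-instance-2 \
    --zone=us-central1-a \
    --disk=training-data-disk \
    --mode=ro
```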

For more information about the supported disk types for machine series that support GPUs, see the N1 and accelerator-optimized machine series pages.

Local SSD disks

Local SSD disks provide fast, temporary storage for caching, data processing, or other transient data. Local SSD disks are fast because they are physically attached to the server that hosts your instance, and temporary because the instance loses the data if it restarts.

Avoid storing data with strong persistence requirements on Local SSD disks. To store non-transient data, use persistent storage instead.

Warning: For instances with GPUs, Compute Engine can't recover data on any Local SSD disks attached to the instance if Compute Engine restarts the instance for host maintenance events.

If you manually stop an instance with a GPU, you can preserve the Local SSD data, with certain restrictions. See the Local SSD documentation for more details.
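
For example, a sketch of creating an N1 GPU instance with a Local SSD scratch disk (all names are placeholders):

```shell
# Add a Local SSD scratch disk at creation time; Local SSD disks
# can't be attached to an existing instance later.
gcloud compute instances create my-gpu-scratch-instance \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --local-ssd=interface=NVME \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --maintenance-policy=TERMINATE
```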

For regional support for Local SSD with GPU types, see Local SSD availability.

GPUs and host maintenance

Compute Engine always stops instances with attached GPUs when it performs maintenance events on the host server. If the instance has attached Local SSD disks, the instance loses the Local SSD data after it stops.

For information on handling maintenance events, see Handling GPU host maintenance events.
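
One common pattern is to poll the instance metadata server from inside the instance to detect a pending maintenance event, using the documented `maintenance-event` metadata path:

```shell
# Returns NONE normally; the value changes (for example, to
# TERMINATE_ON_HOST_MAINTENANCE) when a maintenance event is pending,
# giving your workload a window to checkpoint its state.
curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"
```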

Reserve GPU capacity

Reservations provide high assurance of capacity for zone-specific resources, including GPUs. You can use reservations to ensure that you have GPUs available when you need them for performance-intensive applications. For the different methods to reserve zone-specific resources in Compute Engine, see Choose a reservation type.

Reservations are also required when you want to receive committed use discounts (CUDs) for your GPUs.
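
A minimal sketch of reserving GPU capacity through a machine type whose GPUs are implied; the reservation name, zone, and count are placeholders:

```shell
# Reserve two g2-standard-4 VMs (each with one NVIDIA L4) in a zone.
# Matching instances created in this zone can consume the reservation.
gcloud compute reservations create my-gpu-reservation \
    --zone=us-central1-a \
    --vm-count=2 \
    --machine-type=g2-standard-4
```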

GPU pricing

If you request Compute Engine to provision GPUs using the spot, flex-start, or reservation-bound provisioning model, then you get the GPUs at discounted prices, depending on the GPU type. You can also receive committed use discounts or sustained use discounts (only with N1 VMs) for your GPU usage.

For hourly and monthly pricing for GPUs, see the GPU pricing page.

Committed use discounts for GPUs

Resource-based commitments provide deep discounts for Compute Engine resources in return for committing to using the resources in a specific region for at least one year. You typically purchase commitments for resources such as vCPUs, memory, GPUs, and Local SSD disks for use with a specific machine series. When you use your resources, you receive qualifying resource usage at discounted prices. To learn more about these discounts, see Resource-based committed use discounts.

To purchase a commitment with GPUs, you must also reserve the GPUs and attach the reservations to your commitment. For more information about attaching reservations to commitments, see Attach reservations to resource-based commitments.

Sustained use discounts for GPUs

Instances that use N1 machine types with attached GPUs receive sustained use discounts (SUDs), similar to vCPUs. When you select a GPU for a virtual workstation, Compute Engine automatically adds an NVIDIA RTX Virtual Workstation license to your instance.

GPU restrictions and limitations

For instances with attached GPUs, the following restrictions and limitations apply:

  • Only accelerator-optimized (A4X, A4, A3, A2, G4, and G2) and general-purpose N1 machine types support GPUs.

  • To protect Compute Engine systems and users, new projects have a global GPU quota that limits the total number of GPUs you can create in any supported zone. When you request a GPU quota, you must request a quota for the GPU models that you want to create in each region, and an additional global quota for the total number of GPUs of all types in all zones.

  • Instances with one or more GPUs have a maximum number of vCPUs for each GPU that you add to the instance. To see the available vCPU and memory ranges for different GPU configurations, see the GPUs list.

  • GPUs require device drivers to function properly. NVIDIA GPUs that run on Compute Engine must use a minimum driver version. For more information about driver versions, see Required NVIDIA driver versions.

  • The Compute Engine SLA covers instances with an attached GPU model only if that attached GPU model is generally available.

    For regions that have multiple zones, the Compute Engine SLA covers the instance only if the GPU model is available in more than one zone within that region. For GPU models by region, see Accelerator availability.

  • Compute Engine supports one concurrent user per GPU.

  • Also see the limitations for each machine type with attached GPUs.

What's next?


Last updated 2025-12-15 UTC.