GPU support for services

Preview — NVIDIA RTX PRO 6000 Blackwell GPU

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

This page describes GPU configuration for your Cloud Run services. GPUs work well for AI inference workloads, such as large language models (LLMs), and for other compute-intensive non-AI use cases such as video transcoding and 3D rendering. Google provides NVIDIA L4 GPUs with 24 GB of GPU memory (VRAM) and NVIDIA RTX PRO 6000 Blackwell GPUs (Preview) with 96 GB of GPU memory (VRAM), which is separate from the instance memory.

Important: If you are using LLMs with the Cloud Run GPU feature, make sure you also consult Best practices: Cloud Run services with GPUs.

GPU on Cloud Run is fully managed, with no extra drivers or libraries needed. The GPU feature offers on-demand availability with no reservations needed, similar to the way on-demand CPU and on-demand memory work in Cloud Run. Instances of a Cloud Run service that has been configured to use GPU can scale down to zero for cost savings when not in use.

Cloud Run instances with an attached L4 or NVIDIA RTX PRO 6000 Blackwell GPU with drivers pre-installed start in approximately 5 seconds, at which point the processes running in your container can start to use the GPU.

You can configure one GPU per Cloud Run instance. If you use sidecar containers, note that the GPU can only be attached to one container.

Supported GPU types

Cloud Run supports two types of GPUs:

  • L4 GPU with the current NVIDIA driver version: 535.216.03 (CUDA 12.2). For L4 GPUs, you must use a minimum of 4 CPU and 16 GiB of memory.
  • NVIDIA RTX PRO 6000 Blackwell GPU with the current NVIDIA driver version: 580.x.x (CUDA 13.0) (Preview). For the NVIDIA RTX PRO 6000 Blackwell GPU, you must use a minimum of 20 CPU and 80 GiB of memory.
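For example, a minimal sketch of a deployment that meets the L4 minimums, using the gcloud flags described later on this page (SERVICE and IMAGE_URL are placeholders):

    # Deploy with one L4 GPU and the minimum resources it requires.
    # All flags are covered in "Configure a Cloud Run service with GPU" below.
    gcloud run deploy SERVICE \
      --image IMAGE_URL \
      --gpu 1 \
      --gpu-type nvidia-l4 \
      --cpu 4 \
      --memory 16Gi \
      --no-cpu-throttling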

Supported regions

The following regions are supported by the L4 GPU:

  • asia-southeast1 (Singapore)
  • asia-south1 (Mumbai). This region is available by invitation only. Contact your Google Account team if you are interested in this region.
  • europe-west1 (Belgium) - Low CO2
  • europe-west4 (Netherlands) - Low CO2
  • us-central1 (Iowa) - Low CO2. This region is available by invitation only. Contact your Google Account team if you are interested in this region.
  • us-east4 (Northern Virginia)

The following regions are supported by the NVIDIA RTX PRO 6000 Blackwell GPU (Preview):

  • asia-southeast1 (Singapore). This region is available by invitation only. Contact your Google Account team if you are interested in this region.
  • asia-south2 (Delhi, India). This region is available by invitation only. Contact your Google Account team if you are interested in this region.
  • europe-west4 (Netherlands) - Low CO2
  • us-central1 (Iowa) - Low CO2

Pricing impact

See Cloud Run pricing for GPU pricing details. Note the following requirements and considerations:

  • There are no per-request fees. You must use instance-based billing to use the GPU feature; minimum instances are charged at the full rate even when idle.
  • There is a difference in cost between GPU zonal redundancy and non-zonal redundancy. See Cloud Run pricing for GPU pricing details.
  • When you deploy a Cloud Run service or function from source code with GPUs enabled, Cloud Run uses the e2-highcpu-8 machine type instead of the default e2-standard-2 machine type to build your source code. The larger machine type provides more CPU and higher network bandwidth, which results in faster build times.
  • Pricing also depends on the CPU and memory configuration of your resource.
  • GPU is billed for the entire duration of the instance lifecycle.

GPU zonal redundancy options

By default, Cloud Run deploys your service across multiple zones within a region. This architecture provides inherent resilience: if a zone experiences an outage, Cloud Run automatically routes traffic away from the affected zone to healthy zones within the same region.

When working with GPUs, keep in mind that GPU resources have specific capacity constraints. During a zonal outage, the standard failover mechanism for GPU workloads relies on sufficient unused GPU capacity being available in the remaining healthy zones. Due to the constrained nature of GPUs, this capacity might not always be available.

To increase the availability of your GPU-accelerated services during zonal outages, you can configure zonal redundancy specifically for GPUs:

  • Zonal Redundancy Turned On (default): Cloud Run reserves GPU capacity for your service across multiple zones. This significantly increases the probability that your service can successfully handle traffic rerouted from an affected zone, offering higher reliability during zonal failures at an additional cost per GPU second.

  • Zonal Redundancy Turned Off: Cloud Run attempts failover for GPU workloads on a best-effort basis. Traffic is routed to other zones only if sufficient GPU capacity is available at that moment. This option does not guarantee reserved capacity for failover but results in a lower cost per GPU second.
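As a hedged sketch, you can switch an existing service between these modes with the gcloud flags described later on this page (SERVICE is a placeholder):

    # Turn zonal redundancy off for a lower cost per GPU second:
    gcloud run services update SERVICE --no-gpu-zonal-redundancy

    # Turn it back on for reserved failover capacity:
    gcloud run services update SERVICE --gpu-zonal-redundancy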

SLA

The SLA for Cloud Run GPU depends on whether the service uses the zonal redundancy or non-zonal redundancy option. Refer to the SLA page for details.

Request a quota increase

Projects using Cloud Run nvidia-l4 GPUs in a region for the first time are automatically granted 3 GPU quota (zonal redundancy off) when the first deployment is created. Quota for the Cloud Run nvidia-rtx-pro-6000 GPU is granted in milliGPUs, where 1,000 milliGPU equals 1 GPU. Projects using the nvidia-rtx-pro-6000 GPU in a region for the first time are automatically granted 3,000 milliGPU quota (zonal redundancy off) when the first deployment is created, which is equivalent to 3 GPUs.

If you need additional Cloud Run GPUs, you must request a quota increase for your Cloud Run service. Use the following links to request the quota you need.

Important: Quota increases for GPUs with zonal redundancy turned off are more likely to be granted and are made available more quickly.
  • L4 GPU with zonal redundancy turned off (lower price): Request GPU quota without zonal redundancy
  • L4 GPU with zonal redundancy turned on (higher price): Request GPU quota with zonal redundancy
  • NVIDIA RTX PRO 6000 Blackwell GPU with zonal redundancy turned off (lower price): Request GPU quota without zonal redundancy
  • NVIDIA RTX PRO 6000 Blackwell GPU with zonal redundancy turned on (higher price): Request GPU quota with zonal redundancy

For more information on requesting quota increases, see How to increase quota.

Before you begin

The following list describes requirements and limitations when using GPUs in Cloud Run:

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Cloud Run API. For a CLI alternative, see the sketch after this list.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API

  5. Request required quota.
  6. Consult Best practices: AI inference on Cloud Run with GPUs for recommendations on building your container image and loading large models.
  7. Make sure your Cloud Run service uses the CPU, memory, and maximum instances settings described in Configure a Cloud Run service with GPU.
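If you prefer the CLI, a hedged equivalent of the API-enablement step above is:

    # Enable the Cloud Run API for the current project (equivalent to the
    # console "Enable the API" button referenced in step 4):
    gcloud services enable run.googleapis.com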

Required roles

To get the permissions that you need to configure and deploy Cloud Run services, ask your administrator to grant you the following IAM roles on services:

  • Cloud Run Developer (roles/run.developer) on the Cloud Run service
  • Service Account User (roles/iam.serviceAccountUser) on the service identity

If you are deploying a service or function from source code, you must also have additional roles granted to you on your project and the Cloud Build service account.

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.
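As a hedged sketch, an administrator could grant these roles with standard gcloud IAM commands; PROJECT_ID, USER_EMAIL, and SERVICE_ACCOUNT are placeholders, and the project-level binding shown here is one option (granting on an individual service is also possible in the console):

    # Grant the Cloud Run Developer role at the project level:
    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member="user:USER_EMAIL" \
      --role="roles/run.developer"

    # Allow the user to act as the service identity:
    gcloud iam service-accounts add-iam-policy-binding SERVICE_ACCOUNT \
      --member="user:USER_EMAIL" \
      --role="roles/iam.serviceAccountUser"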

Configure a Cloud Run service with GPU

Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.

You can use the Google Cloud console, the Google Cloud CLI, YAML, or Terraform to configure GPU.

Console

  1. In the Google Cloud console, go to Cloud Run:

    Go to Cloud Run

  2. Select Services from the Cloud Run navigation menu, and click Deploy container to configure a new service. If you are configuring an existing service, click the service, then click Edit and deploy new revision.

  3. If you are configuring a new service, fill out the initial service settings page, then click Container(s), Volumes, Networking, Security to expand the service configuration page.

  4. Click the Container tab.

    • Configure CPU, memory, concurrency, execution environment, and startup probe following the recommendations in Before you begin.
    • Check the GPU checkbox, then select the GPU type from the GPU type menu and the number of GPUs from the Number of GPUs menu.
    • By default for new services, zonal redundancy is turned on. To change the current setting, select the GPU checkbox to show the GPU redundancy options.
      • Select No zonal redundancy to turn off zonal redundancy.
      • Select Zonal redundancy to turn on zonal redundancy.
  5. Click Create or Deploy.

gcloud

To create a service with GPU enabled, use the gcloud run deploy command:

  • To deploy a container:

    gcloud run deploy SERVICE \
      --image IMAGE_URL \
      --gpu 1

    Replace the following:

    • SERVICE: the name of your Cloud Run service.
    • IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
  • To deploy from source code:

    gcloud run deploy SERVICE \
      --source . \
      --gpu 1

To update the GPU configuration for a service, use the gcloud run services update command. For example, to update an existing service that specifies a container image:

gcloud run services update SERVICE \
  --image IMAGE_URL \
  --cpu CPU \
  --memory MEMORY \
  --no-cpu-throttling \
  --gpu GPU_NUMBER \
  --gpu-type GPU_TYPE \
  --max-instances MAX_INSTANCE \
  --GPU_ZONAL_REDUNDANCY

Replace the following:

  • SERVICE: the name of your Cloud Run service.
  • IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
  • CPU: the number of CPUs. For the L4 GPU, you must specify at least 4 CPU. For the NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 20 CPU.
  • MEMORY: the amount of memory. For the L4 GPU, you must specify at least 16Gi (16 GiB). For the NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 80Gi (80 GiB).
  • GPU_NUMBER: the value 1 (one). If this is unspecified but a GPU_TYPE is present, the default is 1.
  • GPU_TYPE: the GPU type. For the L4 GPU, enter nvidia-l4 (with a lowercase letter L, not the numeral fourteen). For the NVIDIA RTX PRO 6000 Blackwell GPU, enter nvidia-rtx-pro-6000.
  • MAX_INSTANCE: the maximum number of instances. This number can't exceed the GPU quota allocated for your project.
  • GPU_ZONAL_REDUNDANCY: no-gpu-zonal-redundancy to turn off zonal redundancy, or gpu-zonal-redundancy to turn on zonal redundancy.
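For example, a hypothetical update that pins an existing service to one L4 GPU with the minimum resources and zonal redundancy turned off might look like the following (the service name is illustrative):

    gcloud run services update my-gpu-service \
      --cpu 4 \
      --memory 16Gi \
      --no-cpu-throttling \
      --gpu 1 \
      --gpu-type nvidia-l4 \
      --max-instances 3 \
      --no-gpu-zonal-redundancy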

YAML

  1. If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:

    gcloud run services describe SERVICE --format export > service.yaml
  2. Update the nvidia.com/gpu: attribute and the nodeSelector: run.googleapis.com/accelerator: attribute:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: SERVICE
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/maxScale: 'MAX_INSTANCE'
            run.googleapis.com/cpu-throttling: 'false'
            run.googleapis.com/gpu-zonal-redundancy-disabled: 'GPU_ZONAL_REDUNDANCY'
        spec:
          containers:
          - image: IMAGE_URL
            ports:
            - containerPort: CONTAINER_PORT
              name: http1
            resources:
              limits:
                cpu: 'CPU'
                memory: 'MEMORY'
                nvidia.com/gpu: '1'
            # Optional: use a longer startup probe to allow long starting containers
            startupProbe:
              failureThreshold: 1800
              periodSeconds: 1
              tcpSocket:
                port: CONTAINER_PORT
              timeoutSeconds: 1
          nodeSelector:
            run.googleapis.com/accelerator: GPU_TYPE

    Replace the following:

    • SERVICE: the name of your Cloud Run service.
    • IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
    • CONTAINER_PORT: the container port set for your service.
    • CPU: the number of CPUs. For the L4 GPU, you must specify at least 4 CPU. For the NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 20 CPU.
    • MEMORY: the amount of memory. For the L4 GPU, you must specify at least 16Gi (16 GiB). For the NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 80Gi (80 GiB).
    • GPU_TYPE: the GPU type. For the L4 GPU, enter nvidia-l4 (with a lowercase letter L, not the numeral fourteen). For the NVIDIA RTX PRO 6000 Blackwell GPU, enter nvidia-rtx-pro-6000.
    • MAX_INSTANCE: the maximum number of instances. This number can't exceed the GPU quota allocated for your project.
    • GPU_ZONAL_REDUNDANCY: false to turn on GPU zonal redundancy, or true to turn it off.
  3. Create or update the service using the following command:

    gcloud run services replace service.yaml

Terraform

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

Add the following to a google_cloud_run_v2_service resource in your Terraform configuration:

resource"google_cloud_run_v2_service""default"{provider=google-betaname="SERVICE"location="europe-west1"template{gpu_zonal_redundancy_disabled="GPU_ZONAL_REDUNDANCY"containers{image="IMAGE_URL"resources{limits={"cpu"="CPU""memory"="MEMORY""nvidia.com/gpu"="1"}}}node_selector{accelerator="GPU_TYPE"}}}

Replace the following:

  • SERVICE: the name of your Cloud Run service.
  • GPU_ZONAL_REDUNDANCY: false to turn on GPU zonal redundancy, or true to turn it off.
  • IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
  • CPU: the number of CPUs. For the L4 GPU, you must specify at least 4 CPU. For the NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 20 CPU.
  • MEMORY: the amount of memory. For the L4 GPU, you must specify at least 16Gi (16 GiB). For the NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 80Gi (80 GiB).
  • GPU_TYPE: the GPU type. For the L4 GPU, enter nvidia-l4 (with a lowercase letter L, not the numeral fourteen). For the NVIDIA RTX PRO 6000 Blackwell GPU, enter nvidia-rtx-pro-6000.
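After editing the configuration, the usual Terraform workflow applies (see Basic Terraform commands, linked above, for details):

    terraform init    # download the google-beta provider
    terraform plan    # preview the changes
    terraform apply   # create or update the service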

View GPU settings

To view the current GPU settings for your Cloud Run service:

Console

  1. In the Google Cloud console, go to the Cloud Run Services page:

    Go to Cloud Run

  2. Click the service you are interested in to open the Service details page.

  3. Click the Revisions tab.

  4. In the details panel at the right, the GPU setting is listed under the Container tab.

gcloud

  1. Use the following command:

    gcloud run services describe SERVICE
  2. Locate the GPU setting in the returned configuration.
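As a hedged convenience, you can narrow the exported YAML to the GPU-related fields shown earlier on this page:

    # Show only the accelerator type and GPU limit from the service config:
    gcloud run services describe SERVICE --format export | grep -E "accelerator|nvidia.com/gpu"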

Remove GPU

You can remove GPU using the Google Cloud console, the Google Cloud CLI, or YAML.

Console

  1. In the Google Cloud console, go to Cloud Run:

    Go to Cloud Run

  2. Select Services from the Cloud Run navigation menu, and click Deploy container to configure a new service. If you are configuring an existing service, click the service, then click Edit and deploy new revision.

  3. If you are configuring a new service, fill out the initial service settings page, then click Container(s), Volumes, Networking, Security to expand the service configuration page.

  4. Click the Container tab.

    • Uncheck the GPU checkbox.
  5. Click Create or Deploy.

gcloud

To remove GPU, set the number of GPUs to 0 using the gcloud run services update command:

gcloud run services update SERVICE --gpu 0

Replace SERVICE with the name of your Cloud Run service.

YAML

  1. If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:

    gcloud run services describe SERVICE --format export > service.yaml
  2. Delete the nvidia.com/gpu: line and the nodeSelector: run.googleapis.com/accelerator: GPU_TYPE lines.

  3. Create or update the service using the following command:

    gcloud run services replace service.yaml

Libraries

By default, all of the NVIDIA L4 and NVIDIA RTX PRO 6000 Blackwell GPU driver libraries are mounted under /usr/local/nvidia/lib64. Cloud Run automatically appends this path to the LD_LIBRARY_PATH environment variable (that is, ${LD_LIBRARY_PATH}:/usr/local/nvidia/lib64) of the container with the GPU. This lets the dynamic linker find the NVIDIA driver libraries. The linker searches and resolves paths in the order you list in the LD_LIBRARY_PATH environment variable. Any values you specify in this variable take precedence over the default Cloud Run driver libraries path /usr/local/nvidia/lib64.
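As a quick, hedged sanity check from inside a running GPU instance, you can inspect how the path was composed:

    # Print the effective library search path; the Cloud Run driver path
    # appears after any values you set yourself:
    echo "$LD_LIBRARY_PATH"

    # List the driver libraries that Cloud Run mounts automatically:
    ls /usr/local/nvidia/lib64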

If you want to use a CUDA version greater than 12.2, the easiest way is to depend on a newer NVIDIA base image with forward compatibility packages already installed. Another option is to manually install the NVIDIA forward compatibility packages and add them to LD_LIBRARY_PATH. Consult NVIDIA's compatibility matrix to determine which CUDA versions are forward compatible with the provided NVIDIA driver version.
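One hedged way to confirm what the instance actually provides before choosing a base image is to query the driver from inside the container:

    # Report the installed driver version; compare it against NVIDIA's
    # compatibility matrix before picking a CUDA base image:
    nvidia-smi --query-gpu=driver_version --format=csv,noheader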

About GPUs and maximum instances

The number of instances with GPUs is limited in two ways:

  • The maximum instances setting limits the number of instances per service.
  • The GPU quota allocated to your project for the region limits the total number of instances with GPUs across services.
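For example, a hedged way to keep a service within the default quota of 3 GPUs (zonal redundancy off) is to cap its instances accordingly:

    # Cap the service at 3 instances so it stays within the default
    # 3-GPU quota described in the quota section above:
    gcloud run services update SERVICE --max-instances 3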

What's next

See Run AI inference on Cloud Run with GPUs for tutorials.
