Configure GPUs for Cloud Run jobs

Preview — NVIDIA RTX PRO 6000 Blackwell GPU

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

This page describes GPU configuration for your Cloud Run jobs. GPUs work well for AI workloads such as training large language models (LLMs) using your preferred frameworks, performing batch or offline inference on LLMs, and handling other compute-intensive tasks like video processing and graphics rendering as background jobs. Google provides NVIDIA L4 GPUs with 24 GB of GPU memory (VRAM) and NVIDIA RTX PRO 6000 Blackwell GPUs (Preview) with 96 GB of GPU memory (VRAM), which is separate from the instance memory.

Important: If you are using LLMs with the Cloud Run GPU feature, make sure you also consult Best practices: Cloud Run jobs with GPUs.

GPU on Cloud Run is fully managed, with no extra drivers or libraries needed. The GPU feature offers on-demand availability with no reservations needed, similar to the way on-demand CPU and on-demand memory work in Cloud Run.

Cloud Run instances with an attached L4 or NVIDIA RTX PRO 6000 Blackwell GPU with drivers pre-installed start in approximately 5 seconds, at which point the processes running in your container can start to use the GPU.

You can configure one GPU per Cloud Run instance. If you use sidecar containers, note that the GPU can only be attached to one container.

Supported GPU types

Cloud Run supports two types of GPUs:

  • L4 GPU with the current NVIDIA driver version: 535.216.03 (CUDA 12.2). For L4 GPUs, you must use a minimum of 4 CPU and 16 GiB of memory.
  • NVIDIA RTX PRO 6000 Blackwell GPU with the current NVIDIA driver version: 580.x.x (CUDA 13.0) (Preview). For NVIDIA RTX PRO 6000 Blackwell GPU, you must use a minimum of 20 CPU and 80 GiB of memory.

Supported regions

The following regions are supported by the L4 GPU:

  • asia-southeast1 (Singapore)
  • asia-south1 (Mumbai). This region is available by invitation only. Contact your Google Account team if you are interested in this region.
  • europe-west1 (Belgium) (Low CO2)
  • europe-west4 (Netherlands) (Low CO2)
  • us-central1 (Iowa) (Low CO2). This region is available by invitation only. Contact your Google Account team if you are interested in this region.
  • us-east4 (Northern Virginia)

The following regions are supported by the NVIDIA RTX PRO 6000 Blackwell GPU (Preview):

  • asia-southeast1 (Singapore). This region is available by invitation only. Contact your Google Account team if you are interested in this region.
  • asia-south2 (Delhi, India). This region is available by invitation only. Contact your Google Account team if you are interested in this region.
  • europe-west4 (Netherlands) (Low CO2)
  • us-central1 (Iowa) (Low CO2)

Pricing impact

See Cloud Run pricing for GPU pricing details. Note the following requirements and considerations:

  • GPU for jobs follows No zonal redundancy pricing.
  • Pricing also depends on the CPU and memory configuration of your resource.
  • GPU is billed for the entire duration of the instance lifecycle.

GPU non-zonal redundancy

The Cloud Run jobs feature provides non-zonal redundancy support only for GPU-enabled instances. With non-zonal redundancy enabled, Cloud Run attempts failover for GPU-enabled jobs on a best-effort basis. Cloud Run routes job executions to other zones only if sufficient GPU capacity is available at that moment. This option does not guarantee reserved capacity for failover scenarios but results in a lower cost per GPU second.

See Configure a Cloud Run job to use GPUs for details on enabling non-zonal redundancy.

Request a quota increase

Projects using Cloud Run nvidia-l4 GPUs in a region for the first time are automatically granted 3 GPU quota (zonal redundancy off) when the first deployment is created. Quota for Cloud Run nvidia-rtx-pro-6000 GPUs is granted in milliGPUs. Projects using nvidia-rtx-pro-6000 GPUs in a region for the first time are automatically granted 3,000 milliGPU quota (zonal redundancy off) when the first deployment is created. This is equivalent to 3 GPUs.

Note that this automatic quota grant is subject to availability depending on your CPU and memory capacity. This limits the count of GPUs that might be active across all of the project's services, jobs, and worker pools at any given time.

If you need additional Cloud Run GPUs for jobs, request a quota increase.

Before you begin

The following list describes requirements and setup steps for using GPUs in Cloud Run:

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the Cloud Run API. You can also enable the API with the Google Cloud CLI, as shown in the example after this list.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API

  5. Consult Best practices: Cloud Run jobs with GPUs for optimizing performance when using Cloud Run jobs with GPU.
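
If you prefer the command line, you can also enable the Cloud Run API with the Google Cloud CLI; the PROJECT_ID value below is a placeholder for your own project ID:

    gcloud services enable run.googleapis.com --project=PROJECT_ID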

Required roles

To get the permissions that you need to configure Cloud Run jobs, ask your administrator to grant you the following IAM roles on jobs:

  • Cloud Run Developer (roles/run.developer) on the Cloud Run job
  • Service Account User (roles/iam.serviceAccountUser) on the service identity

For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run job interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.
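
As a sketch, an administrator could grant these roles with the Google Cloud CLI. The member address, PROJECT_ID, and service account name below are placeholders:

    # Grant the Cloud Run Developer role at the project level.
    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member=user:developer@example.com \
      --role=roles/run.developer

    # Let the developer act as the job's service identity.
    gcloud iam service-accounts add-iam-policy-binding \
      SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com \
      --member=user:developer@example.com \
      --role=roles/iam.serviceAccountUser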

Configure a Cloud Run job to use GPUs

You can use the Google Cloud console, Google Cloud CLI, or YAML to configure GPUs.

Console

  1. In the Google Cloud console, go to the Cloud Run Jobs page:

    Go to Cloud Run

  2. Click Deploy container to fill out the initial job settings page. If you are configuring an existing job, select the job, then click View and edit job configuration.

  3. Click Container(s), Volumes, Connections, Security to expand the job properties page.

  4. Click the Container tab.

    • Configure CPU, memory, and startup probe following the recommendations in Before you begin.
    • Check the GPU checkbox. Then select the GPU type from the GPU type menu, and the number of GPUs from the Number of GPUs menu.
  5. Click Create or Update.

gcloud

To enable non-zonal redundancy, you must specify --no-gpu-zonal-redundancy. This is required for using GPUs with jobs.

To create a job with GPUs enabled, use the gcloud run jobs create command:

gcloud run jobs create JOB_NAME \
  --image=IMAGE_URL \
  --gpu=1 \
  --no-gpu-zonal-redundancy

Replace the following:

  • JOB_NAME: the name of your Cloud Run job.
  • IMAGE_URL: a reference to the container image, for example us-docker.pkg.dev/cloudrun/container/job:latest.
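
For example, a creation command for a hypothetical job named my-gpu-job might look like the following; the --cpu, --memory, and --gpu-type flags are not part of the minimal command above but are included here to meet the L4 minimums described earlier:

    gcloud run jobs create my-gpu-job \
      --image=us-docker.pkg.dev/cloudrun/container/job:latest \
      --cpu=4 \
      --memory=16Gi \
      --gpu=1 \
      --gpu-type=nvidia-l4 \
      --no-gpu-zonal-redundancy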

To update the GPU configuration for a job, use the gcloud run jobs update command:

gcloud run jobs update JOB_NAME \
  --image IMAGE_URL \
  --cpu CPU \
  --memory MEMORY \
  --gpu GPU_NUMBER \
  --gpu-type GPU_TYPE \
  --parallelism PARALLELISM \
  --no-gpu-zonal-redundancy

Replace the following:

  • JOB_NAME: the name of your Cloud Run job.
  • IMAGE_URL: a reference to the container image, for example us-docker.pkg.dev/cloudrun/container/job:latest.
  • CPU: the number of CPUs. For L4 GPU, you must specify at least 4 CPU. For NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 20 CPU.
  • MEMORY: the amount of memory. For L4 GPU, you must specify at least 16Gi (16 GiB). For NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 80Gi (80 GiB).
  • GPU_NUMBER: the value 1 (one). If this is unspecified but a GPU_TYPE is present, the default is 1.
  • GPU_TYPE: the GPU type. For L4 GPU, enter the value nvidia-l4 (lowercase L, not the numeral fourteen). For NVIDIA RTX PRO 6000 Blackwell GPU, enter nvidia-rtx-pro-6000.
  • PARALLELISM: an integer value less than the lowest value of the applicable quota limits you allocated for your project.
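
As a concrete sketch, the following updates a hypothetical job named my-gpu-job to the NVIDIA RTX PRO 6000 Blackwell GPU, with values chosen to meet the minimums listed above:

    gcloud run jobs update my-gpu-job \
      --image us-docker.pkg.dev/cloudrun/container/job:latest \
      --cpu 20 \
      --memory 80Gi \
      --gpu 1 \
      --gpu-type nvidia-rtx-pro-6000 \
      --parallelism 1 \
      --no-gpu-zonal-redundancy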

YAML

You must set the annotation run.googleapis.com/gpu-zonal-redundancy-disabled to 'true'. This enables non-zonal redundancy, which is required to use GPUs with jobs.

  1. If you are creating a new job, skip this step. If you are updating an existing job, download its YAML configuration:

    gcloud run jobs describe JOB_NAME --format export > job.yaml
  2. Update the nvidia.com/gpu attribute, the annotations (run.googleapis.com/gpu-zonal-redundancy-disabled, and run.googleapis.com/launch-stage for the launch stage), and nodeSelector: run.googleapis.com/accelerator:

    apiVersion: run.googleapis.com/v1
    kind: Job
    metadata:
      name: JOB_NAME
      labels:
        cloud.googleapis.com/location: REGION
    spec:
      template:
        metadata:
          annotations:
            run.googleapis.com/gpu-zonal-redundancy-disabled: 'true'
        spec:
          template:
            spec:
              containers:
              - image: IMAGE_URL
                resources:
                  limits:
                    cpu: 'CPU'
                    memory: 'MEMORY'
                    nvidia.com/gpu: 'GPU_NUMBER'
              nodeSelector:
                run.googleapis.com/accelerator: GPU_TYPE

    Replace the following:

    • JOB_NAME: the name of your Cloud Run job.
    • IMAGE_URL: a reference to the container image, for example us-docker.pkg.dev/cloudrun/container/job:latest
    • CPU: the number of CPUs. For L4 GPU, you must specify at least 4 CPU. For NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 20 CPU.
    • MEMORY: the amount of memory. For L4 GPU, you must specify at least 16Gi (16 GiB). For NVIDIA RTX PRO 6000 Blackwell GPU, you must specify at least 80Gi (80 GiB).
    • GPU_NUMBER: the value 1 (one), because only one GPU can be attached per Cloud Run instance.
    • GPU_TYPE: the GPU type. For L4 GPU, enter the value nvidia-l4 (lowercase L, not the numeral fourteen). For NVIDIA RTX PRO 6000 Blackwell GPU, enter nvidia-rtx-pro-6000.
  3. Create or update the job using the following command:

    gcloud run jobs replace job.yaml
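
Once the job is created or updated, you can start an execution with the gcloud run jobs execute command:

    gcloud run jobs execute JOB_NAME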

View GPU settings

To view the current GPU settings for your Cloud Run job:

Console

  1. In the Google Cloud console, go to the Cloud Run jobs page:

    Go to Cloud Run jobs

  2. Click the job you are interested in to open the Job details page.

  3. Click View and edit job configuration.

  4. Locate the GPU setting in the configuration details.

gcloud

  1. Use the following command:

    gcloud run jobs describe JOB_NAME
  2. Locate the GPU setting in the returned configuration.
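
To narrow the output to the task template, you can also pass a gcloud format projection; the projection path below is an assumption based on the v1 Job layout shown in the YAML section, so verify it against your job's actual output:

    gcloud run jobs describe JOB_NAME --format="yaml(spec.template.spec.template.spec)"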

Detach GPU resources from a job

You can detach GPU resources from a job using the Google Cloud console, Google Cloud CLI, or YAML.

Console

  1. In the Google Cloud console, go to the Cloud Run Jobs page:

    Go to Cloud Run

  2. In the jobs list, click a job to open that job's details.

  3. Click View and edit job configuration.

  4. Click Container(s), Volumes, Connections, Security to expand the job properties page.

  5. Click the Container tab.

    • Clear the GPU checkbox.
  6. Click Update.

gcloud

To detach GPU resources from your Cloud Run job, set the number of GPUs to 0 using the gcloud run jobs update command:

gcloud run jobs update JOB_NAME --gpu 0

Replace JOB_NAME with the name of your Cloud Run job.

YAML

  1. If you are creating a new job, skip this step. If you are updating an existing job, download its YAML configuration:

    gcloud run jobs describe JOB_NAME --format export > job.yaml
  2. Delete the nvidia.com/gpu:, the run.googleapis.com/gpu-zonal-redundancy-disabled: 'true', and the nodeSelector: run.googleapis.com/accelerator: GPU_TYPE lines.

  3. Create or update the job using the following command:

    gcloud run jobs replace job.yaml

Libraries

By default, all of the NVIDIA L4 and NVIDIA RTX PRO 6000 Blackwell GPU driver libraries are mounted under /usr/local/nvidia/lib64. Cloud Run automatically appends this path to the LD_LIBRARY_PATH environment variable (that is, ${LD_LIBRARY_PATH}:/usr/local/nvidia/lib64) of the container with the GPU. This allows the dynamic linker to find the NVIDIA driver libraries. The linker searches and resolves paths in the order you list in the LD_LIBRARY_PATH environment variable. Any values you specify in this variable take precedence over the default Cloud Run driver libraries path /usr/local/nvidia/lib64.
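
For example, a minimal Dockerfile sketch that relies on this ordering might set its own library directory; the /app/lib path is purely illustrative:

    # Hypothetical path. Cloud Run appends /usr/local/nvidia/lib64 after
    # this value at runtime, so /app/lib is searched first.
    ENV LD_LIBRARY_PATH=/app/lib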

If you want to use a CUDA version greater than 12.2, the easiest way is to depend on a newer NVIDIA base image with forward compatibility packages already installed. Another option is to manually install the NVIDIA forward compatibility packages and add them to LD_LIBRARY_PATH. Consult NVIDIA's compatibility matrix to determine which CUDA versions are forward compatible with the provided NVIDIA driver version.
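
As a sketch of the first option, a Dockerfile could start from one of NVIDIA's published CUDA runtime images; the tag below is illustrative, so confirm it exists and is forward compatible with the provided driver before relying on it:

    # Hypothetical base image with a newer CUDA runtime; check the tag
    # against NVIDIA's compatibility matrix for the installed driver.
    FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04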

About GPUs and parallelism

If you are running parallel tasks in a job execution, determine and set the parallelism value to less than the GPU quota without zonal redundancy allocated for your project. To request a quota increase, see How to increase quota. GPU tasks start as quickly as possible and go up to a maximum that varies depending on how much GPU quota you allocated for the project and the region selected. Cloud Run deployments fail if you set parallelism to more than the GPU quota limit.

To calculate the GPU quota your job uses per execution, multiply the number of GPUs per job task by the parallelism value. For example, if you have a GPU quota of 10 and deploy your Cloud Run job with --gpu=1, --parallelism=10, then your job consumes all 10 GPU quota. Alternatively, if you deploy with --gpu=1, --parallelism=20, then deployments fail.
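
To make this concrete, here is a hypothetical deployment that stays within a quota of 10 by running at most 10 tasks at once; the job name and task count are placeholders:

    gcloud run jobs create batch-inference-job \
      --image=us-docker.pkg.dev/cloudrun/container/job:latest \
      --cpu=4 \
      --memory=16Gi \
      --gpu=1 \
      --gpu-type=nvidia-l4 \
      --tasks=50 \
      --parallelism=10 \
      --no-gpu-zonal-redundancy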

For more information, see Best practices: Cloud Run jobs with GPUs.

What's next

See Run AI inference on Cloud Run with GPUs for tutorials.
