Dynamically allocate devices to workloads with DRA

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

You can flexibly request devices for your Google Kubernetes Engine (GKE) workloads by using dynamic resource allocation (DRA). This document shows you how to create a ResourceClaimTemplate to request devices, and then create a workload to observe how Kubernetes flexibly allocates the devices to your Pods.

This document is intended for Application operators and Data engineers who run workloads like AI/ML or high performance computing (HPC).

About requesting devices with DRA

When you set up your GKE infrastructure for DRA, the DRA drivers on your nodes create DeviceClass objects in the cluster. A DeviceClass defines a category of devices, such as GPUs, that are available to request for workloads. A platform administrator can optionally deploy additional DeviceClasses that limit which devices you can request in specific workloads.
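For example, a platform administrator might define a DeviceClass that only matches devices managed by a particular driver. The following manifest is an illustrative sketch, not a GKE default: the class name is hypothetical, and the selector uses the standard DRA CEL `device.driver` attribute.

```yaml
# Hypothetical DeviceClass that only matches devices exposed by the
# NVIDIA GPU DRA driver. The name is illustrative.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: restricted-gpus.example.com
spec:
  selectors:
  - cel:
      expression: device.driver == "gpu.nvidia.com"
```

A ResourceClaim or ResourceClaimTemplate that names this class can then only be satisfied by devices that match the selector expression.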

To request devices within a DeviceClass, you create one of the following objects:

  • ResourceClaim: A ResourceClaim lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass.
  • ResourceClaimTemplate: A ResourceClaimTemplate defines a template that Pods can use to automatically create new per-Pod ResourceClaims.

For more information about ResourceClaims and ResourceClaimTemplates, see When to use ResourceClaims and ResourceClaimTemplates.
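As a point of comparison with the template-based flow that this document uses, a standalone ResourceClaim is created once and referenced by name from a Pod through the resourceClaimName field. The following sketch assumes the gpu.nvidia.com DeviceClass used elsewhere on this page; the claim and Pod names are illustrative:

```yaml
# Illustrative standalone ResourceClaim. A Pod references it directly
# through resourceClaimName instead of a ResourceClaimTemplate.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: shared-gpu-claim
spec:
  devices:
    requests:
    - name: single-gpu
      exactly:
        deviceClassName: gpu.nvidia.com
        allocationMode: ExactCount
        count: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: claim-user
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    resources:
      claims:
      - name: single-gpu
  resourceClaims:
  - name: single-gpu
    resourceClaimName: shared-gpu-claim  # references the claim above by name
```

Unlike the per-Pod claims that Kubernetes creates from a template, a standalone ResourceClaim can be referenced by multiple Pods that share the same allocated devices.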

The examples on this page use a basic ResourceClaimTemplate to request the specified device configuration. For more information about all of the fields that you can specify, see the ResourceClaimTemplate API reference.

Limitations

  • Node auto-provisioning isn't supported.
  • Autopilot clusters don't support DRA.
  • You can't use the following GPU sharing features:
    • Time-sharing GPUs
    • Multi-instance GPUs
    • Multi-process Service (MPS)

Requirements

To use DRA, your cluster must run GKE version 1.34 or later.

You should also be familiar with the limitations listed in the preceding section.

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document. Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set the compute/zone property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.

Use DRA to deploy workloads

To request per-Pod device allocation, you create a ResourceClaimTemplate that has your requested device configuration, such as GPUs of a specific type. When you deploy a workload that references the ResourceClaimTemplate, Kubernetes creates ResourceClaims for each Pod in the workload based on the ResourceClaimTemplate. Kubernetes allocates the requested resources and schedules the Pods on corresponding nodes.

To request devices in a workload with DRA, select one of the following options:

GPU

  1. Save the following manifest as claim-template.yaml:

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaimTemplate
    metadata:
      name: gpu-claim-template
    spec:
      spec:
        devices:
          requests:
          - name: single-gpu
            exactly:
              deviceClassName: gpu.nvidia.com
              allocationMode: ExactCount
              count: 1
  2. Create the ResourceClaimTemplate:

    kubectl create -f claim-template.yaml
  3. To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-gpu-example.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dra-gpu-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dra-gpu-example
      template:
        metadata:
          labels:
            app: dra-gpu-example
        spec:
          containers:
          - name: ctr
            image: ubuntu:22.04
            command: ["bash", "-c"]
            args: ["echo $(nvidia-smi -L || echo Waiting...)"]
            resources:
              claims:
              - name: single-gpu
          resourceClaims:
          - name: single-gpu
            resourceClaimTemplateName: gpu-claim-template
          tolerations:
          - key: "nvidia.com/gpu"
            operator: "Exists"
            effect: "NoSchedule"
  4. Deploy the workload:

    kubectl create -f dra-gpu-example.yaml

TPU

  1. Save the following manifest as claim-template.yaml:

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaimTemplate
    metadata:
      name: tpu-claim-template
    spec:
      spec:
        devices:
          requests:
          - name: all-tpus
            exactly:
              deviceClassName: tpu.google.com
              allocationMode: All

    This ResourceClaimTemplate requests all TPUs, so all TPUs on a node are allocated to each resulting ResourceClaim.

  2. Create the ResourceClaimTemplate:

    kubectl create -f claim-template.yaml
  3. To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-tpu-example.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dra-tpu-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dra-tpu-example
      template:
        metadata:
          labels:
            app: dra-tpu-example
        spec:
          containers:
          - name: ctr
            image: ubuntu:22.04
            command:
            - /bin/sh
            - -c
            - |
              echo "Environment Variables:"
              env
              echo "Sleeping indefinitely..."
              sleep infinity
            resources:
              claims:
              - name: all-tpus
          resourceClaims:
          - name: all-tpus
            resourceClaimTemplateName: tpu-claim-template
          tolerations:
          - key: "google.com/tpu"
            operator: "Exists"
            effect: "NoSchedule"
  4. Deploy the workload:

    kubectl create -f dra-tpu-example.yaml
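The template in the preceding steps uses allocationMode: All to take every TPU on a node. If you instead need a fixed number of TPU chips per Pod, an ExactCount request might look like the following sketch; the template name and the count of four are illustrative assumptions, not values from this page:

```yaml
# Illustrative variant: request exactly four TPU chips per Pod
# instead of all TPUs on the node.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: four-tpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: four-tpus
        exactly:
          deviceClassName: tpu.google.com
          allocationMode: ExactCount
          count: 4
```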

Verify the hardware allocation

You can verify that your workloads have been allocated hardware by checking the ResourceClaim or by looking at the logs for your Pod. To verify the allocation for GPUs or TPUs, select one of the following options:

GPU

  1. Get the ResourceClaim associated with the workload that you deployed:

    kubectl get resourceclaims

    The output is similar to the following:

    NAME                                               STATE                AGE
    dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh   allocated,reserved   9s
  2. Get more details about the hardware assigned to the Pod:

    kubectl describe resourceclaims RESOURCECLAIM

    Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

    The output is similar to the following:

    Name:         dra-gpu-example-68f595d7dc-prv27-single-gpu-qgjq5
    Namespace:    default
    Labels:       <none>
    Annotations:  resource.kubernetes.io/pod-claim-name: single-gpu
    API Version:  resource.k8s.io/v1
    Kind:         ResourceClaim
    Metadata:
      # Multiple lines are omitted here.
    Spec:
      Devices:
        Requests:
          Exactly:
            Allocation Mode:    ExactCount
            Count:              1
            Device Class Name:  gpu.nvidia.com
          Name:                 single-gpu
    Status:
      Allocation:
        Devices:
          Results:
            Device:   gpu-0
            Driver:   gpu.nvidia.com
            Pool:     gke-cluster-1-dra-gpu-pool-b56c4961-7vnm
            Request:  single-gpu
        Node Selector:
          Node Selector Terms:
            Match Fields:
              Key:       metadata.name
              Operator:  In
              Values:
                gke-cluster-1-dra-gpu-pool-b56c4961-7vnm
      Reserved For:
        Name:      dra-gpu-example-68f595d7dc-prv27
        Resource:  pods
        UID:       e16c2813-08ef-411b-8d92-a72f27ebf5ef
    Events:        <none>
  3. Get logs for the workload that you deployed:

    kubectl logs deployment/dra-gpu-example --all-pods=true

    The output is similar to the following:

    [pod/dra-gpu-example-64b75dc6b-x8bd6/ctr] GPU 0: Tesla T4 (UUID: GPU-2087ac7a-f781-8cd7-eb6b-b00943cc13ef)

    The output of these steps shows that GKE allocated one GPU to the container.

TPU

  1. Get the ResourceClaim associated with the workload that you deployed:

    kubectl get resourceclaims | grep dra-tpu-example

    The output is similar to the following:

    NAME                                               STATE                AGE
    dra-tpu-example-64b75dc6b-x8bd6-all-tpus-jwwdh     allocated,reserved   9s
  2. Get more details about the hardware assigned to the Pod:

    kubectl get resourceclaims RESOURCECLAIM -o yaml

    Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

    The output is similar to the following:

    apiVersion: resource.k8s.io/v1beta1
    kind: ResourceClaim
    metadata:
      annotations:
        resource.kubernetes.io/pod-claim-name: all-tpus
      creationTimestamp: "2025-03-04T21:00:54Z"
      finalizers:
      - resource.kubernetes.io/delete-protection
      generateName: dra-tpu-example-59b8785697-k9kzd-all-gpus-
      name: dra-tpu-example-59b8785697-k9kzd-all-gpus-gnr7z
      namespace: default
      ownerReferences:
      - apiVersion: v1
        blockOwnerDeletion: true
        controller: true
        kind: Pod
        name: dra-tpu-example-59b8785697-k9kzd
        uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
      resourceVersion: "12189603"
      uid: 279b5014-340b-4ef6-9dda-9fbf183fbb71
    spec:
      devices:
        requests:
        - allocationMode: All
          deviceClassName: tpu.google.com
          name: all-tpus
    status:
      allocation:
        devices:
          results:
          - adminAccess: null
            device: "0"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "1"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "2"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "3"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "4"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "5"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "6"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
          - adminAccess: null
            device: "7"
            driver: tpu.google.com
            pool: gke-tpu-2ec29193-bcc0
            request: all-tpus
        nodeSelector:
          nodeSelectorTerms:
          - matchFields:
            - key: metadata.name
              operator: In
              values:
              - gke-tpu-2ec29193-bcc0
        reservedFor:
        - name: dra-tpu-example-59b8785697-k9kzd
          resource: pods
          uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
  3. Get logs for the workload that you deployed:

    kubectl logs deployment/dra-tpu-example --all-pods=true | grep "TPU"

    The output is similar to the following:

    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_CHIPS_PER_HOST_BOUNDS=2,4,1
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_WRAP=false,false,false
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_SKIP_MDS_QUERY=true
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_RUNTIME_METRICS_PORTS=8431,8432,8433,8434,8435,8436,8437,8438
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_ID=0
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_HOSTNAMES=localhost
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY=2x4
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_ACCELERATOR_TYPE=v6e-8
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_HOST_BOUNDS=1,1,1
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_ALT=false
    [pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_DEVICE_0_RESOURCE_CLAIM=77e68f15-fa2f-4109-9a14-6c91da1a38d3

    The output of these steps indicates that all of the TPUs in a node pool were allocated to the Pod.

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.