Run a large-scale workload with flex-start with queued provisioning
This page shows you how to optimize GPU obtainability for large-scale batch and AI workloads by using flex-start with queued provisioning, powered by Dynamic Workload Scheduler.
Before reading this page, ensure that you're familiar with the following:
This guide is intended for Machine learning (ML) engineers, Platform admins and operators, and Data and AI specialists who are interested in using Kubernetes container orchestration capabilities for running batch workloads. For more information about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.
How flex-start with queued provisioning works
With flex-start with queued provisioning, GKE allocates all requested resources at the same time. Flex-start with queued provisioning uses the following tools:
- Flex-start with queued provisioning is based on Dynamic Workload Scheduler combined with the Provisioning Request custom resource definition (CRD). These tools manage the allocated capacity based on the available resources and your workload requirements.
- (Optional) Kueue automates the lifecycle of flex-start with queued provisioning requests. Kueue implements Job queueing and automatically handles the Provisioning Request lifecycle.
To use flex-start with queued provisioning, you must add the --flex-start and --enable-queued-provisioning flags when you create the node pool.
Use flex-start with queued provisioning for large-scale batch and AI workloads when your workloads meet the following criteria:
- Your workloads have flexible start times.
- Your workloads are required to run across multiple nodes simultaneously.
For smaller workloads that can run on a single node, use Flex-start VMs. For more information about GPU provisioning in GKE, see Obtain accelerators for AI workloads.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document. Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set compute/zone instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.
- Ensure that you have a GKE cluster in version 1.32.2-gke.1652000 or later.
- Ensure that you have enough preemptible quota for the VMs that will be provisioned.
- Ensure that you manage disruptions in workloads that use Dynamic Workload Scheduler so that your workloads aren't interrupted.
- Ensure that you're familiar with the limitations of flex-start with queued provisioning.
- When using a Standard cluster, ensure that you maintain at least one node pool without flex-start with queued provisioning enabled so that the cluster can function correctly.
Use node pools with flex-start with queued provisioning
This section applies to Standard clusters only.
You can use either of the following methods to designate that flex-start with queued provisioning can work with specific node pools in your cluster:
- Create a node pool.
- Configure node auto-provisioning to create node pools that have flex-start with queued provisioning enabled.
Create a node pool
Create a node pool that has flex-start with queued provisioning enabled by using the gcloud CLI:
```sh
gcloud container node-pools create NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --enable-queued-provisioning \
    --accelerator type=GPU_TYPE,count=AMOUNT,gpu-driver-version=DRIVER_VERSION \
    --machine-type=MACHINE_TYPE \
    --flex-start \
    --enable-autoscaling \
    --num-nodes=0 \
    --total-max-nodes TOTAL_MAX_NODES \
    --location-policy=ANY \
    --reservation-affinity=none \
    --no-enable-autorepair
```

Replace the following:
- NODEPOOL_NAME: the name you choose for the node pool.
- CLUSTER_NAME: the name of the cluster.
- LOCATION: the cluster's Compute Engine region, such as us-central1.
- GPU_TYPE: the GPU type.
- AMOUNT: the number of GPUs to attach to nodes in the node pool.
- DRIVER_VERSION: the NVIDIA driver version to install. Can be one of the following:
  - default: install the default driver version for your GKE version.
  - latest: install the latest available driver version for your GKE version. Available only for nodes that use Container-Optimized OS.
- TOTAL_MAX_NODES: the maximum number of nodes to automatically scale for the entire node pool.
- MACHINE_TYPE: the Compute Engine machine type for your nodes. Best practice: use an accelerator-optimized machine type to improve performance and efficiency for AI/ML workloads.
Optionally, you can use the following flags:
- --node-locations=COMPUTE_ZONES: the comma-separated list of one or more zones where GKE creates the GPU nodes. The zones must be in the same region as the cluster. Choose zones that have available GPUs.
- --enable-gvnic: enables gVNIC on the GPU node pools to increase network traffic speed.
This command creates a node pool with the following configuration:
- The --flex-start flag combined with the --enable-queued-provisioning flag instructs GKE to create a node pool with flex-start with queued provisioning enabled and to add the cloud.google.com/gke-queued taint to the node pool.
- GKE enables queued provisioning and cluster autoscaling.
- The node pool initially has zero nodes.
- The --no-enable-autorepair flag disables automatic repairs, which could disrupt workloads that run on repaired nodes.
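For example, a node pool for A100 GPUs could look like the following. This is a sketch only; the cluster name, region, machine type, node pool name, and node limit are placeholder values that you would adapt to your own environment:

```sh
# Hypothetical example: flex-start with queued provisioning on an A2 node pool.
gcloud container node-pools create dws-a100-pool \
    --cluster=my-training-cluster \
    --location=us-central1 \
    --enable-queued-provisioning \
    --accelerator type=nvidia-tesla-a100,count=8,gpu-driver-version=default \
    --machine-type=a2-highgpu-8g \
    --flex-start \
    --enable-autoscaling \
    --num-nodes=0 \
    --total-max-nodes 10 \
    --location-policy=ANY \
    --reservation-affinity=none \
    --no-enable-autorepair
```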
Enable node auto-provisioning to create node pools for flex-start with queued provisioning
You can use node auto-provisioning to manage node pools for flex-start with queued provisioning for clusters running version 1.29.2-gke.1553000 or later. When you enable node auto-provisioning, GKE creates node pools with the required resources for the associated workload.
To enable node auto-provisioning, consider the following settings and complete the steps in Configure GPU limits:
- Specify the required resources for flex-start with queued provisioning when you enable the feature. To list the available resourceTypes, run the gcloud compute accelerator-types list command.
- Use the --no-enable-autoprovisioning-autorepair flag to disable node auto-repair.
- Let GKE automatically install GPU drivers in auto-provisioned GPU nodes. For more information, see Installing drivers using node auto-provisioning with GPUs.
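For example, the following sketch enables node auto-provisioning with accelerator limits and disables auto-repair for auto-provisioned node pools. The cluster name, location, GPU type, and resource limits are placeholder values; adjust them to your quota and workload size:

```sh
# Hypothetical example: enable node auto-provisioning with accelerator limits.
gcloud container clusters update my-training-cluster \
    --location=us-central1 \
    --enable-autoprovisioning \
    --no-enable-autoprovisioning-autorepair \
    --min-cpu=0 --max-cpu=1000 \
    --min-memory=0 --max-memory=4000 \
    --min-accelerator type=nvidia-tesla-a100,count=0 \
    --max-accelerator type=nvidia-tesla-a100,count=32
```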
Run your batch and AI workloads with flex-start with queued provisioning
To run batch workloads with flex-start with queued provisioning, use either of the following configurations:
- Flex-start with queued provisioning for Jobs with Kueue: You can use flex-start with queued provisioning with Kueue to automate the lifecycle of Provisioning Requests. Kueue implements Job queueing and observes the status of flex-start with queued provisioning. Kueue decides when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.
- Flex-start with queued provisioning for Jobs without Kueue: You can use flex-start with queued provisioning without Kueue when you use your own internal batch scheduling tools or platform. You manually create and cancel the ProvisioningRequest.
Use Kueue to run your batch and AI workloads with flex-start with queued provisioning.
Flex-start with queued provisioning for Jobs with Kueue
The following sections show you how to configure flex-start with queued provisioning for Jobs with Kueue:
- Flex-start with queued provisioning node pool setup.
- Reservation and flex-start with queued provisioning node pool setup.
This section uses the samples in the dws-examples directory of the ai-on-gke repository. The samples in the dws-examples directory are published under the Apache 2.0 license.
You need administrator permissions to install Kueue. To gain them, make sure that you're granted the roles/container.admin IAM role. To learn more about GKE IAM roles, see the Create IAM allow policies guide.
Prepare your environment
In Cloud Shell, run the following command:
```sh
git clone https://github.com/GoogleCloudPlatform/ai-on-gke
cd ai-on-gke/tutorials-and-examples/workflow-orchestration/dws-examples
```

Install the latest Kueue version in your cluster:
```sh
VERSION=KUEUE_VERSION
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
```

Replace KUEUE_VERSION with the latest Kueue version.
If you use a Kueue version earlier than 0.7.0, change the Kueue feature gate configuration by setting the ProvisioningACC feature gate to true. See Kueue's feature gates for a more detailed explanation and the default gate values. For more information about Kueue installation, see Installation.
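For example, on those older Kueue releases, one way to set the gate is to add the flag to the manager container's arguments after installation. The following is a sketch that assumes the default install layout from the release manifests (a Deployment named kueue-controller-manager in the kueue-system namespace with a container named manager); verify the names in your cluster before editing:

```yaml
# Excerpt of the kueue-controller-manager Deployment, edited with:
#   kubectl -n kueue-system edit deployment kueue-controller-manager
spec:
  template:
    spec:
      containers:
      - name: manager
        args:
        # Keep any existing args and append the feature gate flag.
        - --feature-gates=ProvisioningACC=true
```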
Create the Kueue resources for the Dynamic Workload Scheduler node pool only setup
With the following manifest, you create a cluster-level queue named dws-cluster-queue and a LocalQueue named dws-local-queue in the default namespace. Jobs in this namespace that refer to the dws-local-queue queue are admitted through dws-cluster-queue and use flex-start with queued provisioning to get GPU resources.
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: dws-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: dws-config
spec:
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "dws-cluster-queue"
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "ephemeral-storage"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 1000000000  # "Infinite" quota
      - name: "memory"
        nominalQuota: 1000000000Gi  # "Infinite" quota
      - name: "nvidia.com/gpu"
        nominalQuota: 1000000000  # "Infinite" quota
      - name: "ephemeral-storage"
        nominalQuota: 1000000000Ti  # "Infinite" quota
  admissionChecks:
  - dws-prov
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "dws-local-queue"
spec:
  clusterQueue: "dws-cluster-queue"
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  labels:
    control-plane: controller-manager
  name: controller-manager-metrics-monitor
  namespace: kueue-system
spec:
  endpoints:
  - path: /metrics
    port: 8080
    scheme: http
    interval: 30s
  selector:
    matchLabels:
      control-plane: controller-manager
```

This cluster queue has high quota limits, and only the flex-start with queued provisioning integration is enabled. For more information about Kueue APIs and how to set up limits, see Kueue concepts.
Deploy the LocalQueue:
```sh
kubectl create -f ./dws-queues.yaml
```

The output is similar to the following:

```
resourceflavor.kueue.x-k8s.io/default-flavor created
admissioncheck.kueue.x-k8s.io/dws-prov created
provisioningrequestconfig.kueue.x-k8s.io/dws-config created
clusterqueue.kueue.x-k8s.io/dws-cluster-queue created
localqueue.kueue.x-k8s.io/dws-local-queue created
```

If you want to run Jobs that use flex-start with queued provisioning in other namespaces, you can create additional LocalQueues by using the preceding template.
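For example, the following sketch creates a LocalQueue in a hypothetical team-a namespace that admits Jobs through the same dws-cluster-queue:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "team-a"   # hypothetical namespace; must already exist
  name: "dws-local-queue"
spec:
  clusterQueue: "dws-cluster-queue"
```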
Run your Job
In the following manifest, the sample Job uses flex-start with queued provisioning:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue
  annotations:
    provreq.kueue.x-k8s.io/maxRunDurationSeconds: "600"
spec:
  parallelism: 1
  completions: 1
  suspend: true
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: NODEPOOL_NAME
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.0.3
        args: ["120s"]
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
            nvidia.com/gpu: 1
          limits:
            cpu: "100m"
            memory: "100Mi"
            nvidia.com/gpu: 1
      restartPolicy: Never
```

This manifest includes the following fields that are relevant for the flex-start with queued provisioning configuration:
- The kueue.x-k8s.io/queue-name: dws-local-queue label tells GKE that Kueue is responsible for orchestrating that Job. This label also defines the queue where the Job is queued.
- The suspend: true field tells GKE to create the Job resource but not to schedule the Pods yet. Kueue changes that field to false when the nodes are ready for the Job to run.
- nodeSelector tells GKE to schedule the Job only on the specified node pool. The value must match NODEPOOL_NAME, the name of the node pool with queued provisioning enabled.
Run your Job:
```sh
kubectl create -f ./job.yaml
```

The output is similar to the following:

```
job.batch/sample-job created
```

Check the status of your Job:

```sh
kubectl describe job sample-job
```

The output is similar to the following:

```
Events:
  Type    Reason            Age    From                        Message
  ----    ------            ----   ----                        -------
  Normal  Suspended         5m17s  job-controller              Job suspended
  Normal  CreatedWorkload   5m17s  batch/job-kueue-controller  Created Workload: default/job-sample-job-7f173
  Normal  Started           3m27s  batch/job-kueue-controller  Admitted by clusterQueue dws-cluster-queue
  Normal  SuccessfulCreate  3m27s  job-controller              Created pod: sample-job-9qsfd
  Normal  Resumed           3m27s  job-controller              Job resumed
  Normal  Completed         12s    job-controller              Job completed
```
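You can also inspect the Kueue Workload object and the Provisioning Request that Kueue created for the Job. The following commands are a sketch for the default namespace used in this example:

```sh
# List the Kueue Workloads and check whether the Job's Workload is admitted.
kubectl get workloads -n default

# Inspect the Provisioning Request that Kueue created on the Job's behalf.
kubectl get provisioningrequests -n default
```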
The integration of flex-start with queued provisioning with Kueue also supports other workload types available in the open source ecosystem, such as the following:
- RayJob
- JobSet v0.5.2 or later
- Kubeflow MPIJob, TFJob, PyTorchJob.
- Kubernetes Pods that are frequently used by workflow orchestrators
- Flux mini cluster
For more information about this support, see Kueue's batch user documentation.
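For example, a minimal JobSet that submits through the same dws-local-queue could look like the following sketch. The JobSet API version and field names follow the upstream JobSet project and are assumptions here; adjust them for the JobSet version that you have installed:

```yaml
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: sample-jobset
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue   # hand the JobSet to Kueue
spec:
  suspend: true            # Kueue resumes the JobSet when nodes are ready
  replicatedJobs:
  - name: workers
    replicas: 1
    template:
      spec:
        parallelism: 2
        completions: 2
        template:
          spec:
            nodeSelector:
              cloud.google.com/gke-nodepool: NODEPOOL_NAME
            tolerations:
            - key: "nvidia.com/gpu"
              operator: "Exists"
              effect: "NoSchedule"
            containers:
            - name: worker
              image: gcr.io/k8s-staging-perf-tests/sleep:v0.0.3
              args: ["120s"]
              resources:
                requests:
                  nvidia.com/gpu: 1
                limits:
                  nvidia.com/gpu: 1
            restartPolicy: Never
```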
Create the Kueue resources for Reservation and Dynamic Workload Scheduler node pool setup
With the following manifest, you create two ResourceFlavors tied to two different node pools: reservation-nodepool and dws-nodepool. The names of these node pools are examples only; modify them according to your node pool configuration. Additionally, with the ClusterQueue configuration, incoming Jobs try to use reservation-nodepool first, and if there is no capacity, these Jobs use Dynamic Workload Scheduler to get the GPU resources.
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "reservation"
spec:
  nodeLabels:
    cloud.google.com/gke-nodepool: "reservation-nodepool" # placeholder value
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "dws"
spec:
  nodeLabels:
    cloud.google.com/gke-nodepool: "dws-nodepool" # placeholder value
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "reservation" # first we try reservation
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 9
    - name: "dws" # if reservation is saturated we try dws
      resources:
      - name: "cpu"
        nominalQuota: 1000000000 # "Infinite" quota
      - name: "memory"
        nominalQuota: 1000000000Gi # "Infinite" quota
      - name: "nvidia.com/gpu"
        nominalQuota: 1000000000 # "Infinite" quota
  admissionChecksStrategy:
    admissionChecks:
    - name: "dws-prov"
      onFlavors: [dws]
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: dws-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: dws-config
spec:
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu
```

This cluster queue has high quota limits, and only the flex-start with queued provisioning integration is enabled. For more information about Kueue APIs and how to set up limits, see Kueue concepts.
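If you don't already have a node pool that consumes your reservation, you can create one before deploying the manifest. The following is a sketch only; it assumes that you already have a Compute Engine reservation named RESERVATION_NAME and that the node pool name matches the nodeLabels value in the reservation ResourceFlavor:

```sh
# Hypothetical example: a node pool backed by a specific Compute Engine reservation.
gcloud container node-pools create reservation-nodepool \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --accelerator type=GPU_TYPE,count=AMOUNT,gpu-driver-version=default \
    --machine-type=MACHINE_TYPE \
    --reservation-affinity=specific \
    --reservation=RESERVATION_NAME \
    --num-nodes=NUM_NODES
```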
Deploy the manifest using the following command:
```sh
kubectl create -f ./dws_and_reservation.yaml
```

The output is similar to the following:

```
resourceflavor.kueue.x-k8s.io/reservation created
resourceflavor.kueue.x-k8s.io/dws created
clusterqueue.kueue.x-k8s.io/cluster-queue created
localqueue.kueue.x-k8s.io/user-queue created
admissioncheck.kueue.x-k8s.io/dws-prov created
provisioningrequestconfig.kueue.x-k8s.io/dws-config created
```

Run your Job
Contrary to the preceding setup, this manifest does not include the nodeSelector field, because Kueue fills it in based on the free capacity in the ClusterQueue.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue
  annotations:
    provreq.kueue.x-k8s.io/maxRunDurationSeconds: "600"
spec:
  parallelism: 1
  completions: 1
  suspend: true
  template:
    spec:
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.0.3
        args: ["120s"]
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
            nvidia.com/gpu: 1
          limits:
            cpu: "100m"
            memory: "100Mi"
            nvidia.com/gpu: 1
      restartPolicy: Never
```

Run your Job:
```sh
kubectl create -f ./job-without-node-selector.yaml
```

The output is similar to the following:

```
job.batch/sample-job-v8xwm created
```
To identify which node pool your Job uses, find out which ResourceFlavor your Job was admitted with.
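For example, you can read the flavor assignment from the Kueue Workload object that wraps your Job. The following is a sketch; the Workload name is generated from your Job name, so list the Workloads first:

```sh
# List the Workloads in the namespace, then print the admitted flavor per resource.
kubectl get workloads -n default
kubectl get workload WORKLOAD_NAME -n default \
    -o jsonpath='{.status.admission.podSetAssignments[*].flavors}'
```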
Troubleshooting
For more information about troubleshooting Kueue, see Troubleshooting Provisioning Request in Kueue.
Flex-start with queued provisioning for Jobs without Kueue
Define a ProvisioningRequest object
Create a ProvisioningRequest for each Job. Flex-start with queued provisioning doesn't start the Pods; it only provisions the nodes.
Create the following provisioning-request.yaml manifest:

Standard
```yaml
apiVersion: v1
kind: PodTemplate
metadata:
  name: POD_TEMPLATE_NAME
  namespace: NAMESPACE_NAME
  labels:
    cloud.google.com/apply-warden-policies: "true"
template:
  spec:
    nodeSelector:
      cloud.google.com/gke-nodepool: NODEPOOL_NAME
      cloud.google.com/gke-flex-start: "true"
    tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
    containers:
    - name: pi
      image: perl
      command: ["/bin/sh"]
      resources:
        limits:
          cpu: "700m"
          nvidia.com/gpu: 1
        requests:
          cpu: "700m"
          nvidia.com/gpu: 1
    restartPolicy: Never
---
apiVersion: autoscaling.x-k8s.io/API_VERSION
kind: ProvisioningRequest
metadata:
  name: PROVISIONING_REQUEST_NAME
  namespace: NAMESPACE_NAME
spec:
  provisioningClassName: queued-provisioning.gke.io
  parameters:
    maxRunDurationSeconds: "MAX_RUN_DURATION_SECONDS"
  podSets:
  - count: COUNT
    podTemplateRef:
      name: POD_TEMPLATE_NAME
```

Replace the following:
- API_VERSION: the version of the API, either v1 or v1beta1. We recommend using v1 for stability and access to the latest features.
- NAMESPACE_NAME: the name of your Kubernetes namespace. The namespace must be the same as the namespace of the Pods.
- PROVISIONING_REQUEST_NAME: the name of the ProvisioningRequest. You'll refer to this name in the Pod annotation.
- MAX_RUN_DURATION_SECONDS: optionally, the maximum runtime of a node in seconds, up to the default of seven days. To learn more, see How flex-start with queued provisioning works. You can't change this value after you create the request. This field is available in GKE version 1.28.5-gke.1355000 or later.
- COUNT: the number of Pods requested. The nodes are scheduled atomically in one zone.
- POD_TEMPLATE_NAME: the name of the PodTemplate.
- NODEPOOL_NAME: the name you choose for the node pool. Remove this field if you want to use an auto-provisioned node pool.
GKE might apply validations and mutations to Pods during their creation. The cloud.google.com/apply-warden-policies label allows GKE to apply the same validations and mutations to PodTemplate objects. This label is necessary for GKE to calculate node resource requirements for your Pods.

Warning: The flex-start with queued provisioning integration supports only one PodSet spec. If you want to mix different Pod templates, use the template that requests the most resources. Mixing different machine types, such as VMs with different GPU types, is not supported.

Node auto-provisioning
```yaml
apiVersion: v1
kind: PodTemplate
metadata:
  name: POD_TEMPLATE_NAME
  namespace: NAMESPACE_NAME
  labels:
    cloud.google.com/apply-warden-policies: "true"
template:
  spec:
    nodeSelector:
      cloud.google.com/gke-accelerator: GPU_TYPE
      cloud.google.com/gke-flex-start: "true"
    tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
    containers:
    - name: pi
      image: perl
      command: ["/bin/sh"]
      resources:
        limits:
          cpu: "700m"
          nvidia.com/gpu: 1
        requests:
          cpu: "700m"
          nvidia.com/gpu: 1
    restartPolicy: Never
---
apiVersion: autoscaling.x-k8s.io/API_VERSION
kind: ProvisioningRequest
metadata:
  name: PROVISIONING_REQUEST_NAME
  namespace: NAMESPACE_NAME
spec:
  provisioningClassName: queued-provisioning.gke.io
  parameters:
    maxRunDurationSeconds: "MAX_RUN_DURATION_SECONDS"
  podSets:
  - count: COUNT
    podTemplateRef:
      name: POD_TEMPLATE_NAME
```

Replace the following:
- API_VERSION: the version of the API, either v1 or v1beta1. We recommend using v1 for stability and access to the latest features.
- NAMESPACE_NAME: the name of your Kubernetes namespace. The namespace must be the same as the namespace of the Pods.
- PROVISIONING_REQUEST_NAME: the name of the ProvisioningRequest. You'll refer to this name in the Pod annotation.
- MAX_RUN_DURATION_SECONDS: optionally, the maximum runtime of a node in seconds, up to the default of seven days. To learn more, see How flex-start with queued provisioning works. You can't change this value after you create the request. This field is available in GKE version 1.28.5-gke.1355000 or later.
- COUNT: the number of Pods requested. The nodes are scheduled atomically in one zone.
- POD_TEMPLATE_NAME: the name of the PodTemplate.
- GPU_TYPE: the type of GPU hardware.
GKE might apply validations and mutations to Pods during their creation. The cloud.google.com/apply-warden-policies label allows GKE to apply the same validations and mutations to PodTemplate objects. This label is necessary for GKE to calculate node resource requirements for your Pods.

Warning: The flex-start with queued provisioning integration supports only one PodSet spec. If you want to mix different Pod templates, use the template that requests the most resources, or use the IdenticalWorkloadSchedulingRequirements option in the podSetMergePolicy feature to merge Pod templates that have identical scheduling requirements. Mixing different machine types, such as VMs with different GPU types, is not supported.

Apply the manifest:
```sh
kubectl apply -f provisioning-request.yaml
```
Configure the Pods
This section uses Kubernetes Jobs to configure the Pods. However, you can also use a Kubernetes JobSet or any other framework such as Kubeflow, Ray, or custom controllers. In the Job spec, link the Pods to the ProvisioningRequest by using the following annotations:
```yaml
apiVersion: batch/v1
kind: Job
spec:
  template:
    metadata:
      annotations:
        autoscaling.x-k8s.io/consume-provisioning-request: PROVISIONING_REQUEST_NAME
        autoscaling.x-k8s.io/provisioning-class-name: "queued-provisioning.gke.io"
    spec:
      ...
```

The Pod annotation key consume-provisioning-request defines which ProvisioningRequest to consume. GKE uses the consume-provisioning-request and provisioning-class-name annotations to do the following:
- To schedule the Pods only on the nodes provisioned by flex-start with queued provisioning.
- To avoid double counting of resource requests between Pods and flex-start with queued provisioning in the cluster autoscaler.
- To inject the safe-to-evict: false annotation, which prevents the cluster autoscaler from moving Pods between nodes and interrupting batch computations. You can change this behavior by specifying safe-to-evict: true in the Pod annotations, as shown in the sketch after this list.
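For example, if your Pods can safely be restarted elsewhere, opting back in to autoscaler eviction could look like the following sketch. The full annotation key shown here is the standard cluster autoscaler safe-to-evict key and is an assumption in this context:

```yaml
apiVersion: batch/v1
kind: Job
spec:
  template:
    metadata:
      annotations:
        autoscaling.x-k8s.io/consume-provisioning-request: PROVISIONING_REQUEST_NAME
        autoscaling.x-k8s.io/provisioning-class-name: "queued-provisioning.gke.io"
        # Override the injected default of "false" so the autoscaler may evict these Pods.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```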
Observe the status of a Provisioning Request
The status of a Provisioning Request defines whether a Pod can be scheduled. You can use Kubernetes watches to observe changes efficiently, or other tooling you already use for tracking the statuses of Kubernetes objects. The following table describes the possible statuses of a Provisioning Request and each possible outcome:
| Provisioning Request status | Description | Possible outcome |
|---|---|---|
| Pending | The request hasn't been seen and processed yet. | After processing, the request transitions to the Accepted or Failed state. |
| Accepted=true | The request is accepted and is waiting for resources to be available. | The request should transition to the Provisioned state if resources were found and nodes were provisioned, or to the Failed state if that was not possible. |
| Provisioned=true | The nodes are ready. | You have 10 minutes to start the Pods to consume the provisioned resources. After this time, the cluster autoscaler considers the nodes as not needed and removes them. |
| Failed=true | The nodes can't be provisioned because of errors. Failed=true is a terminal state. | Troubleshoot the condition based on the information in the Reason and Message fields of the condition. Create and retry a new Provisioning Request. |
| Provisioned=false | The nodes haven't been provisioned yet. | The request stays in this state until the nodes are provisioned (Provisioned=true) or provisioning fails (Failed=true). |
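For example, a simple way to block a script until the nodes are ready is to wait on the Provisioned condition. This is a sketch; the request name, namespace, and timeout are placeholders:

```sh
# Wait until the Provisioned condition is true, or give up after 30 minutes.
kubectl wait provreq/PROVISIONING_REQUEST_NAME \
    --namespace NAMESPACE_NAME \
    --for=condition=Provisioned=True \
    --timeout=30m
```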
Start the Pods
When the Provisioning Request reaches the Provisioned=true status, you can run your Job to start the Pods. This avoids proliferation of unschedulable Pods for pending or failed requests, which can impact kube-scheduler and cluster autoscaler performance.
Alternatively, if you don't care about having unschedulable Pods, you can create Pods in parallel with the Provisioning Request.
Cancel the Provisioning Request
To cancel the request before it's provisioned, you can delete the ProvisioningRequest:
```sh
kubectl delete provreq PROVISIONING_REQUEST_NAME -n NAMESPACE
```

In most cases, deleting the ProvisioningRequest stops nodes from being created. However, depending on timing, for example if the nodes were already being provisioned, the nodes might still end up being created. In these cases, the cluster autoscaler removes the nodes after 10 minutes if no Pods are created.
Troubleshoot quota issues
All VMs provisioned by Provisioning Requests use preemptible quotas.
The number of ProvisioningRequests in the Accepted state is limited by a dedicated quota. You configure the quota for each project, with one quota configuration per region.
Check quota in the Google Cloud console
To check the name of the quota limit and the current usage in the Google Cloud console, follow these steps:

1. Go to the Quotas page in the Google Cloud console.

2. In the Filter box, select the Metric property, enter active_resize_requests, and press Enter.
The default value is 100. To increase the quota, follow the steps listed inRequest a quota adjustment.
Check if the Provisioning Request request is limited by quota
If your Provisioning Request is taking longer than expected to be fulfilled, check that the request isn't limited by quota. You might need to request more quota.
For clusters running version 1.29.2-gke.1181000 or later, check whether specificquota limitations are preventing your request from being fulfilled:
```sh
kubectl describe provreq PROVISIONING_REQUEST_NAME \
    --namespace NAMESPACE
```

The output is similar to the following:

```
…
Last Transition Time:  2024-01-03T13:56:08Z
Message:               Quota 'NVIDIA_P4_GPUS' exceeded. Limit: 1.0 in region europe-west4.
Observed Generation:   1
Reason:                QuotaExceeded
Status:                False
Type:                  Provisioned
…
```

In this example, GKE can't deploy nodes because there isn't enough quota in the europe-west4 region.
Migrate node pools from queued provisioning to flex-start
The flex-start consumption option creates Flex-start VMs. To migrate existing node pools that were created by using the --enable-queued-provisioning flag so that they use flex-start, follow these steps:
Make sure that the node pool is empty:
```sh
kubectl get nodes -l cloud.google.com/gke-nodepool=NODEPOOL_NAME
```

If the command doesn't return any nodes, then you can update the node pool to use Flex-start VMs.
If the command returns a list of nodes, you must first migrate the workloads to another node pool, for example by draining the nodes as shown in the following sketch.
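The following is a sketch of draining the node pool so that Pods reschedule onto other node pools. It assumes that your workloads tolerate eviction and that another node pool has capacity:

```sh
# Drain every node in the node pool (drain also cordons) so its Pods reschedule elsewhere.
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=NODEPOOL_NAME -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
```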
Update the node pool to Flex-start VMs:
```sh
gcloud container node-pools update NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --flex-start
```
This operation does the following:
- Updates the node pool to a Flex-start VMs node pool.
- Applies the pricing of nodes that use Flex-start VMs.
All nodes on clusters running version 1.32.2-gke.1652000 or later (the minimum version for nodes that use Flex-start VMs) use short-lived upgrades.
What's next
- Learn more about GPUs in GKE.
- Learn how to Deploy GPU workloads in Autopilot.
- Learn how to run GPUs on Confidential GKE Nodes.
- Learn how to Run a small batch workload with GPUs and flex-start provisioning mode.