Deploy a batch system using Kueue

This tutorial shows you how to optimize available resources by scheduling Jobs on Google Kubernetes Engine (GKE) with Kueue. In this tutorial, you learn to use Kueue to effectively manage and schedule batch Jobs, improve resource utilization, and simplify workload management. You set up a shared cluster for two tenant teams where each team has its own namespace and each team creates Jobs that share global resources. You also configure Kueue to schedule the Jobs based on resource quotas that you define.

This tutorial is for Cloud architects and Platform engineers who are interested in implementing a batch system using GKE. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE user roles and tasks.


Background

Jobs are applications that run to completion, such as machine learning, rendering, simulation, analytics, CI/CD, and similar workloads.

Kueue is a cloud-native Job scheduler that works with the default Kubernetes scheduler, the Job controller, and the cluster autoscaler to provide an end-to-end batch system. Kueue implements Job queueing, deciding when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.

Kueue has the following characteristics:

  • It is optimized for cloud architectures, where resources are heterogeneous, interchangeable, and scalable.
  • It provides a set of APIs to manage elastic quotas and manage Job queueing.
  • It does not re-implement existing capabilities such as autoscaling, Pod scheduling, or Job lifecycle management.
  • It has built-in support for the Kubernetes batch/v1.Job API.
  • It can integrate with other Job APIs.

Kueue refers to Jobs defined with any API as Workloads, to avoid confusion with the specific Kubernetes Job API.
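For example, after you install Kueue and create Jobs later in this tutorial, you can list the Workload objects that Kueue creates for them (the team-a namespace is created in a later step):

kubectl -n team-a get workloads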

Objectives

  1. Create a GKE cluster
  2. Create the ResourceFlavor
  3. Create the ClusterQueue
  4. Create the LocalQueue
  5. Create Jobs and observe the admitted workloads

Costs

This tutorial uses the following billable component of Google Cloud: GKE.

Use the Pricing Calculator to generate a cost estimate based on your projected usage.

When you finish this tutorial, avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

Set up your project

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, click Create project to begin creating a new Google Cloud project.

    Roles required to create a project

    To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the GKE API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API

Set defaults for the Google Cloud CLI

  1. In the Google Cloud console, start a Cloud Shell instance:
    Open Cloud Shell

  2. Download the source code for this sample app:

    git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
    cd kubernetes-engine-samples/batch/kueue-intro
  3. Set the default environment variables:

    gcloud config set project PROJECT_ID
    gcloud config set compute/region CONTROL_PLANE_LOCATION

    Replace the following values:

    • PROJECT_ID: your Google Cloud project ID.
    • CONTROL_PLANE_LOCATION: the Compute Engine region of your cluster's control plane.
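
    For example, with a hypothetical project ID and the region shown in the sample output later in this tutorial:

    gcloud config set project my-batch-project # hypothetical project ID
    gcloud config set compute/region us-central1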

Create a GKE cluster

  1. Create a GKE Autopilot cluster named kueue-autopilot:

    gcloud container clusters create-auto kueue-autopilot \
        --release-channel "rapid" --location CONTROL_PLANE_LOCATION

    Autopilot clusters are fully managed and have built-in autoscaling. Learn more about GKE Autopilot.

    Kueue also supports GKE Standard clusters with node auto-provisioning and regular autoscaled node pools.

    Note: Autopilot cluster creation can take up to five minutes to complete.

    The output is similar to the following once the cluster is created:

      NAME: kueue-autopilot
      LOCATION: us-central1
      MASTER_VERSION: 1.26.2-gke.1000
      MASTER_IP: 35.193.173.228
      MACHINE_TYPE: e2-medium
      NODE_VERSION: 1.26.2-gke.1000
      NUM_NODES: 3
      STATUS: RUNNING

    The STATUS is RUNNING for the kueue-autopilot cluster.

  2. Get authentication credentials for the cluster:

    gcloud container clusters get-credentials kueue-autopilot
  3. Install Kueue on the cluster:

    VERSION=VERSION
    kubectl apply --server-side -f \
        https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml

    Replace VERSION with the latest version of Kueue. For more information about Kueue versions, see Kueue releases.
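
    For example, assuming v0.6.2 is the latest release (check the Kueue releases page for the current tag):

    VERSION=v0.6.2 # assumption: substitute the latest release tag
    kubectl apply --server-side -f \
        https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml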

  4. Wait until the Kueue Pods are ready:

    watch kubectl -n kueue-system get pods

    The output should be similar to the following before you can continue:

    NAME                                        READY   STATUS    RESTARTS   AGE
    kueue-controller-manager-66d8bb946b-wr2l2   2/2     Running   0          3m36s
    Note: This step may take up to three minutes.
  5. Create two new namespaces called team-a and team-b:

    kubectl create namespace team-a
    kubectl create namespace team-b

Create the ResourceFlavor

A ResourceFlavor is an object that represents the variations in the nodes available in your cluster by associating them with node labels and taints. For example, you can use ResourceFlavors to represent VMs with different provisioning guarantees (for example, Spot versus on-demand), architectures (for example, x86 versus ARM CPUs), or brands and models (for example, NVIDIA A100 versus T4 GPUs).

In this tutorial, the kueue-autopilot cluster has homogeneous resources. As a result, create a single ResourceFlavor for CPU, memory, ephemeral storage, and GPUs, with no labels or taints.
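
The sample repository provides this flavor in flavors.yaml. A ResourceFlavor with no labels or taints is a minimal object, similar to the following sketch:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor # referenced by the ClusterQueue below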

Deploy the ResourceFlavor:

kubectl apply -f flavors.yaml

Create the ClusterQueue

A ClusterQueue is a cluster-scoped object that manages a pool of resources such as CPU, memory, and GPU. It manages the ResourceFlavors, limits usage, and dictates the order in which workloads are admitted.

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {} # Available to all namespaces
  queueingStrategy: BestEffortFIFO # Default queueing strategy
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "ephemeral-storage"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 10
      - name: "memory"
        nominalQuota: 10Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 10
      - name: "ephemeral-storage"
        nominalQuota: 10Gi

Deploy the ClusterQueue:

kubectl apply -f cluster-queue.yaml
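
You can optionally confirm that the ClusterQueue was created:

kubectl get clusterqueue cluster-queue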

The order of consumption is determined by .spec.queueingStrategy, which has two configurations:

  • BestEffortFIFO

    • The default queueing strategy configuration.
    • The workload admission follows the first-in-first-out (FIFO) rule, but if there is not enough quota to admit the workload at the head of the queue, the next one in line is tried.
  • StrictFIFO

    • Guarantees FIFO semantics.
    • The workload at the head of the queue can block queueing until it can be admitted.

In cluster-queue.yaml, you create a new ClusterQueue called cluster-queue. This ClusterQueue manages four resources, cpu, memory, nvidia.com/gpu, and ephemeral-storage, with the flavor created in flavors.yaml. The quota is consumed by the requests in the workload Pod specs.

Each flavor includes usage limits represented as .spec.resourceGroups[].flavors[].resources[].nominalQuota. In this case, the ClusterQueue admits workloads if and only if:

  • The sum of the CPU requests is less than or equal to 10
  • The sum of the memory requests is less than or equal to 10Gi
  • The sum of GPU requests is less than or equal to 10
  • The sum of the storage used is less than or equal to 10Gi
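
As a worked example with the sample Job defined later in this tutorial (three Pods, each requesting 500m CPU, 512Mi of memory, 512Mi of ephemeral storage, and one GPU): each Job consumes 1.5 CPUs, 1.5Gi of memory, 1.5Gi of ephemeral storage, and three GPUs. The GPU quota is the binding constraint, so the ClusterQueue admits at most three such Jobs at a time (nine of ten GPUs), even though the CPU and memory quotas alone would allow six.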

Create the LocalQueue

A LocalQueue is a namespaced object that accepts workloads from users in the namespace. LocalQueues from different namespaces can point to the same ClusterQueue, where they share the resource quota. In this case, the LocalQueues in the team-a and team-b namespaces point to the same ClusterQueue, cluster-queue, under .spec.clusterQueue.

apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a # LocalQueue under team-a namespace
  name: lq-team-a
spec:
  clusterQueue: cluster-queue # Point to the ClusterQueue
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-b # LocalQueue under team-b namespace
  name: lq-team-b
spec:
  clusterQueue: cluster-queue # Point to the ClusterQueue

Each team sends its workloads to the LocalQueue in its own namespace. The ClusterQueue then allocates resources to those workloads.

Deploy the LocalQueues:

kubectl apply -f local-queue.yaml
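
Optionally, verify that both LocalQueues are bound to the ClusterQueue:

kubectl -n team-a get localqueue lq-team-a
kubectl -n team-b get localqueue lq-team-b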

Create Jobs and observe the admitted workloads

In this section, you create Kubernetes Jobs in the team-a namespace. A Job controller in Kubernetes creates one or more Pods and ensures that they successfully execute a specific task.

The Job in the team-a namespace has the following attributes:

  • It points to the lq-team-a LocalQueue.
  • It requests GPU resources by setting the nodeSelector field to nvidia-tesla-t4.
  • It is composed of three Pods that sleep for 10 seconds in parallel. Jobs are cleaned up after 60 seconds according to the value defined in the ttlSecondsAfterFinished field.
  • It requires 1,500 milliCPU, 1,536 Mi of memory, 1,536 Mi of ephemeral storage, and three GPUs in total, because there are three Pods.

apiVersion: batch/v1
kind: Job
metadata:
  namespace: team-a # Job under team-a namespace
  generateName: sample-job-team-a-
  annotations:
    kueue.x-k8s.io/queue-name: lq-team-a # Point to the LocalQueue
spec:
  ttlSecondsAfterFinished: 60 # Job will be deleted after 60 seconds
  parallelism: 3 # This Job will have 3 replicas running at the same time
  completions: 3 # This Job requires 3 completions
  suspend: true # Set to true to allow Kueue to control the Job when it starts
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: "nvidia-tesla-t4" # Specify the GPU hardware
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["10s"] # Sleep for 10 seconds
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
            ephemeral-storage: "512Mi"
            nvidia.com/gpu: "1"
          limits:
            cpu: "500m"
            memory: "512Mi"
            ephemeral-storage: "512Mi"
            nvidia.com/gpu: "1"
      restartPolicy: Never
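
Note the suspend: true field: the Job controller does not create Pods for a suspended Job, and Kueue sets this field to false when it admits the workload. This is how Kueue controls when the Job starts.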

Jobs are also created from the file job-team-b.yaml, whose namespace belongs to team-b, with requests that represent different teams with different needs.

To learn more, see deploying GPU workloads in Autopilot.

  1. In a new terminal, observe the status of the ClusterQueue, which refreshes every two seconds:

    watch -n 2 kubectl get clusterqueue cluster-queue -o wide
  2. In a new terminal, observe the status of the nodes:

    watch -n 2 kubectl get nodes -o wide
  3. In a new terminal, create Jobs in the LocalQueues for the team-a and team-b namespaces every 10 seconds:

    ./create_jobs.sh job-team-a.yaml job-team-b.yaml 10
  4. Observe the Jobs being queued up and admitted in the ClusterQueue, and the nodes being brought up by GKE Autopilot.

    Note: It is normal to see a warning for the Pods with the message Unschedulable while nodes are scaling up.
  5. Obtain the Jobs from the team-a namespace:

    kubectl -n team-a get jobs

    The output is similar to the following:

    NAME                      COMPLETIONS   DURATION   AGE
    sample-job-team-b-t6jnr   3/3           21s        3m27s
    sample-job-team-a-tm7kc   0/3                      2m27s
    sample-job-team-a-vjtnw   3/3           30s        3m50s
    sample-job-team-b-vn6rp   0/3                      40s
    sample-job-team-a-z86h2   0/3                      2m15s
    sample-job-team-b-zfwj8   0/3                      28s
    sample-job-team-a-zjkbj   0/3                      4s
    sample-job-team-a-zzvjg   3/3           83s        4m50s
  6. Copy a Job name from the previous step and observe the admission status and events for the Job through the Workloads API:

    kubectl -n team-a describe workload JOB_NAME
  7. When the number of pending Jobs in the ClusterQueue starts increasing, end the script by pressing CTRL + C in the terminal that is running it.

  8. After all Jobs are completed, notice the nodes being scaled down.

    Note: The scale down process can take up to two minutes.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.
  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the individual resources

  1. Delete the Kueue quota system:

    kubectl delete -n team-a localqueue lq-team-a
    kubectl delete -n team-b localqueue lq-team-b
    kubectl delete clusterqueue cluster-queue
    kubectl delete resourceflavor default-flavor
  2. Delete the Kueue manifest:

    VERSION=VERSION
    kubectl delete -f \
        https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
  3. Delete the cluster:

    gcloud container clusters delete kueue-autopilot --location=CONTROL_PLANE_LOCATION
