Run a Batch job using Workflows

Batch is a fully managed service that lets you schedule, queue, and execute batch processing workloads on Compute Engine virtual machine (VM) instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale.

Workflows allows you to execute the services you need in an order that you define, described using the Workflows syntax.

In this tutorial, you use the Workflows connector for Batch to schedule and run a Batch job that executes six tasks in parallel on two Compute Engine VMs. Using both Batch and Workflows allows you to combine the advantages they offer and efficiently provision and orchestrate the entire process.

Objectives

In this tutorial you will:

  1. Create an Artifact Registry repository for a Docker container image.
  2. Get the code for the batch processing workload from GitHub: a sample program that generates prime numbers in batches of 10,000.
  3. Build the Docker image for the workload.
  4. Deploy and execute a workflow that does the following:
    1. Creates a Cloud Storage bucket to store the results of the prime number generator.
    2. Schedules and runs a Batch job that runs the Docker container as six tasks in parallel on two Compute Engine VMs.
    3. Optionally deletes the Batch job after it has completed.
  5. Confirm that the results are as expected and that the batches of generated prime numbers are stored in Cloud Storage.

You can run most of the following commands in the Google Cloud console, or run all the commands using the Google Cloud CLI in either your terminal or Cloud Shell.
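If you run these commands from your own terminal rather than Cloud Shell, a setup similar to the following sketch is typically needed first. It assumes that the gcloud CLI is already installed and uses PROJECT_ID as a placeholder for your project ID:

    # Authenticate and point the gcloud CLI at the project used for this tutorial.
    gcloud auth login
    gcloud config set project PROJECT_ID

    # Confirm the active account and project before continuing.
    gcloud config list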

Costs

In this document, you use the following billable components of Google Cloud:

  • Artifact Registry
  • Batch
  • Cloud Build
  • Cloud Storage
  • Compute Engine
  • Workflows

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

Before you begin

Security constraints defined by your organization might prevent you from completing the following steps. For troubleshooting information, see Develop applications in a constrained Google Cloud environment.

Console

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  2. Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.

  3. Enable the Artifact Registry, Batch, Cloud Build, Compute Engine, Workflow Executions, and Workflows APIs.

    Enable the APIs

  4. Create a service account for your workflow to use for authentication with other Google Cloud services and grant it the appropriate roles:

    1. In the Google Cloud console, go to the Create service account page.

      Go to Create service account

    2. Select your project.

    3. In the Service account name field, enter a name. The Google Cloud console fills in the Service account ID field based on this name.

      In the Service account description field, enter a description. For example, Service account for tutorial.

    4. Click Create and continue.

    5. In the Select a role list, filter for the following roles to grant to the user-managed service account you created in the previous step:

      • Batch job editor: to edit Batch jobs.
      • Logs Writer: to write logs.
      • Storage Admin: to control Cloud Storage resources.

      For additional roles, click Add another role and add each additional role.

      Note: The Role field affects which resources your service account can access in your project. You can revoke these roles or grant additional roles later. In production environments, do not grant the Owner, Editor, or Viewer roles. Instead, grant a predefined role or custom role that meets your needs.
    6. Click Continue.

    7. To finish creating the account, click Done.

  5. Grant the IAM Service Account User role on the default service account to the user-managed service account created in the previous step. After you enabled the Compute Engine API, the default service account is the Compute Engine default service account (PROJECT_NUMBER-compute@developer.gserviceaccount.com), and the permission is typically assigned through the roles/iam.serviceAccountUser role.

    Caution: Assigning the Service Account User role indirectly grants the role associated with the default service account to the user. For example, if the default service account has the Editor role, the user can then "act as" an Editor. To minimize the impact of these role assignments, we recommend configuring the default service account according to the principle of least privilege. For more information, see Disable automatic role grants to default service accounts.
    1. On the Service Accounts page, click the email address of the default service account (PROJECT_NUMBER-compute@developer.gserviceaccount.com).

    2. Click the Permissions tab.

    3. Click the Grant access button.

    4. To add a new principal, enter the email address of your service account (SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com).

    5. In the Select a role list, select the Service Accounts > Service Account User role.

    6. Click Save.

gcloud

  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

  2. Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.

  3. Enable the Artifact Registry, Batch, Cloud Build, Compute Engine, Workflow Executions, and Workflows APIs.

    gcloud services enable artifactregistry.googleapis.com \
      batch.googleapis.com \
      cloudbuild.googleapis.com \
      compute.googleapis.com \
      workflowexecutions.googleapis.com \
      workflows.googleapis.com
  4. Create a service account for your workflow to use for authentication with other Google Cloud services and grant it the appropriate roles.

    1. Create the service account:

      gcloud iam service-accounts create SERVICE_ACCOUNT_NAME

      Replace SERVICE_ACCOUNT_NAME with a name for the service account.

    2. Grant roles to the user-managed service account you created in the previous step. Run the following command once for each of the following IAM roles (a scripted variant of this step is sketched at the end of this procedure):

      • roles/batch.jobsEditor: to edit Batch jobs.
      • roles/logging.logWriter: to write logs.
      • roles/storage.admin: to control Cloud Storage resources.
      gcloud projects add-iam-policy-binding PROJECT_ID \
          --member=serviceAccount:SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com \
          --role=ROLE

      Replace the following:

      • PROJECT_ID: the project ID where you created the service account
      • ROLE: the role to grant
      Note: The --role flag affects which resources your service account can access in your project. You can revoke these roles or grant additional roles later. In production environments, do not grant the Owner, Editor, or Viewer roles. Instead, grant a predefined role or custom role that meets your needs.
  5. Grant the IAM Service Account User role on the default service account to the user-managed service account you created in the previous step. After you enabled the Compute Engine API, the default service account is the Compute Engine default service account (PROJECT_NUMBER-compute@developer.gserviceaccount.com), and the permission is typically assigned through the roles/iam.serviceAccountUser role.

    Caution: Assigning the Service Account User role indirectly grants the role associated with the default service account to the user. For example, if the default service account has the Editor role, the user can then "act as" an Editor. To minimize the impact of these role assignments, we recommend configuring the default service account according to the principle of least privilege. For more information, see Disable automatic role grants to default service accounts.

    PROJECT_NUMBER=$(gcloud projects describe PROJECT_ID --format='value(projectNumber)')
    gcloud iam service-accounts add-iam-policy-binding \
        $PROJECT_NUMBER-compute@developer.gserviceaccount.com \
        --member=serviceAccount:SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com \
        --role=roles/iam.serviceAccountUser
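As an optional convenience, the role grants from the previous steps can be scripted and then verified. This is only a sketch using the same SERVICE_ACCOUNT_NAME and PROJECT_ID placeholders; it is not a required tutorial step:

    # Grant the three tutorial roles to the user-managed service account in a loop.
    for ROLE in roles/batch.jobsEditor roles/logging.logWriter roles/storage.admin; do
      gcloud projects add-iam-policy-binding PROJECT_ID \
          --member=serviceAccount:SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com \
          --role=$ROLE
    done

    # Verify which roles the service account now holds on the project.
    gcloud projects get-iam-policy PROJECT_ID \
        --flatten="bindings[].members" \
        --filter="bindings.members:serviceAccount:SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com" \
        --format="table(bindings.role)"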

Create an Artifact Registry repository

Create a repository to store your Docker container image.

Console

  1. In the Google Cloud console, go to the Repositories page.

    Go to Repositories

  2. Click Create Repository.

  3. Enter containers as the repository name.

  4. For Format, choose Docker.

  5. For Location Type, choose Region.

  6. In the Region list, select us-central1.

  7. Click Create.

gcloud

Run the following command:

  gcloud artifacts repositories create containers \
      --repository-format=docker \
      --location=us-central1

You have created an Artifact Registry repository named containers in the us-central1 region. For more information about supported regions, see Artifact Registry locations.
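To confirm that the repository exists before you push an image to it, you can optionally describe it:

    gcloud artifacts repositories describe containers \
        --location=us-central1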

Get the code samples

Google Cloud stores the application source code for this tutorial in GitHub. You can clone that repository or download the samples.

  1. Clone the sample app repository to your local machine:

    git clone https://github.com/GoogleCloudPlatform/batch-samples.git

    Alternatively, you can download the samples in the main.zip file and extract it.

  2. Change to the directory that contains the sample code:

    cd batch-samples/primegen

You now have the source code for the application in your development environment.
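As an optional check (assuming the repository layout referenced by the build step in the next section), confirm that you are in the batch-samples/primegen directory and that the PrimeGenService source is present:

    # The next section builds the container image from the PrimeGenService/ directory.
    pwd
    ls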

Build the Docker image using Cloud Build

The Dockerfile contains the information needed to build a Docker image using Cloud Build. Run the following command to build it:

gcloud builds submit \
  -t us-central1-docker.pkg.dev/PROJECT_ID/containers/primegen-service:v1 PrimeGenService/

Replace PROJECT_ID with your Google Cloud project ID.

When the build is complete, you should see output similar to the following:

DONE
--------------------------------------------------------------------------------
ID: a54818cc-5d14-467b-bfda-5fc9590af68c
CREATE_TIME: 2022-07-29T01:48:50+00:00
DURATION: 48S
SOURCE: gs://project-name_cloudbuild/source/1659059329.705219-17aee3a424a94679937a7200fab15bcf.tgz
IMAGES: us-central1-docker.pkg.dev/project-name/containers/primegen-service:v1
STATUS: SUCCESS

Using a Dockerfile, you've built a Docker image named primegen-service and pushed the image to an Artifact Registry repository named containers.
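To verify the push, you can optionally list the images stored in the repository. The command uses the same PROJECT_ID placeholder:

    gcloud artifacts docker images list \
        us-central1-docker.pkg.dev/PROJECT_ID/containers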

Deploy a workflow that schedules and runs a Batch job

The following workflow schedules and runs a Batch job that runs a Docker container as six tasks in parallel on two Compute Engine VMs. The result is the generation of six batches of prime numbers, stored in a Cloud Storage bucket.

Console

  1. In the Google Cloud console, go to the Workflows page.

    Go to Workflows

  2. Click Create.

  3. Enter a name for the new workflow, such as batch-workflow.

  4. In the Region list, select us-central1.

  5. Select the Service account you previously created.

  6. Click Next.

  7. In the workflow editor, enter the following definition for your workflow:

    YAML

    main:
      params: [args]
      steps:
        - init:
            assign:
              - projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
              - region: "us-central1"
              - imageUri: ${region + "-docker.pkg.dev/" + projectId + "/containers/primegen-service:v1"}
              - jobId: ${"job-primegen-" + string(int(sys.now()))}
              - bucket: ${projectId + "-" + jobId}
        - createBucket:
            call: googleapis.storage.v1.buckets.insert
            args:
              query:
                project: ${projectId}
              body:
                name: ${bucket}
        - logCreateBucket:
            call: sys.log
            args:
              data: ${"Created bucket " + bucket}
        - logCreateBatchJob:
            call: sys.log
            args:
              data: ${"Creating and running the batch job " + jobId}
        - createAndRunBatchJob:
            call: googleapis.batch.v1.projects.locations.jobs.create
            args:
              parent: ${"projects/" + projectId + "/locations/" + region}
              jobId: ${jobId}
              body:
                taskGroups:
                  taskSpec:
                    runnables:
                      - container:
                          imageUri: ${imageUri}
                        environment:
                          variables:
                            BUCKET: ${bucket}
                  # Run 6 tasks on 2 VMs
                  taskCount: 6
                  parallelism: 2
                logsPolicy:
                  destination: CLOUD_LOGGING
            result: createAndRunBatchJobResponse
        # You can delete the batch job or keep it for debugging
        - logDeleteBatchJob:
            call: sys.log
            args:
              data: ${"Deleting the batch job " + jobId}
        - deleteBatchJob:
            call: googleapis.batch.v1.projects.locations.jobs.delete
            args:
              name: ${"projects/" + projectId + "/locations/" + region + "/jobs/" + jobId}
            result: deleteResult
        - returnResult:
            return:
              jobId: ${jobId}
              bucket: ${bucket}

    JSON

    {"main":{"params":["args"],"steps":[{"init":{"assign":[{"projectId":"${sys.get_env(\"GOOGLE_CLOUD_PROJECT_ID\")}"},{"region":"us-central1"},{"imageUri":"${region + \"-docker.pkg.dev/\" + projectId + \"/containers/primegen-service:v1\"}"},{"jobId":"${\"job-primegen-\" + string(int(sys.now()))}"},{"bucket":"${projectId + \"-\" + jobId}"}]}},{"createBucket":{"call":"googleapis.storage.v1.buckets.insert","args":{"query":{"project":"${projectId}"},"body":{"name":"${bucket}"}}}},{"logCreateBucket":{"call":"sys.log","args":{"data":"${\"Created bucket \" + bucket}"}}},{"logCreateBatchJob":{"call":"sys.log","args":{"data":"${\"Creating and running the batch job \" + jobId}"}}},{"createAndRunBatchJob":{"call":"googleapis.batch.v1.projects.locations.jobs.create","args":{"parent":"${\"projects/\" + projectId + \"/locations/\" + region}","jobId":"${jobId}","body":{"taskGroups":{"taskSpec":{"runnables":[{"container":{"imageUri":"${imageUri}"},"environment":{"variables":{"BUCKET":"${bucket}"}}}]},"taskCount":6,"parallelism":2},"logsPolicy":{"destination":"CLOUD_LOGGING"}}},"result":"createAndRunBatchJobResponse"}},{"logDeleteBatchJob":{"call":"sys.log","args":{"data":"${\"Deleting the batch job \" + jobId}"}}},{"deleteBatchJob":{"call":"googleapis.batch.v1.projects.locations.jobs.delete","args":{"name":"${\"projects/\" + projectId + \"/locations/\" + region + \"/jobs/\" + jobId}"},"result":"deleteResult"}},{"returnResult":{"return":{"jobId":"${jobId}","bucket":"${bucket}"}}}]}}
  8. Click Deploy.

gcloud

  1. Create a source code file for your workflow:

    touch batch-workflow.JSON_OR_YAML

    Replace JSON_OR_YAML with yaml or json depending on the format of your workflow.

  2. In a text editor, copy the following workflow to your source code file:

    YAML

    main:
      params: [args]
      steps:
        - init:
            assign:
              - projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
              - region: "us-central1"
              - imageUri: ${region + "-docker.pkg.dev/" + projectId + "/containers/primegen-service:v1"}
              - jobId: ${"job-primegen-" + string(int(sys.now()))}
              - bucket: ${projectId + "-" + jobId}
        - createBucket:
            call: googleapis.storage.v1.buckets.insert
            args:
              query:
                project: ${projectId}
              body:
                name: ${bucket}
        - logCreateBucket:
            call: sys.log
            args:
              data: ${"Created bucket " + bucket}
        - logCreateBatchJob:
            call: sys.log
            args:
              data: ${"Creating and running the batch job " + jobId}
        - createAndRunBatchJob:
            call: googleapis.batch.v1.projects.locations.jobs.create
            args:
              parent: ${"projects/" + projectId + "/locations/" + region}
              jobId: ${jobId}
              body:
                taskGroups:
                  taskSpec:
                    runnables:
                      - container:
                          imageUri: ${imageUri}
                        environment:
                          variables:
                            BUCKET: ${bucket}
                  # Run 6 tasks on 2 VMs
                  taskCount: 6
                  parallelism: 2
                logsPolicy:
                  destination: CLOUD_LOGGING
            result: createAndRunBatchJobResponse
        # You can delete the batch job or keep it for debugging
        - logDeleteBatchJob:
            call: sys.log
            args:
              data: ${"Deleting the batch job " + jobId}
        - deleteBatchJob:
            call: googleapis.batch.v1.projects.locations.jobs.delete
            args:
              name: ${"projects/" + projectId + "/locations/" + region + "/jobs/" + jobId}
            result: deleteResult
        - returnResult:
            return:
              jobId: ${jobId}
              bucket: ${bucket}

    JSON

    {"main":{"params":["args"],"steps":[{"init":{"assign":[{"projectId":"${sys.get_env(\"GOOGLE_CLOUD_PROJECT_ID\")}"},{"region":"us-central1"},{"imageUri":"${region + \"-docker.pkg.dev/\" + projectId + \"/containers/primegen-service:v1\"}"},{"jobId":"${\"job-primegen-\" + string(int(sys.now()))}"},{"bucket":"${projectId + \"-\" + jobId}"}]}},{"createBucket":{"call":"googleapis.storage.v1.buckets.insert","args":{"query":{"project":"${projectId}"},"body":{"name":"${bucket}"}}}},{"logCreateBucket":{"call":"sys.log","args":{"data":"${\"Created bucket \" + bucket}"}}},{"logCreateBatchJob":{"call":"sys.log","args":{"data":"${\"Creating and running the batch job \" + jobId}"}}},{"createAndRunBatchJob":{"call":"googleapis.batch.v1.projects.locations.jobs.create","args":{"parent":"${\"projects/\" + projectId + \"/locations/\" + region}","jobId":"${jobId}","body":{"taskGroups":{"taskSpec":{"runnables":[{"container":{"imageUri":"${imageUri}"},"environment":{"variables":{"BUCKET":"${bucket}"}}}]},"taskCount":6,"parallelism":2},"logsPolicy":{"destination":"CLOUD_LOGGING"}}},"result":"createAndRunBatchJobResponse"}},{"logDeleteBatchJob":{"call":"sys.log","args":{"data":"${\"Deleting the batch job \" + jobId}"}}},{"deleteBatchJob":{"call":"googleapis.batch.v1.projects.locations.jobs.delete","args":{"name":"${\"projects/\" + projectId + \"/locations/\" + region + \"/jobs/\" + jobId}"},"result":"deleteResult"}},{"returnResult":{"return":{"jobId":"${jobId}","bucket":"${bucket}"}}}]}}
  3. Deploy the workflow by entering the following command:

    gcloud workflows deploy batch-workflow \
        --source=batch-workflow.yaml \
        --location=us-central1 \
        --service-account=SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com

    Replace SERVICE_ACCOUNT_NAME with the name of the service account you previously created.
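To confirm that the workflow deployed successfully, you can optionally describe it:

    gcloud workflows describe batch-workflow \
        --location=us-central1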

Execute the workflow

Executing a workflow runs the current workflow definition associated with theworkflow.

Console

  1. In the Google Cloud console, go to the Workflows page.

    Go to Workflows

  2. On the Workflows page, click the batch-workflow workflow to go to its details page.

  3. On the Workflow details page, click Execute.

  4. Click Execute again.

    The workflow execution should take a few minutes.

    Note: If you encounter an IAM permission denied for service account error in the createBucket step, this is likely due to a propagation delay. Wait a minute or two, and then try executing the workflow again.
  5. View the results of the workflow in the Output pane.

    The results should look similar to the following:

    {  "bucket": "project-name-job-primegen-TIMESTAMP",  "jobId": "job-primegen-TIMESTAMP"}

gcloud

  1. Execute the workflow:

    gcloud workflows run batch-workflow \
        --location=us-central1

    The workflow execution should take a few minutes.

    Note: If you encounter an IAM permission denied for service account error in the createBucket step, this is likely due to a propagation delay. Wait a minute or two, and then try executing the workflow again.
  2. You can check the status of a long-running execution; an example of listing recent executions follows this procedure.

  3. To get the status of the last completed execution, run the following command:

    gcloud workflows executions describe-last

    The results should be similar to the following:

    name: projects/PROJECT_NUMBER/locations/us-central1/workflows/batch-workflow/executions/EXECUTION_ID
    result: '{"bucket":"project-name-job-primegen-TIMESTAMP","jobId":"job-primegen-TIMESTAMP"}'
    startTime: '2022-07-29T16:08:39.725306421Z'
    state: SUCCEEDED
    status:
      currentSteps:
      - routine: main
        step: returnResult
    workflowRevisionId: 000001-9ba
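Beyond describing the last execution, you can optionally list recent executions of the workflow and their states, for example:

    gcloud workflows executions list batch-workflow \
        --location=us-central1 \
        --limit=5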

List the objects in the output bucket

You can confirm that the results are as expected by listing the objects in yourCloud Storage output bucket.

Console

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the bucket list, click on the name of the bucket whose contents you want to view.

    The results should be similar to the following, with six files in total, each listing a batch of 10,000 prime numbers:

    primes-1-10000.txt
    primes-10001-20000.txt
    primes-20001-30000.txt
    primes-30001-40000.txt
    primes-40001-50000.txt
    primes-50001-60000.txt

gcloud

  1. Retrieve your output bucket name:

    gcloud storage ls

    The output is similar to the following:

    gs://PROJECT_ID-job-primegen-TIMESTAMP/

  2. List the objects in your output bucket:

    gcloud storage ls gs://PROJECT_ID-job-primegen-TIMESTAMP/** --recursive

    Replace TIMESTAMP with the timestamp returned by the previous command.

    The output should be similar to the following, with six files in total, each listing a batch of 10,000 prime numbers:

    gs://project-name-job-primegen-TIMESTAMP/primes-1-10000.txt
    gs://project-name-job-primegen-TIMESTAMP/primes-10001-20000.txt
    gs://project-name-job-primegen-TIMESTAMP/primes-20001-30000.txt
    gs://project-name-job-primegen-TIMESTAMP/primes-30001-40000.txt
    gs://project-name-job-primegen-TIMESTAMP/primes-40001-50000.txt
    gs://project-name-job-primegen-TIMESTAMP/primes-50001-60000.txt
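To spot-check the output, you can optionally print the beginning of one of the generated files; this example uses the first file name from the listing above:

    gcloud storage cat gs://PROJECT_ID-job-primegen-TIMESTAMP/primes-1-10000.txt | head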

Clean up

If you created a new project for this tutorial, delete the project. If you used an existing project and wish to keep it without the changes added in this tutorial, delete resources created for the tutorial.

Delete the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.
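If you prefer the gcloud CLI, you can instead delete the project with the following command, where PROJECT_ID is the ID of the project to delete. This permanently removes the project and all of its resources:

    gcloud projects delete PROJECT_ID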

Delete resources created in this tutorial

  1. Delete the Batch job:

    1. First retrieve the job name:

      gcloud batch jobs list --location=us-central1

      The output should be similar to the following:

      NAME: projects/project-name/locations/us-central1/jobs/job-primegen-TIMESTAMP
      STATE: SUCCEEDED

      Where job-primegen-TIMESTAMP is the name of the Batch job.

    2. Delete the job:

      gcloud batch jobs delete BATCH_JOB_NAME --location us-central1
  2. Delete the workflow:

    gcloud workflows delete WORKFLOW_NAME
  3. Delete the container repository:

    gcloud artifacts repositories delete REPOSITORY_NAME --location=us-central1
  4. Cloud Build uses Cloud Storage to store build resources. To delete a Cloud Storage bucket, refer to Delete buckets.
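For example, the workflow's output bucket, and optionally the default Cloud Build staging bucket shown in the earlier build output, can be removed with gcloud. This is only a sketch; the exact bucket names can differ in your project, and both commands permanently delete the bucket and its contents:

    # Delete the workflow's output bucket (use the TIMESTAMP from the workflow output).
    gcloud storage rm --recursive gs://PROJECT_ID-job-primegen-TIMESTAMP/

    # Optionally delete the default Cloud Build staging bucket.
    gcloud storage rm --recursive gs://PROJECT_ID_cloudbuild/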

What's next
