Use workflows
You set up and run a workflow by:
- Creating a workflow template
- Configuring a managed (ephemeral) cluster or selecting an existing cluster
- Adding jobs
- Instantiating the template to run the workflow
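The steps above map onto the fields of a workflow template resource, which the following sections configure one at a time. As a rough sketch, a minimal template in the gcloud YAML import/export format might look like this (all names and URIs are illustrative placeholders):

```yaml
# Illustrative workflow template skeleton (YAML import/export format).
# Cluster name, step ID, and jar URI below are placeholders.
id: workflow-template-1
placement:
  managedCluster:            # ephemeral cluster created for this workflow
    clusterName: my-managed-cluster
jobs:
- stepId: foo
  hadoopJob:
    mainJarFileUri: gs://my-bucket/my-job.jar
```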
Create a template
gcloud CLI
Run the following command to create a Dataproc workflow template resource:
gcloud dataproc workflow-templates create TEMPLATE_ID \
    --region=REGION
Notes:
- REGION: Specify the region where your template will run.
- TEMPLATE_ID: Provide an ID for your template, such as "workflow-template-1".
- CMEK encryption: You can add the --kms-key flag to use CMEK encryption on workflow template job arguments.
REST API
Submit a WorkflowTemplate as part of a workflowTemplates.create request. You can add the WorkflowTemplate.EncryptionConfig.kmsKey field to use CMEK encryption on workflow template job arguments.
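As a sketch, the JSON request body for workflowTemplates.create might carry the key like this (the template ID and key path are placeholders; remaining template fields are omitted):

```json
{
  "id": "workflow-template-1",
  "encryptionConfig": {
    "kmsKey": "projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING/cryptoKeys/KEY_NAME"
  }
}
```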
Console
You can view existing workflow templates and instantiated workflows from the Dataproc Workflows page in the Google Cloud console.
Configure or select a cluster
Dataproc can create and use a new, "managed" cluster for your workflow or an existing cluster.
- Existing cluster: See Using cluster selectors with workflows to select an existing cluster for your workflow.
- Managed cluster: You must configure a managed cluster for your workflow. Dataproc will create this new cluster to run workflow jobs, then delete the cluster at the end of the workflow.
You can configure a managed cluster for your workflow using the gcloud command-line tool or the Dataproc API.
Google Cloud CLI
Use flags inherited from gcloud dataproc clusters create to configure the managed cluster, such as the number of workers and the master and worker machine types. Dataproc will add a suffix to the cluster name to ensure uniqueness. You can use the --service-account flag to specify a VM service account for the managed cluster.
gcloud dataproc workflow-templates set-managed-cluster TEMPLATE_ID \
    --region=REGION \
    --master-machine-type=MACHINE_TYPE \
    --worker-machine-type=MACHINE_TYPE \
    --num-workers=NUMBER \
    --cluster-name=CLUSTER_NAME \
    --service-account=SERVICE_ACCOUNT
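In the template's YAML representation, the equivalent managed-cluster settings (including the VM service account) live under placement.managedCluster. A sketch with placeholder values:

```yaml
# Illustrative placement section of a workflow template; all values are placeholders.
placement:
  managedCluster:
    clusterName: my-managed-cluster    # Dataproc appends a suffix for uniqueness
    config:
      gceClusterConfig:
        serviceAccount: my-sa@my-project.iam.gserviceaccount.com
      masterConfig:
        machineTypeUri: n1-standard-4
      workerConfig:
        machineTypeUri: n1-standard-4
        numInstances: 2
```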
REST API
See WorkflowTemplatePlacement.ManagedCluster, which you can provide as part of a completed WorkflowTemplate submitted with a workflowTemplates.create or workflowTemplates.update request.
You can use the GceClusterConfig.serviceAccount field to specify a VM service account for the managed cluster.
Console
You can view existing workflow templates and instantiated workflows from the Dataproc Workflows page in the Google Cloud console.
Add jobs to a template
All jobs run concurrently unless you specify one or more job dependencies. A job's dependencies are expressed as a list of other jobs that must finish successfully before the dependent job can start. You must provide a step-id for each job. The ID must be unique within the workflow, but does not need to be unique globally.
gcloud CLI
Use the job type and flags inherited from gcloud dataproc jobs submit to define the job to add to the template. You can optionally use the --start-after flag with the step ID of one or more other workflow jobs to have the job start after those jobs complete.
The --max-failures-per-hour and --max-failures-total restartable job flags are not supported in Dataproc workflow template jobs.
Examples:
Add Hadoop job "foo" to the "my-workflow" template.
gcloud dataproc workflow-templates add-job hadoop \
    --region=REGION \
    --step-id=foo \
    --workflow-template=my-workflow \
    -- space separated job args
Add job "bar" to the "my-workflow" template, which will be run after workflow job "foo" has completed successfully.
gcloud dataproc workflow-templates add-job JOB_TYPE \
    --region=REGION \
    --step-id=bar \
    --start-after=foo \
    --workflow-template=my-workflow \
    -- space separated job args
Add another job "baz" to "my-workflow" template to be run after the successful completion of both "foo" and "bar" jobs.
gcloud dataproc workflow-templates add-job JOB_TYPE \
    --region=REGION \
    --step-id=baz \
    --start-after=foo,bar \
    --workflow-template=my-workflow \
    -- space separated job args
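In the template's YAML representation, the same dependency graph is expressed with the prerequisiteStepIds field on each OrderedJob. A sketch of the three jobs above (job types and jar URIs are placeholders):

```yaml
# Illustrative jobs section; foo runs first, bar after foo, baz after foo and bar.
jobs:
- stepId: foo
  hadoopJob:
    mainJarFileUri: gs://my-bucket/foo.jar
- stepId: bar
  hadoopJob:
    mainJarFileUri: gs://my-bucket/bar.jar
  prerequisiteStepIds:       # bar starts after foo succeeds
  - foo
- stepId: baz
  hadoopJob:
    mainJarFileUri: gs://my-bucket/baz.jar
  prerequisiteStepIds:       # baz starts after both foo and bar succeed
  - foo
  - bar
```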
REST API
See WorkflowTemplate.OrderedJob. This field is provided as part of a completed WorkflowTemplate submitted with a workflowTemplates.create or workflowTemplates.update request.
The maxFailuresPerHour and maxFailuresTotal OrderedJob.JobScheduling fields are not supported in Dataproc workflow template jobs.
Console
You can view existing workflow templates and instantiated workflows from the Dataproc Workflows page in the Google Cloud console.
The Max restarts per hour restartable job option is not supported in Dataproc workflow template jobs.
Run a workflow
The instantiation of a workflow template runs the workflow defined by the template. Multiple instantiations of a template are supported: you can run a workflow multiple times.
gcloud command
gcloud dataproc workflow-templates instantiate TEMPLATE_ID \
    --region=REGION
The command returns an operation ID, which you can use to track workflow status.
Example command and output:
gcloud beta dataproc workflow-templates instantiate my-template-id \
    --region=us-central1
...
WorkflowTemplate [my-template-id] RUNNING
...
Created cluster: my-template-id-rg544az7mpbfa.
Job ID teragen-rg544az7mpbfa RUNNING
Job ID teragen-rg544az7mpbfa COMPLETED
Job ID terasort-rg544az7mpbfa RUNNING
Job ID terasort-rg544az7mpbfa COMPLETED
Job ID teravalidate-rg544az7mpbfa RUNNING
Job ID teravalidate-rg544az7mpbfa COMPLETED
...
Deleted cluster: my-template-id-rg544az7mpbfa.
WorkflowTemplate [my-template-id] DONE
REST API
See workflowTemplates.instantiate.
Console
You can view existing workflow templates and instantiated workflows from the Dataproc Workflows page in the Google Cloud console.
Workflow job failures
A failure in any job in a workflow will cause the workflow to fail. Dataproc will seek to mitigate the effect of failures by causing all concurrently executing jobs to fail and preventing subsequent jobs from starting.
Monitor and list a workflow
gcloud CLI
To monitor a workflow:
gcloud dataproc operations describe OPERATION_ID \
    --region=REGION
Note: The OPERATION_ID is returned when you instantiate the workflow with gcloud dataproc workflow-templates instantiate (see Run a workflow).
To list workflow status:
gcloud dataproc operations list \
    --region=REGION \
    --filter="labels.goog-dataproc-operation-type=WORKFLOW AND status.state=RUNNING"
REST API
To monitor a workflow, use the Dataproc operations.get API.
To list running workflows, use the Dataproc operations.list API with a label filter.
Console
You can view existing workflow templates and instantiated workflows from the Dataproc Workflows page in the Google Cloud console.
Terminate a workflow
You can end a workflow using the Google Cloud CLI or by calling the Dataproc API.
Note: Ending a workflow cancels running workflow jobs and, if the workflow runs on a managed (ephemeral) cluster, deletes the managed cluster.
gcloud command
gcloud dataproc operations cancel OPERATION_ID \
    --region=REGION
The OPERATION_ID is returned when you instantiate the workflow with gcloud dataproc workflow-templates instantiate (see Run a workflow).
REST API
See the operations.cancel API.
Console
You can view existing workflow templates and instantiated workflows from the Dataproc Workflows page in the Google Cloud console.
Update a workflow template
Updates don't affect running workflows. The new template version will only apply to new workflows.
gcloud CLI
Workflow templates can be updated by issuing new gcloud dataproc workflow-templates commands that reference an existing workflow template ID.
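One common update cycle is to export the template to a local YAML file, edit it, and import it back. A sketch using the gcloud export and import commands (the template ID, region, and file name are placeholders):

```shell
# Export the current template to a local YAML file.
gcloud dataproc workflow-templates export my-workflow \
    --region=us-central1 \
    --destination=template.yaml

# Edit template.yaml as needed, then import it to update the template in place.
gcloud dataproc workflow-templates import my-workflow \
    --region=us-central1 \
    --source=template.yaml
```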
REST API
To make an update to a template with the REST API:
- Call workflowTemplates.get, which returns the current template with the version field filled in with the current server version.
- Make updates to the fetched template.
- Call workflowTemplates.update with the updated template. The update will fail if the template was changed on the server after the get, as detected from the workflowTemplate.version field.
Console
You can view existing workflow templates and instantiated workflows from the Dataproc Workflows page in the Google Cloud console.
Delete a workflow template
gcloud CLI
gcloud dataproc workflow-templates delete TEMPLATE_ID \
    --region=REGION
REST API
See workflowTemplates.delete.
Console
You can view existing workflow templates and instantiated workflows from the Dataproc Workflows page in the Google Cloud console.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.