Use YAML files with workflows

You can define a workflow template in a YAML file, then instantiate the template to run the workflow. You can also import and export a workflow template YAML file to create and update a Dataproc workflow template resource.

Also see Using inline Dataproc workflows for other ways to run a workflow without creating a workflow template resource.

Run a workflow using a YAML file

To run a workflow without first creating a workflow template resource, use the gcloud dataproc workflow-templates instantiate-from-file command.

  1. Define your workflow template in a YAML file. The YAML file must include all required WorkflowTemplate fields except the id field, and it must also exclude the version field and all output-only fields. In the following workflow example, the prerequisiteStepIds list in the terasort step ensures the terasort step will only begin after the teragen step completes successfully.
    jobs:
    - hadoopJob:
        args:
        - teragen
        - '1000'
        - hdfs:///gen/
        mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
      stepId: teragen
    - hadoopJob:
        args:
        - terasort
        - hdfs:///gen/
        - hdfs:///sort/
        mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
      stepId: terasort
      prerequisiteStepIds:
        - teragen
    placement:
      managedCluster:
        clusterName: my-managed-cluster
        config:
          gceClusterConfig:
            zoneUri: us-central1-a
  2. Run the workflow:
    gcloud dataproc workflow-templates instantiate-from-file \
        --file=TEMPLATE_YAML \
        --region=REGION
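The ordering that prerequisiteStepIds expresses can be checked locally before you run the workflow. The following is a minimal sketch, not Dataproc's scheduler: it assumes the template has already been loaded into a Python dict with the same structure as the YAML (for example with a YAML parser of your choice), and step_order is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The job graph from the YAML example, written as a dict (job bodies trimmed).
template = {
    "jobs": [
        {"stepId": "teragen",
         "hadoopJob": {"args": ["teragen", "1000", "hdfs:///gen/"]}},
        {"stepId": "terasort",
         "prerequisiteStepIds": ["teragen"],
         "hadoopJob": {"args": ["terasort", "hdfs:///gen/", "hdfs:///sort/"]}},
    ],
}


def step_order(template):
    """Return step IDs in an order that respects prerequisiteStepIds.

    Hypothetical helper for local checking only. Raises ValueError if a
    prerequisite names a step that is not defined, or if the steps form a cycle
    (graphlib.CycleError is a subclass of ValueError).
    """
    # Map each step to the set of steps that must finish before it starts.
    deps = {
        job["stepId"]: set(job.get("prerequisiteStepIds", []))
        for job in template["jobs"]
    }
    unknown = {p for prereqs in deps.values() for p in prereqs} - deps.keys()
    if unknown:
        raise ValueError(
            f"prerequisiteStepIds reference undefined steps: {sorted(unknown)}"
        )
    return list(TopologicalSorter(deps).static_order())


print(step_order(template))  # ['teragen', 'terasort']
```

A check like this catches a mistyped step ID before the template is submitted; the actual scheduling of steps is still done by Dataproc.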

Instantiate a workflow using a YAML file with Dataproc Auto Zone Placement

  1. Define your workflow template in a YAML file. This YAML file is the same as the previous YAML file, except the zoneUri field is set to the empty string ('') to allow Dataproc Auto Zone Placement to select the zone for the cluster.
    jobs:
    - hadoopJob:
        args:
        - teragen
        - '1000'
        - hdfs:///gen/
        mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
      stepId: teragen
    - hadoopJob:
        args:
        - terasort
        - hdfs:///gen/
        - hdfs:///sort/
        mainJarFileUri: file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
      stepId: terasort
      prerequisiteStepIds:
        - teragen
    placement:
      managedCluster:
        clusterName: my-managed-cluster
        config:
          gceClusterConfig:
            zoneUri: ''
  2. Run the workflow. When using Auto Placement, you must pass a region to the gcloud command.
    gcloud dataproc workflow-templates instantiate-from-file \
        --file=TEMPLATE_YAML \
        --region=REGION
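Because the two YAML files differ only in the zoneUri value, the Auto Zone Placement variant can be derived from the zoned one. The sketch below assumes the template is held as a Python dict mirroring the YAML structure; with_auto_zone is a hypothetical helper name, and only the placement part of the template is shown.

```python
import copy

# Placement section of the zoned template from the first example.
zoned_template = {
    "placement": {
        "managedCluster": {
            "clusterName": "my-managed-cluster",
            "config": {"gceClusterConfig": {"zoneUri": "us-central1-a"}},
        }
    }
}


def with_auto_zone(template):
    """Return a copy of the template with zoneUri set to the empty string.

    Hypothetical helper: the input dict is left unchanged, and any missing
    intermediate keys on the path to gceClusterConfig are created.
    """
    result = copy.deepcopy(template)
    gce_config = (
        result.setdefault("placement", {})
        .setdefault("managedCluster", {})
        .setdefault("config", {})
        .setdefault("gceClusterConfig", {})
    )
    gce_config["zoneUri"] = ""  # empty string lets Auto Zone Placement choose
    return result


auto_template = with_auto_zone(zoned_template)
```

Working on a deep copy keeps the zoned template available, so both variants can be written out to separate YAML files.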

Import and export a workflow template YAML file

You can import and export workflow template YAML files. Typically, a workflow template is first exported as a YAML file, then the YAML is edited, and then the edited YAML file is imported to update the template.

  1. Export the workflow template to a YAML file. During the export operation, the id and version fields, and all output-only fields are filtered from the output and do not appear in the exported YAML file.

    gcloud dataproc workflow-templates export TEMPLATE_ID or TEMPLATE_NAME \
        --destination=TEMPLATE_YAML \
        --region=REGION
    You can pass either the WorkflowTemplate id or the fully qualified template resource name ("projects/PROJECT_ID/regions/REGION/workflowTemplates/TEMPLATE_ID") to the command. If you omit the --destination flag, the output is directed to stdout, so the following command will also export the template to a YAML file:
    gcloud dataproc workflow-templates export TEMPLATE_ID or TEMPLATE_NAME \
        --region=REGION > TEMPLATE_YAML

  2. Edit the YAML file locally. Note that the id, version, and output-only fields, which were filtered from the YAML file when the template was exported, are disallowed in the imported YAML file.

  3. Import the updated workflow template YAML file:

    gcloud dataproc workflow-templates import TEMPLATE_ID or TEMPLATE_NAME \
        --source=TEMPLATE_YAML \
        --region=REGION
    You can pass either the WorkflowTemplate id or the fully qualified template resource name ("projects/PROJECT_ID/regions/REGION/workflowTemplates/TEMPLATE_ID") to the command. The template resource with the same template name will be overwritten (updated) and its version number will be incremented. If a template with the same template name does not exist, it will be created.
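Two details of the import and export steps lend themselves to a small local check: the shape of the fully qualified resource name, and the rule that id and version must not appear in an imported file. The sketch below is illustrative only. Both function names are hypothetical, the template is assumed to be a Python dict mirroring the YAML, and the names of output-only fields are not listed here, so the caller supplies them through extra_fields.

```python
def template_resource_name(project_id, region, template_id):
    """Build the fully qualified template resource name described above."""
    return f"projects/{project_id}/regions/{region}/workflowTemplates/{template_id}"


def strip_disallowed_fields(template, extra_fields=()):
    """Return a copy of the template without top-level fields that import rejects.

    Removes id and version, plus any field names passed in extra_fields
    (for example, output-only fields taken from the WorkflowTemplate reference).
    Only top-level keys are examined.
    """
    disallowed = {"id", "version", *extra_fields}
    return {key: value for key, value in template.items() if key not in disallowed}


# Example values below are placeholders, not real resources.
name = template_resource_name("my-project", "us-central1", "my-template")
cleaned = strip_disallowed_fields({"id": "my-template", "version": 3, "jobs": []})
```

A template exported with the export command already has these fields filtered out, so a helper like this matters mainly for YAML files assembled by hand or by other tooling.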


Last updated 2026-02-19 UTC.