Monitor and debug workflows

This page provides information to help you monitor and debug Dataproc workflows.

List workflows

An instantiated WorkflowTemplate is called a "workflow" and is modeled as an "operation."

Run the following gcloud command to list your project's workflows:

gcloud dataproc operations list \
    --region=region \
    --filter="operationType = WORKFLOW"
...
OPERATION_NAME                                                DONE
projects/.../operations/07282b66-2c60-4919-9154-13bd4f03a1f2  True
projects/.../operations/1c0b0fd5-839a-4ad4-9a57-bbb011956690  True

Here's a sample request to list all workflows started from a "terasort" template:

gcloud dataproc operations list \
    --region=region \
    --filter="labels.goog-dataproc-workflow-template-id=terasort"
...
OPERATION_NAME                                     DONE
projects/.../07282b66-2c60-4919-9154-13bd4f03a1f2  True
projects/.../1c0b0fd5-839a-4ad4-9a57-bbb011956690  True

Note that only the UUID portion of OPERATION_NAME is used in subsequent queries.

Using WorkflowMetadata

The operation.metadata field provides information to help you diagnose workflow failures.
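For example, you can fetch a workflow operation, including its metadata, with a describe request (the UUID here is taken from the listing above; substitute your own):

gcloud dataproc operations describe 07282b66-2c60-4919-9154-13bd4f03a1f2 \
    --region=region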

Here's a sample WorkflowMetadata, including a graph of nodes (jobs), embedded in an operation:

{  "name": "projects/my-project/regions/us-central1/operations/671c1d5d-9d24-4cc7-8c93-846e0f886d6e",  "metadata": {    "@type": "type.googleapis.com/google.cloud.dataproc.v1.WorkflowMetadata",    "template": "terasort",    "version": 1,    "createCluster": {      "operationId": "projects/my-project/regions/us-central1/operations/8d472870-4a8b-4609-9f7d-48daccb028fc",      "Done": true    },    "graph": {      "nodes": [        {          "stepId": "teragen",          "jobId": "teragen-vtrprwcgepyny",          "state": "COMPLETED"        },        {          "stepId": "terasort",          "prerequisiteStepIds": [            "teragen"          ],          "jobId": "terasort-vtrprwcgepyny",          "state": "FAILED",          "error": "Job failed"        },        {          "stepId": "teravalidate",          "prerequisiteStepIds": [            "terasort"          ],          "state": "FAILED",          "error": "Skipped, node terasort failed"        }      ]    },    "deleteCluster": {      "operationId": "projects/my-project/regions/us-central1/operations/9654c67b-2642-4142-a145-ca908e7c81c9",      "Done": true    },    "state": "DONE",    "clusterName": "terasort-cluster-vtrprwcgepyny"  },  "done": true,  "error": {    "message": "Workflow failed"  }}Done!

Retrieve a template

As shown in the previous example, the metadata contains the template id and version.

"template": "terasort","version": 1,

If a template has not been deleted, instantiated template versions can be retrieved by a describe request that specifies the version.

gcloud dataproc workflow-templates describe terasort \
    --region=region \
    --version=1

List cluster operations started by a template:

gcloud dataproc operations list \
    --region=region \
    --filter="labels.goog-dataproc-workflow-instance-id = 07282b66-2c60-4919-9154-13bd4f03a1f2"
...
OPERATION_NAME                                     DONE
projects/.../cf9ce692-d6c9-4671-a909-09fd62041024  True
projects/.../1bbaefd9-7fd9-460f-9adf-ee9bc448b8b7  True

Here's a sample request to list jobs submitted from a template:

gcloud dataproc jobs list \
    --region=region \
    --filter="labels.goog-dataproc-workflow-template-id = terasort"
...
JOB_ID                   TYPE     STATUS
terasort3-ci2ejdq2ta7l6  pyspark  DONE
terasort2-ci2ejdq2ta7l6  pyspark  DONE
terasort1-ci2ejdq2ta7l6  pyspark  DONE
terasort3-3xwsy6ubbs4ak  pyspark  DONE
terasort2-3xwsy6ubbs4ak  pyspark  DONE
terasort1-3xwsy6ubbs4ak  pyspark  DONE
terasort3-ajov4nptsllti  pyspark  DONE
terasort2-ajov4nptsllti  pyspark  DONE
terasort1-ajov4nptsllti  pyspark  DONE
terasort1-b262xachbv6c4  pyspark  DONE
terasort1-cryvid3kreea2  pyspark  DONE
terasort1-ndprn46nesbv4  pyspark  DONE
terasort1-yznruxam4ppxi  pyspark  DONE
terasort1-ttjbhpqmw55t6  pyspark  DONE
terasort1-d7svwzloplbni  pyspark  DONE

List jobs submitted from a workflow instance:

gcloud dataproc jobs list \
    --region=region \
    --filter="labels.goog-dataproc-workflow-instance-id = 07282b66-2c60-4919-9154-13bd4f03a1f2"
...
JOB_ID                   TYPE     STATUS
terasort3-ci2ejdq2ta7l6  pyspark  DONE
terasort2-ci2ejdq2ta7l6  pyspark  DONE
terasort1-ci2ejdq2ta7l6  pyspark  DONE

Workflow timeouts

You can set a workflow timeout that cancels the workflow if its jobs do not finish within the timeout period. The timeout period applies to the DAG (Directed Acyclic Graph) of jobs in the workflow, not to the entire workflow operation. It starts when the first workflow job starts; it does not include the time taken to create a managed cluster. If any job is still running at the end of the timeout period, all running jobs are stopped, the workflow is ended, and, if the workflow was running on a managed cluster, the cluster is deleted.

Benefit: Use this feature to avoid having to manually end a workflow that does not complete due to stuck jobs.

Set a workflow template timeout

You can set a workflow template timeout period when you create a workflow template. You can also add a workflow timeout to an existing workflow template by updating the workflow template.

Workflow timeouts apply to any workflow instantiated after the timeout is set on the template. Existing workflows are not affected by a new or updated template.
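For reference, instantiation is a separate step; a newly set timeout takes effect when you next instantiate the template, for example (the template name is a placeholder):

gcloud dataproc workflow-templates instantiate my-workflow \
    --region=region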

gcloud

To set a workflow timeout on a new template, use the --dag-timeout flag with the gcloud dataproc workflow-templates create command. You can use "s", "m", "h", and "d" suffixes to set second, minute, hour, and day duration values, respectively. The timeout duration must be from 10 minutes ("10m") to 24 hours ("24h" or "1d").

gcloud dataproc workflow-templates create template-id (such as "my-workflow") \
    --region=region \
    --dag-timeout=duration (from "10m" to "24h" or "1d") \
    ... other args ...

API

To set a workflow timeout, complete the WorkflowTemplate dagTimeout field as part of a workflowTemplates.create request.
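As a minimal sketch, a create request with a timeout might look like the following. The project, region, template ID, cluster selector label, and job are placeholder assumptions; in the JSON encoding, dagTimeout is a Duration expressed in seconds (here, "1800s" is 30 minutes):

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dataproc.googleapis.com/v1/projects/my-project/regions/us-central1/workflowTemplates" \
    -d '{
      "id": "my-workflow",
      "dagTimeout": "1800s",
      "placement": {
        "clusterSelector": {
          "clusterLabels": {"env": "test"}
        }
      },
      "jobs": [
        {
          "stepId": "step1",
          "sparkJob": {"mainClass": "example.Main"}
        }
      ]
    }'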

Console

Currently, the Google Cloud console does not support creating a workflow template.

Update a workflow template timeout

You can update an existing workflow template to change, add, or remove a workflow timeout.

Workflow timeouts apply to any workflow instantiated after the timeout is set on the template. Existing workflows are not affected by a new or updated template.

gcloud

Adding or changing a workflow timeout

To add or change a workflow timeout on an existing template, use the --dag-timeout flag with the gcloud dataproc workflow-templates set-dag-timeout command. You can use "s", "m", "h", and "d" suffixes to set second, minute, hour, and day duration values, respectively. The timeout duration must be from 10 minutes ("10m") to 24 hours ("24h").

gcloud dataproc workflow-templates set-dag-timeout template-id (such as "my-workflow") \
    --region=region \
    --dag-timeout=duration (from "10m" to "24h" or "1d")

Removing a workflow timeout

To remove a workflow timeout from an existing template, use the gcloud dataproc workflow-templates remove-dag-timeout command.

gcloud dataproc workflow-templates remove-dag-timeout template-id (such as "my-workflow") \
    --region=region

API

Adding or changing a workflow timeout

To add or change a workflow timeout on an existing template, update the workflow template by filling in the template's dagTimeout field with the new or changed timeout value.
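Sketching the same pattern as the create example above, an update is a PUT of the full template resource; the body must echo the template's current contents and version (the version number and other values here are placeholders):

curl -X PUT \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dataproc.googleapis.com/v1/projects/my-project/regions/us-central1/workflowTemplates/my-workflow" \
    -d '{
      "id": "my-workflow",
      "version": 3,
      "dagTimeout": "3600s",
      "placement": {
        "clusterSelector": {
          "clusterLabels": {"env": "test"}
        }
      },
      "jobs": [
        {
          "stepId": "step1",
          "sparkJob": {"mainClass": "example.Main"}
        }
      ]
    }'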

Removing a workflow timeout

To remove a workflow timeout from an existing template, update the workflow template by removing the template's dagTimeout field. Following the pattern sketched above, this means resending the update request with the dagTimeout field omitted from the body.

Console

Currently, the Google Cloud console does not support updating a workflow template.
