Use Dataflow Runner v2

When you use Dataflow to run your pipeline, the Dataflow runner uploads your pipeline code and dependencies to a Cloud Storage bucket and creates a Dataflow job. This Dataflow job runs your pipeline on managed resources in Google Cloud Platform.

  • For batch pipelines that use the Apache Beam Java SDK versions 2.54.0 or later, Runner v2 is enabled by default.
  • For pipelines that use the Apache Beam Java SDK, Runner v2 is required when running multi-language pipelines, using custom containers, or using Spanner or Bigtable change stream pipelines. In other cases, use the default runner.
  • For pipelines that use the Apache Beam Python SDK versions 2.21.0 or later, Runner v2 is enabled by default. For pipelines that use the Apache Beam Python SDK versions 2.45.0 and later, Dataflow Runner v2 is the only Dataflow runner available.
  • For the Apache Beam SDK for Go, Dataflow Runner v2 is the only Dataflow runner available.
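Whichever SDK you use, the runner selection and the Runner v2 experiment are passed to the pipeline as ordinary command-line flags. The sketch below is an illustrative, standalone Python fragment (no Apache Beam dependency; the project, region, and bucket names are placeholders) that assembles such a flag list and parses it back, using the standard Dataflow pipeline option names:

```python
import argparse

def build_dataflow_args(project, region, temp_bucket, experiments=()):
    """Assemble the command-line flags a Beam pipeline would receive.

    --runner, --project, --region, --temp_location, and --experiments
    are the standard Beam/Dataflow pipeline options; each experiment
    is passed as its own --experiments flag.
    """
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location=gs://{temp_bucket}/temp",
    ]
    args += [f"--experiments={e}" for e in experiments]
    return args

# Placeholder values, for illustration only.
argv = build_dataflow_args("my-project", "us-central1", "my-bucket",
                           experiments=["use_runner_v2"])

# A pipeline's option parser sees the experiments as a repeated flag.
parser = argparse.ArgumentParser()
parser.add_argument("--experiments", action="append", default=[])
opts, _ = parser.parse_known_args(argv)
print(opts.experiments)
```

In a real pipeline these flags are consumed by the SDK's own option parser rather than by `argparse`; the point is only that enabling Runner v2 is a matter of adding one experiment flag.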

Runner v2 uses a services-based architecture that benefits many pipelines.

Limitations and restrictions

Dataflow Runner v2 has the following requirements and limitations:

  • Dataflow Runner v2 requires Streaming Engine for streaming jobs.
  • Because Dataflow Runner v2 requires Streaming Engine for streaming jobs, any Apache Beam transform that requires Dataflow Runner v2 also requires the use of Streaming Engine for streaming jobs. For example, the Pub/Sub Lite I/O connector for the Apache Beam SDK for Python is a cross-language transform that requires Dataflow Runner v2. If you try to disable Streaming Engine for a job or template that uses this transform, the job fails.
  • For streaming pipelines that use the Apache Beam Java SDK, the classes MapState and SetState are not supported with Runner v2. To use the MapState and SetState classes with Java pipelines, enable Streaming Engine, disable Runner v2, and use the Apache Beam SDK version 2.58.0 or later.
  • For batch and streaming pipelines that use the Apache Beam Java SDK, the class AfterSynchronizedProcessingTime isn't supported.
  • While Runner v2 scales better than Runner v1 in many cases, the memory usage might be higher for fixed sharding.
  • Dataflow classic templates can't be run with a different version of the Dataflow runner than they were built with. This means that Google-provided classic templates can't enable Runner v2. To enable Runner v2 for custom templates, set the --experiments=use_runner_v2 flag when you build the template.
  • Due to a known autoscaling issue, Runner v2 is disabled by default for batch Java pipelines that require stateful processing. You can still enable Runner v2 for those pipelines (see Enable Runner v2), but pipeline performance might be severely bottlenecked.

  • In some pipelines, Runner v2 can increase the frequency of consistency failures. You might see the following error in the log files: "Internal consistency check failed, the output is likely incorrect. Please retry the job". A possible mitigation is to add a Reshuffle transform after the Join/GroupByKey step. If the failure rate is not tolerable and the mitigation does not solve the issue, try disabling Runner v2.

Enable Runner v2

To enable Dataflow Runner v2, follow the configuration instructions for your Apache Beam SDK.

Java

Dataflow Runner v2 requires the Apache Beam Java SDK version 2.30.0 or later; version 2.44.0 or later is recommended.

For batch pipelines that use the Apache Beam Java SDK versions 2.54.0 or later, Runner v2 is enabled by default.

To enable Runner v2, run your job with the use_runner_v2 experiment. For more information, see Set experimental pipeline options.

Python

For pipelines that use the Apache Beam Python SDK versions 2.21.0 or later, Runner v2 is enabled by default.

Dataflow Runner v2 isn't supported with the Apache Beam Python SDK versions 2.20.0 and earlier.

In some cases, your pipeline might not use Runner v2 even though the pipeline runs on a supported SDK version. To run the job with Runner v2, set the use_runner_v2 experiment. For more information, see Set experimental pipeline options.
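The Python SDK version thresholds described above can be summarized in a small helper. This is an illustrative sketch only (the function name and return strings are our own, not part of any SDK), encoding the rules that 2.21.0+ enables Runner v2 by default and 2.45.0+ makes it the only available runner:

```python
def python_runner_v2_status(sdk_version: str) -> str:
    """Classify Runner v2 availability for an Apache Beam Python SDK
    version, per the thresholds in the text: <2.21.0 unsupported,
    2.21.0+ enabled by default, 2.45.0+ the only runner available."""
    parts = tuple(int(p) for p in sdk_version.split("."))
    if parts >= (2, 45, 0):
        return "only runner available"
    if parts >= (2, 21, 0):
        return "default"
    return "unsupported"

for v in ("2.20.0", "2.21.0", "2.44.0", "2.45.0"):
    print(v, "->", python_runner_v2_status(v))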

Go

Dataflow Runner v2 is the only Dataflow runneravailable for the Apache Beam SDK for Go. Runner v2 is enabled by default.

Disable Runner v2

To disable Dataflow Runner v2, follow the configurationinstructions for your Apache Beam SDK.

Java

To disable Runner v2, set thedisable_runner_v2 experiment. For moreinformation, seeSet experimental pipeline options.

Python

Disabling Runner v2 is not supported with the Apache Beam Python SDKversions 2.45.0 and later.

For earlier versions of the Python SDK, if your job is identified as using theauto_runner_v2 experiment, you can disable Runner v2 by setting thedisable_runner_v2 experiment. For more information, seeSet experimental pipeline options.

Go

Dataflow Runner v2 can't be disabled in Go. Runner v2 is theonly Dataflow runner available for the Apache Beam SDK forGo.

Monitor your job

Use the monitoring interface to viewDataflow job metrics,such as memory utilization, CPU utilization, and more.

Worker VM logs are available through theLogs Explorer and theDataflow monitoring interface.Worker VM logs include logs from the runner harness process and logs from the SDKprocesses. You can use the VM logs to troubleshoot your job.

Troubleshoot Runner v2

To troubleshoot jobs using Dataflow Runner v2, followstandard pipeline troubleshooting steps.The following list provides additional information about howDataflow Runner v2 works:

  • Dataflow Runner v2 jobs run two types of processes on theworker VM: SDK process and the runner harness process. Depending on thepipeline and VM type, there might be one or more SDK processes, but there isonly one runner harness process per VM.
  • SDK processes run user code and other language-specific functions. Therunner harness process manages everything else.
  • The runner harness process waits for all SDK processes to connect to it beforestarting to request work from Dataflow.
  • Jobs might be delayed if the worker VM downloads and installs dependenciesduring the SDK process startup. If issues occur during an SDK process, such aswhen starting up or installing libraries, the worker reports its status asunhealthy. If the startup times increase, enable the Cloud Build API on yourproject and submit your pipeline with the following parameter:--prebuild_sdk_container_engine=cloud_build.
  • Because Dataflow Runner v2 uses checkpointing, each worker mightwait for up to five seconds while buffering changes before sending thechanges for further processing. As a result, latency of approximately sixseconds is expected.
Note: The pre-build feature requires the Apache Beam SDK for Python, version2.25.0 or later.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.