Use Streaming Engine for streaming jobs
Dataflow's Streaming Engine moves pipeline execution out of the worker virtual machines (VMs) and into the Dataflow service backend. When you don't use Streaming Engine for streaming jobs, the Dataflow runner executes the steps of your streaming pipeline entirely on worker VMs, consuming worker CPU, memory, and Persistent Disk storage.
Streaming Engine is enabled by default for the following pipelines:
- Streaming pipelines that use the Apache Beam Python SDK version 2.21.0 or later and Python 3.
- Streaming pipelines that use the Apache Beam Go SDK version 2.33.0 or later.
To learn more about the implementation of Streaming Engine, see Streaming Engine: Execution Model for Highly-Scalable, Low-Latency Data Processing.
Benefits
The Streaming Engine model has the following benefits:
- Reduced CPU, memory, and Persistent Disk storage resource usage on the worker VMs. Streaming Engine works best with smaller worker machine types (n1-standard-2 instead of n1-standard-4). It doesn't require Persistent Disk beyond a small worker boot disk, leading to less resource and quota consumption.
- More responsive Horizontal Autoscaling in response to variations in incoming data volume. Streaming Engine offers smoother, more granular scaling of workers.
- Improved supportability, because you don't need to redeploy your pipelines to apply service updates.
Most of the reduction in worker resources comes from offloading the work to the Dataflow service. For that reason, there is a charge associated with the use of Streaming Engine.
Support and limitations
- For the Java SDK, Streaming Engine requires the Apache Beam SDK version 2.10.0 or later.
- For the Python SDK, Streaming Engine requires the Apache Beam SDK version 2.16.0 or later.
- For the Go SDK, Streaming Engine requires the Apache Beam SDK version 2.33.0 or later.
- You can't update pipelines that are already running to use Streaming Engine. If your pipeline is running in production without Streaming Engine and you want to use Streaming Engine, stop your pipeline by using the Dataflow Drain option (see the example command after this list). Then, specify the Streaming Engine parameter, and rerun your pipeline.
- For jobs that use Streaming Engine, the aggregated input data for the open windows has a limit of 60 GB per key. Aggregated input data includes both buffered elements and custom state. When a pipeline exceeds this limit, the pipeline becomes stuck with high system lag, and a message in the job log indicates that the limit has been exceeded. As a best practice, avoid pipeline designs that result in large keys. For more information, see Writing Dataflow pipelines with scalability in mind.
- Supports customer-managed encryption keys (CMEK).
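For example, to drain a running job before relaunching it with Streaming Engine, you can use the gcloud CLI. This is a minimal sketch; JOB_ID and the region value are placeholders for your own:

gcloud dataflow jobs drain JOB_ID --region=us-central1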
Use Streaming Engine
This feature is available in all regions where Dataflow is supported. To see available locations, read Dataflow locations.
Java
Streaming Engine requires the Apache Beam Java SDK version 2.10.0 or later.
To use Streaming Engine for your streaming pipelines, specify the following parameter:
- --enableStreamingEngine if you're using Apache Beam SDK for Java versions 2.11.0 or later.
- --experiments=enable_streaming_engine if you're using Apache Beam SDK for Java version 2.10.0.
If you use Dataflow Streaming Engine for your pipeline, don't specify the --zone parameter. Instead, specify the --region parameter and set the value to a supported region. Dataflow auto-selects the zone in the region you specified. If you do specify the --zone parameter and set it to a zone outside of the available regions, Dataflow reports an error.
Streaming Engine works best with smaller core worker machine types. Use the job type to determine whether to use a high memory worker machine type. Example machine types that we recommend include --workerMachineType=n1-standard-2 and --workerMachineType=n1-highmem-2. You can also set --diskSizeGb=30 because Streaming Engine only needs space for the worker boot image and local logs. These values are the default values.
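As an illustration, a streaming Java pipeline launched through Maven might combine these flags as follows. This is a sketch, not a definitive invocation: the main class, project, and region values are placeholders for your own.

mvn compile exec:java \
  -Dexec.mainClass=com.example.MyStreamingPipeline \
  -Dexec.args="--runner=DataflowRunner \
    --project=my-project \
    --region=us-central1 \
    --enableStreamingEngine \
    --workerMachineType=n1-standard-2 \
    --diskSizeGb=30"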
Python
Streaming Engine requires the Apache Beam Python SDK version 2.16.0 or later.
Streaming Engine is enabled by default for new Dataflow streaming pipelines when the following conditions are met:
- Pipelines use the Apache Beam Python SDK version 2.21.0 or later and Python 3.
- Customer-managed encryption keys are not used.
- Dataflow workers are in the same region as your Dataflow job.
In Python SDK version 2.45.0 or later, you cannot disable Streaming Engine for streaming pipelines. In Python SDK version 2.44.0 or earlier, to disable Streaming Engine, specify the following parameter:
--experiments=disable_streaming_engine
If you use Python 2, to enable Streaming Engine, specify the following parameter:
--enable_streaming_engine
If you use Dataflow Streaming Engine in your pipeline, don't specify the --zone parameter. Instead, specify the --region parameter and set the value to a supported region. Dataflow auto-selects the zone in the region you specified. If you specify the --zone parameter and set it to a zone outside of the available regions, Dataflow reports an error.
Streaming Engine works best with smaller core worker machine types. Use the job type to determine whether to use a high memory worker machine type. Example machine types that we recommend include --machine_type=n1-standard-2 and --machine_type=n1-highmem-2. You can also set --disk_size_gb=30 because Streaming Engine only needs space for the worker boot image and local logs. These values are the default values.
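For example, a streaming Python pipeline on a recent SDK (where Streaming Engine is on by default) might be launched like this. This is a sketch; the script name, project, and region are placeholders:

python my_streaming_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --streaming \
  --machine_type=n1-standard-2 \
  --disk_size_gb=30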
Go
Streaming Engine requires the Apache Beam Go SDK version 2.33.0 or later.
Streaming Engine is enabled by default for new Dataflow streaming pipelines that use the Apache Beam Go SDK.
If you want to disable Streaming Engine in your Go streaming pipeline, specify the following parameter. This parameter must be specified every time you want to disable Streaming Engine.
--experiments=disable_streaming_engine
If you use Dataflow Streaming Engine in your pipeline, don't specify the --zone parameter. Instead, specify the --region parameter and set the value to a supported region. Dataflow auto-selects the zone in the region you specified. If you specify the --zone parameter and set it to a zone outside of the available regions, Dataflow reports an error.
Streaming Engine works best with smaller core worker machine types. Use the job type to determine whether to use a high memory worker machine type. Example machine types that we recommend include --worker_machine_type=n1-standard-2 and --worker_machine_type=n1-highmem-2. You can also set --disk_size_gb=30 because Streaming Engine only needs space for the worker boot image and local logs. These values are the default values.
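As a sketch, a streaming Go pipeline might be launched like this; the file name, project, bucket, and region are placeholders:

go run my_streaming_pipeline.go \
  --runner=dataflow \
  --project=my-project \
  --region=us-central1 \
  --staging_location=gs://my-bucket/staging \
  --worker_machine_type=n1-standard-2 \
  --disk_size_gb=30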
gcloud CLI
When you run your pipeline by using the gcloud dataflow jobs run command or the gcloud dataflow flex-template run command, to enable Streaming Engine, use the following flag:
--enable-streaming-engine
To disable Streaming Engine, use the following flag:
--additional-experiments=disable_streaming_engine
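For example, a Flex Template job might be launched with Streaming Engine enabled as follows. This is a sketch; the job name, template location, and region are placeholders:

gcloud dataflow flex-template run my-streaming-job \
  --template-file-gcs-location=gs://my-bucket/templates/my-template.json \
  --region=us-central1 \
  --enable-streaming-engine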
REST
When you run your pipeline by using the projects.locations.jobs.create method in the REST API, use the Job resource to enable or disable Streaming Engine. To enable Streaming Engine, under environment, set the experiments field to enable_streaming_engine:
"environment":{"experiments":"enable_streaming_engine"}To disable Streaming Engine,underenvironment, set theexperiments field todisable_streaming_engine:
"environment":{"experiments":"disable_streaming_engine"}Pricing
Pricing

Dataflow Streaming Engine offers a resource-based billing model where you're billed for the total resources that are consumed by your job. With resource-based billing, the Streaming Engine resources consumed by your job are metered and measured in Streaming Engine Compute Units. You're billed for worker CPU, worker memory, and Streaming Engine Compute Units.
Use resource-based billing
To use resource-based billing, when you start or update your job, include the following Dataflow service option.
Java
--dataflowServiceOptions=enable_streaming_engine_resource_based_billing

Python
--dataflow_service_options=enable_streaming_engine_resource_based_billing

Go
--dataflow_service_options=enable_streaming_engine_resource_based_billing
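For example, a streaming Python pipeline might opt in to resource-based billing like this. This is a sketch; the script name, project, and region are placeholders:

python my_streaming_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --streaming \
  --dataflow_service_options=enable_streaming_engine_resource_based_billing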
Data-processed billing (legacy)

Unless you enable resource-based billing, your jobs are billed by using the legacy data-processed billing.
Verify the billing model
Unless you're using Dataflow Prime, when you have jobs that use resource-based billing, the bill includes the SKU Streaming Engine Compute Unit. When you have jobs that use data-processed billing, the bill includes the SKU Streaming Engine data processed. If you have some jobs that use resource-based billing and other jobs that use data-processed billing, the bill includes both SKUs.
When you use Dataflow Prime with resource-based billing, the Data Compute Unit (DCU) SKU is used.
To see which pricing model your job uses, in the Dataflow monitoring interface, select your job. If your job uses resource-based billing, the Job info side panel includes a Streaming Engine Compute Units field.
If you have questions about your billing, contact Cloud Customer Care.