Dynamic work rebalancing
The Dynamic Work Rebalancing feature of the Dataflow service allows the service to dynamically repartition work based on runtime conditions. These conditions might include the following:
- Imbalances in work assignments
- Workers taking longer than expected to finish
- Workers finishing faster than expected
The Dataflow service automatically detects these conditions and can dynamically assign work to unused or underused workers to decrease the overall processing time of your job.
Limitations
Dynamic work rebalancing only happens when the Dataflow service is processing some input data in parallel: when reading data from an external input source, when working with a materialized intermediate PCollection, or when working with the result of an aggregation like GroupByKey. If a large number of steps in your job are fused, your job has fewer intermediate PCollections, and dynamic work rebalancing is limited to the number of elements in the source materialized PCollection. If you want to ensure that dynamic work rebalancing can be applied to a particular PCollection in your pipeline, you can prevent fusion in a few different ways to ensure dynamic parallelism, as sketched below.
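One common way to break fusion is to pass the data through a shuffle. Here is a minimal sketch in Java, assuming a hypothetical `records` PCollection and a hypothetical `ExpensiveDoFn` step; `Reshuffle.viaRandomKey()` materializes the collection, giving Dataflow an intermediate PCollection it can rebalance:

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical stand-in for a slow per-element step.
class ExpensiveDoFn extends DoFn<String, String> {
  @ProcessElement
  public void processElement(@Element String record, OutputReceiver<String> out) {
    out.output(record.toUpperCase()); // placeholder for expensive work
  }
}

// Given some PCollection<String> named records:
// Reshuffle.viaRandomKey() breaks fusion between the upstream steps and the
// expensive ParDo, so the ParDo reads from a materialized intermediate
// PCollection that dynamic work rebalancing can repartition across workers.
PCollection<String> rebalanced =
    records.apply(Reshuffle.viaRandomKey())
           .apply(ParDo.of(new ExpensiveDoFn()));
```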
Dynamic work rebalancing cannot reparallelize data finer than a single record. If your data contains individual records that cause large delays in processing time, they might still delay your job. Dataflow can't subdivide and redistribute an individual "hot" record to multiple workers.
Java
If you set a fixed number of shards for the final output of your pipeline (for example, by writing data using TextIO.Write.withNumShards), Dataflow limits parallelization based on the number of shards that you choose.
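For example, the two write variants look like this inside your pipeline construction code (the output path is a placeholder):

```java
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.values.PCollection;

// Given some PCollection<String> named results:

// Fixed sharding: the write is capped at 10 parallel shards, which also
// caps how much dynamic work rebalancing can help at this step.
results.apply(TextIO.write().to("gs://my-bucket/output").withNumShards(10));

// Runner-chosen sharding (the default): Dataflow is free to pick and
// rebalance the write parallelism.
results.apply(TextIO.write().to("gs://my-bucket/output"));
```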
Python
If you set a fixed number of shards for the final output of your pipeline (for example, by writing data using beam.io.WriteToText(..., num_shards=...)), Dataflow limits parallelization based on the number of shards that you choose.
Go
If you set a fixed number of shards for the final output of your pipeline, Dataflow limits parallelization based on the number of shards that you choose.
Working with Custom Data Sources
Java
If your pipeline uses a custom data source that you provide, you must implement the method splitAtFraction to allow your source to work with the dynamic work rebalancing feature.
If you choose to implement splitAtFraction, it's critical that you test your code extensively and with maximum code coverage. If you implement splitAtFraction incorrectly, records from your source might appear to get duplicated or dropped. See the API reference information on RangeTracker for help and tips on implementing splitAtFraction.
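A minimal sketch of the pattern, assuming a hypothetical source and reader over a fixed offset range; split decisions are delegated to the SDK's OffsetRangeTracker, and both classes are declared abstract so the sketch can focus on splitAtFraction (a real reader must also implement start, advance, getCurrent, and the other BoundedReader methods):

```java
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.io.range.OffsetRangeTracker;

// Hypothetical source over records at byte offsets [start, end).
abstract class MyOffsetSource extends BoundedSource<String> {
  abstract MyOffsetSource forSubrange(long startOffset, long endOffset);
  abstract long getEndOffset();
}

// Hypothetical reader for MyOffsetSource.
abstract class MyOffsetReader extends BoundedSource.BoundedReader<String> {
  private final MyOffsetSource source;
  private final OffsetRangeTracker tracker;

  MyOffsetReader(MyOffsetSource source, OffsetRangeTracker tracker) {
    this.source = source;
    this.tracker = tracker;
  }

  @Override
  public BoundedSource<String> splitAtFraction(double fraction) {
    // Translate the requested fraction into a concrete offset in the range.
    long splitOffset = tracker.getPositionForFractionConsumed(fraction);
    // Atomically shrink this reader's range. trySplitAtPosition returns
    // false if a record at or past splitOffset was already claimed, in
    // which case the split must be rejected.
    if (!tracker.trySplitAtPosition(splitOffset)) {
      return null; // Rejecting is always safe; the work stays with this reader.
    }
    // The residual range [splitOffset, end) becomes a new source that
    // Dataflow can hand to another worker. Every record must end up on
    // exactly one side of the split: none duplicated, none dropped.
    return source.forSubrange(splitOffset, source.getEndOffset());
  }
}
```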
Python
If your pipeline uses a custom data source that you provide, your RangeTracker must implement try_claim, try_split, position_at_fraction, and fraction_consumed to allow your source to work with the dynamic work rebalancing feature.
See the API reference information on RangeTracker for more information.
Go
If your pipeline uses a custom data source that you provide, you must implement a valid RTracker to allow your source to work with the dynamic work rebalancing feature.
For more information, see the RTracker API reference information.
Dynamic work rebalancing uses the return value of the getProgress() method of your custom source to activate. The default implementation for getProgress() returns null. To ensure autoscaling activates, make sure your custom source overrides getProgress() to return an appropriate value.
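As an illustration, continuing the Java reader sketch above: in the Java SDK, the progress hook on BoundedSource.BoundedReader is getFractionConsumed(), whose default implementation likewise returns null, and it can simply delegate to the tracker:

```java
// Continuing the MyOffsetReader sketch: report progress by delegating to
// the OffsetRangeTracker, which computes the consumed fraction as roughly
// (lastClaimedOffset - startOffset) / (endOffset - startOffset).
@Override
public Double getFractionConsumed() {
  return tracker.getFractionConsumed();
}
```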