Dataflow regions

The Dataflow region stores and handles metadata about your Dataflow job and deploys and controls your Dataflow workers.

Region names follow a standard convention based on Compute Engine region names. For example, the name for the Central US region is us-central1.

This feature is available in all regions where Dataflow is supported. To see available locations, read Dataflow locations.

Guidelines for choosing a region

Use the following guidelines to choose an appropriate region for your job.

Security and compliance

You might need to constrain Dataflow job processing to a specific geographic region in support of the security and compliance needs of your project.

Data locality

You can minimize network latency and network transport costs by running a Dataflow job from the same region as its sources, sinks, staging file locations, and temporary file locations. If you use sources, sinks, staging file locations, or temporary file locations that are located outside of your job's region, your data might be sent across regions.

Note: Starting with Beam SDK version 2.44.0, Dataflow does not support running jobs with workers in a region that is different from the job region.
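For example, a minimal launch sketch that keeps the job, its temporary files, and its staging files in the same region might look like the following. The project ID, bucket name, and output path are placeholder assumptions, and the bucket is assumed to already exist in us-central1:

  # All locations below are in us-central1, so pipeline data stays in one region.
  python -m apache_beam.examples.wordcount \
      --runner DataflowRunner \
      --project my-project \
      --region us-central1 \
      --temp_location gs://my-bucket/temp \
      --staging_location gs://my-bucket/staging \
      --output gs://my-bucket/results/output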

When a pipeline runs, user data is handled only by the Dataflow worker pool, and the movement of the data is restricted to the network paths that connect the Dataflow workers in the pool.

Although user data is strictly handled by Dataflow workers in their assigned geographic region, pipeline log messages are stored in Cloud Logging, which has a single global presence in Google Cloud.

If you need more control over the location of pipeline log messages, you can do the following (example gcloud commands are shown after these steps):

  1. Create an exclusion filter for the _Default log router sink to prevent Dataflow logs from being exported to the _Default log bucket.
  2. Create a log bucket in the region of your choice.
  3. Configure a new log router sink that exports your Dataflow logs to your new log bucket.

To learn more about configuring logging, see Routing and storage overview and Log routing overview.
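The following sketch shows one way to carry out these steps with the gcloud CLI. The exclusion name, log bucket name, sink name, region, and project ID are example values, and the filter assumes that your Dataflow job logs use the dataflow_step resource type:

  # 1. Exclude Dataflow logs from the _Default sink.
  gcloud logging sinks update _Default \
      --add-exclusion=name=exclude-dataflow,filter='resource.type="dataflow_step"'

  # 2. Create a log bucket in the region of your choice.
  gcloud logging buckets create dataflow-logs --location=us-central1

  # 3. Route Dataflow logs to the new bucket.
  gcloud logging sinks create dataflow-logs-sink \
      logging.googleapis.com/projects/my-project/locations/us-central1/buckets/dataflow-logs \
      --log-filter='resource.type="dataflow_step"'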

Notes about common Dataflow job sources:

  • When using a Cloud Storage bucket as a source, we recommend that you perform read operations in the same region as the bucket.
  • Pub/Sub topics, when published to the global Pub/Sub endpoint, are stored in the nearest Google Cloud region. However, you can modify the topic storage policy to a specific region or a set of regions. Similarly, Pub/Sub Lite topics support only zonal storage.

Resilience and geographic separation

You might want to isolate your normal Dataflow operations from outages that could occur in other geographic regions. Or, you might need to plan alternate sites for business continuity in the event of a region-wide disaster.

In your disaster recovery and business continuity plans, we recommend incorporating details for sources and sinks used with your Dataflow jobs. The Google Cloud sales team can help you work towards meeting your requirements.

Regional placement

By default, the region that you select configures the Dataflow worker pool to utilize all available zones within the region. Zone selection is calculated for each worker at its creation time, optimizing for resource acquisition and utilization of unused reservations.

Regional placement offers benefits such as:

  • Improved resource availability: Dataflow jobs are more resilient to zonal resource availability errors, because workers can continue to be created in other zones with remaining availability.
  • Improved reliability: In the event of a zonal failure, Dataflow jobs can continue to run, because workers are recreated in other zones.

The following limitations apply:

  • Regional placement is supported only for jobs using Streaming Engine or Dataflow Shuffle. Jobs that have opted out of Streaming Engine or Dataflow Shuffle cannot use regional placement.
  • Regional placement applies to VMs only, and doesn't apply to backend resources.
  • VMs are not replicated across multiple zones. If a VM becomes unavailable, for example, its work items are considered lost and are reprocessed by another VM.
  • If Compute Engine does not have capacity in the configured region, the Dataflow service cannot create any more VMs.
  • If Compute Engine does not have capacity in one or more zones in the configured region, the Dataflow service might fail to start a job.

View job resource zones

Dataflow jobs depend on internal resources. Some of these backend job resources are zonal. If a single zone fails and a zonal resource necessary for your Dataflow job is in that zone, the job might fail.

To understand whether a job failed because of a zonal outage, review the service zones that your job's backend resources are using. This feature is only available for Streaming Engine jobs.

  • To view the service zones in the Google Cloud console, use the Service zones field in the Job info panel.

  • To use the API to review the service zones, use the ServiceResources field.

The values in this field update throughout the duration of the job, because the resources that the job uses change while the job runs.

Automatic zone placement

For jobs that aren't eligible for regional placement, the best zone within the region is selected automatically, based on the available zone capacity at the time of the job creation request. Automatic zone selection helps ensure that job workers run in the best zone for your job.

Because the job is configured to run in a single zone, the operation might fail with a zonal resource availability error if sufficient Compute Engine resources are not available. If capacity is exhausted in a region, you might see a ZONE_RESOURCE_POOL_EXHAUSTED error. You can implement a retry loop to start the job when resources are available.
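As a sketch, assuming a Python pipeline launched from the command line with placeholder project, bucket, and script names, a simple retry loop might look like the following. It retries the submission itself; a job that starts and then fails later would need separate handling:

  # Retry the job submission until it succeeds, waiting 10 minutes between attempts.
  until python my_pipeline.py \
      --runner DataflowRunner \
      --project my-project \
      --region us-central1 \
      --temp_location gs://my-bucket/temp; do
    echo "Job submission failed; retrying in 10 minutes"
    sleep 600
  done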

Also, when a zone is unavailable, the streaming backend can become unavailable, which might result in data loss.

Specify a region

Note: Region configuration requires Apache Beam SDK version 2.0.0 or higher.

To specify a region for your job, set the --region option to one of the supported regions. The --region option overrides the default region that is set in the metadata server, your local client, or the environment variables.

The Dataflow command-line interface also supports the --region option to specify regions.
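For example, assuming a Python pipeline and placeholder project and bucket names, you might set the job region like this, and use the same option with the gcloud Dataflow commands:

  # Launch the pipeline with an explicit job region.
  python my_pipeline.py \
      --runner DataflowRunner \
      --project my-project \
      --region europe-west1 \
      --temp_location gs://my-bucket/temp

  # List jobs in that region with the command-line interface.
  gcloud dataflow jobs list --region=europe-west1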

Override the worker region or zone

By default, when you submit a job with the --region option, workers are automatically assigned to either zones across the region or the single best zone within the region, depending on the job type.

In cases where you want to ensure that the workers for your Dataflow job run strictly in a specific zone, you can specify the zone using the following pipeline option. This usage pattern is uncommon for Dataflow jobs.

This option only controls the zone used for the Dataflow workers. It doesn't apply to backend resources. Backend resources might be created in any zone within the job region.

Java

--workerZone

Python

--worker_zone

Go

--worker_zone
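For example, with the Python SDK and placeholder project, bucket, and zone values, you could pin the workers to a single zone inside the job region:

  # Run the Dataflow workers in us-central1-a only.
  python my_pipeline.py \
      --runner DataflowRunner \
      --project my-project \
      --region us-central1 \
      --worker_zone us-central1-a \
      --temp_location gs://my-bucket/temp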

For all other cases, we don't recommend overriding the worker location. The common scenarios table contains usage recommendations for these situations.

Because the job is configured to run in a single zone, the operation might fail with a zonal resource availability error if sufficient Compute Engine resources are not available.

Caution: If you override the worker zone and the workers are in a different region than the job region, there might be a negative impact on performance, network traffic, network latency, and network cost.

You can run the gcloud compute regions list command to see a listing of regions and zones that are available for worker deployment.
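For example, to list the available regions and the zones within them:

  gcloud compute regions list
  gcloud compute zones list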

Common scenarios

The following table contains usage recommendations for common scenarios.

Scenario: I want to use a supported region and have no zone preference within the region. In this case, the best zone is automatically selected based on available capacity.
Recommendation: Use --region to specify a job region. This ensures that Dataflow manages your job and processes data within the specified region.

Scenario: I need worker processing to occur in a specific zone of a region.
Recommendation: Specify both --region and either --workerZone or --worker_zone. Use --region to specify the job region. Use either --workerZone or --worker_zone to specify the specific zone within that region.
