Request quotas
The Dataflow service fully manages resources in Google Cloud on a per-job basis. This includes spinning up and shutting down Compute Engine instances (occasionally referred to as workers or VMs) and accessing your project's Cloud Storage buckets for both I/O and temporary file staging. However, if your pipeline interacts with Google Cloud data storage technologies like BigQuery and Pub/Sub, you must manage the resources and quota for those services.
Dataflow uses a user-provided location in Cloud Storage specifically for staging files. This location is under your control, and you should ensure that it remains available for as long as any job is reading from it. You can reuse the same staging location for multiple job runs, because the SDK's built-in caching can speed up the start time for your jobs.
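As a sketch of how this looks in practice, the following Python snippet sets an explicit staging location that can be reused across runs. The project ID and bucket names are placeholders, not values from this page.

from apache_beam.options.pipeline_options import PipelineOptions

# A minimal sketch: point multiple job runs at the same staging location
# so the SDK's caching can speed up job start. Project and bucket names
# are assumptions; substitute your own.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--staging_location=gs://my-bucket/staging",
    "--temp_location=gs://my-bucket/temp",
])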
Caution: Manually altering Dataflow-managed Compute Engine resources associated with a Dataflow job is an unsupported operation. You should not attempt to manually stop, delete, or otherwise control the Compute Engine instances that Dataflow has created to run your job. In addition, you should not alter any persistent disk resources associated with your Dataflow job.
Jobs
You may run up to 25 concurrent Dataflow jobs per Google Cloud project; however, this limit can be increased by contacting Google Cloud Platform Support. For more information, see Quotas.
The Dataflow service is currently limited to processing JSON job requests that are 20 MB in size or smaller. The size of the job request is specifically tied to the JSON representation of your pipeline; a larger pipeline means a larger request.
To estimate the size of your pipeline's JSON request, run your pipeline with the following option:
Java
--dataflowJobFile=<path to output file>
Python
--dataflow_job_file=<path to output file>
Go
Estimating the size of a job's JSON payload with a flag is not currently supported in Go.
This command writes a JSON representation of your job to a file. The size of the serialized file is a good estimate of the size of the request; the actual size will be slightly larger due to some additional information included in the request.
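For example, a Python pipeline might set the option and then check the size of the resulting file. This is a sketch; the file path, project, and bucket values are placeholders, and the job is still submitted to Dataflow when it runs.

import os

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# A sketch: write the job's JSON representation to a local file while
# submitting the pipeline. Paths and project values are assumptions.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--dataflow_job_file=/tmp/job.json",
])

with beam.Pipeline(options=options) as p:
    _ = p | beam.Create(["a", "b", "c"]) | beam.Map(str.upper)

# The serialized file size approximates the request size; the actual
# request is slightly larger. Compare against the 20 MB limit.
print(f"Estimated request size: {os.path.getsize('/tmp/job.json')} bytes")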
For more information, see the troubleshooting page for "413 Request Entity Too Large" / "The size of serialized JSON representation of the pipeline exceeds the allowable limit".
In addition, your job's graph size must not exceed 10 MB. For more information, see the troubleshooting page for "The job graph is too large. Please try again with a smaller job graph, or split your job into two or more smaller jobs.".
Workers
The Dataflow service currently allows a maximum of 1000 Compute Engine instances per job. For batch jobs, the default machine type is n1-standard-1. For streaming jobs, the default machine type for Streaming Engine-enabled jobs is n1-standard-2 and the default machine type for non-Streaming Engine jobs is n1-standard-4. When using the default machine types, the Dataflow service can therefore allocate up to 4000 cores per job. If you need more cores for your job, you can select a larger machine type.
You should not attempt to manage or otherwise interact directly with your Compute Engine Managed Instance Group; the Dataflow service will take care of that for you. Manually altering any Compute Engine resources associated with your Dataflow job is an unsupported operation.
You can use any of the available Compute Engine machine type families as well as custom machine types. For best results, use n1 machine types. Shared core machine types, such as f1 and g1 series workers, are not supported under the Dataflow Service Level Agreement.
To allocate additional memory per worker thread, use a custom machine type with extended memory. For example, custom-2-15360-ext is an n1 machine type with 2 CPUs and 15 GB of memory. Dataflow considers the number of CPUs in a machine to determine the number of worker threads per worker VM. If your pipeline processes memory-intensive work, a custom machine type with extended memory can give more memory per worker thread. For more information, see Creating a custom VM instance.
Dataflow bills by the number of vCPUs and GB of memory in workers. Billing is independent of the machine type family. You can specify a machine type for your pipeline by setting the appropriate execution parameter at pipeline creation time.
Caution: Shared core machine types such as f1 and g1 series workers are not supported under Dataflow's Service Level Agreement.
Java
To change the machine type, set the --workerMachineType option.
Python
To change the machine type, set the --worker_machine_type option.
Go
To change the machine type, set the --worker_machine_type option.
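For instance, a Python pipeline might pass the option as follows. This is a sketch: the custom machine type matches the extended-memory example above, and the project and bucket values are placeholders.

from apache_beam.options.pipeline_options import PipelineOptions

# A sketch: request a custom extended-memory machine type for workers.
# custom-2-15360-ext is an n1 type with 2 vCPUs and 15 GB of memory.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--worker_machine_type=custom-2-15360-ext",
])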
Resource quota
The Dataflow service checks to ensure that your Google Cloud project has the Compute Engine resource quota required to run your job, both to start the job and scale to the maximum number of worker instances. Your job will fail to start if there is not enough resource quota available.
If your Dataflow job deploys Compute Engine virtual machines as a Managed Instance Group, you'll need to ensure your project satisfies some additional quota requirements. Specifically, your project will need one of the following types of quota for each concurrent Dataflow job that you want to run:
- One Instance Group per job
- One Managed Instance Group per job
- One Instance Template per job
Caution: Manually changing your Dataflow job's Instance Template or Managed Instance Group is not recommended or supported. Use Dataflow's pipeline configuration options instead.
Dataflow's Horizontal Autoscaling feature is limited by your project's available Compute Engine quota. If your job has sufficient quota when it starts, but another job uses the remainder of your project's available quota, the first job will run but not be able to fully scale.
However, the Dataflow service does not manage quota increases for jobs that exceed the resource quotas in your project. You are responsible for making any necessary requests for additional resource quota, for which you can use the Google Cloud console.
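One way to stay within available quota is to cap the worker count that autoscaling can reach. The following Python sketch uses illustrative values; the project and bucket names are placeholders.

from apache_beam.options.pipeline_options import PipelineOptions

# A sketch: bound Horizontal Autoscaling so the job's peak worker count
# stays within your project's Compute Engine quota. Values are illustrative.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--autoscaling_algorithm=THROUGHPUT_BASED",
    "--max_num_workers=50",
])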
IP addresses
By default, Dataflow assigns both public and private IP addresses to worker VMs. A public IP address satisfies one of the criteria for internet access, but a public IP address also counts against your quota of external IP addresses.
If your worker VMs don't need access to the public internet, consider using only internal IP addresses, which don't count against your external IP address quota. For more information about configuring IP addresses, see the Dataflow networking documentation.
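As a sketch, a Python pipeline can turn off public worker IP addresses as shown below. The subnetwork path is an assumption; for workers to reach Google APIs without public IPs, the subnetwork needs Private Google Access enabled.

from apache_beam.options.pipeline_options import PipelineOptions

# A sketch: run workers with internal IP addresses only, so they don't
# count against external IP address quota. The subnetwork path is an
# assumption and must allow Private Google Access.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--no_use_public_ips",
    "--subnetwork=regions/us-central1/subnetworks/my-subnet",
])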
Inactive workers
If the workers for a given job don't exhibit sufficient activity over a one-hour period, the job fails. Worker inactivity can result from dependency management problems. For example, if a worker encounters an issue while installing dependencies for a custom container image, the worker might fail to start or fail to make progress. The lack of progress could then cause the job to fail. To learn more, see the Dataflow troubleshooting documentation.
Persistent disk resources
The Dataflow service is limited to 15 persistent disks per worker instance when running a streaming job. Each persistent disk is local to an individual Compute Engine virtual machine. Your job may not have more workers than persistent disks; a 1:1 ratio between workers and disks is the minimum resource allotment.
Jobs using Streaming Engine use 30 GB boot disks. Jobs using Dataflow Shuffle use 25 GB boot disks. For jobs that are not using these offerings, the default size of each persistent disk is 250 GB in batch mode and 400 GB in streaming mode.
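If the defaults don't fit your workload, you can override the per-worker disk size. The following Python sketch uses an illustrative value; project and bucket names are placeholders.

from apache_beam.options.pipeline_options import PipelineOptions

# A sketch: override the default persistent disk size per worker.
# 50 GB is illustrative; the defaults depend on job type, as described above.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--disk_size_gb=50",
])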
Locations
By default, the Dataflow service deploys Compute Engine resources in the us-central1-f zone of the us-central1 region. You can override this setting by specifying the --region parameter. If you need to use a specific zone for your resources, use the --zone parameter when you create your pipeline. However, we recommend that you only specify the region and leave the zone unspecified. This allows the Dataflow service to automatically select the best zone within the region based on the available zone capacity at the time of the job creation request. For more information, see the Dataflow regions documentation.
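For example, a Python pipeline that pins only the region and lets the service choose the zone might look like the following sketch; the region is illustrative, and project and bucket names are placeholders.

from apache_beam.options.pipeline_options import PipelineOptions

# A sketch: specify only the region and leave the zone unspecified so
# the service picks the best zone based on available capacity.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=europe-west1",
    "--temp_location=gs://my-bucket/temp",
])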