Dataflow components
The Dataflow components let you submit Apache Beam jobs to Dataflow for execution. In Dataflow, a Job resource represents a Dataflow job.
The Google Cloud Pipeline Components SDK includes the following operators for creating Job resources and monitoring their execution:

- DataflowFlexTemplateJobOp
- DataflowPythonJobOp
Additionally, the Google Cloud Pipeline Components SDK includes the WaitGcpResourcesOp component, which you can use to mitigate costs while running Dataflow jobs.
DataflowFlexTemplateJobOp
The DataflowFlexTemplateJobOp operator lets you create a Vertex AI Pipelines component to launch a Dataflow Flex Template.

In Dataflow, a LaunchFlexTemplateParameter resource represents a Flex Template to launch. This component creates a LaunchFlexTemplateParameter resource and then requests Dataflow to create a job by launching the template. If the template is launched successfully, Dataflow returns a Job resource.

The Dataflow Flex Template component terminates upon receiving a Job resource from Dataflow. The component outputs a job_id as a serialized gcp_resources proto. You can pass this parameter to a WaitGcpResourcesOp component to wait for the Dataflow job to complete.
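For example, the following sketch launches a Flex Template and then waits for the resulting job. It assumes the v1 import paths of the Google Cloud Pipeline Components SDK; the template spec path and template parameters are hypothetical placeholders.

```python
from kfp import dsl
from google_cloud_pipeline_components.v1.dataflow import DataflowFlexTemplateJobOp
from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp


@dsl.pipeline(name="dataflow-flex-template-example")
def pipeline(project_id: str, location: str):
    # Launch the Flex Template. container_spec_gcs_path points to the template
    # spec file produced by `gcloud dataflow flex-template build` (placeholder path).
    flex_template_op = DataflowFlexTemplateJobOp(
        project=project_id,
        location=location,
        container_spec_gcs_path="gs://my-bucket/templates/my-template.json",
        parameters={"output": "gs://my-bucket/results/output"},  # hypothetical template parameters
    )

    # Wait until the launched Dataflow job reaches a terminal state.
    WaitGcpResourcesOp(gcp_resources=flex_template_op.outputs["gcp_resources"])
```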
DataflowPythonJobOp
The DataflowPythonJobOp operator lets you create a Vertex AI Pipelines component that prepares data by submitting a Python-based Apache Beam job to Dataflow for execution.

The Python code of the Apache Beam job runs with the Dataflow Runner. When you run your pipeline with the Dataflow service, the runner uploads your executable code to the location specified by the python_module_path parameter and dependencies to a Cloud Storage bucket (specified by temp_location), and then creates a Dataflow job that executes your Apache Beam pipeline on managed resources in Google Cloud.

To learn more about the Dataflow Runner, see Using the Dataflow Runner.
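As an illustration only, a module referenced by python_module_path might look like the following minimal Apache Beam pipeline. The module name and the --output flag are hypothetical; any arguments the pipeline doesn't parse itself become standard Beam pipeline options.

```python
# my_beam_job.py -- hypothetical module referenced by python_module_path.
import argparse

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(argv=None):
    parser = argparse.ArgumentParser()
    # Custom flag consumed by this pipeline; supplied through the component's `args` list.
    parser.add_argument("--output", required=True, help="Cloud Storage path prefix for results")
    known_args, pipeline_args = parser.parse_known_args(argv)

    # The remaining arguments (runner, project, temp_location, and so on)
    # are interpreted as Beam pipeline options.
    options = PipelineOptions(pipeline_args)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Create" >> beam.Create(["hello", "world"])
            | "Write" >> beam.io.WriteToText(known_args.output)
        )


if __name__ == "__main__":
    run()
```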
The Dataflow Python component accepts a list of arguments that are passed through the Beam Runner to your Apache Beam code. These arguments are specified by args. For example, you can use these arguments to set apache_beam.options.pipeline_options to specify a network, a subnetwork, a customer-managed encryption key (CMEK), and other options when you run Dataflow jobs.
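For instance, an args list such as the following sets standard Beam pipeline options for networking and a CMEK in addition to a pipeline-specific --output flag; the resource names are placeholders.

```python
# Arguments forwarded to the Apache Beam pipeline. Anything the pipeline does
# not parse itself is interpreted as a standard Beam pipeline option.
args = [
    "--output", "gs://my-bucket/results/output",  # hypothetical pipeline-specific flag
    "--network", "my-network",                    # VPC network for the Dataflow workers
    "--subnetwork", "regions/us-central1/subnetworks/my-subnet",
    "--dataflow_kms_key", "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key",
]
```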
WaitGcpResourcesOp
Dataflow jobs can often take a long time to complete. The cost of a busy-wait container (the container that launches the Dataflow job and waits for the result) can become expensive.
After submitting the Dataflow job using the Beam runner, the DataflowPythonJobOp component terminates immediately and returns a job_id output parameter as a serialized gcp_resources proto. You can pass this parameter to a WaitGcpResourcesOp component to wait for the Dataflow job to complete.
```python
# Import paths assume the v1 layout of the Google Cloud Pipeline Components SDK.
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp

dataflow_python_op = DataflowPythonJobOp(
    project=project_id,
    location=location,
    python_module_path=python_file_path,
    temp_location=staging_dir,
    requirements_file_path=requirements_file_path,
    args=['--output', OUTPUT_FILE],
)

# Wait for the launched Dataflow job instead of keeping the launcher container busy.
dataflow_wait_op = WaitGcpResourcesOp(
    gcp_resources=dataflow_python_op.outputs["gcp_resources"]
)
```
Vertex AI Pipelines optimizes WaitGcpResourcesOp to execute in a serverless fashion, so it incurs no cost.
If DataflowPythonJobOp and DataflowFlexTemplateJobOp don't meet your requirements, you can also create your own component that outputs the gcp_resources parameter and pass it to the WaitGcpResourcesOp component.
For more information about how to create the gcp_resources output parameter, see Write a component to show a Google Cloud console link.
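As a minimal sketch, a lightweight KFP component could write the serialized gcp_resources JSON directly, as shown below. The component name, the hard-coded job ID, and the hand-written payload are all illustrative assumptions; follow the linked guide for the authoritative format.

```python
from kfp import dsl


@dsl.component
def my_dataflow_launcher(project: str, location: str, gcp_resources: dsl.OutputPath(str)):
    """Hypothetical component that launches a Dataflow job and records it as gcp_resources."""
    import json

    # In a real component you would launch the job first and capture its ID;
    # this value is only a placeholder.
    job_id = "2024-01-01_00_00_00-1234567890123456789"

    # Serialized gcp_resources payload that WaitGcpResourcesOp can consume:
    # resourceType identifies a Dataflow job, resourceUri points at it.
    payload = {
        "resources": [
            {
                "resourceType": "DataflowJob",
                "resourceUri": (
                    f"https://dataflow.googleapis.com/v1b3/projects/{project}"
                    f"/locations/{location}/jobs/{job_id}"
                ),
            }
        ]
    }
    with open(gcp_resources, "w") as f:
        f.write(json.dumps(payload))
```

You can then pass the component's gcp_resources output to WaitGcpResourcesOp, just as with the built-in operators.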
API reference
For the component reference, see the Google Cloud Pipeline Components SDK reference for Dataflow components.
For Dataflow resource reference, see the following API reference pages:
- LaunchFlexTemplateParameter resource
- Job resource
Tutorials
- Get started with the Dataflow Flex Template component
- Get started with the Dataflow Python Job component
- Specify a network and subnetwork
- Using customer-managed encryption keys (CMEK)
Version history and release notes
To learn more about the version history and changes to the Google Cloud Pipeline Components SDK, see the Google Cloud Pipeline Components SDK Release Notes.
Technical support contacts
If you have any questions, reach out to kubeflow-pipelines-components@google.com.