Build a pipeline
To learn more, run the "Learn how to use control structures in a Kubeflow pipeline" notebook in one of the following environments:
Open in Colab | Open in Colab Enterprise | Open in Vertex AI Workbench | View on GitHub
Vertex AI Pipelines lets you orchestrate your machine learning (ML) workflows in a serverless manner. Before Vertex AI Pipelines can orchestrate your ML workflow, you must describe your workflow as a pipeline. ML pipelines are portable and scalable ML workflows that are based on containers and Google Cloud services.
This guide describes how to get started building ML pipelines.
Which pipelines SDK should I use?
Vertex AI Pipelines can run pipelines built using any of the following SDKs:
- Kubeflow Pipelines SDK v2.0 or later. Note: Install Kubeflow Pipelines SDK v2 to use the code samples provided in the Vertex AI Pipelines documentation.
- TensorFlow Extended v0.30.0 or later
If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, we recommend that you build your pipeline using TFX.
- To learn more about building a TFX pipeline, follow the TFX getting started tutorials.
- To learn more about using Vertex AI Pipelines to run a TFX pipeline, follow the TFX on Google Cloud tutorials.
For other use cases, we recommend that you build your pipeline using the Kubeflow Pipelines SDK. By building a pipeline with the Kubeflow Pipelines SDK, you can implement your workflow by building custom components or reusing prebuilt components, such as the Google Cloud Pipeline Components. Google Cloud Pipeline Components make it easier to use Vertex AI services like AutoML in your pipeline.
This guide describes how to build pipelines using the Kubeflow Pipelines SDK.
Before you begin
Before you build and run your pipelines, use the following instructions to set up your Google Cloud project and development environment.
To get your Google Cloud project ready to run ML pipelines, follow the instructions in the guide to configuring your Google Cloud project.
Install v2 or later of the Kubeflow Pipelines SDK.
pip install --upgrade "kfp>=2,<3"
Note: If an updated version is available, running pip install kfp --upgrade uninstalls KFP and installs the latest version.
To use the Vertex AI Python client in your pipelines, install the Vertex AI client libraries v1.7 or later (an example pip command follows this list).
To use Vertex AI services in your pipelines, install the Google Cloud SDK.
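For example, if you manage packages with pip, a command like the following installs or upgrades both the Vertex AI client library and the Google Cloud Pipeline Components; the version pin is only a lower bound and is shown as an illustration:

pip install --upgrade "google-cloud-aiplatform>=1.7" google-cloud-pipeline-components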
Getting started building a pipeline
To orchestrate your ML workflow on Vertex AI Pipelines, you must first describe your workflow as a pipeline. The following sample demonstrates how to use the Google Cloud Pipeline Components with Vertex AI to create a dataset, train a model using AutoML, and deploy the trained model for predictions.
Before you run the following code sample, you must set up authentication.
How to set up authentication
To set up authentication, you must create a service account key, and set an environment variable for the path to the service account key.
Create a service account:
- In the Google Cloud console, go to the Create service account page.
- In the Service account name field, enter a name.
- Optional: In the Service account description field, enter a description.
- Click Create.
- Click the Select a role field. Under All roles, select Vertex AI > Vertex AI User. Note: The roles you select allow your service account to access resources. You can view and change these roles later by using the Google Cloud console. For more information, see access control for Vertex AI.
- Click Done to create the service account.
Do not close your browser window. You will use it in the next step.
Create a service account key for authentication:
- In the Google Cloud console, click the email address for the service account that you created.
- Click Keys.
- Click Add key, then Create new key.
- Click Create. A JSON key file is downloaded to your computer.
- Click Close.
- Click to return to the list of service accounts.
Click the name of the service account that you use to run pipelines. The Service account details page appears.
If you followed the instructions in the guide to configuring your project for Vertex AI Pipelines, this is the same service account that you created in the Configure a service account with granular permissions section. Otherwise, Vertex AI uses the Compute Engine default service account to run pipelines. The Compute Engine default service account is named like the following:
PROJECT_NUMBER-compute@developer.gserviceaccount.com
- Click the Permissions tab.
- Click Grant access. The Add principals panel appears.
- In the New principals box, enter the email address for the service account you created in a previous step.
- In the Role drop-down list, select Service accounts > Service account user.
- Click Save.
Set the environment variableGOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.
Example: Linux or macOS
Replace[PATH] with the path of the JSON file that contains your service account key.
export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
For example:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"
Example: Windows
Replace[PATH] with the path of the JSON file that contains your service account key, and[FILE_NAME] with the filename.
With PowerShell:
$env:GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
For example:
$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\[FILE_NAME].json"
With command prompt:
set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
Define your workflow using the Kubeflow Pipelines DSL package
The kfp.dsl package contains the domain-specific language (DSL) that you can use to define and interact with pipelines and components.
Kubeflow pipeline components are factory functions that create pipeline steps. Each component describes the inputs, outputs, and implementation of the component. For example, in the code sample below, ds_op is a component.
Components are used to create pipeline steps. When a pipeline runs, steps are executed as the data they depend on becomes available. For example, a training component could take a CSV file as an input and use it to train a model.
import kfp
from google.cloud import aiplatform
from google_cloud_pipeline_components.v1.dataset import ImageDatasetCreateOp
from google_cloud_pipeline_components.v1.automl.training_job import AutoMLImageTrainingJobRunOp
from google_cloud_pipeline_components.v1.endpoint import EndpointCreateOp, ModelDeployOp

project_id = PROJECT_ID
pipeline_root_path = PIPELINE_ROOT

# Define the workflow of the pipeline.
@kfp.dsl.pipeline(
    name="automl-image-training-v2",
    pipeline_root=pipeline_root_path)
def pipeline(project_id: str):
    # The first step of your workflow is a dataset generator.
    # This step takes a Google Cloud Pipeline Component, providing the necessary
    # input arguments, and uses the Python variable `ds_op` to define its
    # output. Note that here the `ds_op` only stores the definition of the
    # output but not the actual returned object from the execution. The value
    # of the object is not accessible at the dsl.pipeline level, and can only be
    # retrieved by providing it as the input to a downstream component.
    ds_op = ImageDatasetCreateOp(
        project=project_id,
        display_name="flowers",
        gcs_source="gs://cloud-samples-data/vision/automl_classification/flowers/all_data_v2.csv",
        import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
    )

    # The second step is a model training component. It takes the dataset
    # outputted from the first step, supplies it as an input argument to the
    # component (see `dataset=ds_op.outputs["dataset"]`), and will put its
    # outputs into `training_job_run_op`.
    training_job_run_op = AutoMLImageTrainingJobRunOp(
        project=project_id,
        display_name="train-iris-automl-mbsdk-1",
        prediction_type="classification",
        model_type="CLOUD",
        dataset=ds_op.outputs["dataset"],
        model_display_name="iris-classification-model-mbsdk",
        training_fraction_split=0.6,
        validation_fraction_split=0.2,
        test_fraction_split=0.2,
        budget_milli_node_hours=8000,
    )

    # The third and fourth step are for deploying the model.
    create_endpoint_op = EndpointCreateOp(
        project=project_id,
        display_name="create-endpoint",
    )

    model_deploy_op = ModelDeployOp(
        model=training_job_run_op.outputs["model"],
        endpoint=create_endpoint_op.outputs['endpoint'],
        automatic_resources_min_replica_count=1,
        automatic_resources_max_replica_count=1,
    )

Replace the following:
- PROJECT_ID: The Google Cloud project that this pipeline runs in.
- PIPELINE_ROOT_PATH: Specify a Cloud Storage URI that your pipelines service account can access. The artifacts of your pipeline runs are stored within the pipeline root. The Cloud Storage URI must start with gs://. The pipeline root can be set as an argument of the @kfp.dsl.pipeline annotation on the pipeline function, or it can be set when you call create_run_from_job_spec to create a pipeline run (see the sketch after this list).
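As a minimal sketch of setting the pipeline root at run time, using the aip.PipelineJob client that appears later on this page; the bucket path gs://BUCKET_NAME/pipeline_root is a placeholder you would replace:

import google.cloud.aiplatform as aip

# Override the pipeline root for this run instead of relying on the value
# set in the @kfp.dsl.pipeline annotation.
job = aip.PipelineJob(
    display_name="automl-image-training-v2",
    template_path="image_classif_pipeline.yaml",
    pipeline_root="gs://BUCKET_NAME/pipeline_root",  # placeholder bucket path
    parameter_values={'project_id': PROJECT_ID},
)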
Compile your pipeline into a YAML file
After the workflow of your pipeline is defined, you can proceed to compile the pipeline into YAML format. The YAML file includes all the information for executing your pipeline on Vertex AI Pipelines.
from kfp import compiler

compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path='image_classif_pipeline.yaml'
)

Submit your pipeline run
After the workflow of your pipeline is compiled into the YAML format, you can use the Vertex AI Python client to submit and run your pipeline.
import google.cloud.aiplatform as aip

# Before initializing, make sure to set the GOOGLE_APPLICATION_CREDENTIALS
# environment variable to the path of your service account.
aip.init(
    project=project_id,
    location=PROJECT_REGION,
)

# Prepare the pipeline job
job = aip.PipelineJob(
    display_name="automl-image-training-v2",
    template_path="image_classif_pipeline.yaml",
    pipeline_root=pipeline_root_path,
    parameter_values={
        'project_id': project_id
    }
)

job.submit()

Replace the following:
- PROJECT_REGION: The region that this pipeline runs in.
In the preceding example:
- A Kubeflow pipeline is defined as a Python function. The function is annotated with the @kfp.dsl.pipeline decorator, which specifies the pipeline's name and root path. The pipeline root path is the location where the pipeline's artifacts are stored.
- The pipeline's workflow steps are created using the Google Cloud Pipeline Components. By using the outputs of a component as an input of another component, you define the pipeline's workflow as a graph. For example: training_job_run_op depends on the dataset output of ds_op.
- You compile the pipeline using kfp.compiler.Compiler.
- You create a pipeline run on Vertex AI Pipelines using the Vertex AI Python client. When you run a pipeline, you can override the pipeline name and the pipeline root path. Pipeline runs can be grouped using the pipeline name. Overriding the pipeline name can help you distinguish between production and experimental pipeline runs.
To learn more about building pipelines, read the building Kubeflow pipelines section, and follow the samples and tutorials.
Test a pipeline locally (optional)
After you define your pipelines and components, you can test the component code by executing the code in your local authoring environment. By executing your pipeline or a component locally, you can identify and debug potential issues before you create a pipeline run in a remote environment, such as Vertex AI Pipelines. For more information about locally executing pipelines and components, see Local execution in the KFP documentation.
Note: You can't locally test your pipeline code if any of the components requires authentication to use Google Cloud services. These include Google Cloud Pipeline Components. Learn more about the limitations of local execution. The following steps show you how to define and run a pipeline that consists of two tasks.
Set up your local environment
Optional: Install Docker.
Note: To verify whether Docker is installed, run the following command: docker --version. If the command is unavailable, then you need to install Docker.
Use the following code sample to define a simple pipeline:
from kfp import dsl

# Define a component to add two numbers.
@dsl.component
def add(a: int, b: int) -> int:
    return a + b

# Define a simple pipeline using the component.
@dsl.pipeline
def addition_pipeline(x: int, y: int, z: int) -> int:
    task1 = add(a=x, b=y)
    task2 = add(a=task1.output, b=z)
    return task2.output
Invoke a local execution
Initialize a local session using the local.init() function. When you use local.init(), the KFP SDK locally executes your pipelines and components when you call them.
When you use local.init(), you must specify a runner type. The runner type indicates how KFP should run each task.
Use the following sample to specify the DockerRunner runner type for running each task in a container. For more information about local runners supported by KFP, see Local runners in the KFP documentation.
from kfp import local

local.init(runner=local.DockerRunner())

pipeline_task = addition_pipeline(x=1, y=2, z=3)

Use the following code to view the output of the pipeline task upon local execution:
print(f'Result: {pipeline_task.output}')

Building Kubeflow pipelines
Use the following process to build a pipeline.
Design your pipeline as a series of components. To promote reusability, each component should have a single responsibility. Whenever possible, design your pipeline to reuse proven components such as the Google Cloud Pipeline Components.
Build any custom components that are required to implement your ML workflow using the Kubeflow Pipelines SDK. Components are self-contained sets of code that perform a step in your ML workflow. Use the following options to create your pipeline components.
Package your component's code as a container image. This option lets you include code in your pipeline that was written in any language that can be packaged as a container image (a sketch of a containerized component follows these options).
Implement your component's code as a standalone Python function and use the Kubeflow Pipelines SDK to package your function as a component. This option makes it easier to build Python-based components.
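For the container-image option, here is a minimal sketch using the Kubeflow Pipelines SDK's dsl.container_component decorator; the alpine image and echo command are illustrative placeholders, not part of this guide's sample pipeline:

from kfp import dsl

# A containerized component: the decorated function returns a ContainerSpec
# that describes the image to run, the command, and its arguments.
@dsl.container_component
def say_hello(name: str):
    return dsl.ContainerSpec(
        image='alpine',
        command=['echo'],
        args=[f'Hello, {name}!'],
    )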
Build your pipeline as a Python function.
Learn more about defining your pipeline as a Python function.
Use the Kubeflow Pipelines SDK compiler to compile your pipeline.
from kfp import compiler

compiler.Compiler().compile(
    pipeline_func=PIPELINE_FUNCTION,
    package_path=PIPELINE_PACKAGE_PATH
)

Replace the following:
- PIPELINE_FUNCTION: The name of your pipeline's function.
- PIPELINE_PACKAGE_PATH: The path where you want to store your compiled pipeline.
Accessing Google Cloud resources in a pipeline
If you don't specify a service account when you run a pipeline, Vertex AI Pipelines uses the Compute Engine default service account to run your pipeline. Vertex AI Pipelines also uses a pipeline run's service account to authorize your pipeline to access Google Cloud resources. The Compute Engine default service account has the Project Editor role by default. This may grant your pipelines excessive access to Google Cloud resources in your Google Cloud project.
We recommend that you create a service account to run your pipelines and then grant this account granular permissions to the Google Cloud resources that are needed to run your pipeline.
Learn more about using Identity and Access Management to create a service account and manage the access granted to a service account.
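As a minimal sketch of running a pipeline with a dedicated service account, the job's submit call accepts a service_account argument; the service account email below is a hypothetical placeholder:

import google.cloud.aiplatform as aip

job = aip.PipelineJob(
    display_name="automl-image-training-v2",
    template_path="image_classif_pipeline.yaml",
    pipeline_root=pipeline_root_path,
    parameter_values={'project_id': project_id},
)

# Run the pipeline as a least-privilege service account instead of the
# Compute Engine default service account.
job.submit(service_account="my-pipeline-sa@PROJECT_ID.iam.gserviceaccount.com")  # placeholder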
Keep your pipelines up-to-date
The SDK clients and container images that you use to build and run pipelines are periodically updated to new versions to patch security vulnerabilities and add new functionality. To keep your pipelines up to date with the latest version, we recommend that you do the following:
Review the Vertex AI framework support policy and Supported frameworks list.
Subscribe to the Vertex AI release notes and the PyPI.org RSS feeds for SDKs you use (Kubeflow Pipelines SDK, Google Cloud Pipeline Components SDK, or TensorFlow Extended SDK) to stay aware of new releases.
If you have a pipeline template or definition that references a container with security vulnerabilities, you should do the following (a sketch of recompiling and re-uploading follows the list):
Install the latest patched version of the SDK.
Rebuild and recompile your pipeline template or definition.
Re-upload the template or definition to Artifact Registry or Cloud Storage.
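For example, a sketch of recompiling and re-uploading a template with the KFP registry client; this assumes your template is stored as a Kubeflow Pipelines template in Artifact Registry, and the host URL, repository, and file names are placeholders:

from kfp import compiler
from kfp.registry import RegistryClient

# Recompile the pipeline with the patched SDK version.
compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path='image_classif_pipeline.yaml',
)

# Re-upload the compiled template to your Artifact Registry repository.
client = RegistryClient(
    host='https://us-central1-kfp.pkg.dev/PROJECT_ID/REPO_NAME'  # placeholder
)
client.upload_pipeline(
    file_name='image_classif_pipeline.yaml',
    tags=['latest'],
)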
What's next
- Read the introduction to Vertex AI Pipelines to learn more about orchestrating ML workflows.
- Learn how to run a pipeline.
- Visualize and analyze the results of your pipeline runs.