Build a pipeline

Vertex AI Pipelines lets you orchestrate your machine learning (ML) workflows in a serverless manner. Before Vertex AI Pipelines can orchestrate your ML workflow, you must describe your workflow as a pipeline. ML pipelines are portable and scalable ML workflows that are based on containers and Google Cloud services.

This guide describes how to get started building ML pipelines.

Which pipelines SDK should I use?

Vertex AI Pipelines can run pipelines built using either of the following SDKs: the Kubeflow Pipelines SDK or TensorFlow Extended (TFX).

If you use TensorFlow in an ML workflow that processes terabytes of structured data or text data, we recommend that you build your pipeline using TFX.

For other use cases, we recommend that you build your pipeline using the Kubeflow Pipelines SDK. By building a pipeline with the Kubeflow Pipelines SDK, you can implement your workflow by building custom components or reusing prebuilt components, such as the Google Cloud Pipeline Components. Google Cloud Pipeline Components make it easier to use Vertex AI services like AutoML in your pipeline.

This guide describes how to build pipelines using the Kubeflow Pipelines SDK.

Before you begin

Before you build and run your pipelines, use the following instructions to set up your Google Cloud project and development environment.

  1. To get your Google Cloud project ready to run ML pipelines, follow the instructions in the guide to configuring your Google Cloud project.

  2. Install v2 or later of the Kubeflow Pipelines SDK.

    pip install --upgrade "kfp>=2,<3"

    Note: To upgrade to the latest version of the Kubeflow Pipelines SDK, run the following command:
    pip install kfp --upgrade
    If an updated version is available, running this command uninstalls KFP and installs the latest version.

  3. To use the Vertex AI Python client in your pipelines, install the Vertex AI client libraries v1.7 or later.

  4. To use Vertex AI services in your pipelines, install the Google Cloud SDK.
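
To confirm that the installed SDK matches the `kfp>=2,<3` constraint used in the install command, a quick check from Python can help. The following is a minimal sketch that uses only the standard library. The helper names `major_version`, `satisfies_kfp_range`, and `installed_kfp_version` are illustrative and are not part of the KFP SDK, and the check looks only at the major version number.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.7.0'."""
    return int(version.split(".")[0])


def satisfies_kfp_range(version: str) -> bool:
    """Check a version string against the 'kfp>=2,<3' constraint."""
    return 2 <= major_version(version) < 3


def installed_kfp_version() -> Optional[str]:
    """Return the installed kfp version, or None if kfp is not installed."""
    try:
        return metadata.version("kfp")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_kfp_version()
    if found is None:
        print("kfp is not installed")
    else:
        print(f"kfp {found}: in range = {satisfies_kfp_range(found)}")
```

If the check reports that the version is out of range, rerunning the install command from step 2 should bring the SDK back into the supported range.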

Getting started building a pipeline

To orchestrate your ML workflow on Vertex AI Pipelines, you must first describe your workflow as a pipeline. The following sample demonstrates how to use the Google Cloud Pipeline Components with Vertex AI to create a dataset, train a model using AutoML, and deploy the trained model for predictions.

Before you run the following code sample, you must set up authentication.

How to set up authentication

To set up authentication, you must create a service account key, and set an environment variable for the path to the service account key.

  1. Create a service account:

    1. In the Google Cloud console, go to the Create service account page.

    2. In the Service account name field, enter a name.
    3. Optional: In the Service account description field, enter a description.
    4. Click Create.
    5. Click the Select a role field. Under All roles, select Vertex AI > Vertex AI User.

      Note: The roles you select allow your service account to access resources. You can view and change these roles later by using the Google Cloud console. For more information, see access control for Vertex AI.

    6. Click Done to create the service account.

      Do not close your browser window. You will use it in the next step.

  2. Create a service account key for authentication:

    1. In the Google Cloud console, click the email address for the service account that you created.
    2. Click Keys.
    3. Click Add key, then Create new key.
    4. Click Create. A JSON key file is downloaded to your computer.
    5. Click Close.
  3. Grant your new service account access to the service account that you use to run pipelines.
    1. Click to return to the list of service accounts.
    2. Click the name of the service account that you use to run pipelines. The Service account details page appears.

      If you followed the instructions in the guide to configuring your project for Vertex AI Pipelines, this is the same service account that you created in the Configure a service account with granular permissions section. Otherwise, Vertex AI uses the Compute Engine default service account to run pipelines. The Compute Engine default service account is named like the following: PROJECT_NUMBER-compute@developer.gserviceaccount.com

    3. Click the Permissions tab.
    4. Click Grant access. The Add principals panel appears.
    5. In the New principals box, enter the email address for the service account you created in a previous step.
    6. In the Role drop-down list, select Service accounts > Service account user.
    7. Click Save.
  4. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.

    Example: Linux or macOS

    Replace [PATH] with the path of the JSON file that contains your service account key.

    export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

    For example:

    export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

    Example: Windows

    Replace [PATH] with the path of the JSON file that contains your service account key, and [FILE_NAME] with the filename.

    With PowerShell:

    $env:GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

    For example:

    $env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\[FILE_NAME].json"

    With command prompt:

    set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
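
Because the variable applies only to the current shell session, it can be useful to verify it from Python before submitting a pipeline run. The following is a minimal sketch, assuming the key file is the JSON service account key downloaded in step 2. The function name `service_account_email_from_env` is hypothetical and is not part of any Google Cloud library.

```python
import json
import os
from typing import Mapping, Optional


def service_account_email_from_env(
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Validate GOOGLE_APPLICATION_CREDENTIALS and return the key's client_email."""
    env = os.environ if environ is None else environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No key file at {path}")
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    # Service account key files carry a "type" field set to "service_account".
    if key.get("type") != "service_account":
        raise ValueError("The file is not a service account key")
    return key["client_email"]
```

Calling `service_account_email_from_env()` with no arguments reads the real environment. The returned email address should match the service account that you created in step 1.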

Define your workflow using Kubeflow Pipelines DSL package

The kfp.dsl package contains the domain-specific language (DSL) that you can use to define and interact with pipelines and components.

Kubeflow pipeline components are factory functions that create pipeline steps. Each component describes the inputs, outputs, and implementation of the component. For example, in the code sample below, ds_op is a component.

Components are used to create pipeline steps. When a pipeline runs, steps are executed as the data they depend on becomes available. For example, a training component could take a CSV file as an input and use it to train a model.
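
The idea that steps run once the data they depend on is available can be illustrated with a small dependency graph. The following is a toy model built on the standard library's `graphlib`, not a description of how Vertex AI Pipelines schedules work internally. The step names are illustrative labels that mirror the four steps of the sample in this guide.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it consumes.
# The step names are illustrative labels, not KFP identifiers.
dependencies = {
    "create_dataset": set(),
    "train_model": {"create_dataset"},
    "create_endpoint": set(),
    "deploy_model": {"train_model", "create_endpoint"},
}


def execution_waves(graph):
    """Group steps into waves; every step in a wave has all of its inputs ready."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this model, creating the dataset and creating the endpoint have no inputs, so both are ready immediately. Training waits for the dataset, and deployment waits for both the trained model and the endpoint.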

import kfp

from google.cloud import aiplatform
from google_cloud_pipeline_components.v1.dataset import ImageDatasetCreateOp
from google_cloud_pipeline_components.v1.automl.training_job import AutoMLImageTrainingJobRunOp
from google_cloud_pipeline_components.v1.endpoint import EndpointCreateOp, ModelDeployOp

project_id = PROJECT_ID
pipeline_root_path = PIPELINE_ROOT

# Define the workflow of the pipeline.
@kfp.dsl.pipeline(
    name="automl-image-training-v2",
    pipeline_root=pipeline_root_path)
def pipeline(project_id: str):
    # The first step of your workflow is a dataset generator.
    # This step takes a Google Cloud Pipeline Component, providing the necessary
    # input arguments, and uses the Python variable `ds_op` to define its
    # output. Note that here the `ds_op` only stores the definition of the
    # output but not the actual returned object from the execution. The value
    # of the object is not accessible at the dsl.pipeline level, and can only be
    # retrieved by providing it as the input to a downstream component.
    ds_op = ImageDatasetCreateOp(
        project=project_id,
        display_name="flowers",
        gcs_source="gs://cloud-samples-data/vision/automl_classification/flowers/all_data_v2.csv",
        import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
    )

    # The second step is a model training component. It takes the dataset
    # outputted from the first step, supplies it as an input argument to the
    # component (see `dataset=ds_op.outputs["dataset"]`), and will put its
    # outputs into `training_job_run_op`.
    training_job_run_op = AutoMLImageTrainingJobRunOp(
        project=project_id,
        display_name="train-iris-automl-mbsdk-1",
        prediction_type="classification",
        model_type="CLOUD",
        dataset=ds_op.outputs["dataset"],
        model_display_name="iris-classification-model-mbsdk",
        training_fraction_split=0.6,
        validation_fraction_split=0.2,
        test_fraction_split=0.2,
        budget_milli_node_hours=8000,
    )

    # The third and fourth step are for deploying the model.
    create_endpoint_op = EndpointCreateOp(
        project=project_id,
        display_name="create-endpoint",
    )

    model_deploy_op = ModelDeployOp(
        model=training_job_run_op.outputs["model"],
        endpoint=create_endpoint_op.outputs['endpoint'],
        automatic_resources_min_replica_count=1,
        automatic_resources_max_replica_count=1,
    )

Replace the following:

  • PROJECT_ID: The Google Cloud project that this pipeline runs in.
  • PIPELINE_ROOT: Specify a Cloud Storage URI that your pipelines service account can access. The artifacts of your pipeline runs are stored within the pipeline root. The Cloud Storage URI must start with gs://.

    The pipeline root can be set as an argument of the @kfp.dsl.pipeline annotation on the pipeline function, or it can be set when you call create_run_from_job_spec to create a pipeline run.
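
Because the pipeline root must be a Cloud Storage URI that starts with gs://, a small validation step can catch a mistyped value before a run is created. The following is a minimal sketch using only the standard library. The function name `split_pipeline_root` and the bucket name `example-bucket` are hypothetical.

```python
from urllib.parse import urlparse


def split_pipeline_root(uri: str):
    """Return (bucket, prefix) for a gs:// URI, or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Pipeline root must start with gs://<bucket>: {uri!r}")
    return parsed.netloc, parsed.path.strip("/")


bucket, prefix = split_pipeline_root("gs://example-bucket/pipeline_root/")
print(bucket, prefix)
```

This check only validates the shape of the URI. It does not confirm that the bucket exists or that the pipelines service account can access it.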

Compile your pipeline into a YAML file

After the workflow of your pipeline is defined, you can proceed to compile the pipeline into YAML format. The YAML file includes all the information for executing your pipeline on Vertex AI Pipelines.

from kfp import compiler

compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path='image_classif_pipeline.yaml'
)

Submit your pipeline run

After the workflow of your pipeline is compiled into the YAML format, you can use the Vertex AI Python client to submit and run your pipeline.

import google.cloud.aiplatform as aip

# Before initializing, make sure to set the GOOGLE_APPLICATION_CREDENTIALS
# environment variable to the path of your service account.
aip.init(
    project=project_id,
    location=PROJECT_REGION,
)

# Prepare the pipeline job
job = aip.PipelineJob(
    display_name="automl-image-training-v2",
    template_path="image_classif_pipeline.yaml",
    pipeline_root=pipeline_root_path,
    parameter_values={
        'project_id': project_id
    }
)

job.submit()

Replace the following:

  • PROJECT_REGION: The region that this pipeline runs in.

In the preceding example:

  1. A Kubeflow pipeline is defined as a Python function. The function is annotated with the @kfp.dsl.pipeline decorator, which specifies the pipeline's name and root path. The pipeline root path is the location where the pipeline's artifacts are stored.
  2. The pipeline's workflow steps are created using the Google Cloud Pipeline Components. By using the outputs of a component as an input of another component, you define the pipeline's workflow as a graph. For example: training_job_run_op depends on the dataset output of ds_op.
  3. You compile the pipeline using kfp.compiler.Compiler.
  4. You create a pipeline run on Vertex AI Pipelines using the Vertex AI Python client. When you run a pipeline, you can override the pipeline name and the pipeline root path. Pipeline runs can be grouped using the pipeline name. Overriding the pipeline name can help you distinguish between production and experimental pipeline runs.
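
One way to keep production and experimental runs apart is to derive the run's name and pipeline root from an environment label. The following is a minimal sketch under stated assumptions: the helper `pipeline_job_kwargs`, the environment labels, and the timestamp suffix are hypothetical conventions, and the returned dictionary only reuses the four `PipelineJob` arguments that appear in the sample in this guide.

```python
from datetime import datetime, timezone
from typing import Optional


def pipeline_job_kwargs(
    environment: str,
    project_id: str,
    pipeline_root: str,
    now: Optional[datetime] = None,
) -> dict:
    """Build keyword arguments for PipelineJob with an environment-specific name."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d-%H%M%S")
    return {
        "display_name": f"automl-image-training-v2-{environment}-{stamp}",
        "template_path": "image_classif_pipeline.yaml",
        # Keep each environment's artifacts under its own prefix.
        "pipeline_root": f"{pipeline_root.rstrip('/')}/{environment}",
        "parameter_values": {"project_id": project_id},
    }


# Hypothetical usage with the Vertex AI Python client:
#   job = aip.PipelineJob(**pipeline_job_kwargs("experimental", project_id, pipeline_root_path))
#   job.submit()
```

Passing an explicit `now` value makes the generated name reproducible, which is convenient in tests.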

To learn more about building pipelines, read the Building Kubeflow pipelines section, and follow the samples and tutorials.

Test a pipeline locally (optional)

After you define your pipelines and components, you can test the component code by executing the code in your local authoring environment. By executing your pipeline or a component locally, you can identify and debug potential issues before you create a pipeline run in a remote environment, such as Vertex AI Pipelines. For more information about locally executing pipelines and components, see Local execution in the KFP documentation.

Note: You can't locally test your pipeline code if any of the components requires authentication to use Google Cloud services. These include Google Cloud Pipeline Components. Learn more about the limitations of local execution.

This section shows you how to define and run a pipeline that consists of two tasks.

Set up your local environment

  1. Optional: Install Docker.

    Note: To verify whether Docker is installed or not, run the following command:
    docker --version
    If the command is unavailable, then you need to install Docker.
  2. Use the following code sample to define a simple pipeline:

    from kfp import dsl

    # Define a component to add two numbers.
    @dsl.component
    def add(a: int, b: int) -> int:
        return a + b

    # Define a simple pipeline using the component.
    @dsl.pipeline
    def addition_pipeline(x: int, y: int, z: int) -> int:
        task1 = add(a=x, b=y)
        task2 = add(a=task1.output, b=z)
        return task2.output

Invoke a local execution

Initialize a local session using the local.init() function. When you use local.init(), the KFP SDK locally executes your pipelines and components when you call them.

When you use local.init(), you must specify a runner type. The runner type indicates how KFP should run each task.

Use the following sample to specify the DockerRunner runner type for running each task in a container. For more information about local runners supported by KFP, see Local runners in the KFP documentation.

from kfp import local

local.init(runner=local.DockerRunner())

pipeline_task = addition_pipeline(x=1, y=2, z=3)
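
Because Docker is optional in the setup steps, it can help to check whether it is available before choosing a runner. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the sketch assumes that your KFP version also provides a SubprocessRunner that runs tasks without containers; check the KFP documentation on local runners to confirm what your SDK version supports.

```python
import shutil


def docker_available() -> bool:
    """Return True if a `docker` executable is on the PATH."""
    return shutil.which("docker") is not None


def choose_runner_name() -> str:
    """Suggest a KFP local runner class name based on Docker availability."""
    return "DockerRunner" if docker_available() else "SubprocessRunner"


print(f"Suggested runner: local.{choose_runner_name()}")
```

Finding the executable on the PATH does not guarantee that the Docker daemon is running, so treat the result as a hint rather than a guarantee.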

Use the following code to view the output of the pipeline task upon local execution:

print(f'Result: {pipeline_task.output}')
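
To know what value to expect from the local run, the same arithmetic can be written in plain Python without KFP. This sketch mirrors the two chained `add` tasks of the addition pipeline. The function names `add_plain` and `addition_pipeline_plain` are hypothetical and are chosen so that they don't shadow the KFP definitions.

```python
def add_plain(a: int, b: int) -> int:
    """Plain-Python stand-in for the `add` component."""
    return a + b


def addition_pipeline_plain(x: int, y: int, z: int) -> int:
    """Mirror the two chained tasks of the pipeline without KFP."""
    task1_output = add_plain(a=x, b=y)
    task2_output = add_plain(a=task1_output, b=z)
    return task2_output


expected = addition_pipeline_plain(x=1, y=2, z=3)
print(f"Expected result: {expected}")  # prints "Expected result: 6"
```

With the inputs x=1, y=2, and z=3, the first task produces 3 and the second produces 6, so the local pipeline run should report the same result.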

Building Kubeflow pipelines

Use the following process to build a pipeline.

  1. Design your pipeline as a series of components. To promote reusability, each component should have a single responsibility. Whenever possible, design your pipeline to reuse proven components such as the Google Cloud Pipeline Components.

    Learn more about designing pipelines.

  2. Build any custom components that are required to implement your ML workflow using the Kubeflow Pipelines SDK. Components are self-contained sets of code that perform a step in your ML workflow. Use the following options to create your pipeline components.

  3. Build your pipeline as a Python function.

    Learn more about defining your pipeline as a Python function.

  4. Use the Kubeflow Pipelines SDK compiler to compile your pipeline.

    from kfp import compiler

    compiler.Compiler().compile(
        pipeline_func=PIPELINE_FUNCTION,
        package_path=PIPELINE_PACKAGE_PATH)

    Replace the following:

    • PIPELINE_FUNCTION: The name of your pipeline's function.
    • PIPELINE_PACKAGE_PATH: The path to where to store your compiled pipeline.
  5. Run your pipeline using the Google Cloud console or Python.

Accessing Google Cloud resources in a pipeline

If you don't specify a service account when you run a pipeline, Vertex AI Pipelines uses the Compute Engine default service account to run your pipeline. Vertex AI Pipelines also uses a pipeline run's service account to authorize your pipeline to access Google Cloud resources. The Compute Engine default service account has the Project Editor role by default. This may grant your pipelines excessive access to Google Cloud resources in your Google Cloud project.

We recommend that you create a service account to run your pipelines and then grant this account granular permissions to the Google Cloud resources that are needed to run your pipeline.

Learn more about using Identity and Access Management to create a service account and manage the access granted to a service account.
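
The fallback behavior described in this section can be expressed as a small helper that reports which account a run would use. The following is a minimal sketch, assuming the default account follows the PROJECT_NUMBER-compute@developer.gserviceaccount.com format given in the authentication steps. Both function names and the example account are hypothetical.

```python
from typing import Optional


def default_compute_service_account(project_number: str) -> str:
    """Return the Compute Engine default service account for a project number."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def effective_service_account(
    project_number: str, service_account: Optional[str] = None
) -> str:
    """Return the account a pipeline run would use under the documented fallback."""
    return service_account or default_compute_service_account(project_number)


# Assumption: with the Vertex AI Python client, a dedicated account can be
# passed when submitting a run, for example:
#   job.submit(service_account="pipelines-runner@my-project.iam.gserviceaccount.com")
print(effective_service_account("123456789012"))
```

Logging the effective account before submitting a run makes it easier to notice when a pipeline is about to run with the broadly scoped default account.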

Keep your pipelines up-to-date

The SDK clients and container images that you use to build and run pipelines are periodically updated to new versions to patch security vulnerabilities and add new functionality. To keep your pipelines up to date with the latest version, we recommend that you do the following:

Last updated 2025-12-15 UTC.