Train custom ML models on Vertex AI Pipelines

This tutorial shows you how to use Vertex AI Pipelines to run an end-to-end ML workflow, including the following tasks:

  • Import and transform data.
  • Train a model using the selected ML framework.
  • Import the trained model to Vertex AI Model Registry.
  • Optional: Deploy the model for online serving with Vertex AI Inference.

Before you begin

  1. Ensure that you've completed tasks 1-3 in Set up a Google Cloud project and a development environment.

  2. Install the Vertex AI SDK for Python and version 1 of the Kubeflow Pipelines SDK (kfp<2.0.0):

    python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet

  3. Ensure that you have the following IAM permissions:

    • **aiplatform.metadataStores.get**
    • **storage.buckets.get**
    • **storage.objects.create**
    • **storage.objects.get**

    You need these permissions to run pipelines with Vertex AI Pipelines.

    Note: If this is the first time that you use Vertex AI Pipelines to create a pipeline run in the project, ensure that you also have the **aiplatform.metadataStores.create** permission.
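
You can check the version pins and permissions above before you run a sample. The following is a minimal sketch, not part of the tutorial. It reads installed versions with the standard `importlib.metadata` module and compares them with the pinned ranges using a simplified numeric parse, which is an assumption that ignores pre-release tags. It also lists which of the required permissions are missing. All helper names are hypothetical. The optional `granted_on_project` helper assumes the separate google-cloud-resource-manager package and Application Default Credentials. Because it checks at the project level, it does not report permissions that are granted only on a bucket.

```python
from importlib import metadata

# Permissions listed in this tutorial's prerequisites.
REQUIRED_PERMISSIONS = [
    "aiplatform.metadataStores.get",
    "storage.buckets.get",
    "storage.objects.create",
    "storage.objects.get",
]
# Needed only for the first pipeline run in a project.
FIRST_RUN_PERMISSION = "aiplatform.metadataStores.create"


def version_tuple(version):
    """Simplified parse: "1.8.22" -> (1, 8, 22), "1.16" -> (1, 16, 0).

    Assumption: non-numeric suffixes such as "b1" or "rc1" are ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # Pad so that "1.16" compares equal to "1.16.0".
    return tuple(parts)


def versions_ok(kfp_version, aiplatform_version):
    """True if kfp < 2.0.0 and google-cloud-aiplatform >= 1.16.0."""
    return (version_tuple(kfp_version) < (2, 0, 0)
            and version_tuple(aiplatform_version) >= (1, 16, 0))


def check_installed_versions():
    """Check the installed packages; raises PackageNotFoundError if one is missing."""
    return versions_ok(metadata.version("kfp"),
                       metadata.version("google-cloud-aiplatform"))


def missing_permissions(granted, first_run=False):
    """Return the required permissions that are not in `granted`, in list order."""
    required = REQUIRED_PERMISSIONS + ([FIRST_RUN_PERMISSION] if first_run else [])
    granted = set(granted)
    return [p for p in required if p not in granted]


def granted_on_project(project_id, permissions):
    """Ask Resource Manager which of `permissions` the caller holds on a project.

    Assumes google-cloud-resource-manager is installed and credentials are set up.
    """
    from google.cloud import resourcemanager_v3

    client = resourcemanager_v3.ProjectsClient()
    response = client.test_iam_permissions(
        request={"resource": f"projects/{project_id}", "permissions": permissions}
    )
    return list(response.permissions)
```

For example, `missing_permissions(granted_on_project("my-project", REQUIRED_PERMISSIONS + [FIRST_RUN_PERMISSION]), first_run=True)` returns the permissions that are still missing for a first run. `my-project` is a placeholder project ID.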

Run the ML model training pipeline

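Choose a training objective and an ML framework from the following sections to get sample code that you can run in your environment. The sample code does the following:

  • Loads the pipeline components from the vertex-ai-samples GitHub repository.
  • Defines a pipeline that does the following:
    • Downloads a Chicago taxi trips dataset in CSV format from Cloud Storage.
    • Selects the label column (tips) and the feature columns, and fills in missing values.
    • For classification, adds a binary label column that indicates whether a trip was tipped.
    • Trains a model using the selected framework. Some samples first split the rows into training and testing sets and then run predictions on the testing set.
    • Imports the trained model to Vertex AI Model Registry.
    • Optionally deploys the model to an endpoint. Deployment is turned off by default (deploy_model = False) because it might incur additional costs over time.
  • Submits a pipeline run to Vertex AI Pipelines.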

Copy the code into your development environment and run it.
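
All samples in this section load their components from raw GitHub URLs. These URLs share a repository prefix and a pinned commit, except that the Scikit-learn training component uses a different commit. If you adapt the samples, a small helper like the following can build those URLs. The helper is hypothetical and is not part of KFP.

```python
# Common pieces of the component URLs used in the samples in this section.
REPO_RAW_URL = "https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples"
DEFAULT_COMMIT = "399405402d95f4a011e2d2e967c96f8508ba5688"


def component_url(component_path, commit=DEFAULT_COMMIT):
    """Build the raw GitHub URL of a component.yaml file at a pinned commit.

    component_path is the directory under community-content/pipeline_components,
    for example "pandas/Select_columns/in_CSV_format".
    """
    return (f"{REPO_RAW_URL}/{commit}/community-content/pipeline_components/"
            f"{component_path.strip('/')}/component.yaml")
```

For example, `components.load_component_from_url(component_url("pandas/Select_columns/in_CSV_format"))` loads the same Select columns component that the samples use. A URL that names a commit hash rather than a branch keeps the component definition fixed, even if the repository changes later.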

Tabular classification

TensorFlow

# python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
from kfp import components

# %% Loading components
download_from_gcs_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/storage/download/component.yaml")
select_columns_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Select_columns/in_CSV_format/component.yaml")
fill_all_missing_values_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Fill_all_missing_values/in_CSV_format/component.yaml")
binarize_column_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Binarize_column/in_CSV_format/component.yaml")
split_rows_into_subsets_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/dataset_manipulation/Split_rows_into_subsets/in_CSV/component.yaml")
create_fully_connected_tensorflow_network_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/tensorflow/Create_fully_connected_network/component.yaml")
train_model_using_Keras_on_CSV_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/tensorflow/Train_model_using_Keras/on_CSV/component.yaml")
predict_with_TensorFlow_model_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/tensorflow/Predict/on_CSV/component.yaml")
upload_Tensorflow_model_to_Google_Cloud_Vertex_AI_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Upload_Tensorflow_model/component.yaml")
deploy_model_to_endpoint_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/component.yaml")


# %% Pipeline definition
def train_tabular_classification_model_using_TensorFlow_pipeline():
    dataset_gcs_uri = "gs://ml-pipeline-dataset/Chicago_taxi_trips/chicago_taxi_trips_2019-01-01_-_2019-02-01_limit=10000.csv"
    feature_columns = ["trip_seconds", "trip_miles", "pickup_community_area", "dropoff_community_area", "fare", "tolls", "extras"]  # Excluded "trip_total"
    label_column = "tips"
    training_set_fraction = 0.8
    # Deploying the model might incur additional costs over time
    deploy_model = False

    classification_label_column = "class"

    all_columns = [label_column] + feature_columns

    dataset = download_from_gcs_op(
        gcs_path=dataset_gcs_uri
    ).outputs["Data"]

    dataset = select_columns_using_Pandas_on_CSV_data_op(
        table=dataset,
        column_names=all_columns,
    ).outputs["transformed_table"]

    dataset = fill_all_missing_values_using_Pandas_on_CSV_data_op(
        table=dataset,
        replacement_value="0",
        # # Optional:
        # column_names=None,  # =[...]
    ).outputs["transformed_table"]

    classification_dataset = binarize_column_using_Pandas_on_CSV_data_op(
        table=dataset,
        column_name=label_column,
        predicate=" > 0",
        new_column_name=classification_label_column,
    ).outputs["transformed_table"]

    split_task = split_rows_into_subsets_op(
        table=classification_dataset,
        fraction_1=training_set_fraction,
    )
    classification_training_data = split_task.outputs["split_1"]
    classification_testing_data = split_task.outputs["split_2"]

    network = create_fully_connected_tensorflow_network_op(
        input_size=len(feature_columns),
        # Optional:
        hidden_layer_sizes=[10],
        activation_name="elu",
        output_activation_name="sigmoid",
        # output_size=1,
    ).outputs["model"]

    model = train_model_using_Keras_on_CSV_op(
        training_data=classification_training_data,
        model=network,
        label_column_name=classification_label_column,
        # Optional:
        loss_function_name="binary_crossentropy",
        number_of_epochs=10,
        #learning_rate=0.1,
        #optimizer_name="Adadelta",
        #optimizer_parameters={},
        #batch_size=32,
        #metric_names=["mean_absolute_error"],
        #random_seed=0,
    ).outputs["trained_model"]

    predictions = predict_with_TensorFlow_model_on_CSV_data_op(
        dataset=classification_testing_data,
        model=model,
        # label_column_name needs to be set when doing prediction on a dataset that has labels
        label_column_name=classification_label_column,
        # Optional:
        # batch_size=1000,
    ).outputs["predictions"]

    vertex_model_name = upload_Tensorflow_model_to_Google_Cloud_Vertex_AI_op(
        model=model,
    ).outputs["model_name"]

    # Deploying the model might incur additional costs over time
    if deploy_model:
        vertex_endpoint_name = deploy_model_to_endpoint_op(
            model_name=vertex_model_name,
        ).outputs["endpoint_name"]

pipeline_func = train_tabular_classification_model_using_TensorFlow_pipeline

# %% Pipeline submission
if __name__ == '__main__':
    from google.cloud import aiplatform
    aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func).submit()
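
To see roughly what the data steps in the classification samples do, you can apply them locally to a few rows before you run the pipeline. The following stdlib sketch approximates selecting columns, filling missing values, binarizing the label, and splitting the rows. It is not the components' code. It assumes that a missing value is an empty field, that the new label is written as 1 or 0, and that the split keeps the original row order. The Split rows into subsets component might behave differently. The function name `preprocess_for_classification` is hypothetical.

```python
import csv
import io

FEATURE_COLUMNS = ["trip_seconds", "trip_miles", "pickup_community_area",
                   "dropoff_community_area", "fare", "tolls", "extras"]
LABEL_COLUMN = "tips"
CLASS_COLUMN = "class"


def preprocess_for_classification(csv_text, training_set_fraction=0.8):
    """Approximate select -> fill missing -> binarize -> split on CSV text.

    Assumptions (not taken from the components' source):
    - a missing value is an empty or absent field and is replaced with "0";
    - the new label is 1 when tips > 0, else 0;
    - the split keeps the original row order.
    """
    all_columns = [LABEL_COLUMN] + FEATURE_COLUMNS
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the label and feature columns, filling missing values with "0".
        row = {col: (record.get(col) or "0") for col in all_columns}
        # Binarize the label: was the trip tipped?
        row[CLASS_COLUMN] = 1 if float(row[LABEL_COLUMN]) > 0 else 0
        rows.append(row)
    cut = int(len(rows) * training_set_fraction)
    return rows[:cut], rows[cut:]
```

The default `training_set_fraction=0.8` matches the value that the samples pass as `fraction_1`.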

PyTorch

# python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
from kfp import components

# %% Loading components
download_from_gcs_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/storage/download/component.yaml")
select_columns_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Select_columns/in_CSV_format/component.yaml")
fill_all_missing_values_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Fill_all_missing_values/in_CSV_format/component.yaml")
binarize_column_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Binarize_column/in_CSV_format/component.yaml")
create_fully_connected_pytorch_network_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/PyTorch/Create_fully_connected_network/component.yaml")
train_pytorch_model_from_csv_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/PyTorch/Train_PyTorch_model/from_CSV/component.yaml")
create_pytorch_model_archive_with_base_handler_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/PyTorch/Create_PyTorch_Model_Archive/with_base_handler/component.yaml")
upload_PyTorch_model_archive_to_Google_Cloud_Vertex_AI_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Upload_PyTorch_model_archive/component.yaml")
deploy_model_to_endpoint_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/component.yaml")


# %% Pipeline definition
def train_tabular_classification_model_using_PyTorch_pipeline():
    dataset_gcs_uri = "gs://ml-pipeline-dataset/Chicago_taxi_trips/chicago_taxi_trips_2019-01-01_-_2019-02-01_limit=10000.csv"
    feature_columns = ["trip_seconds", "trip_miles", "pickup_community_area", "dropoff_community_area", "fare", "tolls", "extras"]  # Excluded "trip_total"
    label_column = "tips"
    # Deploying the model might incur additional costs over time
    deploy_model = False

    classification_label_column = "class"

    all_columns = [label_column] + feature_columns

    training_data = download_from_gcs_op(
        gcs_path=dataset_gcs_uri
    ).outputs["Data"]

    training_data = select_columns_using_Pandas_on_CSV_data_op(
        table=training_data,
        column_names=all_columns,
    ).outputs["transformed_table"]

    # Cleaning the NaN values.
    training_data = fill_all_missing_values_using_Pandas_on_CSV_data_op(
        table=training_data,
        replacement_value="0",
        #replacement_type_name="float",
    ).outputs["transformed_table"]

    classification_training_data = binarize_column_using_Pandas_on_CSV_data_op(
        table=training_data,
        column_name=label_column,
        predicate=" > 0",
        new_column_name=classification_label_column,
    ).outputs["transformed_table"]

    network = create_fully_connected_pytorch_network_op(
        input_size=len(feature_columns),
        # Optional:
        hidden_layer_sizes=[10],
        activation_name="elu",
        output_activation_name="sigmoid",
        # output_size=1,
    ).outputs["model"]

    model = train_pytorch_model_from_csv_op(
        model=network,
        training_data=classification_training_data,
        label_column_name=classification_label_column,
        loss_function_name="binary_cross_entropy",
        # Optional:
        #number_of_epochs=1,
        #learning_rate=0.1,
        #optimizer_name="Adadelta",
        #optimizer_parameters={},
        #batch_size=32,
        #batch_log_interval=100,
        #random_seed=0,
    ).outputs["trained_model"]

    model_archive = create_pytorch_model_archive_with_base_handler_op(
        model=model,
        # Optional:
        # model_name="model",
        # model_version="1.0",
    ).outputs["Model archive"]

    vertex_model_name = upload_PyTorch_model_archive_to_Google_Cloud_Vertex_AI_op(
        model_archive=model_archive,
    ).outputs["model_name"]

    # Deploying the model might incur additional costs over time
    if deploy_model:
        vertex_endpoint_name = deploy_model_to_endpoint_op(
            model_name=vertex_model_name,
        ).outputs["endpoint_name"]

pipeline_func = train_tabular_classification_model_using_PyTorch_pipeline

# %% Pipeline submission
if __name__ == '__main__':
    from google.cloud import aiplatform
    aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func).submit()
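
Both neural-network classification samples configure the same network: `input_size=len(feature_columns)` (seven features), `hidden_layer_sizes=[10]` with ELU activation, and a sigmoid output. The TensorFlow sample trains with the Keras loss `binary_crossentropy`, and the PyTorch sample uses `binary_cross_entropy`. Both names refer to binary cross-entropy. The following pure-Python sketch shows what a forward pass and the per-example loss compute for such a network. It illustrates the math and is not the components' code. It assumes a single output unit, matching the commented-out `output_size=1`. The weights from the hypothetical `random_params` helper are placeholders for values that training would learn.

```python
import math
import random


def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs, weights, biases):
    """One fully connected layer: weights[j][i] connects input i to unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(features, params):
    """7 inputs -> 10 hidden units (ELU) -> 1 output (sigmoid probability)."""
    hidden = [elu(h) for h in dense(features, params["w1"], params["b1"])]
    (logit,) = dense(hidden, params["w2"], params["b2"])
    return sigmoid(logit)


def binary_cross_entropy(p, y, eps=1e-7):
    """Loss for one example with label y in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # Clamp to avoid log(0).
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def random_params(input_size=7, hidden_size=10, seed=0):
    """Placeholder weights; a real training run learns these values."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(input_size)]
               for _ in range(hidden_size)],
        "b1": [0.0] * hidden_size,
        "w2": [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]],
        "b2": [0.0],
    }
```

Training adjusts `w1`, `b1`, `w2`, and `b2` to lower the average `binary_cross_entropy` over the training rows. A confident correct prediction, such as `p = 0.9` for a tipped trip, costs less than a confident wrong one.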

XGBoost

# python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
from kfp import components

# %% Loading components
download_from_gcs_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/storage/download/component.yaml")
select_columns_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Select_columns/in_CSV_format/component.yaml")
fill_all_missing_values_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Fill_all_missing_values/in_CSV_format/component.yaml")
binarize_column_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Binarize_column/in_CSV_format/component.yaml")
split_rows_into_subsets_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/dataset_manipulation/Split_rows_into_subsets/in_CSV/component.yaml")
train_XGBoost_model_on_CSV_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/XGBoost/Train/component.yaml")
xgboost_predict_on_CSV_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/XGBoost/Predict/component.yaml")
upload_XGBoost_model_to_Google_Cloud_Vertex_AI_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Upload_XGBoost_model/component.yaml")
deploy_model_to_endpoint_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/component.yaml")


# %% Pipeline definition
def train_tabular_classification_model_using_XGBoost_pipeline():
    dataset_gcs_uri = "gs://ml-pipeline-dataset/Chicago_taxi_trips/chicago_taxi_trips_2019-01-01_-_2019-02-01_limit=10000.csv"
    feature_columns = ["trip_seconds", "trip_miles", "pickup_community_area", "dropoff_community_area", "fare", "tolls", "extras"]  # Excluded "trip_total"
    label_column = "tips"
    training_set_fraction = 0.8
    # Deploying the model might incur additional costs over time
    deploy_model = False

    classification_label_column = "class"

    all_columns = [label_column] + feature_columns

    dataset = download_from_gcs_op(
        gcs_path=dataset_gcs_uri
    ).outputs["Data"]

    dataset = select_columns_using_Pandas_on_CSV_data_op(
        table=dataset,
        column_names=all_columns,
    ).outputs["transformed_table"]

    dataset = fill_all_missing_values_using_Pandas_on_CSV_data_op(
        table=dataset,
        replacement_value="0",
        # # Optional:
        # column_names=None,  # =[...]
    ).outputs["transformed_table"]

    classification_dataset = binarize_column_using_Pandas_on_CSV_data_op(
        table=dataset,
        column_name=label_column,
        predicate="> 0",
        new_column_name=classification_label_column,
    ).outputs["transformed_table"]

    split_task = split_rows_into_subsets_op(
        table=classification_dataset,
        fraction_1=training_set_fraction,
    )
    classification_training_data = split_task.outputs["split_1"]
    classification_testing_data = split_task.outputs["split_2"]

    model = train_XGBoost_model_on_CSV_op(
        training_data=classification_training_data,
        label_column_name=classification_label_column,
        objective="binary:logistic",
        # Optional:
        #starting_model=None,
        #num_iterations=10,
        #booster_params={},
        #booster="gbtree",
        #learning_rate=0.3,
        #min_split_loss=0,
        #max_depth=6,
    ).outputs["model"]

    # Predicting on the testing data
    predictions = xgboost_predict_on_CSV_op(
        data=classification_testing_data,
        model=model,
        # label_column_name needs to be set when doing prediction on a dataset that has labels
        label_column_name=classification_label_column,
    ).outputs["predictions"]

    vertex_model_name = upload_XGBoost_model_to_Google_Cloud_Vertex_AI_op(
        model=model,
    ).outputs["model_name"]

    # Deploying the model might incur additional costs over time
    if deploy_model:
        vertex_endpoint_name = deploy_model_to_endpoint_op(
            model_name=vertex_model_name,
        ).outputs["endpoint_name"]

pipeline_func = train_tabular_classification_model_using_XGBoost_pipeline

# %% Pipeline submission
if __name__ == '__main__':
    from google.cloud import aiplatform
    aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func).submit()
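
With objective="binary:logistic", XGBoost fits a logistic model on the summed output of its trees. The raw score for each example is turned into a probability with the sigmoid function, and training minimizes log loss plus a per-tree complexity penalty. The following equations write this out, ignoring the global bias base_score. The label y_i is 0 or 1, f_1 ... f_K are the trees, T is the number of leaves in a tree, and w are its leaf weights:

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad
p_i = \sigma(\hat{y}_i) = \frac{1}{1 + e^{-\hat{y}_i}}

\mathcal{L} = -\sum_{i} \bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr]
              + \sum_{k=1}^{K} \Omega(f_k), \qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2
```

Here γ is the parameter that the sample exposes as the commented-out min_split_loss, and λ is XGBoost's L2 regularization term on leaf weights. Because σ(0) = 0.5, a predicted probability above 0.5 corresponds to a positive raw score.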

Scikit-learn

# python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
from kfp import components

# %% Loading components
download_from_gcs_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/storage/download/component.yaml")
select_columns_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Select_columns/in_CSV_format/component.yaml")
fill_all_missing_values_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Fill_all_missing_values/in_CSV_format/component.yaml")
binarize_column_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Binarize_column/in_CSV_format/component.yaml")
train_logistic_regression_model_using_scikit_learn_from_CSV_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/1f5cf6e06409b704064b2086c0a705e4e6b4fcde/community-content/pipeline_components/ML_frameworks/Scikit_learn/Train_logistic_regression_model/from_CSV/component.yaml")
upload_Scikit_learn_pickle_model_to_Google_Cloud_Vertex_AI_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Upload_Scikit-learn_pickle_model/component.yaml")
deploy_model_to_endpoint_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/component.yaml")


# %% Pipeline definition
def train_tabular_classification_logistic_regression_model_using_Scikit_learn_pipeline():
    dataset_gcs_uri = "gs://ml-pipeline-dataset/Chicago_taxi_trips/chicago_taxi_trips_2019-01-01_-_2019-02-01_limit=10000.csv"
    feature_columns = ["trip_seconds", "trip_miles", "pickup_community_area", "dropoff_community_area", "fare", "tolls", "extras"]  # Excluded "trip_total"
    label_column = "tips"
    # Deploying the model might incur additional costs over time
    deploy_model = False

    classification_label_column = "class"

    all_columns = [label_column] + feature_columns

    training_data = download_from_gcs_op(
        gcs_path=dataset_gcs_uri
    ).outputs["Data"]

    training_data = select_columns_using_Pandas_on_CSV_data_op(
        table=training_data,
        column_names=all_columns,
    ).outputs["transformed_table"]

    # Cleaning the NaN values.
    training_data = fill_all_missing_values_using_Pandas_on_CSV_data_op(
        table=training_data,
        replacement_value="0",
        #replacement_type_name="float",
    ).outputs["transformed_table"]

    classification_training_data = binarize_column_using_Pandas_on_CSV_data_op(
        table=training_data,
        column_name=label_column,
        predicate="> 0",
        new_column_name=classification_label_column,
    ).outputs["transformed_table"]

    model = train_logistic_regression_model_using_scikit_learn_from_CSV_op(
        dataset=classification_training_data,
        label_column_name=classification_label_column,
        # Optional:
        #penalty="l2",
        #solver="lbfgs",
        #max_iterations=100,
        #multi_class_mode="auto",
        #random_seed=0,
    ).outputs["model"]

    vertex_model_name = upload_Scikit_learn_pickle_model_to_Google_Cloud_Vertex_AI_op(
        model=model,
    ).outputs["model_name"]

    # Deploying the model might incur additional costs over time
    if deploy_model:
        sklearn_vertex_endpoint_name = deploy_model_to_endpoint_op(
            model_name=vertex_model_name,
        ).outputs["endpoint_name"]

pipeline_func = train_tabular_classification_logistic_regression_model_using_Scikit_learn_pipeline

# %% Pipeline submission
if __name__ == '__main__':
    from google.cloud import aiplatform
    aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func).submit()

Tabular regression

TensorFlow

# python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
from kfp import components

# %% Loading components
download_from_gcs_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/storage/download/component.yaml")
select_columns_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Select_columns/in_CSV_format/component.yaml")
fill_all_missing_values_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Fill_all_missing_values/in_CSV_format/component.yaml")
split_rows_into_subsets_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/dataset_manipulation/Split_rows_into_subsets/in_CSV/component.yaml")
create_fully_connected_tensorflow_network_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/tensorflow/Create_fully_connected_network/component.yaml")
train_model_using_Keras_on_CSV_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/tensorflow/Train_model_using_Keras/on_CSV/component.yaml")
predict_with_TensorFlow_model_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/tensorflow/Predict/on_CSV/component.yaml")
upload_Tensorflow_model_to_Google_Cloud_Vertex_AI_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Upload_Tensorflow_model/component.yaml")
deploy_model_to_endpoint_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/component.yaml")


# %% Pipeline definition
def train_tabular_regression_model_using_Tensorflow_pipeline():
    dataset_gcs_uri = "gs://ml-pipeline-dataset/Chicago_taxi_trips/chicago_taxi_trips_2019-01-01_-_2019-02-01_limit=10000.csv"
    feature_columns = ["trip_seconds", "trip_miles", "pickup_community_area", "dropoff_community_area", "fare", "tolls", "extras"]  # Excluded "trip_total"
    label_column = "tips"
    training_set_fraction = 0.8
    # Deploying the model might incur additional costs over time
    deploy_model = False

    all_columns = [label_column] + feature_columns

    dataset = download_from_gcs_op(
        gcs_path=dataset_gcs_uri
    ).outputs["Data"]

    dataset = select_columns_using_Pandas_on_CSV_data_op(
        table=dataset,
        column_names=all_columns,
    ).outputs["transformed_table"]

    dataset = fill_all_missing_values_using_Pandas_on_CSV_data_op(
        table=dataset,
        replacement_value="0",
        # # Optional:
        # column_names=None,  # =[...]
    ).outputs["transformed_table"]

    split_task = split_rows_into_subsets_op(
        table=dataset,
        fraction_1=training_set_fraction,
    )
    training_data = split_task.outputs["split_1"]
    testing_data = split_task.outputs["split_2"]

    network = create_fully_connected_tensorflow_network_op(
        input_size=len(feature_columns),
        # Optional:
        hidden_layer_sizes=[10],
        activation_name="elu",
        # output_activation_name=None,
        # output_size=1,
    ).outputs["model"]

    model = train_model_using_Keras_on_CSV_op(
        training_data=training_data,
        model=network,
        label_column_name=label_column,
        # Optional:
        #loss_function_name="mean_squared_error",
        number_of_epochs=10,
        #learning_rate=0.1,
        #optimizer_name="Adadelta",
        #optimizer_parameters={},
        #batch_size=32,
        metric_names=["mean_absolute_error"],
        #random_seed=0,
    ).outputs["trained_model"]

    predictions = predict_with_TensorFlow_model_on_CSV_data_op(
        dataset=testing_data,
        model=model,
        # label_column_name needs to be set when doing prediction on a dataset that has labels
        label_column_name=label_column,
        # Optional:
        # batch_size=1000,
    ).outputs["predictions"]

    vertex_model_name = upload_Tensorflow_model_to_Google_Cloud_Vertex_AI_op(
        model=model,
    ).outputs["model_name"]

    # Deploying the model might incur additional costs over time
    if deploy_model:
        vertex_endpoint_name = deploy_model_to_endpoint_op(
            model_name=vertex_model_name,
        ).outputs["endpoint_name"]

pipeline_func = train_tabular_regression_model_using_Tensorflow_pipeline

# %% Pipeline submission
if __name__ == '__main__':
    from google.cloud import aiplatform
    aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func).submit()
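
In the TensorFlow regression sample, the commented-out `loss_function_name="mean_squared_error"` suggests that mean squared error is the default training loss. `metric_names=["mean_absolute_error"]` requests mean absolute error as an additional metric. The commented-out `mse_loss` in the PyTorch regression sample also names mean squared error. The following sketch computes both quantities on four made-up tip values to show how they differ. The values are illustrative only.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large errors dominate."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; same unit as the label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only: actual tips and model predictions for four trips.
actual = [0.0, 2.0, 3.0, 5.0]
predicted = [0.5, 2.0, 2.0, 7.0]
mse = mean_squared_error(actual, predicted)   # 1.3125
mae = mean_absolute_error(actual, predicted)  # 0.875
```

The single miss of 2.0 contributes 4.0 of the 5.25 total squared error but only 2.0 of the 3.5 total absolute error. This is why MSE penalizes occasional large errors more heavily than MAE does.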

PyTorch

# python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
from kfp import components

# %% Loading components
download_from_gcs_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/storage/download/component.yaml")
select_columns_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Select_columns/in_CSV_format/component.yaml")
fill_all_missing_values_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Fill_all_missing_values/in_CSV_format/component.yaml")
create_fully_connected_pytorch_network_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/PyTorch/Create_fully_connected_network/component.yaml")
train_pytorch_model_from_csv_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/PyTorch/Train_PyTorch_model/from_CSV/component.yaml")
create_pytorch_model_archive_with_base_handler_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/PyTorch/Create_PyTorch_Model_Archive/with_base_handler/component.yaml")
upload_PyTorch_model_archive_to_Google_Cloud_Vertex_AI_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Upload_PyTorch_model_archive/component.yaml")
deploy_model_to_endpoint_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/component.yaml")


# %% Pipeline definition
def train_tabular_regression_model_using_PyTorch_pipeline():
    dataset_gcs_uri = "gs://ml-pipeline-dataset/Chicago_taxi_trips/chicago_taxi_trips_2019-01-01_-_2019-02-01_limit=10000.csv"
    feature_columns = ["trip_seconds", "trip_miles", "pickup_community_area", "dropoff_community_area", "fare", "tolls", "extras"]  # Excluded "trip_total"
    label_column = "tips"
    all_columns = [label_column] + feature_columns
    # Deploying the model might incur additional costs over time
    deploy_model = False

    training_data = download_from_gcs_op(
        gcs_path=dataset_gcs_uri
    ).outputs["Data"]

    training_data = select_columns_using_Pandas_on_CSV_data_op(
        table=training_data,
        column_names=all_columns,
    ).outputs["transformed_table"]

    # Cleaning the NaN values.
    training_data = fill_all_missing_values_using_Pandas_on_CSV_data_op(
        table=training_data,
        replacement_value="0",
        #replacement_type_name="float",
    ).outputs["transformed_table"]

    network = create_fully_connected_pytorch_network_op(
        input_size=len(feature_columns),
        # Optional:
        hidden_layer_sizes=[10],
        activation_name="elu",
        # output_activation_name=None,
        # output_size=1,
    ).outputs["model"]

    model = train_pytorch_model_from_csv_op(
        model=network,
        training_data=training_data,
        label_column_name=label_column,
        # Optional:
        #loss_function_name="mse_loss",
        #number_of_epochs=1,
        #learning_rate=0.1,
        #optimizer_name="Adadelta",
        #optimizer_parameters={},
        #batch_size=32,
        #batch_log_interval=100,
        #random_seed=0,
    ).outputs["trained_model"]

    model_archive = create_pytorch_model_archive_with_base_handler_op(
        model=model,
        # Optional:
        # model_name="model",
        # model_version="1.0",
    ).outputs["Model archive"]

    vertex_model_name = upload_PyTorch_model_archive_to_Google_Cloud_Vertex_AI_op(
        model_archive=model_archive,
    ).outputs["model_name"]

    # Deploying the model might incur additional costs over time
    if deploy_model:
        vertex_endpoint_name = deploy_model_to_endpoint_op(
            model_name=vertex_model_name,
        ).outputs["endpoint_name"]

pipeline_func = train_tabular_regression_model_using_PyTorch_pipeline

# %% Pipeline submission
if __name__ == '__main__':
    from google.cloud import aiplatform
    aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func).submit()

XGBoost

# python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
from kfp import components

# %% Loading components
download_from_gcs_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/storage/download/component.yaml")
select_columns_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Select_columns/in_CSV_format/component.yaml")
fill_all_missing_values_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Fill_all_missing_values/in_CSV_format/component.yaml")
split_rows_into_subsets_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/dataset_manipulation/Split_rows_into_subsets/in_CSV/component.yaml")
train_XGBoost_model_on_CSV_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/XGBoost/Train/component.yaml")
xgboost_predict_on_CSV_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/XGBoost/Predict/component.yaml")
upload_XGBoost_model_to_Google_Cloud_Vertex_AI_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Upload_XGBoost_model/component.yaml")
deploy_model_to_endpoint_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/component.yaml")

# %% Pipeline definition
def train_tabular_regression_model_using_XGBoost_pipeline():
    dataset_gcs_uri = "gs://ml-pipeline-dataset/Chicago_taxi_trips/chicago_taxi_trips_2019-01-01_-_2019-02-01_limit=10000.csv"
    feature_columns = ["trip_seconds", "trip_miles", "pickup_community_area", "dropoff_community_area", "fare", "tolls", "extras"]  # Excluded "trip_total"
    label_column = "tips"
    training_set_fraction = 0.8
    # Deploying the model might incur additional costs over time
    deploy_model = False

    all_columns = [label_column] + feature_columns

    dataset = download_from_gcs_op(
        gcs_path=dataset_gcs_uri
    ).outputs["Data"]

    dataset = select_columns_using_Pandas_on_CSV_data_op(
        table=dataset,
        column_names=all_columns,
    ).outputs["transformed_table"]

    dataset = fill_all_missing_values_using_Pandas_on_CSV_data_op(
        table=dataset,
        replacement_value="0",
        # # Optional:
        # column_names=None,  # =[...]
    ).outputs["transformed_table"]

    split_task = split_rows_into_subsets_op(
        table=dataset,
        fraction_1=training_set_fraction,
    )
    training_data = split_task.outputs["split_1"]
    testing_data = split_task.outputs["split_2"]

    model = train_XGBoost_model_on_CSV_op(
        training_data=training_data,
        label_column_name=label_column,
        # Optional:
        #starting_model=None,
        #num_iterations=10,
        #booster_params={},
        #objective="reg:squarederror",
        #booster="gbtree",
        #learning_rate=0.3,
        #min_split_loss=0,
        #max_depth=6,
    ).outputs["model"]

    # Predicting on the testing data
    predictions = xgboost_predict_on_CSV_op(
        data=testing_data,
        model=model,
        # label_column needs to be set when doing prediction on a dataset that has labels
        label_column_name=label_column,
    ).outputs["predictions"]

    vertex_model_name = upload_XGBoost_model_to_Google_Cloud_Vertex_AI_op(
        model=model,
    ).outputs["model_name"]

    # Deploying the model might incur additional costs over time
    if deploy_model:
        vertex_endpoint_name = deploy_model_to_endpoint_op(
            model_name=vertex_model_name,
        ).outputs["endpoint_name"]

pipeline_func = train_tabular_regression_model_using_XGBoost_pipeline

# %% Pipeline submission
if __name__ == '__main__':
    from google.cloud import aiplatform
    aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func).submit()
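Before you submit the XGBoost pipeline, it can help to see what its data-preparation steps do to a table. The following plain-Python sketch mimics three of them on a few made-up rows: selecting the label and feature columns, filling missing values with "0", and splitting rows with fraction_1=0.8. It doesn't use KFP and isn't the code of the Pandas or split components. The helper names select_columns, fill_missing, and split_rows are hypothetical, and the sketch assumes that the split keeps row order and sends the first 80% of rows to split_1. The real component might split differently.

```python
import csv
import io


def _read(csv_text):
    """Parse CSV text into (header, rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[0], rows[1:]


def _write(header, rows):
    """Serialize a header and rows back into CSV text."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + rows)
    return out.getvalue()


def select_columns(csv_text, column_names):
    """Keep only column_names, in the given order (label column first)."""
    header, rows = _read(csv_text)
    indexes = [header.index(name) for name in column_names]
    return _write(column_names, [[row[i] for i in indexes] for row in rows])


def fill_missing(csv_text, replacement_value="0"):
    """Replace every empty cell with replacement_value."""
    header, rows = _read(csv_text)
    filled = [[cell if cell != "" else replacement_value for cell in row] for row in rows]
    return _write(header, filled)


def split_rows(csv_text, fraction_1):
    """Assumption: the first fraction_1 of rows (in order) form split 1, the rest split 2."""
    header, rows = _read(csv_text)
    cut = int(len(rows) * fraction_1)
    return _write(header, rows[:cut]), _write(header, rows[cut:])


# Made-up sample rows; empty cells stand for missing values.
raw = (
    "trip_seconds,trip_miles,fare,tips,trip_total\n"
    "600,2.1,9.5,2.0,11.5\n"
    "300,,6.0,,6.0\n"
    "900,3.4,13.0,3.0,16.0\n"
    "420,1.2,,1.0,8.0\n"
    "1200,5.0,18.5,4.0,22.5\n"
)
label_column = "tips"
feature_columns = ["trip_seconds", "trip_miles", "fare"]  # "trip_total" excluded

table = select_columns(raw, [label_column] + feature_columns)
table = fill_missing(table, replacement_value="0")
training_data, testing_data = split_rows(table, fraction_1=0.8)
print(training_data)
print(testing_data)
```

Running the sketch prints a four-row training split and a one-row testing split. In both, the label column comes first and empty cells are replaced by 0.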

Scikit-learn

# python3 -m pip install "kfp<2.0.0" "google-cloud-aiplatform>=1.16.0" --upgrade --quiet
from kfp import components

# %% Loading components
download_from_gcs_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/storage/download/component.yaml")
select_columns_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Select_columns/in_CSV_format/component.yaml")
fill_all_missing_values_using_Pandas_on_CSV_data_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/pandas/Fill_all_missing_values/in_CSV_format/component.yaml")
train_linear_regression_model_using_scikit_learn_from_CSV_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/1f5cf6e06409b704064b2086c0a705e4e6b4fcde/community-content/pipeline_components/ML_frameworks/Scikit_learn/Train_linear_regression_model/from_CSV/component.yaml")
upload_Scikit_learn_pickle_model_to_Google_Cloud_Vertex_AI_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Upload_Scikit-learn_pickle_model/component.yaml")
deploy_model_to_endpoint_op = components.load_component_from_url("https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/399405402d95f4a011e2d2e967c96f8508ba5688/community-content/pipeline_components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/component.yaml")

# %% Pipeline definition
def train_tabular_regression_linear_model_using_Scikit_learn_pipeline():
    dataset_gcs_uri = "gs://ml-pipeline-dataset/Chicago_taxi_trips/chicago_taxi_trips_2019-01-01_-_2019-02-01_limit=10000.csv"
    feature_columns = ["trip_seconds", "trip_miles", "pickup_community_area", "dropoff_community_area", "fare", "tolls", "extras"]  # Excluded "trip_total"
    label_column = "tips"
    all_columns = [label_column] + feature_columns
    # Deploying the model might incur additional costs over time
    deploy_model = False

    training_data = download_from_gcs_op(
        gcs_path=dataset_gcs_uri
    ).outputs["Data"]

    training_data = select_columns_using_Pandas_on_CSV_data_op(
        table=training_data,
        column_names=all_columns,
    ).outputs["transformed_table"]

    # Cleaning the NaN values.
    training_data = fill_all_missing_values_using_Pandas_on_CSV_data_op(
        table=training_data,
        replacement_value="0",
        #replacement_type_name="float",
    ).outputs["transformed_table"]

    model = train_linear_regression_model_using_scikit_learn_from_CSV_op(
        dataset=training_data,
        label_column_name=label_column,
    ).outputs["model"]

    vertex_model_name = upload_Scikit_learn_pickle_model_to_Google_Cloud_Vertex_AI_op(
        model=model,
    ).outputs["model_name"]

    # Deploying the model might incur additional costs over time
    if deploy_model:
        sklearn_vertex_endpoint_name = deploy_model_to_endpoint_op(
            model_name=vertex_model_name,
        ).outputs["endpoint_name"]

pipeline_func = train_tabular_regression_linear_model_using_Scikit_learn_pipeline

# %% Pipeline submission
if __name__ == '__main__':
    from google.cloud import aiplatform
    aiplatform.PipelineJob.from_pipeline_func(pipeline_func=pipeline_func).submit()
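The Scikit-learn sample passes the whole cleaned table to the training step without holding out a test split. That step fits a linear regression model that predicts tips from the feature columns. The following plain-Python sketch illustrates the technique and isn't the component's code, which relies on scikit-learn. It assumes the component fits an ordinary least-squares model, and it solves the normal equations with Gauss-Jordan elimination. The name fit_linear_regression is hypothetical, and the sample rows are made up so that the exact answer is known.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares: return (intercept, coefficients) for rows X and targets y."""
    # Prepend a constant 1.0 to each row so the intercept is learned as weight 0.
    A = [[1.0] + [float(v) for v in row] for row in X]
    n = len(A[0])
    # Augmented matrix [A^T A | A^T y] of the normal equations (A^T A) w = A^T y.
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(n)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    # The left block of M is now diagonal, so each weight is a single division.
    weights = [M[i][n] / M[i][i] for i in range(n)]
    return weights[0], weights[1:]


# Made-up rows that satisfy tips = 0.5 + 0.1 * x1 + 0.2 * x2 exactly.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 7]]
y = [1.0, 0.9, 1.8, 1.5, 2.4]
intercept, coefficients = fit_linear_regression(X, y)
print(intercept, coefficients)
```

Solving the normal equations directly works for a handful of features, such as the seven used in this sample. However, it can be numerically unstable when features are nearly collinear, and library implementations generally use more robust least-squares solvers.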

Note the following about the code samples provided:

  • A Kubeflow pipeline is defined as a Python function.
  • The pipeline's workflow steps are created using Kubeflow pipeline components. By using the outputs of one component as an input to another component, you define the pipeline's workflow as a graph. For example, the fill_all_missing_values_using_Pandas_on_CSV_data_op component task depends on the transformed_table output from the select_columns_using_Pandas_on_CSV_data_op component task.
  • You create a pipeline run on Vertex AI Pipelines by using the Vertex AI SDK for Python.
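The rule that passing outputs between components defines a graph can be shown without Kubeflow. The following toy model is plain Python, not the KFP API, and Pipeline, Output, and build_pipeline are hypothetical names. Each add call stands in for a component call, and passing one task's output as another task's input records a dependency. The standard library's graphlib.TopologicalSorter (Python 3.9 and later) then orders the tasks so that each one comes after the tasks whose outputs it consumes.

The toy also mirrors the if deploy_model: check in the samples. Because deploy_model is an ordinary Python variable rather than a pipeline parameter, that check runs while the pipeline function builds the graph. With deploy_model = False, the deploy task is therefore never added.

```python
from graphlib import TopologicalSorter

DATASET_URI = "gs://ml-pipeline-dataset/Chicago_taxi_trips/chicago_taxi_trips_2019-01-01_-_2019-02-01_limit=10000.csv"


class Output:
    """Stand-in for a task output such as task.outputs["transformed_table"]."""

    def __init__(self, task_name, key):
        self.task_name = task_name
        self.key = key


class Pipeline:
    """Hypothetical graph recorder: each add() call plays the role of a component call."""

    def __init__(self):
        self.upstream = {}  # task name -> names of tasks whose outputs it consumes

    def add(self, task_name, **inputs):
        # Any input that is another task's Output creates an edge from that task.
        self.upstream[task_name] = {
            value.task_name for value in inputs.values() if isinstance(value, Output)
        }
        # Return an accessor, so task("key") plays the role of task.outputs["key"].
        return lambda key: Output(task_name, key)

    def execution_order(self):
        # Every task appears after all tasks it depends on.
        return list(TopologicalSorter(self.upstream).static_order())


def build_pipeline(deploy_model):
    """Same shape as the XGBoost sample, with shortened task names."""
    p = Pipeline()
    dataset = p.add("download", gcs_path=DATASET_URI)("Data")
    dataset = p.add("select_columns", table=dataset)("transformed_table")
    dataset = p.add("fill_missing", table=dataset, replacement_value="0")("transformed_table")
    split = p.add("split_rows", table=dataset, fraction_1=0.8)
    model = p.add("train", training_data=split("split_1"))("model")
    p.add("predict", data=split("split_2"), model=model)
    model_name = p.add("upload", model=model)("model_name")
    # Plain Python control flow runs while the graph is built, so with
    # deploy_model=False the "deploy" task is never recorded.
    if deploy_model:
        p.add("deploy", model_name=model_name)
    return p


print(build_pipeline(deploy_model=False).execution_order())
```

The first five tasks always come out in the same chain order. The predict and upload tasks both depend only on tasks that are already finished at that point, so the sorter can order them either way.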

Monitor the pipeline

In the Google Cloud console, in the Vertex AI section, go to the Pipelines page and open the Runs tab.


What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.