Use Vertex AI TensorBoard with Vertex AI Pipelines

Your training code can be packaged into a custom training component and run in a pipeline job. TensorBoard logs are automatically streamed to your Vertex AI TensorBoard experiment. You can use this integration to monitor your training in near real time, because Vertex AI TensorBoard streams the logs as they are written to Cloud Storage.

For initial setup, see Set up for Vertex AI TensorBoard.
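As part of that setup, you need a Vertex AI TensorBoard instance for the pipeline to stream logs to. A minimal sketch using the Vertex AI SDK for Python, with hypothetical project and display-name values:

from google.cloud import aiplatform

# Hypothetical project and location values; substitute your own.
aiplatform.init(project="my-project", location="us-central1")

# Create the TensorBoard instance that the pipeline will stream logs to.
tensorboard = aiplatform.Tensorboard.create(display_name="my-tensorboard")

# The resource name is what you later pass as tensorboard_resource_name.
print(tensorboard.resource_name)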

Changes to your training script

Your training script must be configured to write TensorBoard logs to the Cloud Storage bucket, the location of which the Vertex AI Training Service will automatically make available through a predefined environment variable, AIP_TENSORBOARD_LOG_DIR.

This can usually be done by providing os.environ['AIP_TENSORBOARD_LOG_DIR'] as the log directory to the open source TensorBoard log writing APIs. The location of AIP_TENSORBOARD_LOG_DIR is typically set with the staging_bucket variable.
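For example, a minimal framework-agnostic sketch; the local fallback path is an assumption for runs outside Vertex AI:

import os

# Use the log directory that Vertex AI Training injects; fall back to a
# local path (illustrative) so the script also runs outside Vertex AI.
log_dir = os.environ.get('AIP_TENSORBOARD_LOG_DIR', '/tmp/tensorboard-logs')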

To configure your training script in TensorFlow 2.x, create a TensorBoard callback and set the log_dir variable to os.environ['AIP_TENSORBOARD_LOG_DIR']. The TensorBoard callback is then included in the TensorFlow model.fit callbacks list.

tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=os.environ['AIP_TENSORBOARD_LOG_DIR'],
    histogram_freq=1
)

model.fit(
    x=x_train,
    y=y_train,
    epochs=epochs,
    validation_data=(x_test, y_test),
    callbacks=[tensorboard_callback],
)
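If you use a custom training loop instead of model.fit, the same environment variable works with the lower-level tf.summary API. A minimal sketch, with an illustrative scalar value:

import os

import tensorflow as tf

# Write summaries directly to the Vertex AI-provided log directory.
writer = tf.summary.create_file_writer(os.environ['AIP_TENSORBOARD_LOG_DIR'])
with writer.as_default():
    tf.summary.scalar('loss', 0.25, step=1)  # illustrative value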

Learn more about how Vertex AI sets environment variables in your custom training environment.
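To verify the environment at container startup, you can print the injected variables; AIP_MODEL_DIR and AIP_CHECKPOINT_DIR are shown for context alongside the variable this guide relies on:

import os

# Print the Vertex AI-provided paths to confirm the training environment.
for name in ('AIP_TENSORBOARD_LOG_DIR', 'AIP_MODEL_DIR', 'AIP_CHECKPOINT_DIR'):
    print(name, '=', os.environ.get(name))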

Build and run a pipeline

The following example shows how to build and run a pipeline using the Kubeflow Pipelines DSL package. For more examples and additional details, see the Vertex AI Pipelines documentation.

Create a training component

Package your training code into a custom component, making sure that the code is configured to write TensorBoard logs to a Cloud Storage bucket. For more examples, see Build your own pipeline components.

from kfp.v2.dsl import component


@component(
    base_image="tensorflow/tensorflow:latest",
    packages_to_install=["tensorflow_datasets"],
)
def train_tensorflow_model_with_tensorboard():
    import datetime, os

    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    def create_model():
        return tf.keras.models.Sequential(
            [
                tf.keras.layers.Flatten(input_shape=(28, 28)),
                tf.keras.layers.Dense(512, activation="relu"),
            ]
        )

    model = create_model()
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=os.environ['AIP_TENSORBOARD_LOG_DIR'],
        histogram_freq=1,
    )

    model.fit(
        x=x_train,
        y=y_train,
        epochs=5,
        validation_data=(x_test, y_test),
        callbacks=[tensorboard_callback],
    )

Build and compile a pipeline

Create a custom training job from the component you've created by specifying the component spec in create_custom_training_job_op_from_component. Set the tensorboard_resource_name to your TensorBoard instance, and the staging_bucket to the location to stage artifacts during API calls (including TensorBoard logs).

Then, build a pipeline to include this job and compile the pipeline to a JSON file.

For more examples and information, see Custom job components and Build a pipeline.

from kfp.v2 import compiler
from google_cloud_pipeline_components.v1.custom_job.utils import \
    create_custom_training_job_op_from_component
from kfp.v2 import dsl


def create_tensorboard_pipeline_sample(
    project,
    location,
    staging_bucket,
    display_name,
    service_account,
    experiment,
    tensorboard_resource_name,
):

    @dsl.pipeline(
        pipeline_root=f"{staging_bucket}/pipeline_root",
        name=display_name,
    )
    def pipeline():
        custom_job_op = create_custom_training_job_op_from_component(
            component_spec=train_tensorflow_model_with_tensorboard,
            tensorboard=tensorboard_resource_name,
            base_output_directory=staging_bucket,
            service_account=service_account,
        )
        custom_job_op(project=project, location=location)

    compiler.Compiler().compile(
        pipeline_func=pipeline, package_path=f"{display_name}.json"
    )
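For example, the function might be invoked as follows; every value here is a hypothetical placeholder:

create_tensorboard_pipeline_sample(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
    display_name="tensorboard-pipeline-sample",
    service_account="training-sa@my-project.iam.gserviceaccount.com",
    experiment="my-experiment",
    tensorboard_resource_name="projects/123456789/locations/us-central1/tensorboards/987654321",
)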

Submit a Vertex AI pipeline

Submit your pipeline using the Vertex AI SDK for Python. For more information, see Run a pipeline.

Python

from typing import Any, Dict, Optional

from google.cloud import aiplatform


def log_pipeline_job_to_experiment_sample(
    experiment_name: str,
    pipeline_job_display_name: str,
    template_path: str,
    pipeline_root: str,
    project: str,
    location: str,
    parameter_values: Optional[Dict[str, Any]] = None,
):
    aiplatform.init(project=project, location=location)

    pipeline_job = aiplatform.PipelineJob(
        display_name=pipeline_job_display_name,
        template_path=template_path,
        pipeline_root=pipeline_root,
        parameter_values=parameter_values,
    )

    pipeline_job.submit(experiment=experiment_name)
  • experiment_name: Provide a name for your experiment.
  • pipeline_job_display_name: The display name for the pipeline job.
  • template_path: The path to the compiled pipeline template.
  • pipeline_root: Specify a Cloud Storage URI that your pipelines service account can access. The artifacts of your pipeline runs are stored within the pipeline root.
  • parameter_values: The pipeline parameters to pass to this run. For example, create a dict() with the parameter names as the dictionary keys and the parameter values as the dictionary values.
  • project: The Google Cloud project to run the pipeline in. You can find your project ID on the Google Cloud console welcome page.
  • location: The location to run the pipeline in. This should be the same location as the TensorBoard instance you're using.
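Putting it together, a call with hypothetical placeholder values might look like this:

log_pipeline_job_to_experiment_sample(
    experiment_name="my-experiment",
    pipeline_job_display_name="tensorboard-pipeline-run",
    template_path="tensorboard-pipeline-sample.json",
    pipeline_root="gs://my-staging-bucket/pipeline_root",
    project="my-project",
    location="us-central1",
)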
