Choose a Vertex AI serverless training method

If you're writing your own training code instead of using AutoML, there are several ways of running Vertex AI serverless training to consider. This document provides a brief overview and comparison of the different ways you can run serverless training.

Serverless training resources on Vertex AI

There are three types of Vertex AI resources you can create to train custom models on Vertex AI:

  • Custom jobs
  • Hyperparameter tuning jobs
  • Training pipelines

When you create a custom job, you specify settings that Vertex AI needs to run your training code, including:

  • One worker pool for single-node training, or multiple worker pools for distributed training
  • Optional settings, such as job scheduling and a custom service account

Within the worker pool(s), you can specify the following settings:

  • The machine type and accelerators
  • The type of training code the worker pool runs: either a Python training application or a custom container
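For example, here's a minimal sketch that uses the Vertex AI SDK for Python to create and run a custom job with a single worker pool. The project ID, bucket, and container image URI are placeholders; replace them with your own values.

```python
from google.cloud import aiplatform

# Placeholder values: replace with your own project, region, and bucket.
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# A single worker pool: one replica running a custom container.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",
            "accelerator_type": "NVIDIA_TESLA_T4",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            # Placeholder image containing your training code and dependencies.
            "image_uri": "us-docker.pkg.dev/my-project/my-repo/my-training-image:latest",
        },
    }
]

job = aiplatform.CustomJob(
    display_name="my-custom-job",
    worker_pool_specs=worker_pool_specs,
)

# Blocks until the job completes.
job.run()
```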

Hyperparameter tuning jobs have additional settings to configure, such as the metric to optimize and the parameters to search over. Learn more about hyperparameter tuning.
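As an illustration, the following sketch uses the Vertex AI SDK for Python to wrap a custom job in a hyperparameter tuning job. It assumes your training code reads the tuned values from command-line flags and reports a metric named accuracy (for example, with the cloudml-hypertune package); the image URI and metric name are placeholders.

```python
from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-staging-bucket")

# The custom job that each trial runs. Placeholder container image.
trial_job = aiplatform.CustomJob(
    display_name="my-trial-job",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "container_spec": {
            "image_uri": "us-docker.pkg.dev/my-project/my-repo/my-training-image:latest",
        },
    }],
)

hpt_job = aiplatform.HyperparameterTuningJob(
    display_name="my-hpt-job",
    custom_job=trial_job,
    # Metric name your training code reports, and the goal.
    metric_spec={"accuracy": "maximize"},
    # Search space: each key matches a command-line flag your code accepts.
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-5, max=1e-1, scale="log"),
        "batch_size": hpt.DiscreteParameterSpec(values=[32, 64, 128], scale=None),
    },
    max_trial_count=16,
    parallel_trial_count=4,
)

hpt_job.run()
```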

A training pipeline orchestrates serverless training jobs or hyperparameter tuning jobs with additional steps, such as loading a dataset or uploading the model to Vertex AI after the training job is successfully completed.
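One way to create a training pipeline is with the Vertex AI SDK for Python, whose CustomTrainingJob class creates a TrainingPipeline resource that runs your script and then uploads the trained model. A minimal sketch; the script path and container URIs below are illustrative placeholders that you would adapt to your framework and version.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-staging-bucket")

# Creates a TrainingPipeline that runs the script in a prebuilt training
# container, then uploads the resulting model to Vertex AI.
pipeline = aiplatform.CustomTrainingJob(
    display_name="my-training-pipeline",
    script_path="trainer/task.py",  # your local training script
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-12:latest",
    model_serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"
    ),
)

# Returns a Vertex AI Model resource when training succeeds.
model = pipeline.run(
    model_display_name="my-model",
    replica_count=1,
    machine_type="n1-standard-4",
)
```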

Figure: Serverless training resources on Vertex AI.

To view existing training pipelines in your project, go to the Training pipelines page in the Vertex AI section of the Google Cloud console.


Note: The Training pipelines page shows AutoML training pipelines, in addition to serverless training pipelines. You can use the Model type column to distinguish between the two.

To view existing custom jobs in your project, go to the Custom jobs page.


To view existing hyperparameter tuning jobs in your project, go to the Hyperparameter tuning page.


Prebuilt and custom containers

Before you submit a serverless training job, hyperparameter tuning job, or training pipeline to Vertex AI, you need to create a Python training application or a custom container to define the training code and dependencies you want to run on Vertex AI. If you create a Python training application using TensorFlow, PyTorch, scikit-learn, or XGBoost, you can use our prebuilt containers to run your code. If you're not sure which of these options to choose, refer to the training code requirements to learn more.
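To make the distinction concrete, here's a sketch of a custom job that uses a prebuilt container: instead of supplying a container_spec, you point a python_package_spec at your packaged training code and a prebuilt executor image. The executor image, package URI, and module name below are illustrative placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-staging-bucket")

job = aiplatform.CustomJob(
    display_name="my-prebuilt-container-job",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        # Prebuilt container: Vertex AI installs your Python package into the
        # prebuilt image and runs the named module as the entry point.
        "python_package_spec": {
            "executor_image_uri":
                "us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13:latest",
            "package_uris": ["gs://my-staging-bucket/trainer-0.1.tar.gz"],
            "python_module": "trainer.task",
            "args": ["--epochs", "10"],
        },
    }],
)

job.run()
```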

Distributed training

You can configure a serverless training job, hyperparameter tuning job, or training pipeline for distributed training by specifying multiple worker pools:

  • Use your first worker pool to configure your primary replica, and set the replica count to 1.
  • Add more worker pools to configure worker replicas, parameter server replicas, or evaluator replicas, if your machine learning framework supports these additional cluster tasks for distributed training (see the sketch after this list).
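The following sketch shows what this looks like with the Vertex AI SDK for Python: a first pool for the primary replica and a second pool of workers. It assumes a custom container whose framework handles the cluster roles (for example, TensorFlow's MultiWorkerMirroredStrategy, which reads the cluster configuration that Vertex AI sets on each replica); the image URI is a placeholder.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-staging-bucket")

# Placeholder image; its training code must support multi-worker training.
image_uri = "us-docker.pkg.dev/my-project/my-repo/my-training-image:latest"

job = aiplatform.CustomJob(
    display_name="my-distributed-job",
    worker_pool_specs=[
        {   # First pool: the primary replica; replica count must be 1.
            "machine_spec": {"machine_type": "n1-standard-8"},
            "replica_count": 1,
            "container_spec": {"image_uri": image_uri},
        },
        {   # Second pool: additional worker replicas.
            "machine_spec": {"machine_type": "n1-standard-8"},
            "replica_count": 3,
            "container_spec": {"image_uri": image_uri},
        },
    ],
)

job.run()
```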

Learn more about using distributed training.

What's next
