Use a private IP for Vertex AI serverless training
Using private IP to connect to your training jobs provides more network security and lower network latency than using public IP. To use private IP, you use Virtual Private Cloud (VPC) to peer your network with any type of Vertex AI serverless training job. This allows your training code to access private IP addresses inside your Google Cloud or on-premises networks.
This guide shows how to run serverless training jobs in your network after you have already set up VPC Network Peering to peer your network with a Vertex AI CustomJob, HyperparameterTuningJob, or custom TrainingPipeline resource.
Overview
Before you submit a serverless training job using private IP, you must configure private services access to create peering connections between your network and Vertex AI. If you have already set this up, you can use your existing peering connections.
This guide covers the following tasks:
- Understand which IP ranges to reserve for serverless training.
- Verify the status of your existing peering connections.
- Perform Vertex AI serverless training on your network.
- Check for active training occurring on one network before training on another network.
- Test that your training code can access private IPs in your network.
Reserve IP ranges for serverless training
When you reserve an IP range for service producers, the range can be used byVertex AI and other services. This table shows the maximum numberof parallel training jobs that you can run with reserved ranges from /16 to /18,assuming the range is used almost exclusively by Vertex AI. If youconnect with other service producers using the same range, allocate a largerrange to accommodate them, in order to avoid IP exhaustion.
| Machine configuration for training job | Reserved range | Maximum number of parallel jobs |
|---|---|---|
| Up to 8 nodes. For example: 1 primary replica in the first worker pool, 6 replicas in the second worker pool, and 1 worker in the third worker pool (to act as a parameter server) | /16 | 63 |
| | /17 | 31 |
| | /18 | 15 |
| Up to 16 nodes. For example: 1 primary replica in the first worker pool, 14 replicas in the second worker pool, and 1 worker in the third worker pool (to act as a parameter server) | /16 | 31 |
| | /17 | 15 |
| | /18 | 7 |
| Up to 32 nodes. For example: 1 primary replica in the first worker pool, 30 replicas in the second worker pool, and 1 worker in the third worker pool (to act as a parameter server) | /16 | 15 |
| | /17 | 7 |
| | /18 | 3 |
Learn more about configuring worker pools for distributed training.
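If you haven't set up the reservation and peering connection yet, the following is a minimal sketch of doing so with gcloud, assuming a /16 range; the range name google-managed-services-range is just an example name, and the prefix length should match the table above.

```
# Reserve a /16 range for service producers (the range name is an example).
gcloud compute addresses create google-managed-services-range \
  --global \
  --purpose=VPC_PEERING \
  --prefix-length=16 \
  --network=NETWORK_NAME

# Create the private services access connection that Vertex AI uses.
gcloud services vpc-peerings connect \
  --service=servicenetworking.googleapis.com \
  --ranges=google-managed-services-range \
  --network=NETWORK_NAME
```

The private services access setup linked in the Overview covers these steps in more detail.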
Check the status of existing peering connections
If you have existing peering connections that you use with Vertex AI, you can list them to check their status:

```
gcloud compute networks peerings list --network NETWORK_NAME
```

You should see that the state of your peering connections is ACTIVE. Learn more about active peering connections.
Perform serverless training
When you perform serverless training, you must specify the name of the network that you want Vertex AI to have access to.

Depending on how you perform serverless training, specify the network in one of the following API fields:
- If you are creating a CustomJob, specify the CustomJob.jobSpec.network field. If you are using the Google Cloud CLI, then you can use the --config flag on the gcloud ai custom-jobs create command to specify the network field. Learn more about creating a CustomJob.
- If you are creating a HyperparameterTuningJob, specify the HyperparameterTuningJob.trialJobSpec.network field. If you are using the gcloud CLI, then you can use the --config flag on the gcloud ai hp-tuning-jobs create command to specify the network field. Learn more about creating a HyperparameterTuningJob.
- If you are creating a TrainingPipeline without hyperparameter tuning, specify the TrainingPipeline.trainingTaskInputs.network field. Learn more about creating a custom TrainingPipeline.
- If you are creating a TrainingPipeline with hyperparameter tuning, specify the TrainingPipeline.trainingTaskInputs.trialJobSpec.network field.
If you don't specify a network name, then Vertex AI runs your serverless training without a peering connection, and without access to private IPs in your project.
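To make the field placement concrete, here is a hedged sketch of creating a CustomJob by calling the REST API directly; the placeholders mirror the gcloud example in the next section, and the single worker pool shown is only a minimal illustration.

```
# Sketch of a CustomJob created through the REST API. The request body shows
# where the network field sits (inside jobSpec); placeholders are as in the
# gcloud example below.
cat <<EOF > request.json
{
  "displayName": "JOB_NAME",
  "jobSpec": {
    "workerPoolSpecs": [
      {
        "machineSpec": { "machineType": "MACHINE_TYPE" },
        "replicaCount": 1,
        "pythonPackageSpec": {
          "executorImageUri": "PYTHON_PACKAGE_EXECUTOR_IMAGE_URI",
          "packageUris": ["PYTHON_PACKAGE_URIS"],
          "pythonModule": "PYTHON_MODULE"
        }
      }
    ],
    "network": "projects/PROJECT_NUMBER/global/networks/NETWORK_NAME"
  }
}
EOF

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/customJobs"
```

The same jobSpec.network value is what the --config file supplies in the gcloud example below.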
Example: Creating a CustomJob with the gcloud CLI

The following example shows how to specify a network when you use the gcloud CLI to run a CustomJob that uses a prebuilt container. If you perform serverless training in a different way, add the network field as described for the type of serverless training job you're using.
1. Create a config.yaml file to specify the network. If you're using Shared VPC, use your VPC host project number. Make sure the network name is formatted correctly:

```
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")

cat <<EOF > config.yaml
network: projects/$PROJECT_NUMBER/global/networks/NETWORK_NAME
EOF
```

2. Create a training application to run on Vertex AI.

3. Create the CustomJob, passing in your config.yaml file:

```
gcloud ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --python-package-uris=PYTHON_PACKAGE_URIS \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,executor-image-uri=PYTHON_PACKAGE_EXECUTOR_IMAGE_URI,python-module=PYTHON_MODULE \
  --config=config.yaml
```
To learn how to replace the placeholders in this command, read Creating custom training jobs.
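After the job is created, you can follow it with commands along these lines; JOB_ID stands for the numeric job ID that the create command prints, and LOCATION is the same region you used above.

```
# Stream the job's logs while it runs.
gcloud ai custom-jobs stream-logs JOB_ID --region=LOCATION

# Inspect the job, including the network recorded in its job spec.
gcloud ai custom-jobs describe JOB_ID --region=LOCATION
```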
Run jobs on different networks
You can't perform serverless training on a new network while you are still performing serverless training on another network. Before you switch to a different network, you must wait for all submitted CustomJob, HyperparameterTuningJob, and custom TrainingPipeline resources to finish, or you must cancel them.
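For example, before switching networks you might check for unfinished jobs with something like the following; the state filter is a sketch, and the same idea applies to gcloud ai hp-tuning-jobs and to training pipelines.

```
# List custom jobs in a region that are still running (filter is illustrative).
gcloud ai custom-jobs list \
  --region=LOCATION \
  --filter="state=JOB_STATE_RUNNING"

# Cancel a job that you don't want to wait for.
gcloud ai custom-jobs cancel JOB_ID --region=LOCATION
```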
Test training job access
This section explains how to test that a serverless training resource can access private IPs in your network.
- Create a Compute Engine instance in your VPC network (a sketch of these steps appears after this list).
- Check your firewall rules to make sure that they don't restrict ingress from the Vertex AI network. If they do, add a rule to ensure that the Vertex AI network can access the IP range you reserved for Vertex AI (and other service producers).
- Set up a local server on the VM instance to create an endpoint for a Vertex AI CustomJob to access.
- Create a Python training application to run on Vertex AI. Instead of model training code, create code that accesses the endpoint you set up in the previous step.
- Follow the previous example to create a CustomJob.
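The following is a hedged sketch of these steps; the VM name, zone, port, and the 10.8.0.0/16 source range are assumptions for the example, so substitute the range you actually reserved for service producers.

```
# Create a test VM in your VPC network (name, zone, and subnet are example choices).
gcloud compute instances create test-endpoint-vm \
  --zone=us-central1-a \
  --network=NETWORK_NAME \
  --subnet=SUBNET_NAME

# Allow ingress on the test port from the range reserved for Vertex AI and
# other service producers (10.8.0.0/16 is a placeholder for your reserved range).
gcloud compute firewall-rules create allow-vertex-training-ingress \
  --network=NETWORK_NAME \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:8000 \
  --source-ranges=10.8.0.0/16

# Start a simple local server on the VM to act as the endpoint.
gcloud compute ssh test-endpoint-vm --zone=us-central1-a \
  --command="nohup python3 -m http.server 8000 > /dev/null 2>&1 &"

# Look up the VM's private IP; your training application can then replace model
# training with a reachability check such as: curl --fail http://PRIVATE_IP:8000/
gcloud compute instances describe test-endpoint-vm \
  --zone=us-central1-a \
  --format="value(networkInterfaces[0].networkIP)"
```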
Common problems
This section lists some common issues for configuring VPC Network Peering with Vertex AI.

When you configure Vertex AI to use your network, specify the full network name:
"projects/YOUR_PROJECT_NUMBER/global/networks/YOUR_NETWORK_NAME"
Make sure you are not performing serverless training on a network before performing serverless training on a different network.

Make sure that you've allocated a sufficient IP range for all service producers your network connects to, including Vertex AI.
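One way to review what is currently allocated is to list the global addresses reserved for VPC peering; the format expression here is just one way to display them.

```
# List the ranges reserved for peering with service producers.
gcloud compute addresses list \
  --global \
  --filter="purpose=VPC_PEERING" \
  --format="table(name, network, address, prefixLength)"
```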
For additional troubleshooting information, refer to the VPC Network Peering troubleshooting guide.
What's next
- Learn more about VPC Network Peering.
- See reference architectures and best practices for VPC design.