CustomJobSpec

Represents the spec of a CustomJob.

Fields
persistentResourceId string

Optional. The ID of the PersistentResource in the same Project and Location in which to run the job.

If this is specified, the job will be run on existing machines held by the PersistentResource instead of on-demand short-lived machines. The network and CMEK configs on the job should be consistent with those on the PersistentResource; otherwise, the job will be rejected.

workerPoolSpecs[] object (WorkerPoolSpec)

Required. The spec of the worker pools including machine type and Docker image. All worker pools except the first one are optional and can be skipped by providing an empty value.

scheduling object (Scheduling)

Scheduling options for a CustomJob.

serviceAccount string

Specifies the service account to be used as the workload run-as account. Users submitting jobs must have act-as permission on this run-as account. If unspecified, the Vertex AI Custom Code Service Agent for the CustomJob's project is used.

network string

Optional. The full name of the Compute Engine network to which the Job should be peered. For example, projects/12345/global/networks/myVPC. The format is projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name.

To specify this field, you must have already configured VPC Network Peering for Vertex AI.

If this field is left unspecified, the job is not peered with any network.

reservedIpRanges[] string

Optional. A list of names for the reserved IP ranges under the VPC network that can be used for this job.

If set, we will deploy the job within the provided IP ranges. Otherwise, the job will be deployed to any IP ranges under the provided VPC network.

Example: ['vertex-ai-ip-range'].

pscInterfaceConfig object (PscInterfaceConfig)

Optional. Configuration for Private Service Connect interface (PSC-I) for the CustomJob.

baseOutputDirectory object (GcsDestination)

The Cloud Storage location to store the output of this CustomJob or HyperparameterTuningJob. For HyperparameterTuningJob, the baseOutputDirectory of each child CustomJob backing a Trial is set to a subdirectory named after the trial id under its parent HyperparameterTuningJob's baseOutputDirectory.

The following Vertex AI environment variables will be passed to containers or Python modules when this field is set (a usage sketch follows the lists below):

For CustomJob:

  • AIP_MODEL_DIR = <baseOutputDirectory>/model/
  • AIP_CHECKPOINT_DIR = <baseOutputDirectory>/checkpoints/
  • AIP_TENSORBOARD_LOG_DIR = <baseOutputDirectory>/logs/

For CustomJob backing a Trial of HyperparameterTuningJob:

  • AIP_MODEL_DIR = <baseOutputDirectory>/<trial_id>/model/
  • AIP_CHECKPOINT_DIR = <baseOutputDirectory>/<trial_id>/checkpoints/
  • AIP_TENSORBOARD_LOG_DIR = <baseOutputDirectory>/<trial_id>/logs/
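
Training code can read these locations at runtime. A minimal sketch in Python, assuming the job was submitted with baseOutputDirectory set; the local fallback paths are hypothetical defaults for running outside Vertex AI:

import os

# Output locations injected by Vertex AI when baseOutputDirectory is set.
# The fallback values are hypothetical defaults for local runs.
model_dir = os.environ.get("AIP_MODEL_DIR", "/tmp/model/")
checkpoint_dir = os.environ.get("AIP_CHECKPOINT_DIR", "/tmp/checkpoints/")
tensorboard_log_dir = os.environ.get("AIP_TENSORBOARD_LOG_DIR", "/tmp/logs/")

print(model_dir, checkpoint_dir, tensorboard_log_dir)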

protectedArtifactLocationId string

The id of the location to store protected artifacts, e.g. us-central1. Populate only when the location is different from the CustomJob location. List of supported locations: https://cloud.google.com/vertex-ai/docs/general/locations

tensorboard string

Optional. The name of a Vertex AI Tensorboard resource to which this CustomJob will upload Tensorboard logs. Format: projects/{project}/locations/{location}/tensorboards/{tensorboard}

enableWebAccess boolean

Optional. Whether you want Vertex AI to enable interactive shell access to training containers.

If set to true, you can access interactive shells at the URIs given by CustomJob.web_access_uris or Trial.web_access_uris (within HyperparameterTuningJob.trials).

enableDashboardAccess boolean

Optional. Whether you want Vertex AI to enable access to the customized dashboard in the training chief container.

If set to true, you can access the dashboard at the URIs given by CustomJob.web_access_uris or Trial.web_access_uris (within HyperparameterTuningJob.trials).

experiment string

Optional. The Experiment associated with this job. Format: projects/{project}/locations/{location}/metadataStores/{metadataStores}/contexts/{experiment-name}

experimentRun string

Optional. The Experiment Run associated with this job. Format: projects/{project}/locations/{location}/metadataStores/{metadataStores}/contexts/{experiment-name}-{experiment-run-name}

models[] string

Optional. The name of the Model resources for which to generate a mapping to artifact URIs. Applicable only to some of the Google-provided custom jobs. Format: projects/{project}/locations/{location}/models/{model}

In order to retrieve a specific version of the model, also provide the version id or version alias. Example: projects/{project}/locations/{location}/models/{model}@2 or projects/{project}/locations/{location}/models/{model}@golden. If no version id or alias is specified, the "default" version will be returned. The "default" version alias is created for the first version of the model, and can be moved to other versions later on. There will be exactly one default version.

JSON representation
{"persistentResourceId":string,"workerPoolSpecs":[{object (WorkerPoolSpec)}],"scheduling":{object (Scheduling)},"serviceAccount":string,"network":string,"reservedIpRanges":[string],"pscInterfaceConfig":{object (PscInterfaceConfig)},"baseOutputDirectory":{object (GcsDestination)},"protectedArtifactLocationId":string,"tensorboard":string,"enableWebAccess":boolean,"enableDashboardAccess":boolean,"experiment":string,"experimentRun":string,"models":[string]}

WorkerPoolSpec

Represents the spec of a worker pool in a job.

Fields
machineSpec object (MachineSpec)

Optional. Immutable. The specification of a single machine.

replicaCount string (int64 format)

Optional. The number of worker replicas to use for this worker pool.

nfsMounts[] object (NfsMount)

Optional. List of NFS mount specs.

lustreMounts[] object (LustreMount)

Optional. List of Lustre mounts.

diskSpec object (DiskSpec)

Disk spec.

task Union type
The custom task to be executed in this worker pool. task can be only one of the following:
containerSpec object (ContainerSpec)

The custom container task.

pythonPackageSpec object (PythonPackageSpec)

The Python packaged task.

JSON representation
{"machineSpec":{object (MachineSpec)},"replicaCount":string,"nfsMounts":[{object (NfsMount)}],"lustreMounts":[{object (LustreMount)}],"diskSpec":{object (DiskSpec)},// task"containerSpec":{object (ContainerSpec)},"pythonPackageSpec":{object (PythonPackageSpec)}// Union type}

ContainerSpec

The spec of a Container.

Fields
imageUri string

Required. The URI of a container image in the Artifact Registry that is to be run on each worker replica.

command[] string

The command to be invoked when the container is started. It overrides the ENTRYPOINT instruction in the Dockerfile when provided.

args[] string

The arguments to be passed when starting the container.

env[] object (EnvVar)

Environment variables to be passed to the container. Maximum limit is 100.

JSON representation
{"imageUri":string,"command":[string],"args":[string],"env":[{object (EnvVar)}]}

PythonPackageSpec

The spec of a Python packaged code.

Fields
executorImageUri string

Required. The URI of a container image in Artifact Registry that will run the provided Python package. Vertex AI provides a wide range of executor images with pre-installed packages to meet users' various use cases. See the list of pre-built containers for training. You must use an image from this list.

packageUris[] string

Required. The Google Cloud Storage location of the Python package files which are the training program and its dependent packages. The maximum number of package URIs is 100.

pythonModule string

Required. The Python module name to run after installing the packages.

args[] string

Command line arguments to be passed to the Python task.

env[] object (EnvVar)

Environment variables to be passed to the Python module. Maximum limit is 100.

JSON representation
{"executorImageUri":string,"packageUris":[string],"pythonModule":string,"args":[string],"env":[{object (EnvVar)}]}

LustreMount

Represents a mount configuration for a Lustre file system.

Fields
instanceIp string

Required. IP address of the Lustre instance.

volumeHandle string

Required. The unique identifier of the Lustre volume.

filesystem string

Required. The name of the Lustre filesystem.

mountPoint string

Required. Destination mount path. The Lustre file system will be mounted for the user under /mnt/lustre/

JSON representation
{"instanceIp":string,"volumeHandle":string,"filesystem":string,"mountPoint":string}

Scheduling

All parameters related to queuing and scheduling of custom jobs.

Fields
timeout string (Duration format)

Optional. The maximum job running time. The default is 7 days.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

restartJobOnWorkerRestart boolean

Optional. Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.

strategy enum (Strategy)

Optional. This determines which type of scheduling strategy to use.

disableRetries boolean

Optional. Indicates if the job should retry for internal errors after the job starts running. If true, overrides Scheduling.restart_job_on_worker_restart to false.

maxWaitDuration string (Duration format)

Optional. This is the maximum duration that a job will wait for the requested resources to be provisioned if the scheduling strategy is set to Strategy.DWS_FLEX_START. If set to 0, the job will wait indefinitely. The default is 24 hours.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

JSON representation
{"timeout":string,"restartJobOnWorkerRestart":boolean,"strategy":enum (Strategy),"disableRetries":boolean,"maxWaitDuration":string}

Strategy

Optional. This determines which type of scheduling strategy to use. Currently, users have two options: STANDARD, which uses regular on-demand resources to schedule the job, and SPOT, which leverages spot resources along with regular resources to schedule the job.

Enums
STRATEGY_UNSPECIFIED Strategy will default to STANDARD.
ON_DEMAND

Deprecated. Regular on-demand provisioning strategy.

LOW_COST

Deprecated. Low cost by making potential use of spot resources.

STANDARD Standard provisioning strategy uses regular on-demand resources.
SPOT Spot provisioning strategy uses spot resources.
FLEX_START Flex Start strategy uses DWS to queue for resources.
