Run Vertex AI serverless training jobs on a persistent resource
This page shows you how to run a serverless training job on a persistent resource by using the Google Cloud CLI, the Vertex AI SDK for Python, or the REST API.
Normally, when you create a serverless training job, you need to specify the compute resources that the job creates and runs on. After you create a persistent resource, you can instead configure the serverless training job to run on one or more resource pools of that persistent resource. Running a custom training job on a persistent resource significantly reduces the job startup time that's otherwise needed for compute resource creation.
Required roles
To get the permission that you need to run serverless training jobs on a persistent resource, ask your administrator to grant you the Vertex AI User (roles/aiplatform.user) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the aiplatform.customJobs.create permission, which is required to run serverless training jobs on a persistent resource.
You might also be able to get this permission with custom roles or other predefined roles.
Create a training job that runs on a persistent resource
To create a serverless training job that runs on a persistent resource, make the following modifications to the standard instructions for creating a serverless training job:
gcloud
- Specify the --persistent-resource-id flag and set the value to the ID of the persistent resource (PERSISTENT_RESOURCE_ID) that you want to use.
- Specify the --worker-pool-spec flag such that the values for machine-type and disk-type match exactly with a corresponding resource pool from the persistent resource. Specify one --worker-pool-spec for single node training and multiple for distributed training.
- Specify a replica-count less than or equal to the replica-count or max-replica-count of the corresponding resource pool.
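Putting these flags together, a minimal invocation might look like the following sketch. All uppercase values are placeholders you would replace, and the machine type is an assumption that must match a resource pool of your persistent resource:

```shell
# Sketch: create a custom job that runs on an existing persistent resource.
# machine-type must exactly match a resource pool of the persistent resource.
gcloud ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --persistent-resource-id=PERSISTENT_RESOURCE_ID \
  --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=CUSTOM_CONTAINER_IMAGE_URI
```

For distributed training, repeat the --worker-pool-spec flag once per worker pool.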
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
```python
from typing import Optional

from google.cloud import aiplatform


def create_custom_job_on_persistent_resource_sample(
    project: str,
    location: str,
    staging_bucket: str,
    display_name: str,
    container_uri: str,
    persistent_resource_id: str,
    service_account: Optional[str] = None,
) -> None:
    aiplatform.init(
        project=project, location=location, staging_bucket=staging_bucket
    )

    worker_pool_specs = [{
        "machine_spec": {
            "machine_type": "n1-standard-4",
            "accelerator_type": "NVIDIA_TESLA_K80",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": container_uri,
            "command": [],
            "args": [],
        },
    }]

    custom_job = aiplatform.CustomJob(
        display_name=display_name,
        worker_pool_specs=worker_pool_specs,
        persistent_resource_id=persistent_resource_id,
    )

    custom_job.run(service_account=service_account)
```

REST
- Specify the persistent_resource_id parameter and set the value to the ID of the persistent resource (PERSISTENT_RESOURCE_ID) that you want to use.
- Specify the worker_pool_specs parameter such that the values of machine_spec and disk_spec for each worker pool match exactly with a corresponding resource pool from the persistent resource. Specify one machine_spec for single node training and multiple for distributed training.
- Specify a replica_count less than or equal to the replica_count or max_replica_count of the corresponding resource pool, excluding the replica count of any other jobs running on that resource pool.
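As a sketch of how these parameters fit into a request, the following writes a minimal CustomJob request body and validates it before sending. All uppercase values are placeholders, and the exact field shapes (for example, whether replicaCount is sent as a number or a string) should be checked against the customJobs REST reference:

```shell
# Sketch: a minimal CustomJob request body targeting a persistent resource.
cat > request.json <<'EOF'
{
  "displayName": "JOB_NAME",
  "jobSpec": {
    "persistentResourceId": "PERSISTENT_RESOURCE_ID",
    "workerPoolSpecs": [
      {
        "machineSpec": {"machineType": "n1-standard-4"},
        "replicaCount": 1,
        "containerSpec": {"imageUri": "CUSTOM_CONTAINER_IMAGE_URI"}
      }
    ]
  }
}
EOF
# Validate the JSON locally before sending it.
python3 -m json.tool request.json > /dev/null && echo "request.json is valid"

# Then POST it to the customJobs endpoint (assumes gcloud credentials):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d @request.json \
#   "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/customJobs"
```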
What's next
- Learn about persistent resources.
- Create and use a persistent resource.
- Get information about a persistent resource.
- Reboot a persistent resource.
- Delete a persistent resource.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.