Create a persistent resource
When you create a persistent resource, the training service first finds resources from the Compute Engine resource pool based on the specifications you provided, and then provisions a long-running cluster for you. This page shows you how to create a persistent resource for running your serverless training jobs by using the Google Cloud console, Google Cloud CLI, and the REST API.
Required roles
To get the permission that you need to create a persistent resource, ask your administrator to grant you the Vertex AI Administrator (roles/aiplatform.admin) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the aiplatform.persistentResources.create permission, which is required to create a persistent resource.
You might also be able to get this permission with custom roles or other predefined roles.
Create a persistent resource
Select one of the following tabs for instructions on how to create a persistentresource.
Console
To create a persistent resource by using the Google Cloud console, do the following:
In the Google Cloud console, go to the Persistent resources page.
Click Create cluster.
Configure the cluster as follows:
- Name: Enter a name for the cluster.
- Description: (Optional) Enter a description of the cluster.
- Region: Select the region where you want to create the cluster.
Click Continue.
Configure the compute resources for the cluster as follows:
- Click Worker pool 1.
Select the tab of the machine family that you want to use and configure the worker pool as follows:
General purpose
General purpose VMs offer the best price-performance ratio for a variety of workloads.
- Series: Select a machine series.
- Machine type: Select a machine type.
- Disk type: Select Standard disk or SSD disk.
- Disk size: Enter the size of the disk you want.
- Minimum replica count: Enter the minimum number of replicas to have in the worker pool.
- Maximum replica count: (Optional) Enter the maximum number of replicas allowed in the worker pool. If specified, the worker pool automatically scales the number of replicas up to the configured maximum replica count as needed.
Compute optimized
Compute-optimized VMs offer the highest performance per core and are optimized for compute-intensive workloads.
- Series: Select a machine series.
- Machine type: Select a machine type.
- Disk type: Select Standard disk or SSD disk.
- Disk size: Enter the size of the disk you want.
- Minimum replica count: Enter the minimum number of replicas to have in the worker pool.
- Maximum replica count: (Optional) Enter the maximum number of replicas allowed in the worker pool. If specified, the worker pool automatically scales the number of replicas up to the configured maximum replica count as needed.
Memory optimized
Memory-optimized VMs are ideal for memory-intensive workloads, offering more memory per core than other machine families, with up to 12 TB of memory.
- Series: Select a machine series.
- Machine type: Select a machine type.
- Disk type: Select Standard disk or SSD disk.
- Disk size: Enter the size of the disk you want.
- Minimum replica count: Enter the minimum number of replicas to have in the worker pool.
- Maximum replica count: (Optional) Enter the maximum number of replicas allowed in the worker pool. If specified, the worker pool automatically scales the number of replicas up to the configured maximum replica count as needed.
GPUs
These accelerator-optimized VMs are ideal for massively parallelized Compute Unified Device Architecture (CUDA) compute workloads, such as machine learning (ML) and high performance computing (HPC). This family is the best option for workloads that require GPUs.
- GPU type: Select the type of GPU that you want to use.
- Number of GPUs: Enter the number of GPUs you want to use.
- Series: Select a machine series.
- Machine type: Select a machine type.
- Disk type: Select Standard disk or SSD disk.
- Disk size: Enter the size of the disk you want.
- Minimum replica count: Enter the minimum number of replicas to have in the worker pool.
- Maximum replica count: (Optional) Enter the maximum number of replicas allowed in the worker pool. If specified, the worker pool automatically scales the number of replicas up to the configured maximum replica count as needed.
Click Done.
(Optional) To add additional worker pools, click Add worker pool.
Click Create.
gcloud
A persistent resource can have one or more resource pools. To create multiple resource pools in a persistent resource, specify multiple --resource-pool-spec flags.
Each resource pool can have autoscaling either enabled or disabled. To enable autoscaling, specify min_replica_count and max_replica_count.
You can specify all resource pool configurations as part of the command line or use the --config flag to specify the path to a YAML file that contains the configurations.
Before using any of the command data below, make the following replacements:
- PROJECT_ID: The Project ID of the Google Cloud project where you want to create the persistent resource.
- LOCATION: The region where you want to create the persistent resource. For a list of supported regions, see Feature availability.
- PERSISTENT_RESOURCE_ID: The ID of the persistent resource.
- DISPLAY_NAME: (Optional) The display name of the persistent resource.
- MACHINE_TYPE: The type of VM to use. For a list of supported VMs, see Machine types. This field corresponds to the machineSpec.machineType field in the ResourcePool API message.
- ACCELERATOR_TYPE: (Optional) The type of GPU to attach to each VM in the resource pool. For a list of supported GPUs, see GPUs. This field corresponds to the machineSpec.acceleratorType field in the ResourcePool API message.
- ACCELERATOR_COUNT: (Optional) The number of GPUs to attach to each VM in the resource pool. The default value is 1. This field corresponds to the machineSpec.acceleratorCount field in the ResourcePool API message.
- REPLICA_COUNT: The number of replicas to create when creating this resource pool. This field corresponds to the replicaCount field in the ResourcePool API message. This field is required if you're not specifying MIN_REPLICA_COUNT and MAX_REPLICA_COUNT.
- MIN_REPLICA_COUNT: (Optional) The minimum number of replicas that autoscaling can scale down to for this resource pool. Both MIN_REPLICA_COUNT and MAX_REPLICA_COUNT are required to enable autoscaling on this resource pool.
- MAX_REPLICA_COUNT: (Optional) The maximum number of replicas that autoscaling can scale up to for this resource pool. Both MIN_REPLICA_COUNT and MAX_REPLICA_COUNT are required to enable autoscaling on this resource pool.
- BOOT_DISK_TYPE: (Optional) The type of disk to use as the boot disk of each VM in the resource pool. This field corresponds to the diskSpec.bootDiskType field in the ResourcePool API message. Acceptable values are pd-standard (default) and pd-ssd.
- BOOT_DISK_SIZE_GB: (Optional) The disk size in GiB for the boot disk of each VM in the resource pool. Acceptable values are 100 (default) to 64000. This field corresponds to the diskSpec.bootDiskSizeGb field in the ResourcePool API message.
- CONFIG: Path to the persistent resource YAML configuration file. This file should contain a list of ResourcePool. If an option is specified in both the configuration file and the command-line arguments, the command-line arguments override the configuration file. Note that keys with underscores are invalid.
Example YAML configuration file:
resourcePoolSpecs:
  machineSpec:
    machineType: n1-standard-4
  replicaCount: 1
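If a pool also needs GPUs, a custom boot disk, or autoscaling, those settings can go in the configuration file too. The following is a sketch only, assuming the file accepts the same camelCase field names as the ResourcePool API message referenced in the variable list above (machineSpec.acceleratorType, diskSpec.bootDiskType, and so on); verify the keys against your version of the gcloud CLI.

resourcePoolSpecs:
  machineSpec:
    machineType: n1-standard-8        # example machine type
    acceleratorType: NVIDIA_TESLA_T4  # example GPU type
    acceleratorCount: 1
  diskSpec:
    bootDiskType: pd-ssd
    bootDiskSizeGb: 200
  autoscalingSpec:                    # assumed key; mirrors autoscaling_spec in the REST request body
    minReplicaCount: 1
    maxReplicaCount: 4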
Execute the following command:
Linux, macOS, or Cloud Shell
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai persistent-resources create \
  --persistent-resource-id=PERSISTENT_RESOURCE_ID \
  --display-name=DISPLAY_NAME \
  --project=PROJECT_ID \
  --region=LOCATION \
  --resource-pool-spec="replica-count=REPLICA_COUNT,min-replica-count=MIN_REPLICA_COUNT,max-replica-count=MAX_REPLICA_COUNT,machine-type=MACHINE_TYPE,accelerator-type=ACCELERATOR_TYPE,accelerator-count=ACCELERATOR_COUNT,disk-type=BOOT_DISK_TYPE,disk-size=BOOT_DISK_SIZE_GB"
Windows (PowerShell)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai persistent-resources create `
  --persistent-resource-id=PERSISTENT_RESOURCE_ID `
  --display-name=DISPLAY_NAME `
  --project=PROJECT_ID `
  --region=LOCATION `
  --resource-pool-spec="replica-count=REPLICA_COUNT,min-replica-count=MIN_REPLICA_COUNT,max-replica-count=MAX_REPLICA_COUNT,machine-type=MACHINE_TYPE,accelerator-type=ACCELERATOR_TYPE,accelerator-count=ACCELERATOR_COUNT,disk-type=BOOT_DISK_TYPE,disk-size=BOOT_DISK_SIZE_GB"
Windows (cmd.exe)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai persistent-resources create ^
  --persistent-resource-id=PERSISTENT_RESOURCE_ID ^
  --display-name=DISPLAY_NAME ^
  --project=PROJECT_ID ^
  --region=LOCATION ^
  --resource-pool-spec="replica-count=REPLICA_COUNT,min-replica-count=MIN_REPLICA_COUNT,max-replica-count=MAX_REPLICA_COUNT,machine-type=MACHINE_TYPE,accelerator-type=ACCELERATOR_TYPE,accelerator-count=ACCELERATOR_COUNT,disk-type=BOOT_DISK_TYPE,disk-size=BOOT_DISK_SIZE_GB"
You should receive a response similar to the following:
Using endpoint [https://us-central1-aiplatform.googleapis.com/]
Operation to create PersistentResource [projects/123456789012/locations/us-central1/persistentResources/mypersistentresource/operations/1234567890123456789] is submitted successfully.

You may view the status of your PersistentResource create operation with the command

  $ gcloud ai operations describe projects/sample-project/locations/us-central1/operations/1234567890123456789
Example gcloud command:
gcloud ai persistent-resources create \
  --persistent-resource-id=my-persistent-resource \
  --region=us-central1 \
  --resource-pool-spec="min-replica-count=4,max-replica-count=12,machine-type=n1-highmem-2,accelerator-type=NVIDIA_TESLA_T4,accelerator-count=1,disk-type=pd-standard,disk-size=200" \
  --resource-pool-spec="replica-count=4,machine-type=n1-standard-4"
Advanced gcloud configurations
If you want to specify configuration options that are not available in the preceding examples, you can use the --config flag to specify the path to a config.yaml file in your local environment that contains the fields of persistentResources. For example:
gcloud ai persistent-resources create \
  --persistent-resource-id=PERSISTENT_RESOURCE_ID \
  --project=PROJECT_ID \
  --region=LOCATION \
  --config=CONFIG
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
To create a persistent resource that you can use with a pipeline run, set the enable_custom_service_account parameter to True in the ResourceRuntimeSpec object while creating the persistent resource.

from google.cloud.aiplatform.preview import persistent_resource
from google.cloud.aiplatform_v1beta1.types.persistent_resource import ResourcePool
from google.cloud.aiplatform_v1beta1.types.machine_resources import MachineSpec

# Create the persistent resource. This method returns the created resource.
my_example_resource = persistent_resource.PersistentResource.create(
    persistent_resource_id='PERSISTENT_RESOURCE_ID',
    display_name='DISPLAY_NAME',
    resource_pools=[
        ResourcePool(
            machine_spec=MachineSpec(machine_type='MACHINE_TYPE'),
            replica_count=REPLICA_COUNT,
        )
    ],
    enable_custom_service_account=True,
)

# Setting `sync` to `False` makes the method non-blocking, and the resource
# object returned syncs when the method completes.
SYNC = False

if not SYNC:
    my_example_resource.wait()
Replace the following:
- PERSISTENT_RESOURCE_ID: A unique, user-defined ID for the persistent resource. It must start with a letter, end with a letter or number, and contain only lowercase letters, numbers, and hyphens (-).
- DISPLAY_NAME: Optional. The display name of the persistent resource.
- MACHINE_TYPE: The type of virtual machine (VM) to use. For a list of supported VMs, see Machine types. This field corresponds to the machineSpec.machineType field in the ResourcePool API message.
- REPLICA_COUNT: The number of replicas to create when creating this resource pool.
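If the resource pool needs GPUs, a custom boot disk, or autoscaling, you can populate the corresponding fields of the ResourcePool message before passing it to create(). The following is a minimal sketch, not an official sample: it assumes the v1beta1 DiskSpec type and the nested ResourcePool.AutoscalingSpec message, and uses example values (n1-standard-8, NVIDIA_TESLA_T4, my-gpu-persistent-resource) that you should replace with your own. Verify the field names against the SDK version you have installed.

from google.cloud.aiplatform.preview import persistent_resource
from google.cloud.aiplatform_v1beta1.types.persistent_resource import ResourcePool
from google.cloud.aiplatform_v1beta1.types.machine_resources import DiskSpec, MachineSpec

# Sketch: a GPU resource pool with an SSD boot disk and autoscaling between
# 1 and 4 replicas. The field names mirror the ResourcePool API message.
gpu_pool = ResourcePool(
    machine_spec=MachineSpec(
        machine_type='n1-standard-8',        # example machine type
        accelerator_type='NVIDIA_TESLA_T4',  # example GPU type
        accelerator_count=1,
    ),
    disk_spec=DiskSpec(
        boot_disk_type='pd-ssd',
        boot_disk_size_gb=200,
    ),
    autoscaling_spec=ResourcePool.AutoscalingSpec(
        min_replica_count=1,
        max_replica_count=4,
    ),
)

my_gpu_resource = persistent_resource.PersistentResource.create(
    persistent_resource_id='my-gpu-persistent-resource',  # example ID
    resource_pools=[gpu_pool],
)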
REST
A persistent resource can have one or more resource pools (machine_spec), and each resource pool can have autoscaling either enabled or disabled.
Before using any of the request data, make the following replacements:
- PROJECT_ID: The Project ID of the Google Cloud project where you want to create the persistent resource.
- LOCATION: The region where you want to create the persistent resource. For a list of supported regions, see Feature availability.
- PERSISTENT_RESOURCE_ID: The ID of the persistent resource.
- DISPLAY_NAME: (Optional) The display name of the persistent resource.
- MACHINE_TYPE: The type of VM to use. For a list of supported VMs, see Machine types. This field corresponds to the machineSpec.machineType field in the ResourcePool API message.
- ACCELERATOR_TYPE: (Optional) The type of GPU to attach to each VM in the resource pool. For a list of supported GPUs, see GPUs. This field corresponds to the machineSpec.acceleratorType field in the ResourcePool API message.
- ACCELERATOR_COUNT: (Optional) The number of GPUs to attach to each VM in the resource pool. The default value is 1. This field corresponds to the machineSpec.acceleratorCount field in the ResourcePool API message.
- REPLICA_COUNT: The number of replicas to create when creating this resource pool. This field corresponds to the replicaCount field in the ResourcePool API message. This field is required if you're not specifying MIN_REPLICA_COUNT and MAX_REPLICA_COUNT.
- MIN_REPLICA_COUNT: (Optional) The minimum number of replicas that autoscaling can scale down to for this resource pool. Both MIN_REPLICA_COUNT and MAX_REPLICA_COUNT are required to enable autoscaling on this resource pool.
- MAX_REPLICA_COUNT: (Optional) The maximum number of replicas that autoscaling can scale up to for this resource pool. Both MIN_REPLICA_COUNT and MAX_REPLICA_COUNT are required to enable autoscaling on this resource pool.
- BOOT_DISK_TYPE: (Optional) The type of disk to use as the boot disk of each VM in the resource pool. This field corresponds to the diskSpec.bootDiskType field in the ResourcePool API message. Acceptable values are pd-standard (default) and pd-ssd.
- BOOT_DISK_SIZE_GB: (Optional) The disk size in GiB for the boot disk of each VM in the resource pool. Acceptable values are 100 (default) to 64000. This field corresponds to the diskSpec.bootDiskSizeGb field in the ResourcePool API message.
HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/persistentResources?persistent_resource_id=PERSISTENT_RESOURCE_ID
Request JSON body:
{ "display_name": "DISPLAY_NAME", "resource_pools": [ { "machine_spec": { "machine_type": "MACHINE_TYPE", "accelerator_type": "ACCELERATOR_TYPE", "accelerator_count":ACCELERATOR_COUNT }, "replica_count":REPLICA_COUNT, "autoscaling_spec": { "min_replica_count":MIN_REPLICA_COUNT, "max_replica_count":MAX_REPLICA_COUNT }, "disk_spec": { "boot_disk_type": "BOOT_DISK_TYPE", "boot_disk_size_gb":BOOT_DISK_SIZE_GB } } ]}To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
cat > request.json << 'EOF'
{
  "display_name": "DISPLAY_NAME",
  "resource_pools": [
    {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT
      },
      "replica_count": REPLICA_COUNT,
      "autoscaling_spec": {
        "min_replica_count": MIN_REPLICA_COUNT,
        "max_replica_count": MAX_REPLICA_COUNT
      },
      "disk_spec": {
        "boot_disk_type": "BOOT_DISK_TYPE",
        "boot_disk_size_gb": BOOT_DISK_SIZE_GB
      }
    }
  ]
}
EOF

Then execute the following command to send your REST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/persistentResources?persistent_resource_id=PERSISTENT_RESOURCE_ID"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
@'{ "display_name": "DISPLAY_NAME", "resource_pools": [ { "machine_spec": { "machine_type": "MACHINE_TYPE", "accelerator_type": "ACCELERATOR_TYPE", "accelerator_count":ACCELERATOR_COUNT }, "replica_count":REPLICA_COUNT, "autoscaling_spec": { "min_replica_count":MIN_REPLICA_COUNT, "max_replica_count":MAX_REPLICA_COUNT }, "disk_spec": { "boot_disk_type": "BOOT_DISK_TYPE", "boot_disk_size_gb":BOOT_DISK_SIZE_GB } } ]}'@ | Out-File -FilePath request.json -Encoding utf8Then execute the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/persistentResources?persistent_resource_id=PERSISTENT_RESOURCE_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/123456789012/locations/us-central1/persistentResources/mypersistentresource/operations/1234567890123456789", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreatePersistentResourceOperationMetadata", "genericMetadata": { "createTime": "2023-02-08T21:17:15.009668Z", "updateTime": "2023-02-08T21:17:15.009668Z" } }}Resource stockout
Scarce resources, such as A100 GPUs, can be subject to stockouts, which can cause persistent resource creation to fail when no resource is available in the region you specified. In this case, you can try to reduce the number of replicas, change to a different accelerator type, try again during non-peak hours, or try another region.
Shared responsibility
Securing your workloads on Vertex AI is a shared responsibility. While Vertex AI regularly upgrades infrastructure configurations to address security vulnerabilities, Vertex AI doesn't automatically upgrade your existing Ray on Vertex AI clusters and persistent resources to avoid preempting running workloads. Therefore, you're responsible for tasks such as the following:
- Periodically delete and recreate your Ray on Vertex AI clusters and persistent resources to use the latest infrastructure versions. Vertex AI recommends recreating your clusters and persistent resources at least once every 30 days.
- Properly configure any custom images you use.
For more information, see Shared responsibility.
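As an illustration of the rotation recommendation above, the following Python sketch finds persistent resources older than 30 days, deletes them, and recreates them with the same configuration. It is a hedged example, not an official sample: it assumes the preview persistent_resource.PersistentResource class exposes list() and delete(), and that its gca_resource property carries the create_time, display_name, and resource_pools fields of the underlying API message. Confirm these against your installed SDK version, and remember that deleting a resource preempts any workloads running on it.

import datetime

from google.cloud import aiplatform
from google.cloud.aiplatform.preview import persistent_resource

aiplatform.init(project='PROJECT_ID', location='us-central1')  # example location

# Resources created before this cutoff are rotated.
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=30)

for resource in persistent_resource.PersistentResource.list():
    proto = resource.gca_resource  # assumed: the underlying API message
    if proto.create_time < cutoff:
        # Capture the existing configuration, then delete and recreate the
        # resource so that it picks up the latest infrastructure version.
        pools = list(proto.resource_pools)
        resource_id = resource.name.split('/')[-1]
        resource.delete()
        persistent_resource.PersistentResource.create(
            persistent_resource_id=resource_id,
            display_name=proto.display_name,
            resource_pools=pools,
        )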
What's next
- Run training jobs on a persistent resource.
- Learn about persistent resources.
- Get information about a persistent resource.
- Reboot a persistent resource.
- Delete a persistent resource.