Use a rolling deployment to replace a deployed model
Preview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
In a rolling deployment, a deployed model is replaced with a new version of the same model. The new model reuses the compute resources from the previous one.
In the rolling deployment request, the traffic split and dedicatedResources values are the same as for the previous deployment. After the rolling deployment completes, the traffic split is updated to show that all of the traffic from the previous DeployedModel has migrated to the new deployment.
Other configurable fields in DeployedModel (such as serviceAccount, disableContainerLogging, and enableAccessLogging) are set to the same values as for the previous DeployedModel by default. However, you can optionally specify new values for these fields.
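For example, a minimal sketch of a deployModel request body that overrides one of these fields during a rolling deployment might look like the following; the placeholder values and the choice of disableContainerLogging as the overridden field are illustrative, not required:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "disableContainerLogging": true,
    "rolloutOptions": {
      "previousDeployedModel": "PREVIOUS_DEPLOYED_MODEL"
    }
  }
}

Any fields you leave out keep the values from the previous DeployedModel, as described above.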
When a model is deployed using a rolling deployment, a new DeployedModel is created. The new DeployedModel receives a new ID that is different from that of the previous one. It also receives a new revisionNumber value in the rolloutOptions field.
If there are multiple rolling deployments targeting the same backing resources, the DeployedModel with the highest revisionNumber is treated as the intended final state.
As the rolling deployment progresses, all the existing replicas for the previous DeployedModel are replaced with replicas of the new DeployedModel. This happens quickly, and replicas are updated whenever the deployment has enough available replicas or enough surge capacity to bring up additional replicas.
Additionally, as the rolling deployment progresses, the traffic for the old DeployedModel is gradually migrated to the new DeployedModel. The traffic is load-balanced in proportion to the number of ready-to-serve replicas of each DeployedModel.
If the rolling deployment's new replicas never become ready because their health route consistently returns a non-200 response code, traffic isn't sent to those unready replicas. In this case, the rolling deployment eventually fails, and the replicas are reverted to the previous DeployedModel.
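As a rough illustration of how the replica limits interact, suppose the previous DeployedModel runs 5 replicas and the rolling deployment uses the following rolloutOptions (the numbers are illustrative, and this assumes the limits are applied relative to the previous deployment's replica count, as described in the field definitions later on this page):

"rolloutOptions": {
  "previousDeployedModel": "PREVIOUS_DEPLOYED_MODEL",
  "maxSurgeReplicas": 1,
  "maxUnavailableReplicas": 1
}

With these settings, the rollout can run up to 6 replicas in total (5 plus 1 surge replica) and keeps at least 4 ready replicas (5 minus 1 unavailable) at any point while old replicas are replaced by new ones.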
Start a rolling deployment
To start a rolling deployment, include the rolloutOptions field in the model deployment request, as shown in the following example.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- MODEL_ID: The ID for the model to be deployed.
- PREVIOUS_DEPLOYED_MODEL: The DeployedModel ID of a model on the same endpoint. This specifies the DeployedModel whose backing resources are to be reused. You can call GetEndpoint to get a list of deployed models on an endpoint along with their numeric IDs.
- MAX_UNAVAILABLE_REPLICAS: The number of model replicas that can be taken down during the rolling deployment.
- MAX_SURGE_REPLICAS: The number of additional model replicas that can be brought up during the rolling deployment. If this is set to zero, then only the existing capacity is used.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel
Request JSON body:
{ "deployedModel": { "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID", "rolloutOptions": { "previousDeployedModel": "PREVIOUS_DEPLOYED_MODEL", "maxUnavailableReplicas": "MAX_UNAVAILABLE_REPLICAS", "maxSurgeReplicas": "MAX_SURGE_REPLICAS" } }}To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"
PowerShell (Windows)
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand ContentYou should receive a successful status code (2xx) and an empty response.
If desired, you can replace maxSurgeReplicas, maxUnavailableReplicas, or both with percentage values (maxSurgePercentage and maxUnavailablePercentage), as shown in the following example.
REST
Before using any of the request data, make the following replacements:
- MAX_UNAVAILABLE_PERCENTAGE: The percentage of model replicas that can be taken down during the rolling deployment.
- MAX_SURGE_PERCENTAGE: The percentage of additional model replicas that can be brought up during the rolling deployment. If this is set to zero, then only the existing capacity is used.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel
Request JSON body:
{ "deployedModel": { "model": "projects/PROJECT/locations/LOCATION_ID/models/MODEL_ID", "rolloutOptions": { "previousDeployedModel": "PREVIOUS_DEPLOYED_MODEL", "maxUnavailablePercentage": "MAX_UNAVAILABLE_PERCENTAGE", "maxSurgePercentage": "MAX_SURGE_PERCENTAGE" } }}To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"
PowerShell (Windows)
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand ContentYou should receive a successful status code (2xx) and an empty response.
Roll back a rolling deployment
To roll back a rolling deployment, start a new rolling deployment of the previous model, using the ongoing rolling deployment's DeployedModel ID as the previousDeployedModel.
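A minimal sketch of such a rollback request body is shown below; PREVIOUS_MODEL_ID stands for the model version you are rolling back to, and ONGOING_DEPLOYED_MODEL_ID is a hypothetical placeholder for the DeployedModel ID of the in-progress rolling deployment:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/PREVIOUS_MODEL_ID",
    "rolloutOptions": {
      "previousDeployedModel": "ONGOING_DEPLOYED_MODEL_ID"
    }
  }
}

Send this body to the same :deployModel URL used in the earlier examples.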
To get the DeployedModel ID for an ongoing deployment, set the parameter allDeploymentStates=true in the call to GetEndpoint, as shown in the following example.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
HTTP method and URL:
GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID?allDeploymentStates=true
To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID?allDeploymentStates=true"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID?allDeploymentStates=true" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID", "displayName": "rolling-deployments-endpoint", "deployedModels": [ { "id": "2718281828459045", "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID@1", "displayName": "rd-test-model", "createTime": "2024-09-11T21:37:48.522692Z", "dedicatedResources": { "machineSpec": { "machineType": "e2-standard-2" }, "minReplicaCount": 5, "maxReplicaCount": 5 }, "modelVersionId": "1", "state": "BEING_DEPLOYED" } ], "etag": "AMEw9yMs3TdZMn8CUg-3DY3wS74bkIaTDQhqJ7-Ld_Zp7wgT8gsEfJlrCOyg67lr9dwn", "createTime": "2024-09-11T21:22:36.588538Z", "updateTime": "2024-09-11T21:27:28.563579Z", "dedicatedEndpointEnabled": true, "dedicatedEndpointDns": "ENDPOINT_ID.LOCATION_ID-PROJECT_ID.prediction.vertexai.goog"}Constraints and limitations
- The previous DeployedModel must be on the same endpoint as the new DeployedModel.
- You can't create multiple rolling deployments with the same previousDeployedModel.
- You can't create rolling deployments on top of a DeployedModel that isn't fully deployed. Exception: if previousDeployedModel is itself an in-progress rolling deployment, then a new rolling deployment can be created on top of it. This allows for rolling back deployments that start to fail.
- Previous models don't automatically undeploy after a rolling deployment completes successfully. You can undeploy the model manually, as sketched after this list.
- For rolling deployments on shared public endpoints, the predictRoute and healthRoute for the new model must be the same as for the previous model.
- Rolling deployments aren't compatible with model cohosting.
- Rolling deployments can't be used for models that require online explanations.
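As a sketch of the manual cleanup step, the following call removes the previous model after a successful rolling deployment by using the endpoints.undeployModel method. PREVIOUS_DEPLOYED_MODEL_ID is the numeric DeployedModel ID to remove, and the example assumes the same authentication setup as the other commands on this page:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d '{"deployedModelId": "PREVIOUS_DEPLOYED_MODEL_ID"}' \
"https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:undeployModel"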