Use a rolling deployment to replace a deployed model

Preview

This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

In a rolling deployment, a deployed model is replaced with a new version of the same model. The new model reuses the compute resources from the previous one.

In the rolling deployment request, the traffic split and dedicatedResources values are the same as for the previous deployment. After the rolling deployment completes, the traffic split is updated to show that all of the traffic from the previous DeployedModel has migrated to the new deployment.

Other configurable fields in DeployedModel (such as serviceAccount, disableContainerLogging, and enableAccessLogging) are set to the same values as for the previous DeployedModel by default. However, you can optionally specify new values for these fields.

When a model is deployed using a rolling deployment, a new DeployedModel is created. The new DeployedModel receives a new ID that is different from that of the previous one. It also receives a new revisionNumber value in the rolloutOptions field.

If there are multiple rolling deployments targeting the same backing resources, the DeployedModel with the highest revisionNumber is treated as the intended final state.
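The selection rule above can be sketched in Python. This is a minimal illustration with several assumptions. First, the endpoint's deployedModels list is available as parsed JSON. Second, each rolling deployment exposes rolloutOptions.revisionNumber, either as a number or as a numeric string. Third, the caller has already narrowed the list to deployments that share the same backing resources. The function name and sample IDs are hypothetical.

```python
from typing import Any, Optional


def intended_final_state(deployed_models: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the DeployedModel with the highest rolloutOptions.revisionNumber.

    Assumes `deployed_models` already contains only rolling deployments that
    target the same backing resources. Entries without a revisionNumber are
    ignored; returns None if no entry has one.
    """
    candidates = [
        dm for dm in deployed_models
        if "revisionNumber" in dm.get("rolloutOptions", {})
    ]
    if not candidates:
        return None
    # int() accepts both JSON numbers and numeric strings.
    return max(candidates, key=lambda dm: int(dm["rolloutOptions"]["revisionNumber"]))


# Hypothetical sample data.
models = [
    {"id": "111", "rolloutOptions": {"revisionNumber": 1}},
    {"id": "222", "rolloutOptions": {"revisionNumber": "3"}},
    {"id": "333", "rolloutOptions": {"revisionNumber": 2}},
]
print(intended_final_state(models)["id"])  # prints 222
```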

As the rolling deployment progresses, all the existing replicas for the previous DeployedModel are replaced with replicas of the new DeployedModel. Replicas are replaced as fast as the rollout limits allow: a replica is updated whenever the deployment has enough available replicas, or enough surge capacity to bring up additional replicas.
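The following Python sketch steps through a simplified rollout to show how the unavailable and surge limits interact. It is an illustrative model only, not Vertex AI's actual scheduler. It assumes that a new replica is ready the moment it is created. It also assumes that each step first brings up as many new replicas as the surge limit allows, then takes down as many old replicas as the unavailable limit allows. The function name is hypothetical.

```python
def simulate_rollout(replicas: int, max_unavailable: int, max_surge: int) -> list[tuple[int, int]]:
    """Return (old_replicas, new_replicas) after each step of a simplified rollout.

    Illustrative model only: new replicas are assumed ready as soon as they
    are created, so they immediately count toward available capacity.
    """
    if min(replicas, max_unavailable, max_surge) < 0:
        raise ValueError("counts must be non-negative")
    if max_unavailable == 0 and max_surge == 0:
        raise ValueError("with no unavailable or surge budget, no replica can be replaced")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Bring up new replicas without exceeding replicas + max_surge in total.
        new += max(0, min(replicas - new, replicas + max_surge - (old + new)))
        # Take down old replicas while keeping at least replicas - max_unavailable running.
        old -= max(0, min(old, (old + new) - (replicas - max_unavailable)))
        steps.append((old, new))
    return steps


print(simulate_rollout(3, max_unavailable=0, max_surge=1))
# prints [(3, 0), (2, 1), (1, 2), (0, 3)]
print(simulate_rollout(3, max_unavailable=1, max_surge=0))
# prints [(3, 0), (2, 0), (1, 1), (0, 2), (0, 3)]
```

With the surge limit at zero, the sketch frees an existing replica before it brings up a new one. This mirrors the statement that only the existing capacity is used in that case.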

Additionally, as the rolling deployment progresses, the traffic for the old DeployedModel is gradually migrated to the new DeployedModel. The traffic is load-balanced in proportion to the number of ready-to-serve replicas of each DeployedModel.
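The proportional split works out to a simple ratio: each DeployedModel's share of traffic is its ready replica count divided by the total number of ready replicas across both. The Python sketch below computes this ratio from ready counts alone, so replicas that aren't ready contribute nothing. For example, three ready old replicas and two ready new ones give a 60/40 split. The function name is hypothetical.

```python
from fractions import Fraction


def traffic_split(old_ready: int, new_ready: int) -> tuple[Fraction, Fraction]:
    """Share of traffic for the old and new DeployedModel, proportional to ready replicas."""
    total = old_ready + new_ready
    if total == 0:
        raise ValueError("no ready-to-serve replicas")
    return Fraction(old_ready, total), Fraction(new_ready, total)


old_share, new_share = traffic_split(old_ready=3, new_ready=2)
print(f"old: {float(old_share):.0%}, new: {float(new_share):.0%}")  # prints old: 60%, new: 40%
```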

If the rolling deployment's new replicas never become ready because their health route consistently returns a non-200 response code, traffic isn't sent to those unready replicas. In this case, the rolling deployment eventually fails, and the replicas are reverted to the previous DeployedModel.
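The model container's health route signals when a replica is ready. The following Python sketch uses only the standard library's http.server. It shows a health handler that returns 503 until the model has loaded and 200 afterward, so under the behavior described above, a replica running this handler would receive traffic only after the switch. The /health path, the model_loaded event, and the handler names are hypothetical. In practice, the path must match the healthRoute configured for the model's container.

```python
import threading
from http.server import BaseHTTPRequestHandler

HEALTH_PATH = "/health"           # hypothetical; must match the container's healthRoute
model_loaded = threading.Event()  # hypothetical flag, set once the model has loaded


def health_status(loaded: bool) -> int:
    """200 marks the replica ready to serve; any other code keeps traffic away."""
    return 200 if loaded else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != HEALTH_PATH:
            self.send_error(404)
            return
        # Non-200 until model_loaded.set() is called by the loading code.
        self.send_response(health_status(model_loaded.is_set()))
        self.end_headers()
```

To serve it, pass HealthHandler to http.server.ThreadingHTTPServer and call model_loaded.set() when loading finishes. A replica whose loading never completes keeps returning 503, which is the failure case described above.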

Start a rolling deployment

To start a rolling deployment, include the rolloutOptions field in the model deployment request, as shown in the following example.

REST

Before using any of the request data, make the following replacements:

  • LOCATION_ID: The region where you are using Vertex AI.
  • PROJECT_ID: Your project ID.
  • ENDPOINT_ID: The ID for the endpoint.
  • MODEL_ID: The ID for the model to be deployed.
  • PREVIOUS_DEPLOYED_MODEL: The DeployedModel ID of a model on the same endpoint. This specifies the DeployedModel whose backing resources are to be reused. You can call GetEndpoint to get a list of deployed models on an endpoint along with their numeric IDs.
  • MAX_UNAVAILABLE_REPLICAS: The number of model replicas that can be taken down during the rolling deployment.
  • MAX_SURGE_REPLICAS: The number of additional model replicas that can be brought up during the rolling deployment. If this is set to zero, then only the existing capacity is used.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

Request JSON body:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "rolloutOptions": {
      "previousDeployedModel": "PREVIOUS_DEPLOYED_MODEL",
      "maxUnavailableReplicas": "MAX_UNAVAILABLE_REPLICAS",
      "maxSurgeReplicas": "MAX_SURGE_REPLICAS"
    }
  }
}
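You can also build the same request body in a program and write it to request.json. The following Python sketch is hypothetical, as are the example IDs. It sends the replica counts as JSON numbers rather than the quoted strings shown in the sample; the protobuf JSON mapping is expected to accept either form for integer fields. It also accepts optional overrides for fields such as serviceAccount, disableContainerLogging, or enableAccessLogging, which otherwise keep the previous DeployedModel's values.

```python
import json
from typing import Any, Optional

RESERVED_FIELDS = {"model", "rolloutOptions"}


def build_rolling_deploy_body(
    project_id: str,
    location_id: str,
    model_id: str,
    previous_deployed_model: str,
    max_unavailable_replicas: int,
    max_surge_replicas: int,
    overrides: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a deployModel request body that starts a rolling deployment."""
    for name, value in (("max_unavailable_replicas", max_unavailable_replicas),
                        ("max_surge_replicas", max_surge_replicas)):
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    deployed_model: dict[str, Any] = {
        "model": f"projects/{project_id}/locations/{location_id}/models/{model_id}",
        "rolloutOptions": {
            "previousDeployedModel": previous_deployed_model,
            "maxUnavailableReplicas": max_unavailable_replicas,
            "maxSurgeReplicas": max_surge_replicas,
        },
    }
    # Optional new values for fields that otherwise keep the previous DeployedModel's values.
    for key, value in (overrides or {}).items():
        if key in RESERVED_FIELDS:
            raise ValueError(f"{key} can't be overridden here")
        deployed_model[key] = value
    return {"deployedModel": deployed_model}


# Hypothetical IDs; replace them with your own.
body = build_rolling_deploy_body(
    "my-project", "us-central1", "1234567890", "2718281828459045",
    max_unavailable_replicas=1, max_surge_replicas=1,
    overrides={"enableAccessLogging": True},
)
with open("request.json", "w", encoding="utf-8") as f:
    json.dump(body, f, indent=2)
```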

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

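Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"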

PowerShell (Windows)

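Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }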

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content

You should receive a successful status code (2xx) and an empty response.
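The same call can be made from Python with only the standard library. The sketch below builds the POST request that the curl command sends. It obtains the access token the same way, by running gcloud auth print-access-token. The function names and the placeholder IDs in the usage comment are hypothetical. Building the request is kept separate from sending it so that you can inspect the request first.

```python
import json
import subprocess
import urllib.request
from typing import Any


def access_token() -> str:
    """Fetch an OAuth access token from the gcloud CLI, as the curl example does."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def deploy_model_request(location_id: str, project_id: str, endpoint_id: str,
                         body: dict[str, Any], token: str) -> urllib.request.Request:
    """Build (but don't send) the POST ...:deployModel request."""
    url = (
        f"https://{location_id}-aiplatform.googleapis.com/v1beta1/projects/{project_id}"
        f"/locations/{location_id}/endpoints/{endpoint_id}:deployModel"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# Usage (sends a real request; the IDs are placeholders):
# with open("request.json", encoding="utf-8") as f:
#     body = json.load(f)
# req = deploy_model_request("us-central1", "my-project", "ENDPOINT_ID", body, access_token())
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```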

If desired, you can replace maxSurgeReplicas, maxUnavailableReplicas, or both with percentage values, as shown in the following example.

REST

In addition to the LOCATION_ID, PROJECT_ID, ENDPOINT_ID, MODEL_ID, and PREVIOUS_DEPLOYED_MODEL replacements from the previous example, make the following replacements:

  • MAX_UNAVAILABLE_PERCENTAGE: The percentage of model replicas that can be taken down during the rolling deployment.
  • MAX_SURGE_PERCENTAGE: The percentage of additional model replicas that can be brought up during the rolling deployment. If this is set to zero, then only the existing capacity is used.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

Request JSON body:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "rolloutOptions": {
      "previousDeployedModel": "PREVIOUS_DEPLOYED_MODEL",
      "maxUnavailablePercentage": "MAX_UNAVAILABLE_PERCENTAGE",
      "maxSurgePercentage": "MAX_SURGE_PERCENTAGE"
    }
  }
}

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

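Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"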

PowerShell (Windows)

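Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }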

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content

You should receive a successful status code (2xx) and an empty response.
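Each limit is given either as a replica count or as a percentage, so a small helper can enforce that choice before the body is sent. This Python sketch rests on two assumptions: that the two forms of a limit are mutually exclusive (the text above describes percentages as a replacement for the counts), and that percentages lie between 0 and 100. The helper's name is also hypothetical.

```python
from typing import Optional


def rollout_limits(
    max_unavailable_replicas: Optional[int] = None,
    max_unavailable_percentage: Optional[int] = None,
    max_surge_replicas: Optional[int] = None,
    max_surge_percentage: Optional[int] = None,
) -> dict[str, int]:
    """Return the limit fields for rolloutOptions, allowing one form per limit."""
    limits: dict[str, int] = {}
    pairs = (
        ("maxUnavailableReplicas", max_unavailable_replicas,
         "maxUnavailablePercentage", max_unavailable_percentage),
        ("maxSurgeReplicas", max_surge_replicas,
         "maxSurgePercentage", max_surge_percentage),
    )
    for count_key, count, pct_key, pct in pairs:
        if count is not None and pct is not None:
            raise ValueError(f"set {count_key} or {pct_key}, not both")
        if count is not None:
            if count < 0:
                raise ValueError(f"{count_key} must be non-negative")
            limits[count_key] = count
        elif pct is not None:
            if not 0 <= pct <= 100:
                raise ValueError(f"{pct_key} must be between 0 and 100")
            limits[pct_key] = pct
    return limits


print(rollout_limits(max_unavailable_percentage=25, max_surge_replicas=1))
# prints {'maxUnavailablePercentage': 25, 'maxSurgeReplicas': 1}
```

The returned dictionary can be merged into the rolloutOptions object alongside previousDeployedModel.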

Roll back a rolling deployment

To roll back a rolling deployment, start a new rolling deployment of the previous model, using the ongoing rolling deployment's DeployedModel ID as the previousDeployedModel.
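In request terms, a rollback is an ordinary rolling deployment whose previousDeployedModel is the in-progress deployment. The Python sketch below assumes that you already know two values. The first is the resource name of the model to return to, for example the model value that GetEndpoint reports for the previous DeployedModel. The second is the in-progress deployment's ID. The function name and example values are hypothetical.

```python
from typing import Any


def build_rollback_body(previous_model: str, in_progress_deployed_model_id: str,
                        max_unavailable_replicas: int, max_surge_replicas: int) -> dict[str, Any]:
    """Build a deployModel body that rolls back an in-progress rolling deployment.

    previous_model: full resource name of the model to return to, for example
        "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID@1".
    in_progress_deployed_model_id: DeployedModel ID of the ongoing rolling
        deployment, whose backing resources the rollback reuses.
    """
    return {
        "deployedModel": {
            "model": previous_model,
            "rolloutOptions": {
                "previousDeployedModel": in_progress_deployed_model_id,
                "maxUnavailableReplicas": max_unavailable_replicas,
                "maxSurgeReplicas": max_surge_replicas,
            },
        }
    }


body = build_rollback_body(
    "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID@1",
    "2718281828459045", max_unavailable_replicas=1, max_surge_replicas=0,
)
```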

To get the DeployedModel ID for an ongoing deployment, set the parameter allDeploymentStates=true in the call to GetEndpoint, as shown in the following example.

REST

Before using any of the request data, make the following replacements:

  • LOCATION_ID: The region where you are using Vertex AI.
  • PROJECT_ID: Your project ID.
  • ENDPOINT_ID: The ID for the endpoint.

HTTP method and URL:

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID?allDeploymentStates=true

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID?allDeploymentStates=true"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID?allDeploymentStates=true" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
  "displayName": "rolling-deployments-endpoint",
  "deployedModels": [
    {
      "id": "2718281828459045",
      "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID@1",
      "displayName": "rd-test-model",
      "createTime": "2024-09-11T21:37:48.522692Z",
      "dedicatedResources": {
        "machineSpec": {
          "machineType": "e2-standard-2"
        },
        "minReplicaCount": 5,
        "maxReplicaCount": 5
      },
      "modelVersionId": "1",
      "state": "BEING_DEPLOYED"
    }
  ],
  "etag": "AMEw9yMs3TdZMn8CUg-3DY3wS74bkIaTDQhqJ7-Ld_Zp7wgT8gsEfJlrCOyg67lr9dwn",
  "createTime": "2024-09-11T21:22:36.588538Z",
  "updateTime": "2024-09-11T21:27:28.563579Z",
  "dedicatedEndpointEnabled": true,
  "dedicatedEndpointDns": "ENDPOINT_ID.LOCATION_ID-PROJECT_ID.prediction.vertexai.goog"
}
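In a response like the one above, the in-progress deployment is the entry whose state is BEING_DEPLOYED. The following Python sketch extracts those IDs from the parsed response. It assumes that a fully deployed model either omits state or reports a different value. The second entry in the sample input is hypothetical, and the function name is illustrative.

```python
import json
from typing import Any


def in_progress_deployed_model_ids(endpoint: dict[str, Any]) -> list[str]:
    """IDs of DeployedModels that GetEndpoint reports as still being deployed."""
    return [
        dm["id"]
        for dm in endpoint.get("deployedModels", [])
        if dm.get("state") == "BEING_DEPLOYED"
    ]


response = json.loads("""
{
  "deployedModels": [
    {"id": "2718281828459045", "state": "BEING_DEPLOYED"},
    {"id": "1234567890123456"}
  ]
}
""")
print(in_progress_deployed_model_ids(response))  # prints ['2718281828459045']
```

The returned ID is the value to pass as previousDeployedModel when rolling back.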

Constraints and limitations

  • The previousDeployedModel must be on the same endpoint as the new DeployedModel.
  • You can't create multiple rolling deployments with the same previousDeployedModel.
  • You can't create rolling deployments on top of a DeployedModel that isn't fully deployed. Exception: If previousDeployedModel is itself an in-progress rolling deployment, then a new rolling deployment can be created on top of it. This allows for rolling back deployments that start to fail.
  • Previous models don't automatically undeploy after a rolling deployment completes successfully. You can undeploy the model manually.
  • For rolling deployments on shared public endpoints, the predictRoute and healthRoute for the new model must be the same as for the previous model.
  • Rolling deployments aren't compatible with model cohosting.
  • Rolling deployments can't be used for models that require online explanations.
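Several of these constraints can be checked against a GetEndpoint response before you send the request. The Python sketch below checks the first three. It assumes that an in-progress rolling deployment reports state BEING_DEPLOYED, and that GetEndpoint echoes rolloutOptions.previousDeployedModel as a plain DeployedModel ID. Both are assumptions about the response shape. The route, cohosting, and explanation constraints aren't checked. The function name and sample IDs are hypothetical.

```python
from typing import Any


def rolling_deployment_problems(endpoint: dict[str, Any], previous_id: str) -> list[str]:
    """Return constraint violations for a planned rolling deployment (empty if none found)."""
    deployed = {dm["id"]: dm for dm in endpoint.get("deployedModels", [])}
    previous = deployed.get(previous_id)
    # The previousDeployedModel must be on the same endpoint.
    if previous is None:
        return [f"{previous_id} is not deployed on this endpoint"]

    problems = []
    # Only one rolling deployment may use a given previousDeployedModel.
    for dm in deployed.values():
        if dm.get("rolloutOptions", {}).get("previousDeployedModel") == previous_id:
            problems.append(f"{dm['id']} is already a rolling deployment on top of {previous_id}")
    # The previous model must be fully deployed, unless it is itself an
    # in-progress rolling deployment (the rollback case).
    if previous.get("state") == "BEING_DEPLOYED" and "rolloutOptions" not in previous:
        problems.append(f"{previous_id} is not fully deployed")
    return problems


endpoint = {
    "deployedModels": [
        {"id": "A"},
        {"id": "X", "state": "BEING_DEPLOYED", "rolloutOptions": {"previousDeployedModel": "A"}},
        {"id": "B", "state": "BEING_DEPLOYED"},
    ]
}
print(rolling_deployment_problems(endpoint, "X"))  # prints []
```

In the sample, rolling out on top of X (a rollback) passes. Rolling out on top of A again is rejected because X already uses it, and B is rejected because it is still being deployed without being a rolling deployment.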

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.