Deploy a model by using the gcloud CLI or Vertex AI API
To deploy a model to a public endpoint by using the gcloud CLI or Vertex AI API, you need to get the endpoint ID for an existing endpoint and then deploy the model to it.
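If you prefer to script both steps, the Vertex AI SDK for Python covers the same flow. The following is only a minimal sketch of that workflow: the placeholder values, the replica counts of 1, and the n1-standard-2 machine type are assumptions you would replace. The sections below walk through each step in detail with the gcloud CLI, the REST API, and the client libraries.

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="LOCATION_ID")

# Step 1: look up an existing endpoint by its display name.
endpoint = aiplatform.Endpoint.list(filter="display_name=ENDPOINT_NAME")[0]

# Step 2: deploy the model to that endpoint and send it all traffic.
model = aiplatform.Model("MODEL_ID")
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="DEPLOYED_MODEL_NAME",
    machine_type="n1-standard-2",  # assumption; choose a machine type for your model
    min_replica_count=1,
    max_replica_count=1,
    traffic_percentage=100,
)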
Get the endpoint ID
You need the endpoint ID to deploy the model.
gcloud
The following example uses the gcloud ai endpoints list command:
gcloud ai endpoints list \
  --region=LOCATION_ID \
  --filter=display_name=ENDPOINT_NAME

Replace the following:
- LOCATION_ID: The region where you are using Vertex AI.
- ENDPOINT_NAME: The display name for the endpoint.
Note the number that appears in the ENDPOINT_ID column. Use this ID in the following step.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_NAME: The display name for the endpoint.
HTTP method and URL:
GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME
To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "endpoints": [ { "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID", "displayName": "ENDPOINT_NAME", "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx", "createTime": "2020-04-17T18:31:11.585169Z", "updateTime": "2020-04-17T18:35:08.568959Z" } ]}Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Replace the following:
- PROJECT_ID: Your project ID.
- LOCATION_ID: The region where you are using Vertex AI.
- ENDPOINT_NAME: The display name for the endpoint.
from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION_ID"
ENDPOINT_NAME = "ENDPOINT_NAME"

aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
)

# Endpoint.list returns a list of matching endpoints; take the first match.
endpoint = aiplatform.Endpoint.list(
    filter=f"display_name={ENDPOINT_NAME}",
)[0]
endpoint_id = endpoint.name.split("/")[-1]

Deploy the model
When you deploy a model, you give the deployed model an ID to distinguish it from other models deployed to the endpoint.
Select the tab below for your language or environment:
gcloud
The following examples use the gcloud ai endpoints deploy-model command.
The following example deploys a Model to an Endpoint without using GPUs to accelerate prediction serving and without splitting traffic between multiple DeployedModel resources:
Before using any of the command data below, make the following replacements:
- ENDPOINT_ID: The ID for the endpoint.
- LOCATION_ID: The region where you are using Vertex AI.
- MODEL_ID: The ID for the model to be deployed.
- DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can use the display name of the Model for the DeployedModel as well.
- MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes.
- MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, then the maximum number of nodes is set to the value of --min-replica-count.
Execute the gcloud ai endpoints deploy-model command:
Linux, macOS, or Cloud Shell
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100
Windows (PowerShell)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100
Windows (cmd.exe)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100
Splitting traffic
The --traffic-split=0=100 flag in the preceding examples sends 100% of prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, then you can split traffic between the new DeployedModel and the old ones. For example, to send 20% of traffic to the new DeployedModel and 80% to an older one, run the following command.
Before using any of the command data below, make the following replacements:
- OLD_DEPLOYED_MODEL_ID: The ID of the existing DeployedModel.
Execute the gcloud ai endpoints deploy-model command:
Linux, macOS, or Cloud Shell
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
Windows (PowerShell)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
Windows (cmd.exe)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
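The same 20/80 split can also be expressed with the Vertex AI SDK for Python as a traffic_split dictionary. This is a minimal sketch under the same placeholders; the n1-standard-2 machine type and the replica counts of 1 are assumptions used only for illustration.

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="LOCATION_ID")

endpoint = aiplatform.Endpoint("ENDPOINT_ID")
model = aiplatform.Model("MODEL_ID")

model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="DEPLOYED_MODEL_NAME",
    machine_type="n1-standard-2",  # assumption
    min_replica_count=1,           # replace with your MIN_REPLICA_COUNT
    max_replica_count=1,           # replace with your MAX_REPLICA_COUNT
    # "0" refers to the model deployed by this call; the other key is the ID
    # of the DeployedModel that already exists on the endpoint.
    traffic_split={"0": 20, "OLD_DEPLOYED_MODEL_ID": 80},
)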
REST
Deploy the model.
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- MODEL_ID: The ID for the model to be deployed.
- DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can use the display name of the Model for the DeployedModel as well.
- MACHINE_TYPE: Optional. The machine resources used for each node of this deployment. Its default setting is n1-standard-2. Learn more about machine types.
- ACCELERATOR_TYPE: The type of accelerator to be attached to the machine. Optional if ACCELERATOR_COUNT is not specified or is zero. Not recommended for AutoML models or custom-trained models that are using non-GPU images. Learn more.
- ACCELERATOR_COUNT: The number of accelerators for each replica to use. Optional. Should be zero or unspecified for AutoML models or custom-trained models that are using non-GPU images.
- MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1.
- MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
- REQUIRED_REPLICA_COUNT: Optional. The required number of nodes for this deployment to be marked as successful. Must be greater than or equal to 1 and less than or equal to the minimum number of nodes. If not specified, the default value is the minimum number of nodes.
- TRAFFIC_SPLIT_THIS_MODEL: The percentage of the prediction traffic to this endpoint to be routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
- DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
- TRAFFIC_SPLIT_MODEL_N: The traffic split percentage value for the deployed model ID key.
- PROJECT_NUMBER: Your project's automatically generated project number.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel
Request JSON body:
{ "deployedModel": { "model": "projects/PROJECT/locations/us-central1/models/MODEL_ID", "displayName": "DEPLOYED_MODEL_NAME", "dedicatedResources": { "machineSpec": { "machineType": "MACHINE_TYPE", "acceleratorType": "ACCELERATOR_TYPE", "acceleratorCount": "ACCELERATOR_COUNT" }, "minReplicaCount":MIN_REPLICA_COUNT, "maxReplicaCount":MAX_REPLICA_COUNT, "requiredReplicaCount":REQUIRED_REPLICA_COUNT }, }, "trafficSplit": { "0":TRAFFIC_SPLIT_THIS_MODEL, "DEPLOYED_MODEL_ID_1":TRAFFIC_SPLIT_MODEL_1, "DEPLOYED_MODEL_ID_2":TRAFFIC_SPLIT_MODEL_2 },}To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata", "genericMetadata": { "createTime": "2020-10-19T17:53:16.502088Z", "updateTime": "2020-10-19T17:53:16.502088Z" } }}Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class DeployModelCustomTrainedModelSample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String endpointId = "ENDPOINT_ID";
    String modelName = "MODEL_NAME";
    String deployedModelDisplayName = "DEPLOYED_MODEL_DISPLAY_NAME";
    deployModelCustomTrainedModelSample(project, endpointId, modelName, deployedModelDisplayName);
  }

  static void deployModelCustomTrainedModelSample(
      String project, String endpointId, String model, String deployedModelDisplayName)
      throws IOException, ExecutionException, InterruptedException {
    EndpointServiceSettings settings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient client = EndpointServiceClient.create(settings)) {
      MachineSpec machineSpec = MachineSpec.newBuilder().setMachineType("n1-standard-2").build();
      DedicatedResources dedicatedResources =
          DedicatedResources.newBuilder().setMinReplicaCount(1).setMachineSpec(machineSpec).build();

      String modelName = ModelName.of(project, location, model).toString();
      DeployedModel deployedModel =
          DeployedModel.newBuilder()
              .setModel(modelName)
              .setDisplayName(deployedModelDisplayName)
              // `dedicated_resources` must be used for non-AutoML models
              .setDedicatedResources(dedicatedResources)
              .build();

      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      EndpointName endpoint = EndpointName.of(project, location, endpointId);
      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> response =
          client.deployModelAsync(endpoint, deployedModel, trafficSplit);

      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Operation name: %s\n", response.getInitialFuture().get().getName());

      // OperationFuture.get() will block until the operation is finished.
      DeployModelResponse deployModelResponse = response.get();
      System.out.format("deployModelResponse: %s\n", deployModelResponse);
    }
  }
}

Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
from typing import Dict, Optional, Sequence, Tuple

from google.cloud import aiplatform
from google.cloud.aiplatform import explain


def deploy_model_with_dedicated_resources_sample(
    project,
    location,
    model_name: str,
    machine_type: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    explanation_metadata: Optional[explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[explain.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
      Example: "projects/123/locations/us-central1/models/456" or
      "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    # The explanation_metadata and explanation_parameters should only be
    # provided for a custom trained model and not an AutoML model.
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
const automl = require('@google-cloud/automl');
const client = new automl.v1beta1.AutoMlClient();

/**
 * Demonstrates using the AutoML client to create a model.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const datasetId = '[DATASET_ID]' e.g., "TBL2246891593778855936";
// const tableId = '[TABLE_ID]' e.g., "1991013247762825216";
// const columnId = '[COLUMN_ID]' e.g., "773141392279994368";
// const modelName = '[MODEL_NAME]' e.g., "testModel";
// const trainBudget = '[TRAIN_BUDGET]' e.g., "1000",
// `Train budget in milli node hours`;

// A resource that represents Google Cloud Platform location.
const projectLocation = client.locationPath(projectId, computeRegion);

// Get the full path of the column.
const columnSpecId = client.columnSpecPath(
  projectId,
  computeRegion,
  datasetId,
  tableId,
  columnId
);

// Set target column to train the model.
const targetColumnSpec = {name: columnSpecId};

// Set tables model metadata.
const tablesModelMetadata = {
  targetColumnSpec: targetColumnSpec,
  trainBudgetMilliNodeHours: trainBudget,
};

// Set datasetId, model name and model metadata for the dataset.
const myModel = {
  datasetId: datasetId,
  displayName: modelName,
  tablesModelMetadata: tablesModelMetadata,
};

// Create a model with the model metadata in the region.
client
  .createModel({parent: projectLocation, model: myModel})
  .then(responses => {
    const initialApiResponse = responses[1];
    console.log(`Training operation name: ${initialApiResponse.name}`);
    console.log('Training started...');
  })
  .catch(err => {
    console.error(err);
  });

Learn how to change the default settings for inference logging.
Get operation status
Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.
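For example, when you start a deployment with the lower-level google-cloud-aiplatform client for Python, the returned operation object lets you read the operation name, poll its status, or block until it finishes. The following is a minimal sketch that assumes the placeholder IDs used on this page and an n1-standard-2 machine type.

from google.cloud import aiplatform_v1

location = "LOCATION_ID"
client = aiplatform_v1.EndpointServiceClient(
    client_options={"api_endpoint": f"{location}-aiplatform.googleapis.com"}
)

# deploy_model starts a long-running operation and returns a wrapper for it.
operation = client.deploy_model(
    endpoint=f"projects/PROJECT_ID/locations/{location}/endpoints/ENDPOINT_ID",
    deployed_model=aiplatform_v1.DeployedModel(
        model=f"projects/PROJECT_ID/locations/{location}/models/MODEL_ID",
        display_name="DEPLOYED_MODEL_NAME",
        dedicated_resources=aiplatform_v1.DedicatedResources(
            machine_spec=aiplatform_v1.MachineSpec(machine_type="n1-standard-2"),
            min_replica_count=1,
            max_replica_count=1,
        ),
    ),
    traffic_split={"0": 100},
)

# The operation name matches the "name" field in the REST response shown earlier.
print("Operation name:", operation.operation.name)

# Check status without blocking, or block until the deployment completes.
print("Done:", operation.done())
response = operation.result()  # raises an exception if the operation failed
print("Deployed model ID:", response.deployed_model.id)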
What's next
- Learn how to get an online inference.
- Learn about private endpoints.