Deploy a model by using the Google Cloud console
In the Google Cloud console, you can create a public endpoint and deploy a model to it.
Models can be deployed from the Online prediction page or the Model Registry page.
Deploy a model from the Online prediction page
In the Online prediction page, you can create an endpoint and deploy one or more models to it as follows:
In the Google Cloud console, in the Vertex AI section, go to the Online prediction page.
Click Create.
In the New endpoint pane:
Enter the Endpoint name.
Select Standard for the access type.
To create a dedicated (not shared) public endpoint, select the Enable dedicated DNS checkbox.
Click Continue.
In the Model settings pane:
Select your model from the drop-down list.
Choose the model version from the drop-down list.
Enter the Traffic split percentage for the model.
Click Done.
Repeat these steps for any additional models to be deployed.
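The console flow above can also be scripted. A rough sketch with the gcloud CLI (ENDPOINT_ID, MODEL_ID, the region, and the display names are placeholders; run `gcloud ai endpoints deploy-model --help` to confirm the flags available in your installed version):

```shell
# Create a public endpoint in the region where the model lives.
gcloud ai endpoints create \
  --region=us-central1 \
  --display-name=my-endpoint

# Deploy a model to the endpoint, sending it 100% of traffic.
# In gcloud's --traffic-split syntax, the key 0 refers to the model
# being deployed by this command.
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=us-central1 \
  --model=MODEL_ID \
  --display-name=my-deployed-model \
  --traffic-split=0=100
```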
Deploy a model from the Model Registry page
In the Model Registry page, you can deploy a model to one or more new or existing endpoints as follows:
In the Google Cloud console, in the Vertex AI section, go to the Models page.
Click the name and version ID of the model you want to deploy to open its details page.
Select the Deploy & Test tab.
If your model is already deployed to any endpoints, they are listed in the Deploy your model section.
Click Deploy to endpoint.
To deploy your model to a new endpoint:
- Select Create new endpoint.
- Provide a name for the new endpoint.
- To create a dedicated (not shared) public endpoint, select the Enable dedicated DNS checkbox.
- Click Continue.
To deploy your model to an existing endpoint:
- Select Add to existing endpoint.
- Select the endpoint from the drop-down list.
- Click Continue.
You can deploy multiple models to an endpoint, or you can deploy the same model to multiple endpoints.
If you deploy your model to an existing endpoint that has one or more models deployed to it, you must update the Traffic split percentage for the model you are deploying and for the already deployed models so that all of the percentages add up to 100%.
If you're deploying your model to a new endpoint, accept 100 for the Traffic split. Otherwise, adjust the traffic split values for all models on the endpoint so they add up to 100.
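The traffic-split rule can be illustrated with a small calculation. The helper below is hypothetical (not part of any Google SDK): it gives a newly deployed model its requested share and scales the existing models' shares down proportionally so the total stays at exactly 100.

```python
# Hypothetical helper illustrating the 100% traffic-split rule.
def rebalance_traffic(existing: dict[str, int], new_model: str, new_pct: int) -> dict[str, int]:
    if not 0 <= new_pct <= 100:
        raise ValueError("traffic percentage must be between 0 and 100")
    remaining = 100 - new_pct
    total = sum(existing.values())
    # Scale existing shares proportionally into the remaining percentage.
    split = {m: p * remaining // total for m, p in existing.items()}
    # Assign any integer-rounding leftover to the first existing model.
    leftover = remaining - sum(split.values())
    if split and leftover:
        split[next(iter(split))] += leftover
    split[new_model] = new_pct
    assert sum(split.values()) == 100
    return split

# Example: the endpoint currently serves model-a at 100%; deploy model-b at 20%.
print(rebalance_traffic({"model-a": 100}, "model-b", 20))  # {'model-a': 80, 'model-b': 20}
```

Proportional scaling is just one reasonable policy; in the console you can type any percentages you like, as long as they sum to 100.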
Enter the Minimum number of compute nodes you want to provide for your model.
This is the number of nodes that need to be available to the model at all times.
You are charged for the nodes used, whether to handle inference load or as standby (minimum) nodes, even without inference traffic. See the pricing page.
The number of compute nodes can increase if needed to handle inference traffic, but it will never go higher than the maximum number of nodes.
To use autoscaling, enter the Maximum number of compute nodes you want Vertex AI to scale up to.
Select your Machine type.
Larger machine resources increase your inference performance and increase costs. Compare the available machine types.
Select an Accelerator type and an Accelerator count.
This option displays only if you enabled accelerator use when you imported or created the model.
For the accelerator count, refer to the GPU table to check for valid numbers of GPUs that you can use with each CPU machine type. The accelerator count refers to the number of accelerators per node, not the total number of accelerators in your deployment.
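Because the accelerator count is per node, the number of accelerators actually provisioned (and billed) depends on how many nodes are running. A small illustrative calculation (the numbers are examples, not recommendations; always check the GPU table for the counts your machine type supports):

```python
# The accelerator count you enter applies to each node, so the total
# provisioned is per-node count times the current node count.
def total_accelerators(accelerators_per_node: int, node_count: int) -> int:
    return accelerators_per_node * node_count

# Example: 2 GPUs per node on an endpoint currently scaled to 4 nodes.
print(total_accelerators(2, 4))  # 8 accelerators in total
```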
If you want to use a custom service account for the deployment, select a service account in the Service account drop-down box.
Learn how to change the default settings for inference logging.
Click Done for your model, and when all the Traffic split percentages are correct, click Continue.
The region where your model deploys is displayed. This must be the region where you created your model.
ClickDeploy to deploy your model to the endpoint.
What's next
- Learn how to get an online inference.
- Learn how to change the default settings for inference logging.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-18 UTC.