Use dedicated public endpoints for online inference
A dedicated public endpoint is a public endpoint for online inference. It offers the following benefits:
- Dedicated networking: When you send an inference request to a dedicated public endpoint, it is isolated from other users' traffic.
- Optimized network latency
- Larger payload support: Up to 10 MB.
- Longer request timeouts: Configurable up to 1 hour.
- Generative AI-ready: Streaming and gRPC are supported. Inference timeout is configurable up to 1 hour.
For these reasons, dedicated public endpoints are recommended as a best practice for serving Vertex AI online inferences.
Note: Tuned Gemini models can only be deployed to shared public endpoints. To learn more, see Choose an endpoint type.
Create a dedicated public endpoint and deploy a model to it
You can create a dedicated endpoint and deploy a model to it by using the Google Cloud console. For details, see Deploy a model by using the Google Cloud console.
You can also create a dedicated public endpoint and deploy a model to it by using the Vertex AI API as follows:
- Create a dedicated public endpoint. You can configure the inference timeout and request-response logging settings when you create the endpoint.
- Deploy the model by using the Vertex AI API.
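The two steps above can be sketched with the Vertex AI SDK for Python. This is a minimal sketch, assuming the `google-cloud-aiplatform` package is installed and that your environment has Google Cloud credentials; parameter names such as `dedicated_endpoint_enabled` and `inference_timeout` follow recent SDK releases and may differ in yours, and all IDs are placeholders.

```python
def dedicated_endpoint_args(display_name: str, timeout_seconds: int = 1800) -> dict:
    """Build the keyword arguments for creating a dedicated public endpoint.

    The inference timeout is configurable up to 1 hour (3600 seconds) at
    endpoint creation time.
    """
    if not 0 < timeout_seconds <= 3600:
        raise ValueError("inference timeout must be between 1 second and 1 hour")
    return {
        "display_name": display_name,
        # Requests a dedicated (rather than shared) public endpoint.
        "dedicated_endpoint_enabled": True,
        # Assumed parameter for the configurable inference timeout, in seconds.
        "inference_timeout": timeout_seconds,
    }


def create_and_deploy(project: str, region: str, model_id: str):
    """Create a dedicated endpoint and deploy a model (needs GCP credentials)."""
    from google.cloud import aiplatform  # deferred so the sketch imports cleanly

    aiplatform.init(project=project, location=region)
    endpoint = aiplatform.Endpoint.create(
        **dedicated_endpoint_args("my-dedicated-endpoint")
    )
    model = aiplatform.Model(model_name=model_id)
    # Machine type and replica counts are illustrative; size them for your model.
    model.deploy(endpoint=endpoint, machine_type="n1-standard-4", min_replica_count=1)
    return endpoint
```

Keeping the argument-building logic in a separate pure function makes the timeout bound easy to validate before any API call is made.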
Get online inferences from a dedicated public endpoint
Dedicated endpoints support both HTTP and gRPC communication protocols. For gRPC requests, the x-vertex-ai-endpoint-id header must be included for proper endpoint identification. The following APIs are supported:
- Predict
- RawPredict
- StreamRawPredict
- Chat Completion (Model Garden only)
You can send online inference requests to a dedicated public endpoint by using the Vertex AI SDK for Python. For details, see Send an online inference request to a dedicated public endpoint.
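As a rough illustration of the HTTP path, the sketch below builds a `:predict` request against a dedicated endpoint's endpoint-specific hostname (dedicated endpoints are not reached through the shared regional `aiplatform.googleapis.com` host). The DNS shape and helper names here are assumptions for illustration; check your endpoint's dedicated DNS field for the authoritative hostname, and use a valid OAuth 2.0 access token.

```python
import json
import urllib.request


def dedicated_predict_url(endpoint_id: str, region: str, project_number: str) -> str:
    """Build the :predict URL for a dedicated public endpoint.

    Assumed DNS shape:
    ENDPOINT_ID.REGION-PROJECT_NUMBER.prediction.vertexai.goog
    """
    host = f"{endpoint_id}.{region}-{project_number}.prediction.vertexai.goog"
    path = (
        f"/v1/projects/{project_number}/locations/{region}"
        f"/endpoints/{endpoint_id}:predict"
    )
    return f"https://{host}{path}"


def predict(url: str, instances: list, access_token: str) -> dict:
    """POST an online inference request (network call; requires credentials)."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

gRPC clients target the same dedicated hostname but must also attach the x-vertex-ai-endpoint-id metadata header mentioned above so the request is routed to the right endpoint.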
Tutorial
To learn more, run the "Vertex AI Model Garden - Gemma (Deployment)" notebook in one of the following environments:
Open in Colab | Open in Colab Enterprise | Open in Vertex AI Workbench | View on GitHub
Limitations
- Deployment of tuned Gemini models isn't supported.
- VPC Service Controls isn't supported. Use a Private Service Connect endpoint instead.
What's next
- Learn about Vertex AI online inference endpoint types.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-17 UTC.