REST Resource: projects.locations.endpoints

Resource: Endpoint

Models are deployed into it, and afterwards the Endpoint is called to obtain predictions and explanations.

Fields
name: string

Output only. The resource name of the Endpoint.

displayName: string

Required. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.

description: string

The description of the Endpoint.

deployedModels[]: object (DeployedModel)

Output only. The models deployed in this Endpoint. To add or remove DeployedModels, use EndpointService.DeployModel and EndpointService.UndeployModel respectively.

trafficSplit: map (key: string, value: integer)

A map from a DeployedModel's id to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel.

If a DeployedModel's id is not listed in this map, then it receives no traffic.

The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not meant to accept any traffic at the moment.
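
For illustration, a trafficSplit that sends 80% of this Endpoint's traffic to one DeployedModel and 20% to another might look like the following sketch; the DeployedModel IDs are hypothetical placeholders.

{
  // hypothetical DeployedModel IDs
  "trafficSplit": {
    "1234567890": 80,
    "2345678901": 20
  }
}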

etag: string

Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens.

labels: map (key: string, value: string)

The labels with user-defined metadata to organize your endpoints.

Label keys and values can be no longer than 64 characters (Unicode code points) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed.

See https://goo.gl/xmQnxf for more information and examples of labels.
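
For example, a labels map that satisfies these constraints might look like the following; the keys and values are hypothetical.

{
  // hypothetical label keys and values
  "labels": {
    "env": "prod",
    "team": "ml-serving"
  }
}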

createTime: string (Timestamp format)

Output only. Timestamp when this Endpoint was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

updateTime: string (Timestamp format)

Output only. Timestamp when this Endpoint was last updated.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

encryptionSpec: object (EncryptionSpec)

Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key.

network: string

Optional. The full name of the Google Compute Engine network to which the Endpoint should be peered.

Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network.

Only one of the fields, network or enablePrivateServiceConnect, can be set.

Format: projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name.
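
As a hypothetical instance of this format, the field might be set as follows (the project number and network name are placeholders).

"network": "projects/12345/global/networks/my-vpc"  // hypothetical value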

privateServiceConnectConfig: object (PrivateServiceConnectConfig)

Optional. Configuration for private service connect.

network and privateServiceConnectConfig are mutually exclusive.

modelDeploymentMonitoringJob: string

Output only. Resource name of the Model Monitoring job associated with this Endpoint if monitoring is enabled by JobService.CreateModelDeploymentMonitoringJob. Format: projects/{project}/locations/{location}/modelDeploymentMonitoringJobs/{modelDeploymentMonitoringJob}

predictRequestResponseLoggingConfig: object (PredictRequestResponseLoggingConfig)

Configures the request-response logging for online prediction.

dedicatedEndpointEnabled: boolean

If true, the endpoint will be exposed through a dedicated DNS [Endpoint.dedicated_endpoint_dns]. Requests to the dedicated DNS are isolated from other users' traffic and have better performance and reliability. Note: once you enable the dedicated endpoint, you won't be able to send requests to the shared DNS {region}-aiplatform.googleapis.com. This limitation will be removed soon.

dedicatedEndpointDns: string

Output only. DNS of the dedicated endpoint. Populated only if dedicatedEndpointEnabled is true. Depending on the features enabled, uid might be a random number or a string. For example, if fast_tryout is enabled, uid will be fasttryout. Format: https://{endpointId}.{region}-{uid}.prediction.vertexai.goog.

clientConnectionConfig: object (ClientConnectionConfig)

Configurations that are applied to the endpoint for online prediction.

satisfiesPzs: boolean

Output only. Reserved for future use.

satisfiesPzi: boolean

Output only. Reserved for future use.

genAiAdvancedFeaturesConfig: object (GenAiAdvancedFeaturesConfig)

Optional. Configuration for GenAiAdvancedFeatures. If the endpoint is serving GenAI models, advanced features like native RAG integration can be configured. Currently, only Model Garden models are supported.

JSON representation
{"name":string,"displayName":string,"description":string,"deployedModels":[{object (DeployedModel)}],"trafficSplit":{string:integer,...},"etag":string,"labels":{string:string,...},"createTime":string,"updateTime":string,"encryptionSpec":{object (EncryptionSpec)},"network":string,"enablePrivateServiceConnect":boolean,"privateServiceConnectConfig":{object (PrivateServiceConnectConfig)},"modelDeploymentMonitoringJob":string,"predictRequestResponseLoggingConfig":{object (PredictRequestResponseLoggingConfig)},"dedicatedEndpointEnabled":boolean,"dedicatedEndpointDns":string,"clientConnectionConfig":{object (ClientConnectionConfig)},"satisfiesPzs":boolean,"satisfiesPzi":boolean,"genAiAdvancedFeaturesConfig":{object (GenAiAdvancedFeaturesConfig)}}

DeployedModel

A deployment of a Model. Endpoints contain one or more DeployedModels.

Fields
id: string

Immutable. The id of the DeployedModel. If not provided upon deployment, Vertex AI will generate a value for this id.

This value should be 1-10 characters, and valid characters are /[0-9]/.

model: string

The resource name of the Model that this is the deployment of. Note that the Model may be in a different location than the DeployedModel's Endpoint.

The resource name may contain a version ID or version alias to specify the version, for example projects/{project}/locations/{location}/models/{model}@2 or projects/{project}/locations/{location}/models/{model}@golden. If no version is specified, the default version will be deployed.

gdcConnectedModel: string

GDC pretrained / Gemini model name. The model name is a plain model name, e.g. gemini-1.5-flash-002.

modelVersionId: string

Output only. The version id of the model that is deployed.

displayName: string

The display name of the DeployedModel. If not provided upon creation, the Model's displayName is used.

createTime: string (Timestamp format)

Output only. Timestamp when the DeployedModel was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

explanationSpec: object (ExplanationSpec)

Explanation configuration for this DeployedModel.

When deploying a Model using EndpointService.DeployModel, this value overrides the value of Model.explanation_spec. All fields of explanationSpec are optional in the request. If a field of explanationSpec is not populated, the value of the same field of Model.explanation_spec is inherited. If the corresponding Model.explanation_spec is not populated, all fields of the explanationSpec will be used for the explanation configuration.

disableExplanations: boolean

If true, deploy the model without the explainable feature, regardless of the existence of Model.explanation_spec or explanationSpec.

serviceAccount: string

The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project.

Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.

disableContainerLogging: boolean

For custom-trained Models and AutoML Tabular Models, the container of the DeployedModel instances will send stderr and stdout streams to Cloud Logging by default. Please note that the logs incur a cost, which is subject to Cloud Logging pricing.

You can disable container logging by setting this flag to true.

enableAccessLogging: boolean

If true, online prediction access logs are sent to Cloud Logging. These logs are like standard server access logs, containing information like timestamp and latency for each prediction request.

Note that logs may incur a cost, especially if your project receives prediction requests at a high queries per second rate (QPS). Estimate your costs before enabling this option.

privateEndpoints: object (PrivateEndpoints)

Output only. Provides paths for users to send predict/explain/health requests directly to the deployed model services running on Cloud via private services access. This field is populated if network is configured.

fasterDeploymentConfig: object (FasterDeploymentConfig)

Configuration for faster model deployment.

status: object (Status)

Output only. Runtime status of the deployed model.

systemLabels: map (key: string, value: string)

System labels to apply to Model Garden deployments. System labels are managed by Google for internal use only.

checkpointId: string

The checkpoint id of the model.

speculativeDecodingSpec: object (SpeculativeDecodingSpec)

Optional. Spec for configuring speculative decoding.

prediction_resources: Union type
The prediction (for example, the machine) resources that the DeployedModel uses. The user is billed for the resources (at least their minimal amount) even if the DeployedModel receives no traffic. Not all Models support all resource types. See Model.supported_deployment_resources_types. Required except for Large Model Deploy use cases. prediction_resources can be only one of the following:
dedicatedResources: object (DedicatedResources)

A description of resources that are dedicated to the DeployedModel, and that need a higher degree of manual configuration.

automaticResources: object (AutomaticResources)

A description of resources that are, to a large degree, decided by Vertex AI and require only a modest additional configuration.

sharedResources: string

The resource name of the shared DeploymentResourcePool to deploy on. Format: projects/{project}/locations/{location}/deploymentResourcePools/{deploymentResourcePool}

JSON representation
{"id":string,"model":string,"gdcConnectedModel":string,"modelVersionId":string,"displayName":string,"createTime":string,"explanationSpec":{object (ExplanationSpec)},"disableExplanations":boolean,"serviceAccount":string,"disableContainerLogging":boolean,"enableAccessLogging":boolean,"privateEndpoints":{object (PrivateEndpoints)},"fasterDeploymentConfig":{object (FasterDeploymentConfig)},"status":{object (Status)},"systemLabels":{string:string,...},"checkpointId":string,"speculativeDecodingSpec":{object (SpeculativeDecodingSpec)},// prediction_resources"dedicatedResources":{object (DedicatedResources)},"automaticResources":{object (AutomaticResources)},"sharedResources":string// Union type}

PrivateEndpoints

The PrivateEndpoints proto is used to provide paths for users to send requests privately. To send requests via private service access, use predictHttpUri, explainHttpUri, or healthHttpUri. To send requests via Private Service Connect, use serviceAttachment.

Fields
predictHttpUri: string

Output only. HTTP(S) path to send prediction requests.

explainHttpUri: string

Output only. HTTP(S) path to send explain requests.

healthHttpUri: string

Output only. HTTP(S) path to send health check requests.

serviceAttachment: string

Output only. The name of the service attachment resource. Populated if private service connect is enabled.

JSON representation
{"predictHttpUri":string,"explainHttpUri":string,"healthHttpUri":string,"serviceAttachment":string}

FasterDeploymentConfig

Configuration for faster model deployment.

Fields
fastTryoutEnabled: boolean

If true, enable fast tryout feature for this deployed model.

JSON representation
{"fastTryoutEnabled":boolean}

Status

Runtime status of the deployed model.

Fields
message: string

Output only. The latest deployed model's status message (if any).

lastUpdateTime: string (Timestamp format)

Output only. The time at which the status was last updated.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

availableReplicaCount: integer

Output only. The number of available replicas of the deployed model.

JSON representation
{"message":string,"lastUpdateTime":string,"availableReplicaCount":integer}

SpeculativeDecodingSpec

Configuration for Speculative Decoding.

Fields
speculativeTokenCount: integer

The number of speculative tokens to generate at each step.

speculation: Union type
The type of speculation method to use. speculation can be only one of the following:
draftModelSpeculation: object (DraftModelSpeculation)

Draft model speculation.

ngramSpeculation: object (NgramSpeculation)

N-Gram speculation.

JSON representation
{"speculativeTokenCount":integer,// speculation"draftModelSpeculation":{object (DraftModelSpeculation)},"ngramSpeculation":{object (NgramSpeculation)}// Union type}

DraftModelSpeculation

Draft model speculation works by using the smaller model to generate candidate tokens for speculative decoding.

Fields
draftModel: string

Required. The resource name of the draft model.

JSON representation
{"draftModel":string}

NgramSpeculation

N-Gram speculation works by trying to find matching tokens in the previous prompt sequence and using those as speculation for generating new tokens.

Fields
ngramSize: integer

The number of last N input tokens used as ngram to search/match against the previous prompt sequence. This is equal to the N in N-Gram. The default value is 3 if not specified.

JSON representation
{"ngramSize":integer}

PredictRequestResponseLoggingConfig

Configuration for logging request-response to a BigQuery table.

Fields
enabled: boolean

Whether logging is enabled or not.

samplingRate: number

Percentage of requests to be logged, expressed as a fraction in the range (0,1].

bigqueryDestination: object (BigQueryDestination)

BigQuery table for logging. If only given a project, a new dataset will be created with the name logging_<endpoint-display-name>_<endpoint-id>, where <endpoint-display-name> will be made BigQuery-dataset-name compatible (e.g. most special characters will become underscores). If no table name is given, a new table will be created with the name request_response_logging.

JSON representation
{"enabled":boolean,"samplingRate":number,"bigqueryDestination":{object (BigQueryDestination)}}

ClientConnectionConfig

Configurations (e.g. inference timeout) that are applied to your endpoints.

Fields
inferenceTimeout: string (Duration format)

Customizable online prediction request timeout.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

JSON representation
{"inferenceTimeout":string}

GenAiAdvancedFeaturesConfig

Configuration for GenAiAdvancedFeatures.

Fields
ragConfig: object (RagConfig)

Configuration for Retrieval Augmented Generation feature.

JSON representation
{"ragConfig":{object (RagConfig)}}

RagConfig

Configuration for Retrieval Augmented Generation feature.

Fields
enableRag: boolean

If true, enable Retrieval Augmented Generation in ChatCompletion requests. Once enabled, the endpoint will be identified as a GenAI endpoint and the Arthedain router will be used.

JSON representation
{"enableRag":boolean}

Methods

create

Creates an Endpoint.

delete

Deletes an Endpoint.

deployModel

Deploys a Model into this Endpoint, creating a DeployedModel within it.

directPredict

Perform a unary online prediction request to a gRPC model server for Vertex first-party products and frameworks.

directRawPredict

Perform a unary online prediction request to a gRPC model server for custom containers.

explain

Perform an online explanation.

get

Gets an Endpoint.

list

Lists Endpoints in a Location.

mutateDeployedModel

Updates an existing deployed model.

patch

Updates an Endpoint.

predict

Perform an online prediction. (A hedged request-body sketch appears after this method list.)

predictLongRunning

rawPredict

Perform an online prediction with an arbitrary HTTP payload.

serverStreamingPredict

Perform a server-side streaming online prediction request for Vertex LLM streaming.

streamRawPredict

Perform a streaming online prediction with an arbitrary HTTP payload.

undeployModel

Undeploys a Model from an Endpoint, removing a DeployedModel from it, and freeing all resources it's using.

update

Updates an Endpoint with a long running operation.
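
As a hedged illustration of the predict method referenced above, a request body generally carries an instances array and optional parameters; what goes inside them depends entirely on the deployed model, so the instance fields and parameter values below are placeholders, not a definitive payload.

{
  // placeholder instance fields and parameter values
  "instances": [
    {
      "feature_1": 0.25,
      "feature_2": "some-category"
    }
  ],
  "parameters": {
    "confidenceThreshold": 0.5
  }
}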
