Create a public endpoint

To deploy a model by using the gcloud CLI or Vertex AI API, you first need to create a public endpoint.

If you already have a public endpoint, you can skip this step and proceed to Deploy a model by using the gcloud CLI or Vertex AI API.

This document describes the process for creating a new public endpoint.

Create a dedicated public endpoint (recommended)

The default request timeout for a dedicated public endpoint is 10 minutes. In the Vertex AI API and the Vertex AI SDK for Python, you can optionally specify a different request timeout by adding a clientConnectionConfig object containing a new inferenceTimeout value, as shown in the following examples. The maximum timeout value is 3600 seconds (1 hour).

Google Cloud console

In the Google Cloud console, in the Vertex AI section, go to the Online prediction page.
Click Create.
In the New endpoint pane:

Enter the Endpoint name.
Select Standard for the access type.
Select the Enable dedicated DNS checkbox.
Click Continue.

Click Done.

REST

Before using any of the request data, make the following replacements:

LOCATION_ID: Your region.
PROJECT_ID: Your project ID.
ENDPOINT_NAME: The display name for the endpoint.
INFERENCE_TIMEOUT_SECS: (Optional) Number of seconds in the optional inferenceTimeout field.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

Request JSON body:

{
  "display_name": "ENDPOINT_NAME",
  "dedicatedEndpointEnabled": true,
  "clientConnectionConfig": {
    "inferenceTimeout": {
      "seconds": INFERENCE_TIMEOUT_SECS
    }
  }
}
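The request body shown above can also be assembled programmatically, which makes it easy to leave out the optional timeout. The following is a minimal Python sketch, not an official sample: the helper names build_endpoint_request_body and write_request_file are hypothetical, the upper bound mirrors the 3600-second maximum stated earlier, and the lower bound of 1 second is an assumption.

```python
import json
from typing import Any, Dict, Optional

# Maximum stated in this document (1 hour).
MAX_INFERENCE_TIMEOUT_SECS = 3600


def build_endpoint_request_body(
    endpoint_name: str,
    inference_timeout_secs: Optional[int] = None,
) -> Dict[str, Any]:
    """Builds the JSON body for creating a dedicated public endpoint."""
    body: Dict[str, Any] = {
        "display_name": endpoint_name,
        "dedicatedEndpointEnabled": True,
    }
    if inference_timeout_secs is not None:
        # Assumption: any whole number of seconds from 1 to the maximum is valid.
        if not 0 < inference_timeout_secs <= MAX_INFERENCE_TIMEOUT_SECS:
            raise ValueError(
                "inference timeout must be between 1 and "
                f"{MAX_INFERENCE_TIMEOUT_SECS} seconds"
            )
        body["clientConnectionConfig"] = {
            "inferenceTimeout": {"seconds": inference_timeout_secs}
        }
    return body


def write_request_file(body: Dict[str, Any], path: str = "request.json") -> None:
    """Writes the body to the file that the curl and PowerShell commands read."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f, indent=2)
```

Calling build_endpoint_request_body("my-endpoint") omits clientConnectionConfig entirely, so the default timeout applies.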

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content
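For readers who prefer to send the same request from Python, the following sketch builds an equivalent HTTP request with the standard library only. It is illustrative rather than an official sample: the function name build_create_endpoint_request is hypothetical, and the access token is assumed to come from gcloud auth print-access-token, as in the commands above.

```python
import json
import urllib.request
from typing import Any, Dict


def build_create_endpoint_request(
    project_id: str,
    location_id: str,
    body: Dict[str, Any],
    access_token: str,
) -> urllib.request.Request:
    """Builds (but does not send) the POST request that creates an endpoint."""
    url = (
        f"https://{location_id}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location_id}/endpoints"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and parse the response body as JSON.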

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

You can poll for the status of the operation until the response includes "done": true.
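The polling loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions: wait_for_operation and its parameters are hypothetical names, and get_operation stands for any callable you supply that fetches the operation resource (for example, a GET request on the operation name returned above) and returns it as a dictionary.

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    get_operation: Callable[[], Dict[str, Any]],
    poll_interval_secs: float = 5.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Polls until the operation reports "done": true, then returns it."""
    for attempt in range(max_polls):
        operation = get_operation()
        if operation.get("done"):
            # A finished long-running operation carries either "response" or "error".
            if "error" in operation:
                raise RuntimeError(f"Operation failed: {operation['error']}")
            return operation
        if attempt < max_polls - 1:
            time.sleep(poll_interval_secs)
    raise TimeoutError(f"Operation not done after {max_polls} polls")
```

Keeping the fetch step as a callable lets the same loop work with any HTTP client, and makes it easy to exercise without network access.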

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Replace the following:

PROJECT_ID: Your project ID.
LOCATION_ID: The region where you are using Vertex AI.
ENDPOINT_NAME: The display name for the endpoint.
INFERENCE_TIMEOUT_SECS: (Optional) Number of seconds in the optional inference_timeout value.

from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION_ID"
ENDPOINT_NAME = "ENDPOINT_NAME"
# Optional. Replace with your timeout as an integer number of seconds (maximum 3600).
INFERENCE_TIMEOUT_SECS = 1800

aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
)

dedicated_endpoint = aiplatform.Endpoint.create(
    display_name=ENDPOINT_NAME,
    dedicated_endpoint_enabled=True,
    sync=True,
    inference_timeout=INFERENCE_TIMEOUT_SECS,
)

Inference timeout configuration

The default timeout for inference requests is 600 seconds (10 minutes). This timeout applies if you don't specify an inference timeout when you create the endpoint. The maximum timeout value is 3600 seconds (1 hour).

To configure the inference timeout during endpoint creation, use the inference_timeout parameter as shown in the following code snippet:

timeout_endpoint = aiplatform.Endpoint.create(
    display_name="dedicated-endpoint-with-timeout",
    dedicated_endpoint_enabled=True,
    inference_timeout=1800,  # Unit: Seconds
)

To change the inference timeout after the endpoint is created, use the EndpointService.UpdateEndpointLongRunning method. The EndpointService.UpdateEndpoint method doesn't support this change.

Request-response Logging

The request-response logging feature captures API interactions. However, to comply with BigQuery limitations, payloads larger than 10 MB are excluded from the logs.

To enable and configure request-response logging during endpoint creation, use the parameters shown in the following code snippet:

logging_endpoint = aiplatform.Endpoint.create(
    display_name="dedicated-endpoint-with-logging",
    dedicated_endpoint_enabled=True,
    enable_request_response_logging=True,
    request_response_logging_sampling_rate=1.0,  # Default: 0.0
    request_response_logging_bq_destination_table="bq://test_logging",
    # If not set, a new BigQuery table will be created with the name:
    # bq://{project_id}.logging_{endpoint_display_name}_{endpoint_id}.request_response_logging
)

To change the request-response logging settings after the endpoint is created, use the EndpointService.UpdateEndpointLongRunning method. The EndpointService.UpdateEndpoint method doesn't support this change.
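The logging options above can be collected in a small helper before they are passed to aiplatform.Endpoint.create. The following sketch is illustrative only: all three function names are hypothetical, the 0.0 to 1.0 range check on the sampling rate is an assumption based on the values shown in the snippet, "10 MB" is read as 10 MiB, and default_logging_table applies the default table-name pattern from the snippet's comment literally, without any character normalization the service might perform.

```python
from typing import Any, Dict, Optional

# Assumption: the 10 MB limit is interpreted as 10 MiB.
MAX_LOGGED_PAYLOAD_BYTES = 10 * 1024 * 1024


def default_logging_table(
    project_id: str, endpoint_display_name: str, endpoint_id: str
) -> str:
    """Formats the default BigQuery destination used when none is set."""
    return (
        f"bq://{project_id}.logging_{endpoint_display_name}_{endpoint_id}"
        ".request_response_logging"
    )


def logging_create_kwargs(
    display_name: str,
    sampling_rate: float = 1.0,
    bq_destination_table: Optional[str] = None,
) -> Dict[str, Any]:
    """Builds keyword arguments for a dedicated endpoint with logging enabled."""
    # Assumption: the sampling rate is a fraction between 0.0 and 1.0.
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    kwargs: Dict[str, Any] = {
        "display_name": display_name,
        "dedicated_endpoint_enabled": True,
        "enable_request_response_logging": True,
        "request_response_logging_sampling_rate": sampling_rate,
    }
    if bq_destination_table is not None:
        kwargs["request_response_logging_bq_destination_table"] = bq_destination_table
    return kwargs


def payload_is_loggable(payload: bytes) -> bool:
    """Returns False for payloads that exceed the logging size limit."""
    return len(payload) <= MAX_LOGGED_PAYLOAD_BYTES
```

The resulting dictionary can be unpacked into the call, for example aiplatform.Endpoint.create(**logging_create_kwargs("dedicated-endpoint-with-logging")).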

Create a shared public endpoint

Google Cloud console

In the Google Cloud console, in the Vertex AI section, go to the Online prediction page.
Click Create.
In the New endpoint pane:

Enter the Endpoint name.
Select Standard for the access type.
Click Continue.

Click Done.

gcloud

The following example uses the gcloud ai endpoints create command:

gcloud ai endpoints create \
    --region=LOCATION_ID \
    --display-name=ENDPOINT_NAME

Replace the following:

LOCATION_ID: The region where you are using Vertex AI.
ENDPOINT_NAME: The display name for the endpoint.

The Google Cloud CLI tool might take a few seconds to create the endpoint.
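If you script endpoint creation, the same command can be assembled as an argument list in Python. This is a small sketch rather than an official sample; build_create_endpoint_command is a hypothetical helper that only constructs the arguments shown above and doesn't run anything.

```python
from typing import List


def build_create_endpoint_command(location_id: str, endpoint_name: str) -> List[str]:
    """Returns the gcloud arguments for creating a shared public endpoint."""
    return [
        "gcloud",
        "ai",
        "endpoints",
        "create",
        f"--region={location_id}",
        f"--display-name={endpoint_name}",
    ]
```

Passing the list to subprocess.run(command, check=True) runs the command without shell quoting issues, because each argument is delivered to gcloud as-is.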

REST

Before using any of the request data, make the following replacements:

LOCATION_ID: Your region.
PROJECT_ID: Your project ID.
ENDPOINT_NAME: The display name for the endpoint.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

Request JSON body:

{
  "display_name": "ENDPOINT_NAME"
}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"

PowerShell (Windows)

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

You can poll for the status of the operation until the response includes "done": true.

Terraform

The following sample uses the google_vertex_ai_endpoint Terraform resource to create an endpoint.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

# Endpoint name must be unique for the project
resource "random_id" "endpoint_id" {
  byte_length = 4
}

resource "google_vertex_ai_endpoint" "default" {
  name         = substr(random_id.endpoint_id.dec, 0, 10)
  display_name = "sample-endpoint"
  description  = "A sample Vertex AI endpoint"
  location     = "us-central1"
  labels = {
    label-one = "value-one"
  }
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Create the endpoint; the call returns a long-running operation
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

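from google.cloud import aiplatform


def create_endpoint_sample(
    project: str,
    display_name: str,
    location: str,
):
    # Set the default project and region for subsequent SDK calls.
    aiplatform.init(project=project, location=location)

    # Create a shared public endpoint with the given display name.
    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint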

What's next

Deploy a model by using the gcloud CLI or Vertex AI API.
Learn how to get an online inference.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.
