Use dedicated private endpoints based on Private Service Connect for online inference

The information in this page applies to custom-trained models and AutoML models. For Model Garden deployment, see Use models in Model Garden.

Private Service Connect lets you deploy your custom-trained Vertex AI model and serve online inferences securely to multiple consumer projects and VPC networks without the need for public IP addresses, public internet access, or an explicitly peered internal IP address range.

We recommend Private Service Connect for online inference use cases that have the following requirements:

  • Require private and secure connections
  • Require low latency
  • Don't need to be publicly accessible
Note: Tuned Gemini models can only be deployed to shared public endpoints, not Private Service Connect endpoints.

Private Service Connect uses a forwarding rule in your VPC network to send traffic unidirectionally to the Vertex AI online inference service. The forwarding rule connects to a service attachment that exposes the Vertex AI service to your VPC network. For more information, see About accessing Vertex AI services through Private Service Connect. To learn more about setting up Private Service Connect, see the Private Service Connect overview in the Virtual Private Cloud (VPC) documentation.

Dedicated private endpoints support both HTTP and gRPC communication protocols. For gRPC requests, the x-vertex-ai-endpoint-id header must be included for proper endpoint identification. The following APIs are supported:

  • Predict
  • RawPredict
  • StreamRawPredict
  • Chat Completion (Model Garden only)

You can send online inference requests to a dedicated private endpoint by using the Vertex AI SDK for Python. For details, see Get online inferences.

Required roles

To get the permission that you need to create a Private Service Connect endpoint, ask your administrator to grant you the Vertex AI User (roles/aiplatform.user) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the aiplatform.endpoints.create permission, which is required to create a Private Service Connect endpoint.

You might also be able to get this permission with custom roles or other predefined roles.

For more information about Vertex AI roles and permissions, see Vertex AI access control with IAM and Vertex AI IAM permissions.

Create the online inference endpoint

Use one of the following methods to create an online inference endpoint with Private Service Connect enabled.

The default request timeout for a Private Service Connect endpoint is 10 minutes. In the Vertex AI SDK for Python, you can optionally specify a different request timeout by specifying a new inference_timeout value, as shown in the following example. The maximum timeout value is 3600 seconds (1 hour).

Console

  1. In the Google Cloud console, in Vertex AI, go to the Online prediction page.

    Go to Online prediction

  2. Click Create.

  3. Provide a display name for the endpoint.

  4. Select Private.

  5. Select Private Service Connect.

  6. Click Select project IDs.

  7. Select projects to add to the allowlist for the endpoint.

  8. Click Continue.

  9. Choose your model specifications. For more information, see Deploy a model to an endpoint.

  10. Click Create to create your endpoint and deploy your model to it.

  11. Make a note of the endpoint ID in the response.

API

REST

Before using any of the request data, make the following replacements:

  • VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online prediction endpoint.
  • REGION: the region where you're using Vertex AI.
  • VERTEX_AI_ENDPOINT_NAME: the display name for the online prediction endpoint.
  • ALLOWED_PROJECTS: a comma-separated list of Google Cloud project IDs, each enclosed in quotation marks, for example, ["PROJECTID1", "PROJECTID2"]. If a project isn't contained in this list, you won't be able to send prediction requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.
  • INFERENCE_TIMEOUT_SECS: (Optional) Number of seconds in the optional inferenceTimeout field.

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints

Request JSON body:

{
  "displayName": "VERTEX_AI_ENDPOINT_NAME",
  "privateServiceConnectConfig": {
    "enablePrivateServiceConnect": true,
    "projectAllowlist": ["ALLOWED_PROJECTS"],
    "clientConnectionConfig": {
      "inferenceTimeout": {
        "seconds": INFERENCE_TIMEOUT_SECS
      }
    }
  }
}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/VERTEX_AI_PROJECT_NUMBER/locations/REGION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}
Make a note of the ENDPOINT_ID.
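If you're scripting endpoint creation, the endpoint ID can be parsed out of the operation name in the response. A minimal sketch (the helper function is hypothetical, and the sample name uses placeholder values):

```python
import re

def extract_endpoint_id(operation_name: str) -> str:
    """Pull the endpoint ID out of a create-endpoint operation name.

    The name has the form:
    projects/PROJECT/locations/REGION/endpoints/ENDPOINT_ID/operations/OPERATION_ID
    """
    match = re.search(r"/endpoints/([^/]+)/operations/", operation_name)
    if not match:
        raise ValueError(f"unexpected operation name: {operation_name}")
    return match.group(1)

# Example with placeholder values in the operation name:
name = "projects/12345/locations/us-central1/endpoints/678900/operations/111"
print(extract_endpoint_id(name))  # 678900
```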

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Replace the following:

  • VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online inference endpoint
  • REGION: the region where you're using Vertex AI
  • VERTEX_AI_ENDPOINT_NAME: the display name for the online inference endpoint
  • ALLOWED_PROJECTS: a comma-separated list of Google Cloud project IDs, each enclosed in quotation marks. For example, ["PROJECTID1", "PROJECTID2"]. If a project isn't contained in this list, you won't be able to send inference requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.
  • INFERENCE_TIMEOUT_SECS: (Optional) Number of seconds in the optional inference_timeout value.
PROJECT_ID = "VERTEX_AI_PROJECT_ID"
REGION = "REGION"
VERTEX_AI_ENDPOINT_NAME = "VERTEX_AI_ENDPOINT_NAME"
INFERENCE_TIMEOUT_SECS = "INFERENCE_TIMEOUT_SECS"

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

# Create the forwarding rule in the consumer project
psc_endpoint = aiplatform.PrivateEndpoint.create(
    display_name=VERTEX_AI_ENDPOINT_NAME,
    project=PROJECT_ID,
    location=REGION,
    private_service_connect_config=aiplatform.PrivateEndpoint.PrivateServiceConnectConfig(
        project_allowlist=["ALLOWED_PROJECTS"],
    ),
    inference_timeout=INFERENCE_TIMEOUT_SECS,
)

Make a note of the ENDPOINT_ID at the end of the returned endpoint URI:

INFO:google.cloud.aiplatform.models:To use this PrivateEndpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.PrivateEndpoint('projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID')

Create the online inference endpoint with PSC automation (Preview)

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Online inference integrates with service connectivity automation, which lets you configure inference endpoints with PSC automation. This simplifies the process by automatically creating PSC endpoints, and is particularly beneficial for ML developers who lack permissions to create network resources such as forwarding rules within a project.

To get started, your network administrator must establish a service connection policy. This policy is a one-time configuration per project and network that lets Vertex AI (service class gcp-vertexai) generate PSC endpoints within your projects and networks.

Next, you can create endpoints using the PSC automation configuration and then deploy your models. After the deployment is complete, the relevant PSC endpoint information is available on the endpoint.

Limitations

  • VPC Service Controls aren't supported.
  • A regional limit of 500 endpoints applies to PSC automation configurations.
  • PSC automation results are purged when no model is deployed or is in the process of being deployed to the endpoint. Upon cleanup and subsequent model deployment, new automation results feature distinct IP addresses and forwarding rules.

Create a service connection policy

You must be a network administrator to create the service connection policy. A service connection policy is required to let Vertex AI create PSC endpoints in your networks. Without a valid policy, the automation fails with a CONNECTION_POLICY_MISSING error.

  1. Create your service connection policy.

    • POLICY_NAME: A user-specified name for the policy.
    • PROJECT_ID: The ID of the service project where you are creating Vertex AI resources.

    • VPC_PROJECT: The project ID where your client VPC is located. For a single VPC setup, this is the same as PROJECT_ID. For a Shared VPC setup, this is the VPC host project.

    • NETWORK_NAME: The name of the network to deploy to.

    • REGION: The network's region.

    • PSC_SUBNETS: The Private Service Connect subnets to use.

    gcloud network-connectivity service-connection-policies create POLICY_NAME \
        --project=VPC_PROJECT \
        --network=projects/VPC_PROJECT/global/networks/NETWORK_NAME \
        --service-class=gcp-vertexai \
        --region=REGION \
        --subnets=PSC_SUBNETS
  2. View your service connection policy.

    gcloud network-connectivity service-connection-policies list \
        --project=VPC_PROJECT \
        --region=REGION

    For a single VPC setup, a sample looks like this:

    gcloud network-connectivity service-connection-policies create test-policy \
        --network=default \
        --project=YOUR_PROJECT_ID \
        --region=us-central1 \
        --service-class=gcp-vertexai \
        --subnets=default \
        --psc-connection-limit=500 \
        --description=test

Create the online inference endpoint with PSC automation config

In the PSCAutomationConfig, make sure that the projectId is in the allowlist.
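As a quick pre-flight check, you can verify that relationship in the request body before sending it. A sketch, assuming the camelCase field names used in the automation responses (the helper function is hypothetical):

```python
def check_psc_automation_allowlist(config: dict) -> None:
    """Raise if any PSC automation project isn't in the endpoint's allowlist."""
    allowlist = set(config.get("projectAllowlist", []))
    for automation in config.get("pscAutomationConfigs", []):
        project = automation["projectId"]
        if project not in allowlist:
            raise ValueError(
                f"project {project!r} in pscAutomationConfigs is missing "
                "from projectAllowlist"
            )

# Mirrors the shape of the request body, with a placeholder project ID.
psc_config = {
    "enablePrivateServiceConnect": True,
    "projectAllowlist": ["my-project"],
    "pscAutomationConfigs": [
        {"projectId": "my-project",
         "network": "projects/my-project/global/networks/default"},
    ],
}
check_psc_automation_allowlist(psc_config)  # passes silently
```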

REST

Before using any of the request data, make the following replacements:

  • REGION: The region where you're using Vertex AI.
  • VERTEX_AI_PROJECT_ID: The ID of the Google Cloud project where you're creating the online inference endpoint.
  • VERTEX_AI_ENDPOINT_NAME: The display name for the online prediction endpoint.
  • NETWORK_NAME: the name of the VPC network. In the network's full resource name, use the project ID, not the project number.

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints

Request JSON body:

{
  "displayName": "VERTEX_AI_ENDPOINT_NAME",
  "privateServiceConnectConfig": {
    "enablePrivateServiceConnect": true,
    "projectAllowlist": ["VERTEX_AI_PROJECT_ID"],
    "pscAutomationConfigs": [
      {
        "projectId": "VERTEX_AI_PROJECT_ID",
        "network": "projects/VERTEX_AI_PROJECT_ID/global/networks/NETWORK_NAME"
      }
    ]
  }
}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/VERTEX_AI_PROJECT_NUMBER/locations/REGION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}
Make a note of the ENDPOINT_ID.

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Replace the following:

  • VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online inference endpoint
  • REGION: the region where you're using Vertex AI
  • VERTEX_AI_ENDPOINT_NAME: the display name for the online inference endpoint
  • NETWORK_NAME: the name of the VPC network. In the network's full resource name, use the project ID, not the project number.
PROJECT_ID = "VERTEX_AI_PROJECT_ID"
REGION = "REGION"
VERTEX_AI_ENDPOINT_NAME = "VERTEX_AI_ENDPOINT_NAME"

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

config = aiplatform.compat.types.service_networking.PrivateServiceConnectConfig(
    enable_private_service_connect=True,
    project_allowlist=["VERTEX_AI_PROJECT_ID"],
    psc_automation_configs=[
        aiplatform.compat.types.service_networking.PSCAutomationConfig(
            project_id="VERTEX_AI_PROJECT_ID",
            network="projects/VERTEX_AI_PROJECT_ID/global/networks/NETWORK_NAME",
        )
    ],
)

psc_endpoint = aiplatform.PrivateEndpoint.create(
    display_name=VERTEX_AI_ENDPOINT_NAME,
    private_service_connect_config=config,
)

Deploy the model

After you create your online inference endpoint with Private Service Connect enabled, deploy your model to it, following the steps outlined in Deploy a model to an endpoint.

Create a PSC endpoint manually

Get the service attachment URI

When you deploy your model, a service attachment is created for the online inference endpoint. This service attachment represents the Vertex AI online inference service that's being exposed to your VPC network. Run the gcloud ai endpoints describe command to get the service attachment URI.

  1. List only the serviceAttachment value from the endpoint details:

    gcloud ai endpoints describe ENDPOINT_ID \
        --project=VERTEX_AI_PROJECT_ID \
        --region=REGION \
        | grep -i serviceAttachment

    Replace the following:

    • ENDPOINT_ID: the ID of your online inference endpoint
    • VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you created your online inference endpoint
    • REGION: the region for this request

    The output is similar to the following:

    serviceAttachment: projects/ac74a9f84c2e5f2a1-tp/regions/us-central1/serviceAttachments/gkedpm-c6e6a854a634dc99472bb802f503c1
  2. Make a note of the entire string in the serviceAttachment field. This is the service attachment URI.
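If you're automating this step, the grep can be replaced with a few lines of Python over the describe output. A sketch (the sample output below is abbreviated, and the function name is hypothetical):

```python
def parse_service_attachment(describe_output: str) -> str:
    """Extract the serviceAttachment URI from `gcloud ai endpoints describe` output."""
    for line in describe_output.splitlines():
        if line.strip().lower().startswith("serviceattachment:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no serviceAttachment field found; is a model deployed?")

# Abbreviated sample of describe output:
sample = """\
displayName: my-endpoint
serviceAttachment: projects/ac74a9f84c2e5f2a1-tp/regions/us-central1/serviceAttachments/gkedpm-c6e6a854a634dc99472bb802f503c1
"""
print(parse_service_attachment(sample))
```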

Create a forwarding rule

You can reserve an internal IP address and create a forwarding rule with that address. To create the forwarding rule, you need the service attachment URI from the previous step.

  1. To reserve an internal IP address for the forwarding rule, use the gcloud compute addresses create command:

    gcloud compute addresses create ADDRESS_NAME \
        --project=VPC_PROJECT_ID \
        --region=REGION \
        --subnet=SUBNETWORK \
        --addresses=INTERNAL_IP_ADDRESS

    Replace the following:

    • ADDRESS_NAME: a name for the internal IP address
    • VPC_PROJECT_ID: the ID of the Google Cloud project that hosts your VPC network. If your online inference endpoint and your Private Service Connect forwarding rule are hosted in the same project, use VERTEX_AI_PROJECT_ID for this parameter.
    • REGION: the Google Cloud region where the Private Service Connect forwarding rule is to be created
    • SUBNETWORK: the name of the VPC subnet that contains the IP address
    • INTERNAL_IP_ADDRESS: the internal IP address to reserve. This parameter is optional.

      • If this parameter is specified, the IP address must be within the subnet's primary IP address range. The IP address can be an RFC 1918 address or come from a subnet with a non-RFC 1918 range.
      • If this parameter is omitted, an internal IP addressis allocated automatically.
      • For more information, seeReserve a new static internal IPv4 or IPv6 address.
  2. To verify that the IP address is reserved, use the gcloud compute addresses list command:

    gcloud compute addresses list --filter="name=(ADDRESS_NAME)" \
        --project=VPC_PROJECT_ID

    In the response, verify that a RESERVED status appears for the IP address.

  3. To create the forwarding rule and point it to the online inference service attachment, use the gcloud compute forwarding-rules create command:

    gcloud compute forwarding-rules create PSC_FORWARDING_RULE_NAME \
        --address=ADDRESS_NAME \
        --project=VPC_PROJECT_ID \
        --region=REGION \
        --network=VPC_NETWORK_NAME \
        --target-service-attachment=SERVICE_ATTACHMENT_URI

    Replace the following:

    • PSC_FORWARDING_RULE_NAME: a name for the forwarding rule
    • VPC_NETWORK_NAME: the name of the VPC network where the endpoint is to be created
    • SERVICE_ATTACHMENT_URI: the service attachment that you made a note of earlier
  4. To verify that the service attachment accepts the endpoint, use the gcloud compute forwarding-rules describe command:

    gcloud compute forwarding-rules describe PSC_FORWARDING_RULE_NAME \
        --project=VPC_PROJECT_ID \
        --region=REGION

    In the response, verify that an ACCEPTED status appears in the pscConnectionStatus field.

Get the internal IP address

If you didn't specify a value for INTERNAL_IP_ADDRESS when you created the forwarding rule, you can get the address that was allocated automatically by using the gcloud compute forwarding-rules describe command:

gcloud compute forwarding-rules describe PSC_FORWARDING_RULE_NAME \
    --project=VERTEX_AI_PROJECT_ID \
    --region=REGION \
    | grep -i IPAddress

Replace the following:

  • VERTEX_AI_PROJECT_ID: your project ID
  • REGION: the region name for this request

Optional: Get PSC endpoint from PSC automation result

You can get the generated IP address and forwarding rule from the inference endpoint. Here's an example:

"privateServiceConnectConfig": {
  "enablePrivateServiceConnect": true,
  "projectAllowlist": [
    "your-project-id"
  ],
  "pscAutomationConfigs": [
    {
      "projectId": "your-project-id",
      "network": "projects/your-project-id/global/networks/default",
      "ipAddress": "10.128.15.209",
      "forwardingRule": "https://www.googleapis.com/compute/v1/projects/your-project-id/regions/us-central1/forwardingRules/sca-auto-fr-47b0d6a4-eaff-444b-95e6-e4dc1d10101e",
      "state": "PSC_AUTOMATION_STATE_SUCCESSFUL"
    }
  ]
}

Here are some error handling details.

  • Automation failure doesn't affect the outcome of the model deployment.
  • The success or failure of the operation is indicated in the state field.
    • If successful, the IP address and forwarding rule are displayed.
    • If unsuccessful, an error message is displayed.
  • Automation configurations are removed when no models are deployed or in the process of being deployed to the endpoint. This results in a change to the IP address and forwarding rule if a model is deployed later.
  • Failed automation isn't retried. If automation fails, you can still create the PSC endpoint manually; see Create a PSC endpoint manually.
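Building on the sample response shown earlier, a sketch of checking the automation state before connecting (the helper function is hypothetical; field names follow the example response):

```python
def get_psc_endpoint_address(psc_config: dict) -> str:
    """Return the automated PSC endpoint IP, or raise if automation failed."""
    for automation in psc_config.get("pscAutomationConfigs", []):
        state = automation.get("state")
        if state == "PSC_AUTOMATION_STATE_SUCCESSFUL":
            return automation["ipAddress"]
        raise RuntimeError(f"PSC automation did not succeed; state: {state}")
    raise ValueError("no pscAutomationConfigs present on this endpoint")

# Values taken from the sample response above.
config = {
    "enablePrivateServiceConnect": True,
    "projectAllowlist": ["your-project-id"],
    "pscAutomationConfigs": [{
        "projectId": "your-project-id",
        "network": "projects/your-project-id/global/networks/default",
        "ipAddress": "10.128.15.209",
        "state": "PSC_AUTOMATION_STATE_SUCCESSFUL",
    }],
}
print(get_psc_endpoint_address(config))  # 10.128.15.209
```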

Get online inferences

Getting online inferences from an endpoint with Private Service Connect is similar to getting online inferences from public endpoints, except for the following considerations:

  • The request must be sent from a project that was specified in the projectAllowlist when the online inference endpoint was created.
  • If global access isn't enabled, the request must be sent from the same region.
  • Two ports are open: 443, with TLS using a self-signed certificate, and 80, without TLS. Both ports support HTTP and gRPC. All traffic remains on your private network and doesn't traverse the public internet.
  • The best practice is to use HTTPS with the self-signed certificate obtained from Vertex AI online inference or to deploy your own self-signed certificate.
  • To obtain inferences, a connection must be established using the endpoint's static IP address, unless a DNS record is created for the internal IP address. For example, send the predict requests to the following endpoint:

    http://INTERNAL_IP_ADDRESS/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict

    Replace INTERNAL_IP_ADDRESS with the internal IP address that you reserved earlier.

  • For gRPC requests: include the x-vertex-ai-endpoint-id header to ensure proper endpoint identification. This is required because endpoint information isn't conveyed in the request path for gRPC communication.
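With the grpc Python package, headers travel as call metadata, a sequence of lowercase (key, value) tuples. A minimal sketch of building that metadata (the endpoint ID is a placeholder, and the stub call in the comment assumes a generated Predict stub):

```python
# The x-vertex-ai-endpoint-id header is passed as gRPC call metadata.
# gRPC metadata keys must be lowercase ASCII. ENDPOINT_ID is a placeholder.
ENDPOINT_ID = "1234567890"

def vertex_grpc_metadata(endpoint_id: str) -> list[tuple[str, str]]:
    """Build the call metadata that identifies the target endpoint."""
    return [("x-vertex-ai-endpoint-id", endpoint_id)]

# With grpcio installed, you would pass this to the stub call, for example:
#   stub.Predict(request, metadata=vertex_grpc_metadata(ENDPOINT_ID))
print(vertex_grpc_metadata(ENDPOINT_ID))
```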

Create a DNS record for the internal IP address

We recommend that you create a DNS record so that you can get online inferences from your endpoint without needing to specify the internal IP address.

For more information, see Other ways to configure DNS.

  1. Create a private DNS zone by using the gcloud dns managed-zones create command. This zone is associated with the VPC network that the forwarding rule was created in.

    DNS_NAME_SUFFIX="prediction.p.vertexai.goog."  # DNS names have "." at the end.

    gcloud dns managed-zones create ZONE_NAME \
        --project=VPC_PROJECT_ID \
        --dns-name=$DNS_NAME_SUFFIX \
        --networks=VPC_NETWORK_NAME \
        --visibility=private \
        --description="A DNS zone for Vertex AI endpoints using Private Service Connect."

    Replace the following:

    • ZONE_NAME: the name of the DNS zone
  2. To create a DNS record in the zone, use the gcloud dns record-sets create command:

    DNS_NAME=ENDPOINT_ID-REGION-VERTEX_AI_PROJECT_NUMBER.$DNS_NAME_SUFFIX

    gcloud dns record-sets create $DNS_NAME \
        --rrdatas=INTERNAL_IP_ADDRESS \
        --zone=ZONE_NAME \
        --type=A \
        --ttl=60 \
        --project=VPC_PROJECT_ID

    Replace the following:

    • VERTEX_AI_PROJECT_NUMBER: the project number for your VERTEX_AI_PROJECT_ID project. You can locate this project number in the Google Cloud console. For more information, see Identifying projects.
    • INTERNAL_IP_ADDRESS: the internal IP address of your online inference endpoint

    Now you can send your predict requests to:

    http://ENDPOINT_ID-REGION-VERTEX_AI_PROJECT_NUMBER.prediction.p.vertexai.goog/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict

Support for Transport Layer Security (TLS) certificates

Vertex AI online inference is secured using a self-signed certificate. Because this certificate is not signed by a trusted certificate authority (CA), clients attempting to establish an HTTPS connection must be explicitly configured to trust it. The self-signed certificate obtained from the Vertex AI online inference endpoint is valid for 10 years. Because this certificate is not unique to a specific endpoint, a single certificate may be used for all trust store integrations. The following are the high-level steps required to establish an HTTPS connection to Vertex AI online inference:

  1. Configure DNS: The self-signed certificate includes the subject alternative name (SAN) *.prediction.p.vertexai.goog. You must create a DNS record in your network that matches this format.

    For implementation details, see Create a DNS record for the internal IP address.

  2. Establish client trust: The client must download the self-signedcertificate and add it to its local trust store.

  3. Establish HTTPS connection: Establishing an HTTPS connection to Vertex AI online inference requires using the fully qualified domain name (FQDN). This is necessary because the service uses a wildcard SSL certificate valid for the domain *.prediction.p.vertexai.goog.

The following steps demonstrate how to download the Vertex AI online inference certificate and add it to the local trust store on Debian-based Linux systems such as Debian 11 and Ubuntu.

  1. Update OS packages and install OpenSSL:

    sudo apt update && sudo apt install openssl
  2. Run the following command from your home directory to download the Vertex AI online inference certificate and save it to a file named vertex_certificate.crt in your current directory. Make the following replacements:

    • ENDPOINT_ID: ID of the deployed model endpoint
    • REGION: the region where your endpoint resides
    • VERTEX_AI_PROJECT_NUMBER: the project number for your project. You can locate this project number in the Google Cloud console. For more information, see Identifying projects.
    openssl s_client -showcerts -connect \
        ENDPOINT_ID-REGION-VERTEX_AI_PROJECT_NUMBER.prediction.p.vertexai.goog:443 \
        </dev/null | \
        openssl x509 -outform pem -out vertex_certificate.crt
  3. Move the certificate to the system trust store:

    sudo mv vertex_certificate.crt /usr/local/share/ca-certificates
  4. Update the system's list of trusted CA certificates. You should see output confirming that one certificate was added.

    sudo update-ca-certificates
  5. Send a predict request to the following URL, making these replacements:

    • INTERNAL_IP_ADDRESS: the internal IP address of your online inference endpoint
    • VERTEX_AI_PROJECT_ID: the project ID for your project
    https://ENDPOINT_ID-REGION-VERTEX_AI_PROJECT_NUMBER.prediction.p.vertexai.goog/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict
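Rather than disabling TLS verification, a Python client can point the requests library's verify parameter at the downloaded certificate. A sketch under that assumption (the URL-builder helper and all-caps values are placeholders):

```python
# Sketch: a verified-HTTPS predict call using the downloaded certificate.
# All-caps values are placeholders; `build_predict_url` is a hypothetical
# helper that assembles the FQDN-based predict URL.
def build_predict_url(endpoint_id: str, region: str,
                      project_number: str, project_id: str) -> str:
    host = f"{endpoint_id}-{region}-{project_number}.prediction.p.vertexai.goog"
    return (f"https://{host}/v1/projects/{project_id}/locations/{region}"
            f"/endpoints/{endpoint_id}:predict")

url = build_predict_url("123", "us-central1", "456", "my-project")
print(url)

# With requests installed and an access token in hand, you could then call:
# import requests
# response = requests.post(
#     url,
#     headers={"Authorization": f"Bearer {access_token}"},
#     json={"instances": [...]},
#     verify="vertex_certificate.crt",  # trust the self-signed certificate
# )
```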

Custom certificate support for TLS

For organizations needing precise control over certificate management and rotation for Vertex AI online inference endpoints, you can use a customer-managed certificate with a regional Google Cloud Application Load Balancer (HTTPS).

This architecture works with the default Vertex AI certificate, giving you direct control.

Following are the high-level deployment steps:

  1. Create a customer-managed certificate:

    • Generate a self-signed certificate (or use your CA) for a custom domain.
    • This domain must use the suffix .prediction.p.vertexai.goog to ensure it matches the Vertex AI wildcard certificate, for example, my-endpoint.prediction.p.vertexai.goog.
  2. Deploy a regional Application Load Balancer (HTTPS):

  3. Configure DNS:

    • Create a DNS A record in your DNS zone.
    • This record must map your fully qualified custom domain (for example, my-endpoint.prediction.p.vertexai.goog) to the IP address of the regional Application Load Balancer.
  4. Update the local trust store:

    • For the client to authenticate the server, the Application Load Balancer's certificate (or its issuing CA) must be imported into the local trust store.

Now you can send your predict requests to the fully qualified custom domain:

https://MY_ENDPOINT.prediction.p.vertexai.goog/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict

Examples of getting online inferences

The following sections provide examples of how you can send the predict request using Python.

First example

psc_endpoint = aiplatform.PrivateEndpoint(
    "projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID"
)

REQUEST_FILE = "PATH_TO_INPUT_FILE"

import json

import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)
    response = psc_endpoint.predict(
        instances=data["instances"], endpoint_override=INTERNAL_IP_ADDRESS
    )

print(response)

Replace PATH_TO_INPUT_FILE with a path to a JSON file containing the request input.

Second example

import json

import google.auth
import google.auth.transport.requests
import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

REQUEST_FILE = "PATH_TO_INPUT_FILE"

# Programmatically get credentials and generate an access token
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
access_token = creds.token
# Note: the credential lives for 1 hour by default.
# After expiration, it must be refreshed.
# See https://cloud.google.com/docs/authentication/token-types#access-tokens
# for token lifetimes.

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)

url = "https://INTERNAL_IP_ADDRESS/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {access_token}",  # Add the access token to the headers
}

payload = {
    "instances": data["instances"],
}

response = requests.post(url, headers=headers, json=payload, verify=False)
print(response.json())

Third example

The following is an example of how you can send the predict request using the DNS name with Python:

REQUEST_FILE = "PATH_TO_INPUT_FILE"

import json

import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)
    response = psc_endpoint.predict(
        instances=data["instances"], endpoint_override=DNS_NAME
    )

print(response)

Replace DNS_NAME with the DNS name that you specified in the gcloud dns record-sets create command.

Best practices

When a new endpoint is deployed, the service attachment might be updated. Always check the service attachment and PSC endpoint status before making the inference call. Following are best practices for doing this:

  • If an endpoint has no active deployments, Vertex AI might delete the service attachment and recreate it. Make sure the PSC endpoint is in a connected state (by recreating the forwarding rule) when the service attachment is recreated.
  • When an endpoint has an active deployed model, the service attachment doesn't change. To keep the service attachment, create a traffic split and gradually migrate traffic to the new model version before undeploying the earlier version.
  • Vertex AI allows up to 1,000 connections per service attachment.
  • The forwarding rule has a quota limit as well. For details, see Cloud Load Balancing quotas and limits.

Limitations

Vertex AI endpoints with Private Service Connect are subject to the following limitations:

  • Deployment of tuned Gemini models isn't supported.
  • Private egress from within the endpoint isn't supported. Because Private Service Connect forwarding rules are unidirectional, other private Google Cloud workloads aren't accessible inside your container.
  • An endpoint's projectAllowlist value can't be changed.
  • Vertex Explainable AI isn't supported.
  • Before you delete an endpoint, you must undeploy your model from that endpoint.
  • If all models are undeployed for more than 10 minutes, the service attachment might be deleted. Check the Private Service Connect connection status; if it's CLOSED, recreate the forwarding rule.
  • After you've deleted your endpoint, you won't be able to reuse that endpoint name for up to 7 days.
  • A project can have up to 10 different projectAllowlist values in its Private Service Connect configurations.


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.