Query Private Services Access (VPC Network Peering) or Private Service Connect indexes

Once you've deployed a VPC Network Peering or Private Service Connect index endpoint, querying it differs slightly depending on how it was deployed:

Deployed with Private Service Connect automation

For IndexEndpoints deployed with Private Service Connect automation, the Python SDK automatically maps the Private Service Connect network to the appropriate endpoint. If you're not using the Python SDK, you must connect directly to the IP address created for your endpoint, following the instructions for querying a Private Service Connect manual deployment.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import List

from google.cloud import aiplatform


def vector_search_match_psc_automation(
    project: str,
    location: str,
    index_endpoint_name: str,
    deployed_index_id: str,
    queries: List[List[float]],
    num_neighbors: int,
    psc_network: str,
) -> List[List[aiplatform.matching_engine.matching_engine_index_endpoint.MatchNeighbor]]:
    """Query the vector search index deployed with PSC automation.

    Args:
        project (str): Required. Project ID
        location (str): Required. The region name
        index_endpoint_name (str): Required. Index endpoint to run the query
            against. The endpoint must be a private endpoint.
        deployed_index_id (str): Required. The ID of the DeployedIndex to run
            the queries against.
        queries (List[List[float]]): Required. A list of queries. Each query is
            a list of floats, representing a single embedding.
        num_neighbors (int): Required. The number of neighbors to return.
        psc_network (str): The network the endpoint was deployed to via PSC
            automation configuration. The format is
            projects/{project_id}/global/networks/{network_name}.

    Returns:
        List[List[aiplatform.matching_engine.matching_engine_index_endpoint.MatchNeighbor]] - A list of nearest neighbors for each query.
    """
    # Initialize the Vertex AI client
    aiplatform.init(project=project, location=location)

    # Create the index endpoint instance from an existing endpoint.
    my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
        index_endpoint_name=index_endpoint_name
    )

    # Query the index endpoint for matches.
    resp = my_index_endpoint.match(
        deployed_index_id=deployed_index_id,
        queries=queries,
        num_neighbors=num_neighbors,
        psc_network=psc_network,
    )
    return resp
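For example, a minimal invocation might look like the following sketch. The project, endpoint, network, and query values are hypothetical placeholders; the query embedding must match the dimensionality of your index.

# Placeholder values for illustration only; substitute your own resources.
neighbors = vector_search_match_psc_automation(
    project="my-project",
    location="us-central1",
    index_endpoint_name="projects/123/locations/us-central1/indexEndpoints/456",
    deployed_index_id="my_deployed_index",
    queries=[[0.1, 0.2, 0.3]],  # one query embedding
    num_neighbors=10,
    psc_network="projects/my-project/global/networks/my-network",
)

# Each inner list holds the neighbors for the corresponding query.
for neighbor in neighbors[0]:
    print(neighbor.id, neighbor.distance)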

Deployed with Private Service Connect manual configuration

For Private Service Connect IndexEndpoints deployed with a manually configured connection, your endpoint is accessed using the IP address of the compute address forwarded to your endpoint's Private Service Connect service attachment.

If not already known, you can obtain the IP address forwarded to the service attachment URI using the gcloud ai index-endpoints describe and gcloud compute forwarding-rules list commands.

Make the following replacements:

  • INDEX_ENDPOINT_ID: Fully qualified index endpoint ID.
  • REGION: The region where your index endpoint is deployed.
SERVICE_ATTACHMENT_URI=`gcloud ai index-endpoints describe INDEX_ENDPOINT_ID \
  --region=REGION \
  --format="value(deployedIndexes.privateEndpoints.serviceAttachment)"`

gcloud compute forwarding-rules list --filter="TARGET:${SERVICE_ATTACHMENT_URI}"

The output will include the internal IP address to use when querying the IndexEndpoint.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import List

from google.cloud import aiplatform


def vector_search_match_psc_manual(
    project: str,
    location: str,
    index_endpoint_name: str,
    deployed_index_id: str,
    queries: List[List[float]],
    num_neighbors: int,
    ip_address: str,
) -> List[List[aiplatform.matching_engine.matching_engine_index_endpoint.MatchNeighbor]]:
    """Query the vector search index deployed with PSC manual configuration.

    Args:
        project (str): Required. Project ID
        location (str): Required. The region name
        index_endpoint_name (str): Required. Index endpoint to run the query
            against. The endpoint must be a private endpoint.
        deployed_index_id (str): Required. The ID of the DeployedIndex to run
            the queries against.
        queries (List[List[float]]): Required. A list of queries. Each query is
            a list of floats, representing a single embedding.
        num_neighbors (int): Required. The number of neighbors to return.
        ip_address (str): Required. The IP address of the PSC endpoint. Obtained
            from the created compute address used in the forwarding rule to the
            endpoint's service attachment.

    Returns:
        List[List[aiplatform.matching_engine.matching_engine_index_endpoint.MatchNeighbor]] - A list of nearest neighbors for each query.
    """
    # Initialize the Vertex AI client
    aiplatform.init(project=project, location=location)

    # Create the index endpoint instance from an existing endpoint.
    my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
        index_endpoint_name=index_endpoint_name
    )

    # Set the IP address of the PSC endpoint.
    my_index_endpoint.private_service_connect_ip_address = ip_address

    # Query the index endpoint for matches.
    resp = my_index_endpoint.match(
        deployed_index_id=deployed_index_id,
        queries=queries,
        num_neighbors=num_neighbors,
    )
    return resp
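A hedged usage sketch follows; the resource names are hypothetical placeholders, and ip_address is the internal IP address returned by the forwarding-rules lookup shown earlier.

# Placeholder values for illustration only; substitute your own resources.
neighbors = vector_search_match_psc_manual(
    project="my-project",
    location="us-central1",
    index_endpoint_name="projects/123/locations/us-central1/indexEndpoints/456",
    deployed_index_id="my_deployed_index",
    queries=[[0.1, 0.2, 0.3]],
    num_neighbors=10,
    ip_address="10.128.0.5",  # internal IP from the forwarding rule
)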

Command-line

To query a DeployedIndex, connect to its TARGET_IP at port 10000 and call the Match or BatchMatch method. Additionally, you can query using a specific embedding ID.

The following examples use the open source tool grpc_cli to send gRPC requests to the deployed index server.

In the first example, you send a single query using the Match method.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]'

In the second example, you combine two separate queries into the same BatchMatch request.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.BatchMatch 'requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]}, {deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.2,..]}]}]'

You must make calls to these APIs from a client running in the same VPC that the service was peered with.

To run a query using an embedding_id, use the following example.

./grpc_cli call ${TARGET_IP}:10000  google.cloud.aiplatform.container.v1.MatchService.Match "deployed_index_id:'"test_index1"',embedding_id: '"606431"'"

In this example, you send a query using token and numeric restricts.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [1, 1], "sparse_embedding": {"values": [111.0,111.1,111.2], "dimensions": [10,20,30]}, numeric_restricts: [{name: "double-ns", value_double: 0.3, op: LESS_EQUAL}, {name: "double-ns", value_double: -1.2, op: GREATER}, {name: "double-ns", value_double: 0., op: NOT_EQUAL}], restricts: [{name: "color", allow_tokens: ["red"]}]'
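If you prefer the Python SDK over grpc_cli, the following sketch shows a roughly equivalent filtered query. The Namespace and NumericNamespace helpers and the filter and numeric_filter parameters of match() are assumptions about the Vertex AI SDK surface; verify them against the Python API reference for your installed version, and treat the resource names as placeholders.

from google.cloud import aiplatform
from google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint import (
    Namespace,
    NumericNamespace,
)

aiplatform.init(project="my-project", location="us-central1")  # placeholders

my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name="projects/123/locations/us-central1/indexEndpoints/456"  # placeholder
)

# Token restrict: only return data points whose "color" namespace allows "red".
token_filter = [Namespace(name="color", allow_tokens=["red"])]

# Numeric restrict: only return data points where double-ns <= 0.3.
numeric_filter = [NumericNamespace(name="double-ns", value_double=0.3, op="LESS_EQUAL")]

resp = my_index_endpoint.match(
    deployed_index_id="my_deployed_index",  # placeholder
    queries=[[0.1, 0.2]],
    num_neighbors=10,
    filter=token_filter,
    numeric_filter=numeric_filter,
)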

To learn more, see Client libraries explained.

Console

Use these instructions to query a VPC index from the console.

  1. In the Vertex AI section of the Google Cloud console, go to the Deploy and Use section. Select Vector Search.

    Go to Vector Search

  2. Select the VPC index you want to query. The Index info page opens.
  3. Scroll down to the Deployed indexes section and select the deployed index you want to query. The Deployed index info page opens.
  4. From the Query index section, select your query parameters. You can choose to query by a vector or by a specific data point.
  5. Execute the query by using the open source tool grpc_cli or the Vertex AI SDK for Python.

Deployed with VPC Network Peering

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

Note: The Python SDK automatically looks up the IP address for an IndexEndpoint deployed with VPC Network Peering.

from typing import List

from google.cloud import aiplatform


def vector_search_match_hybrid_queries(
    project: str,
    location: str,
    index_endpoint_name: str,
    deployed_index_id: str,
    num_neighbors: int,
) -> List[List[aiplatform.matching_engine.matching_engine_index_endpoint.MatchNeighbor]]:
    """Query the vector search index.

    Args:
        project (str): Required. Project ID
        location (str): Required. The region name
        index_endpoint_name (str): Required. Index endpoint to run the query
            against. The endpoint must be a private endpoint.
        deployed_index_id (str): Required. The ID of the DeployedIndex to run
            the queries against.
        num_neighbors (int): Required. The number of neighbors to return.

    Returns:
        List[List[aiplatform.matching_engine.matching_engine_index_endpoint.MatchNeighbor]] - A list of nearest neighbors for each query.
    """
    # Initialize the Vertex AI client
    aiplatform.init(project=project, location=location)

    # Create the index endpoint instance from an existing endpoint.
    my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
        index_endpoint_name=index_endpoint_name
    )

    # Example queries containing hybrid data points, sparse-only data points,
    # and dense-only data points.
    hybrid_queries = [
        aiplatform.matching_engine.matching_engine_index_endpoint.HybridQuery(
            dense_embedding=[1, 2, 3],
            sparse_embedding_dimensions=[10, 20, 30],
            sparse_embedding_values=[1.0, 1.0, 1.0],
            rrf_ranking_alpha=0.5,
        ),
        aiplatform.matching_engine.matching_engine_index_endpoint.HybridQuery(
            dense_embedding=[1, 2, 3],
            sparse_embedding_dimensions=[10, 20, 30],
            sparse_embedding_values=[0.1, 0.2, 0.3],
        ),
        aiplatform.matching_engine.matching_engine_index_endpoint.HybridQuery(
            sparse_embedding_dimensions=[10, 20, 30],
            sparse_embedding_values=[0.1, 0.2, 0.3],
        ),
        aiplatform.matching_engine.matching_engine_index_endpoint.HybridQuery(
            dense_embedding=[1, 2, 3]
        ),
    ]

    # Query the index endpoint for matches.
    resp = my_index_endpoint.match(
        deployed_index_id=deployed_index_id,
        queries=hybrid_queries,
        num_neighbors=num_neighbors,
    )
    return resp
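A brief hedged usage sketch; the resource names are hypothetical placeholders, and the dense and sparse dimensions in the sample queries must match your index configuration.

# Placeholder values for illustration only; substitute your own resources.
neighbors = vector_search_match_hybrid_queries(
    project="my-project",
    location="us-central1",
    index_endpoint_name="projects/123/locations/us-central1/indexEndpoints/456",
    deployed_index_id="my_deployed_index",
    num_neighbors=5,
)

# neighbors[i] holds the matches for the i-th query in the sample list.
print(neighbors[0][0].id, neighbors[0][0].distance)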

Command-line

Each DeployedIndex has a TARGET_IP, which you can retrieve in your list of IndexEndpoints.
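As a scripted alternative to looking this up in the console, the following sketch reads the match gRPC address of each DeployedIndex through the Python SDK. The deployed_indexes property and the private_endpoints.match_grpc_address field are assumptions based on the IndexEndpoint resource; verify them against the Python API reference, and treat the resource names as placeholders.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name="projects/123/locations/us-central1/indexEndpoints/456"  # placeholder
)

# Print the match gRPC address (the TARGET_IP) for each deployed index.
for deployed_index in my_index_endpoint.deployed_indexes:
    print(deployed_index.id, deployed_index.private_endpoints.match_grpc_address)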

To query a DeployedIndex, connect to its TARGET_IP at port 10000 and call the Match or BatchMatch method. Additionally, you can query using a specific embedding ID.

The following examples use the open source tool grpc_cli to send gRPC requests to the deployed index server.

In the first example, you send a single query using the Match method.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]'

In the second example, you combine two separate queries into the same BatchMatch request.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.BatchMatch 'requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", requests: [{deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.1,..]}, {deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [-0.2,..]}]}]'

You must make calls to these APIs from a client running in the same VPC that the service was peered with.

To run a query using an embedding_id, use the following example.

./grpc_cli call ${TARGET_IP}:10000  google.cloud.aiplatform.container.v1.MatchService.Match "deployed_index_id:'"test_index1"',embedding_id: '"606431"'"

In this example, you send a query using token and numeric restricts.

./grpc_cli call ${TARGET_IP}:10000 google.cloud.aiplatform.container.v1.MatchService.Match 'deployed_index_id: "${DEPLOYED_INDEX_ID}", float_val: [1, 1], "sparse_embedding": {"values": [111.0,111.1,111.2], "dimensions": [10,20,30]}, numeric_restricts: [{name: "double-ns", value_double: 0.3, op: LESS_EQUAL}, {name: "double-ns", value_double: -1.2, op: GREATER}, {name: "double-ns", value_double: 0., op: NOT_EQUAL}], restricts: [{name: "color", allow_tokens: ["red"]}]'

To learn more, see Client libraries explained.

Console

Use these instructions to query a VPC index from the console.

  1. In the Vertex AI section of the Google Cloud console, go to the Deploy and Use section. Select Vector Search.

    Go to Vector Search

  2. Select the VPC index you want to query. The Index info page opens.
  3. Scroll down to the Deployed indexes section and select the deployed index you want to query. The Deployed index info page opens.
  4. From the Query index section, select your query parameters. You can choose to query by a vector or by a specific data point.
  5. Execute the query by using the open source tool grpc_cli or the Vertex AI SDK for Python.

Query-time settings that impact performance

The following query-time parameters can affect latency, availability, and cost when using Vector Search. This guidance applies to most cases. However, always experiment with your configurations to make sure that they work for your use case.

For parameter definitions, see Index configuration parameters.

Each parameter is described below along with its performance impact.

approximateNeighborsCount

Tells the algorithm the number of approximate results to retrieve from each shard.

The value of approximateNeighborsCount should always be greater than the value of setNeighborCount. If the value of setNeighborCount is small, 10 times that value is recommended for approximateNeighborsCount. For larger setNeighborCount values, a smaller multiplier can be used.

The corresponding REST API name for this field is approximate_neighbor_count.

Increasing the value of approximateNeighborsCount can affect performance in the following ways:

  • Recall: Increased
  • Latency: Potentially increased
  • Availability: No impact
  • Cost: Can increase because more data is processed during a search

Decreasing the value of approximateNeighborsCount can affect performance in the following ways:

  • Recall: Decreased
  • Latency: Potentially decreased
  • Availability: No impact
  • Cost: Can decrease because less data is processed during a search
setNeighborCount

Specifies the number of results that you want the query to return.

The corresponding REST API name for this field is neighbor_count.

Values less than or equal to 300 remain performant in most use cases. For larger values, test for your specific use case.

fractionLeafNodesToSearch

Controls the percentage of leaf nodes to visit when searching for nearest neighbors. This is related to leafNodeEmbeddingCount in that the more embeddings per leaf node, the more data is examined per leaf.

The corresponding REST API name for this field is fraction_leaf_nodes_to_search_override.

Increasing the value of fractionLeafNodesToSearch can affect performance in the following ways:

  • Recall: Increased
  • Latency: Increased
  • Availability: No impact
  • Cost: Can increase because higher latency occupies more machine resources

Decreasing the value of fractionLeafNodesToSearch can affect performance in the following ways:

  • Recall: Decreased
  • Latency: Decreased
  • Availability: No impact
  • Cost: Can decrease because lower latency occupies fewer machine resources
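To experiment with these settings from the Python SDK, you can pass per-query overrides to match(). The approx_num_neighbors and fraction_leaf_nodes_to_search_override parameter names below are assumptions about the SDK surface; check the Python API reference for your installed version. The resource names and numeric values are placeholders, not recommendations.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name="projects/123/locations/us-central1/indexEndpoints/456"  # placeholder
)

resp = my_index_endpoint.match(
    deployed_index_id="my_deployed_index",        # placeholder
    queries=[[0.1, 0.2, 0.3]],
    num_neighbors=20,                             # setNeighborCount
    approx_num_neighbors=200,                     # approximateNeighborsCount, ~10x num_neighbors
    fraction_leaf_nodes_to_search_override=0.05,  # fractionLeafNodesToSearch
)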
