Get inferences from an image object detection model
Difference between online and batch inferences
Online inferences are synchronous requests made to a model endpoint. Use online inferences when you are making requests in response to application input or in situations that require timely inference.
Batch inferences are asynchronous requests. You request batch inferences directly from the model resource without needing to deploy the model to an endpoint. For image data, use batch inferences when you don't require an immediate response and want to process accumulated data by using a single request.
Get online inferences
Deploy a model to an endpoint
You must deploy a model to an endpoint before that model can be used to serve online inferences. Deploying a model associates physical resources with the model so it can serve online inferences with low latency.
You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about options and use cases for deploying models, see About deploying models.
Use one of the following methods to deploy a model:
Google Cloud console
In the Google Cloud console, in the Vertex AI section, go to the Models page.
Click the name of the model you want to deploy to open its details page.
Select the Deploy & test tab.
If your model is already deployed to any endpoints, they are listed in the Deploy your model section.
Click Deploy to endpoint.
To deploy your model to a new endpoint, select Create new endpoint and provide a name for the new endpoint. To deploy your model to an existing endpoint, select Add to existing endpoint and select the endpoint from the drop-down list.
You can add more than one model to an endpoint, and you can add a model to more than one endpoint. Learn more.
If you deploy your model to an existing endpoint that has one or more models deployed to it, you must update the Traffic split percentage for the model you are deploying and the already deployed models so that all of the percentages add up to 100%.
Select AutoML Image and configure as follows:
If you're deploying your model to a new endpoint, accept 100 for the Traffic split. Otherwise, adjust the traffic split values for all models on the endpoint so they add up to 100.
Enter the Number of compute nodes you want to provide for your model.
This is the number of nodes available to this model at all times. You are charged for the nodes, even without inference traffic. See the pricing page.
Learn how to change the default settings for inference logging.
Classification models only (optional): In the Explainability options section, select Enable feature attributions for this model to enable Vertex Explainable AI. Accept existing visualization settings or choose new values and click Done.
Deploying AutoML image classification models with Vertex Explainable AI configured and performing inferences with explanations is optional. Enabling Vertex Explainable AI at deployment time incurs additional costs based on the deployed node count and deployment time. See Pricing for more information.
Click Done for your model, and when all the Traffic split percentages are correct, click Continue.
The region where your model deploys is displayed. This must be the region where you created your model.
Click Deploy to deploy your model to the endpoint.
API
When you deploy a model using the Vertex AI API, you complete the following steps:
- Create an endpoint if needed.
- Get the endpoint ID.
- Deploy the model to the endpoint.
Create an endpoint
If you are deploying a model to an existing endpoint, you can skip this step.
gcloud
The following example uses the gcloud ai endpoints create command:
gcloud ai endpoints create \
  --region=LOCATION_ID \
  --display-name=ENDPOINT_NAME

Replace the following:
- LOCATION_ID: The region where you are using Vertex AI.
- ENDPOINT_NAME: The display name for the endpoint.
The Google Cloud CLI tool might take a few seconds to create the endpoint.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: Your region.
- PROJECT_ID: Your project ID.
- ENDPOINT_NAME: The display name for the endpoint.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints
Request JSON body:
{ "display_name": "ENDPOINT_NAME"}To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata", "genericMetadata": { "createTime": "2020-11-05T17:45:42.812656Z", "updateTime": "2020-11-05T17:45:42.812656Z" } }}"done": true.Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());

      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Get and print out a list of all the endpoints for this resource
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
def create_endpoint_sample(
    project: str,
    display_name: str,
    location: str,
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint
Retrieve the endpoint ID
You need the endpoint ID to deploy the model.
gcloud
The following example uses the gcloud ai endpoints list command:
gcloud ai endpoints list \
  --region=LOCATION_ID \
  --filter=display_name=ENDPOINT_NAME

Replace the following:
- LOCATION_ID: The region where you are using Vertex AI.
- ENDPOINT_NAME: The display name for the endpoint.
Note the number that appears in the ENDPOINT_ID column. Use this ID in the following step.
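If you script this step, you can also look up the endpoint ID programmatically. The following is a minimal sketch using the Vertex AI SDK for Python; the project, location, and display-name values are placeholders that you replace:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="LOCATION_ID")

# list() accepts the same filter syntax as the REST API.
endpoints = aiplatform.Endpoint.list(filter='display_name="ENDPOINT_NAME"')
for endpoint in endpoints:
    # endpoint.name holds the numeric endpoint ID; resource_name is the full path.
    print(endpoint.name, endpoint.resource_name)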
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_NAME: The display name for the endpoint.
HTTP method and URL:
GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME
To send your request, choose one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "endpoints": [ { "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID", "displayName": "ENDPOINT_NAME", "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx", "createTime": "2020-04-17T18:31:11.585169Z", "updateTime": "2020-04-17T18:35:08.568959Z" } ]}Deploy the model
Select the tab below for your language or environment:
gcloud
The following examples use the gcloud ai endpoints deploy-model command.
The following example deploys a Model to an Endpoint without splitting traffic between multiple DeployedModel resources:
Before using any of the command data below, make the following replacements:
- ENDPOINT_ID: The ID for the endpoint.
- LOCATION_ID: The region where you are using Vertex AI.
- MODEL_ID: The ID for the model to be deployed.
- DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can use the display name of the Model for the DeployedModel as well.
- MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes.
- MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, then the maximum number of nodes is set to the value of --min-replica-count.
Execute the gcloud ai endpoints deploy-model command:
Linux, macOS, or Cloud Shell
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100
Windows (PowerShell)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100
Windows (cmd.exe)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100
Splitting traffic
The --traffic-split=0=100 flag in the preceding examples sends 100% of prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, then you can split traffic between the new DeployedModel and the old ones. For example, to send 20% of traffic to the new DeployedModel and 80% to an older one, run the following command.
Before using any of the command data below, make the following replacements:
- OLD_DEPLOYED_MODEL_ID: The ID of the existing DeployedModel.
Execute the gcloud ai endpoints deploy-model command:
Linux, macOS, or Cloud Shell
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
Windows (PowerShell)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
Windows (cmd.exe)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
REST
Deploy the model.
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- MODEL_ID: The ID for the model to be deployed.
- DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can use the display name of the Model for the DeployedModel as well.
- MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes.
- MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
- TRAFFIC_SPLIT_THIS_MODEL: The percentage of the prediction traffic to this endpoint to be routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
- DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
- TRAFFIC_SPLIT_MODEL_N: The traffic split percentage value for the deployed model id key.
- PROJECT_NUMBER: Your project's automatically generated project number.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel
Request JSON body:
{ "deployedModel": { "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID", "displayName": "DEPLOYED_MODEL_NAME", "automaticResources": { "minReplicaCount":MIN_REPLICA_COUNT, "maxReplicaCount":MAX_REPLICA_COUNT } }, "trafficSplit": { "0":TRAFFIC_SPLIT_THIS_MODEL, "DEPLOYED_MODEL_ID_1":TRAFFIC_SPLIT_MODEL_1, "DEPLOYED_MODEL_ID_2":TRAFFIC_SPLIT_MODEL_2 },}To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata", "genericMetadata": { "createTime": "2020-10-19T17:53:16.502088Z", "updateTime": "2020-10-19T17:53:16.502088Z" } }}Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.api.gax.longrunning.OperationTimedPollAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.aiplatform.v1.AutomaticResources;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import com.google.cloud.aiplatform.v1.stub.EndpointServiceStubSettings;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.threeten.bp.Duration;

public class DeployModelSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String deployedModelDisplayName = "YOUR_DEPLOYED_MODEL_DISPLAY_NAME";
    String endpointId = "YOUR_ENDPOINT_NAME";
    String modelId = "YOUR_MODEL_ID";
    int timeout = 900;
    deployModelSample(project, deployedModelDisplayName, endpointId, modelId, timeout);
  }

  static void deployModelSample(
      String project,
      String deployedModelDisplayName,
      String endpointId,
      String modelId,
      int timeout)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // Set long-running operations (LROs) timeout
    final OperationTimedPollAlgorithm operationTimedPollAlgorithm =
        OperationTimedPollAlgorithm.create(
            RetrySettings.newBuilder()
                .setInitialRetryDelay(Duration.ofMillis(5000L))
                .setRetryDelayMultiplier(1.5)
                .setMaxRetryDelay(Duration.ofMillis(45000L))
                .setInitialRpcTimeout(Duration.ZERO)
                .setRpcTimeoutMultiplier(1.0)
                .setMaxRpcTimeout(Duration.ZERO)
                .setTotalTimeout(Duration.ofSeconds(timeout))
                .build());

    EndpointServiceStubSettings.Builder endpointServiceStubSettingsBuilder =
        EndpointServiceStubSettings.newBuilder();
    endpointServiceStubSettingsBuilder
        .deployModelOperationSettings()
        .setPollingAlgorithm(operationTimedPollAlgorithm);
    EndpointServiceStubSettings endpointStubSettings = endpointServiceStubSettingsBuilder.build();
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.create(endpointStubSettings);
    endpointServiceSettings =
        endpointServiceSettings
            .toBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);

      ModelName modelName = ModelName.of(project, location, modelId);
      AutomaticResources automaticResourcesInput =
          AutomaticResources.newBuilder().setMinReplicaCount(1).setMaxReplicaCount(1).build();
      DeployedModel deployedModelInput =
          DeployedModel.newBuilder()
              .setModel(modelName.toString())
              .setDisplayName(deployedModelDisplayName)
              .setAutomaticResources(automaticResourcesInput)
              .build();

      OperationFuture<DeployModelResponse, DeployModelOperationMetadata>
          deployModelResponseFuture =
              endpointServiceClient.deployModelAsync(
                  endpointName, deployedModelInput, trafficSplit);
      System.out.format(
          "Operation name: %s\n", deployModelResponseFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      DeployModelResponse deployModelResponse = deployModelResponseFuture.get(20, TimeUnit.MINUTES);

      System.out.println("Deploy Model Response");
      DeployedModel deployedModel = deployModelResponse.getDeployedModel();
      System.out.println("\tDeployed Model");
      System.out.format("\t\tid: %s\n", deployedModel.getId());
      System.out.format("\t\tmodel: %s\n", deployedModel.getModel());
      System.out.format("\t\tDisplay Name: %s\n", deployedModel.getDisplayName());
      System.out.format("\t\tCreate Time: %s\n", deployedModel.getCreateTime());

      DedicatedResources dedicatedResources = deployedModel.getDedicatedResources();
      System.out.println("\t\tDedicated Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", dedicatedResources.getMinReplicaCount());

      MachineSpec machineSpec = dedicatedResources.getMachineSpec();
      System.out.println("\t\t\tMachine Spec");
      System.out.format("\t\t\t\tMachine Type: %s\n", machineSpec.getMachineType());
      System.out.format("\t\t\t\tAccelerator Type: %s\n", machineSpec.getAcceleratorType());
      System.out.format("\t\t\t\tAccelerator Count: %s\n", machineSpec.getAcceleratorCount());

      AutomaticResources automaticResources = deployedModel.getAutomaticResources();
      System.out.println("\t\tAutomatic Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", automaticResources.getMinReplicaCount());
      System.out.format("\t\t\tMax Replica Count: %s\n", automaticResources.getMaxReplicaCount());
    }
  }
}
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const modelId = 'YOUR_MODEL_ID';
// const endpointId = 'YOUR_ENDPOINT_ID';
// const deployedModelDisplayName = 'YOUR_DEPLOYED_MODEL_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

const modelName = `projects/${project}/locations/${location}/models/${modelId}`;
const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint:
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function deployModel() {
  // Configure the parent resource
  // key '0' assigns traffic for the newly deployed model
  // Traffic percentage values must add up to 100
  // Leave dictionary empty if endpoint should not accept any traffic
  const trafficSplit = {0: 100};
  const deployedModel = {
    // format: 'projects/{project}/locations/{location}/models/{model}'
    model: modelName,
    displayName: deployedModelDisplayName,
    automaticResources: {minReplicaCount: 1, maxReplicaCount: 1},
  };
  const request = {
    endpoint,
    deployedModel,
    trafficSplit,
  };

  // Get and print out a list of all the endpoints for this resource
  const [response] = await endpointServiceClient.deployModel(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Deploy model response');
  const modelDeployed = result.deployedModel;
  console.log('\tDeployed model');
  if (!modelDeployed) {
    console.log('\t\tId : {}');
    console.log('\t\tModel : {}');
    console.log('\t\tDisplay name : {}');
    console.log('\t\tCreate time : {}');
    console.log('\t\tDedicated resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMachine spec {}');
    console.log('\t\t\t\tMachine type : {}');
    console.log('\t\t\t\tAccelerator type : {}');
    console.log('\t\t\t\tAccelerator count : {}');
    console.log('\t\tAutomatic resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMax replica count : {}');
  } else {
    console.log(`\t\tId : ${modelDeployed.id}`);
    console.log(`\t\tModel : ${modelDeployed.model}`);
    console.log(`\t\tDisplay name : ${modelDeployed.displayName}`);
    console.log(`\t\tCreate time : ${modelDeployed.createTime}`);

    const dedicatedResources = modelDeployed.dedicatedResources;
    console.log('\t\tDedicated resources');
    if (!dedicatedResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMachine spec {}');
      console.log('\t\t\t\tMachine type : {}');
      console.log('\t\t\t\tAccelerator type : {}');
      console.log('\t\t\t\tAccelerator count : {}');
    } else {
      console.log(`\t\t\tMin replica count : ${dedicatedResources.minReplicaCount}`);
      const machineSpec = dedicatedResources.machineSpec;
      console.log('\t\t\tMachine spec');
      console.log(`\t\t\t\tMachine type : ${machineSpec.machineType}`);
      console.log(`\t\t\t\tAccelerator type : ${machineSpec.acceleratorType}`);
      console.log(`\t\t\t\tAccelerator count : ${machineSpec.acceleratorCount}`);
    }

    const automaticResources = modelDeployed.automaticResources;
    console.log('\t\tAutomatic resources');
    if (!automaticResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMax replica count : {}');
    } else {
      console.log(`\t\t\tMin replica count : ${automaticResources.minReplicaCount}`);
      console.log(`\t\t\tMax replica count : ${automaticResources.maxReplicaCount}`);
    }
  }
}
deployModel();
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
def deploy_model_with_automatic_resources_sample(
    project,
    location,
    model_name: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
      Example: "projects/123/locations/us-central1/models/456" or
      "456" when project and location are initialized or passed.
    """
    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model
Learn how to change the default settings for inference logging.
Get operation status
Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.
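For example, you can poll an operation directly over REST. The following is a minimal sketch in Python; OPERATION_NAME and LOCATION_ID are placeholders that you replace with the values from a create or deploy response:

import google.auth
import google.auth.transport.requests
import requests

# Full resource name returned by the create or deploy call, for example:
# projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID
operation_name = "OPERATION_NAME"

credentials, _ = google.auth.default()
credentials.refresh(google.auth.transport.requests.Request())

# The host region must match the location in the operation name.
response = requests.get(
    f"https://LOCATION_ID-aiplatform.googleapis.com/v1/{operation_name}",
    headers={"Authorization": f"Bearer {credentials.token}"},
)
# The operation is complete when the response includes "done": true.
print(response.json().get("done", False))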
Make an online inference using your deployed model
To make an online inference, submit one or more test items to a model for analysis, and the model returns results that are based on your model's objective. For more information about inference results, see the Interpret results page.
Console
Use the Google Cloud console to request an online inference. Your model must be deployed to an endpoint.
In the Google Cloud console, in the Vertex AI section, go to the Models page.
From the list of models, click the name of the model to request inferences from.
Select the Deploy & test tab.
Under the Test your model section, add test items to request an inference.
AutoML models for image objectives require you to upload an image to request an inference.
For information about local feature importance, see Get explanations.
After the inference is complete, Vertex AI returns the results in the console.
API
Use the Vertex AI API to request an online inference. Your model must be deployed to an endpoint.
Image data type objectives include classification and object detection.
Edge model inference: When you use AutoML image Edge models for inference, you must convert any non-JPEG inference file to a JPEG file before you send the inference request.
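For example, the following sketch converts a PNG file to JPEG with the Pillow library; the file names are placeholders:

from PIL import Image  # pip install Pillow

# JPEG has no alpha channel, so convert to RGB before saving.
Image.open("source_image.png").convert("RGB").save("source_image.jpg", "JPEG")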
gcloud
Create a file named request.json with the following contents:

{
  "instances": [{
    "content": "CONTENT"
  }],
  "parameters": {
    "confidenceThreshold": THRESHOLD_VALUE,
    "maxPredictions": MAX_PREDICTIONS
  }
}

Replace the following:
- CONTENT: The base64-encoded image content. (A sketch for producing this value follows this section.)
- THRESHOLD_VALUE Optional: The model returns only predictions that have confidence scores with at least this value.
- MAX_PREDICTIONS Optional: The model returns up to this many predictions with the highest confidence scores.
Run the following command:
gcloud ai endpoints predict ENDPOINT_ID \
  --region=LOCATION_ID \
  --json-request=request.json
Replace the following:
- ENDPOINT_ID: The ID for the endpoint.
- LOCATION_ID: The region where you are using Vertex AI.
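The request.json file can also be generated programmatically. The following is a minimal sketch that base64-encodes a local image and writes the request body shown above; the file names and parameter values are placeholders:

import base64
import json

# Read the image and base64-encode it for the "content" field.
with open("source_image.jpg", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

request = {
    "instances": [{"content": content}],
    "parameters": {"confidenceThreshold": 0.5, "maxPredictions": 5},
}

with open("request.json", "w") as f:
    json.dump(request, f)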
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: Region where the Endpoint is located. For example, us-central1.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- CONTENT: The base64-encoded image content.
- THRESHOLD_VALUE Optional: The model returns only predictions that have confidence scores with at least this value.
- MAX_PREDICTIONS Optional: The model returns up to this many predictions with the highest confidence scores.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict
Request JSON body:
{ "instances": [{ "content": "CONTENT" }], "parameters": { "confidenceThreshold":THRESHOLD_VALUE, "maxPredictions":MAX_PREDICTIONS }}To send your request, choose one of these options:
curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "predictions": [ { "confidences": [ 0.975873291, 0.972160876, 0.879488528, 0.866532683, 0.686478078 ], "displayNames": [ "Salad", "Salad", "Tomato", "Tomato", "Salad" ], "ids": [ "7517774415476555776", "7517774415476555776", "2906088397049167872", "2906088397049167872", "7517774415476555776" ], "bboxes": [ [ 0.0869686604, 0.977020741, 0.395135701, 1 ], [ 0, 0.488701463, 0.00157663226, 0.512249 ], [ 0.361617863, 0.509664357, 0.772928834, 0.914706349 ], [ 0.310678929, 0.45781514, 0.565507233, 0.711237729 ], [ 0.584359646, 1, 0.00116168708, 0.130817384 ] ] } ], "deployedModelId": "3860570043075002368"}Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.cloud.aiplatform.v1.schema.predict.instance.ImageObjectDetectionPredictionInstance;
import com.google.cloud.aiplatform.v1.schema.predict.params.ImageObjectDetectionPredictionParams;
import com.google.cloud.aiplatform.v1.schema.predict.prediction.ImageObjectDetectionPredictionResult;
import com.google.protobuf.Value;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class PredictImageObjectDetectionSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String fileName = "YOUR_IMAGE_FILE_PATH";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictImageObjectDetection(project, fileName, endpointId);
  }

  static void predictImageObjectDetection(String project, String fileName, String endpointId)
      throws IOException {
    PredictionServiceSettings settings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(settings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      byte[] contents = Base64.getEncoder().encode(Files.readAllBytes(Paths.get(fileName)));
      String content = new String(contents, StandardCharsets.UTF_8);

      ImageObjectDetectionPredictionParams params =
          ImageObjectDetectionPredictionParams.newBuilder()
              .setConfidenceThreshold((float) (0.5))
              .setMaxPredictions(5)
              .build();

      ImageObjectDetectionPredictionInstance instance =
          ImageObjectDetectionPredictionInstance.newBuilder().setContent(content).build();

      List<Value> instances = new ArrayList<>();
      instances.add(ValueConverter.toValue(instance));

      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instances, ValueConverter.toValue(params));
      System.out.println("Predict Image Object Detection Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());

      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        ImageObjectDetectionPredictionResult.Builder resultBuilder =
            ImageObjectDetectionPredictionResult.newBuilder();

        ImageObjectDetectionPredictionResult result =
            (ImageObjectDetectionPredictionResult)
                ValueConverter.fromValue(resultBuilder, prediction);

        for (int i = 0; i < result.getIdsCount(); i++) {
          System.out.printf("\tDisplay name: %s\n", result.getDisplayNames(i));
          System.out.printf("\tConfidences: %f\n", result.getConfidences(i));
          System.out.printf("\tIDs: %d\n", result.getIds(i));
          System.out.printf("\tBounding boxes: %s\n", result.getBboxes(i));
        }
      }
    }
  }
}
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const filename = "YOUR_PREDICTION_FILE_NAME";
// const endpointId = "YOUR_ENDPOINT_ID";
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

const aiplatform = require('@google-cloud/aiplatform');
const {instance, params, prediction} =
  aiplatform.protos.google.cloud.aiplatform.v1.schema.predict;

// Imports the Google Cloud Prediction Service Client library
const {PredictionServiceClient} = aiplatform.v1;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictImageObjectDetection() {
  // Configure the endpoint resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;

  const parametersObj = new params.ImageObjectDetectionPredictionParams({
    confidenceThreshold: 0.5,
    maxPredictions: 5,
  });
  const parameters = parametersObj.toValue();

  const fs = require('fs');
  const image = fs.readFileSync(filename, 'base64');

  const instanceObj = new instance.ImageObjectDetectionPredictionInstance({
    content: image,
  });
  const instanceVal = instanceObj.toValue();

  const instances = [instanceVal];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);

  console.log('Predict image object detection response');
  console.log(`\tDeployed model id : ${response.deployedModelId}`);
  const predictions = response.predictions;
  console.log('Predictions :');
  for (const predictionResultVal of predictions) {
    const predictionResultObj =
      prediction.ImageObjectDetectionPredictionResult.fromValue(predictionResultVal);
    for (const [i, label] of predictionResultObj.displayNames.entries()) {
      console.log(`\tDisplay name: ${label}`);
      console.log(`\tConfidences: ${predictionResultObj.confidences[i]}`);
      console.log(`\tIDs: ${predictionResultObj.ids[i]}`);
      console.log(`\tBounding boxes: ${predictionResultObj.bboxes[i]}\n\n`);
    }
  }
}
predictImageObjectDetection();
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
import base64

from google.cloud import aiplatform
from google.cloud.aiplatform.gapic.schema import predict


def predict_image_object_detection_sample(
    project: str,
    endpoint_id: str,
    filename: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    with open(filename, "rb") as f:
        file_content = f.read()

    # The format of each instance should conform to the deployed model's prediction input schema.
    encoded_content = base64.b64encode(file_content).decode("utf-8")
    instance = predict.instance.ImageObjectDetectionPredictionInstance(
        content=encoded_content,
    ).to_value()
    instances = [instance]
    # See gs://google-cloud-aiplatform/schema/predict/params/image_object_detection_1.0.0.yaml
    # for the format of the parameters.
    parameters = predict.params.ImageObjectDetectionPredictionParams(
        confidence_threshold=0.5,
        max_predictions=5,
    ).to_value()
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    print("response")
    print(" deployed_model_id:", response.deployed_model_id)
    # See gs://google-cloud-aiplatform/schema/predict/prediction/image_object_detection_1.0.0.yaml
    # for the format of the predictions.
    predictions = response.predictions
    for prediction in predictions:
        print(" prediction:", dict(prediction))
Get batch inferences
To make a batch inference request, you specify an input source and an output format where Vertex AI stores inference results. Batch inferences for the AutoML image model type require an input JSON Lines file and the name of a Cloud Storage bucket to store the output.
Note: To minimize processing time when you use the Google Cloud console to create batch inferences, we recommend that you select input and output locations that are in the same region as your model. If you use the API to create batch inferences, send requests to a service endpoint (such as https://us-central1-aiplatform.googleapis.com) that is in the same region or geographically close to your input and output locations.
Input data requirements
The input for batch requests specifies the items to send to your model for inference. For image object detection models, you can use a JSON Lines file to specify a list of images to make inferences about and then store the JSON Lines file in a Cloud Storage bucket. The following sample shows a single line in an input JSON Lines file:
{"content": "gs://sourcebucket/datasets/images/source_image.jpg", "mimeType": "image/jpeg"}Request a batch inference
Request a batch inference
For batch inference requests, you can use the Google Cloud console or the Vertex AI API. Depending on the number of input items that you've submitted, a batch inference task can take some time to complete.
Google Cloud console
Use the Google Cloud console to request a batch inference.
In the Google Cloud console, in the Vertex AI section, go to the Batch predictions page.
Click Create to open the New batch prediction window and complete the following steps:
- Enter a name for the batch inference.
- For Model name, select the name of the model to use for this batch inference.
- For Source path, specify the Cloud Storage location where your JSON Lines input file is located.
- For the Destination path, specify a Cloud Storage location where the batch inference results are stored. The Output format is determined by your model's objective. AutoML models for image objectives output JSON Lines files.
API
Use the Vertex AI API to send batch inference requests.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: Region where the Model is stored and the batch inference job is executed. For example, us-central1.
- PROJECT_ID: Your project ID.
- BATCH_JOB_NAME: Display name for the batch job.
- MODEL_ID: The ID for the model to use for making inferences.
- THRESHOLD_VALUE (optional): Vertex AI returns only inferences that have confidence scores with at least this value. The default is 0.0.
- MAX_PREDICTIONS (optional): Vertex AI returns up to this many inferences starting with the inferences that have the highest confidence scores. The default is 10.
- URI: Cloud Storage URI where your input JSON Lines file is located.
- BUCKET: Your Cloud Storage bucket.
- PROJECT_NUMBER: Your project's automatically generated project number.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs
Request JSON body:
{ "displayName": "BATCH_JOB_NAME", "model": "projects/PROJECT/locations/LOCATION/models/MODEL_ID", "modelParameters": { "confidenceThreshold":THRESHOLD_VALUE, "maxPredictions":MAX_PREDICTIONS }, "inputConfig": { "instancesFormat": "jsonl", "gcsSource": { "uris": ["URI"], }, }, "outputConfig": { "predictionsFormat": "jsonl", "gcsDestination": { "outputUriPrefix": "OUTPUT_BUCKET", }, },}To send your request, choose one of these options:
curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/batchPredictionJobs/BATCH_JOB_ID", "displayName": "BATCH_JOB_NAME", "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID", "inputConfig": { "instancesFormat": "jsonl", "gcsSource": { "uris": [ "CONTENT" ] } }, "outputConfig": { "predictionsFormat": "jsonl", "gcsDestination": { "outputUriPrefix": "BUCKET" } }, "state": "JOB_STATE_PENDING", "createTime": "2020-05-30T02:58:44.341643Z", "updateTime": "2020-05-30T02:58:44.341643Z", "modelDisplayName": "MODEL_NAME", "modelObjective": "MODEL_OBJECTIVE"}You can poll for the status of the batch job usingtheBATCH_JOB_ID until the jobstate isJOB_STATE_SUCCEEDED.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
def create_batch_prediction_job_sample(
    project: str,
    location: str,
    model_resource_name: str,
    job_display_name: str,
    gcs_source: Union[str, Sequence[str]],
    gcs_destination: str,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    my_model = aiplatform.Model(model_resource_name)

    batch_prediction_job = my_model.batch_predict(
        job_display_name=job_display_name,
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job
Retrieve batch inference results
Vertex AI sends batch inference output to your specified destination.
When a batch inference task is complete, the output of the inference is stored in the Cloud Storage bucket that you specified in your request.
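To locate the output files, you can list the bucket contents. The following is a minimal sketch using the google-cloud-storage library; it assumes the job wrote its results under a prediction-* prefix in BUCKET, which is how batch output folders are typically named:

from google.cloud import storage

client = storage.Client(project="PROJECT_ID")

# Batch inference output is written as one or more JSON Lines files.
for blob in client.list_blobs("BUCKET", prefix="prediction-"):
    if blob.name.endswith(".jsonl"):
        print(blob.name)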
Example batch inference results
The following is an example of batch inference results from an image object detection model.
Important: Bounding boxes are specified as: "bboxes": [ [xMin, xMax, yMin, yMax], ...]
xMin and xMax are the minimum and maximum x values, and yMin and yMax are the minimum and maximum y values, respectively.

{
  "instance": {"content": "gs://bucket/image.jpg", "mimeType": "image/jpeg"},
  "prediction": {
    "ids": [1, 2],
    "displayNames": ["cat", "dog"],
    "bboxes": [
      [0.1, 0.2, 0.3, 0.4],
      [0.2, 0.3, 0.4, 0.5]
    ],
    "confidences": [0.7, 0.5]
  }
}
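Because the coordinates are normalized to the range [0, 1], you scale them by the image dimensions to get pixel values. A minimal sketch of that conversion:

def bbox_to_pixels(bbox, width, height):
    """Convert a normalized [xMin, xMax, yMin, yMax] box to pixel coordinates."""
    x_min, x_max, y_min, y_max = bbox
    return (
        round(x_min * width),
        round(x_max * width),
        round(y_min * height),
        round(y_max * height),
    )

# For a 1024x768 image, the first box above maps to:
print(bbox_to_pixels([0.1, 0.2, 0.3, 0.4], width=1024, height=768))
# (102, 205, 230, 307)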