Get inferences from an image object detection model
Difference between online and batch inferences
Online inferences are synchronous requests made to a model endpoint. Use online inferences when you are making requests in response to application input or in situations that require timely inference.
Batch inferences are asynchronous requests. You request batch inferences directly from the model resource without needing to deploy the model to an endpoint. For image data, use batch inferences when you don't require an immediate response and want to process accumulated data by using a single request.
Get online inferences
Deploy a model to an endpoint
You must deploy a model to an endpoint before that model can be used to serve online inferences. Deploying a model associates physical resources with the model so it can serve online inferences with low latency.
You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about options and use cases for deploying models, see About deploying models.
Use one of the following methods to deploy a model:
Google Cloud console
In the Google Cloud console, in the Vertex AI section, go to the Models page.
Click the name of the model you want to deploy to open its details page.
Select the Deploy & test tab.
If your model is already deployed to any endpoints, they are listed in the Deploy your model section.
Click Deploy to endpoint.
To deploy your model to a new endpoint, select Create new endpoint and provide a name for the new endpoint. To deploy your model to an existing endpoint, select Add to existing endpoint and select the endpoint from the drop-down list.
You can add more than one model to an endpoint, and you can add a model to more than one endpoint. Learn more.
If you deploy your model to an existing endpoint that has one or more models deployed to it, you must update the Traffic split percentage for the model you are deploying and the already deployed models so that all of the percentages add up to 100%.
Select AutoML Image and configure as follows:
If you're deploying your model to a new endpoint, accept 100 for the Traffic split. Otherwise, adjust the traffic split values for all models on the endpoint so they add up to 100.
Enter the Number of compute nodes you want to provide for your model.
This is the number of nodes available to this model at all times. You are charged for the nodes, even without inference traffic. See the pricing page.
Learn how to change the default settings for inference logging.
Classification models only (optional): In the Explainability options section, select Enable feature attributions for this model to enable Vertex Explainable AI. Accept existing visualization settings or choose new values and click Done.
Deploying AutoML image classification models with Vertex Explainable AI configured and performing inferences with explanations is optional. Enabling Vertex Explainable AI at deployment time incurs additional costs based on the deployed node count and deployment time. See Pricing for more information.
Click Done for your model, and when all the Traffic split percentages are correct, click Continue.
The region where your model deploys is displayed. This must be the region where you created your model.
Click Deploy to deploy your model to the endpoint.
API
When you deploy a model using the Vertex AI API, you complete the following steps:
- Create an endpoint if needed.
- Get the endpoint ID.
- Deploy the model to the endpoint.
Create an endpoint
If you are deploying a model to an existing endpoint, you can skip this step.
gcloud
The following example uses the gcloud ai endpoints create command:
gcloud ai endpoints create \
  --region=LOCATION_ID \
  --display-name=ENDPOINT_NAME

Replace the following:
- LOCATION_ID: The region where you are using Vertex AI.
- ENDPOINT_NAME: The display name for the endpoint.
The Google Cloud CLI tool might take a few seconds to create the endpoint.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: Your region.
- PROJECT_ID: Your project ID.
- ENDPOINT_NAME: The display name for the endpoint.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints
Request JSON body:
{ "display_name": "ENDPOINT_NAME"}To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata", "genericMetadata": { "createTime": "2020-11-05T17:45:42.812656Z", "updateTime": "2020-11-05T17:45:42.812656Z" } }}"done": true.Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());

      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Get and print out a list of all the endpoints for this resource
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
def create_endpoint_sample(
    project: str,
    display_name: str,
    location: str,
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint
Retrieve the endpoint ID
You need the endpoint ID to deploy the model.
gcloud
The following example uses the gcloud ai endpoints list command:
gcloud ai endpoints list \
  --region=LOCATION_ID \
  --filter=display_name=ENDPOINT_NAME

Replace the following:
- LOCATION_ID: The region where you are using Vertex AI.
- ENDPOINT_NAME: The display name for the endpoint.
Note the number that appears in the ENDPOINT_ID column. Use this ID in the following step.
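If you script this step, you can also look up the endpoint ID programmatically. The following is a minimal sketch using the Vertex AI SDK for Python; the project, location, and display-name values are placeholders that you replace:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="LOCATION_ID")

# list() accepts the same filter syntax as the REST API.
endpoints = aiplatform.Endpoint.list(filter='display_name="ENDPOINT_NAME"')
for endpoint in endpoints:
    # endpoint.name holds the numeric endpoint ID; resource_name is the full path.
    print(endpoint.name, endpoint.resource_name)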
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_NAME: The display name for the endpoint.
HTTP method and URL:
GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME
To send your request, choose one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "endpoints": [ { "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID", "displayName": "ENDPOINT_NAME", "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx", "createTime": "2020-04-17T18:31:11.585169Z", "updateTime": "2020-04-17T18:35:08.568959Z" } ]}Deploy the model
Select the tab below for your language or environment:
gcloud
The following examples use the gcloud ai endpoints deploy-model command.
The following example deploys a Model to an Endpoint without splitting traffic between multiple DeployedModel resources:
Before using any of the command data below, make the following replacements:
- ENDPOINT_ID: The ID for the endpoint.
- LOCATION_ID: The region where you are using Vertex AI.
- MODEL_ID: The ID for the model to be deployed.
- DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can use the display name of the Model for the DeployedModel as well.
- MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes.
- MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, then the maximum number of nodes is set to the value of --min-replica-count.
Execute the gcloud ai endpoints deploy-model command:
Linux, macOS, or Cloud Shell
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100
Windows (PowerShell)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100
Windows (cmd.exe)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100
Splitting traffic
The --traffic-split=0=100 flag in the preceding examples sends 100% of prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, then you can split traffic between the new DeployedModel and the old ones. For example, to send 20% of traffic to the new DeployedModel and 80% to an older one, run the following command.
Before using any of the command data below, make the following replacements:
- OLD_DEPLOYED_MODEL_ID: The ID of the existing DeployedModel.
Execute the gcloud ai endpoints deploy-model command:
Linux, macOS, or Cloud Shell
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
Windows (PowerShell)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
Windows (cmd.exe)
Note: Ensure you have initialized the Google Cloud CLI with authentication and a project by running either gcloud init; or gcloud auth login and gcloud config set project.

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
REST
Deploy the model.
Before using any of the request data, make the following replacements:
- LOCATION_ID: The region where you are using Vertex AI.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- MODEL_ID: The ID for the model to be deployed.
- DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can use the display name of the Model for the DeployedModel as well.
- MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes.
- MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
- TRAFFIC_SPLIT_THIS_MODEL: The percentage of the prediction traffic to this endpoint to be routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
- DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
- TRAFFIC_SPLIT_MODEL_N: The traffic split percentage value for the deployed model id key.
- PROJECT_NUMBER: Your project's automatically generated project number.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel
Request JSON body:
{ "deployedModel": { "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID", "displayName": "DEPLOYED_MODEL_NAME", "automaticResources": { "minReplicaCount":MIN_REPLICA_COUNT, "maxReplicaCount":MAX_REPLICA_COUNT } }, "trafficSplit": { "0":TRAFFIC_SPLIT_THIS_MODEL, "DEPLOYED_MODEL_ID_1":TRAFFIC_SPLIT_MODEL_1, "DEPLOYED_MODEL_ID_2":TRAFFIC_SPLIT_MODEL_2 },}To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"
PowerShell (Windows)
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata", "genericMetadata": { "createTime": "2020-10-19T17:53:16.502088Z", "updateTime": "2020-10-19T17:53:16.502088Z" } }}Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.api.gax.longrunning.OperationTimedPollAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.aiplatform.v1.AutomaticResources;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import com.google.cloud.aiplatform.v1.stub.EndpointServiceStubSettings;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.threeten.bp.Duration;

public class DeployModelSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String deployedModelDisplayName = "YOUR_DEPLOYED_MODEL_DISPLAY_NAME";
    String endpointId = "YOUR_ENDPOINT_NAME";
    String modelId = "YOUR_MODEL_ID";
    int timeout = 900;
    deployModelSample(project, deployedModelDisplayName, endpointId, modelId, timeout);
  }

  static void deployModelSample(
      String project,
      String deployedModelDisplayName,
      String endpointId,
      String modelId,
      int timeout)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // Set long-running operations (LROs) timeout
    final OperationTimedPollAlgorithm operationTimedPollAlgorithm =
        OperationTimedPollAlgorithm.create(
            RetrySettings.newBuilder()
                .setInitialRetryDelay(Duration.ofMillis(5000L))
                .setRetryDelayMultiplier(1.5)
                .setMaxRetryDelay(Duration.ofMillis(45000L))
                .setInitialRpcTimeout(Duration.ZERO)
                .setRpcTimeoutMultiplier(1.0)
                .setMaxRpcTimeout(Duration.ZERO)
                .setTotalTimeout(Duration.ofSeconds(timeout))
                .build());

    EndpointServiceStubSettings.Builder endpointServiceStubSettingsBuilder =
        EndpointServiceStubSettings.newBuilder();
    endpointServiceStubSettingsBuilder
        .deployModelOperationSettings()
        .setPollingAlgorithm(operationTimedPollAlgorithm);
    EndpointServiceStubSettings endpointStubSettings = endpointServiceStubSettingsBuilder.build();
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.create(endpointStubSettings);
    endpointServiceSettings =
        endpointServiceSettings
            .toBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);

      ModelName modelName = ModelName.of(project, location, modelId);
      AutomaticResources automaticResourcesInput =
          AutomaticResources.newBuilder().setMinReplicaCount(1).setMaxReplicaCount(1).build();
      DeployedModel deployedModelInput =
          DeployedModel.newBuilder()
              .setModel(modelName.toString())
              .setDisplayName(deployedModelDisplayName)
              .setAutomaticResources(automaticResourcesInput)
              .build();

      OperationFuture<DeployModelResponse, DeployModelOperationMetadata>
          deployModelResponseFuture =
              endpointServiceClient.deployModelAsync(
                  endpointName, deployedModelInput, trafficSplit);
      System.out.format(
          "Operation name: %s\n", deployModelResponseFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      DeployModelResponse deployModelResponse = deployModelResponseFuture.get(20, TimeUnit.MINUTES);

      System.out.println("Deploy Model Response");
      DeployedModel deployedModel = deployModelResponse.getDeployedModel();
      System.out.println("\tDeployed Model");
      System.out.format("\t\tid: %s\n", deployedModel.getId());
      System.out.format("\t\tmodel: %s\n", deployedModel.getModel());
      System.out.format("\t\tDisplay Name: %s\n", deployedModel.getDisplayName());
      System.out.format("\t\tCreate Time: %s\n", deployedModel.getCreateTime());

      DedicatedResources dedicatedResources = deployedModel.getDedicatedResources();
      System.out.println("\t\tDedicated Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", dedicatedResources.getMinReplicaCount());

      MachineSpec machineSpec = dedicatedResources.getMachineSpec();
      System.out.println("\t\t\tMachine Spec");
      System.out.format("\t\t\t\tMachine Type: %s\n", machineSpec.getMachineType());
      System.out.format("\t\t\t\tAccelerator Type: %s\n", machineSpec.getAcceleratorType());
      System.out.format("\t\t\t\tAccelerator Count: %s\n", machineSpec.getAcceleratorCount());

      AutomaticResources automaticResources = deployedModel.getAutomaticResources();
      System.out.println("\t\tAutomatic Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", automaticResources.getMinReplicaCount());
      System.out.format("\t\t\tMax Replica Count: %s\n", automaticResources.getMaxReplicaCount());
    }
  }
}
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const modelId = 'YOUR_MODEL_ID';
// const endpointId = 'YOUR_ENDPOINT_ID';
// const deployedModelDisplayName = 'YOUR_DEPLOYED_MODEL_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

const modelName = `projects/${project}/locations/${location}/models/${modelId}`;
const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint:
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function deployModel() {
  // Configure the parent resource
  // key '0' assigns traffic for the newly deployed model
  // Traffic percentage values must add up to 100
  // Leave dictionary empty if endpoint should not accept any traffic
  const trafficSplit = {0: 100};
  const deployedModel = {
    // format: 'projects/{project}/locations/{location}/models/{model}'
    model: modelName,
    displayName: deployedModelDisplayName,
    automaticResources: {minReplicaCount: 1, maxReplicaCount: 1},
  };
  const request = {
    endpoint,
    deployedModel,
    trafficSplit,
  };

  // Get and print out a list of all the endpoints for this resource
  const [response] = await endpointServiceClient.deployModel(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Deploy model response');
  const modelDeployed = result.deployedModel;
  console.log('\tDeployed model');
  if (!modelDeployed) {
    console.log('\t\tId : {}');
    console.log('\t\tModel : {}');
    console.log('\t\tDisplay name : {}');
    console.log('\t\tCreate time : {}');
    console.log('\t\tDedicated resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMachine spec {}');
    console.log('\t\t\t\tMachine type : {}');
    console.log('\t\t\t\tAccelerator type : {}');
    console.log('\t\t\t\tAccelerator count : {}');
    console.log('\t\tAutomatic resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMax replica count : {}');
  } else {
    console.log(`\t\tId : ${modelDeployed.id}`);
    console.log(`\t\tModel : ${modelDeployed.model}`);
    console.log(`\t\tDisplay name : ${modelDeployed.displayName}`);
    console.log(`\t\tCreate time : ${modelDeployed.createTime}`);

    const dedicatedResources = modelDeployed.dedicatedResources;
    console.log('\t\tDedicated resources');
    if (!dedicatedResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMachine spec {}');
      console.log('\t\t\t\tMachine type : {}');
      console.log('\t\t\t\tAccelerator type : {}');
      console.log('\t\t\t\tAccelerator count : {}');
    } else {
      console.log(`\t\t\tMin replica count : ${dedicatedResources.minReplicaCount}`);
      const machineSpec = dedicatedResources.machineSpec;
      console.log('\t\t\tMachine spec');
      console.log(`\t\t\t\tMachine type : ${machineSpec.machineType}`);
      console.log(`\t\t\t\tAccelerator type : ${machineSpec.acceleratorType}`);
      console.log(`\t\t\t\tAccelerator count : ${machineSpec.acceleratorCount}`);
    }

    const automaticResources = modelDeployed.automaticResources;
    console.log('\t\tAutomatic resources');
    if (!automaticResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMax replica count : {}');
    } else {
      console.log(`\t\t\tMin replica count : ${automaticResources.minReplicaCount}`);
      console.log(`\t\t\tMax replica count : ${automaticResources.maxReplicaCount}`);
    }
  }
}
deployModel();
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
def deploy_model_with_automatic_resources_sample(
    project,
    location,
    model_name: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
      Example: "projects/123/locations/us-central1/models/456" or
      "456" when project and location are initialized or passed.
    """
    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model
Learn how to change the default settings for inference logging.
Get operation status
Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.
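For example, you can poll an operation directly over REST. The following is a minimal sketch in Python; OPERATION_NAME and LOCATION_ID are placeholders that you replace with the values from a create or deploy response:

import google.auth
import google.auth.transport.requests
import requests

# Full resource name returned by the create or deploy call, for example:
# projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID
operation_name = "OPERATION_NAME"

credentials, _ = google.auth.default()
credentials.refresh(google.auth.transport.requests.Request())

# The host region must match the location in the operation name.
response = requests.get(
    f"https://LOCATION_ID-aiplatform.googleapis.com/v1/{operation_name}",
    headers={"Authorization": f"Bearer {credentials.token}"},
)
# The operation is complete when the response includes "done": true.
print(response.json().get("done", False))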
Make an online inference using your deployed model
To make an online inference, submit one or more test items to a model for analysis, and the model returns results that are based on your model's objective. For more information about inference results, see the Interpret results page.
Console
Use the Google Cloud console to request an online inference. Your model must be deployed to an endpoint.
In the Google Cloud console, in the Vertex AI section, go to the Models page.
From the list of models, click the name of the model to request inferences from.
Select the Deploy & test tab.
Under the Test your model section, add test items to request an inference.
AutoML models for image objectives require you to upload an image to request an inference.
For information about local feature importance, see Get explanations.
After the inference is complete, Vertex AI returns the results in the console.
API
Use the Vertex AI API to request an online inference. Your model must be deployed to an endpoint.
Image data type objectives include classification and object detection.
Edge model inference: When you use AutoML image Edge models for inference, you must convert any non-JPEG inference file to a JPEG file before you send the inference request.
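For example, the following sketch converts a PNG file to JPEG with the Pillow library; the file names are placeholders:

from PIL import Image  # pip install Pillow

# JPEG has no alpha channel, so convert to RGB before saving.
Image.open("source_image.png").convert("RGB").save("source_image.jpg", "JPEG")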
gcloud
Create a file named request.json with the following contents:

{
  "instances": [{
    "content": "CONTENT"
  }],
  "parameters": {
    "confidenceThreshold": THRESHOLD_VALUE,
    "maxPredictions": MAX_PREDICTIONS
  }
}

Replace the following:
- CONTENT: The base64-encoded image content. (A sketch for producing this value follows this section.)
- THRESHOLD_VALUE Optional: The model returns only predictions that have confidence scores with at least this value.
- MAX_PREDICTIONS Optional: The model returns up to this many predictions with the highest confidence scores.
Run the following command:
gcloud ai endpoints predict ENDPOINT_ID \
  --region=LOCATION_ID \
  --json-request=request.json
Replace the following:
- ENDPOINT_ID: The ID for the endpoint.
- LOCATION_ID: The region where you are using Vertex AI.
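The request.json file can also be generated programmatically. The following is a minimal sketch that base64-encodes a local image and writes the request body shown above; the file names and parameter values are placeholders:

import base64
import json

# Read the image and base64-encode it for the "content" field.
with open("source_image.jpg", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

request = {
    "instances": [{"content": content}],
    "parameters": {"confidenceThreshold": 0.5, "maxPredictions": 5},
}

with open("request.json", "w") as f:
    json.dump(request, f)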
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: Region where the Endpoint is located. For example, us-central1.
- PROJECT_ID: Your project ID.
- ENDPOINT_ID: The ID for the endpoint.
- CONTENT: The base64-encoded image content.
- THRESHOLD_VALUE Optional: The model returns only predictions that have confidence scores with at least this value.
- MAX_PREDICTIONS Optional: The model returns up to this many predictions with the highest confidence scores.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict
Request JSON body:
{ "instances": [{ "content": "CONTENT" }], "parameters": { "confidenceThreshold":THRESHOLD_VALUE, "maxPredictions":MAX_PREDICTIONS }}To send your request, choose one of these options:
curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "predictions": [ { "confidences": [ 0.975873291, 0.972160876, 0.879488528, 0.866532683, 0.686478078 ], "displayNames": [ "Salad", "Salad", "Tomato", "Tomato", "Salad" ], "ids": [ "7517774415476555776", "7517774415476555776", "2906088397049167872", "2906088397049167872", "7517774415476555776" ], "bboxes": [ [ 0.0869686604, 0.977020741, 0.395135701, 1 ], [ 0, 0.488701463, 0.00157663226, 0.512249 ], [ 0.361617863, 0.509664357, 0.772928834, 0.914706349 ], [ 0.310678929, 0.45781514, 0.565507233, 0.711237729 ], [ 0.584359646, 1, 0.00116168708, 0.130817384 ] ] } ], "deployedModelId": "3860570043075002368"}Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.cloud.aiplatform.v1.schema.predict.instance.ImageObjectDetectionPredictionInstance;
import com.google.cloud.aiplatform.v1.schema.predict.params.ImageObjectDetectionPredictionParams;
import com.google.cloud.aiplatform.v1.schema.predict.prediction.ImageObjectDetectionPredictionResult;
import com.google.protobuf.Value;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class PredictImageObjectDetectionSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String fileName = "YOUR_IMAGE_FILE_PATH";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictImageObjectDetection(project, fileName, endpointId);
  }

  static void predictImageObjectDetection(String project, String fileName, String endpointId)
      throws IOException {
    PredictionServiceSettings settings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(settings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      byte[] contents = Base64.getEncoder().encode(Files.readAllBytes(Paths.get(fileName)));
      String content = new String(contents, StandardCharsets.UTF_8);

      ImageObjectDetectionPredictionParams params =
          ImageObjectDetectionPredictionParams.newBuilder()
              .setConfidenceThreshold((float) (0.5))
              .setMaxPredictions(5)
              .build();

      ImageObjectDetectionPredictionInstance instance =
          ImageObjectDetectionPredictionInstance.newBuilder().setContent(content).build();

      List<Value> instances = new ArrayList<>();
      instances.add(ValueConverter.toValue(instance));

      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instances, ValueConverter.toValue(params));
      System.out.println("Predict Image Object Detection Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());

      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        ImageObjectDetectionPredictionResult.Builder resultBuilder =
            ImageObjectDetectionPredictionResult.newBuilder();

        ImageObjectDetectionPredictionResult result =
            (ImageObjectDetectionPredictionResult)
                ValueConverter.fromValue(resultBuilder, prediction);

        for (int i = 0; i < result.getIdsCount(); i++) {
          System.out.printf("\tDisplay name: %s\n", result.getDisplayNames(i));
          System.out.printf("\tConfidences: %f\n", result.getConfidences(i));
          System.out.printf("\tIDs: %d\n", result.getIds(i));
          System.out.printf("\tBounding boxes: %s\n", result.getBboxes(i));
        }
      }
    }
  }
}
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const filename = "YOUR_PREDICTION_FILE_NAME";
// const endpointId = "YOUR_ENDPOINT_ID";
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

const aiplatform = require('@google-cloud/aiplatform');
const {instance, params, prediction} =
  aiplatform.protos.google.cloud.aiplatform.v1.schema.predict;

// Imports the Google Cloud Prediction Service Client library
const {PredictionServiceClient} = aiplatform.v1;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictImageObjectDetection() {
  // Configure the endpoint resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;

  const parametersObj = new params.ImageObjectDetectionPredictionParams({
    confidenceThreshold: 0.5,
    maxPredictions: 5,
  });
  const parameters = parametersObj.toValue();

  const fs = require('fs');
  const image = fs.readFileSync(filename, 'base64');

  const instanceObj = new instance.ImageObjectDetectionPredictionInstance({
    content: image,
  });
  const instanceVal = instanceObj.toValue();

  const instances = [instanceVal];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);

  console.log('Predict image object detection response');
  console.log(`\tDeployed model id : ${response.deployedModelId}`);
  const predictions = response.predictions;
  console.log('Predictions :');
  for (const predictionResultVal of predictions) {
    const predictionResultObj =
      prediction.ImageObjectDetectionPredictionResult.fromValue(predictionResultVal);
    for (const [i, label] of predictionResultObj.displayNames.entries()) {
      console.log(`\tDisplay name: ${label}`);
      console.log(`\tConfidences: ${predictionResultObj.confidences[i]}`);
      console.log(`\tIDs: ${predictionResultObj.ids[i]}`);
      console.log(`\tBounding boxes: ${predictionResultObj.bboxes[i]}\n\n`);
    }
  }
}
predictImageObjectDetection();
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
import base64

from google.cloud import aiplatform
from google.cloud.aiplatform.gapic.schema import predict


def predict_image_object_detection_sample(
    project: str,
    endpoint_id: str,
    filename: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    with open(filename, "rb") as f:
        file_content = f.read()

    # The format of each instance should conform to the deployed model's prediction input schema.
    encoded_content = base64.b64encode(file_content).decode("utf-8")
    instance = predict.instance.ImageObjectDetectionPredictionInstance(
        content=encoded_content,
    ).to_value()
    instances = [instance]
    # See gs://google-cloud-aiplatform/schema/predict/params/image_object_detection_1.0.0.yaml
    # for the format of the parameters.
    parameters = predict.params.ImageObjectDetectionPredictionParams(
        confidence_threshold=0.5,
        max_predictions=5,
    ).to_value()
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    print("response")
    print(" deployed_model_id:", response.deployed_model_id)
    # See gs://google-cloud-aiplatform/schema/predict/prediction/image_object_detection_1.0.0.yaml
    # for the format of the predictions.
    predictions = response.predictions
    for prediction in predictions:
        print(" prediction:", dict(prediction))
Get batch inferences
To make a batch inference request, you specify an input source and an output format where Vertex AI stores inference results. Batch inferences for the AutoML image model type require an input JSON Lines file and the name of a Cloud Storage bucket to store the output.
Note: To minimize processing time when you use the Google Cloud console to create batch inferences, we recommend that you select input and output locations that are in the same region as your model. If you use the API to create batch inferences, send requests to a service endpoint (such as https://us-central1-aiplatform.googleapis.com) that is in the same region or geographically close to your input and output locations.
Input data requirements
The input for batch requests specifies the items to send to your model for inference. For image object detection models, you can use a JSON Lines file to specify a list of images to make inferences about and then store the JSON Lines file in a Cloud Storage bucket. The following sample shows a single line in an input JSON Lines file:
{"content": "gs://sourcebucket/datasets/images/source_image.jpg", "mimeType": "image/jpeg"}Request a batch inference
Request a batch inference
For batch inference requests, you can use the Google Cloud console or the Vertex AI API. Depending on the number of input items that you've submitted, a batch inference task can take some time to complete.
Google Cloud console
Use the Google Cloud console to request a batch inference.
In the Google Cloud console, in the Vertex AI section, go to the Batch predictions page.
Click Create to open the New batch prediction window and complete the following steps:
- Enter a name for the batch inference.
- For Model name, select the name of the model to use for this batch inference.
- For Source path, specify the Cloud Storage location where your JSON Lines input file is located.
- For the Destination path, specify a Cloud Storage location where the batch inference results are stored. The Output format is determined by your model's objective. AutoML models for image objectives output JSON Lines files.
API
Use the Vertex AI API to send batch inference requests.
REST
Before using any of the request data, make the following replacements:
- LOCATION_ID: Region where the Model is stored and the batch inference job is executed. For example, us-central1.
- PROJECT_ID: Your project ID.
- BATCH_JOB_NAME: Display name for the batch job.
- MODEL_ID: The ID for the model to use for making inferences.
- THRESHOLD_VALUE (optional): Vertex AI returns only inferences that have confidence scores with at least this value. The default is 0.0.
- MAX_PREDICTIONS (optional): Vertex AI returns up to this many inferences starting with the inferences that have the highest confidence scores. The default is 10.
- URI: Cloud Storage URI where your input JSON Lines file is located.
- BUCKET: Your Cloud Storage bucket.
- PROJECT_NUMBER: Your project's automatically generated project number.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs
Request JSON body:
{ "displayName": "BATCH_JOB_NAME", "model": "projects/PROJECT/locations/LOCATION/models/MODEL_ID", "modelParameters": { "confidenceThreshold":THRESHOLD_VALUE, "maxPredictions":MAX_PREDICTIONS }, "inputConfig": { "instancesFormat": "jsonl", "gcsSource": { "uris": ["URI"], }, }, "outputConfig": { "predictionsFormat": "jsonl", "gcsDestination": { "outputUriPrefix": "OUTPUT_BUCKET", }, },}To send your request, choose one of these options:
curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/batchPredictionJobs/BATCH_JOB_ID", "displayName": "BATCH_JOB_NAME", "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID", "inputConfig": { "instancesFormat": "jsonl", "gcsSource": { "uris": [ "CONTENT" ] } }, "outputConfig": { "predictionsFormat": "jsonl", "gcsDestination": { "outputUriPrefix": "BUCKET" } }, "state": "JOB_STATE_PENDING", "createTime": "2020-05-30T02:58:44.341643Z", "updateTime": "2020-05-30T02:58:44.341643Z", "modelDisplayName": "MODEL_NAME", "modelObjective": "MODEL_OBJECTIVE"}You can poll for the status of the batch job usingtheBATCH_JOB_ID until the jobstate isJOB_STATE_SUCCEEDED.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
def create_batch_prediction_job_sample(
    project: str,
    location: str,
    model_resource_name: str,
    job_display_name: str,
    gcs_source: Union[str, Sequence[str]],
    gcs_destination: str,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    my_model = aiplatform.Model(model_resource_name)

    batch_prediction_job = my_model.batch_predict(
        job_display_name=job_display_name,
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job
Retrieve batch inference results
Vertex AI sends batch inference output to your specified destination.
When a batch inference task is complete, the output of the inference is stored in the Cloud Storage bucket that you specified in your request.
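To locate the output files, you can list the bucket contents. The following is a minimal sketch using the google-cloud-storage library; it assumes the job wrote its results under a prediction-* prefix in BUCKET, which is how batch output folders are typically named:

from google.cloud import storage

client = storage.Client(project="PROJECT_ID")

# Batch inference output is written as one or more JSON Lines files.
for blob in client.list_blobs("BUCKET", prefix="prediction-"):
    if blob.name.endswith(".jsonl"):
        print(blob.name)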
Example batch inference results
The following is an example of batch inference results from an image object detection model.
Important: Bounding boxes are specified as: "bboxes": [ [xMin, xMax, yMin, yMax], ...]
xMin and xMax are the minimum and maximum x values, and yMin and yMax are the minimum and maximum y values, respectively.

{
  "instance": {"content": "gs://bucket/image.jpg", "mimeType": "image/jpeg"},
  "prediction": {
    "ids": [1, 2],
    "displayNames": ["cat", "dog"],
    "bboxes": [
      [0.1, 0.2, 0.3, 0.4],
      [0.2, 0.3, 0.4, 0.5]
    ],
    "confidences": [0.7, 0.5]
  }
}
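Because the coordinates are normalized to the range [0, 1], you scale them by the image dimensions to get pixel values. A minimal sketch of that conversion:

def bbox_to_pixels(bbox, width, height):
    """Convert a normalized [xMin, xMax, yMin, yMax] box to pixel coordinates."""
    x_min, x_max, y_min, y_max = bbox
    return (
        round(x_min * width),
        round(x_max * width),
        round(y_min * height),
        round(y_max * height),
    )

# For a 1024x768 image, the first box above maps to:
print(bbox_to_pixels([0.1, 0.2, 0.3, 0.4], width=1024, height=768))
# (102, 205, 230, 307)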