Get batch predictions from a self-deployed Model Garden model

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Some of the models that are available in Model Garden can be self-deployed in your own Google Cloud project and used to provide batch predictions. Batch predictions let you efficiently use a model to process multiple text-only prompts that aren't latency sensitive.

Prepare input

Before you begin, prepare your inputs in a BigQuery table or as a JSONL file in Cloud Storage. The input for both sources must follow the OpenAI API schema JSON format, as shown in the following example:

{"body":{"messages":[{"role":"user","content":"Give me a recipe for banana bread"}],"max_tokens":1000}}

Note: Vertex AI doesn't use the custom_id, method, url, and model fields. You can include them, but they are ignored by the batch prediction job.
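As a minimal sketch of preparing a JSONL input file, the following Python builds one request per line in the shape of the example above. The function names and the output file name batch_input.jsonl are hypothetical, and the default max_tokens of 1000 is simply copied from the example.

```python
import json


def build_request_line(prompt, max_tokens=1000):
    """Return one JSONL line in the request shape shown in the example."""
    request = {
        "body": {
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }
    }
    # json.dumps emits a single line, which is what JSONL requires.
    return json.dumps(request)


def write_jsonl(prompts, path):
    """Write one request object per line, with no enclosing array."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            f.write(build_request_line(prompt) + "\n")


if __name__ == "__main__":
    # Hypothetical file name; upload the result to your Cloud Storage bucket.
    write_jsonl(["Give me a recipe for banana bread"], "batch_input.jsonl")
```

Serializing with json.dumps rather than formatting strings by hand keeps quoting and escaping inside prompts correct.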

BigQuery

Your BigQuery input table must adhere to the following schema:

Column name	Description
custom_id	An ID for each request to match the input with the output.
method	The request method.
url	The request endpoint.
body (JSON)	Your input prompt.

Your input table can have other columns, which are ignored by the batch job and passed directly to the output table.
Batch prediction jobs reserve two column names for the batch prediction output: response (JSON) and id. Don't use these columns in the input table.
The method and url columns are dropped and not included in the output table.
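The column rules above can be checked before you submit a job. The following Python is a sketch only: the helper names are hypothetical, and treating body as a required column is an assumption based on it being the column that holds the prompt.

```python
# Column names reserved for the batch prediction output.
RESERVED_OUTPUT_COLUMNS = {"response", "id"}
# Columns defined by the input schema.
SCHEMA_COLUMNS = {"custom_id", "method", "url", "body"}


def check_input_columns(columns):
    """Raise ValueError if the input table breaks the column rules."""
    clashes = sorted(RESERVED_OUTPUT_COLUMNS & set(columns))
    if clashes:
        raise ValueError(f"Reserved output column(s) in input table: {clashes}")
    # Assumption: a table without a body column has no prompt to process.
    if "body" not in columns:
        raise ValueError("Input table has no 'body' column")


def passthrough_columns(columns):
    """Return the extra columns that are passed directly to the output table."""
    check_input_columns(columns)
    return [c for c in columns if c not in SCHEMA_COLUMNS]
```

For example, passthrough_columns(["custom_id", "method", "url", "body", "team"]) returns ["team"], while a table that contains an id column is rejected.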

Cloud Storage

For Cloud Storage, the input file must be a JSONL file that is located in a Cloud Storage bucket.

Get the required resources for a model

Choose a model and query its resource requirements. The required resources appear in the response, in the dedicatedResources field, which you specify in the configuration of your batch prediction job.

REST

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.
PUBLISHER: The model publisher, for example, meta, google, mistral-ai, or deepseek-ai.
PUBLISHER_MODEL_ID: The publisher's model ID for the model, for example, llama3_1.
VERSION_ID: The publisher's version ID for the model, for example, llama-3.1-8b-instruct.

HTTP method and URL:

GET https://us-central1-aiplatform.googleapis.com/ui/publishers/PUBLISHER/models/PUBLISHER_MODEL_ID@VERSION_ID

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     "https://us-central1-aiplatform.googleapis.com/ui/publishers/PUBLISHER/models/PUBLISHER_MODEL_ID@VERSION_ID" | jq '.supportedActions.multiDeployVertex'

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://us-central1-aiplatform.googleapis.com/ui/publishers/PUBLISHER/models/PUBLISHER_MODEL_ID@VERSION_ID" | Select-Object -Expand Content | jq '.supportedActions.multiDeployVertex'

You should receive a successful status code (2xx) and a JSON response that includes the dedicatedResources field for the model.
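If you process the response in code instead of with jq, the same lookup can be sketched in Python. The function names are hypothetical, and the shape of the value under multiDeployVertex is not assumed here: the helper returns whatever the response contains at that path.

```python
def publisher_model_url(publisher, model_id, version_id):
    """Build the request URL from the PUBLISHER, PUBLISHER_MODEL_ID, and VERSION_ID values."""
    return (
        "https://us-central1-aiplatform.googleapis.com/ui/publishers/"
        f"{publisher}/models/{model_id}@{version_id}"
    )


def deploy_options(response_json):
    """Mirror the jq filter .supportedActions.multiDeployVertex.

    Returns None when the path is missing, as jq returns null.
    """
    supported = response_json.get("supportedActions") or {}
    return supported.get("multiDeployVertex")
```

Pass the parsed JSON body of the response to deploy_options, then look for the dedicatedResources field in the result.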

Request a batch prediction

Make a batch prediction against a self-deployed Model Garden model by using input from BigQuery or Cloud Storage. You can independently choose to output predictions to either a BigQuery table or a JSONL file in a Cloud Storage bucket.

BigQuery

Specify your BigQuery input table, model, and output location. The batch prediction job and your table must be in the same region.

REST

Before using any of the request data, make the following replacements:

LOCATION: A region that supports Model Garden self-deployed models.
PROJECT_ID: Your project ID.
JOB_NAME: A display name for your batch prediction job.
MODEL: The name of the model to use, for example, llama-3.1-8b-instruct.
PUBLISHER: The model publisher, for example, meta, google, mistral-ai, or deepseek-ai.
INPUT_URI: The BigQuery table where your batch prediction input is located, such as myproject.mydataset.input_table.
OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
OUTPUT_URI: For BigQuery, specify the table location, such as myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.
MACHINE_TYPE: Defines the set of resources to deploy for your model, for example, g2-standard-4.
ACC_TYPE: Specifies accelerators to add to your batch prediction job to help improve performance when working with intensive workloads, for example, NVIDIA_L4.
ACC_COUNT: The number of accelerators to use in your batch prediction job.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs

Request JSON body:

{
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "dedicated_resources": {
    "machine_spec": {
      "machine_type": "MACHINE_TYPE",
      "accelerator_type": "ACC_TYPE",
      "accelerator_count": ACC_COUNT
    },
    "starting_replica_count": 1
  }
}
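Because OUTPUT_FORMAT, DESTINATION, and OUTPUT_URI_FIELD_NAME must agree with each other, it can help to generate the request body rather than edit it by hand. The following Python is a sketch under that assumption: build_batch_request and OUTPUT_SETTINGS are hypothetical names, the accelerator count of 1 in the demo is an illustrative value, and the field names are copied from the request body above.

```python
import json

# Maps OUTPUT_FORMAT to (DESTINATION, OUTPUT_URI_FIELD_NAME), per the
# replacement list above.
OUTPUT_SETTINGS = {
    "bigquery": ("bigqueryDestination", "outputUri"),
    "jsonl": ("gcsDestination", "outputUriPrefix"),
}


def build_batch_request(job_name, publisher, model, input_uri, output_format,
                        output_uri, machine_type, acc_type, acc_count):
    """Build the request body for a job that reads from a BigQuery table."""
    destination, uri_field = OUTPUT_SETTINGS[output_format]
    return {
        "displayName": job_name,
        "model": f"publishers/{publisher}/models/{model}",
        "inputConfig": {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        },
        "outputConfig": {
            "predictionsFormat": output_format,
            destination: {uri_field: output_uri},
        },
        "dedicated_resources": {
            "machine_spec": {
                "machine_type": machine_type,
                "accelerator_type": acc_type,
                "accelerator_count": acc_count,
            },
            "starting_replica_count": 1,
        },
    }


if __name__ == "__main__":
    body = build_batch_request(
        "my-batch-job", "meta", "llama-3.1-8b-instruct",
        "myproject.mydataset.input_table", "bigquery",
        "myproject.mydataset.output_result",
        "g2-standard-4", "NVIDIA_L4", 1,  # accelerator count is illustrative
    )
    # json.dump always writes strictly valid JSON for request.json.
    with open("request.json", "w", encoding="utf-8") as f:
        json.dump(body, f, indent=2)
```

Choosing an unsupported output format raises a KeyError before any request is sent, which is easier to diagnose than a rejected job.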

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "labels": {
    "purpose": "testing"
  },
  "modelVersionId": "1"
}

Cloud Storage

Specify your JSONL file's Cloud Storage location, model, and output location.

REST

Before using any of the request data, make the following replacements:

LOCATION: A region that supports Model Garden self-deployed models.
PROJECT_ID: Your project ID.
JOB_NAME: A display name for your batch prediction job.
MODEL: The name of the model to use, for example, llama-3.1-8b-instruct.
PUBLISHER: The model publisher, for example, meta, google, mistral-ai, or deepseek-ai.
INPUT_URI: The Cloud Storage location of your JSONL batch prediction input, such as gs://bucketname/path/to/jsonl.
OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
OUTPUT_URI: For BigQuery, specify the table location, such as myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.
MACHINE_TYPE: Defines the set of resources to deploy for your model, for example, g2-standard-4.
ACC_TYPE: Specifies accelerators to add to your batch prediction job to help improve performance when working with intensive workloads, for example, NVIDIA_L4.
ACC_COUNT: The number of accelerators to use in your batch prediction job.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs

Request JSON body:

{
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [
        "INPUT_URI"
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "dedicated_resources": {
    "machine_spec": {
      "machine_type": "MACHINE_TYPE",
      "accelerator_type": "ACC_TYPE",
      "accelerator_count": ACC_COUNT
    },
    "starting_replica_count": 1
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [
        "INPUT_URI"
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "labels": {
    "purpose": "testing"
  },
  "modelVersionId": "1"
}

Get the status of a batch prediction job

Get the state of your batch prediction job to check whether it has completed successfully. How long the job takes depends on the number of input items that you submitted.

REST

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.
LOCATION: The region where your batch job is located.
JOB_ID: The batch job ID that was returned when you created the job.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_SUCCEEDED",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "labels": {
    "purpose": "testing"
  },
  "modelVersionId": "1"
}
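Since a job can stay in a non-final state for some time, you typically repeat the status request until the state field reaches a final value. The following Python sketches such a polling loop. It assumes you supply fetch_job, a hypothetical callable that performs the GET request above and returns the parsed JSON. The set of final states listed here is an assumption and might not be exhaustive.

```python
import time

# Assumed final states; JOB_STATE_SUCCEEDED is the value shown in the
# response above, the others are assumptions about failure outcomes.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
}


def wait_for_job(fetch_job, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Call fetch_job() until the job reaches a final state, then return it.

    Raises TimeoutError if the job is still not final after max_polls checks.
    """
    state = None
    for _ in range(max_polls):
        job = fetch_job()
        state = job.get("state")
        if state in TERMINAL_STATES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"Job still in state {state!r} after {max_polls} polls")
```

Passing the fetch function and the sleep function as parameters keeps the loop independent of any particular HTTP client and makes it easy to test without network access.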

Retrieve output

When a batch prediction job completes, retrieve the output from the location that you specified:

For BigQuery, the output is in the response (JSON) column of your destination BigQuery table.
For Cloud Storage, the output is saved as a JSONL file in the output Cloud Storage location.

Supported models

Vertex AI supports batch predictions for the following self-deployed models:

Llama
- publishers/meta/models/llama3_1@llama-3.1-8b-instruct
- publishers/meta/models/llama3_1@llama-3.1-70b-instruct
- publishers/meta/models/llama3_1@llama-3.1-405b-instruct-fp8
- publishers/meta/models/llama3-2@llama-3.2-1b-instruct
- publishers/meta/models/llama3-2@llama-3.2-3b-instruct
- publishers/meta/models/llama3-2@llama-3.2-90b-vision-instruct
Gemma
- publishers/google/models/gemma@gemma-1.1-2b-it
- publishers/google/models/gemma@gemma-7b-it
- publishers/google/models/gemma@gemma-1.1-7b-it
- publishers/google/models/gemma@gemma-2b-it
- publishers/google/models/gemma2@gemma-2-2b-it
- publishers/google/models/gemma2@gemma-2-9b-it
- publishers/google/models/gemma2@gemma-2-27b-it
Mistral
- publishers/mistral-ai/models/mistral@mistral-7b-instruct-v0.2
- publishers/mistral-ai/models/mistral@mistral-7b-instruct-v0.3
- publishers/mistral-ai/models/mistral@mistral-7b-instruct-v0.1
- publishers/mistral-ai/models/mistral@mistral-nemo-instruct-2407
DeepSeek
- publishers/deepseek-ai/models/deepseek-r1@deepseek-r1-distill-llama-8b
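Each entry in the list follows the pattern publishers/PUBLISHER/models/PUBLISHER_MODEL_ID@VERSION_ID, which matches the placeholders used when you query a model's required resources. The following Python sketch splits an entry into those three values. The function name parse_model_name and the dictionary keys are hypothetical.

```python
def parse_model_name(name):
    """Split publishers/PUBLISHER/models/PUBLISHER_MODEL_ID@VERSION_ID."""
    parts = name.split("/")
    if len(parts) != 4 or parts[0] != "publishers" or parts[2] != "models":
        raise ValueError(f"Unexpected model name: {name!r}")
    model_id, sep, version_id = parts[3].partition("@")
    if not sep or not model_id or not version_id:
        raise ValueError(f"Missing model ID or version ID in: {name!r}")
    return {
        "publisher": parts[1],
        "model_id": model_id,
        "version_id": version_id,
    }


if __name__ == "__main__":
    print(parse_model_name("publishers/meta/models/llama3_1@llama-3.1-8b-instruct"))
```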

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.
