Batch predictions
Batch predictions lets you efficiently send multiple text-only prompts that aren't latency sensitive to a model. Compared to online predictions, where you send one input prompt per request, you can batch a large number of input prompts in a single request.
Supported models
Vertex AI supports batch predictions for the following models.
Llama:
- Llama 4 Maverick 17B-128E
- Llama 4 Scout 17B-16E
- Llama 3.3 70B
- Llama 3.1 405B (Preview)
- Llama 3.1 70B (Preview)
- Llama 3.1 8B (Preview)
OpenAI gpt-oss:
Qwen:
DeepSeek:
Embedding models:
Prepare input
Before you begin, prepare your inputs in a BigQuery table or as a JSONL file in Cloud Storage. The input for both sources must follow the OpenAI API schema JSONL format, as shown in the following examples.
Large language models:
{"custom_id": "test-request-0", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "MODEL_ID", "messages": [{"role": "user", "content": "Give me a recipe for banana bread"}], "max_tokens": 1000}}
Embedding models:
{"custom_id": "test-request-0", "method": "POST", "url": "/v1/embeddings", "body": {"model": "MODEL_ID", "input": "Hello World"}}
You can include the custom_id, method, url, and model fields, but they are ignored by the batch prediction job.
BigQuery
Your BigQuery input table must adhere to the following schema:
| Column name | Description |
|---|---|
| custom_id | An ID for each request to match the input with the output. |
| method | The request method. |
| url | The request endpoint. |
| body (JSON) | Your input prompt. |
- Your input table can have other columns, which are ignored by the batch job and passed directly to the output table.
- Batch prediction jobs reserve two column names for the batch prediction output: response (JSON) and id. Don't use these columns in the input table.
- The method and url columns are dropped and not included in the output table.
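As a sanity check before loading data, you can verify that an input row follows this schema and avoids the reserved column names. This is a sketch; the helper and the example row values are hypothetical:

```python
import json

RESERVED_COLUMNS = {"response", "id"}  # reserved for batch prediction output
REQUIRED_COLUMNS = {"custom_id", "method", "url", "body"}

def validate_row(row: dict) -> None:
    """Raise ValueError if an input row violates the schema notes above."""
    missing = REQUIRED_COLUMNS - row.keys()
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    clashes = RESERVED_COLUMNS & row.keys()
    if clashes:
        raise ValueError(f"reserved column names used: {clashes}")
    json.loads(row["body"])  # body must be valid JSON

# Hypothetical example row.
row = {
    "custom_id": "test-request-0",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": json.dumps({
        "model": "MODEL_ID",
        "messages": [{"role": "user", "content": "Give me a recipe for banana bread"}],
        "max_tokens": 1000,
    }),
}
validate_row(row)  # passes silently
```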
Cloud Storage
For Cloud Storage, the input file must be a JSONL file that is located in a Cloud Storage bucket.
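For example, a short script can generate a JSONL input file in the OpenAI API schema shown earlier. This is a sketch; the prompts and file name are hypothetical, and MODEL_ID is a placeholder:

```python
import json

# Prompts to batch; these example prompts are hypothetical.
prompts = [
    "Give me a recipe for banana bread",
    "Summarize the plot of Hamlet in two sentences",
]

# Build one OpenAI-schema request per prompt, each on its own line.
lines = []
for i, prompt in enumerate(prompts):
    request = {
        "custom_id": f"test-request-{i}",  # used to match output to input
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "MODEL_ID",  # replace with your model
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1000,
        },
    }
    lines.append(json.dumps(request))

# Write the JSONL file; upload it to your bucket afterwards,
# for example with: gcloud storage cp input.jsonl gs://mybucket/path/
with open("input.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```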
Request a batch prediction
Make a batch prediction against a model by using input from BigQuery or Cloud Storage. You can independently choose to output predictions to either a BigQuery table or a JSONL file in a Cloud Storage bucket.
BigQuery
Specify your BigQuery input table, model, and output location. The batch prediction job and your table must be in the same region.
REST
After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports the model.
- PROJECT_ID: Your project ID.
- MODEL: The name of the model to use.
- INPUT_URI: The BigQuery table where your batch prediction input is located, such as myproject.mydataset.input_table.
- OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
- DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
- OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
- OUTPUT_URI: For BigQuery, specify the table location, such as myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.
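The OUTPUT_FORMAT, DESTINATION, and OUTPUT_URI_FIELD_NAME placeholders move together, which is easy to get wrong. A small helper can assemble a consistent outputConfig block; this is a sketch, and the example URIs are hypothetical:

```python
def build_output_config(output_uri: str) -> dict:
    """Build the outputConfig block; the destination type follows the URI scheme."""
    if output_uri.startswith("gs://"):
        # Cloud Storage: jsonl format, gcsDestination, outputUriPrefix.
        return {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_uri},
        }
    # Otherwise assume BigQuery: bigquery format, bigqueryDestination, outputUri.
    return {
        "predictionsFormat": "bigquery",
        "bigqueryDestination": {"outputUri": output_uri},
    }

# Hypothetical destinations.
bq_config = build_output_config("myproject.mydataset.output_result")
gcs_config = build_output_config("gs://mybucket/path/to/outputfile")
```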
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
Request JSON body:
{
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
To send your request, choose one of these options:
curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Response
{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "labels": {
    "purpose": "testing"
  },
  "modelVersionId": "1"
}
Cloud Storage
Specify your JSONL file's Cloud Storage location, model, and output location.
REST
After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports the model.
- PROJECT_ID: Your project ID.
- MODEL: The name of the model to use.
- INPUT_URI: The Cloud Storage location of your JSONL batch prediction input, such as gs://bucketname/path/to/jsonl.
- OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
- DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
- OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
- OUTPUT_URI: For BigQuery, specify the table location, such as myproject.mydataset.output_result. For Cloud Storage, specify the bucket and folder location, such as gs://mybucket/path/to/outputfile.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs
Request JSON body:
{
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [ "INPUT_URI" ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}
To send your request, choose one of these options:
curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Response
{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [ "INPUT_URI" ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "labels": {
    "purpose": "testing"
  },
  "modelVersionId": "1"
}
Get the status of a batch prediction job
Get the state of your batch prediction job to check whether it has completed successfully. The job's duration depends on the number of input items that you submitted.
REST
After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region where your batch job is located.
- JOB_ID: The batch job ID that was returned when you created the job.
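Because a batch job can take a while, you may want to poll this endpoint until the job reaches a terminal state. A minimal polling loop might look like the following sketch, where `get_job_state` is a hypothetical callable standing in for the GET request below (it should return the job's `state` string):

```python
import time

# States after which the job will not change further.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}

def wait_for_job(get_job_state, interval_s: float = 30.0, max_polls: int = 1000) -> str:
    """Poll get_job_state() until it returns a terminal state, then return it."""
    for _ in range(max_polls):
        state = get_job_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError("batch job did not finish within the polling budget")
```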
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID
To send your request, choose one of these options:
curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list. Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list. Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/JOB_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Response
{
  "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "JOB_NAME",
  "model": "publishers/PUBLISHER/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  },
  "state": "JOB_STATE_SUCCEEDED",
  "createTime": "2024-10-16T19:33:59.153782Z",
  "updateTime": "2024-10-16T19:33:59.153782Z",
  "labels": {
    "purpose": "testing"
  },
  "modelVersionId": "1"
}
Retrieve output
When a batch prediction job completes, retrieve the output from the location that you specified. For BigQuery, the output is in the response (JSON) column of your destination BigQuery table. For Cloud Storage, the output is saved as a JSONL file in the output Cloud Storage location.
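For Cloud Storage output, each output JSONL line pairs your custom_id with a response object, so you can join results back to your inputs. The following is a sketch of such a join; the exact layout of the response object is an assumption here (it mirrors the OpenAI chat-completion schema used for input), so verify it against your own output file:

```python
import json

# Hypothetical example of one output line; the response layout is assumed,
# not confirmed by this page.
output_line = json.dumps({
    "custom_id": "test-request-0",
    "response": {
        "choices": [{"message": {"role": "assistant", "content": "Mix, bake, enjoy."}}]
    },
})

def parse_output(lines):
    """Map each custom_id to the first choice's message content, if present."""
    results = {}
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        choices = response.get("choices") or []
        content = choices[0]["message"]["content"] if choices else None
        results[record["custom_id"]] = content
    return results

results = parse_output([output_line])
```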
What's next
- Learn how to Call MaaS APIs for open models for streaming and non-streaming use cases.
Last updated 2025-12-15 UTC.