Get batch text embeddings inferences

Getting responses in a batch is a way to efficiently send large numbers of non-latency-sensitive embeddings requests. Unlike getting online responses, where you are limited to one input request at a time, you can send a large number of LLM requests in a single batch request. Similar to how batch inference is done for tabular data in Vertex AI, you determine your output location, add your input, and your responses asynchronously populate into your output location.

Text embeddings models that support batch inferences

All stable versions of text embedding models support batch inferences, with the exception of Gemini embeddings (gemini-embedding-001). Stable versions are fully supported for production environments. To view the full list of embedding models, see Embedding models and versions.

Prepare your inputs

The input for batch requests is a list of prompts that can be stored either in a BigQuery table or as a JSON Lines (JSONL) file in Cloud Storage. Each request can include up to 30,000 prompts.
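As a minimal sketch, a JSONL input file in this format can be generated with standard-library Python; the file name and prompts here are placeholders:

```python
import json

# Placeholder prompts; each line of the JSONL file is one request object
# with a "content" field. A single batch request can include up to
# 30,000 such lines.
prompts = [
    "Give a short description of a machine learning model:",
    "Best recipe for banana bread:",
]

with open("embeddings_input.jsonl", "w", encoding="utf-8") as f:
    for prompt in prompts:
        f.write(json.dumps({"content": prompt}) + "\n")
```

The finished file would then be uploaded to your Cloud Storage bucket, for example with `gsutil cp embeddings_input.jsonl gs://your-bucket/your-prefix/` (bucket and prefix are placeholders).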

JSONL example

This section shows examples of how to format JSONL input and output.

JSONL input example

{"content":"Give a short description of a machine learning model:"}
{"content":"Best recipe for banana bread:"}

JSONL output example

{"instance":{"content":"Give..."},"predictions":[{"embeddings":{"statistics":{"token_count":8,"truncated":false},"values":[0.2,....]}}],"status":""}
{"instance":{"content":"Best..."},"predictions":[{"embeddings":{"statistics":{"token_count":3,"truncated":false},"values":[0.1,....]}}],"status":""}
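Each output line can be parsed with any JSON library; the sketch below uses Python's json module on an abbreviated sample line (the vector values here are placeholder data, not real model output):

```python
import json

# Abbreviated sample output line, matching the JSONL output schema.
line = (
    '{"instance":{"content":"Best recipe for banana bread:"},'
    '"predictions":[{"embeddings":{"statistics":{"token_count":3,'
    '"truncated":false},"values":[0.1,0.2,0.3]}}],"status":""}'
)

record = json.loads(line)
text = record["instance"]["content"]  # the original prompt
vector = record["predictions"][0]["embeddings"]["values"]  # the embedding
print(text, len(vector))
```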

BigQuery example

This section shows examples of how to format BigQuery input and output.

BigQuery input example

This example shows a single-column BigQuery table.

content
"Give a short description of a machine learning model:"
"Best recipe for banana bread:"

BigQuery output example

content | predictions | status
"Give a short description of a machine learning model:" | '[{"embeddings":{"statistics":{"token_count":8,"truncated":false},"values":[0.1,....]}}]' |
"Best recipe for banana bread:" | '[{"embeddings":{"statistics":{"token_count":3,"truncated":false},"values":[0.2,....]}}]' |
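Note that in the BigQuery output the predictions column holds a JSON-encoded string rather than a nested record, so it needs one extra decoding step client-side. A sketch with placeholder cell data:

```python
import json

# Placeholder value of a single predictions cell from a BigQuery output row.
predictions_cell = (
    '[{"embeddings":{"statistics":{"token_count":3,'
    '"truncated":false},"values":[0.2,0.4]}}]'
)

predictions = json.loads(predictions_cell)  # decode the string first
values = predictions[0]["embeddings"]["values"]
print(values)
```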

Request a batch response

Depending on the number of input items that you've submitted, a batch generation task can take some time to complete.

REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: The ID of your Google Cloud project.
  • BP_JOB_NAME: The job name.
  • INPUT_URI: The input source URI. This is either a BigQuery table URI or a JSONL file URI in Cloud Storage.
  • OUTPUT_URI: Output target URI.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs

Request JSON body:

{
  "name": "BP_JOB_NAME",
  "displayName": "BP_JOB_NAME",
  "model": "publishers/google/models/textembedding-gecko",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "OUTPUT_URI"
    }
  }
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/123456789012/locations/us-central1/batchPredictionJobs/1234567890123456789",
  "displayName": "BP_sample_publisher_BQ_20230712_134650",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/textembedding-gecko",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://project_name.dataset_name.text_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://project_name.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "labels": {
    "owner": "sample_owner",
    "product": "llm"
  },
  "modelVersionId": "1",
  "modelMonitoringStatus": {}
}

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
Note: You can run only one batch response job at a time. Custom service accounts, live progress, CMEK, and VPC-SC reports aren't supported at this time.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

import time

from google import genai
from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))
# TODO(developer): Update and un-comment below line
# output_uri = "gs://your-bucket/your-prefix"

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.batches.Batches.create
job = client.batches.create(
    model="text-embedding-005",
    # Source link: https://storage.cloud.google.com/cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl
    src="gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl",
    config=CreateBatchJobConfig(dest=output_uri),
)
print(f"Job name: {job.name}")
print(f"Job state: {job.state}")
# Example response:
# Job name: projects/.../locations/.../batchPredictionJobs/9876453210000000000
# Job state: JOB_STATE_PENDING

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.types.BatchJob
completed_states = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
    JobState.JOB_STATE_PAUSED,
}

while job.state not in completed_states:
    time.sleep(30)
    job = client.batches.get(name=job.name)
    print(f"Job state: {job.state}")
    if job.state == JobState.JOB_STATE_FAILED:
        print(f"Error: {job.error}")
        break
# Example response:
# Job state: JOB_STATE_PENDING
# Job state: JOB_STATE_RUNNING
# Job state: JOB_STATE_RUNNING
# ...
# Job state: JOB_STATE_SUCCEEDED

Go

Learn how to install or update the Go SDK.

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

import (
	"context"
	"fmt"
	"io"
	"time"

	"google.golang.org/genai"
)

// generateBatchEmbeddings shows how to run a batch embeddings prediction job.
func generateBatchEmbeddings(w io.Writer, outputURI string) error {
	// outputURI = "gs://your-bucket/your-prefix"
	ctx := context.Background()
	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		HTTPOptions: genai.HTTPOptions{APIVersion: "v1"},
	})
	if err != nil {
		return fmt.Errorf("failed to create genai client: %w", err)
	}

	modelName := "text-embedding-005"
	// See the documentation: https://pkg.go.dev/google.golang.org/genai#Batches.Create
	job, err := client.Batches.Create(ctx, modelName,
		&genai.BatchJobSource{
			Format: "jsonl",
			// Source link: https://storage.cloud.google.com/cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl
			GCSURI: []string{"gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl"},
		},
		&genai.CreateBatchJobConfig{
			Dest: &genai.BatchJobDestination{
				Format: "jsonl",
				GCSURI: outputURI,
			},
		})
	if err != nil {
		return fmt.Errorf("failed to create batch job: %w", err)
	}
	fmt.Fprintf(w, "Job name: %s\n", job.Name)
	fmt.Fprintf(w, "Job state: %s\n", job.State)
	// Example response:
	// Job name: projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/9876453210000000000
	// Job state: JOB_STATE_PENDING

	// See the documentation: https://pkg.go.dev/google.golang.org/genai#BatchJob
	completedStates := map[genai.JobState]bool{
		genai.JobStateSucceeded: true,
		genai.JobStateFailed:    true,
		genai.JobStateCancelled: true,
		genai.JobStatePaused:    true,
	}

	// Poll until job finishes
	for !completedStates[job.State] {
		time.Sleep(30 * time.Second)
		job, err = client.Batches.Get(ctx, job.Name, nil)
		if err != nil {
			return fmt.Errorf("failed to get batch job: %w", err)
		}
		fmt.Fprintf(w, "Job state: %s\n", job.State)
		if job.State == genai.JobStateFailed {
			fmt.Fprintf(w, "Error: %+v\n", job.Error)
			break
		}
	}
	// Example response:
	// Job state: JOB_STATE_PENDING
	// Job state: JOB_STATE_RUNNING
	// Job state: JOB_STATE_RUNNING
	// ...
	// Job state: JOB_STATE_SUCCEEDED
	return nil
}

Node.js

Install

npm install @google/genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

const {GoogleGenAI} = require('@google/genai');

const GOOGLE_CLOUD_PROJECT = process.env.GOOGLE_CLOUD_PROJECT;
const GOOGLE_CLOUD_LOCATION = process.env.GOOGLE_CLOUD_LOCATION || 'us-central1';
const OUTPUT_URI = 'gs://your-bucket/your-prefix';

async function runBatchPredictionJob(
  outputUri = OUTPUT_URI,
  projectId = GOOGLE_CLOUD_PROJECT,
  location = GOOGLE_CLOUD_LOCATION
) {
  const client = new GoogleGenAI({
    vertexai: true,
    project: projectId,
    location: location,
    httpOptions: {
      apiVersion: 'v1',
    },
  });

  // See the documentation: https://googleapis.github.io/js-genai/release_docs/classes/batches.Batches.html
  let job = await client.batches.create({
    model: 'text-embedding-005',
    // Source link: https://storage.cloud.google.com/cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl
    src: 'gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl',
    config: {
      dest: outputUri,
    },
  });

  console.log(`Job name: ${job.name}`);
  console.log(`Job state: ${job.state}`);
  // Example response:
  // Job name: projects/%PROJECT_ID%/locations/us-central1/batchPredictionJobs/9876453210000000000
  // Job state: JOB_STATE_PENDING

  const completedStates = new Set([
    'JOB_STATE_SUCCEEDED',
    'JOB_STATE_FAILED',
    'JOB_STATE_CANCELLED',
    'JOB_STATE_PAUSED',
  ]);

  while (!completedStates.has(job.state)) {
    await new Promise(resolve => setTimeout(resolve, 30000));
    job = await client.batches.get({name: job.name});
    console.log(`Job state: ${job.state}`);
  }
  // Example response:
  // Job state: JOB_STATE_PENDING
  // Job state: JOB_STATE_RUNNING
  // Job state: JOB_STATE_RUNNING
  // ...
  // Job state: JOB_STATE_SUCCEEDED
  return job.state;
}

Java

Learn how to install or update the Java SDK.

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

import static com.google.genai.types.JobState.Known.JOB_STATE_CANCELLED;
import static com.google.genai.types.JobState.Known.JOB_STATE_FAILED;
import static com.google.genai.types.JobState.Known.JOB_STATE_PAUSED;
import static com.google.genai.types.JobState.Known.JOB_STATE_SUCCEEDED;

import com.google.genai.Client;
import com.google.genai.types.BatchJob;
import com.google.genai.types.BatchJobDestination;
import com.google.genai.types.BatchJobSource;
import com.google.genai.types.CreateBatchJobConfig;
import com.google.genai.types.GetBatchJobConfig;
import com.google.genai.types.HttpOptions;
import com.google.genai.types.JobState;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

public class BatchPredictionEmbeddingsWithGcs {

  public static void main(String[] args) throws InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String modelId = "text-embedding-005";
    String outputGcsUri = "gs://your-bucket/your-prefix";
    createBatchJob(modelId, outputGcsUri);
  }

  // Creates a batch prediction job with embedding model and Google Cloud Storage.
  public static JobState createBatchJob(String modelId, String outputGcsUri)
      throws InterruptedException {
    // Client initialization. Once created, it can be reused for multiple requests.
    try (Client client =
        Client.builder()
            .location("us-central1")
            .vertexAI(true)
            .httpOptions(HttpOptions.builder().apiVersion("v1").build())
            .build()) {
      // See the documentation:
      // https://googleapis.github.io/java-genai/javadoc/com/google/genai/Batches.html
      BatchJobSource batchJobSource =
          BatchJobSource.builder()
              // Source link:
              // https://storage.cloud.google.com/cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl
              .gcsUri("gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl")
              .format("jsonl")
              .build();

      CreateBatchJobConfig batchJobConfig =
          CreateBatchJobConfig.builder()
              .displayName("your-display-name")
              .dest(BatchJobDestination.builder().gcsUri(outputGcsUri).format("jsonl").build())
              .build();

      BatchJob batchJob = client.batches.create(modelId, batchJobSource, batchJobConfig);

      String jobName =
          batchJob.name().orElseThrow(() -> new IllegalStateException("Missing job name"));
      JobState jobState =
          batchJob.state().orElseThrow(() -> new IllegalStateException("Missing job state"));
      System.out.println("Job name: " + jobName);
      System.out.println("Job state: " + jobState);
      // Job name: projects/.../locations/.../batchPredictionJobs/6205497615459549184
      // Job state: JOB_STATE_PENDING

      // See the documentation:
      // https://googleapis.github.io/java-genai/javadoc/com/google/genai/types/BatchJob.html
      Set<JobState.Known> completedStates =
          EnumSet.of(JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, JOB_STATE_CANCELLED, JOB_STATE_PAUSED);

      while (!completedStates.contains(jobState.knownEnum())) {
        TimeUnit.SECONDS.sleep(30);
        batchJob = client.batches.get(jobName, GetBatchJobConfig.builder().build());
        jobState =
            batchJob
                .state()
                .orElseThrow(() -> new IllegalStateException("Missing job state during polling"));
        System.out.println("Job state: " + jobState);
      }
      // Example response:
      // Job state: JOB_STATE_QUEUED
      // Job state: JOB_STATE_RUNNING
      // ...
      // Job state: JOB_STATE_SUCCEEDED
      return jobState;
    }
  }
}

Retrieve batch output

When a batch inference task is complete, the output is stored in the Cloud Storage bucket or BigQuery table that you specified in your request.
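For JSONL output in Cloud Storage, the results can be collected into a prompt-to-vector mapping. This is a hedged sketch: the helper name is ours, and the commented download path assumes the google-cloud-storage client with a placeholder bucket and prefix.

```python
import json

def parse_batch_output(lines):
    """Map each input prompt to its embedding vector, given JSONL output lines."""
    embeddings = {}
    for line in lines:
        record = json.loads(line)
        content = record["instance"]["content"]
        embeddings[content] = record["predictions"][0]["embeddings"]["values"]
    return embeddings

# With Cloud Storage output, the result files under your output prefix could
# be fed in with the google-cloud-storage client, for example:
#   from google.cloud import storage
#   for blob in storage.Client().list_blobs("your-bucket", prefix="your-prefix"):
#       if blob.name.endswith(".jsonl"):
#           vectors = parse_batch_output(blob.download_as_text().splitlines())
```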

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.