Get batch text embeddings inferences
Getting responses in a batch is a way to efficiently send large numbers of non-latency-sensitive embeddings requests. Unlike online responses, where you are limited to one input request at a time, you can send a large number of LLM requests in a single batch request. Similar to how batch inference is done for tabular data in Vertex AI, you specify your output location, add your input, and your responses asynchronously populate into your output location.
Text embeddings models that support batch inferences
All stable versions of text embedding models support batch inferences, with the exception of Gemini embeddings (gemini-embedding-001). Stable versions are fully supported for production environments. To view the full list of embedding models, see Embedding model and versions.
Prepare your inputs
The input for batch requests is a list of prompts that can be stored either in a BigQuery table or as a JSON Lines (JSONL) file in Cloud Storage. Each request can include up to 30,000 prompts.
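As a minimal sketch of preparing the Cloud Storage variant, the JSONL input can be built by serializing one `{"content": ...}` object per line. The prompt strings below mirror the examples in this page; the in-memory approach (rather than a specific file path) is an illustrative choice.

```python
import json

prompts = [
    "Give a short description of a machine learning model:",
    "Best recipe for banana bread:",
]

# Each line of the JSONL input is one request object with a "content" field.
jsonl_input = "\n".join(json.dumps({"content": p}) for p in prompts)
print(jsonl_input)
```

You would then write `jsonl_input` to a file and upload it to your bucket, for example with `gcloud storage cp`.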
JSONL example
This section shows examples of how to format JSONL input and output.
JSONL input example
{"content":"Give a short description of a machine learning model:"}
{"content":"Best recipe for banana bread:"}

JSONL output example
{"instance":{"content":"Give..."},"predictions":[{"embeddings":{"statistics":{"token_count":8,"truncated":false},"values":[0.2,....]}}],"status":""}
{"instance":{"content":"Best..."},"predictions":[{"embeddings":{"statistics":{"token_count":3,"truncated":false},"values":[0.1,....]}}],"status":""}

BigQuery example
This section shows examples of how to format BigQuery input and output.
BigQuery input example
This example shows a single-column BigQuery table.
| content |
|---|
| "Give a short description of a machine learning model:" |
| "Best recipe for banana bread:" |
BigQuery output example
| content | predictions | status |
|---|---|---|
| "Give a short description of a machine learning model:" | '[{"embeddings":{"statistics":{"token_count":8,"truncated":false},"values":[0.1,....]}}]' | |
| "Best recipe for banana bread:" | '[{"embeddings":{"statistics":{"token_count":3,"truncated":false},"values":[0.2,....]}}]' | |
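The `predictions` cell in the BigQuery output is a JSON string, so a consumer has to parse it before using the vector. A minimal sketch, using an illustrative cell value with truncated embedding values filled in as placeholder numbers:

```python
import json

# Illustrative predictions cell, shaped like the BigQuery output table above.
# The embedding values here are placeholders, not real model output.
predictions_cell = (
    '[{"embeddings":{"statistics":{"token_count":8,"truncated":false},'
    '"values":[0.1,0.2,0.3]}}]'
)

predictions = json.loads(predictions_cell)
vector = predictions[0]["embeddings"]["values"]
token_count = predictions[0]["embeddings"]["statistics"]["token_count"]
print(len(vector), token_count)
```

In a real pipeline, `predictions_cell` would come from a BigQuery row rather than a literal string.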
Request a batch response
Depending on the number of input items that you've submitted, abatch generation task can take some time to complete.
REST
To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
- PROJECT_ID: The ID of your Google Cloud project.
- BP_JOB_NAME: The job name.
- INPUT_URI: The input source URI. This is either a BigQuery table URI or a JSONL file URI in Cloud Storage.
- OUTPUT_URI: Output target URI.
HTTP method and URL:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs
Request JSON body:
{
  "name": "BP_JOB_NAME",
  "displayName": "BP_JOB_NAME",
  "model": "publishers/google/models/textembedding-gecko",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "OUTPUT_URI"
    }
  }
}

To send your request, choose one of these options:
curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "name": "projects/123456789012/locations/us-central1/batchPredictionJobs/1234567890123456789",
  "displayName": "BP_sample_publisher_BQ_20230712_134650",
  "model": "projects/{PROJECT_ID}/locations/us-central1/models/textembedding-gecko",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "bq://project_name.dataset_name.text_input"
    }
  },
  "modelParameters": {},
  "outputConfig": {
    "predictionsFormat": "bigquery",
    "bigqueryDestination": {
      "outputUri": "bq://project_name.llm_dataset.embedding_out_BP_sample_publisher_BQ_20230712_134650"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2023-07-12T20:46:52.148717Z",
  "updateTime": "2023-07-12T20:46:52.148717Z",
  "labels": {
    "owner": "sample_owner",
    "product": "llm"
  },
  "modelVersionId": "1",
  "modelMonitoringStatus": {}
}

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED. For example:
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs/BATCH_JOB_ID
Python
Install
pip install --upgrade google-genai
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
import time

from google import genai
from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# TODO(developer): Update and un-comment below line
# output_uri = "gs://your-bucket/your-prefix"

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.batches.Batches.create
job = client.batches.create(
    model="text-embedding-005",
    # Source link: https://storage.cloud.google.com/cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl
    src="gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl",
    config=CreateBatchJobConfig(dest=output_uri),
)
print(f"Job name: {job.name}")
print(f"Job state: {job.state}")
# Example response:
# Job name: projects/.../locations/.../batchPredictionJobs/9876453210000000000
# Job state: JOB_STATE_PENDING

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.types.BatchJob
completed_states = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
    JobState.JOB_STATE_PAUSED,
}

while job.state not in completed_states:
    time.sleep(30)
    job = client.batches.get(name=job.name)
    print(f"Job state: {job.state}")
    if job.state == JobState.JOB_STATE_FAILED:
        print(f"Error: {job.error}")
        break
# Example response:
# Job state: JOB_STATE_PENDING
# Job state: JOB_STATE_RUNNING
# Job state: JOB_STATE_RUNNING
# ...
# Job state: JOB_STATE_SUCCEEDED

Go
Learn how to install or update the Go SDK.
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
import (
  "context"
  "fmt"
  "io"
  "time"

  "google.golang.org/genai"
)

// generateBatchEmbeddings shows how to run a batch embeddings prediction job.
func generateBatchEmbeddings(w io.Writer, outputURI string) error {
  // outputURI = "gs://your-bucket/your-prefix"
  ctx := context.Background()
  client, err := genai.NewClient(ctx, &genai.ClientConfig{
    HTTPOptions: genai.HTTPOptions{APIVersion: "v1"},
  })
  if err != nil {
    return fmt.Errorf("failed to create genai client: %w", err)
  }

  modelName := "text-embedding-005"
  // See the documentation: https://pkg.go.dev/google.golang.org/genai#Batches.Create
  job, err := client.Batches.Create(ctx, modelName,
    &genai.BatchJobSource{
      Format: "jsonl",
      // Source link: https://storage.cloud.google.com/cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl
      GCSURI: []string{"gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl"},
    },
    &genai.CreateBatchJobConfig{
      Dest: &genai.BatchJobDestination{
        Format: "jsonl",
        GCSURI: outputURI,
      },
    })
  if err != nil {
    return fmt.Errorf("failed to create batch job: %w", err)
  }
  fmt.Fprintf(w, "Job name: %s\n", job.Name)
  fmt.Fprintf(w, "Job state: %s\n", job.State)
  // Example response:
  // Job name: projects/{PROJECT_ID}/locations/us-central1/batchPredictionJobs/9876453210000000000
  // Job state: JOB_STATE_PENDING

  // See the documentation: https://pkg.go.dev/google.golang.org/genai#BatchJob
  completedStates := map[genai.JobState]bool{
    genai.JobStateSucceeded: true,
    genai.JobStateFailed:    true,
    genai.JobStateCancelled: true,
    genai.JobStatePaused:    true,
  }

  // Poll until the job finishes.
  for !completedStates[job.State] {
    time.Sleep(30 * time.Second)
    job, err = client.Batches.Get(ctx, job.Name, nil)
    if err != nil {
      return fmt.Errorf("failed to get batch job: %w", err)
    }
    fmt.Fprintf(w, "Job state: %s\n", job.State)
    if job.State == genai.JobStateFailed {
      fmt.Fprintf(w, "Error: %+v\n", job.Error)
      break
    }
  }
  // Example response:
  // Job state: JOB_STATE_PENDING
  // Job state: JOB_STATE_RUNNING
  // Job state: JOB_STATE_RUNNING
  // ...
  // Job state: JOB_STATE_SUCCEEDED
  return nil
}

Node.js
Install
npm install @google/genai
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
const {GoogleGenAI} = require('@google/genai');

const GOOGLE_CLOUD_PROJECT = process.env.GOOGLE_CLOUD_PROJECT;
const GOOGLE_CLOUD_LOCATION = process.env.GOOGLE_CLOUD_LOCATION || 'us-central1';
const OUTPUT_URI = 'gs://your-bucket/your-prefix';

async function runBatchPredictionJob(
  outputUri = OUTPUT_URI,
  projectId = GOOGLE_CLOUD_PROJECT,
  location = GOOGLE_CLOUD_LOCATION
) {
  const client = new GoogleGenAI({
    vertexai: true,
    project: projectId,
    location: location,
    httpOptions: {
      apiVersion: 'v1',
    },
  });

  // See the documentation: https://googleapis.github.io/js-genai/release_docs/classes/batches.Batches.html
  let job = await client.batches.create({
    model: 'text-embedding-005',
    // Source link: https://storage.cloud.google.com/cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl
    src: 'gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl',
    config: {
      dest: outputUri,
    },
  });
  console.log(`Job name: ${job.name}`);
  console.log(`Job state: ${job.state}`);
  // Example response:
  // Job name: projects/%PROJECT_ID%/locations/us-central1/batchPredictionJobs/9876453210000000000
  // Job state: JOB_STATE_PENDING

  const completedStates = new Set([
    'JOB_STATE_SUCCEEDED',
    'JOB_STATE_FAILED',
    'JOB_STATE_CANCELLED',
    'JOB_STATE_PAUSED',
  ]);

  while (!completedStates.has(job.state)) {
    await new Promise(resolve => setTimeout(resolve, 30000));
    job = await client.batches.get({name: job.name});
    console.log(`Job state: ${job.state}`);
  }
  // Example response:
  // Job state: JOB_STATE_PENDING
  // Job state: JOB_STATE_RUNNING
  // Job state: JOB_STATE_RUNNING
  // ...
  // Job state: JOB_STATE_SUCCEEDED
  return job.state;
}

Java
Learn how to install or update the Java SDK.
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
import static com.google.genai.types.JobState.Known.JOB_STATE_CANCELLED;
import static com.google.genai.types.JobState.Known.JOB_STATE_FAILED;
import static com.google.genai.types.JobState.Known.JOB_STATE_PAUSED;
import static com.google.genai.types.JobState.Known.JOB_STATE_SUCCEEDED;

import com.google.genai.Client;
import com.google.genai.types.BatchJob;
import com.google.genai.types.BatchJobDestination;
import com.google.genai.types.BatchJobSource;
import com.google.genai.types.CreateBatchJobConfig;
import com.google.genai.types.GetBatchJobConfig;
import com.google.genai.types.HttpOptions;
import com.google.genai.types.JobState;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

public class BatchPredictionEmbeddingsWithGcs {

  public static void main(String[] args) throws InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String modelId = "text-embedding-005";
    String outputGcsUri = "gs://your-bucket/your-prefix";
    createBatchJob(modelId, outputGcsUri);
  }

  // Creates a batch prediction job with an embedding model and Google Cloud Storage.
  public static JobState createBatchJob(String modelId, String outputGcsUri)
      throws InterruptedException {
    // Client initialization. Once created, it can be reused for multiple requests.
    try (Client client =
        Client.builder()
            .location("us-central1")
            .vertexAI(true)
            .httpOptions(HttpOptions.builder().apiVersion("v1").build())
            .build()) {

      // See the documentation:
      // https://googleapis.github.io/java-genai/javadoc/com/google/genai/Batches.html
      BatchJobSource batchJobSource =
          BatchJobSource.builder()
              // Source link:
              // https://storage.cloud.google.com/cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl
              .gcsUri("gs://cloud-samples-data/generative-ai/embeddings/embeddings_input.jsonl")
              .format("jsonl")
              .build();

      CreateBatchJobConfig batchJobConfig =
          CreateBatchJobConfig.builder()
              .displayName("your-display-name")
              .dest(BatchJobDestination.builder().gcsUri(outputGcsUri).format("jsonl").build())
              .build();

      BatchJob batchJob = client.batches.create(modelId, batchJobSource, batchJobConfig);

      String jobName =
          batchJob.name().orElseThrow(() -> new IllegalStateException("Missing job name"));
      JobState jobState =
          batchJob.state().orElseThrow(() -> new IllegalStateException("Missing job state"));
      System.out.println("Job name: " + jobName);
      System.out.println("Job state: " + jobState);
      // Job name: projects/.../locations/.../batchPredictionJobs/6205497615459549184
      // Job state: JOB_STATE_PENDING

      // See the documentation:
      // https://googleapis.github.io/java-genai/javadoc/com/google/genai/types/BatchJob.html
      Set<JobState.Known> completedStates =
          EnumSet.of(JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, JOB_STATE_CANCELLED, JOB_STATE_PAUSED);

      while (!completedStates.contains(jobState.knownEnum())) {
        TimeUnit.SECONDS.sleep(30);
        batchJob = client.batches.get(jobName, GetBatchJobConfig.builder().build());
        jobState =
            batchJob.state()
                .orElseThrow(() -> new IllegalStateException("Missing job state during polling"));
        System.out.println("Job state: " + jobState);
      }
      // Example response:
      // Job state: JOB_STATE_QUEUED
      // Job state: JOB_STATE_RUNNING
      // ...
      // Job state: JOB_STATE_SUCCEEDED
      return jobState;
    }
  }
}

Retrieve batch output
When a batch inference task is complete, the output is stored in the Cloud Storage bucket or BigQuery table that you specified in your request.
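For JSONL output in Cloud Storage, each line pairs the original instance with its predictions. A minimal sketch of extracting the embedding vectors, assuming the output file has already been copied locally (for example with `gcloud storage cp`); the line below is an illustrative record shaped like the JSONL output example, with placeholder values:

```python
import json

# Illustrative output line; real files contain one such record per prompt.
output_lines = [
    '{"instance":{"content":"Best recipe for banana bread:"},'
    '"predictions":[{"embeddings":{"statistics":{"token_count":3,'
    '"truncated":false},"values":[0.1,0.2]}}],"status":""}'
]

# Map each input prompt to its embedding vector.
embeddings = {}
for line in output_lines:
    record = json.loads(line)
    text = record["instance"]["content"]
    embeddings[text] = record["predictions"][0]["embeddings"]["values"]
print(len(embeddings))
```

For BigQuery output, you would instead query the table and parse the `predictions` column the same way.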
What's next
- Learn how to get text embeddings.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.