Generate and search multimodal embeddings
This tutorial shows how to generate multimodal embeddings for images and text using BigQuery and Vertex AI, and then use these embeddings to perform a text-to-image semantic search.
This tutorial covers the following tasks:
- Creating a BigQuery object table over image data in a Cloud Storage bucket.
- Exploring the image data by using a Colab Enterprise notebook in BigQuery.
- Creating a BigQuery ML remote model that targets the Vertex AI multimodal embedding foundation model.
- Using the remote model with the AI.GENERATE_EMBEDDING function to generate embeddings from the images in the object table.
- Correcting any embedding generation errors.
- Optionally, creating a vector index to index the image embeddings.
- Creating a text embedding for a given search string.
- Using the VECTOR_SEARCH function to perform a semantic search for image embeddings that are similar to the text embedding.
- Visualizing the results by using a notebook.
This tutorial uses the public domain art images from The Metropolitan Museum of Art that are available in the public Cloud Storage gcs-public-data--met bucket.
Required roles
To run this tutorial, you need the following Identity and Access Management (IAM) roles:
- Create and use BigQuery datasets, connections, models, and notebooks: BigQuery Studio Admin (roles/bigquery.studioAdmin).
- Grant permissions to the connection's service account: Project IAM Admin (roles/resourcemanager.projectIamAdmin).
These predefined roles contain the permissions required to perform the tasks in this document. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
- Create a dataset: bigquery.datasets.create
- Create, delegate, and use a connection: bigquery.connections.*
- Set the default connection: bigquery.config.*
- Set service account permissions: resourcemanager.projects.getIamPolicy and resourcemanager.projects.setIamPolicy
- Create an object table: bigquery.tables.create and bigquery.tables.update
- Create a model and run inference: bigquery.jobs.create, bigquery.models.create, bigquery.models.getData, bigquery.models.updateData, bigquery.models.updateMetadata
- Create and use notebooks: resourcemanager.projects.get, resourcemanager.projects.list, bigquery.config.get, bigquery.jobs.create, bigquery.readsessions.create, bigquery.readsessions.getData, bigquery.readsessions.update, dataform.locations.get, dataform.locations.list, dataform.repositories.create, dataform.repositories.list, dataform.collections.create, dataform.collections.list, aiplatform.notebookRuntimeTemplates.apply, aiplatform.notebookRuntimeTemplates.get, aiplatform.notebookRuntimeTemplates.list, aiplatform.notebookRuntimeTemplates.getIamPolicy, aiplatform.notebookRuntimes.assign, aiplatform.notebookRuntimes.get, aiplatform.notebookRuntimes.list, aiplatform.operations.list

Users who have the dataform.repositories.create permission can execute code using the default Dataform service account and all permissions granted to that service account. For more information, see Security considerations for Dataform permissions.
You might also be able to get these permissions with custom roles or other predefined roles.
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery ML: You incur costs for the data that you process in BigQuery.
- Vertex AI: You incur costs for calls to the Vertex AI service that's represented by the remote model.
To generate a cost estimate based on your projected usage, use the pricing calculator.

For more information about BigQuery pricing, see BigQuery pricing in the BigQuery documentation.

For more information about Vertex AI pricing, see the Vertex AI pricing page.
Before you begin
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Create a dataset
Create a BigQuery dataset to store your ML model.
Console
In the Google Cloud console, go to the BigQuery page.

In the Explorer pane, click your project name.

Click View actions > Create dataset.

On the Create dataset page, do the following:

- For Dataset ID, enter bqml_tutorial.
- For Location type, select Multi-region, and then select US (multiple regions in United States).
- Leave the remaining default settings as they are, and click Create dataset.
bq
To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:

```bash
bq --location=US mk -d \
    --description "BigQuery ML tutorial dataset." \
    bqml_tutorial
```

Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.

Confirm that the dataset was created:

```bash
bq ls
```
API
Call the datasets.insert method with a defined dataset resource.

```json
{
  "datasetReference": {
    "datasetId": "bqml_tutorial"
  }
}
```
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
```python
import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)
```

Create the object table
Create an object table over the art images in the public Cloud Storage gcs-public-data--met bucket. The object table makes it possible to analyze the images without moving them from Cloud Storage.
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query:
```sql
CREATE OR REPLACE EXTERNAL TABLE `bqml_tutorial.met_images`
WITH CONNECTION DEFAULT
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://gcs-public-data--met/*']
);
```
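If you want a quick sanity check from Python before moving on, a minimal sketch like the following counts the JPEG objects that the new object table exposes. It assumes that Application Default Credentials are configured and that the google-cloud-bigquery client library is installed; it isn't part of the tutorial itself.

```python
from google.cloud import bigquery

# Assumes Application Default Credentials and a default project are set up.
client = bigquery.Client()

count_query = """
SELECT COUNT(*) AS num_images
FROM `bqml_tutorial.met_images`
WHERE content_type = 'image/jpeg'
"""

# The object table exposes object metadata, so this counts objects in the
# bucket without reading any image bytes.
row = list(client.query(count_query).result())[0]
print(f"JPEG images in the object table: {row.num_images}")
```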
Explore the image data
Create a Colab Enterprise notebook in BigQuery to explore the image data.
In the Google Cloud console, go to the BigQuery page.
Set up the notebook:
- Add a code cell to the notebook.
Copy and paste the following code into the code cell:
```python
#@title Set up credentials
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

PROJECT_ID = 'PROJECT_ID'

from google.cloud import bigquery
client = bigquery.Client(PROJECT_ID)
```

Replace PROJECT_ID with the name of the project that you are using for this tutorial.

Run the code cell.
Enable table display:
- Add a code cell to the notebook.
Copy and paste the following code into the code cell:
```python
#@title Enable data table display
%load_ext google.colab.data_table
```

Run the code cell.
Create a function to display the images:
- Add a code cell to the notebook.
Copy and paste the following code into the code cell:
```python
#@title Util function to display images
import io

from PIL import Image
import matplotlib.pyplot as plt
import tensorflow as tf

def printImages(results):
  image_results_list = list(results)
  amt_of_images = len(image_results_list)

  fig, axes = plt.subplots(nrows=amt_of_images, ncols=2, figsize=(20, 20))
  fig.tight_layout()
  fig.subplots_adjust(hspace=0.5)

  for i in range(amt_of_images):
    gcs_uri = image_results_list[i][0]
    text = image_results_list[i][1]

    f = tf.io.gfile.GFile(gcs_uri, 'rb')
    stream = io.BytesIO(f.read())
    img = Image.open(stream)

    axes[i, 0].axis('off')
    axes[i, 0].imshow(img)
    axes[i, 1].axis('off')
    axes[i, 1].text(0, 0, text, fontsize=10)

  plt.show()
```

Run the code cell.
Display the images:
- Add a code cell to the notebook.
Copy and paste the following code into the code cell:
```python
#@title Display Met images
inspect_obj_table_query = """
SELECT uri, content_type
FROM bqml_tutorial.met_images
WHERE content_type = 'image/jpeg'
ORDER BY uri
LIMIT 10;
"""
printImages(client.query(inspect_obj_table_query))
```

Run the code cell.
The results should look similar to the following:

Save the notebook as met-image-analysis.
Create the remote model
Create a remote model that represents a hosted Vertex AI multimodal embedding model:
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query:
```sql
CREATE OR REPLACE MODEL `bqml_tutorial.multimodal_embedding_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (ENDPOINT = 'multimodalembedding@001');
```
The query takes several seconds to complete, after which you can access the multimodal_embedding_model model that appears in the bqml_tutorial dataset. Because the query uses a CREATE MODEL statement to create a model, there are no query results.
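Because CREATE MODEL returns no rows, you can confirm from Python that the remote model now exists. This is a minimal sketch, assuming Application Default Credentials and the google-cloud-bigquery client library; it simply fetches the model's metadata.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Raises google.api_core.exceptions.NotFound if the model doesn't exist yet.
model = client.get_model("bqml_tutorial.multimodal_embedding_model")
print(model.model_type, model.created)
```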
Generate image embeddings
Generate embeddings from the images in the object table by using the AI.GENERATE_EMBEDDING function, and then write them to a table for use in a following step. Embedding generation is an expensive operation, so the query uses a subquery including the LIMIT clause to limit embedding generation to 10,000 images instead of embedding the full dataset of 601,294 images. This also helps keep the number of images under the 25,000 limit for the AI.GENERATE_EMBEDDING function. This query takes approximately 40 minutes to run.
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query:
```sql
CREATE OR REPLACE TABLE `bqml_tutorial.met_image_embeddings` AS
SELECT *
FROM
  AI.GENERATE_EMBEDDING(
    MODEL `bqml_tutorial.multimodal_embedding_model`,
    (
      SELECT *
      FROM `bqml_tutorial.met_images`
      WHERE content_type = 'image/jpeg'
      LIMIT 10000
    )
  );
```
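Because this statement can run for around 40 minutes, you might prefer to launch it from Python rather than keep the query editor open. The following is a sketch under the same assumptions as before (Application Default Credentials, google-cloud-bigquery installed); the job keeps running on the server even if the client disconnects.

```python
from google.cloud import bigquery

client = bigquery.Client()

embedding_sql = """
CREATE OR REPLACE TABLE `bqml_tutorial.met_image_embeddings` AS
SELECT *
FROM AI.GENERATE_EMBEDDING(
  MODEL `bqml_tutorial.multimodal_embedding_model`,
  (SELECT * FROM `bqml_tutorial.met_images`
   WHERE content_type = 'image/jpeg'
   LIMIT 10000))
"""

job = client.query(embedding_sql)  # returns as soon as the job is started
print(f"Started job {job.job_id}; this can run for ~40 minutes")
job.result()                       # blocks until the job completes
print(f"Job finished with state: {job.state}")
```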
Correct any embedding generation errors
Check for and correct any embedding generation errors. Embedding generation can fail because of Generative AI on Vertex AI quotas or service unavailability.
The AI.GENERATE_EMBEDDING function returns error details in the status column. This column is empty if embedding generation was successful, or contains an error message if embedding generation failed.
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query to see if there were any embedding generation failures:
```sql
SELECT DISTINCT(status),
  COUNT(uri) AS num_rows
FROM bqml_tutorial.met_image_embeddings
GROUP BY 1;
```
If rows with errors are returned, drop any rows where embedding generation failed:
```sql
DELETE FROM `bqml_tutorial.met_image_embeddings`
WHERE status = 'A retryable error occurred: RESOURCE_EXHAUSTED error from remote service/endpoint.';
```
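You can also script this check-and-delete step. The sketch below (same client assumptions as earlier) drops any row with a non-empty status rather than matching one specific error message, which is a slightly broader cleanup than the DELETE statement above; it assumes failed rows carry a non-empty status string, as described in this section.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Count rows whose status column reports an error. An empty status means
# embedding generation succeeded for that row.
failed = list(client.query("""
SELECT COUNT(*) AS n
FROM `bqml_tutorial.met_image_embeddings`
WHERE status != ''
""").result())[0].n

if failed:
    print(f"Dropping {failed} rows with embedding errors")
    client.query("""
    DELETE FROM `bqml_tutorial.met_image_embeddings`
    WHERE status != ''
    """).result()
```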
Create a vector index
You can optionally use the CREATE VECTOR INDEX statement to create the met_images_index vector index on the embedding column of the met_image_embeddings table. A vector index lets you perform a vector search more quickly, with the trade-off of reducing recall and so returning more approximate results.
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query:
```sql
CREATE OR REPLACE VECTOR INDEX `met_images_index`
ON bqml_tutorial.met_image_embeddings(embedding)
OPTIONS (index_type = 'IVF', distance_type = 'COSINE');
```
The vector index is created asynchronously. To check if the vector index has been created, query the INFORMATION_SCHEMA.VECTOR_INDEXES view and confirm that the coverage_percentage value is greater than 0, and the last_refresh_time value isn't NULL:

```sql
SELECT
  table_name,
  index_name,
  index_status,
  coverage_percentage,
  last_refresh_time,
  disable_reason
FROM bqml_tutorial.INFORMATION_SCHEMA.VECTOR_INDEXES
WHERE index_name = 'met_images_index';
```
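Because the index builds asynchronously, a simple polling loop is a convenient way to wait for it from Python. This sketch reuses the query above under the same client assumptions; the 60-second interval is arbitrary.

```python
import time

from google.cloud import bigquery

client = bigquery.Client()

status_sql = """
SELECT coverage_percentage, last_refresh_time
FROM bqml_tutorial.INFORMATION_SCHEMA.VECTOR_INDEXES
WHERE index_name = 'met_images_index'
"""

while True:
    rows = list(client.query(status_sql).result())
    # Ready once coverage is above 0 and a refresh has happened.
    if rows and rows[0].coverage_percentage > 0 and rows[0].last_refresh_time is not None:
        print(f"Index ready, coverage: {rows[0].coverage_percentage}%")
        break
    print("Index not ready yet; checking again in 60 seconds")
    time.sleep(60)
```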
Generate an embedding for the search text
To search images that correspond to a specified text search string, you must first create a text embedding for that string. Use the same remote model to create the text embedding that you used to create the image embeddings, and then write the text embedding to a table for use in a following step. The search string is pictures of white or cream colored dress from victorian era.
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query:
```sql
CREATE OR REPLACE TABLE `bqml_tutorial.search_embedding` AS
SELECT *
FROM
  AI.GENERATE_EMBEDDING(
    MODEL `bqml_tutorial.multimodal_embedding_model`,
    (SELECT 'pictures of white or cream colored dress from victorian era' AS content)
  );
```
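To verify the result, you can inspect the stored vector. The sketch below assumes the function's output column is named embedding, consistent with the VECTOR_SEARCH calls later in this tutorial; the multimodalembedding model produces fixed-length 1,408-dimensional vectors, so that's the length you should expect to see.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Inspect the text embedding that was just written to the table.
row = list(client.query("""
SELECT content, ARRAY_LENGTH(embedding) AS dims
FROM `bqml_tutorial.search_embedding`
""").result())[0]
print(f"'{row.content}' -> {row.dims}-dimensional embedding")
```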
Perform a text-to-image semantic search
Use the VECTOR_SEARCH function to perform a semantic search for images that best correspond to the search string represented by the text embedding.
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query to perform a semantic search and write the results to a table:
```sql
CREATE OR REPLACE TABLE `bqml_tutorial.vector_search_results` AS
SELECT base.uri AS gcs_uri, distance
FROM
  VECTOR_SEARCH(
    TABLE `bqml_tutorial.met_image_embeddings`,
    'embedding',
    TABLE `bqml_tutorial.search_embedding`,
    'embedding',
    top_k => 3
  );
```
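If you later want to search for arbitrary text without materializing a table per search string, you can embed the text and run the vector search in a single parameterized query from Python. This is a sketch, not part of the tutorial: it assumes AI.GENERATE_EMBEDDING's output column is named embedding, matching the tables created earlier, and the search phrase shown is just an example.

```python
from google.cloud import bigquery

client = bigquery.Client()

search_sql = """
SELECT base.uri AS gcs_uri, distance
FROM
  VECTOR_SEARCH(
    TABLE `bqml_tutorial.met_image_embeddings`,
    'embedding',
    (
      -- Embed the search text inline instead of reading a stored table.
      SELECT embedding
      FROM AI.GENERATE_EMBEDDING(
        MODEL `bqml_tutorial.multimodal_embedding_model`,
        (SELECT @search_text AS content))
    ),
    'embedding',
    top_k => 3)
ORDER BY distance
"""

job_config = bigquery.QueryJobConfig(query_parameters=[
    bigquery.ScalarQueryParameter(
        "search_text", "STRING", "portraits of children in a garden"),
])
for row in client.query(search_sql, job_config=job_config).result():
    print(row.gcs_uri, row.distance)
```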
Visualize the semantic search results
Visualize the semantic search results by using a notebook.
In the Google Cloud console, go to the BigQuery page.
Open the met-image-analysis notebook that you created earlier.

Visualize the vector search results:
- Add a code cell to the notebook.
Copy and paste the following code into the code cell:
```python
query = """
SELECT *
FROM `bqml_tutorial.vector_search_results`
ORDER BY distance;
"""
printImages(client.query(query))
```

Run the code cell.
The results should look similar to the following:

Clean up

To avoid incurring charges to your Google Cloud account, delete the project that you created for this tutorial, or keep the project and delete the bqml_tutorial dataset.