Generate and search multimodal embeddings

This tutorial shows how to generate multimodal embeddings for images and text using BigQuery and Vertex AI, and then use these embeddings to perform a text-to-image semantic search.

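This tutorial covers the following tasks:

  • Creating a BigQuery dataset and an object table over the image data.
  • Exploring the image data by using a Colab Enterprise notebook in BigQuery.
  • Creating a remote model that represents a hosted Vertex AI multimodal embedding model.
  • Generating image embeddings with the AI.GENERATE_EMBEDDING function, and correcting any embedding generation errors.
  • Optionally, creating a vector index on the image embeddings.
  • Generating an embedding for the search text.
  • Performing a text-to-image semantic search with the VECTOR_SEARCH function.
  • Visualizing the semantic search results in a notebook.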

This tutorial uses the public domain art images from The Metropolitan Museum of Art that are available in the public Cloud Storage gcs-public-data--met bucket.

Required roles

To run this tutorial, you need the following Identity and Access Management (IAM) roles:

  • Create and use BigQuery datasets, connections, models, and notebooks: BigQuery Studio Admin (roles/bigquery.studioAdmin).
  • Grant permissions to the connection's service account: Project IAM Admin (roles/resourcemanager.projectIamAdmin).

These predefined roles contain the permissions required to perform the tasks in this document. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

  • Create a dataset: bigquery.datasets.create
  • Create, delegate, and use a connection: bigquery.connections.*
  • Set the default connection: bigquery.config.*
  • Set service account permissions: resourcemanager.projects.getIamPolicy and resourcemanager.projects.setIamPolicy
  • Create an object table: bigquery.tables.create and bigquery.tables.update
  • Create a model and run inference:
    • bigquery.jobs.create
    • bigquery.models.create
    • bigquery.models.getData
    • bigquery.models.updateData
    • bigquery.models.updateMetadata
  • Create and use notebooks:

You might also be able to get these permissions with custom roles or other predefined roles.

Costs

In this document, you use the following billable components of Google Cloud:

  • BigQuery ML: You incur costs for the data that you process in BigQuery.
  • Vertex AI: You incur costs for calls to the Vertex AI service that's represented by the remote model.

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

For more information about BigQuery pricing, see BigQuery pricing in the BigQuery documentation.

For more information about Vertex AI pricing, see the Vertex AI pricing page.

Before you begin

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  2. Verify that billing is enabled for your Google Cloud project.

  3. Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs

Create a dataset

Create a BigQuery dataset to store your ML model.

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to the BigQuery page

  2. In the Explorer pane, click your project name.

  3. Click View actions > Create dataset.

  4. On the Create dataset page, do the following:

    • For Dataset ID, enter bqml_tutorial.

    • For Location type, select Multi-region, and then select US (multiple regions in United States).

    • Leave the remaining default settings as they are, and click Create dataset.

bq

To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

  1. Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:

    bq --location=US mk -d \
      --description "BigQuery ML tutorial dataset." \
      bqml_tutorial

    Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.

  2. Confirm that the dataset was created:

    bq ls

API

Call the datasets.insert method with a defined dataset resource.

{"datasetReference":{"datasetId":"bqml_tutorial"}}
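
As a rough illustration of the API call above, the following Python sketch builds the request URL and JSON body for datasets.insert without sending anything. The project ID my-project is a hypothetical placeholder, and the location field is an assumption added to mirror the US location used in the console and bq instructions; the request body shown above sets only the dataset ID.

```python
import json


def build_dataset_insert_request(project_id, dataset_id, location="US"):
    """Return the (url, body) pair for a BigQuery datasets.insert REST call.

    Nothing is sent; the caller still has to POST the body with an
    authorized HTTP client.
    """
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/datasets"
    )
    body = {
        "datasetReference": {"datasetId": dataset_id},
        # Assumption: mirror the US multi-region used elsewhere in this tutorial.
        "location": location,
    }
    return url, json.dumps(body)


# "my-project" is a hypothetical project ID used only for illustration.
url, body = build_dataset_insert_request("my-project", "bqml_tutorial")
print(url)
print(body)
```

Keeping the request construction separate from the HTTP call makes it easy to inspect the payload before you send it.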

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)

Create the object table

Create an object table over the art images in the public Cloud Storage gcs-public-data--met bucket. The object table makes it possible to analyze the images without moving them from Cloud Storage.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE EXTERNAL TABLE `bqml_tutorial.met_images`
    WITH CONNECTION DEFAULT
    OPTIONS (
      object_metadata = 'SIMPLE',
      uris = ['gs://gcs-public-data--met/*']
    );

Explore the image data

Create a Colab Enterprise notebook in BigQuery to explore the image data.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. Create a notebook by using the BigQuery editor.

  3. Connect the notebook to the default runtime.

  4. Set up the notebook:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      #@title Set up credentials

      from google.colab import auth
      auth.authenticate_user()
      print('Authenticated')

      PROJECT_ID = 'PROJECT_ID'
      from google.cloud import bigquery
      client = bigquery.Client(PROJECT_ID)

      Replace PROJECT_ID with the name of the project that you are using for this tutorial.

    3. Run the code cell.

  5. Enable table display:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      #@title Enable data table display
      %load_ext google.colab.data_table
    3. Run the code cell.

  6. Create a function to display the images:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      #@title Util function to display images

      import io
      from PIL import Image
      import matplotlib.pyplot as plt
      import tensorflow as tf

      def printImages(results):
          # Each row is expected to hold a Cloud Storage URI and a caption value.
          image_results_list = list(results)
          amt_of_images = len(image_results_list)

          # squeeze=False keeps axes two-dimensional even for a single row.
          fig, axes = plt.subplots(
              nrows=amt_of_images, ncols=2, figsize=(20, 20), squeeze=False)
          fig.tight_layout()
          fig.subplots_adjust(hspace=0.5)
          for i in range(amt_of_images):
              gcs_uri = image_results_list[i][0]
              text = image_results_list[i][1]
              # Read the image bytes directly from Cloud Storage.
              f = tf.io.gfile.GFile(gcs_uri, 'rb')
              stream = io.BytesIO(f.read())
              img = Image.open(stream)
              axes[i, 0].axis('off')
              axes[i, 0].imshow(img)
              axes[i, 1].axis('off')
              axes[i, 1].text(0, 0, text, fontsize=10)
          plt.show()
    3. Run the code cell.

  7. Display the images:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      #@title Display Met images

      inspect_obj_table_query = """
      SELECT uri, content_type
      FROM bqml_tutorial.met_images
      WHERE content_type = 'image/jpeg'
      ORDER BY uri
      LIMIT 10;
      """
      printImages(client.query(inspect_obj_table_query))
    3. Run the code cell.

      The results should look similar to the following:

      Images showing objects from the Metropolitan Museum of Art.

  8. Save the notebook as met-image-analysis.

Create the remote model

Create a remote model that represents a hosted Vertex AI multimodal embedding model:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE MODEL `bqml_tutorial.multimodal_embedding_model`
      REMOTE WITH CONNECTION DEFAULT
      OPTIONS (ENDPOINT = 'multimodalembedding@001');

    The query takes several seconds to complete, after which you can access the multimodal_embedding_model model that appears in the bqml_tutorial dataset. Because the query uses a CREATE MODEL statement to create a model, there are no query results.

Generate image embeddings

Generate embeddings from the images in the object table by using the AI.GENERATE_EMBEDDING function, and then write them to a table for use in a following step. Embedding generation is an expensive operation, so the query uses a subquery including the LIMIT clause to limit embedding generation to 10,000 images instead of embedding the full dataset of 601,294 images. This also helps keep the number of images under the 25,000 limit for the AI.GENERATE_EMBEDDING function. This query takes approximately 40 minutes to run.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE TABLE `bqml_tutorial.met_image_embeddings`
    AS
    SELECT *
    FROM
      AI.GENERATE_EMBEDDING(
        MODEL `bqml_tutorial.multimodal_embedding_model`,
        (SELECT * FROM `bqml_tutorial.met_images` WHERE content_type = 'image/jpeg' LIMIT 10000))
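
To put the 10,000-image limit in perspective, the following back-of-the-envelope sketch extrapolates from the figures quoted in this section. It assumes that runtime scales linearly with the number of images and that the full dataset would be split into separate calls of at most 25,000 images each. Both are simplifying assumptions rather than documented behavior, and real throughput depends on quotas and service availability.

```python
import math

TOTAL_IMAGES = 601_294      # full dataset size quoted in this section
SAMPLE_IMAGES = 10_000      # LIMIT used in the tutorial query
SAMPLE_MINUTES = 40         # approximate runtime quoted in this section
MAX_IMAGES_PER_CALL = 25_000  # per-call limit quoted in this section


def estimate_full_run(total, sample, sample_minutes, max_per_call):
    """Extrapolate runtime linearly and count the calls needed for all images."""
    minutes = total / sample * sample_minutes
    calls = math.ceil(total / max_per_call)
    return minutes, calls


minutes, calls = estimate_full_run(
    TOTAL_IMAGES, SAMPLE_IMAGES, SAMPLE_MINUTES, MAX_IMAGES_PER_CALL)
print(f"Estimated runtime: about {minutes / 60:.0f} hours")
print(f"Calls needed at {MAX_IMAGES_PER_CALL} images per call: {calls}")
```

Under these assumptions, embedding the full dataset would take roughly 40 hours across 25 calls, which is why the tutorial works with a sample.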

Correct any embedding generation errors

Check for and correct any embedding generation errors. Embedding generation can fail because of Generative AI on Vertex AI quotas or service unavailability.

The AI.GENERATE_EMBEDDING function returns error details in the status column. This column is empty if embedding generation was successful, or contains an error message if embedding generation failed.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query to see if there were any embedding generation failures:

    SELECT DISTINCT(status),
      COUNT(uri) AS num_rows
    FROM bqml_tutorial.met_image_embeddings
    GROUP BY 1;

  3. If rows with errors are returned, drop any rows where embedding generation failed:

    DELETE FROM `bqml_tutorial.met_image_embeddings`
    WHERE status = 'A retryable error occurred: RESOURCE_EXHAUSTED error from remote service/endpoint.';
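
The check above can also be expressed in Python once the rows are in memory, for example after fetching them in the notebook. The following sketch works on plain dictionaries with uri and status keys. The sample rows and the example-bucket URIs are made up for illustration, and the helper names are hypothetical.

```python
from collections import Counter


def summarize_status(rows):
    """Count rows per status value; an empty status means success."""
    return Counter((row.get("status") or "") for row in rows)


def failed_uris(rows):
    """Return the URIs of rows whose status column holds an error message."""
    return [row["uri"] for row in rows if row.get("status")]


# Made-up rows that imitate the uri and status columns of the embeddings table.
RETRYABLE = (
    "A retryable error occurred: RESOURCE_EXHAUSTED error from remote "
    "service/endpoint."
)
sample_rows = [
    {"uri": "gs://example-bucket/a.jpg", "status": ""},
    {"uri": "gs://example-bucket/b.jpg", "status": RETRYABLE},
    {"uri": "gs://example-bucket/c.jpg", "status": ""},
]

print(summarize_status(sample_rows))
print(failed_uris(sample_rows))
```

Collecting the failed URIs rather than only deleting the rows leaves the option of generating embeddings for those images again later.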

Create a vector index

You can optionally use the CREATE VECTOR INDEX statement to create the met_images_index vector index on the embedding column of the met_image_embeddings table. A vector index lets you perform a vector search more quickly, with the trade-off of reducing recall and so returning more approximate results.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE VECTOR INDEX `met_images_index`
    ON bqml_tutorial.met_image_embeddings(embedding)
    OPTIONS (
      index_type = 'IVF',
      distance_type = 'COSINE');

  3. The vector index is created asynchronously. To check if the vector index has been created, query the INFORMATION_SCHEMA.VECTOR_INDEXES view and confirm that the coverage_percentage value is greater than 0, and the last_refresh_time value isn't NULL:

    SELECT table_name, index_name, index_status,
      coverage_percentage, last_refresh_time, disable_reason
    FROM bqml_tutorial.INFORMATION_SCHEMA.VECTOR_INDEXES
    WHERE index_name = 'met_images_index';
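
Because the index is built asynchronously, you might want to poll for readiness instead of rerunning the query by hand. The following sketch encodes the readiness condition described above and wraps it in a simple retry loop. The fetch_row callable is a hypothetical stand-in for code that runs the INFORMATION_SCHEMA query and returns the row as a dictionary, and the attempt count and delay are arbitrary values.

```python
import time


def index_is_ready(row):
    """Apply the check from the tutorial: coverage > 0 and a refresh time set."""
    coverage = row.get("coverage_percentage") or 0
    return coverage > 0 and row.get("last_refresh_time") is not None


def wait_for_index(fetch_row, attempts=10, delay_seconds=30.0, sleep=time.sleep):
    """Poll fetch_row until the index looks ready or the attempts run out."""
    for _ in range(attempts):
        row = fetch_row()
        if row is not None and index_is_ready(row):
            return True
        sleep(delay_seconds)
    return False


# Simulated query results: not ready on the first poll, ready on the second.
responses = iter([
    {"coverage_percentage": 0, "last_refresh_time": None},
    {"coverage_percentage": 100, "last_refresh_time": "2026-01-01T00:00:00Z"},
])
ready = wait_for_index(lambda: next(responses), sleep=lambda seconds: None)
print(ready)
```

In the notebook, fetch_row could run the query through the client object created earlier and convert the first result row to a dictionary.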

Generate an embedding for the search text

To search images that correspond to a specified text search string, you must first create a text embedding for that string. Use the same remote model to create the text embedding that you used to create the image embeddings, and then write the text embedding to a table for use in a following step. The search string is pictures of white or cream colored dress from victorian era.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query:

    CREATE OR REPLACE TABLE `bqml_tutorial.search_embedding`
    AS
    SELECT *
    FROM
      AI.GENERATE_EMBEDDING(
        MODEL `bqml_tutorial.multimodal_embedding_model`,
        (SELECT 'pictures of white or cream colored dress from victorian era' AS content));

Perform a text-to-image semantic search

Use the VECTOR_SEARCH function to perform a semantic search for images that best correspond to the search string represented by the text embedding.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, run the following query to perform a semantic search and write the results to a table:

    CREATE OR REPLACE TABLE `bqml_tutorial.vector_search_results`
    AS
    SELECT base.uri AS gcs_uri, distance
    FROM
      VECTOR_SEARCH(
        TABLE `bqml_tutorial.met_image_embeddings`,
        'embedding',
        TABLE `bqml_tutorial.search_embedding`,
        'embedding',
        top_k => 3);
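
To build intuition for what a top-k vector search returns, the following sketch runs a brute-force nearest-neighbor search over a few toy embeddings in pure Python. It is a conceptual illustration only, not how BigQuery implements VECTOR_SEARCH. It assumes cosine distance, the distance type chosen for the optional vector index, and the three-dimensional vectors and example-bucket URIs are invented for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(base_rows, query_embedding, k=3):
    """Score every base row against the query and keep the k closest."""
    scored = [
        (uri, cosine_distance(embedding, query_embedding))
        for uri, embedding in base_rows
    ]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy data: real multimodal embeddings have far more dimensions.
base = [
    ("gs://example-bucket/dress.jpg", [0.9, 0.1, 0.0]),
    ("gs://example-bucket/vase.jpg", [0.0, 1.0, 0.0]),
    ("gs://example-bucket/gown.jpg", [0.8, 0.2, 0.1]),
    ("gs://example-bucket/sword.jpg", [0.0, 0.0, 1.0]),
]
query = [1.0, 0.0, 0.0]

for uri, distance in top_k(base, query, k=2):
    print(f"{uri}  distance={distance:.4f}")
```

A brute-force scan like this compares the query with every row, which is the cost that a vector index reduces in exchange for approximate results.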

Visualize the semantic search results

Visualize the semantic search results by using a notebook.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. Open the met-image-analysis notebook that you created earlier.

  3. Visualize the vector search results:

    1. Add a code cell to the notebook.
    2. Copy and paste the following code into the code cell:

      query = """
        SELECT * FROM `bqml_tutorial.vector_search_results`
        ORDER BY distance;
      """

      printImages(client.query(query))
    3. Run the code cell.

      The results should look similar to the following:

      Returned images from a multimodal vector search query.

Clean up

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.