Analyze multimodal data with SQL and Python UDFs
Preview
This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of theService Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see thelaunch stage descriptions.
Note: To provide feedback or request support for this feature, send an email tobq-objectref-feedback@google.com.This tutorial shows you how toanalyze multimodal data by using SQLqueries andPython user-defined functions (UDFs).
This tutorial uses the product catalog from the public Cymbal pet store dataset.
Objectives
- Use
ObjectRefvalues to store image data alongside structured data ina BigQuerystandard table. - Generate text based on image data from a standard table by using the
AI.GENERATE_TABLEfunction. - Transform existing images to create new images by using aPython UDF.
- Chunk PDFs for further analysis by using a Python UDF.
- Use a Gemini model and the
AI.GENERATE_TEXTfunction to analyze thechunked PDF data. - Generate embeddings based on image data from a standard table by using the
AI.GENERATE_EMBEDDINGfunction. - Process ordered multimodal data using arrays of
ObjectRefvalues.
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery: you incur costs for the data that you process in BigQuery.
- BigQuery Python UDFs: you incur costs for using Python UDFs.
- Cloud Storage: you incur costs for the objects stored in Cloud Storage.
- Vertex AI: you incur costs for calls to Vertex AI models.
To generate a cost estimate based on your projected usage, use thepricing calculator.
For more information about, see the following pricing pages:
Before you begin
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission.Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery, BigQuery Connection, Cloud Storage, and Vertex AI APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission.Learn how to grant roles.
Required roles
To get the permissions that you need to complete this tutorial, ask your administrator to grant you the following IAM roles:
- Create a connection:BigQuery Connection Admin (
roles/bigquery.connectionAdmin) - Grant permissions to the connection's service account:Project IAM Admin (
roles/resourcemanager.projectIamAdmin) - Create a Cloud Storage bucket:Storage Admin (
roles/storage.admin) - Create datasets, models, UDFs, and tables, and run BigQuery jobs:BigQuery Admin (
roles/bigquery.admin) - Create URLs that let you read and modify Cloud Storage objects:BigQuery ObjectRef Admin (
roles/bigquery.objectRefAdmin)
For more information about granting roles, seeManage access to projects, folders, and organizations.
You might also be able to get the required permissions throughcustom roles or otherpredefined roles.
Set up
In this section, you create the dataset, connection, tables, and models usedin this tutorial.
Create a dataset
Create a BigQuery dataset to contain the objects you createin this tutorial:
In the Google Cloud console, go to theBigQuery page.
In the left pane, clickExplorer:

If you don't see the left pane, clickExpand left pane to open the pane.
In theExplorer pane, select your project.
ClickView actions, and then clickCreate dataset. TheCreate datasetpane opens.
ForDataset ID, type
cymbal_pets.ClickCreate dataset.
Create a bucket
Create a Cloud Storage bucket for storing transformed objects:
Go to theBuckets page.
ClickCreate.
On theCreate a bucket page, in theGet started section, enter aglobally unique name that meets thebucket name requirements.
ClickCreate.
Create a connection
Create aCloud resource connectionand get the connection's service account. BigQuery uses theconnection to access objects in Cloud Storage:
Go to theBigQuery page.
In the left pane, clickExplorer:

In theExplorer pane, clickAdd data.
TheAdd data dialog opens.
In theFilter By pane, in theData Source Type section, selectBusiness Applications.
Alternatively, in theSearch for data sources field, you can enter
Vertex AI.In theFeatured data sources section, clickVertex AI.
Click theVertex AI Models: BigQuery Federation solution card.
In theConnection type list, selectVertex AI remote models, remote functions, BigLake and Spanner (Cloud Resource).
In theConnection ID field, type
cymbal_conn.ClickCreate connection.
ClickGo to connection.
In theConnection info pane, copy the service account ID for use in afollowing step.
Grant permissions to the connection's service account
Grant the connection's service account the appropriate roles to access otherservices. You must grant these roles in the same project you created orselected in theBefore you begin section. Granting theroles in a different project results in the errorbqcx-1234567890-xxxx@gcp-sa-bigquery-condel.iam.gserviceaccount.comdoes not have the permission to access resource.
Grant permissions on the Cloud Storage bucket
Give the service account access to use objects in the bucket you created:
Go to theBuckets page.
Click the name of the bucket you created.
ClickPermissions.
ClickGrant access.TheGrant access dialog opens.
In theNew principals field, enter the service account ID that youcopied earlier.
In theSelect a role field, chooseCloud Storage, and thenselectStorage Object User.
ClickSave.
Grant permissions on to use Vertex AI models
Give the service account access to use Vertex AI models:
Go to theIAM & Admin page.
ClickGrant access.TheGrant access dialog opens.
In theNew principals field, enter the service account ID that youcopied earlier.
In theSelect a role field, selectVertex AI, and then selectVertex AI User.
ClickSave.
Create the tables of example data
Create tables to store the Cymbal pets product information.
Create theproducts table
Create a standard table that contains the Cymbal pets product information:
In the Google Cloud console, go to theBigQuery page.
In the query editor, run the following query to create the
productstable:LOADDATAOVERWRITEcymbal_pets.productsFROMFILES(format='avro',uris=['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/tables/products/products_*.avro']);
Create theproduct_images table
Create an object table that contains the Cymbal pets product images:
In the query editor of theBigQuery page, run the following query tocreate the
product_imagestable:CREATEORREPLACEEXTERNALTABLEcymbal_pets.product_imagesWITHCONNECTION`us.cymbal_conn`OPTIONS(object_metadata='SIMPLE',uris=['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*.png'],max_staleness=INTERVAL30MINUTE,metadata_cache_mode=AUTOMATIC);
Create theproduct_manuals table
Create an object table that contains the Cymbal pets product manuals:
In the query editor of theBigQuery page, run the following query tocreate the
product_manualstable:CREATEORREPLACEEXTERNALTABLEcymbal_pets.product_manualsWITHCONNECTION`us.cymbal_conn`OPTIONS(object_metadata='SIMPLE',uris=['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/documents/*.pdf']);
Create a text generation model
Create a BigQuery MLremote modelthat represents a Vertex AI Gemini model:
In the query editor of theBigQuery page, run the following query tocreate the remote model:
CREATEORREPLACEMODEL`cymbal_pets.gemini`REMOTEWITHCONNECTION`us.cymbal_conn`OPTIONS(ENDPOINT='gemini-2.0-flash');
Create an embedding generation model
Create a BigQuery ML remote modelthat represents a Vertex AI multimodal embedding model:
In the query editor of theBigQuery page, run the following query tocreate the remote model:
CREATEORREPLACEMODEL`cymbal_pets.embedding_model`REMOTEWITHCONNECTION`us.cymbal_conn`OPTIONS(ENDPOINT='multimodalembedding@001');
Create aproducts_mm table with multimodal data
Create aproducts_mm table that contains animage column populated withproduct images from theproduct_images object table. Theimage columnthat is created is aSTRUCT column that uses theObjectRef format.
In the query editor of theBigQuery page, run the following queryto create the
products_mmtable and populate theimagecolumn:CREATEORREPLACETABLEcymbal_pets.products_mmASSELECTproducts.*EXCEPT(uri),ot.refASimageFROMcymbal_pets.productsINNERJOINcymbal_pets.product_imagesotONot.uri=products.uri;
In the query editor of theBigQuery page, run the following queryto view the
imagecolumn data:SELECTproduct_name,imageFROMcymbal_pets.products_mm
The results look similar to the following:
+--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+| product_name | image.uri | image.version | image.authorizer | image.details |+--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+| AquaClear Aquarium Background | gs://cloud-samples-data/bigquery/ | 1234567891011 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"image/png", || | tutorials/cymbal-pets/images/ | | | "md5_hash":"494f63b9b137975ff3e7a11b060edb1d", || | aquaclear-aquarium-background.png | | | "size":1282805,"updated":1742492680017000}} |+--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+| AquaClear Aquarium | gs://cloud-samples-data/bigquery/ | 2345678910112 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"image/png", || Gravel Vacuum | tutorials/cymbal-pets/images/ | | | "md5_hash":"b7bfc2e2641a77a402a1937bcf0003fd", || | aquaclear-aquarium-gravel-vacuum.png | | | "size":820254,"updated":1742492682411000}} |+--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+| ... | ... | ... | | ... |+--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+
Generate product information by using a Gemini model
Use a Gemini model to generate the following data for the pet storeproducts:
- Add an
image_descriptioncolumn to theproducts_mmtable. - Populate the
animal_type,search_keywords, andsubcategorycolumnsof theproducts_mmtable. - Run a query that returns a description of each product brand and also acount of the number of products from that brand. The brand descriptionis generated by analyzing product information for all of the productsfrom that brand, including product images.
In the query editor of theBigQuery page, run the followingquery to create and populate the
image_descriptioncolumn:CREATEORREPLACETABLEcymbal_pets.products_mmASSELECTproduct_id,product_name,brand,category,subcategory,animal_type,search_keywords,price,description,inventory_level,supplier_id,average_rating,image,image_descriptionFROMAI.GENERATE_TABLE(MODEL`cymbal_pets.gemini`,(SELECT('Can you describe the following image? ',OBJ.GET_ACCESS_URL(image,'r'))ASprompt,*FROMcymbal_pets.products_mm),STRUCT('image_description STRING'ASoutput_schema));
In the query editor of theBigQuery page, run the followingquery to update the
animal_type,search_keywords, andsubcategorycolumns with generated data:UPDATEcymbal_pets.products_mmpSETp.animal_type=s.animal_type,p.search_keywords=s.search_keywords,p.subcategory=s.subcategoryFROM(SELECTanimal_type,search_keywords,subcategory,uriFROMAI.GENERATE_TABLE(MODEL`cymbal_pets.gemini`,(SELECT('For the image of a pet product, concisely generate the following metadata: ''1) animal_type and 2) 5 SEO search keywords, and 3) product subcategory. ',OBJ.GET_ACCESS_URL(image,'r'),description)ASprompt,image.uriASuri,FROMcymbal_pets.products_mm),STRUCT('animal_type STRING, search_keywords ARRAY<STRING>, subcategory STRING'ASoutput_schema,100ASmax_output_tokens)))sWHEREp.image.uri=s.uri;
In the query editor of theBigQuery page, run the following queryto view the generated data:
SELECTproduct_name,image_description,animal_type,search_keywords,subcategory,FROMcymbal_pets.products_mm;
The results look similar to the following:
+--------------------------------+-------------------------------------+-------------+------------------------+------------------+| product_name | image.description | animal_type | search_keywords | subcategory |+--------------------------------+-------------------------------------+-------------+------------------------+------------------+| AquaClear Aquarium Background | The image shows a colorful coral | fish | aquarium background | aquarium decor || | reef backdrop. The background is a | | fish tank backdrop | || | blue ocean with a bright light... | | coral reef decor | || | | | underwater scenery | || | | | aquarium decoration | |+--------------------------------+-------------------------------------+-------------+------------------------+------------------+| AquaClear Aquarium | The image shows a long, clear | fish | aquarium gravel vacuum | aquarium || Gravel Vacuum | plastic tube with a green hose | | aquarium cleaning | cleaning || | attached to one end. The tube... | | aquarium maintenance | || | | | fish tank cleaning | || | | | gravel siphon | |+--------------------------------+-------------------------------------+-------------+------------------------+------------------+| ... | ... | ... | ... | ... |+--------------------------------+-------------------------------------+-------------+------------------------+------------------+
In the query editor of theBigQuery page, run the followingquery to generate a description of each product brand and also acount of the number of products from that brand:
SELECTbrand,brand_description,cntFROMAI.GENERATE_TABLE(MODEL`cymbal_pets.gemini`,(SELECTbrand,COUNT(*)AScnt,('Use the images and text to give one concise brand description for a website brand page.''Return the description only. ',ARRAY_AGG(OBJ.GET_ACCESS_URL(image,'r')),' ',ARRAY_AGG(description),' ',ARRAY_AGG(category),' ',ARRAY_AGG(subcategory))ASpromptFROMcymbal_pets.products_mmGROUPBYbrand),STRUCT('brand_description STRING'ASoutput_schema))ORDERBYcntDESC;
The results look similar to the following:
+--------------+-------------------------------------+-----+| brand | brand.description | cnt |+--------------+-------------------------------------+-----+| AquaClear | AquaClear is a brand of aquarium | 33 || | and pond care products that offer | || | a wide range of solutions for... | |+--------------+-------------------------------------+-----+| Ocean | Ocean Bites is a brand of cat food | 28 || Bites | that offers a variety of recipes | || | and formulas to meet the specific.. | |+--------------+-------------------------------------+-----+| ... | ... |... |+--------------+-------------------------------------+-----+
Create a Python UDF to transform product images
Create a Python UDF to convert product images to grayscale.
The Python UDF uses open source libraries , and also uses parallel executionto transform multiple images simultaneously.
In the query editor of theBigQuery page, run the followingquery to create the
to_grayscaleUDF:CREATEORREPLACEFUNCTIONcymbal_pets.to_grayscale(src_jsonSTRING,dst_jsonSTRING)RETURNSSTRINGLANGUAGEpythonWITHCONNECTION`us.cymbal_conn`OPTIONS(entry_point='to_grayscale',runtime_version='python-3.11',packages=['numpy','opencv-python'])AS"""import cv2 as cvimport numpy as npfrom urllib.request import urlopen, Requestimport json# Transform the image to grayscale.def to_grayscale(src_ref, dst_ref): src_json = json.loads(src_ref) srcUrl = src_json["access_urls"]["read_url"] dst_json = json.loads(dst_ref) dstUrl = dst_json["access_urls"]["write_url"] req = urlopen(srcUrl) arr = np.asarray(bytearray(req.read()), dtype=np.uint8) img = cv.imdecode(arr, -1) # 'Load it as it is' # Convert the image to grayscale gray_image = cv.cvtColor(img, cv.COLOR_BGR2GRAY) # Send POST request to the URL _, img_encoded = cv.imencode('.png', gray_image) req = Request(url=dstUrl, data=img_encoded.tobytes(), method='PUT', headers = { "Content-Type": "image/png", }) with urlopen(req) as f: pass return dst_ref""";
Transform product images
Create theproducts_grayscale table with anObjectRef column that containsthe destination paths and authorizers for grayscale images. The destinationpath is derived from the original image path.
After you create the table, run theto_grayscale function to create thegrayscale images, write them to a Cloud Storage bucket, and then returnObjectRefRuntimevalues containing access URLs and metadata for the grayscale images.
In the query editor of theBigQuery page, run the following query tocreate the
products_grayscaletable:CREATEORREPLACETABLEcymbal_pets.products_grayscaleASSELECTproduct_id,product_name,image,OBJ.MAKE_REF(CONCAT('gs://BUCKET/cymbal-pets-images/grayscale/',REGEXP_EXTRACT(image.uri,r'([^/]+)$')),'us.cymbal_conn')ASgray_imageFROMcymbal_pets.products_mm;
Replace
BUCKETwith the name of thebucket that you created.In the query editor of theBigQuery page, run the following queryto create the grayscale images, write them to a Cloud Storagebucket, and then return
ObjectRefRuntimevalues containing access URLsand metadata for the grayscale images:SELECTcymbal_pets.to_grayscale(TO_JSON_STRING(OBJ.GET_ACCESS_URL(image,'r')),TO_JSON_STRING(OBJ.GET_ACCESS_URL(gray_image,'rw')))FROMcymbal_pets.products_grayscale;
The results look similar to the following:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+| f0 |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+| {"access_urls":{"expiry_time":"2025-04-26T03:00:48Z", || "read_url":"https://storage.googleapis.com/mybucket/cymbal-pets-images%2Fgrayscale%2Focean-bites-salmon-%26-tuna-cat-food.png?additional_read URL_information", || "write_url":"https://storage.googleapis.com/myproject/cymbal-pets-images%2Fgrayscale%2Focean-bites-salmon-%26-tuna-cat-food.png?additional_write URL_information"}, || "objectref":{"authorizer":"myproject.region.myconnection","uri":"gs://myproject/cymbal-pets-images/grayscale/ocean-bites-salmon-&-tuna-cat-food.png"}} |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+| {"access_urls":{"expiry_time":"2025-04-26T03:00:48Z", || "read_url":"https://storage.googleapis.com/mybucket/cymbal-pets-images%2Fgrayscale%2Ffluffy-buns-guinea-pig-tunnel.png?additional _read URL_information", || "write_url":"https://storage.googleapis.com/myproject/cymbal-pets-images%2Fgrayscale%2Focean-bites-salmon-%26-tuna-cat-food.png?additional_write_URL_information"}, || "objectref":{"authorizer":"myproject.region.myconnection","uri":"gs://myproject/cymbal-pets-images%2Fgrayscale%2Ffluffy-buns-guinea-pig-tunnel.png"}} |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+| ... |+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Create a Python UDF to chunk PDF data
Create a Python UDF to chunk the PDF objects that contain theCymbal pets product manuals into multiple parts.
PDFs are often very large and might not fit into a single call to agenerative AI model. By chunking the PDFs, you can store the PDF data in amodel-ready format for easier analysis.
In the query editor of theBigQuery page, run the following queryto create the
chunk_pdfUDF:-- This function chunks the product manual PDF into multiple parts.-- The function accepts an ObjectRefRuntime value for the PDF file and the chunk size.-- It then parses the PDF, chunks the contents, and returns an array of chunked text.CREATEORREPLACEFUNCTIONcymbal_pets.chunk_pdf(src_jsonSTRING,chunk_sizeINT64,overlap_sizeINT64)RETURNSARRAY<STRING>LANGUAGEpythonWITHCONNECTION`us.cymbal_conn`OPTIONS(entry_point='chunk_pdf',runtime_version='python-3.11',packages=['pypdf'])AS"""import ioimport jsonfrom pypdf import PdfReader # type: ignorefrom urllib.request import urlopen, Requestdef chunk_pdf(src_ref: str, chunk_size: int, overlap_size: int) -> str: src_json = json.loads(src_ref) srcUrl = src_json["access_urls"]["read_url"] req = urlopen(srcUrl) pdf_file = io.BytesIO(bytearray(req.read())) reader = PdfReader(pdf_file, strict=False) # extract and chunk text simultaneously all_text_chunks = [] curr_chunk ="" for page in reader.pages: page_text = page.extract_text() if page_text: curr_chunk += page_text # split the accumulated text into chunks of a specific size with overlaop # this loop implements a sliding window approach to create chunks while len(curr_chunk) >= chunk_size: split_idx = curr_chunk.rfind("", 0, chunk_size) if split_idx == -1: split_idx = chunk_size actual_chunk = curr_chunk[:split_idx] all_text_chunks.append(actual_chunk) overlap = curr_chunk[split_idx + 1 : split_idx + 1 + overlap_size] curr_chunk = overlap + curr_chunk[split_idx + 1 + overlap_size :] if curr_chunk: all_text_chunks.append(curr_chunk) return all_text_chunks""";
Analyze PDF data
Run thechunk_pdf function to chunk the PDF data in theproduct_manualstable, and then create aproduct_manual_chunk_strings table that contains one PDFchunk per row. Use a Gemini model on theproduct_manual_chunk_strings data tosummarize the legal information found in the product manuals.
In the query editor of theBigQuery page, run the following query tocreate the
product_manual_chunk_stringstable:CREATEORREPLACETABLEcymbal_pets.product_manual_chunk_stringsASSELECTchunkedFROMcymbal_pets.product_manuals,UNNEST(cymbal_pets.chunk_pdf(TO_JSON_STRING(OBJ.GET_ACCESS_URL(OBJ.MAKE_REF(uri,'us.cymbal_conn'),'r')),1000,100))aschunked;
In the query editor of theBigQuery page, run the following query toanalyze the PDF data by using a Gemini model:
SELECTresultFROMAI.GENERATE_TEXT(MODEL`cymbal_pets.gemini`,(SELECT('Can you summarize the product manual as bullet points? Highlight the legal clauses',chunked)ASprompt,FROMcymbal_pets.product_manual_chunk_strings));
The results look similar to the following:
+-------------------------------------------------------------------------------------------------------------------------------------------+| result |+-------------------------------------------------------------------------------------------------------------------------------------------+| ## CritterCuisine Pro 5000 Automatic Pet Feeder Manual Summary: || || **Safety:** || || * **Stability:** Place feeder on a level, stable surface to prevent tipping. || * **Power Supply:** Only use the included AC adapter. Using an incompatible adapter can damage the unit and void the warranty. || * **Cord Safety:** Keep the power cord out of reach of pets to prevent chewing or entanglement. || * **Children:** Supervise children around the feeder. This is not a toy. || * **Pet Health:** Consult your veterinarian before using an automatic feeder if your pet has special dietary needs, health conditions, or |+-------------------------------------------------------------------------------------------------------------------------------------------+| ## Product Manual Summary: || || **6.3 Manual Feeding:** || || * Press MANUAL button to dispense a single portion (Meal 1 size). **(Meal Enabled)** || || **6.4 Recording a Voice Message:** || || * Press and hold VOICE button. || * Speak clearly into the microphone (up to 10 seconds). || * Release VOICE button to finish recording. || * Briefly press VOICE button to play back the recording. || * To disable the voice message, record a blank message (hold VOICE button for 10 seconds without speaking). **(Meal Enabled)** || || **6.5 Low Food Level Indicator:** |+-------------------------------------------------------------------------------------------------------------------------------------------+| ... |+-------------------------------------------------------------------------------------------------------------------------------------------+
Generate embeddings and perform a vector search
Generate embeddings from image data, and then use the embeddings to returnsimilar images by usingvector search.
In a production scenario, we recommend creating avector indexbefore running a vector search. A vectorindex lets you perform the vector search more quickly, with the trade-off ofreducing recall and so returning more approximate results.
In the query editor of theBigQuery page, run the following query to create the
products_embeddingstable:CREATEORREPLACETABLEcymbal_pets.products_embeddingASSELECTproduct_id,embedding,contentasimageFROMAI.GENERATE_EMBEDDING(MODEL`cymbal_pets.embedding_model`,(SELECTOBJ.GET_ACCESS_URL(image,'r')ascontent,image,product_idFROMcymbal_pets.products_mm));
In the query editor of theBigQuery page, run the followingquery to run a vector search to return product images that are similar tothe given input image:
SELECT*FROMVECTOR_SEARCH(TABLEcymbal_pets.products_embedding,'embedding',(SELECTembeddingFROMAI.GENERATE_EMBEDDING(MODEL`cymbal_pets.embedding_model`,(SELECTOBJ.FETCH_METADATA(OBJ.MAKE_REF('gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/cozy-naps-cat-scratching-post-with-condo.png','us.cymbal_conn'))ascontent))));
The results look similar to the following:
+-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+| query.embedding | base.product_id | base.embedding | base.image.uri | base.image.version | base.image.authorizer | base.image.details | distance |+-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+| -0.0112330541 | 181 | -0.0112330541 | gs://cloud-samples-data/bigquery/ | 12345678910 | myproject.region.myconnection | {"gcs_metadata":{"content_type": | 0.0 || 0.0142525584 | | 0.0142525584 | tutorials/cymbal-pets/images/ | | | "image/png","md5_hash":"21234567hst16555w60j", | || 0.0135886827 | | 0.0135886827 | cozy-naps-cat-scratching-post-with-condo.png | | | "size":828318,"updated":1742492688982000}} | || 0.0149955815 | | 0.0149955815 | | | | | || ... | | ... | | | | | || | | | | | | | || | | | | | | | |+-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+| -0.0112330541 | 187 | -0.0190353896 | gs://cloud-samples-data/bigquery/ | 23456789101 | myproject.region.myconnection | {"gcs_metadata":{"content_type": | 0.4216330832.. || 0.0142525584 | | 0.0116206668 | tutorials/cymbal-pets/images/ | | | "image/png","md5_hash":"7328728fhakd9937djo4", | || 0.0135886827 | | 0.0136198215 | cozy-naps-cat-scratching-post-with-bed.png | | | "size":860113,"updated":1742492688774000}} | || 0.0149955815 | | 0.0173457414 | | | | | || ... | | ... | | | | | || | | | | | | | || | | | | | | | |+---------C--------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+| ... | ... | ... | ... | ... | ... | ... | ... |+-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+
Process ordered multimodal data using arrays ofObjectRef values
This section shows you how to complete the following tasks:
- Recreate the
product_manualstable so that it contains both a PDF filefor theCrittercuisine 5000product manual, and PDF files for each pageof that manual. - Create a table that maps the manual to its chunks. The
ObjectRefvaluethat represents the complete manual is stored in aSTRUCT<uri STRING, version STRING, authorizer STRING, details JSON>>column. TheObjectRefvalues that represent the manual pages are storedin anARRAY<STRUCT<uri STRING, version STRING, authorizer STRING, details JSON>>column. - Analyze an array of
ObjectRefvalues together to returna single generated value. - Analyze an array of
ObjectRefvalues separately andreturning a generated value for each array value.
As part of the analysis tasks, you convert the array ofObjectRef values toan ordered list ofObjectRefRuntimevalues, and then pass that list to a Gemini model, specifying theObjectRefRuntime values as part of the prompt. TheObjectRefRuntime valuesprovide signed URLs that the model uses to access the object information inCloud Storage.
Follow these steps to process ordered multimodal data usingarrays ofObjectRef values:
Go to theBigQuery page.
In the query editor, run the following query to recreate the
product_manualstable:CREATEORREPLACEEXTERNALTABLE`cymbal_pets.product_manuals`WITHCONNECTION`us.cymbal_conn`OPTIONS(object_metadata='SIMPLE',uris=['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/documents/*.pdf','gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/document_chunks/*.pdf']);
In the query editor, run the following query to write PDF data to the
map_manual_to_chunkstable:-- Extract the file and chunks into a single table.-- Store the chunks in the chunks column as array of ObjectRefs (ordered by page number)CREATEORREPLACETABLEcymbal_pets.map_manual_to_chunksASSELECTARRAY_AGG(m1.ref)[0]manual,ARRAY_AGG(m2.refORDERBYm2.ref.uri)chunksFROMcymbal_pets.product_manualsm1JOINcymbal_pets.product_manualsm2ONREGEXP_EXTRACT(m1.uri,r'.*/([^.]*).[^/]+')=REGEXP_EXTRACT(m2.uri,r'.*/([^.]*)_page[0-9]+.[^/]+')GROUPBYm1.uri;
In the query editor, run the following query to view the PDF data in the
map_manual_to_chunkstable:SELECT*FROMcymbal_pets.map_manual_to_chunks;
The results look similar to the following:
+-------------------------------------+--------------------------------+-----------------------------------+------------------------------------------------------+-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+| manual.uri | manual.version | manual.authorizer | manual.details | chunks.uri | chunks.version | chunks.authorizer | chunks.details |+-------------------------------------+--------------------------------+-----------------------------------+------------------------------------------------------+-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+| gs://cloud-samples-data/bigquery/ | 1742492785900455 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"application/pef", | gs://cloud-samples-data/bigquery/ | 1745875761227129 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"application/pdf", || tutorials/cymbal-pets/documents/ | | | "md5_hash":"c9032b037693d15a33210d638c763d0e", | tutorials/cymbal-pets/documents/ | | | "md5_hash":"5a1116cce4978ec1b094d8e8b49a1d7c", || crittercuisine_5000_user_manual.pdf | | | "size":566105,"updated":1742492785941000}} | crittercuisine_5000_user_manual_page1.pdf | | | "size":504583,"updated":1745875761266000}} || | | | +-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+| | | | | crittercuisine_5000_user_manual_page1.pdf | 1745875760613874 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"application/pdf", || | | | | tutorials/cymbal-pets/documents/ | | | "md5_hash":"94d03ec65d28b173bc87eac7e587b325", || | | | | crittercuisine_5000_user_manual_page2.pdf | | | "size":94622,"updated":1745875760649000}} || | | | +-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+| | | | | ... | ... | ... | ... |+-------------------------------------+--------------------------------+-----------------------------------+------------------------------------------------------+-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+In the query editor, run the following query to generate a single responsefrom a Gemini model based on the analysis of an array of
ObjectRefvalues:WITHmanualsAS(SELECTOBJ.GET_ACCESS_URL(manual,'r')ASmanual,ARRAY(SELECTOBJ.GET_ACCESS_URL(chunk,'r')ASchunkFROMUNNEST(m1.chunks)ASchunkWITHOFFSETASidxORDERBYidx)ASchunksFROMcymbal_pets.map_manual_to_chunksASm1)SELECTresultASResponseFROMAI.GENERATE_TEXT(MODEL`cymbal_pets.gemini`,(SELECT('Can you provide a page by page summary for the first 3 pages of the attached manual? Only write one line for each page. The pages are provided in serial order',manuals.chunks)ASprompt,FROMmanuals));
The results look similar to the following:
+-------------------------------------------+| Response |+-------------------------------------------+| Page 1: This manual is for the || CritterCuisine Pro 5000 automatic || pet feeder. || Page 2: The manual covers safety || precautions, what's included, || and product overview. || Page 3: The manual covers assembly, || initial setup, and programming the clock. |+-------------------------------------------+
In the query editor, run the following query to generate multiple responsesfrom a Gemini model based on the analysis of an array of
ObjectRefvalues:WITHinput_chunked_objrefsAS(SELECTrow_id,offset,chunk_refFROM(SELECTROW_NUMBER()OVER()ASrow_id,*FROM`cymbal_pets.map_manual_to_chunks`)ASindexed_tableLEFTJOINUNNEST(indexed_table.chunks)ASchunk_refWITHOFFSET),get_access_urlsAS(SELECTrow_id,offset,chunk_ref,OBJ.GET_ACCESS_URL(chunk_ref,'r')ASObjectRefRuntimeFROMinput_chunked_objrefs),valid_get_access_urlsAS(SELECT*FROMget_access_urlsWHEREObjectRefRuntime['runtime_errors']ISNULL),ordered_output_objrefruntime_arrayAS(SELECTARRAY_AGG(ObjectRefRuntimeORDERBYoffset)ASObjectRefRuntimeArrayFROMvalid_get_access_urlsGROUPBYrow_id)SELECTpage1_summary,page2_summary,page3_summaryFROMAI.GENERATE_TABLE(MODEL`cymbal_pets.gemini`,(SELECT('Can you provide a page by page summary for the first 3 pages of the attached manual? Only write one line for each page. The pages are provided in serial order',ObjectRefRuntimeArray)ASprompt,FROMordered_output_objrefruntime_array),STRUCT('page1_summary STRING, page2_summary STRING, page3_summary STRING'ASoutput_schema));
The results look similar to the following:
+-----------------------------------------------+-------------------------------------------+----------------------------------------------------+| page1_summary | page2_summary | page3_summary |+-----------------------------------------------+-------------------------------------------+----------------------------------------------------+| This manual provides an overview of the | This section explains how to program | This page covers connecting the feeder to Wi-Fi || CritterCuisine Pro 5000 automatic pet feeder, | the feeder's clock, set feeding | using the CritterCuisine Connect app, remote || including its features, safety precautions, | schedules, copy and delete meal settings, | feeding, managing feeding schedules, viewing || assembly instructions, and initial setup. | manually feed your pet, record | feeding logs, receiving low food alerts, || | a voice message, and understand | updating firmware, creating multiple pet profiles, || | the low food level indicator. | sharing access with other users, and cleaning || | | and maintaining the feeder. |+-----------------------------------------------+-------------------------------------------+----------------------------------------------------+
Clean up
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-16 UTC.