Analyze object tables by using remote functions
This document describes how to analyze unstructured data in object tables by using remote functions.
Overview
You can analyze the unstructured data represented by an object table by using a remote function. A remote function lets you call a function running on Cloud Run functions or Cloud Run, which you can program to access resources such as:
- Google's pre-trained AI models, including Cloud Vision API and Document AI.
- Open source libraries such as Apache Tika.
- Your own custom models.
To analyze object table data by using a remote function, you must generate and pass in signed URLs for the objects in the object table when you call the remote function. These signed URLs are what grant the remote function access to the objects.
Required permissions
To create the connection resource used by the remote function, you need the following permissions:
- bigquery.connections.create
- bigquery.connections.get
- bigquery.connections.list
- bigquery.connections.update
- bigquery.connections.use
- bigquery.connections.delete
To create a remote function, you need the permissions associated with the Cloud Functions Developer or Cloud Run Developer roles.
To invoke a remote function, you need the permissions described in Remote functions.
To analyze an object table with a remote function, you need the bigquery.tables.getData permission on the object table.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the BigQuery, BigQuery Connection, and Cloud Run functions APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
- Ensure that your BigQuery administrator has created a connection and set up access to Cloud Storage.
Create a remote function
For general instructions on creating a remote function, see Working with remote functions.
When you create a remote function to analyze object table data, you must pass in signed URLs that have been generated for the objects in the object table. You can do this by using an input parameter with a STRING data type. The signed URLs are made available to the remote function as input data in the calls field of the HTTP POST request. An example of a request is:

    {
      // Other fields omitted.
      "calls": [
        ["https://storage.googleapis.com/mybucket/1.pdf?X-Goog-SignedHeaders=abcd"],
        ["https://storage.googleapis.com/mybucket/2.pdf?X-Goog-SignedHeaders=wxyz"]
      ]
    }

You can read an object in your remote function by using a method that makes an HTTP GET request to the signed URL. The remote function can access the object because the signed URL contains authentication information in its query string.
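The request handling described above can be sketched without any web framework. This is a minimal sketch, not the documented implementation: the names `fetch_object` and `handle_request` are hypothetical, the 60-second timeout is an arbitrary choice, and returning an `errorMessage` field with a non-200 status reflects our understanding of the remote function response format rather than anything stated on this page.

```python
import json
import urllib.request
from typing import Callable, Tuple


def fetch_object(signed_url: str) -> bytes:
    """Read an object with an HTTP GET request to its signed URL."""
    # The timeout value is an assumption; tune it for your objects.
    with urllib.request.urlopen(signed_url, timeout=60) as response:
        return response.read()


def handle_request(
    body: dict, fetch: Callable[[str], bytes] = fetch_object
) -> Tuple[str, int]:
    """Return (response_json, http_status) for one remote function request."""
    try:
        # Each entry in "calls" is a list of arguments; the signed URL is first.
        replies = [len(fetch(call[0])) for call in body["calls"]]
    except Exception as exc:
        # Assumed error shape: an errorMessage field plus a non-200 status.
        return json.dumps({"errorMessage": str(exc)}), 400
    return json.dumps({"replies": replies}), 200
```

Passing the fetcher in as a parameter keeps the parsing logic testable with an in-memory stand-in, so no real signed URL is needed during development.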
When you specify the CREATE FUNCTION statement for the remote function, we recommend that you set the max_batching_rows option to 1 to avoid Cloud Run functions timeouts and increase processing parallelism.
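To see why a small batch size helps, the following sketch models how a list of signed URLs might be split into request bodies. It is a simplified illustration only: `split_into_requests` is a hypothetical helper, and BigQuery decides the real batch sizes, with max_batching_rows acting as an upper bound.

```python
from typing import Iterator, List


def split_into_requests(
    signed_urls: List[str], max_batching_rows: int
) -> Iterator[dict]:
    """Yield request bodies holding at most max_batching_rows calls each."""
    if max_batching_rows < 1:
        raise ValueError("max_batching_rows must be at least 1")
    for start in range(0, len(signed_urls), max_batching_rows):
        batch = signed_urls[start:start + max_batching_rows]
        # Each call is a list of arguments; here the only argument is the URL.
        yield {"calls": [[url] for url in batch]}
```

With a limit of 1, every request carries a single object, so each invocation does a bounded amount of work and separate objects can be processed by separate invocations.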
Example
The following Cloud Run functions Python code example reads storage objects and returns their content length to BigQuery:

    import functions_framework
    import json
    import urllib.request

    @functions_framework.http
    def object_length(request):
      # Each entry in 'calls' holds the arguments for one row.
      calls = request.get_json()['calls']
      replies = []
      for call in calls:
        # call[0] is the signed URL; read the object with an HTTP GET request.
        object_content = urllib.request.urlopen(call[0]).read()
        replies.append(len(object_content))
      # Return one reply per call, in the same order.
      return json.dumps({'replies': replies})

Deployed, this function would have an endpoint similar to https://us-central1-myproject.cloudfunctions.net/object_length.
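Before deploying, it can help to exercise the same per-call logic locally. The sketch below is one possible approach, under the assumption that a `file://` URL pointing at a temporary file is an acceptable stand-in for a signed URL; `object_lengths` and `local_smoke_test` are hypothetical names, and the sample bytes are placeholder content, not a valid PDF.

```python
import json
import pathlib
import tempfile
import urllib.request


def object_lengths(calls):
    """Apply the example's per-call logic without the HTTP wrapper."""
    lengths = []
    for call in calls:
        # call[0] is the URL to read; close the response when done.
        with urllib.request.urlopen(call[0]) as response:
            lengths.append(len(response.read()))
    return lengths


def local_smoke_test():
    """Build a request body around a temporary file and return the reply JSON."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "sample.pdf"
        path.write_bytes(b"%PDF-1.4 placeholder")  # stand-in object content
        request_body = {"calls": [[path.as_uri()]]}
        return json.dumps({"replies": object_lengths(request_body["calls"])})
```

A check like this only covers the parsing and reply format. It does not cover signed URL expiry or Cloud Storage access.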
The following example shows how to create a BigQuery remote function based on this Cloud Run function:

    CREATE FUNCTION mydataset.object_length(signed_url STRING) RETURNS INT64
    REMOTE WITH CONNECTION `us.myconnection`
    OPTIONS(
      endpoint = "https://us-central1-myproject.cloudfunctions.net/object_length",
      max_batching_rows = 1
    );

For step-by-step guidance, see Tutorial: Analyze an object table with a remote function.
Call a remote function
To call a remote function on object table data, reference the remote function in the select_list of the query, and then call the EXTERNAL_OBJECT_TRANSFORM function in the FROM clause to generate the signed URLs for the objects.
Use a LIMIT clause to limit the results returned if necessary to stay within quota.
The following example shows typical statement syntax:

    SELECT uri, function_name(signed_url) AS function_output
    FROM EXTERNAL_OBJECT_TRANSFORM(TABLE my_dataset.object_table, ["SIGNED_URL"])
    LIMIT 10000;

The following example shows how to process only a subset of the object table contents with a remote function:

    SELECT uri, function_name(signed_url) AS function_output
    FROM EXTERNAL_OBJECT_TRANSFORM(TABLE my_dataset.object_table, ["SIGNED_URL"])
    WHERE content_type = "application/pdf";

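If you generate these statements from application code, a small helper can assemble them. This is a sketch under stated assumptions: `build_query` is a hypothetical helper, the identifier check is deliberately conservative, and the content type filter is emitted as a named query parameter (`@content_type`) that you would have to supply when you run the query.

```python
import re
from typing import Optional

# Conservative pattern for dotted names such as my_dataset.object_table.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def build_query(
    function_name: str,
    object_table: str,
    filter_content_type: bool = False,
    limit: Optional[int] = None,
) -> str:
    """Assemble a query that calls a remote function on an object table."""
    for name in (function_name, object_table):
        if not _NAME.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    lines = [
        f"SELECT uri, {function_name}(signed_url) AS function_output",
        f"FROM EXTERNAL_OBJECT_TRANSFORM(TABLE {object_table}, ['SIGNED_URL'])",
    ]
    if filter_content_type:
        # The value is supplied separately as a named query parameter.
        lines.append("WHERE content_type = @content_type")
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines) + ";"
```

Keeping the filter value out of the SQL text avoids quoting problems, and the optional limit mirrors the quota guidance above.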
What's next
Learn how to run inference on image object tables.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.