Configure and use entity resolution in BigQuery
This document shows how to implement entity resolution for entity resolution end users (hereafter referred to as end users) and identity providers.
End users can use this document to connect with an identity provider and use the provider's service to match records. Identity providers can use this document to set up and configure services to share with end users on the Google Cloud Marketplace.
Workflow for end users
The following sections show end users how to configure entity resolution in BigQuery. For a visual representation of the complete setup, see the architecture for entity resolution.
Before you begin
- Contact and establish a relationship with an identity provider. BigQuery supports entity resolution with LiveRamp and TransUnion.
- Acquire the following items from the identity provider:
- Service account credentials
- Remote function signature
- Create two datasets in your project:
- Input dataset
- Output dataset
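The two datasets from the checklist above can be created with standard DDL. As a sketch, assuming placeholder project and dataset names (your identity provider does not require any particular names):

```sql
-- Dataset that the identity provider reads input records from
-- (project and dataset names are examples)
CREATE SCHEMA IF NOT EXISTS `my-project.entity_resolution_input`
  OPTIONS (location = 'US');

-- Dataset that the identity provider writes match results to
CREATE SCHEMA IF NOT EXISTS `my-project.entity_resolution_output`
  OPTIONS (location = 'US');
```

Both datasets must exist before you grant the provider's service account access to them in the next section.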
Required roles
To get the permissions that you need to run entity resolution jobs, ask your administrator to grant you the following IAM roles:
- For the identity provider's service account to read the input dataset and write to the output dataset:
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on the input dataset
  - BigQuery Data Editor (roles/bigquery.dataEditor) on the output dataset
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
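If you prefer SQL over the console, dataset-level access for the provider's service account can be granted with BigQuery's DCL statements. The dataset names and service account address below are placeholders:

```sql
-- Let the provider's service account read the input dataset
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `my-project.entity_resolution_input`
TO "serviceAccount:provider-sa@provider-project.iam.gserviceaccount.com";

-- Let it write match results to the output dataset
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `my-project.entity_resolution_output`
TO "serviceAccount:provider-sa@provider-project.iam.gserviceaccount.com";
```

Use the service account address that the identity provider shared with you during onboarding.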
Translate or resolve entities
For specific identity provider instructions, refer to the following sections.
LiveRamp
Prerequisites
- Configure LiveRamp Embedded Identity in BigQuery. For more information, see Enabling LiveRamp Embedded Identity in BigQuery.
- Coordinate with LiveRamp to enable API credentials for use with Embedded Identity. For more information, see Authentication.
Setup
The following steps are required when you use LiveRamp Embedded Identity for the first time. After setup is complete, only the input table and metadata table need to be modified between runs.
Create an input table
Create a table in the input dataset. Populate the table with RampIDs, target domains, and target types. For details and examples, see Input Table Columns and Descriptions.
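As a sketch only (the authoritative column names and formats are in LiveRamp's Input Table Columns and Descriptions), an input table that holds RampIDs, target domains, and target types might look like:

```sql
-- Hypothetical input table; confirm column names and formats
-- against LiveRamp's documentation before use
CREATE TABLE `my-project.entity_resolution_input.liveramp_input` (
  rampid        STRING,  -- RampID encoded in your domain
  target_domain STRING,  -- domain to transcode the RampID into
  target_type   STRING   -- RampID type to produce
);
```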
Create a metadata table
The metadata table is used to control the execution of LiveRamp Embedded Identity on BigQuery. Create a metadata table in the input dataset. Populate the metadata table with client IDs, execution modes, target domains, and target types. For details and examples, see Metadata Table Columns and Descriptions.
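A minimal sketch of such a metadata table, assuming hypothetical column names (the real schema is defined in LiveRamp's Metadata Table Columns and Descriptions):

```sql
-- Hypothetical metadata table; confirm the schema against
-- LiveRamp's documentation before use
CREATE TABLE `my-project.entity_resolution_input.liveramp_metadata` (
  client_id      STRING,  -- client ID issued by LiveRamp
  execution_mode STRING,  -- execution mode for the run
  target_domain  STRING,
  target_type    STRING
);
```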
Share tables with LiveRamp
Grant the LiveRamp Google Cloud service account access to view and process data in your input dataset. For details and examples, see Share Tables and Datasets with LiveRamp.
Run an embedded identity job
To run an embedded identity job with LiveRamp in BigQuery, do the following:
- Confirm that all RampIDs that were encoded in your domain are in your input table.
- Confirm that your metadata table is still accurate before you run the job.
- Contact LiveRampIdentitySupport@liveramp.com with a job process request. Include the project ID, dataset ID, and table ID (if applicable) for your input table, metadata table, and output dataset. For more information, see Notify LiveRamp to Initiate Transcoding.
Results are generally delivered to your output dataset within three business days.
LiveRamp support
For support issues, contact LiveRamp Identity Support.
LiveRamp billing
LiveRamp handles billing for entity resolution.
TransUnion
Prerequisites
- Contact TransUnion Cloud Support to execute an agreement to access the service. Provide the details of your Google Cloud project ID, input data types, use case, and data volume.
- TransUnion Cloud Support enables the service for your Google Cloud project and shares a detailed implementation guide that includes available output data.
Setup
The following steps are required when you use TransUnion's TruAudience Identity Resolution and Enrichment service in your BigQuery environment.
Create an external connection
Create a connection to an external data source of the Vertex AI remote models, remote functions and BigLake (Cloud Resource) type. You will use this connection to trigger the identity resolution service hosted in the TransUnion Google Cloud account from your Google Cloud account.
Copy the connection ID and service account ID and share these identifiers with the TransUnion customer delivery team.
Create a remote function
Create a remote function that interacts with the service orchestrator endpoint that is hosted on the TransUnion Google Cloud project to pass the necessary metadata (including schema mappings) to the TransUnion service. Use the connection ID from the external connection that you created and the TransUnion-hosted Cloud Function endpoint shared by the TransUnion customer delivery team.
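A remote function of this shape can be created with standard BigQuery DDL. The connection ID and endpoint URL below are placeholders for the values that TransUnion shares with you; the function name matches the one called by the invocation procedure later in this document:

```sql
-- Remote function that forwards the base64-encoded metadata payload
-- to the TransUnion-hosted endpoint (placeholder values throughout)
CREATE OR REPLACE FUNCTION `<project_id>.<dataset_id>.remote_call_TransUnion_er`(payload STRING)
RETURNS STRING
REMOTE WITH CONNECTION `<project_id>.<location>.<connection_id>`
OPTIONS (
  endpoint = 'https://<transunion-endpoint-shared-with-you>'
);
```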
Create an input table
Create a table in the input dataset. TransUnion supports name, postal address, email, phone, date of birth, IPv4 address, and device IDs as inputs. Follow the formatting guidelines in the implementation guide that TransUnion shared with you.
Create a metadata table
Create a metadata table that will store the configuration required by the identity resolution service to process data, including schema mappings. For details and examples, refer to the implementation guide that TransUnion shared with you.
Create a job status table
Create a table that will receive updates about the processing of an input batch. You can query this table to trigger other downstream processes in your pipeline. The possible job statuses are RUNNING, COMPLETED, and ERROR.
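The implementation guide defines the exact schema; as an illustrative sketch with hypothetical column names, a minimal status table and a polling query could look like:

```sql
-- Hypothetical job status table; use the schema from the
-- TransUnion implementation guide
CREATE TABLE `<project_id>.<dataset_id>.TransUnion_job_status` (
  batchid    STRING,
  status     STRING,    -- RUNNING, COMPLETED, or ERROR
  updated_at TIMESTAMP
);

-- Example: check the latest status of a batch before starting
-- downstream processing
SELECT status
FROM `<project_id>.<dataset_id>.TransUnion_job_status`
WHERE batchid = '<batch_id>'
ORDER BY updated_at DESC
LIMIT 1;
```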
Create the service invocation
Use the following procedure to call the TransUnion identity resolution service. The procedure collects the configuration metadata, packages it, and passes it to the invocation Cloud Function endpoint hosted by TransUnion.
```sql
-- create service invocation procedure
CREATE OR REPLACE PROCEDURE `<project_id>.<dataset_id>.TransUnion_get_identities`(
  metadata_table STRING,
  config_id STRING
)
BEGIN
  DECLARE sql_query STRING;
  DECLARE json_result STRING;
  DECLARE base64_result STRING;

  -- Collect the key-value configuration rows for this config_id as JSON
  SET sql_query = '''
    select to_json_string(array_agg(struct(config_id, key, value)))
    from `''' || metadata_table || '''`
    where config_id = "''' || config_id || '''"''';
  EXECUTE IMMEDIATE sql_query INTO json_result;

  -- Base64-encode the payload and pass it to the remote function
  SET base64_result = (SELECT to_base64(CAST(json_result AS bytes)));
  SELECT `<project_id>.<dataset_id>.remote_call_TransUnion_er`(base64_result);
END;
```

Create the matching output table
Run the following SQL script to create the matching output table. This is the standard output of the application, which includes match flags, scores, persistent individual IDs, and household IDs.
```sql
-- create output table
CREATE TABLE `<project_id>.<dataset_id>.TransUnion_identity_output` (
  batchid              STRING,
  uniqueid             STRING,
  ekey                 STRING,
  hhid                 STRING,
  collaborationid      STRING,
  firstnamematch       STRING,
  lastnamematch        STRING,
  addressmatches       STRING,
  addresslinkagescores STRING,
  phonematches         STRING,
  phonelinkagescores   STRING,
  emailmatches         STRING,
  emaillinkagescores   STRING,
  dobmatches           STRING,
  doblinkagescore      STRING,
  ipmatches            STRING,
  iplinkagescore       STRING,
  devicematches        STRING,
  devicelinkagescore   STRING,
  lastprocessed        STRING
);
```

Configure metadata
Follow the implementation guide that TransUnion shared with you to map your input schema to the application schema. This metadata also configures the generation of collaboration IDs, which are shareable, non-persistent identifiers that can be used in data clean rooms.
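The invocation procedure reads rows of (config_id, key, value) from the metadata table, so the configuration is supplied as key-value pairs. The key names below are illustrative only; the real keys come from the implementation guide:

```sql
-- Illustrative key-value configuration rows; the actual keys are
-- defined in the TransUnion implementation guide
INSERT INTO `<project_id>.<dataset_id>.TransUnion_er_metadata` (config_id, key, value)
VALUES
  ('1', 'input_table',  '<project_id>.<dataset_id>.TransUnion_er_input'),
  ('1', 'output_table', '<project_id>.<dataset_id>.TransUnion_identity_output'),
  ('1', 'email_column', 'email');
```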
Grant read and write access
Obtain the service account ID of the Apache Spark connection from the TransUnion customer delivery team and grant it read and write access to the dataset containing the input and output tables. We recommend granting the service account the BigQuery Data Editor role (roles/bigquery.dataEditor) on the dataset.
Invoke the application
You can invoke the application from within your environment by running the following script.

Note: You can use multiple input tables, as long as they are mapped to different metadata configurations.

```sql
-- using the metadata table, and 1 = config_id for the batch run
CALL `<project_id>.<dataset_id>.TransUnion_get_identities`(
  "<project_id>.<dataset_id>.TransUnion_er_metadata",
  "1"
);
```

Support
For technical issues, contact TransUnion Cloud Support.
Billing and usage
TransUnion tracks usage of the application and uses it for billing purposes. Active customers can contact their TransUnion delivery representative for more information.
Workflow for identity providers
The following sections show identity providers how to configure entity resolution in BigQuery. For a visual representation of the complete setup, see the architecture for entity resolution.
Before you begin
- Create a Cloud Run job or a Cloud Run function to integrate with the remote function. Both options are suitable for this purpose.
- Note the name of the service account that's associated with the Cloud Run job or Cloud Run function:
  - In the Google Cloud console, go to the Cloud Functions page.
  - Click the function's name, and then click the Details tab.
  - In the General Information pane, find and note the service account name for the remote function.
- Create a remote function.
- Collect end-user principals from the end user.
Required roles
To get the permissions that you need to run entity resolution jobs, ask your administrator to grant you the following IAM roles:
- For the service account that's associated with your function to read and write on associated datasets and launch jobs:
  - BigQuery Data Editor (roles/bigquery.dataEditor) on the project
  - BigQuery Job User (roles/bigquery.jobUser) on the project
- For the end-user principal to see and connect to the remote function:
  - BigQuery Connection User (roles/bigquery.connectionUser) on the connection
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on the control plane dataset with the remote function
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Share entity resolution remote function
Modify and share the following remote interface code with the end user. The enduser needs this code to start the entity resolution job.
```sql
`PARTNER_PROJECT_ID.DATASET_ID.match`(LIST_OF_PARAMETERS)
```

Replace LIST_OF_PARAMETERS with the list of parameters that are passed to the remote function.
Optional: Provide job metadata
You can optionally provide job metadata by using a separate remote functionor by writing a new status table in the user's output dataset. Examples ofmetadata include job statuses and metrics.
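For example, a provider that chooses the status-table approach could append a row to a table in the user's output dataset after each state change. The table name and columns here are illustrative, not a required schema:

```sql
-- Illustrative status update written by the provider's service account
-- into the end user's output dataset (placeholder names throughout)
INSERT INTO `<end_user_project>.<output_dataset>.job_status` (job_id, status, updated_at)
VALUES ('<job_id>', 'COMPLETED', CURRENT_TIMESTAMP());
```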
Billing for identity providers
To streamline customer billing and onboarding, we recommend that you integrate your entity resolution service with the Google Cloud Marketplace. This lets you set up a pricing model based on the entity resolution job usage, with Google handling the billing for you. For more information, see Offering software as a service (SaaS) products.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-16 UTC.