Configure and use entity resolution in BigQuery
This document shows how to implement entity resolution for entity resolution end users (hereafter referred to as end users) and identity providers.
End users can use this document to connect with an identity provider and use the provider's service to match records. Identity providers can use this document to set up and configure services to share with end users on the Google Cloud Marketplace.
Workflow for end users
The following sections show end users how to configure entity resolution in BigQuery. For a visual representation of the complete setup, see the architecture for entity resolution.
Before you begin
- Contact and establish a relationship with an identity provider. BigQuery supports entity resolution with LiveRamp and TransUnion.
- Acquire the following items from the identity provider:
- Service account credentials
- Remote function signature
- Create two datasets in your project:
- Input dataset
- Output dataset
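The two datasets from the checklist above can be created with standard DDL. As a sketch, assuming placeholder project and dataset names (your identity provider does not require any particular names):

```sql
-- Dataset that the identity provider reads input records from
-- (project and dataset names are examples)
CREATE SCHEMA IF NOT EXISTS `my-project.entity_resolution_input`
  OPTIONS (location = 'US');

-- Dataset that the identity provider writes match results to
CREATE SCHEMA IF NOT EXISTS `my-project.entity_resolution_output`
  OPTIONS (location = 'US');
```

Both datasets must exist before you grant the provider's service account access to them in the next section.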
Required roles
To get the permissions that you need to run entity resolution jobs, ask your administrator to grant you the following IAM roles:
- For the identity provider's service account to read the input dataset and write to the output dataset:
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on the input dataset
  - BigQuery Data Editor (roles/bigquery.dataEditor) on the output dataset
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
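If you prefer SQL over the console, dataset-level access for the provider's service account can be granted with BigQuery's DCL statements. The dataset names and service account address below are placeholders:

```sql
-- Let the provider's service account read the input dataset
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `my-project.entity_resolution_input`
TO "serviceAccount:provider-sa@provider-project.iam.gserviceaccount.com";

-- Let it write match results to the output dataset
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `my-project.entity_resolution_output`
TO "serviceAccount:provider-sa@provider-project.iam.gserviceaccount.com";
```

Use the service account address that the identity provider shared with you during onboarding.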
Translate or resolve entities
For specific identity provider instructions, refer to the following sections.
LiveRamp
Prerequisites
- Configure LiveRamp Embedded Identity in BigQuery. For more information, see Enabling LiveRamp Embedded Identity in BigQuery.
- Coordinate with LiveRamp to enable API credentials for use with Embedded Identity. For more information, see Authentication.
Setup
The following steps are required when you use LiveRamp Embedded Identity for the first time. After setup is complete, only the input table and metadata table need to be modified between runs.
Create an input table
Create a table in the input dataset. Populate the table with RampIDs, target domains, and target types. For details and examples, see Input Table Columns and Descriptions.
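As a sketch only (the authoritative column names and formats are in LiveRamp's Input Table Columns and Descriptions), an input table that holds RampIDs, target domains, and target types might look like:

```sql
-- Hypothetical input table; confirm column names and formats
-- against LiveRamp's documentation before use
CREATE TABLE `my-project.entity_resolution_input.liveramp_input` (
  rampid        STRING,  -- RampID encoded in your domain
  target_domain STRING,  -- domain to transcode the RampID into
  target_type   STRING   -- RampID type to produce
);
```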
Create a metadata table
The metadata table is used to control the execution of LiveRamp Embedded Identity on BigQuery. Create a metadata table in the input dataset. Populate the metadata table with client IDs, execution modes, target domains, and target types. For details and examples, see Metadata Table Columns and Descriptions.
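A minimal sketch of such a metadata table, assuming hypothetical column names (the real schema is defined in LiveRamp's Metadata Table Columns and Descriptions):

```sql
-- Hypothetical metadata table; confirm the schema against
-- LiveRamp's documentation before use
CREATE TABLE `my-project.entity_resolution_input.liveramp_metadata` (
  client_id      STRING,  -- client ID issued by LiveRamp
  execution_mode STRING,  -- execution mode for the run
  target_domain  STRING,
  target_type    STRING
);
```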
Share tables with LiveRamp
Grant the LiveRamp Google Cloud service account access to view and process data in your input dataset. For details and examples, see Share Tables and Datasets with LiveRamp.
Run an embedded identity job
To run an embedded identity job with LiveRamp in BigQuery, do the following:
- Confirm that all RampIDs that were encoded in your domain are in your input table.
- Confirm that your metadata table is still accurate before you run the job.
- Contact LiveRampIdentitySupport@liveramp.com with a job process request. Include the project ID, dataset ID, and table ID (if applicable) for your input table, metadata table, and output dataset. For more information, see Notify LiveRamp to Initiate Transcoding.
Results are generally delivered to your output dataset within three business days.
LiveRamp support
For support issues, contact LiveRamp Identity Support.
LiveRamp billing
LiveRamp handles billing for entity resolution.
TransUnion
Prerequisites
- Contact TransUnion Cloud Support to execute an agreement to access the service. Provide the details of your Google Cloud project ID, input data types, use case, and data volume.
- TransUnion Cloud Support enables the service for your Google Cloud project and shares a detailed implementation guide that includes available output data.
Setup
The following steps are required when you use TransUnion's TruAudience Identity Resolution and Enrichment service in your BigQuery environment.
Create an external connection
Create a connection to an external data source of the Vertex AI remote models, remote functions and BigLake (Cloud Resource) type. You will use this connection to trigger the identity resolution service hosted in the TransUnion Google Cloud account from your Google Cloud account.
Copy the connection ID and service account ID and share these identifiers with the TransUnion customer delivery team.
Create a remote function
Create a remote function that interacts with the service orchestrator endpoint that is hosted on the TransUnion Google Cloud project to pass the necessary metadata (including schema mappings) to the TransUnion service. Use the connection ID from the external connection that you created and the TransUnion-hosted Cloud Function endpoint shared by the TransUnion customer delivery team.
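A remote function of this shape can be created with standard BigQuery DDL. The connection ID and endpoint URL below are placeholders for the values that TransUnion shares with you; the function name matches the one called by the invocation procedure later in this document:

```sql
-- Remote function that forwards the base64-encoded metadata payload
-- to the TransUnion-hosted endpoint (placeholder values throughout)
CREATE OR REPLACE FUNCTION `<project_id>.<dataset_id>.remote_call_TransUnion_er`(payload STRING)
RETURNS STRING
REMOTE WITH CONNECTION `<project_id>.<location>.<connection_id>`
OPTIONS (
  endpoint = 'https://<transunion-endpoint-shared-with-you>'
);
```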
Create an input table
Create a table in the input dataset. TransUnion supports name, postal address, email, phone, date of birth, IPv4 address, and device IDs as inputs. Follow the formatting guidelines in the implementation guide that TransUnion shared with you.
Create a metadata table
Create a metadata table that will store the configuration required by the identity resolution service to process data, including schema mappings. For details and examples, refer to the implementation guide that TransUnion shared with you.
Create a job status table
Create a table that will receive updates about the processing of an input batch. You can query this table to trigger other downstream processes in your pipeline. The possible job statuses are RUNNING, COMPLETED, and ERROR.
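The implementation guide defines the exact schema; as an illustrative sketch with hypothetical column names, a minimal status table and a polling query could look like:

```sql
-- Hypothetical job status table; use the schema from the
-- TransUnion implementation guide
CREATE TABLE `<project_id>.<dataset_id>.TransUnion_job_status` (
  batchid    STRING,
  status     STRING,    -- RUNNING, COMPLETED, or ERROR
  updated_at TIMESTAMP
);

-- Example: check the latest status of a batch before starting
-- downstream processing
SELECT status
FROM `<project_id>.<dataset_id>.TransUnion_job_status`
WHERE batchid = '<batch_id>'
ORDER BY updated_at DESC
LIMIT 1;
```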
Create the service invocation
Use the following procedure to call the TransUnion identity resolution service. The procedure collects the configuration metadata, packages it, and passes it to the invocation Cloud Function endpoint hosted by TransUnion.
```sql
-- create service invocation procedure
CREATE OR REPLACE PROCEDURE `<project_id>.<dataset_id>.TransUnion_get_identities`(
  metadata_table STRING,
  config_id STRING
)
BEGIN
  DECLARE sql_query STRING;
  DECLARE json_result STRING;
  DECLARE base64_result STRING;

  -- Collect the key-value configuration rows for this config_id as JSON
  SET sql_query = '''
    select to_json_string(array_agg(struct(config_id, key, value)))
    from `''' || metadata_table || '''`
    where config_id = "''' || config_id || '''"''';
  EXECUTE IMMEDIATE sql_query INTO json_result;

  -- Base64-encode the payload and pass it to the remote function
  SET base64_result = (SELECT to_base64(CAST(json_result AS bytes)));
  SELECT `<project_id>.<dataset_id>.remote_call_TransUnion_er`(base64_result);
END;
```

Create the matching output table
Run the following SQL script to create the matching output table. This is the standard output of the application, which includes match flags, scores, persistent individual IDs, and household IDs.
```sql
-- create output table
CREATE TABLE `<project_id>.<dataset_id>.TransUnion_identity_output` (
  batchid              STRING,
  uniqueid             STRING,
  ekey                 STRING,
  hhid                 STRING,
  collaborationid      STRING,
  firstnamematch       STRING,
  lastnamematch        STRING,
  addressmatches       STRING,
  addresslinkagescores STRING,
  phonematches         STRING,
  phonelinkagescores   STRING,
  emailmatches         STRING,
  emaillinkagescores   STRING,
  dobmatches           STRING,
  doblinkagescore      STRING,
  ipmatches            STRING,
  iplinkagescore       STRING,
  devicematches        STRING,
  devicelinkagescore   STRING,
  lastprocessed        STRING
);
```

Configure metadata
Follow the implementation guide that TransUnion shared with you to map your input schema to the application schema. This metadata also configures the generation of collaboration IDs, which are shareable, non-persistent identifiers that can be used in data clean rooms.
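The invocation procedure reads rows of (config_id, key, value) from the metadata table, so the configuration is supplied as key-value pairs. The key names below are illustrative only; the real keys come from the implementation guide:

```sql
-- Illustrative key-value configuration rows; the actual keys are
-- defined in the TransUnion implementation guide
INSERT INTO `<project_id>.<dataset_id>.TransUnion_er_metadata` (config_id, key, value)
VALUES
  ('1', 'input_table',  '<project_id>.<dataset_id>.TransUnion_er_input'),
  ('1', 'output_table', '<project_id>.<dataset_id>.TransUnion_identity_output'),
  ('1', 'email_column', 'email');
```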
Grant read and write access
Obtain the service account ID of the Apache Spark connection from the TransUnion customer delivery team and grant it read and write access to the dataset containing the input and output tables. We recommend granting the service account the BigQuery Data Editor role (roles/bigquery.dataEditor) on the dataset.
Invoke the application
You can invoke the application from within your environment by running the following script.

Note: You can use multiple input tables, as long as they are mapped to different metadata configurations.

```sql
-- using the metadata table, and 1 = config_id for the batch run
CALL `<project_id>.<dataset_id>.TransUnion_get_identities`(
  "<project_id>.<dataset_id>.TransUnion_er_metadata",
  "1"
);
```

Support
For technical issues, contact TransUnion Cloud Support.
Billing and usage
TransUnion tracks usage of the application and uses it for billing purposes. Active customers can contact their TransUnion delivery representative for more information.
Workflow for identity providers
The following sections show identity providers how to configure entity resolution in BigQuery. For a visual representation of the complete setup, see the architecture for entity resolution.
Before you begin
- Create a Cloud Run job or a Cloud Run function to integrate with the remote function. Both options are suitable for this purpose.
- Note the name of the service account that's associated with the Cloud Run job or Cloud Run function:
  - In the Google Cloud console, go to the Cloud Functions page.
  - Click the function's name, and then click the Details tab.
  - In the General Information pane, find and note the service account name for the remote function.
- Create a remote function.
- Collect end-user principals from the end user.
Required roles
To get the permissions that you need to run entity resolution jobs, ask your administrator to grant you the following IAM roles:
- For the service account that's associated with your function to read and write on associated datasets and launch jobs:
  - BigQuery Data Editor (roles/bigquery.dataEditor) on the project
  - BigQuery Job User (roles/bigquery.jobUser) on the project
- For the end-user principal to see and connect to the remote function:
  - BigQuery Connection User (roles/bigquery.connectionUser) on the connection
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on the control plane dataset with the remote function
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Share entity resolution remote function
Modify and share the following remote interface code with the end user. The enduser needs this code to start the entity resolution job.
```sql
`PARTNER_PROJECT_ID.DATASET_ID.match`(LIST_OF_PARAMETERS)
```

Replace LIST_OF_PARAMETERS with the list of parameters that are passed to the remote function.
Optional: Provide job metadata
You can optionally provide job metadata by using a separate remote functionor by writing a new status table in the user's output dataset. Examples ofmetadata include job statuses and metrics.
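For example, a provider that chooses the status-table approach could append a row to a table in the user's output dataset after each state change. The table name and columns here are illustrative, not a required schema:

```sql
-- Illustrative status update written by the provider's service account
-- into the end user's output dataset (placeholder names throughout)
INSERT INTO `<end_user_project>.<output_dataset>.job_status` (job_id, status, updated_at)
VALUES ('<job_id>', 'COMPLETED', CURRENT_TIMESTAMP());
```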
Billing for identity providers
To streamline customer billing and onboarding, we recommend that you integrate your entity resolution service with the Google Cloud Marketplace. This lets you set up a pricing model based on the entity resolution job usage, with Google handling the billing for you. For more information, see Offering software as a service (SaaS) products.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-16 UTC.