Introduction to the BigQuery entity resolution framework
This document describes the architecture of the BigQuery entity resolution framework. Entity resolution matches records across shared data where no common identifier exists, or augments shared data by using an identity service from a Google Cloud partner.
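To make "matching records where no common identifier exists" concrete, the following is a minimal, self-contained Python sketch. It isn't how BigQuery or any identity provider works: real providers use proprietary identity graphs and match logic. It assumes a deliberately simple deterministic rule in which two records match if they share a hashed, normalized email address or phone number. The field names (`id`, `email`, `phone`) and the functions `match_keys` and `resolve` are hypothetical. Each table keeps only its own local `id`, so no shared key exists until one is derived.

```python
from __future__ import annotations

import hashlib
import re


def _digest(value: str) -> str:
    """Hash a normalized value so match keys don't carry raw contact details."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def match_keys(record: dict) -> set[str]:
    """Derive candidate match keys (hypothetical rule) from a record's contact fields."""
    keys = set()
    email = record.get("email")
    if email:
        keys.add("email:" + _digest(email.strip().lower()))
    phone = record.get("phone")
    if phone:
        digits = re.sub(r"\D", "", phone)  # keep digits only
        if digits:
            keys.add("phone:" + _digest(digits))
    return keys


def resolve(left: list[dict], right: list[dict]) -> list[tuple[str, str]]:
    """Pair local record IDs from two tables that share at least one match key."""
    index: dict[str, list[str]] = {}
    for rec in right:
        for key in match_keys(rec):
            index.setdefault(key, []).append(rec["id"])
    pairs = set()
    for rec in left:
        for key in match_keys(rec):
            for other_id in index.get(key, []):
                pairs.add((rec["id"], other_id))
    return sorted(pairs)


crm = [{"id": "c1", "email": " Ana@Example.com "}, {"id": "c2", "phone": "(555) 010-0199"}]
partner = [
    {"id": "p9", "email": "ana@example.com"},
    {"id": "p7", "phone": "555-010-0199"},
    {"id": "p3", "email": "bo@example.com"},
]
print(resolve(crm, partner))  # [('c1', 'p9'), ('c2', 'p7')]
```

The normalization step is what creates the shared key: differently formatted values such as `" Ana@Example.com "` and `"ana@example.com"` collapse to the same hash.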
This document is for entity resolution end users and identity providers. For implementation details, see Configure and use entity resolution in BigQuery.
You can use BigQuery entity resolution to prepare data before you contribute it to a data clean room. Entity resolution is available in both the on-demand and capacity pricing models and in all BigQuery editions.
Benefits
End users gain the following benefits from entity resolution:
- Resolve entities in place without data transfer fees. A subscriber or Google Cloud partner matches your data to their identity table and writes the match results to a dataset in your Google Cloud project.
- Avoid managing extract, transform, and load (ETL) jobs.
Identity providers gain the following benefits from entity resolution:
- Offer entity resolution as a managed software as a service (SaaS) product on Google Cloud Marketplace.
- Use proprietary identity graphs and match logic without revealing them to users.
Architecture
BigQuery implements entity resolution by using remote function calls that start entity resolution processes in an identity provider's environment. Your data isn't copied or moved during this process. The following diagram and explanation describe the entity resolution workflow:
- The end user grants the identity provider's service account read access to their input dataset and write access to their output dataset.
- The user calls the remote function that matches their input data with the provider's identity graph data. The remote function passes matching parameters to the provider.
- The provider's service account reads and processes the input dataset.
- The provider's service account writes the entity resolution results to the user's output dataset.
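The four workflow steps can be simulated in memory to show where access grants matter. This Python sketch calls no Google Cloud API. `Dataset`, `IdentityProvider`, `call_remote_function`, the service account name `matcher@provider.example`, and the email-keyed identity graph are all hypothetical stand-ins. The sketch models only the ordering and the permission checks: the provider can't read input or write output until the user grants its service account access, and the identity graph stays inside the provider object.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """In-memory stand-in for a BigQuery dataset plus its access grants."""
    name: str
    tables: dict[str, list[dict]] = field(default_factory=dict)
    readers: set[str] = field(default_factory=set)
    writers: set[str] = field(default_factory=set)


class IdentityProvider:
    """Hypothetical provider; its service account does all reads and writes."""

    def __init__(self, service_account: str, identity_graph: dict[str, str]):
        self.service_account = service_account
        # Hypothetical identity graph: normalized email -> provider entity ID.
        # It is never copied into the user's datasets; only match results are.
        self._graph = identity_graph

    def run_job(self, input_ds: Dataset, input_table: str,
                output_ds: Dataset, output_table: str) -> int:
        # Step 3: read the input dataset (requires the user's read grant).
        if self.service_account not in input_ds.readers:
            raise PermissionError(f"{self.service_account} can't read {input_ds.name}")
        rows = input_ds.tables[input_table]
        results = [
            {"id": row["id"], "entity_id": self._graph.get(row["email"].strip().lower())}
            for row in rows
        ]
        # Step 4: write results to the output dataset (requires the write grant).
        if self.service_account not in output_ds.writers:
            raise PermissionError(f"{self.service_account} can't write {output_ds.name}")
        output_ds.tables[output_table] = results
        return len(results)


def call_remote_function(provider: IdentityProvider, params: dict) -> int:
    """Step 2: stand-in for the remote function; forwards matching parameters."""
    return provider.run_job(**params)


provider = IdentityProvider("matcher@provider.example", {"ana@example.com": "E-1"})
crm_in = Dataset("crm_input", tables={"customers": [
    {"id": "c1", "email": " Ana@Example.com"},
    {"id": "c2", "email": "bo@example.com"},
]})
crm_out = Dataset("crm_output")

# Step 1: the end user grants the provider's service account access.
crm_in.readers.add(provider.service_account)
crm_out.writers.add(provider.service_account)

call_remote_function(provider, {"input_ds": crm_in, "input_table": "customers",
                                "output_ds": crm_out, "output_table": "matches"})
print(crm_out.tables["matches"])
# [{'id': 'c1', 'entity_id': 'E-1'}, {'id': 'c2', 'entity_id': None}]
```

Unmatched rows are kept with an empty `entity_id` rather than dropped. This is an arbitrary choice for the sketch, because each provider defines its own output schema.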
The following sections describe the end-user components and the identity provider components.
End-user components
End-user components include the following:
- Remote function call: a call that runs a procedure defined and implemented by the identity provider. This call starts the entity resolution process.
- Input dataset: the source dataset that contains the data to be matched. Optionally, the dataset can contain a metadata table with additional parameters. Providers specify schema requirements for input datasets.
- Output dataset: the destination dataset where the provider stores the matched results as an output table. Optionally, the provider can write a job status table that contains entity resolution job details to this dataset. The output dataset can be the same as the input dataset.
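The end-user components can be summarized as one request description plus a schema check, as in the following Python sketch. The `EntityResolutionRequest` fields, the `validate_input_schema` helper, and the example provider schema (`record_id`, `email`) are hypothetical and don't represent any provider's actual interface. The sketch reflects three points from the list above: the metadata and job status tables are optional, providers define the input schema, and the output dataset can equal the input dataset. The `conflicts` check is an added assumption. It flags output table names that would collide with input tables when both live in the same dataset.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityResolutionRequest:
    """Hypothetical description of the end-user side of one entity resolution run."""
    input_dataset: str                      # source dataset with records to match
    input_table: str
    output_dataset: str                     # may be the same as input_dataset
    output_table: str
    metadata_table: Optional[str] = None    # optional extra parameters (input dataset)
    job_status_table: Optional[str] = None  # optional, written by the provider

    def conflicts(self) -> list[str]:
        """Return output table names that would overwrite input tables."""
        if self.output_dataset != self.input_dataset:
            return []
        reserved = {self.input_table, self.metadata_table}
        return [t for t in (self.output_table, self.job_status_table)
                if t is not None and t in reserved]


def validate_input_schema(columns: dict[str, str], required: dict[str, str]) -> list[str]:
    """Compare input columns (name -> type) with a provider's required schema."""
    problems = []
    for name, col_type in required.items():
        actual = columns.get(name)
        if actual is None:
            problems.append(f"missing column {name!r}")
        elif actual.upper() != col_type.upper():
            problems.append(f"column {name!r} is {actual}, expected {col_type}")
    return problems


required = {"record_id": "STRING", "email": "STRING"}  # hypothetical provider schema
print(validate_input_schema({"record_id": "STRING", "email": "INT64"}, required))
# ["column 'email' is INT64, expected STRING"]

req = EntityResolutionRequest("crm", "customers", "crm", "customers")
print(req.conflicts())  # ['customers']
```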
Identity provider components
Identity provider components include the following:
- Control plane: contains a BigQuery remote function that orchestrates the matching process. This function can be implemented as a Cloud Run job or a Cloud Run function. The control plane can also contain other services, such as authentication and authorization.
- Data plane: contains the identity graph dataset and the stored procedure that implements the provider matching logic. The stored procedure can be implemented as a SQL stored procedure or an Apache Spark stored procedure. The identity graph dataset contains the tables that the end-user data is matched against.
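On the control plane side, a BigQuery remote function endpoint receives an HTTP POST. Its JSON body contains a `calls` array with one argument list per row. BigQuery expects a JSON response containing a `replies` array with one element per call in the same order, or an `errorMessage` on failure. The following Python sketch shows only that request handling, as a plain function with no web framework. Its assumptions are hypothetical: a two-argument signature (input dataset, output dataset) and a `submit_matching_job` hook that stands in for starting the data-plane stored procedure. A real deployment on Cloud Run would wrap this function in an HTTP server.

```python
from __future__ import annotations

import json
import uuid


def submit_matching_job(input_dataset: str, output_dataset: str) -> str:
    """Hypothetical hook: start the data-plane matching job and return a job ID.

    A real control plane might invoke the provider's stored procedure here; this
    stand-in only derives a deterministic ID from its arguments.
    """
    name = f"{input_dataset}|{output_dataset}"
    return f"er-{uuid.uuid5(uuid.NAMESPACE_URL, name)}"


def handle_remote_function_request(body: str) -> tuple[int, str]:
    """Process one remote function request body; return (HTTP status, JSON body)."""
    try:
        request = json.loads(body)
        replies = []
        for args in request["calls"]:
            input_dataset, output_dataset = args  # hypothetical 2-argument signature
            replies.append(submit_matching_job(input_dataset, output_dataset))
        # One reply per call, in the same order as the calls.
        return 200, json.dumps({"replies": replies})
    except (KeyError, TypeError, ValueError) as exc:
        # json.JSONDecodeError and wrong argument counts are both ValueErrors.
        return 400, json.dumps({"errorMessage": f"bad request: {exc}"})


status, payload = handle_remote_function_request(json.dumps(
    {"requestId": "r1", "calls": [["proj.in_ds", "proj.out_ds"]]}))
print(status, json.loads(payload)["replies"][0].startswith("er-"))  # 200 True
```

The handler returns a job ID rather than the match results, because in this architecture the results go to the user's output dataset, not back through the remote function response.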
What's next
- Learn how to configure and use entity resolution.
- Learn about remote functions.
- Learn about stored procedures.
- Learn about data clean rooms.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.