data-matching
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). It is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), a situation that can arise from differences in record shape, storage location, or curator style or preference.
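As a rough illustration of joining two sources that share no common identifier, here is a minimal sketch using only Python's standard library. The record lists, the `normalize` and `link_records` helpers, and the 0.85 threshold are all hypothetical choices made for this example; the tools listed on this page typically use blocking, probabilistic models, or learned embeddings instead of a brute-force string comparison.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] computed on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best-scoring right record.

    `left` and `right` are lists of (record_id, name) tuples.
    Only pairs scoring at or above `threshold` (an arbitrary cutoff
    assumed for this sketch) are returned.
    """
    links = []
    for left_id, left_name in left:
        # Compare against every right record (brute force, no blocking).
        best = max(
            ((similarity(left_name, right_name), right_id) for right_id, right_name in right),
            default=None,
        )
        if best is not None and best[0] >= threshold:
            links.append((left_id, best[1], round(best[0], 3)))
    return links


# Two hypothetical sources with different keys and spelling conventions.
crm = [("c1", "Acme Corp."), ("c2", "Globex Corporation")]
billing = [("b7", "ACME Corp"), ("b9", "Initech LLC")]

links = link_records(crm, billing)
print(links)
```

In this toy data, "Acme Corp." and "ACME Corp" normalize to the same string and are linked, while "Globex Corporation" has no counterpart above the threshold and stays unmatched. Comparing every pair scales quadratically, which is one reason dedicated libraries restrict comparisons to candidate pairs first.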
Here are 33 public repositories matching this topic...
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
- Updated Jul 6, 2025 - Python
A powerful and modular toolkit for record linkage and duplicate detection in Python
- Updated Feb 21, 2024 - Python
A list of free data matching and record linkage software.
- Updated Feb 21, 2024
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
- Updated Aug 9, 2022 - Python
🔎 Finds fuzzy matches between CSV files
- Updated Mar 26, 2025 - Python
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
- Updated Nov 18, 2022 - Jupyter Notebook
Resources for tackling record linkage / deduplication / data matching problems
- Updated Feb 22, 2024
Link Wikidata items to large catalogs
- Updated Mar 3, 2025 - Python
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
- Updated Jul 14, 2025 - Python
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
- Updated Jul 21, 2025 - Python
A browser user interface for manual labeling of record pairs.
- Updated Jun 23, 2023 - JavaScript
Welcome to Snowman App – a Data Matching Benchmark Platform.
- Updated Feb 9, 2023 - TypeScript
A maximum-strength name parser for record linkage.
- Updated Jun 15, 2025 - Python
Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).
- Updated May 24, 2024 - R
Compound AI toolchain for fast and accurate entity matching, powered by LLMs.
- Updated Mar 25, 2025 - Python
🔎 Finds fuzzy matches between datasets
- Updated Jun 1, 2025 - Python
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
- Updated Jul 12, 2025 - Java
A collection of awesome resources regarding Record Linkage.
- Updated Aug 16, 2024
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
- Updated Oct 14, 2024 - HTML
Created by Halbert L. Dunn
Released 1946
- Followers: 44
- Organization: entity-resolution
- Website: github.com/topics/entity-resolution
- Wikipedia