record-linkage
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
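As a rough illustration of the task described above, the following minimal sketch links records from two sources that share no common identifier by comparing normalized name strings. It uses only Python's standard-library `difflib`. The helper names (`normalize`, `similarity`, `link_records`), the sample records, and the 0.85 threshold are all assumptions made for this example, not the method of any repository listed here.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a, b):
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """For each left record, keep its best-scoring right record if the
    score reaches the (illustrative) threshold."""
    links = []
    for left_id, left_name in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, best_score))
    return links


# Hypothetical sample data: two sources with different key schemes and styles.
source_a = {"a1": "Jon Smith", "a2": "Acme Corp."}
source_b = {"b1": "Smith, John", "b2": "ACME Corp", "b3": "Zenith Ltd"}

for left_id, right_id, score in link_records(source_a, source_b):
    print(left_id, "<->", right_id, f"{score:.2f}")
```

Comparing every pair of records scales quadratically, which is why libraries in this space typically add blocking or indexing steps before scoring candidate pairs.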
Here are 126 public repositories matching this topic...
🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Updated Nov 25, 2024 - Python
A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Updated Feb 10, 2025 - C
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends.
Updated Mar 19, 2025 - Python
A powerful and modular toolkit for record linkage and duplicate detection in Python.
Updated Feb 21, 2024 - Python
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Updated Jun 23, 2024 - JavaScript
🆔 Command line tool for deduplicating CSV files
Updated Mar 31, 2020 - Python
🆔 Examples for using the dedupe library
Updated Aug 10, 2024 - Python
A list of free data matching and record linkage software.
Updated Feb 21, 2024
Super Fast String Matching in Python
Updated Mar 14, 2025 - Python
🔎 Finds fuzzy matches between CSV files
Updated Feb 2, 2025 - Python
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
Updated Nov 18, 2022 - Jupyter Notebook
Link Discovery Framework for Metric Spaces.
Updated Aug 14, 2024 - JavaScript
Spark RDD with Lucene's query and entity linkage capabilities.
Updated Jan 25, 2025 - Scala
Resources for tackling record linkage / deduplication / data matching problems.
Updated Feb 22, 2024
A convenient way to link, deduplicate, aggregate and cluster data (DataFrames) in Python using deep learning.
Updated Feb 21, 2025 - Python
Record Linkage ToolKit (find and link entities).
Updated Aug 14, 2023 - Python
Link Wikidata items to large catalogs.
Updated Mar 3, 2025 - Python
Python package for deduplication/entity resolution using active learning.
Updated Aug 24, 2024 - Python
Python implementation of anonymous linkage using cryptographic linkage keys.
Updated May 18, 2024 - Python
List of entity resolution software and resources.
Updated Feb 22, 2025
Created by Halbert L. Dunn
Released 1946
- Followers: 39
- Organization: entity-resolution