jaccard-similarity
Here are 186 public repositories matching this topic...
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
- Updated Nov 5, 2025 - Python
Go metrics for calculating string similarity and other string utility functions
- Updated Oct 8, 2025 - Go
Compare HTML similarity using structural and style metrics
- Updated May 11, 2023 - Python
A package to compute medical segmentation metrics.
- Updated Jul 16, 2024 - Python
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
- Updated Apr 9, 2025 - Python
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
- Updated Apr 25, 2022 - Scala
A Clojure library for querying large data-sets on similarity
- Updated Feb 17, 2019 - Clojure
Spark functions to run popular phonetic and string matching algorithms
- Updated Feb 22, 2022 - Scala
SetSketch: Filling the Gap between MinHash and HyperLogLog
- Updated Aug 11, 2021 - C++
ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity
- Updated Oct 26, 2020 - C++
Calculate various string metrics efficiently in Haskell
- Updated Nov 3, 2025 - Haskell
A job recommender system that takes your skills from LinkedIn and job postings from Indeed, then surfaces the best available jobs for those skills.
- Updated Oct 9, 2021 - Python
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
- Updated Aug 26, 2020 - C++
Minhash and maxhash library in Python, combining flexibility, expressivity, and performance.
- Updated Dec 14, 2024 - C
This is an implementation of the paper written by Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett
- Updated Oct 11, 2019 - Python
Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables
- Updated May 18, 2025 - Python
Easy-to-use Java library for similarity checking of strings or numeric-series
- Updated Jan 23, 2020 - Java
A text similarity computation using MinHash and Jaccard distance on the Reuters dataset
- Updated Jun 11, 2018 - R
TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation
- Updated Aug 3, 2025 - C++
Insight Data Engineering fellow project
- Updated Nov 14, 2016 - Python
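For orientation, the measure most of the repositories above build on can be sketched in a few lines. This is a minimal illustration only, not the code of any listed project: the function names `jaccard`, `minhash_signature`, and `estimate_jaccard` are hypothetical, and salted BLAKE2b hashes are swapped in here as a simple stand-in for the random permutations that MinHash is usually described with.

```python
import hashlib


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|.

    Two empty sets are treated as identical (1.0) by convention here.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(items: set, num_perm: int = 128) -> list:
    """Keep the minimum 64-bit hash of the set under each of num_perm salts.

    Assumes a non-empty set of strings; each salt plays the role of one
    hash function (a stand-in for one random permutation).
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        x.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "big",
                )
                for x in items
            )
        )
    return signature


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of signature positions that agree estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = {"apple", "banana", "cherry"}
    b = {"banana", "cherry", "date"}
    print(jaccard(a, b))
    print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The exact function needs both sets in memory, while the signature has a fixed length regardless of set size; that trade of accuracy for space is the idea the sketching libraries in this list refine.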