Rapid fuzzy string matching in C++ using the Levenshtein Distance

rapidfuzz.github.io/rapidfuzz-cpp


Description

RapidFuzz is a fast string matching library for Python and C++ that uses the string similarity calculations from FuzzyWuzzy. However, two aspects set RapidFuzz apart from FuzzyWuzzy:

- It is MIT licensed, so it can be used with whichever license you choose for your project, whereas FuzzyWuzzy forces you to adopt the GPL license.
- It is mostly written in C++ and adds many algorithmic improvements that make string matching even faster while still providing the same results. More details on these performance improvements, in the form of benchmarks, can be found here.

The library is split across multiple repositories for the different supported programming languages:

- The C++ version is maintained in this repository.
- The Python version can be found at rapidfuzz/rapidfuzz.

CMake Integration

There are several ways to integrate rapidfuzz in your CMake project.

By Installing it

    git clone https://github.com/rapidfuzz/rapidfuzz-cpp.git rapidfuzz-cpp
    cd rapidfuzz-cpp
    mkdir build && cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release
    cmake --build .
    cmake --build . --target install

Then in your CMakeLists.txt:

    find_package(rapidfuzz REQUIRED)
    add_executable(foo main.cpp)
    target_link_libraries(foo rapidfuzz::rapidfuzz)

Add this repository as a submodule

git submodule add https://github.com/rapidfuzz/rapidfuzz-cpp.git 3rdparty/RapidFuzz

Then you can either:

include it as a subdirectory

    add_subdirectory(3rdparty/RapidFuzz)
    add_executable(foo main.cpp)
    target_link_libraries(foo rapidfuzz::rapidfuzz)

build it at configure time with FetchContent

    # the FetchContent module has to be included before its commands can be used
    include(FetchContent)
    FetchContent_Declare(
      rapidfuzz
      SOURCE_DIR ${CMAKE_SOURCE_DIR}/3rdparty/RapidFuzz
      PREFIX ${CMAKE_CURRENT_BINARY_DIR}/rapidfuzz
      CMAKE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=<INSTALL_DIR> "${CMAKE_OPT_ARGS}"
    )
    FetchContent_MakeAvailable(rapidfuzz)
    add_executable(foo main.cpp)
    target_link_libraries(foo PRIVATE rapidfuzz::rapidfuzz)

Download it at configure time

If you don't want to add rapidfuzz-cpp as a submodule, you can also download it with FetchContent:

    # the FetchContent module has to be included before its commands can be used
    include(FetchContent)
    FetchContent_Declare(rapidfuzz
      GIT_REPOSITORY https://github.com/rapidfuzz/rapidfuzz-cpp.git
      GIT_TAG main)
    FetchContent_MakeAvailable(rapidfuzz)
    add_executable(foo main.cpp)
    target_link_libraries(foo PRIVATE rapidfuzz::rapidfuzz)

It will be downloaded each time you run CMake in a blank folder.

CMake option

The following CMake options are available:

RAPIDFUZZ_BUILD_TESTING: build the tests (default OFF, requires Catch2)
RAPIDFUZZ_BUILD_BENCHMARKS: build the benchmarks (default OFF, requires Google Benchmark)
RAPIDFUZZ_INSTALL: install the library on the local computer
- When configured independently, installation is on.
- When used as a subproject, installation is turned off by default.
- Library developers might want to toggle this behavior depending on their project.
- If your project is exported via CMake, turn installation on, or an export error will result.
- If your project publicly depends on RapidFuzz (includes rapidfuzz.hpp in a header), turn installation on, or applications depending on your project will face include errors.

Usage

    #include <rapidfuzz/fuzz.hpp>

Simple Ratio

    using rapidfuzz::fuzz::ratio;

    // score is 96.55171966552734
    double score = rapidfuzz::fuzz::ratio("this is a test", "this is a test!");
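The score above is consistent with a normalized Indel similarity, i.e. 100 * 2 * LCS / (len1 + len2), where LCS is the length of the longest common subsequence. The following is a minimal standard-library sketch under that assumption. `lcs_length` and `reference_ratio` are hypothetical helper names and not part of the RapidFuzz API, and the library itself uses much faster algorithms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Length of the longest common subsequence of a and b.
// Classic dynamic programming: O(|a| * |b|) time, O(|b|) memory.
inline std::size_t lcs_length(std::string_view a, std::string_view b)
{
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (char ca : a) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            cur[j] = (ca == b[j - 1]) ? prev[j - 1] + 1
                                      : std::max(prev[j], cur[j - 1]);
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical reference helper (not the RapidFuzz implementation):
// similarity in [0, 100] based on insertions and deletions only.
inline double reference_ratio(std::string_view a, std::string_view b)
{
    const std::size_t total = a.size() + b.size();
    if (total == 0) return 100.0; // convention chosen here for two empty strings

    const std::size_t lcs = lcs_length(a, b);
    assert(2 * lcs <= total); // an LCS can never be longer than either input

    return 100.0 * static_cast<double>(2 * lcs) / static_cast<double>(total);
}
```

For "this is a test" (14 characters) and "this is a test!" (15 characters) the LCS has length 14, so the sketch yields 100 * 28 / 29, roughly 96.55.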

Partial Ratio

    // score is 100
    double score = rapidfuzz::fuzz::partial_ratio("this is a test", "this is a test!");
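Conceptually, a partial ratio looks for the best alignment of the shorter string inside the longer one. The sketch below illustrates that idea with a simple sliding window, using only the standard library. It is an assumption-laden simplification: `reference_ratio` and `reference_partial_ratio` are hypothetical names, and the real partial_ratio is more elaborate and may return different scores for inputs that do not align cleanly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: normalized Indel similarity in [0, 100],
// computed as 100 * 2 * LCS / (|a| + |b|).
inline double reference_ratio(std::string_view a, std::string_view b)
{
    const std::size_t total = a.size() + b.size();
    if (total == 0) return 100.0;

    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (char ca : a) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            cur[j] = (ca == b[j - 1]) ? prev[j - 1] + 1
                                      : std::max(prev[j], cur[j - 1]);
        }
        std::swap(prev, cur);
    }
    return 100.0 * static_cast<double>(2 * prev[b.size()]) / static_cast<double>(total);
}

// Hypothetical sliding-window sketch: score the shorter string against every
// window of the same length in the longer string and keep the best score.
inline double reference_partial_ratio(std::string_view a, std::string_view b)
{
    const std::string_view shorter = (a.size() <= b.size()) ? a : b;
    const std::string_view longer = (a.size() <= b.size()) ? b : a;

    // convention chosen here: an empty string only matches another empty string
    if (shorter.empty()) return longer.empty() ? 100.0 : 0.0;
    assert(shorter.size() <= longer.size());

    double best = 0.0;
    for (std::size_t start = 0; start + shorter.size() <= longer.size(); ++start) {
        best = std::max(best, reference_ratio(shorter, longer.substr(start, shorter.size())));
    }
    return best;
}
```

Because "this is a test" occurs verbatim inside "this is a test!", one window matches exactly and the sketch returns 100, in line with the score shown above.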

Token Sort Ratio

    // score is 90.90908813476562
    double ratio_score = rapidfuzz::fuzz::ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear");

    // score is 100
    double sort_score = rapidfuzz::fuzz::token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear");
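The difference between the two scores comes from a preprocessing step: in the FuzzyWuzzy formulation, the token sort ratio splits both strings into whitespace-separated tokens, sorts them, joins them again and only then compares the results. The sketch below shows that normalization step alone with the standard library. `sorted_token_string` is a hypothetical helper name, not a RapidFuzz function.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on whitespace, sort the tokens and
// join them with single spaces.
inline std::string sorted_token_string(const std::string& text)
{
    std::istringstream stream(text);
    std::vector<std::string> tokens;
    for (std::string token; stream >> token;) {
        tokens.push_back(token);
    }

    std::sort(tokens.begin(), tokens.end());

    std::string joined;
    for (const auto& token : tokens) {
        if (!joined.empty()) joined += ' ';
        joined += token;
    }
    return joined;
}
```

Both example strings normalize to "a bear fuzzy was wuzzy", so comparing the normalized forms gives a perfect score even though the word order differs.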

Token Set Ratio

    // score is 83.8709716796875
    double sort_score = rapidfuzz::fuzz::token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear");

    // score is 100
    double set_score = rapidfuzz::fuzz::token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear");
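The token set ratio ignores repeated tokens. In the FuzzyWuzzy formulation, both strings are reduced to sets of unique tokens, which are then split into the tokens common to both and the tokens unique to each side before scoring. The sketch below shows only that decomposition with the standard library. `TokenSets`, `unique_tokens` and `decompose` are hypothetical names, and the final scoring step is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

// Hypothetical result type for the decomposition step.
struct TokenSets {
    std::set<std::string> common;      // tokens present in both strings
    std::set<std::string> only_first;  // tokens only in the first string
    std::set<std::string> only_second; // tokens only in the second string
};

// Split on whitespace and drop duplicate tokens.
inline std::set<std::string> unique_tokens(const std::string& text)
{
    std::istringstream stream(text);
    std::set<std::string> tokens;
    for (std::string token; stream >> token;) {
        tokens.insert(token);
    }
    return tokens;
}

inline TokenSets decompose(const std::string& first, const std::string& second)
{
    const auto a = unique_tokens(first);
    const auto b = unique_tokens(second);

    TokenSets out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out.common, out.common.end()));
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out.only_first, out.only_first.end()));
    std::set_difference(b.begin(), b.end(), a.begin(), a.end(),
                        std::inserter(out.only_second, out.only_second.end()));

    assert(out.common.size() <= std::min(a.size(), b.size()));
    return out;
}
```

For "fuzzy was a bear" and "fuzzy fuzzy was a bear" the common set holds all four unique tokens and both difference sets are empty, which is why the duplicated "fuzzy" does not lower the token set score.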

Process

In the Python implementation, there is a module called process, which is used to compare, for example, a string to a list of strings. In Python, this both saves the time needed to implement those features yourself and can be much more efficient than repeated type conversions between Python and C++. Implementing a similar function in C++ using templates is not easily possible and would probably be slower than implementing it on your own. This section therefore describes how users can implement those features with a couple of lines of code using the C++ library.

extract

The following function compares a query string to all strings in a list of choices. It returns all elements with a similarity of at least score_cutoff. In general, use the cached implementations when comparing one string to multiple strings.

    template <typename Sentence1, typename Iterable, typename Sentence2 = typename Iterable::value_type>
    std::vector<std::pair<Sentence2, double>>
    extract(const Sentence1& query, const Iterable& choices, const double score_cutoff = 0.0)
    {
      std::vector<std::pair<Sentence2, double>> results;

      rapidfuzz::fuzz::CachedRatio<typename Sentence1::value_type> scorer(query);

      for (const auto& choice : choices) {
        double score = scorer.similarity(choice, score_cutoff);

        if (score >= score_cutoff) {
          results.emplace_back(choice, score);
        }
      }

      return results;
    }

extractOne

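The following function compares a query string to all strings in a list of choices and returns the best match together with its score, or std::nullopt if no choice reaches score_cutoff.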

    template <typename Sentence1, typename Iterable, typename Sentence2 = typename Iterable::value_type>
    std::optional<std::pair<Sentence2, double>>
    extractOne(const Sentence1& query, const Iterable& choices, const double score_cutoff = 0.0)
    {
      bool match_found = false;
      double best_score = score_cutoff;
      Sentence2 best_match;

      rapidfuzz::fuzz::CachedRatio<typename Sentence1::value_type> scorer(query);

      for (const auto& choice : choices) {
        // the best score so far is passed as the cutoff for the next comparison
        double score = scorer.similarity(choice, best_score);

        if (score >= best_score) {
          match_found = true;
          best_score = score;
          best_match = choice;
        }
      }

      if (!match_found) {
        return std::nullopt;
      }

      return std::make_pair(best_match, best_score);
    }

multithreading

It is very simple to use those scorers with, for example, OpenMP to achieve better performance.

    template <typename Sentence1, typename Iterable, typename Sentence2 = typename Iterable::value_type>
    std::vector<std::pair<Sentence2, double>>
    extract(const Sentence1& query, const Iterable& choices, const double score_cutoff = 0.0)
    {
      std::vector<std::pair<Sentence2, double>> results(choices.size());

      rapidfuzz::fuzz::CachedRatio<typename Sentence1::value_type> scorer(query);

      #pragma omp parallel for
      for (size_t i = 0; i < choices.size(); ++i) {
        double score = scorer.similarity(choices[i], score_cutoff);
        results[i] = std::make_pair(choices[i], score);
      }

      return results;
    }
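Unlike the sequential extract, the OpenMP variant above writes one entry per choice into a pre-sized vector, so choices that fall below score_cutoff are still part of the result. A possible follow-up step, sketched here with the standard library only, removes those entries after the parallel loop has finished. `drop_below_cutoff` is a hypothetical helper name, and the sketch assumes that a choice below the cutoff carries a score lower than score_cutoff.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical post-processing helper: removes every (choice, score) pair
// whose score is below score_cutoff, keeping the order of the remaining pairs.
template <typename Sentence>
void drop_below_cutoff(std::vector<std::pair<Sentence, double>>& results,
                       const double score_cutoff)
{
    assert(score_cutoff >= 0.0 && score_cutoff <= 100.0); // scores are percentages

    results.erase(std::remove_if(results.begin(), results.end(),
                                 [score_cutoff](const std::pair<Sentence, double>& entry) {
                                     return entry.second < score_cutoff;
                                 }),
                  results.end());
}
```

Filtering sequentially after the loop keeps the parallel section free of shared writes to a growing container, which would otherwise need synchronization.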

License

RapidFuzz is licensed under the MIT license since I believe that everyone should be able to use it without being forced to adopt the GPL license. That's why the library is based on an older version of fuzzywuzzy that was MIT-licensed as well. This old version of fuzzywuzzy can be found here.

Latest release: 3.3.3 (Aug 27, 2025), 46 releases in total.
