
Provides a high-performance interface for calculating string similarities and distances, leveraging the efficient C++ library RapidFuzz developed by Max Bachmann and Adam Cohen. This package integrates the C++ implementation, allowing R users to access cutting-edge algorithms for fuzzy matching and text analysis.
You can install RapidFuzz directly from CRAN, or install the development version from GitHub with:
    # install.packages("pak")
    pak::pak("StrategicProjects/RapidFuzz")
    library(RapidFuzz)

The RapidFuzz package is an R wrapper around the highly efficient RapidFuzz C++ library. It provides implementations of multiple string comparison and similarity metrics, such as Levenshtein, Jaro-Winkler, and Damerau-Levenshtein distances. This package is particularly useful for applications like record linkage, approximate string matching, and fuzzy text processing.
String comparison algorithms calculate distances and similarities between two sequences of characters. These distances help to quantify how similar two strings are. For example, the Levenshtein distance measures the minimum number of single-character edits required to transform one string into another.
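To make the Levenshtein definition above concrete, here is a minimal base R sketch of the classic dynamic-programming approach. The function name `levenshtein_sketch` is hypothetical and exists only for illustration; it is not part of the package and says nothing about how the C++ library computes the distance internally.

```r
# Illustrative only: a textbook dynamic-programming Levenshtein distance
# in base R. `levenshtein_sketch` is a hypothetical helper, not a
# RapidFuzz function.
levenshtein_sketch <- function(a, b) {
  x <- strsplit(a, "")[[1]]  # characters of the first string
  y <- strsplit(b, "")[[1]]  # characters of the second string
  n <- length(x)
  m <- length(y)

  # prev[j + 1] is the distance between the first i - 1 characters of `a`
  # and the first j characters of `b`; row 0 is just 0, 1, ..., m.
  prev <- 0:m
  for (i in seq_len(n)) {
    curr <- integer(m + 1)
    curr[1] <- i  # deleting all i characters of `a` so far
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      curr[j + 1] <- min(
        prev[j + 1] + 1L,  # deletion
        curr[j] + 1L,      # insertion
        prev[j] + cost     # substitution (or match when cost is 0)
      )
    }
    prev <- curr
  }
  prev[m + 1]
}

levenshtein_sketch("kitten", "sitting")  # 3
levenshtein_sketch("flaw", "lawn")       # 2
```

The sketch keeps only two rows of the table, so memory grows with the length of the second string rather than with the product of both lengths. In practice the package's own `levenshtein_distance()` would be the natural choice for real workloads.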
RapidFuzz leverages advanced algorithms to ensure high performance while maintaining accuracy. The original library is open-source and can be accessed at the RapidFuzz GitHub repository.
processString(): Process a string with options to trim, convert to lowercase, and transliterate to ASCII.
opcodes_apply_str(): Apply Opcodes to transform a string.
opcodes_apply_vec(): Apply Opcodes to transform a string into a character vector.
get_editops(): Retrieve Edit Operations between two strings.
editops_apply_str(): Apply Edit Operations to transform a string.
editops_apply_vec(): Apply Edit Operations to transform a string into a character vector.

damerau_levenshtein_distance(): Calculate the Damerau-Levenshtein Distance.
damerau_levenshtein_normalized_distance(): Calculate the Normalized Damerau-Levenshtein Distance.
damerau_levenshtein_normalized_similarity(): Calculate the Normalized Damerau-Levenshtein Similarity.
damerau_levenshtein_similarity(): Calculate the Damerau-Levenshtein Similarity.

fuzz_QRatio(): Perform a Quick Ratio Calculation.
fuzz_WRatio(): Perform a Weighted Ratio Calculation.
fuzz_partial_ratio(): Calculate Partial Ratio.
fuzz_ratio(): Calculate a Simple Ratio.
fuzz_token_ratio(): Calculate Combined Token Ratio.
fuzz_token_set_ratio(): Perform Token Set Ratio Calculation.
fuzz_token_sort_ratio(): Perform Token Sort Ratio Calculation.

hamming_distance(): Calculate Hamming Distance.
hamming_normalized_distance(): Calculate Normalized Hamming Distance.
hamming_normalized_similarity(): Calculate Normalized Hamming Similarity.
hamming_similarity(): Calculate Hamming Similarity.

indel_distance(): Calculate Indel Distance.
indel_normalized_distance(): Calculate Normalized Indel Distance.
indel_normalized_similarity(): Calculate Normalized Indel Similarity.
indel_similarity(): Calculate Indel Similarity.

jaro_distance(): Calculate Jaro Distance.
jaro_normalized_distance(): Calculate Normalized Jaro Distance.
jaro_normalized_similarity(): Calculate Normalized Jaro Similarity.
jaro_similarity(): Calculate Jaro Similarity.

jaro_winkler_distance(): Calculate Jaro-Winkler Distance.
jaro_winkler_normalized_distance(): Calculate Normalized Jaro-Winkler Distance.
jaro_winkler_normalized_similarity(): Calculate Normalized Jaro-Winkler Similarity.
jaro_winkler_similarity(): Calculate Jaro-Winkler Similarity.

lcs_seq_distance(): Calculate LCSseq Distance.
lcs_seq_editops(): Retrieve LCSseq Edit Operations.
lcs_seq_normalized_distance(): Calculate Normalized LCSseq Distance.
lcs_seq_normalized_similarity(): Calculate Normalized LCSseq Similarity.
lcs_seq_similarity(): Calculate LCSseq Similarity.

levenshtein_distance(): Calculate Levenshtein Distance.
levenshtein_normalized_distance(): Calculate Normalized Levenshtein Distance.
levenshtein_normalized_similarity(): Calculate Normalized Levenshtein Similarity.
levenshtein_similarity(): Calculate Levenshtein Similarity.

osa_distance(): Calculate Distance Using OSA.
osa_editops(): Retrieve Edit Operations Using OSA.
osa_normalized_distance(): Calculate Normalized Distance Using OSA.
osa_normalized_similarity(): Calculate Normalized Similarity Using OSA.
osa_similarity(): Calculate Similarity Using OSA.

prefix_distance(): Calculate the Prefix Distance between two strings.
prefix_normalized_distance(): Calculate the Normalized Prefix Distance between two strings.
prefix_normalized_similarity(): Calculate the Normalized Prefix Similarity between two strings.
prefix_similarity(): Calculate the Prefix Similarity between two strings.

    prefix_distance("abcdef", "abcxyz")
    # Output: 3

    prefix_normalized_similarity("abcdef", "abcxyz", score_cutoff = 0.0)
    # Output: 0.5

    postfix_distance("abcdef", "xyzdef")
    # Output: 3

    damerau_levenshtein_distance("abcdef", "abcfed")
    # Output: 2

    # Example data
    query <- "new york jets"
    choices <- c("Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys")
    score_cutoff <- 0.0

    # Find the best match
    extract_matches(query, choices, score_cutoff, scorer = "PartialRatio")
    # Output:
    #            choice     score
    # 1   New York Jets 100.00000
    # 2 New York Giants  81.81818
    # 3 Atlanta Falcons  33.33333

The RapidFuzz package is a wrapper of the RapidFuzz C++ library, developed by Max Bachmann and Adam Cohen. The library implements efficient algorithms for approximate string matching and comparison.
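As a rough guide to reading the prefix and postfix outputs in the examples above, the following base R sketch reproduces those numbers under one assumed definition: similarity is the length of the shared prefix, distance is the longer string's length minus that similarity, and the normalized similarity divides by the longer length. These definitions are an assumption chosen to be consistent with the outputs shown, and every `*_sketch` name is hypothetical rather than part of the package.

```r
# Illustrative only: prefix/postfix metrics in base R under the assumed
# definitions described in the text. None of these helpers are RapidFuzz
# functions.
common_prefix_length <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  k <- min(length(x), length(y))
  if (k == 0) return(0L)
  # Positions (within the shorter length) where the strings disagree
  mismatch <- which(x[seq_len(k)] != y[seq_len(k)])
  if (length(mismatch) == 0) k else mismatch[1] - 1L
}

prefix_distance_sketch <- function(a, b) {
  max(nchar(a), nchar(b)) - common_prefix_length(a, b)
}

prefix_normalized_similarity_sketch <- function(a, b) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(1)  # two empty strings are treated as identical
  common_prefix_length(a, b) / longest
}

# Postfix distance is the prefix distance of the reversed strings.
reverse_string <- function(s) paste(rev(strsplit(s, "")[[1]]), collapse = "")

postfix_distance_sketch <- function(a, b) {
  prefix_distance_sketch(reverse_string(a), reverse_string(b))
}

prefix_distance_sketch("abcdef", "abcxyz")               # 3
prefix_normalized_similarity_sketch("abcdef", "abcxyz")  # 0.5
postfix_distance_sketch("abcdef", "xyzdef")              # 3
```

The sketch ignores `score_cutoff` entirely, so it only mirrors the package results when no cutoff is in effect, as in the `score_cutoff = 0.0` example.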