RapidFuzz

Provides a high-performance interface for calculating string similarities and distances, leveraging the efficient C++ library RapidFuzz developed by Max Bachmann and Adam Cohen. This package integrates the C++ implementation, allowing R users to access cutting-edge algorithms for fuzzy matching and text analysis.

Installation

You can install RapidFuzz directly from CRAN, or install the development version from GitHub with:

    # install.packages("pak")
    pak::pak("StrategicProjects/RapidFuzz")
    library(RapidFuzz)

Overview

The RapidFuzz package is an R wrapper around the highly efficient RapidFuzz C++ library. It provides implementations of multiple string comparison and similarity metrics, such as Levenshtein, Jaro-Winkler, and Damerau-Levenshtein distances. This package is particularly useful for applications like record linkage, approximate string matching, and fuzzy text processing.

String comparison algorithms calculate distances and similarities between two sequences of characters. These measures quantify how similar two strings are. For example, the Levenshtein distance is the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another.
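To make the edit-count idea concrete, here is a minimal base R sketch of the classic dynamic-programming Levenshtein distance. It is illustrative only: `levenshtein_sketch` is a hypothetical helper written for this README, not a RapidFuzz function, and the package's C++ backend should not be assumed to work this way.

```r
# Illustrative sketch only: a plain dynamic-programming Levenshtein distance
# in base R. `levenshtein_sketch` is a hypothetical name, not a RapidFuzz API.
levenshtein_sketch <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  # d[i + 1, j + 1] holds the distance between the first i characters of `a`
  # and the first j characters of `b`.
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n  # deleting i characters to reach the empty string
  d[1, ] <- 0:m  # inserting j characters starting from the empty string

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution, or a free match
      )
    }
  }
  d[n + 1, m + 1]
}

levenshtein_sketch("kitten", "sitting")  # 3
```

Base R's `utils::adist()` computes the same quantity with its default costs, so it can serve as a cross-check for the sketch.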

RapidFuzz leverages advanced algorithms to ensure high performance while maintaining accuracy. The original library is open-source and can be accessed on the RapidFuzz GitHub Repository.


Functions

Process String Function

Opcode Functions

Edit Operation Utilities

Edit Operations Functions

Damerau-Levenshtein Functions

Fuzz Ratio Functions

Hamming Functions

Indel Functions

Jaro Functions

Jaro-Winkler Functions

Longest Common Subsequence (LCSseq) Functions

Levenshtein Functions

Optimal String Alignment (OSA) Functions

Prefix Functions


Example Usage

Prefix Functions

    prefix_distance("abcdef", "abcxyz")
    # Output: 3
    prefix_normalized_similarity("abcdef", "abcxyz", score_cutoff = 0.0)
    # Output: 0.5

Postfix Functions

    postfix_distance("abcdef", "xyzdef")
    # Output: 3
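The prefix and postfix outputs above are consistent with a simple definition: similarity is the number of shared leading (or trailing) characters, and distance is the length of the longer string minus that count. The base R sketch below reproduces those numbers under that assumed definition. All function names ending in `_sketch`, plus `common_prefix_length` and `reverse_string`, are hypothetical helpers and not part of the package.

```r
# Hypothetical helper (not a RapidFuzz function): number of leading
# characters that two strings have in common.
common_prefix_length <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  k <- min(length(x), length(y))
  if (k == 0) return(0L)
  mismatch <- which(x[seq_len(k)] != y[seq_len(k)])
  if (length(mismatch) == 0) k else mismatch[1] - 1L
}

# Assumed definition: length of the longer string minus the shared prefix.
prefix_distance_sketch <- function(a, b) {
  max(nchar(a), nchar(b)) - common_prefix_length(a, b)
}

# Assumed definition: shared prefix length divided by the longer length.
prefix_normalized_similarity_sketch <- function(a, b) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(1)
  common_prefix_length(a, b) / longest
}

# A shared suffix is a shared prefix of the reversed strings.
reverse_string <- function(s) paste(rev(strsplit(s, "")[[1]]), collapse = "")
postfix_distance_sketch <- function(a, b) {
  prefix_distance_sketch(reverse_string(a), reverse_string(b))
}

prefix_distance_sketch("abcdef", "abcxyz")               # 3
prefix_normalized_similarity_sketch("abcdef", "abcxyz")  # 0.5
postfix_distance_sketch("abcdef", "xyzdef")              # 3
```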

Damerau-Levenshtein Functions

    damerau_levenshtein_distance("abcdef", "abcfed")
    # Output: 2

Extract Matches

    # Example data
    query <- "new york jets"
    choices <- c("Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys")
    score_cutoff <- 0.0

    # Find the best match
    extract_matches(query, choices, score_cutoff, scorer = "PartialRatio")
    # Output:
    #            choice     score
    # 1   New York Jets 100.00000
    # 2 New York Giants  81.81818
    # 3 Atlanta Falcons  33.33333
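In a record-linkage workflow, a result like the one above is usually filtered by score before use. The sketch below assumes the result behaves like a data frame with `choice` and `score` columns, as the printed output suggests. To stay self-contained it rebuilds that table by hand instead of calling the package, and the threshold of 80 is an arbitrary illustrative value.

```r
# Hand-built copy of the output printed above; in real use this would be
# the value returned by extract_matches() (assumed to be a data frame).
matches <- data.frame(
  choice = c("New York Jets", "New York Giants", "Atlanta Falcons"),
  score = c(100.00000, 81.81818, 33.33333),
  stringsAsFactors = FALSE
)

# Keep only candidates at or above an illustrative threshold.
min_score <- 80
strong <- matches[matches$score >= min_score, ]

# The single best candidate is the row with the highest score.
best <- matches$choice[which.max(matches$score)]

strong$choice  # "New York Jets" "New York Giants"
best           # "New York Jets"
```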

Original Library

The RapidFuzz package is a wrapper of the RapidFuzz C++ library, developed by Max Bachmann and Adam Cohen. The library implements efficient algorithms for approximate string matching and comparison.
