agext/levenshtein

Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix.

A Go package for calculating the Levenshtein distance between two strings

This package implements distance and similarity metrics for strings, based on the Levenshtein measure, in Go.

Project Status

v1.2.3 Stable: Guaranteed no breaking changes to the API in future v1.x releases. Probably safe to use in production, though provided on an "AS IS" basis.

This package is being actively maintained. If you encounter any problems or have any suggestions for improvement, please open an issue. Pull requests are welcome.

Overview

The Levenshtein Distance between two strings is the minimum total cost of edits that would convert the first string into the second. The allowed edit operations are insertions, deletions, and substitutions, all at character (one UTF-8 code point) level. Each operation has a default cost of 1, but each can be assigned its own cost equal to or greater than 0.
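To make the definition concrete, here is a minimal, standard-library-only sketch of a Levenshtein distance with per-operation costs. The function name weightedDistance and its parameter order are assumptions made for illustration; this is not the package's API or its implementation.

```go
package main

import "fmt"

// weightedDistance is a hypothetical helper, not part of the package.
// It returns the minimum total cost of converting a into b, comparing
// the strings rune by rune (one code point at a time).
func weightedDistance(a, b string, insCost, subCost, delCost int) int {
	r1, r2 := []rune(a), []rune(b)
	// prev[j] is the cost of converting the already processed part of r1 into r2[:j].
	prev := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j * insCost
	}
	cur := make([]int, len(r2)+1)
	for i := 1; i <= len(r1); i++ {
		cur[0] = i * delCost
		for j := 1; j <= len(r2); j++ {
			best := prev[j-1] // equal runes need no edit
			if r1[i-1] != r2[j-1] {
				best += subCost
			}
			if c := prev[j] + delCost; c < best {
				best = c
			}
			if c := cur[j-1] + insCost; c < best {
				best = c
			}
			cur[j] = best
		}
		prev, cur = cur, prev
	}
	return prev[len(r2)]
}

func main() {
	fmt.Println(weightedDistance("kitten", "sitting", 1, 1, 1)) // 3
	// A substitution costing more than delete+insert is never chosen.
	fmt.Println(weightedDistance("kitten", "sitting", 1, 3, 1)) // 5
	// Multi-byte characters count as a single edit.
	fmt.Println(weightedDistance("héllo", "hello", 1, 1, 1)) // 1
}
```

Working on []rune rather than on bytes is what keeps the edits at character level: "é" occupies two bytes in UTF-8 but counts as one substitution.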

A Distance of 0 means the two strings are identical, and the higher the value the more different the strings. Since in practice we are interested in finding if the two strings are "close enough", it often does not make sense to continue the calculation once the result is mathematically guaranteed to exceed a desired threshold. Providing this value to the Distance function allows it to take a shortcut and return a lower bound instead of an exact cost when the threshold is exceeded.
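The shortcut can be illustrated with a small sketch that assumes unit costs. The name boundedDistance, and the convention that a maxCost of 0 disables the limit, are assumptions of this example and not a description of how the package implements its threshold.

```go
package main

import "fmt"

// boundedDistance is a hypothetical helper, not part of the package.
// With unit costs it returns the exact distance when that is at most
// maxCost; otherwise it returns a value greater than maxCost that is a
// lower bound on the true distance. maxCost <= 0 disables the shortcut.
func boundedDistance(a, b string, maxCost int) int {
	r1, r2 := []rune(a), []rune(b)
	// The length difference alone is a lower bound on the distance.
	diff := len(r1) - len(r2)
	if diff < 0 {
		diff = -diff
	}
	if maxCost > 0 && diff > maxCost {
		return diff
	}
	prev := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j
	}
	cur := make([]int, len(r2)+1)
	for i := 1; i <= len(r1); i++ {
		cur[0] = i
		rowMin := cur[0]
		for j := 1; j <= len(r2); j++ {
			best := prev[j-1]
			if r1[i-1] != r2[j-1] {
				best++
			}
			if c := prev[j] + 1; c < best {
				best = c
			}
			if c := cur[j-1] + 1; c < best {
				best = c
			}
			cur[j] = best
			if best < rowMin {
				rowMin = best
			}
		}
		// Every edit path crosses this row and costs never decrease,
		// so the row minimum is a lower bound on the final distance.
		if maxCost > 0 && rowMin > maxCost {
			return rowMin
		}
		prev, cur = cur, prev
	}
	return prev[len(r2)]
}

func main() {
	fmt.Println(boundedDistance("kitten", "sitting", 5))    // 3, exact
	fmt.Println(boundedDistance("abcdefgh", "zyxwvuts", 2)) // stops early with a bound above 2
}
```

A caller that only needs a yes/no answer can compare the result with maxCost: any value above the threshold means "not close enough", whether or not it is the exact distance.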

The Similarity function calculates the distance, then converts it into a normalized metric within the range 0..1, with 1 meaning the strings are identical, and 0 that they have nothing in common. A minimum similarity threshold can be provided to speed up the calculation of the metric for strings that are far too dissimilar for the purpose at hand. All values under this threshold are rounded down to 0.
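One plausible way to turn a distance into such a score is sketched below. It assumes unit edit costs, so that the largest possible distance equals the rune length of the longer string; the package's exact normalization, especially with custom costs, may differ. The name similarity and its parameters are hypothetical.

```go
package main

import "fmt"

// similarity is a hypothetical helper, not part of the package.
// It assumes unit edit costs, so the largest possible distance is the
// rune length of the longer string. Scores below minScore become 0.
func similarity(dist, len1, len2 int, minScore float64) float64 {
	maxDist := len1
	if len2 > maxDist {
		maxDist = len2
	}
	if maxDist == 0 {
		return 1 // two empty strings are identical
	}
	sim := 1 - float64(dist)/float64(maxDist)
	if sim < minScore {
		return 0
	}
	return sim
}

func main() {
	// "kitten" (6 runes) and "sitting" (7 runes) are 3 unit-cost edits apart.
	fmt.Printf("%.3f\n", similarity(3, 6, 7, 0))   // 0.571
	fmt.Printf("%.3f\n", similarity(3, 6, 7, 0.8)) // 0.000, below the threshold
}
```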

The Match function provides a similarity metric, with the same range and meaning as Similarity, but with a bonus for string pairs that share a common prefix and have a similarity above a "bonus threshold". It uses the same method as proposed by Winkler for the Jaro distance, and the reasoning behind it is that these string pairs are very likely spelling variations or errors, and they are more closely linked than the edit distance alone would suggest.

The underlying Calculate function is also exported, to allow the building of other derivative metrics, if needed.
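The Winkler adjustment raises a score by a fraction of its remaining gap to 1, proportional to the length of the shared prefix. The sketch below applies that formula to an already computed similarity. The helper names are hypothetical, and the prefix cap of 4 and scale of 0.1 are the conventional Jaro-Winkler values, used here as assumptions and not as the package's documented defaults.

```go
package main

import "fmt"

// commonPrefixLen counts the leading runes that a and b share.
func commonPrefixLen(a, b string) int {
	r1, r2 := []rune(a), []rune(b)
	n := 0
	for n < len(r1) && n < len(r2) && r1[n] == r2[n] {
		n++
	}
	return n
}

// withPrefixBonus is a hypothetical helper, not part of the package.
// It applies a Winkler-style bonus: scores at or above threshold (and
// below 1) gain prefixLen*scale of their remaining distance to 1, with
// the prefix length capped at maxPrefix.
func withPrefixBonus(sim float64, prefixLen, maxPrefix int, scale, threshold float64) float64 {
	if sim < threshold || sim >= 1 {
		return sim
	}
	if prefixLen > maxPrefix {
		prefixLen = maxPrefix
	}
	sim += float64(prefixLen) * scale * (1 - sim)
	if sim > 1 {
		sim = 1
	}
	return sim
}

func main() {
	// "martha" vs "marhta": two unit-cost substitutions over 6 runes.
	sim := 1 - 2.0/6.0
	p := commonPrefixLen("martha", "marhta") // 3 ("mar")
	fmt.Printf("%.3f\n", withPrefixBonus(sim, p, 4, 0.1, 0.6)) // 0.767, bonus applied
	fmt.Printf("%.3f\n", withPrefixBonus(sim, p, 4, 0.1, 0.7)) // 0.667, under the bonus threshold
}
```

Because the bonus is scaled by (1 - sim), it can never push a score past 1 when prefixLen*scale is at most 1, and pairs that are already near-identical gain little.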

Installation

go get github.com/agext/levenshtein

License

Package levenshtein is released under the Apache 2.0 license. See the LICENSE file for details.

Releases

v1.2.3 (latest), Mar 12, 2020, plus 5 earlier releases
