Simhash implementation in JavaScript


vkandy/simhash-js


A JavaScript implementation of Charikar's hash for identifying similar documents.

What is Simhash?

Consider two documents A and B that differ in just a single byte.

Cryptographic hash functions such as SHA-2 or MD5 hash these two documents to completely different, unrelated values, so the Hamming distance between md5(A) and md5(B) would be large. This is by design: such functions exhibit an avalanche effect, where even a one-bit change in the input changes each output bit with a probability of roughly one half, so similar inputs do not yield similar hashes.

By contrast, Simhash will hash contents of A and B to similar hash values. The Hamming distance between simhash(A) and simhash(B) would be small.
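The idea can be sketched in a few lines. This is a minimal 32-bit illustration of Charikar-style fingerprinting, not the code used by simhash-js: the helper names `fnv1a32`, `simhash32` and `hammingDistance` are hypothetical, tokens are simply whitespace-separated words with equal weight, and FNV-1a is an arbitrary choice of per-token hash.

```javascript
// Illustrative sketch only; NOT the simhash-js implementation.
// All function names below are hypothetical.

// 32-bit FNV-1a style hash, applied here to UTF-16 code units.
function fnv1a32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, 32-bit multiply
  }
  return h >>> 0; // force unsigned
}

// Each token votes +1 or -1 on every bit position;
// the fingerprint keeps the bits with a positive total.
function simhash32(text) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = new Array(32).fill(0);
  for (const token of tokens) {
    const h = fnv1a32(token);
    for (let bit = 0; bit < 32; bit++) {
      counts[bit] += (h >>> bit) & 1 ? 1 : -1;
    }
  }
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    if (counts[bit] > 0) fingerprint |= 1 << bit;
  }
  return fingerprint >>> 0;
}

// Number of bit positions in which two 32-bit values differ.
function hammingDistance(a, b) {
  let x = (a ^ b) >>> 0;
  let d = 0;
  while (x !== 0) {
    d += x & 1;
    x >>>= 1;
  }
  return d;
}

const a = simhash32("This is a test of the Emergency Blogcast System");
const b = simhash32("This is a second test of the Emergency Blogcast System");
console.log(hammingDistance(a, b));
```

Because the two inputs share most of their tokens, most bit positions receive nearly the same votes, which is why the fingerprints tend to differ in only a few bits.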

Usage

    var sjs = require('simhash-js');
    var simhash = new sjs.SimHash();
    var x = simhash.hash("This is a test of the Emergency Blogcast System");
    var y = simhash.hash("This is a second test of the Emergency Blogcast System");
    var s = sjs.Comparator.similarity(x, y);
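This README does not say how `Comparator.similarity` derives its score. As a rough illustration of one plausible approach, the sketch below maps the Hamming distance between two 32-bit fingerprints to a value between 0 and 1. The names `bitCount32` and `hammingSimilarity` are hypothetical and are not part of simhash-js; the 32-bit fingerprint width is also an assumption.

```javascript
// Hypothetical helpers; not the simhash-js Comparator.

// Count the set bits in a 32-bit integer (Kernighan's method).
function bitCount32(n) {
  let count = 0;
  while (n !== 0) {
    n &= n - 1; // clears the lowest set bit
    count++;
  }
  return count;
}

// 1 means identical fingerprints, 0 means every bit differs.
function hammingSimilarity(x, y) {
  return 1 - bitCount32(x ^ y) / 32;
}

console.log(hammingSimilarity(0b1111, 0b0000)); // 4 of 32 bits differ: 0.875
```

A near-duplicate check would then compare the score against a threshold chosen for the application.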

To Do

  • Implement an efficient priority queue
  • Accept a list of stop words to be removed from input prior to calculating hash
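Until stop-word support exists in the library, the second item could be approximated by filtering the input before hashing. The sketch below is one possible pre-processing step; `removeStopWords` is a hypothetical helper, not part of simhash-js, and it assumes whitespace-separated words and case-insensitive matching.

```javascript
// Hypothetical pre-processing helper; not part of simhash-js.
function removeStopWords(text, stopWords) {
  // Normalize the stop words once so lookups are case-insensitive.
  const stop = new Set(stopWords.map((w) => w.toLowerCase()));
  return text
    .split(/\s+/)
    .filter((word) => word.length > 0 && !stop.has(word.toLowerCase()))
    .join(' ');
}

const cleaned = removeStopWords(
  'This is a test of the Emergency Blogcast System',
  ['this', 'is', 'a', 'of', 'the']
);
console.log(cleaned); // test Emergency Blogcast System
```

The cleaned string can then be passed to the hash function in place of the raw text.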

References

  • Charikar: Similarity Estimation Techniques from Rounding Algorithms, in Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, ACM Press, 2002
  • Manku, Jain, Sarma: Detecting Near-Duplicates for Web Crawling, in Proceedings of the 16th international conference on World Wide Web, ACM Press, 2007

Contributors

Sincere thanks to:
