Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

Counter Data structure for Golang using CountMin Sketch with a fixed amount of memory

NotificationsYou must be signed in to change notification settings

dav009/abacus

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Abacus

Abacus let you count item frequencies in big datasets with a fixed amount of memory.

Unlike a regular counter it trades off accuracy for memory.This is useful for particular tasks, for example in NLP/ML related tasks you might want to count millions of itemshowever approximate counts are good enough.

Example:

counter:=abacus.New(maxMemoryMB=10)// abacus will use max 10MB to store your countscounter.Update([]string{"item1","item2","item2"})counter.Counts("item1")// 1 , counts for "item1"counter.Total()// 3 ,Total number of counts (sum of counts of all elements)counter.Cardinality()// 2 , How many different items are there?

Abacus lets you define how much memory you want to use and you go from there counting items.Of course there are some limitations, and if you set the memory threshold too low, you might get innacurate counts.

Benchmarks

  • Counting bigrams (words) fromWiki corpus.
  • Compared memory and accuracy ofAbacus vs using amap[string]int

Corpus Data Structure Used Memory Accuracy

CorpusData StructureUsed MemoryAccuracy
Half of Wiki corpus (English)Abacus (1000MB)1.75GB96%
Half of Wiki corpus (English)Abacus (Log8) (200MB)369MB70%
Half of Wiki corpus (English)Abacus (Log8) (400MB)407MB98%
Half of Wiki corpus (English)Map3.3GB100%
CorpusData StructureUsed MemoryAccuracy
Complete Wiki corpus (English)Abacus (2200MB)3.63GB98%
Complete Wiki corpus (English)Abacus (500MB)741MB15%
Complete Wiki corpus (English)Abacus (Log8) (500MB)760MB90%
Complete Wiki corpus (English)Abacus (Log8) (700MB)889MB97%
Complete Wiki corpus (English)Map10.46GB100%

Note: This is me playing with Golang again, heavily based onBounter

Under the hood

Count–min sketch

Used to count item frequencies.

HyperLogLog

Used to calculate the cardinality


Icon made byfree-icon

About

Counter Data structure for Golang using CountMin Sketch with a fixed amount of memory

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages


[8]ページ先頭

©2009-2025 Movatter.jp