dav009/abacus
Abacus lets you count item frequencies in big datasets with a fixed amount of memory.
Unlike a regular counter, it trades accuracy for memory. This is useful when approximate counts are good enough: in NLP/ML-related tasks, for example, you might want to count millions of items.
Example:
```go
counter := abacus.New(maxMemoryMB=10) // abacus will use max 10MB to store your counts
counter.Update([]string{"item1", "item2", "item2"})
counter.Counts("item1")  // 1, counts for "item1"
counter.Total()          // 3, total number of counts (sum of counts of all elements)
counter.Cardinality()    // 2, how many different items are there?
```
Abacus lets you define how much memory you want to use, and you start counting items from there. There are some limitations: if you set the memory threshold too low, you might get inaccurate counts.
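To make the memory/accuracy trade-off concrete, here is a minimal sketch of a count-min sketch, one common structure for fixed-memory counting. This is an illustration only, not Abacus's actual implementation: the `CountMin` type, its constructor, and the `depth`/`width` parameters are all hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMin is a hypothetical fixed-memory counter used for illustration.
// It is NOT the type Abacus exposes.
type CountMin struct {
	depth, width uint32
	table        [][]uint32
}

// NewCountMin allocates depth rows of width counters. Memory use is
// decided here and never grows, no matter how many items are counted.
func NewCountMin(depth, width uint32) *CountMin {
	t := make([][]uint32, depth)
	for i := range t {
		t[i] = make([]uint32, width)
	}
	return &CountMin{depth: depth, width: width, table: t}
}

// index picks the column for item in the given row, deriving one
// position per row from the two halves of a 64-bit FNV-1a hash.
func (c *CountMin) index(item string, row uint32) uint32 {
	h := fnv.New64a()
	h.Write([]byte(item))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	return (h1 + row*h2) % c.width
}

// Update increments one counter per row for every item.
func (c *CountMin) Update(items []string) {
	for _, it := range items {
		for r := uint32(0); r < c.depth; r++ {
			c.table[r][c.index(it, r)]++
		}
	}
}

// Counts returns the smallest counter the item maps to. Collisions can
// only inflate a counter, so the estimate is never below the true count.
func (c *CountMin) Counts(item string) uint32 {
	min := c.table[0][c.index(item, 0)]
	for r := uint32(1); r < c.depth; r++ {
		if v := c.table[r][c.index(item, r)]; v < min {
			min = v
		}
	}
	return min
}

func main() {
	items := []string{"item1", "item2", "item2"}

	roomy := NewCountMin(4, 1024)
	roomy.Update(items)
	fmt.Println("width 1024:", roomy.Counts("item1"), roomy.Counts("item2"))

	// With a single column every item collides, so each estimate
	// degrades to the total number of updates.
	tiny := NewCountMin(2, 1)
	tiny.Update(items)
	fmt.Println("width 1:", tiny.Counts("item1"), tiny.Counts("item2"))
}
```

In this sketch, shrinking `width` plays the role of lowering the memory threshold: fewer counters mean more collisions, and collisions show up as overestimated counts.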
- Counting bigrams (words) from Wiki corpus.
- Compared memory and accuracy of Abacus vs using a `map[string]int`.
Corpus | Data Structure | Used Memory | Accuracy |
---|---|---|---|
Half of Wiki corpus (English) | Abacus (1000MB) | 1.75GB | 96% |
Half of Wiki corpus (English) | Abacus (Log8) (200MB) | 369MB | 70% |
Half of Wiki corpus (English) | Abacus (Log8) (400MB) | 407MB | 98% |
Half of Wiki corpus (English) | Map | 3.3GB | 100% |
Corpus | Data Structure | Used Memory | Accuracy |
---|---|---|---|
Complete Wiki corpus (English) | Abacus (2200MB) | 3.63GB | 98% |
Complete Wiki corpus (English) | Abacus (500MB) | 741MB | 15% |
Complete Wiki corpus (English) | Abacus (Log8) (500MB) | 760MB | 90% |
Complete Wiki corpus (English) | Abacus (Log8) (700MB) | 889MB | 97% |
Complete Wiki corpus (English) | Map | 10.46GB | 100% |
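The tables do not say how the accuracy column was computed. As a rough sketch of how such a comparison could be set up, the code below counts bigrams exactly with a `map[string]int` and scores an approximate counter against it. The metric used here (the fraction of distinct items whose approximate count matches the exact count) is an assumption, and `Accuracy` and `Bigrams` are hypothetical helpers, not part of Abacus.

```go
package main

import "fmt"

// Bigrams returns every pair of adjacent words joined by a space.
func Bigrams(words []string) []string {
	if len(words) < 2 {
		return nil
	}
	out := make([]string, 0, len(words)-1)
	for i := 0; i+1 < len(words); i++ {
		out = append(out, words[i]+" "+words[i+1])
	}
	return out
}

// Accuracy returns the fraction of distinct items for which approx
// reports exactly the count stored in exact. This definition is an
// assumption made for the example; other definitions are possible.
func Accuracy(exact map[string]int, approx func(string) int) float64 {
	if len(exact) == 0 {
		return 1
	}
	hits := 0
	for item, want := range exact {
		if approx(item) == want {
			hits++
		}
	}
	return float64(hits) / float64(len(exact))
}

func main() {
	words := []string{"the", "cat", "sat", "on", "the", "cat"}

	// Exact baseline: a plain map, as in the "Map" rows of the tables.
	exact := map[string]int{}
	for _, b := range Bigrams(words) {
		exact[b]++
	}

	// Stand-in for an approximate counter: it overestimates one bigram.
	approx := func(item string) int {
		if item == "sat on" {
			return exact[item] + 1
		}
		return exact[item]
	}

	fmt.Printf("accuracy: %.2f\n", Accuracy(exact, approx))
}
```

In a real comparison, the `approx` function would wrap a lookup on the approximate counter instead of the stand-in used here.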
Note: This is me playing with Golang again, heavily based on Bounter.
Used to count item frequencies.
Used to calculate the cardinality.
Icon made by free-icon