- Notifications
You must be signed in to change notification settings - Fork70
chaoslawful/ccard-lib
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
C library for estimating cardinality in data streams, in which case it isinfeasible to store all events in memory.
This library implements a series of cardinality estimating algorithms such asLinear Counting, LogLog Counting, HyperLogLog Counting and Adaptive Counting.For more information about these algorithms please read theReference section.
Building ccard-lib needsscons. Please readsconsuser guide formore information about it.
Building PHP extension of ccard-lib needsSWIG to beinstalled. Running unit-tests needsgoogletest to be installed.
Assuming you have scons installed, just build ccard-lib like this:
scons install
Scons will build and install ccard-lib to your system.
You can also run unit-tests to make sure the library works as expected:
sconstest
By default ccard-lib will be installed at/usr/local/lib
, if you want tochange the install directory please replace the "libdir" setting inSConsturct
file with your target directory.
The following command will build and install card-lib PHP extension:
scons install-php
SWIG is used to generate PHP extension, please installit before run this command.
If you want to uninstall ccard-lib from your system, use the followingcommands:
scons -c install-phpscons -c install
#include"ccard_common.h"#include"adaptive_counting.h"intmain(intargc,char**argv) {int64_ti,esti;/* construct context for cardinality estimator *//* use xxx_cnt_init to construct context */adp_cnt_ctx_t*ctx=adp_cnt_init(NULL,16,CCARD_HASH_MURMUR);printf("Adaptive Counting with Murmurhash:\n");/* add 500,000 elements to set */for (i=1;i <=500000L;i++) {/* use xxx_cnt_offer to add new element to set */adp_cnt_offer(ctx,&i,sizeof(int64_t));/* print estimate result every 50,000 elements has been added */if (i %50000==0) {/* use xxx_cnt_card to get estimate result */esti=adp_cnt_card(ctx);printf("actual: %9lu, estimated: %9lu, error: %+7.2f%%\n", (long unsignedint)i, (long unsignedint)esti, (double)(esti-i) /i*100); } }printf("\n");/* use xxx_cnt_fini to destory context */adp_cnt_fini(ctx);}
#include"ccard_common.h"#include"adaptive_counting.h"intmain(intargc,char**argv) {int64_ti,esti;/* for merging, contexts must have same length of bitmap and hash algorithm */adp_cnt_ctx_t*ctx=adp_cnt_init(NULL,16,CCARD_HASH_LOOKUP3);adp_cnt_ctx_t*tbm1=adp_cnt_init(NULL,16,CCARD_HASH_LOOKUP3);adp_cnt_ctx_t*tbm2=adp_cnt_init(NULL,16,CCARD_HASH_LOOKUP3);int32_tm=1 <<16;/* bitmaps */uint8_tbuf1[m+3],buf2[m+3];uint32_tlen1=m+3,len2=m+3;for (i=1;i <=20000L;i++) {adp_cnt_offer(ctx,&i,sizeof(uint64_t)); }for (i=10000L;i <=30000L;i++) {adp_cnt_offer(tbm1,&i,sizeof(uint64_t)); }/* use xxx_cnt_get_bytes to get bitmap from context */adp_cnt_get_bytes(tbm1,buf1,&len1);for (i=20000L;i <=40000L;i++) {adp_cnt_offer(tbm2,&i,sizeof(uint64_t)); }adp_cnt_get_bytes(tbm2,buf2,&len2);/* use xxx_cnt_merge_bytes to merge bitmaps to context */adp_cnt_merge_bytes(ctx,buf1,len1,buf2,len2,NULL);esti=adp_cnt_card(ctx);printf("actual:40000, estimated: %9lu, error: %+7.2f%%\n", (long unsignedint)esti, (double)(esti-40000) /40000*100);adp_cnt_fini(tbm2);adp_cnt_fini(tbm1);adp_cnt_fini(ctx);}
Source codes should always be formatted before committing by running scriptutil/indent-src
in top-dir. It utilizedastyle to do the job, so you probably want toinstall it first.Make sure you install astyle v2.03 or later, as theindenting result differs from previous versions (seehere for details)
- K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor.[A Linear-TimeProbabilistic Counting Algorithm for Database Applications](http://dblab.kaist.ac.kr/Publication/pdf/ACM90_TODS_v15n2.pdf). ACMTransactions on Database Systems, 15(2):208-229, 1990.
- Marianne Durand and Philippe Flajolet.LogLog counting of largecardinalities.In ESA03, volume 2832 of LNCS, pages 605-617, 2003.
- Min Cai, Jianping Pan, Yu K. Kwok, and Kai Hwang.[Fast and accuratetraffic matrix measurement using adaptive cardinality counting](http://gridsec.usc.edu/files/tr/tr-2005-12.pdf). InMineNet '05: Proceedings of the 2005 ACM SIGCOMM workshop onMining network data, pages 205-206, New York, NY, USA, 2005. ACM.
- P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier.Hyperloglog: The analysis of a near-optimal cardinalityestimation algorithm.Disc. Math. and Theor. Comp. Sci., AH:127-146, 2007.
- Stefan Heule, Marc Nunkesser, Alex Hall.HyperLogLog in Practice:Algorithmic Engineering of a State of The Art Cardinality EstimationAlgorithm.In Proceedings of the EDBT 2013 Conference, ACM, Genoa, Italy.
The implemention refersstream-lib.
The following estimating results is calculated using bitmap with length of 2^16(64k) bytes:
Linear Counting with Murmurhash:actual: 50000, estimated: 50062, error: 0.12%actual: 100000, estimated: 99924, error: 0.08%actual: 150000, estimated: 149865, error: 0.09%actual: 200000, estimated: 199916, error: 0.04%actual: 250000, estimated: 250123, error: 0.05%actual: 300000, estimated: 299942, error: 0.02%actual: 350000, estimated: 349801, error: 0.06%actual: 400000, estimated: 400101, error: 0.03%actual: 450000, estimated: 449955, error: 0.01%actual: 500000, estimated: 500065, error: 0.01%Linear Counting with Lookup3hash:actual: 50000, estimated: 49835, error: 0.33%actual: 100000, estimated: 99461, error: 0.54%actual: 150000, estimated: 149006, error: 0.66%actual: 200000, estimated: 198501, error: 0.75%actual: 250000, estimated: 248365, error: 0.65%actual: 300000, estimated: 298065, error: 0.65%actual: 350000, estimated: 347504, error: 0.71%actual: 400000, estimated: 397292, error: 0.68%actual: 450000, estimated: 446700, error: 0.73%actual: 500000, estimated: 495944, error: 0.81%Hyperloglog Counting with Murmurhash:actual: 50000, estimated: 50015, error: 0.03%actual: 100000, estimated: 100048, error: 0.05%actual: 150000, estimated: 149709, error: 0.19%actual: 200000, estimated: 201595, error: 0.80%actual: 250000, estimated: 250168, error: 0.07%actual: 300000, estimated: 299864, error: 0.05%actual: 350000, estimated: 348571, error: 0.41%actual: 400000, estimated: 398583, error: 0.35%actual: 450000, estimated: 448632, error: 0.30%actual: 500000, estimated: 498330, error: 0.33%Hyperloglog Counting with Lookup3hash:actual: 50000, estimated: 49628, error: 0.74%actual: 100000, estimated: 99357, error: 0.64%actual: 150000, estimated: 148880, error: 0.75%actual: 200000, estimated: 200475, error: 0.24%actual: 250000, estimated: 249362, error: 0.26%actual: 300000, estimated: 299119, error: 0.29%actual: 350000, estimated: 349225, error: 0.22%actual: 400000, estimated: 398805, error: 0.30%actual: 450000, estimated: 448373, error: 0.36%actual: 500000, estimated: 498183, error: 0.36%Adaptive Counting with Murmurhash:actual: 50000, estimated: 50015, error: 0.03%actual: 100000, estimated: 100048, error: 0.05%actual: 150000, estimated: 149709, error: 0.19%actual: 200000, estimated: 201059, error: 0.53%actual: 250000, estimated: 249991, error: 0.00%actual: 300000, estimated: 300067, error: 0.02%actual: 350000, estimated: 349610, error: 0.11%actual: 400000, estimated: 399875, error: 0.03%actual: 450000, estimated: 450348, error: 0.08%actual: 500000, estimated: 500977, error: 0.20%Adaptive Counting with Lookup3hash:actual: 50000, estimated: 49628, error: 0.74%actual: 100000, estimated: 99357, error: 0.64%actual: 150000, estimated: 148880, error: 0.75%actual: 200000, estimated: 199895, error: 0.05%actual: 250000, estimated: 249563, error: 0.17%actual: 300000, estimated: 299047, error: 0.32%actual: 350000, estimated: 348665, error: 0.38%actual: 400000, estimated: 399266, error: 0.18%actual: 450000, estimated: 450196, error: 0.04%actual: 500000, estimated: 499516, error: 0.10%Loglog Counting with Murmurhash:actual: 50000, estimated: 59857, error: 19.71%actual: 100000, estimated: 103108, error: 3.11%actual: 150000, estimated: 150917, error: 0.61%actual: 200000, estimated: 201059, error: 0.53%actual: 250000, estimated: 249991, error: 0.00%actual: 300000, estimated: 300067, error: 0.02%actual: 350000, estimated: 349610, error: 0.11%actual: 400000, estimated: 399875, error: 0.03%actual: 450000, estimated: 450348, error: 0.08%actual: 500000, estimated: 500977, error: 0.20%Loglog Counting with Lookup3hash:actual: 50000, estimated: 59870, error: 19.74%actual: 100000, estimated: 103044, error: 3.04%actual: 150000, estimated: 150435, error: 0.29%actual: 200000, estimated: 199895, error: 0.05%actual: 250000, estimated: 249563, error: 0.17%actual: 300000, estimated: 299047, error: 0.32%actual: 350000, estimated: 348665, error: 0.38%actual: 400000, estimated: 399266, error: 0.18%actual: 450000, estimated: 450196, error: 0.04%actual: 500000, estimated: 499516, error: 0.10%HyperloglogPlus Counting with Murmurhash 64bit:actual: 50000, estimated: 49801, error: 0.40%actual: 100000, estimated: 101098, error: 1.10%actual: 150000, estimated: 151488, error: 0.99%actual: 200000, estimated: 201337, error: 0.67%actual: 250000, estimated: 252130, error: 0.85%actual: 300000, estimated: 301995, error: 0.66%actual: 350000, estimated: 352194, error: 0.63%actual: 400000, estimated: 402413, error: 0.60%actual: 450000, estimated: 454293, error: 0.95%actual: 500000, estimated: 503228, error: 0.65%
About
C library for estimating cardinality in streams for which it is infeasible to store all events in memory
Resources
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Releases
Packages0
Contributors6
Uh oh!
There was an error while loading.Please reload this page.