- Notifications
You must be signed in to change notification settings - Fork70
chaoslawful/ccard-lib
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
C library for estimating cardinality in data streams, in which case it isinfeasible to store all events in memory.
This library implements a series of cardinality estimating algorithms such asLinear Counting, LogLog Counting, HyperLogLog Counting and Adaptive Counting.For more information about these algorithms please read theReference section.
Building ccard-lib needsscons. Please readsconsuser guide formore information about it.
Building PHP extension of ccard-lib needsSWIG to beinstalled. Running unit-tests needsgoogletest to be installed.
Assuming you have scons installed, just build ccard-lib like this:
scons install
Scons will build and install ccard-lib to your system.
You can also run unit-tests to make sure the library works as expected:
sconstest
By default ccard-lib will be installed at/usr/local/lib
, if you want tochange the install directory please replace the "libdir" setting inSConsturct
file with your target directory.
The following command will build and install card-lib PHP extension:
scons install-php
SWIG is used to generate PHP extension, please installit before run this command.
If you want to uninstall ccard-lib from your system, use the followingcommands:
scons -c install-phpscons -c install
#include"ccard_common.h"#include"adaptive_counting.h"intmain(intargc,char**argv) {int64_ti,esti;/* construct context for cardinality estimator *//* use xxx_cnt_init to construct context */adp_cnt_ctx_t*ctx=adp_cnt_init(NULL,16,CCARD_HASH_MURMUR);printf("Adaptive Counting with Murmurhash:\n");/* add 500,000 elements to set */for (i=1;i <=500000L;i++) {/* use xxx_cnt_offer to add new element to set */adp_cnt_offer(ctx,&i,sizeof(int64_t));/* print estimate result every 50,000 elements has been added */if (i %50000==0) {/* use xxx_cnt_card to get estimate result */esti=adp_cnt_card(ctx);printf("actual: %9lu, estimated: %9lu, error: %+7.2f%%\n", (long unsignedint)i, (long unsignedint)esti, (double)(esti-i) /i*100); } }printf("\n");/* use xxx_cnt_fini to destory context */adp_cnt_fini(ctx);}
#include"ccard_common.h"#include"adaptive_counting.h"intmain(intargc,char**argv) {int64_ti,esti;/* for merging, contexts must have same length of bitmap and hash algorithm */adp_cnt_ctx_t*ctx=adp_cnt_init(NULL,16,CCARD_HASH_LOOKUP3);adp_cnt_ctx_t*tbm1=adp_cnt_init(NULL,16,CCARD_HASH_LOOKUP3);adp_cnt_ctx_t*tbm2=adp_cnt_init(NULL,16,CCARD_HASH_LOOKUP3);int32_tm=1 <<16;/* bitmaps */uint8_tbuf1[m+3],buf2[m+3];uint32_tlen1=m+3,len2=m+3;for (i=1;i <=20000L;i++) {adp_cnt_offer(ctx,&i,sizeof(uint64_t)); }for (i=10000L;i <=30000L;i++) {adp_cnt_offer(tbm1,&i,sizeof(uint64_t)); }/* use xxx_cnt_get_bytes to get bitmap from context */adp_cnt_get_bytes(tbm1,buf1,&len1);for (i=20000L;i <=40000L;i++) {adp_cnt_offer(tbm2,&i,sizeof(uint64_t)); }adp_cnt_get_bytes(tbm2,buf2,&len2);/* use xxx_cnt_merge_bytes to merge bitmaps to context */adp_cnt_merge_bytes(ctx,buf1,len1,buf2,len2,NULL);esti=adp_cnt_card(ctx);printf("actual:40000, estimated: %9lu, error: %+7.2f%%\n", (long unsignedint)esti, (double)(esti-40000) /40000*100);adp_cnt_fini(tbm2);adp_cnt_fini(tbm1);adp_cnt_fini(ctx);}
Source codes should always be formatted before committing by running scriptutil/indent-src
in top-dir. It utilizedastyle to do the job, so you probably want toinstall it first.Make sure you install astyle v2.03 or later, as theindenting result differs from previous versions (seehere for details)
- K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor.[A Linear-TimeProbabilistic Counting Algorithm for Database Applications](http://dblab.kaist.ac.kr/Publication/pdf/ACM90_TODS_v15n2.pdf). ACMTransactions on Database Systems, 15(2):208-229, 1990.
- Marianne Durand and Philippe Flajolet.LogLog counting of largecardinalities.In ESA03, volume 2832 of LNCS, pages 605-617, 2003.
- Min Cai, Jianping Pan, Yu K. Kwok, and Kai Hwang.[Fast and accuratetraffic matrix measurement using adaptive cardinality counting](http://gridsec.usc.edu/files/tr/tr-2005-12.pdf). InMineNet '05: Proceedings of the 2005 ACM SIGCOMM workshop onMining network data, pages 205-206, New York, NY, USA, 2005. ACM.
- P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier.Hyperloglog: The analysis of a near-optimal cardinalityestimation algorithm.Disc. Math. and Theor. Comp. Sci., AH:127-146, 2007.
- Stefan Heule, Marc Nunkesser, Alex Hall.HyperLogLog in Practice:Algorithmic Engineering of a State of The Art Cardinality EstimationAlgorithm.In Proceedings of the EDBT 2013 Conference, ACM, Genoa, Italy.
The implemention refersstream-lib.
The following estimating results is calculated using bitmap with length of 2^16(64k) bytes:
Linear Counting with Murmurhash:actual: 50000, estimated: 50062, error: 0.12%actual: 100000, estimated: 99924, error: 0.08%actual: 150000, estimated: 149865, error: 0.09%actual: 200000, estimated: 199916, error: 0.04%actual: 250000, estimated: 250123, error: 0.05%actual: 300000, estimated: 299942, error: 0.02%actual: 350000, estimated: 349801, error: 0.06%actual: 400000, estimated: 400101, error: 0.03%actual: 450000, estimated: 449955, error: 0.01%actual: 500000, estimated: 500065, error: 0.01%Linear Counting with Lookup3hash:actual: 50000, estimated: 49835, error: 0.33%actual: 100000, estimated: 99461, error: 0.54%actual: 150000, estimated: 149006, error: 0.66%actual: 200000, estimated: 198501, error: 0.75%actual: 250000, estimated: 248365, error: 0.65%actual: 300000, estimated: 298065, error: 0.65%actual: 350000, estimated: 347504, error: 0.71%actual: 400000, estimated: 397292, error: 0.68%actual: 450000, estimated: 446700, error: 0.73%actual: 500000, estimated: 495944, error: 0.81%Hyperloglog Counting with Murmurhash:actual: 50000, estimated: 50015, error: 0.03%actual: 100000, estimated: 100048, error: 0.05%actual: 150000, estimated: 149709, error: 0.19%actual: 200000, estimated: 201595, error: 0.80%actual: 250000, estimated: 250168, error: 0.07%actual: 300000, estimated: 299864, error: 0.05%actual: 350000, estimated: 348571, error: 0.41%actual: 400000, estimated: 398583, error: 0.35%actual: 450000, estimated: 448632, error: 0.30%actual: 500000, estimated: 498330, error: 0.33%Hyperloglog Counting with Lookup3hash:actual: 50000, estimated: 49628, error: 0.74%actual: 100000, estimated: 99357, error: 0.64%actual: 150000, estimated: 148880, error: 0.75%actual: 200000, estimated: 200475, error: 0.24%actual: 250000, estimated: 249362, error: 0.26%actual: 300000, estimated: 299119, error: 0.29%actual: 350000, estimated: 349225, error: 0.22%actual: 400000, estimated: 398805, error: 0.30%actual: 450000, estimated: 448373, error: 0.36%actual: 500000, estimated: 498183, error: 0.36%Adaptive Counting with Murmurhash:actual: 50000, estimated: 50015, error: 0.03%actual: 100000, estimated: 100048, error: 0.05%actual: 150000, estimated: 149709, error: 0.19%actual: 200000, estimated: 201059, error: 0.53%actual: 250000, estimated: 249991, error: 0.00%actual: 300000, estimated: 300067, error: 0.02%actual: 350000, estimated: 349610, error: 0.11%actual: 400000, estimated: 399875, error: 0.03%actual: 450000, estimated: 450348, error: 0.08%actual: 500000, estimated: 500977, error: 0.20%Adaptive Counting with Lookup3hash:actual: 50000, estimated: 49628, error: 0.74%actual: 100000, estimated: 99357, error: 0.64%actual: 150000, estimated: 148880, error: 0.75%actual: 200000, estimated: 199895, error: 0.05%actual: 250000, estimated: 249563, error: 0.17%actual: 300000, estimated: 299047, error: 0.32%actual: 350000, estimated: 348665, error: 0.38%actual: 400000, estimated: 399266, error: 0.18%actual: 450000, estimated: 450196, error: 0.04%actual: 500000, estimated: 499516, error: 0.10%Loglog Counting with Murmurhash:actual: 50000, estimated: 59857, error: 19.71%actual: 100000, estimated: 103108, error: 3.11%actual: 150000, estimated: 150917, error: 0.61%actual: 200000, estimated: 201059, error: 0.53%actual: 250000, estimated: 249991, error: 0.00%actual: 300000, estimated: 300067, error: 0.02%actual: 350000, estimated: 349610, error: 0.11%actual: 400000, estimated: 399875, error: 0.03%actual: 450000, estimated: 450348, error: 0.08%actual: 500000, estimated: 500977, error: 0.20%Loglog Counting with Lookup3hash:actual: 50000, estimated: 59870, error: 19.74%actual: 100000, estimated: 103044, error: 3.04%actual: 150000, estimated: 150435, error: 0.29%actual: 200000, estimated: 199895, error: 0.05%actual: 250000, estimated: 249563, error: 0.17%actual: 300000, estimated: 299047, error: 0.32%actual: 350000, estimated: 348665, error: 0.38%actual: 400000, estimated: 399266, error: 0.18%actual: 450000, estimated: 450196, error: 0.04%actual: 500000, estimated: 499516, error: 0.10%HyperloglogPlus Counting with Murmurhash 64bit:actual: 50000, estimated: 49801, error: 0.40%actual: 100000, estimated: 101098, error: 1.10%actual: 150000, estimated: 151488, error: 0.99%actual: 200000, estimated: 201337, error: 0.67%actual: 250000, estimated: 252130, error: 0.85%actual: 300000, estimated: 301995, error: 0.66%actual: 350000, estimated: 352194, error: 0.63%actual: 400000, estimated: 402413, error: 0.60%actual: 450000, estimated: 454293, error: 0.95%actual: 500000, estimated: 503228, error: 0.65%