Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

C library for estimating cardinality in streams for which it is infeasible to store all events in memory

NotificationsYou must be signed in to change notification settings

chaoslawful/ccard-lib

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Stories in Ready

Build Status

C library for estimating cardinality in data streams, in which case it isinfeasible to store all events in memory.

This library implements a series of cardinality estimating algorithms such asLinear Counting, LogLog Counting, HyperLogLog Counting and Adaptive Counting.For more information about these algorithms please read theReference section.

Building

Building ccard-lib needsscons. Please readsconsuser guide formore information about it.

Building PHP extension of ccard-lib needsSWIG to beinstalled. Running unit-tests needsgoogletest to be installed.

Building as Library

Assuming you have scons installed, just build ccard-lib like this:

scons install

Scons will build and install ccard-lib to your system.

You can also run unit-tests to make sure the library works as expected:

sconstest

By default ccard-lib will be installed at/usr/local/lib, if you want tochange the install directory please replace the "libdir" setting inSConsturct file with your target directory.

Building as PHP Extension

The following command will build and install card-lib PHP extension:

scons install-php

SWIG is used to generate PHP extension, please installit before run this command.

Uninstall

If you want to uninstall ccard-lib from your system, use the followingcommands:

scons -c install-phpscons -c install

Synopsis

Estimate Cardinality

#include"ccard_common.h"#include"adaptive_counting.h"intmain(intargc,char**argv) {int64_ti,esti;/* construct context for cardinality estimator *//* use xxx_cnt_init to construct context */adp_cnt_ctx_t*ctx=adp_cnt_init(NULL,16,CCARD_HASH_MURMUR);printf("Adaptive Counting with Murmurhash:\n");/* add 500,000 elements to set */for (i=1;i <=500000L;i++) {/* use xxx_cnt_offer to add new element to set */adp_cnt_offer(ctx,&i,sizeof(int64_t));/* print estimate result every 50,000 elements has been added */if (i %50000==0) {/* use xxx_cnt_card to get estimate result */esti=adp_cnt_card(ctx);printf("actual: %9lu, estimated: %9lu, error: %+7.2f%%\n",                   (long unsignedint)i, (long unsignedint)esti, (double)(esti-i) /i*100);        }    }printf("\n");/* use xxx_cnt_fini to destory context */adp_cnt_fini(ctx);}

Merge Bitmaps

#include"ccard_common.h"#include"adaptive_counting.h"intmain(intargc,char**argv) {int64_ti,esti;/* for merging, contexts must have same length of bitmap and hash algorithm */adp_cnt_ctx_t*ctx=adp_cnt_init(NULL,16,CCARD_HASH_LOOKUP3);adp_cnt_ctx_t*tbm1=adp_cnt_init(NULL,16,CCARD_HASH_LOOKUP3);adp_cnt_ctx_t*tbm2=adp_cnt_init(NULL,16,CCARD_HASH_LOOKUP3);int32_tm=1 <<16;/* bitmaps */uint8_tbuf1[m+3],buf2[m+3];uint32_tlen1=m+3,len2=m+3;for (i=1;i <=20000L;i++) {adp_cnt_offer(ctx,&i,sizeof(uint64_t));    }for (i=10000L;i <=30000L;i++) {adp_cnt_offer(tbm1,&i,sizeof(uint64_t));    }/* use xxx_cnt_get_bytes to get bitmap from context */adp_cnt_get_bytes(tbm1,buf1,&len1);for (i=20000L;i <=40000L;i++) {adp_cnt_offer(tbm2,&i,sizeof(uint64_t));    }adp_cnt_get_bytes(tbm2,buf2,&len2);/* use xxx_cnt_merge_bytes to merge bitmaps to context */adp_cnt_merge_bytes(ctx,buf1,len1,buf2,len2,NULL);esti=adp_cnt_card(ctx);printf("actual:40000, estimated: %9lu, error: %+7.2f%%\n",           (long unsignedint)esti, (double)(esti-40000) /40000*100);adp_cnt_fini(tbm2);adp_cnt_fini(tbm1);adp_cnt_fini(ctx);}

For Developers

Source codes should always be formatted before committing by running scriptutil/indent-src in top-dir. It utilizedastyle to do the job, so you probably want toinstall it first.Make sure you install astyle v2.03 or later, as theindenting result differs from previous versions (seehere for details)

Reference

Linear Counting

LogLog Counting and Adaptive Counting

  • Marianne Durand and Philippe Flajolet.LogLog counting of largecardinalities.In ESA03, volume 2832 of LNCS, pages 605-617, 2003.
  • Min Cai, Jianping Pan, Yu K. Kwok, and Kai Hwang.[Fast and accuratetraffic matrix measurement using adaptive cardinality counting](http://gridsec.usc.edu/files/tr/tr-2005-12.pdf). InMineNet '05: Proceedings of the 2005 ACM SIGCOMM workshop onMining network data, pages 205-206, New York, NY, USA, 2005. ACM.

HyperLogLog Counting and HyperLogLog++ Counting

The implemention refersstream-lib.

Experiment

The following estimating results is calculated using bitmap with length of 2^16(64k) bytes:

Linear Counting with Murmurhash:actual: 50000,  estimated: 50062,  error: 0.12%actual: 100000, estimated: 99924,  error: 0.08%actual: 150000, estimated: 149865, error: 0.09%actual: 200000, estimated: 199916, error: 0.04%actual: 250000, estimated: 250123, error: 0.05%actual: 300000, estimated: 299942, error: 0.02%actual: 350000, estimated: 349801, error: 0.06%actual: 400000, estimated: 400101, error: 0.03%actual: 450000, estimated: 449955, error: 0.01%actual: 500000, estimated: 500065, error: 0.01%Linear Counting with Lookup3hash:actual: 50000,  estimated: 49835,  error: 0.33%actual: 100000, estimated: 99461,  error: 0.54%actual: 150000, estimated: 149006, error: 0.66%actual: 200000, estimated: 198501, error: 0.75%actual: 250000, estimated: 248365, error: 0.65%actual: 300000, estimated: 298065, error: 0.65%actual: 350000, estimated: 347504, error: 0.71%actual: 400000, estimated: 397292, error: 0.68%actual: 450000, estimated: 446700, error: 0.73%actual: 500000, estimated: 495944, error: 0.81%Hyperloglog Counting with Murmurhash:actual: 50000,  estimated: 50015,  error: 0.03%actual: 100000, estimated: 100048, error: 0.05%actual: 150000, estimated: 149709, error: 0.19%actual: 200000, estimated: 201595, error: 0.80%actual: 250000, estimated: 250168, error: 0.07%actual: 300000, estimated: 299864, error: 0.05%actual: 350000, estimated: 348571, error: 0.41%actual: 400000, estimated: 398583, error: 0.35%actual: 450000, estimated: 448632, error: 0.30%actual: 500000, estimated: 498330, error: 0.33%Hyperloglog Counting with Lookup3hash:actual: 50000,  estimated: 49628,  error: 0.74%actual: 100000, estimated: 99357,  error: 0.64%actual: 150000, estimated: 148880, error: 0.75%actual: 200000, estimated: 200475, error: 0.24%actual: 250000, estimated: 249362, error: 0.26%actual: 300000, estimated: 299119, error: 0.29%actual: 350000, estimated: 349225, error: 0.22%actual: 400000, estimated: 398805, error: 0.30%actual: 450000, estimated: 448373, error: 0.36%actual: 500000, estimated: 498183, error: 0.36%Adaptive Counting with Murmurhash:actual: 50000,  estimated: 50015,  error: 0.03%actual: 100000, estimated: 100048, error: 0.05%actual: 150000, estimated: 149709, error: 0.19%actual: 200000, estimated: 201059, error: 0.53%actual: 250000, estimated: 249991, error: 0.00%actual: 300000, estimated: 300067, error: 0.02%actual: 350000, estimated: 349610, error: 0.11%actual: 400000, estimated: 399875, error: 0.03%actual: 450000, estimated: 450348, error: 0.08%actual: 500000, estimated: 500977, error: 0.20%Adaptive Counting with Lookup3hash:actual: 50000,  estimated: 49628,  error: 0.74%actual: 100000, estimated: 99357,  error: 0.64%actual: 150000, estimated: 148880, error: 0.75%actual: 200000, estimated: 199895, error: 0.05%actual: 250000, estimated: 249563, error: 0.17%actual: 300000, estimated: 299047, error: 0.32%actual: 350000, estimated: 348665, error: 0.38%actual: 400000, estimated: 399266, error: 0.18%actual: 450000, estimated: 450196, error: 0.04%actual: 500000, estimated: 499516, error: 0.10%Loglog Counting with Murmurhash:actual: 50000,  estimated: 59857,  error: 19.71%actual: 100000, estimated: 103108, error: 3.11%actual: 150000, estimated: 150917, error: 0.61%actual: 200000, estimated: 201059, error: 0.53%actual: 250000, estimated: 249991, error: 0.00%actual: 300000, estimated: 300067, error: 0.02%actual: 350000, estimated: 349610, error: 0.11%actual: 400000, estimated: 399875, error: 0.03%actual: 450000, estimated: 450348, error: 0.08%actual: 500000, estimated: 500977, error: 0.20%Loglog Counting with Lookup3hash:actual: 50000,  estimated: 59870,  error: 19.74%actual: 100000, estimated: 103044, error: 3.04%actual: 150000, estimated: 150435, error: 0.29%actual: 200000, estimated: 199895, error: 0.05%actual: 250000, estimated: 249563, error: 0.17%actual: 300000, estimated: 299047, error: 0.32%actual: 350000, estimated: 348665, error: 0.38%actual: 400000, estimated: 399266, error: 0.18%actual: 450000, estimated: 450196, error: 0.04%actual: 500000, estimated: 499516, error: 0.10%HyperloglogPlus Counting with Murmurhash 64bit:actual: 50000,  estimated: 49801,  error: 0.40%actual: 100000, estimated: 101098, error: 1.10%actual: 150000, estimated: 151488, error: 0.99%actual: 200000, estimated: 201337, error: 0.67%actual: 250000, estimated: 252130, error: 0.85%actual: 300000, estimated: 301995, error: 0.66%actual: 350000, estimated: 352194, error: 0.63%actual: 400000, estimated: 402413, error: 0.60%actual: 450000, estimated: 454293, error: 0.95%actual: 500000, estimated: 503228, error: 0.65%

About

C library for estimating cardinality in streams for which it is infeasible to store all events in memory

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages


[8]ページ先頭

©2009-2025 Movatter.jp