Count sketch is a type of dimensionality reduction that is particularly efficient in statistics, machine learning and algorithms.[1][2] It was invented by Moses Charikar, Kevin Chen and Martin Farach-Colton[3] in an effort to speed up the AMS sketch by Alon, Matias and Szegedy for approximating the frequency moments of streams[4] (these calculations require counting the number of occurrences of each distinct element of the stream).
The sketch is nearly identical[citation needed] to the feature hashing algorithm by John Moody,[5] but differs in its use of hash functions with low dependence, which makes it more practical. To retain a high probability of success, the median trick is used to aggregate multiple count sketches, rather than the mean.
These properties allow its use in explicit kernel methods and bilinear pooling in neural networks, and make it a cornerstone in many numerical linear algebra algorithms.[6]
The inventors of this data structure offer the following iterative explanation of its operation:[3]
at the simplest level, the output of a single hash function $s$ mapping stream elements $q$ into $\{+1, -1\}$ is feeding a single up/down counter $C$. After a single pass over the data, the frequency $n(q)$ of a stream element $q$ can be approximated, although extremely poorly, by the expected value $E(C \cdot s(q))$ (a small simulation of this stage follows the list);
a straightforward way to improve the variance of the previous estimate is to use an array of different hash functions $s_i$, each connected to its own counter $C_i$. For each element $q$, the equality $E(C_i \cdot s_i(q)) = n(q)$ still holds, so averaging across the $i$ range will tighten the approximation;
the previous construct still has a major deficiency: if a lower-frequency-but-still-important output element $a$ exhibits a hash collision with a high-frequency element, the estimate of $n(a)$ can be significantly affected. Avoiding this requires reducing the frequency of collision counter updates between any two distinct elements. This is achieved by replacing each counter $C_i$ in the previous construct with an array of $m$ counters (making the counter set into a two-dimensional matrix $C_{i,j}$), with the index $j$ of the particular counter to be incremented/decremented selected via another set of hash functions $h_i$ that map element $q$ into the range $\{1..m\}$. Since $E(C_{i,h_i(q)} \cdot s_i(q)) = n(q)$, averaging across all values of $i$ will work.
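To make the first stage of this construction concrete, here is a small self-contained simulation (a Python sketch; the stream contents, seed, and number of trials are arbitrary choices for illustration). It draws a fresh sign hash $s$ many times and averages the estimator $C \cdot s(q)$, showing that the estimator is unbiased for $n(q)$ even though any individual draw is very noisy:

```python
import random
from collections import Counter

# Stage one of the construction above: a single sign hash s: q -> {+1, -1}
# feeding one up/down counter C.  The estimator C * s(q) satisfies
# E(C * s(q)) = n(q), but its variance is large.  (Illustrative only; the
# stream and trial count are arbitrary.)
random.seed(0)
stream = [1] * 50 + [2] * 30 + [3] * 20   # n(1)=50, n(2)=30, n(3)=20
trials = 20_000

totals = Counter()
for _ in range(trials):
    s = {q: random.choice((-1, +1)) for q in set(stream)}  # fresh sign hash
    C = sum(s[q] for q in stream)                          # one pass, one counter
    for q in s:
        totals[q] += s[q] * C                              # estimator C * s(q)

for q in sorted(totals):
    print(q, totals[q] / trials)   # averages approach 50, 30, 20
```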
1. For constants $w$ and $t$ (to be defined later) independently choose $t$ random hash functions $h_1, \dots, h_t$ and $s_1, \dots, s_t$ such that $h_i : [n] \to [w]$ and $s_i : [n] \to \{+1, -1\}$. It is necessary that the hash families from which $h_i$ and $s_i$ are chosen be pairwise independent.
2. For each item $q$ in the stream and each $i = 1, \dots, t$, add $s_i(q)$ to the $h_i(q)$th bucket of the $i$th hash.
At the end of this process, one has $wt$ sums $C_{i,j}$, where $C_{i,j} = \sum_{q :\, h_i(q) = j} s_i(q)\, n_q$ and $n_q$ denotes the number of times element $q$ appeared in the stream.
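A minimal sketch of the update phase might look as follows. This is an illustration, not the inventors' code: the class name CountSketch, the Carter-Wegman hash construction $h(x) = ((ax + b) \bmod p) \bmod w$, and the Mersenne-prime modulus are choices made here for the example; any (approximately) pairwise independent hash families would do.

```python
import random

class CountSketch:
    """Count sketch over non-negative integer items (an illustrative
    sketch of the definition above, not a production implementation)."""

    P = (1 << 61) - 1  # Mersenne prime; items must be smaller than this

    def __init__(self, w, t, seed=None):
        rng = random.Random(seed)
        self.w, self.t = w, t
        # Carter-Wegman parameters: (a, b) pairs define the bucket hashes
        # h_i, (c, d) pairs define the sign hashes s_i.  Each family is
        # (approximately) pairwise independent, as step 1 requires.
        self.hp = [(rng.randrange(1, self.P), rng.randrange(self.P)) for _ in range(t)]
        self.sp = [(rng.randrange(1, self.P), rng.randrange(self.P)) for _ in range(t)]
        self.C = [[0] * w for _ in range(t)]  # the t x w counter matrix

    def h(self, i, x):  # bucket hash h_i : items -> {0, ..., w-1}
        a, b = self.hp[i]
        return ((a * x + b) % self.P) % self.w

    def s(self, i, x):  # sign hash s_i : items -> {+1, -1}
        c, d = self.sp[i]
        return +1 if ((c * x + d) % self.P) % 2 == 0 else -1

    def update(self, q, count=1):
        # Step 2 above: add s_i(q) to the h_i(q)-th bucket of the i-th row.
        for i in range(self.t):
            self.C[i][self.h(i, q)] += self.s(i, q) * count
```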
To estimate the count of an element $q$, one computes the following value: $r_q = \operatorname{median}_{i=1,\dots,t}\; s_i(q) \cdot C_{i, h_i(q)}$.
The values $s_i(q) \cdot C_{i, h_i(q)}$, for $i = 1, \dots, t$, are unbiased estimates of how many times $q$ has appeared in the stream.
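Continuing the hypothetical CountSketch class sketched above, the query is a direct transcription of this median formula:

```python
from statistics import median

def estimate(sketch, q):
    """r_q = median over i of s_i(q) * C[i][h_i(q)], per the formula above.
    `sketch` is an instance of the CountSketch class sketched earlier."""
    return median(sketch.s(i, q) * sketch.C[i][sketch.h(i, q)]
                  for i in range(sketch.t))
```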
The estimate $r_q$ has variance $O(\min(F_1^2 / w^2,\, F_2 / w))$, where $F_1$ is the length of the stream and $F_2 = \sum_q n_q^2$.[7]
Furthermore, $r_q$ is guaranteed never to be more than $2\sqrt{F_2 / w}$ off from the true value, with probability $1 - e^{-O(t)}$.
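As a quick end-to-end check (reusing the hypothetical CountSketch class and estimate function sketched above, with an arbitrary test stream and arbitrarily chosen parameters $w$ and $t$), one can compare the largest observed error against the $2\sqrt{F_2/w}$ guarantee:

```python
import random
from collections import Counter

random.seed(1)
stream = [random.randrange(200) for _ in range(10_000)]  # arbitrary test data
truth = Counter(stream)

cs = CountSketch(w=256, t=5, seed=42)
for q in stream:
    cs.update(q)

F2 = sum(n * n for n in truth.values())   # second frequency moment
bound = 2 * (F2 / cs.w) ** 0.5            # the 2*sqrt(F2/w) guarantee above
worst = max(abs(estimate(cs, q) - truth[q]) for q in truth)
print(f"largest error: {worst}, guaranteed bound: {bound:.1f}")
```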
^ Alon, Noga; Matias, Yossi; Szegedy, Mario. "The space complexity of approximating the frequency moments." Journal of Computer and System Sciences 58.1 (1999): 137–147.
^ Moody, John. "Fast learning in multi-resolution hierarchies." Advances in Neural Information Processing Systems. 1989.
^ Woodruff, David P. "Sketching as a Tool for Numerical Linear Algebra." Foundations and Trends in Theoretical Computer Science 10.1–2 (2014): 1–157.
^ Larsen, Kasper Green; Pagh, Rasmus; Tětek, Jakub. "CountSketches, Feature Hashing and the Median of Three." International Conference on Machine Learning. PMLR, 2021.
^ Pham, Ninh; Pagh, Rasmus (2013). "Fast and scalable polynomial kernels via explicit feature maps." SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. doi:10.1145/2487575.2487591.