
Universal hashing

From Wikipedia, the free encyclopedia
Technique for selecting hash functions

In mathematics and computing, universal hashing (in a randomized algorithm or data structure) refers to selecting a hash function at random from a family of hash functions with a certain mathematical property (see definition below). This guarantees a low number of collisions in expectation, even if the data is chosen by an adversary. Many universal families are known (for hashing integers, vectors, strings), and their evaluation is often very efficient. Universal hashing has numerous uses in computer science, for example in implementations of hash tables, randomized algorithms, and cryptography.

Introduction

See also: Hash function

Assume we want to map keys from some universe $U$ into $m$ bins (labelled $[m] = \{0, \dots, m-1\}$). The algorithm will have to handle some data set $S \subseteq U$ of $|S| = n$ keys, which is not known in advance. Usually, the goal of hashing is to obtain a low number of collisions (keys from $S$ that land in the same bin). A deterministic hash function cannot offer any guarantee in an adversarial setting if $|U| > m \cdot n$, since the adversary may choose $S$ to be precisely the preimage of a bin. This means that all data keys land in the same bin, making hashing useless. Furthermore, a deterministic hash function does not allow for rehashing: sometimes the input data turns out to be bad for the hash function (e.g. there are too many collisions), so one would like to change the hash function.

The solution to these problems is to pick a function randomly from a family of hash functions. A family of functions $H = \{h : U \to [m]\}$ is called a universal family if

$\forall x, y \in U,\ x \neq y: \quad |\{h \in H : h(x) = h(y)\}| \leq \frac{|H|}{m}$.

In other words, any two different keys of the universe collide with probability at most $1/m$ when the hash function $h$ is drawn uniformly at random from $H$. This is exactly the probability of collision we would expect if the hash function assigned truly random hash codes to every key.

Sometimes, the definition is relaxed by a constant factor, only requiring collision probability $O(1/m)$ rather than $\leq 1/m$. This concept was introduced by Carter and Wegman[1] in 1977, and has found numerous applications in computer science (see, for example,[2]).

If we have an upper bound of $\epsilon < 1$ on the collision probability, we say that we have $\epsilon$-almost universality. So for example, a universal family has $1/m$-almost universality.

Many, but not all, universal families have the following stronger uniform difference property:

$\forall x, y \in U,\ x \neq y$: when $h$ is drawn randomly from the family $H$, the difference $h(x) - h(y) \bmod m$ is uniformly distributed in $[m]$.

Note that the definition of universality is only concerned with whether $h(x) - h(y) = 0$, which counts collisions. The uniform difference property is stronger.

(Similarly, a universal family can be XOR universal if $\forall x, y \in U,\ x \neq y$, the value $h(x) \oplus h(y) \bmod m$ is uniformly distributed in $[m]$, where $\oplus$ is the bitwise exclusive-or operation. This is only possible if $m$ is a power of two.)

An even stronger condition is pairwise independence: a family has this property when, for all $x, y \in U$ with $x \neq y$, the probability that $x, y$ hash to any given pair of hash values $z_1, z_2$ is the same as if they were perfectly random: $P(h(x) = z_1 \land h(y) = z_2) = 1/m^2$. Pairwise independence is sometimes called strong universality.

Another property is uniformity. We say that a family is uniform if all hash values are equally likely: $P(h(x) = z) = 1/m$ for any hash value $z$. Universality does not imply uniformity. However, strong universality does imply uniformity.

Given a family with the uniform difference property, one can produce a pairwise independent or strongly universal hash family by adding a uniformly distributed random constant with values in $[m]$ to the hash functions. (Similarly, if $m$ is a power of two, we can achieve pairwise independence from an XOR universal hash family by doing an exclusive or with a uniformly distributed random constant.) Since a shift by a constant is sometimes irrelevant in applications (e.g. hash tables), a careful distinction between the uniform difference property and pairwise independence is sometimes not made.[3]

For some applications (such as hash tables), it is important for the least significant bits of the hash values to be also universal. When a family is strongly universal, this is guaranteed: if $H$ is a strongly universal family with $m = 2^L$, then the family made of the functions $h \bmod 2^{L'}$ for all $h \in H$ is also strongly universal for $L' \leq L$. Unfortunately, the same is not true of (merely) universal families. For example, the family made of the identity function $h(x) = x$ is clearly universal, but the family made of the function $h(x) = x \bmod 2^{L'}$ fails to be universal.

UMAC and Poly1305-AES and several other message authentication code algorithms are based on universal hashing.[4][5] In such applications, the software chooses a new hash function for every message, based on a unique nonce for that message.

Several hash table implementations are based on universal hashing. In such applications, typically the software chooses a new hash function only after it notices that "too many" keys have collided; until then, the same hash function continues to be used over and over. (Some collision resolution schemes, such as dynamic perfect hashing, pick a new hash function every time there is a collision. Other collision resolution schemes, such as cuckoo hashing and 2-choice hashing, allow a number of collisions before picking a new hash function.) A survey of the fastest known universal and strongly universal hash functions for integers, vectors, and strings is found in [6].

Mathematical guarantees


For any fixed set $S$ of $n$ keys, using a universal family guarantees the following properties.

  1. For any fixed $x$ in $S$, the expected number of keys in the bin $h(x)$ is $n/m$. When implementing hash tables by chaining, this number is proportional to the expected running time of an operation involving the key $x$ (for example a query, insertion or deletion).
  2. The expected number of pairs of keys $x, y$ in $S$ with $x \neq y$ that collide ($h(x) = h(y)$) is bounded above by $\binom{n}{2} \cdot 1/m = n(n-1)/2m$, which is of order $O(n^2/m)$. When the number of bins $m$ is chosen linear in $n$ (i.e., is determined by a function in $\Omega(n)$), the expected number of collisions is $O(n)$. When hashing into $n^2$ bins, there are no collisions at all with probability at least a half.
  3. The expected number of keys in bins with at least $t$ keys in them is bounded above by $2n/(t - 2(n/m) + 1)$.[7] Thus, if the capacity of each bin is capped to three times the average size ($t = 3n/m$), the total number of keys in overflowing bins is at most $O(m)$. This only holds with a hash family whose collision probability is bounded above by $1/m$. If a weaker definition is used, bounding it by $O(1/m)$, this result is no longer true.[7]

As the above guarantees hold for any fixed set $S$, they hold if the data set is chosen by an adversary. However, the adversary has to make this choice before (or independently of) the algorithm's random choice of a hash function. If the adversary can observe the random choice of the algorithm, randomness serves no purpose, and the situation is the same as deterministic hashing.

The second and third guarantees are typically used in conjunction with rehashing. For instance, a randomized algorithm may be prepared to handle some $O(n)$ number of collisions. If it observes too many collisions, it chooses another random $h$ from the family and repeats. Universality guarantees that the number of repetitions is a geometric random variable.

Constructions


Since any computer data can be represented as one or more machine words, one generally needs hash functions for three types of domains: machine words ("integers"); fixed-length vectors of machine words; and variable-length vectors ("strings").

Hashing integers


This section refers to the case of hashing integers that fit in machine words; thus, operations like multiplication, addition, division, etc. are cheap machine-level instructions. Let the universe to be hashed be $\{0, \dots, |U|-1\}$.

The original proposal of Carter and Wegman[1] was to pick a prime $p \geq |U|$ and define

$h_{a,b}(x) = ((ax + b) \bmod p) \bmod m$

where $a, b$ are randomly chosen integers modulo $p$ with $a \neq 0$. (This is a single iteration of a linear congruential generator.)
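
As a concrete illustration (not part of the original presentation), this family can be sketched in C, assuming the universe is $\{0, \dots, p-1\}$ for the Mersenne prime $p = 2^{31} - 1$; the identifiers and the use of rand() as a randomness source are placeholder choices:

    #include <stdint.h>
    #include <stdlib.h>

    /* Sketch of the Carter-Wegman family h_{a,b}(x) = ((a*x + b) mod p) mod m,
     * with keys drawn from {0, ..., p-1} for the prime p = 2^31 - 1.
     * rand() stands in for a proper uniform random source. */
    #define P 2147483647u                      /* prime p = 2^31 - 1 */

    typedef struct { uint64_t a, b; } cw_hash;

    cw_hash cw_draw(void) {
        cw_hash h;
        h.a = 1 + (uint64_t)rand() % (P - 1);  /* a in {1, ..., p-1} */
        h.b = (uint64_t)rand() % P;            /* b in {0, ..., p-1} */
        return h;
    }

    uint32_t cw_hash_int(cw_hash h, uint32_t x, uint32_t m) {
        /* x is assumed to lie below P; a*x + b < 2^62 + 2^31 fits in 64 bits */
        return (uint32_t)(((h.a * x + h.b) % P) % m);
    }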

To see that $H = \{h_{a,b}\}$ is a universal family, note that $h(x) = h(y)$ only holds when

$ax + b \equiv ay + b + i \cdot m \pmod{p}$

for some integer $i$ between $0$ and $(p-1)/m$. Since $p \geq |U|$, if $x \neq y$ their difference $x - y$ is nonzero and has an inverse modulo $p$. Solving for $a$ yields

$a \equiv i \cdot m \cdot (x - y)^{-1} \pmod{p}$.

There are $p - 1$ possible choices for $a$ (since $a = 0$ is excluded) and, varying $i$ in the allowed range, $\lfloor (p-1)/m \rfloor$ possible non-zero values for the right-hand side. Thus the collision probability is

$\lfloor (p-1)/m \rfloor / (p-1) \leq ((p-1)/m) / (p-1) = 1/m$.

Another way to see that $H$ is a universal family is via the notion of statistical distance. Write the difference $h(x) - h(y)$ as

$h(x) - h(y) \equiv (a(x - y) \bmod p) \pmod{m}$.

Since $x - y$ is nonzero and $a$ is uniformly distributed in $\{1, \dots, p-1\}$, it follows that $a(x-y)$ modulo $p$ is also uniformly distributed in $\{1, \dots, p-1\}$. The distribution of $(h(x) - h(y)) \bmod m$ is thus almost uniform, up to a difference in probability of $\pm 1/p$ between the samples. As a result, the statistical distance to a uniform family is $O(m/p)$, which becomes negligible when $p \gg m$.

The family of simpler hash functions

$h_a(x) = (ax \bmod p) \bmod m$

is only approximately universal: $\Pr\{h_a(x) = h_a(y)\} \leq 2/m$ for all $x \neq y$.[1] Moreover, this analysis is nearly tight; Carter and Wegman[1] show that $\Pr\{h_a(1) = h_a(m+1)\} \geq 2/(m+1)$ whenever $(p-1) \bmod m = 1$.

Avoiding modular arithmetic


The state of the art for hashing integers is the multiply-shift scheme described by Dietzfelbinger et al. in 1997.[8] By avoiding modular arithmetic, this method is much easier to implement and also runs significantly faster in practice (usually by at least a factor of four[9]). The scheme assumes the number of bins is a power of two, $m = 2^M$. Let $w$ be the number of bits in a machine word. Then the hash functions are parametrised over odd positive integers $a < 2^w$ (that fit in a word of $w$ bits). To evaluate $h_a(x)$, multiply $x$ by $a$ modulo $2^w$ and then keep the high-order $M$ bits as the hash code. In mathematical notation, this is

$h_a(x) = (a \cdot x \bmod 2^w)\ \mathrm{div}\ 2^{w-M}.$
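
A minimal C sketch of this scheme for $w = 64$ follows; the unsigned multiplication wraps modulo $2^{64}$ automatically, and the function name is an illustrative choice:

    #include <stdint.h>

    /* Multiply-shift for w = 64: the unsigned product is implicitly taken
     * modulo 2^64, and the shift keeps the M high-order bits as the bin index.
     * a must be a random odd 64-bit integer; 0 < M < 64. */
    static inline uint64_t multiply_shift(uint64_t a, unsigned M, uint64_t x) {
        return (a * x) >> (64 - M);
    }

For example, with M = 20 the result is an index into $m = 2^{20}$ bins.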

This scheme does not satisfy the uniform difference property and is only $2/m$-almost-universal; for any $x \neq y$, $\Pr\{h_a(x) = h_a(y)\} \leq 2/m$.

To understand the behavior of the hash function, notice that, if $ax \bmod 2^w$ and $ay \bmod 2^w$ have the same highest-order $M$ bits, then $a(x-y) \bmod 2^w$ has either all 1's or all 0's as its highest-order $M$ bits (depending on whether $ax \bmod 2^w$ or $ay \bmod 2^w$ is larger). Assume that the least significant set bit of $x - y$ appears on position $w - c$. Since $a$ is a random odd integer and odd integers have inverses in the ring $Z_{2^w}$, it follows that $a(x-y) \bmod 2^w$ will be uniformly distributed among $w$-bit integers with the least significant set bit on position $w - c$. The probability that these bits are all 0's or all 1's is therefore at most $2/2^M = 2/m$. On the other hand, if $c < M$, then the higher-order $M$ bits of $a(x-y) \bmod 2^w$ contain both 0's and 1's, so it is certain that $h(x) \neq h(y)$. Finally, if $c = M$ then bit $w - M$ of $a(x-y) \bmod 2^w$ is 1 and $h_a(x) = h_a(y)$ if and only if bits $w-1, \ldots, w-M+1$ are also 1, which happens with probability $1/2^{M-1} = 2/m$.

This analysis is tight, as can be shown with the example $x = 2^{w-M-2}$ and $y = 3x$. To obtain a truly 'universal' hash function, one can use the multiply-add-shift scheme that picks higher-order bits

$h_{a,b}(x) = ((ax + b) \bmod 2^{w+M})\ \mathrm{div}\ 2^w,$

where $a$ is a random positive integer with $a < 2^{2w}$ and $b$ is a random non-negative integer with $b < 2^{2w}$. This requires doing arithmetic on $2w$-bit unsigned integers. This version of multiply-shift is due to Dietzfelbinger, and was later analyzed more precisely by Woelfel.[10]
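
A sketch for $w = 32$, where $2w$-bit arithmetic is ordinary 64-bit arithmetic (for $w = 64$ one would need 128-bit intermediates); the names and the restriction $M \leq 32$ are illustrative assumptions:

    #include <stdint.h>

    /* Multiply-add-shift for w = 32: a is a random positive and b a random
     * non-negative 64-bit integer.  The result is the M-bit block of a*x + b
     * starting at bit position w = 32, i.e. ((a*x + b) mod 2^(w+M)) div 2^w,
     * assuming M <= 32. */
    static inline uint32_t multiply_add_shift(uint64_t a, uint64_t b,
                                              unsigned M, uint32_t x) {
        return (uint32_t)(((a * x + b) >> 32) & (((uint64_t)1 << M) - 1));
    }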

Hashing vectors


This section is concerned with hashing a fixed-length vector of machine words. Interpret the input as a vector $\bar{x} = (x_0, \dots, x_{k-1})$ of $k$ machine words (integers of $w$ bits each). If $H$ is a universal family with the uniform difference property, the following family (dating back to Carter and Wegman[1]) also has the uniform difference property (and hence is universal):

$h(\bar{x}) = \left(\sum_{i=0}^{k-1} h_i(x_i)\right) \bmod m$, where each $h_i \in H$ is chosen independently at random.

If $m$ is a power of two, one may replace summation by exclusive or.[11]

In practice, if double-precision arithmetic is available, this is instantiated with the multiply-shift hash family of hash functions.[12] Initialize the hash function with a vector $\bar{a} = (a_0, \dots, a_{k-1})$ of random odd integers on $2w$ bits each. Then if the number of bins is $m = 2^M$ for $M \leq w$:

$h_{\bar{a}}(\bar{x}) = \left(\Big(\sum_{i=0}^{k-1} x_i \cdot a_i\Big) \bmod 2^{2w}\right)\ \mathrm{div}\ 2^{2w-M}$.
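
Under the assumption $w = 32$ (so $2w$-bit arithmetic is plain 64-bit arithmetic), the vector scheme might be sketched as follows; the function name and signature are illustrative:

    #include <stdint.h>
    #include <stddef.h>

    /* Vector multiply-shift sketch for w = 32: a[] holds k random odd 64-bit
     * integers.  The sum of products wraps modulo 2^64 via unsigned overflow,
     * and the M high-order bits are kept (M <= 32). */
    uint32_t vector_multiply_shift(const uint64_t a[], const uint32_t x[],
                                   size_t k, unsigned M) {
        uint64_t sum = 0;
        for (size_t i = 0; i < k; ++i)
            sum += a[i] * x[i];          /* wraps modulo 2^64, as the scheme requires */
        return (uint32_t)(sum >> (64 - M));
    }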

It is possible to halve the number of multiplications, which roughly translates to a two-fold speed-up in practice.[11] Initialize the hash function with a vector $\bar{a} = (a_0, \dots, a_{k-1})$ of random odd integers on $2w$ bits each. The following hash family is universal:[13]

$h_{\bar{a}}(\bar{x}) = \left(\Big(\sum_{i=0}^{\lceil k/2 \rceil} (x_{2i} + a_{2i}) \cdot (x_{2i+1} + a_{2i+1})\Big) \bmod 2^{2w}\right)\ \mathrm{div}\ 2^{2w-M}$.
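
A corresponding sketch of this paired-multiplication variant, again for $w = 32$ and assuming an even number of words $k$ for simplicity (names are illustrative):

    #include <stdint.h>
    #include <stddef.h>

    /* One multiplication per two words, for w = 32 and even k:
     * a[] holds k random odd 64-bit integers; all arithmetic wraps modulo 2^64. */
    uint32_t vector_pair_hash(const uint64_t a[], const uint32_t x[],
                              size_t k, unsigned M) {
        uint64_t sum = 0;
        for (size_t i = 0; i + 1 < k; i += 2)
            sum += (a[i] + x[i]) * (a[i + 1] + x[i + 1]);
        return (uint32_t)(sum >> (64 - M));
    }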

If double-precision operations are not available, one can interpret the input as a vector of half-words ($w/2$-bit integers). The algorithm will then use $\lceil k/2 \rceil$ multiplications, where $k$ was the number of half-words in the vector. Thus, the algorithm runs at a "rate" of one multiplication per word of input.

The same scheme can also be used for hashing integers, by interpreting their bits as vectors of bytes. In this variant, the vector technique is known as tabulation hashing and it provides a practical alternative to multiplication-based universal hashing schemes.[14]
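
For example, a tabulation-hashing sketch for 32-bit keys split into four bytes might look as follows; the tables are assumed to be filled once with independent random values at initialization, and the names are illustrative:

    #include <stdint.h>

    /* Tabulation hashing for 32-bit keys: each table T[i] holds 256 random
     * 32-bit values; the per-byte lookups are combined with XOR. */
    uint32_t tabulation_hash(const uint32_t T[4][256], uint32_t x) {
        return T[0][x & 0xff]
             ^ T[1][(x >> 8) & 0xff]
             ^ T[2][(x >> 16) & 0xff]
             ^ T[3][(x >> 24) & 0xff];
    }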

Strong universality at high speed is also possible.[15] Initialize the hash function with a vector $\bar{a} = (a_0, \dots, a_k)$ of random integers on $2w$ bits. Compute

$h_{\bar{a}}(\bar{x})^{\mathrm{strong}} = \Big(a_0 + \sum_{i=0}^{k-1} a_{i+1} x_i \bmod 2^{2w}\Big)\ \mathrm{div}\ 2^w$.

The result is strongly universal on $w$ bits. Experimentally, it was found to run at 0.2 CPU cycles per byte on recent Intel processors for $w = 32$.
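
A sketch of this strongly universal scheme for $w = 32$, so the $a_i$ are random 64-bit integers (the function name is illustrative):

    #include <stdint.h>
    #include <stddef.h>

    /* Strongly universal vector hash for w = 32: a[] holds k+1 random 64-bit
     * integers.  The sum wraps modulo 2^64; dividing by 2^w keeps the upper
     * 32 bits as the w-bit result. */
    uint32_t vector_hash_strong(const uint64_t a[], const uint32_t x[], size_t k) {
        uint64_t sum = a[0];
        for (size_t i = 0; i < k; ++i)
            sum += a[i + 1] * x[i];      /* a_{i+1} * x_i, modulo 2^64 by overflow */
        return (uint32_t)(sum >> 32);    /* div 2^w */
    }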

Hashing strings


This refers to hashing a variable-sized vector of machine words. If the length of the string can be bounded by a small number, it is best to use the vector solution from above (conceptually padding the vector with zeros up to the upper bound). The space required is the maximal length of the string, but the time to evaluate $h(s)$ is just the length of $s$. As long as zeroes are forbidden in the string, the zero-padding can be ignored when evaluating the hash function without affecting universality.[11] Note that if zeroes are allowed in the string, then it might be best to append a fictitious non-zero (e.g., 1) character to all strings prior to padding: this will ensure that universality is not affected.[15]

Now assume we want to hash $\bar{x} = (x_0, \dots, x_\ell)$, where a good bound on $\ell$ is not known a priori. A universal family proposed by[12] treats the string $x$ as the coefficients of a polynomial modulo a large prime. If $x_i \in [u]$, let $p \geq \max\{u, m\}$ be a prime and define:

$h_a(\bar{x}) = h_{\mathrm{int}}\left(\Big(\sum_{i=0}^{\ell} x_i \cdot a^{\ell-i}\Big) \bmod p\right)$, where $a \in [p]$ is uniformly random and $h_{\mathrm{int}}$ is chosen randomly from a universal family mapping integer domain $[p] \mapsto [m]$.

Using properties of modular arithmetic, the above can be computed without producing large numbers for large strings as follows:[16]

    uint hash(String x, int a, int p)
        uint h = INITIAL_VALUE
        for (uint i = 0; i < x.length; ++i)
            h = ((h * a) + x[i]) mod p
        return h

This Rabin-Karp rolling hash is based on a linear congruential generator.[17] The above algorithm is also known as a multiplicative hash function.[18] In practice, the mod operator and the parameter p can be avoided altogether by simply allowing integers to overflow, because this is equivalent to mod (Max-Int-Value + 1) in many programming languages. The table below shows values chosen to initialize h and a for some popular implementations.

    Implementation                                  INITIAL_VALUE    a
    Bernstein's hash function djb2[19]              5381             33
    STLPort 4.6.2                                   0                5
    Kernighan and Ritchie's hash function[20]       0                31
    java.lang.String.hashCode()[21]                 0                31

Consider two strings $\bar{x}, \bar{y}$ and let $\ell$ be the length of the longer one; for the analysis, the shorter string is conceptually padded with zeros up to length $\ell$. A collision before applying $h_{\mathrm{int}}$ implies that $a$ is a root of the polynomial with coefficients $\bar{x} - \bar{y}$. This polynomial has at most $\ell$ roots modulo $p$, so the collision probability is at most $\ell/p$. The probability of collision through the random $h_{\mathrm{int}}$ brings the total collision probability to $\frac{1}{m} + \frac{\ell}{p}$. Thus, if the prime $p$ is sufficiently large compared to the length of the strings hashed, the family is very close to universal (in statistical distance).

Other universal families of hash functions used to hash unknown-length strings to fixed-length hash values include the Rabin fingerprint and Buzhash.

Avoiding modular arithmetic


To mitigate the computational penalty of modular arithmetic, three tricks are used in practice:[11]

  1. One chooses the prime $p$ to be close to a power of two, such as a Mersenne prime. This allows arithmetic modulo $p$ to be implemented without division (using faster operations like addition and shifts). For instance, on modern architectures one can work with $p = 2^{61} - 1$, while the $x_i$'s are 32-bit values; see the sketch after this list.
  2. One can apply vector hashing to blocks. For instance, one applies vector hashing to each 16-word block of the string, and applies string hashing to the $\lceil k/16 \rceil$ results. Since the slower string hashing is applied on a substantially smaller vector, this will essentially be as fast as vector hashing.
  3. One chooses a power of two as the divisor, allowing arithmetic modulo $2^w$ to be implemented without division (using faster operations of bit masking). The NH hash-function family takes this approach.
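
As an illustration of the first trick, reduction modulo the Mersenne prime $p = 2^{61} - 1$ can be sketched without any division by exploiting $2^{61} \equiv 1 \pmod p$; the function name is illustrative, and the input is limited to a single 64-bit word for simplicity:

    #include <stdint.h>

    /* Reduction modulo the Mersenne prime p = 2^61 - 1 without division:
     * since 2^61 = 1 (mod p), a value can be folded by adding its high part
     * to its low 61 bits, followed by one conditional subtraction. */
    static inline uint64_t mod_mersenne61(uint64_t x) {
        const uint64_t p = ((uint64_t)1 << 61) - 1;
        uint64_t r = (x & p) + (x >> 61);   /* x = hi*2^61 + lo = hi + lo (mod p) */
        return (r >= p) ? r - p : r;
    }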

References

  1. Carter, Larry; Wegman, Mark N. (1979). "Universal Classes of Hash Functions". Journal of Computer and System Sciences. 18 (2): 143–154. doi:10.1016/0022-0000(79)90044-8. Conference version in STOC'77.
  2. Miltersen, Peter Bro. "Universal Hashing" (PDF). Archived from the original (PDF) on 24 May 2011. Retrieved 24 June 2009.
  3. Motwani, Rajeev; Raghavan, Prabhakar (1995). Randomized Algorithms. Cambridge University Press. p. 221. ISBN 0-521-47465-5.
  4. David Wagner, ed. "Advances in Cryptology - CRYPTO 2008". p. 145.
  5. Jean-Philippe Aumasson, Willi Meier, Raphael Phan, Luca Henzen. "The Hash Function BLAKE". 2014. p. 10.
  6. Thorup, Mikkel (2015). "High Speed Hashing for Integers and Strings". arXiv:1504.06804 [cs.DS].
  7. Baran, Ilya; Demaine, Erik D.; Pătraşcu, Mihai (2008). "Subquadratic Algorithms for 3SUM" (PDF). Algorithmica. 50 (4): 584–596. doi:10.1007/s00453-007-9036-3. S2CID 9855995.
  8. Dietzfelbinger, Martin; Hagerup, Torben; Katajainen, Jyrki; Penttonen, Martti (1997). "A Reliable Randomized Algorithm for the Closest-Pair Problem" (Postscript). Journal of Algorithms. 25 (1): 19–51. doi:10.1006/jagm.1997.0873. Retrieved 10 February 2011.
  9. Thorup, Mikkel (18 December 2009). "Text-book algorithms at SODA".
  10. Woelfel, Philipp (1999). Efficient Strongly Universal and Optimally Universal Hashing. Mathematical Foundations of Computer Science 1999. LNCS. Vol. 1672. pp. 262–272. doi:10.1007/3-540-48340-3_24.
  11. Thorup, Mikkel (2009). String hashing for linear probing. Proc. 20th ACM-SIAM Symposium on Discrete Algorithms (SODA). pp. 655–664. CiteSeerX 10.1.1.215.4253. doi:10.1137/1.9781611973068.72. ISBN 978-0-89871-680-1. Section 5.3.
  12. Dietzfelbinger, Martin; Gil, Joseph; Matias, Yossi; Pippenger, Nicholas (1992). Polynomial Hash Functions Are Reliable (Extended Abstract). Proc. 19th International Colloquium on Automata, Languages and Programming (ICALP). pp. 235–246.
  13. Black, J.; Halevi, S.; Krawczyk, H.; Krovetz, T. (1999). UMAC: Fast and Secure Message Authentication (PDF). Advances in Cryptology (CRYPTO '99). Equation 1.
  14. Pătraşcu, Mihai; Thorup, Mikkel (2011). The power of simple tabulation hashing. Proceedings of the 43rd annual ACM Symposium on Theory of Computing (STOC '11). pp. 1–10. arXiv:1011.5200. doi:10.1145/1993636.1993638. ISBN 9781450306911.
  15. Kaser, Owen; Lemire, Daniel (2013). "Strongly universal string hashing is fast". Computer Journal. 57 (11). Oxford University Press: 1624–1638. arXiv:1202.4961. doi:10.1093/comjnl/bxt070.
  16. "Hebrew University Course Slides" (PDF).
  17. Robert Uzgalis. "Library Hash Functions". 1996.
  18. Kankowski, Peter. "Hash functions: An empirical comparison".
  19. Yigit, Ozan. "String hash functions".
  20. Kernighan; Ritchie (1988). "6". The C Programming Language (2nd ed.). Prentice Hall. p. 118. ISBN 0-13-110362-8.
  21. "String (Java Platform SE 6)". docs.oracle.com. Retrieved 10 June 2015.
