I offer you a new hash function for hash table lookup that isfaster and more thorough than the one you are using now. I also giveyou a way to verify that it is more thorough.
Over the past two years I've built a general hash function for hashtable lookup. Most of the two dozen old hashes I've replaced havehad owners who wouldn't accept a new hash unless it was a plug-inreplacement for their old hash, and was demonstrably better than theold hash.
These old hashes defined my requirements:
Without further ado, here's the fastest hash I've been able todesign that meets all the requirements. The comments describe how touse it.
typedef unsigned long int ub4; /* unsigned 4-byte quantities */typedef unsigned char ub1; /* unsigned 1-byte quantities */#define hashsize(n) ((ub4)1<<(n))#define hashmask(n) (hashsize(n)-1)/*--------------------------------------------------------------------mix -- mix 3 32-bit values reversibly.For every delta with one or two bits set, and the deltas of all three high bits or all three low bits, whether the original value of a,b,c is almost all zero or is uniformly distributed,* If mix() is run forward or backward, at least 32 bits in a,b,c have at least 1/4 probability of changing.* If mix() is run forward, every bit of c will change between 1/3 and 2/3 of the time. (Well, 22/100 and 78/100 for some 2-bit deltas.)mix() was built out of 36 single-cycle latency instructions in a structure that could supported 2x parallelism, like so: a -= b; a -= c; x = (c>>13); b -= c; a ^= x; b -= a; x = (a<<8); c -= a; b ^= x; c -= b; x = (b>>13); ... Unfortunately, superscalar Pentiums and Sparcs can't take advantage of that parallelism. They've also turned some of those single-cycle latency instructions into multi-cycle latency instructions. Still, this is the fastest good hash I could find. There were about 2^^68 to choose from. I only looked at a billion or so.--------------------------------------------------------------------*/#define mix(a,b,c) \{ \ a -= b; a -= c; a ^= (c>>13); \ b -= c; b -= a; b ^= (a<<8); \ c -= a; c -= b; c ^= (b>>13); \ a -= b; a -= c; a ^= (c>>12); \ b -= c; b -= a; b ^= (a<<16); \ c -= a; c -= b; c ^= (b>>5); \ a -= b; a -= c; a ^= (c>>3); \ b -= c; b -= a; b ^= (a<<10); \ c -= a; c -= b; c ^= (b>>15); \}/*--------------------------------------------------------------------hash() -- hash a variable-length key into a 32-bit value k : the key (the unaligned variable-length array of bytes) len : the length of the key, counting by bytes initval : can be any 4-byte valueReturns a 32-bit value. Every bit of the key affects every bit ofthe return value. Every 1-bit and 2-bit delta achieves avalanche.About 6*len+35 instructions.The best hash table sizes are powers of 2. There is no need to domod a prime (mod is sooo slow!). If you need less than 32 bits,use a bitmask. For example, if you need only 10 bits, do h = (h & hashmask(10));In which case, the hash table should have hashsize(10) elements.If you are hashing n strings (ub1 **)k, do it like this: for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);By Bob Jenkins, 1996. bob_jenkins@burtleburtle.net. You may use thiscode any way you wish, private, educational, or commercial. It's free.See http://burtleburtle.net/bob/hash/evahash.htmlUse for hash table lookup, or anything where one collision in 2^^32 isacceptable. Do NOT use for cryptographic purposes.--------------------------------------------------------------------*/ub4 hash( k, length, initval)register ub1 *k; /* the key */register ub4 length; /* the length of the key */register ub4 initval; /* the previous hash, or an arbitrary value */{ register ub4 a,b,c,len; /* Set up the internal state */ len = length; a = b = 0x9e3779b9; /* the golden ratio; an arbitrary value */ c = initval; /* the previous hash value */ /*---------------------------------------- handle most of the key */ while (len >= 12) { a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) +((ub4)k[3]<<24)); b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) +((ub4)k[7]<<24)); c += (k[8] +((ub4)k[9]<<8) +((ub4)k[10]<<16)+((ub4)k[11]<<24)); mix(a,b,c); k += 12; len -= 12; } /*------------------------------------- handle the last 11 bytes */ c += length; switch(len) /* all the case statements fall through */ { case 11: c+=((ub4)k[10]<<24); case 10: c+=((ub4)k[9]<<16); case 9 : c+=((ub4)k[8]<<8); /* the first byte of c is reserved for the length */ case 8 : b+=((ub4)k[7]<<24); case 7 : b+=((ub4)k[6]<<16); case 6 : b+=((ub4)k[5]<<8); case 5 : b+=k[4]; case 4 : a+=((ub4)k[3]<<24); case 3 : a+=((ub4)k[2]<<16); case 2 : a+=((ub4)k[1]<<8); case 1 : a+=k[0]; /* case 0: nothing left to add */ } mix(a,b,c); /*-------------------------------------------- report the result */ return c;}Most hashes can be modeled like this:
initialize(internal state) for (each text block) { combine(internal state, text block); mix(internal state); } return postprocess(internal state);In the new hash, mix() takes 3n of the 6n+35 instructions needed tohash n bytes. Blocks of text are combined with the internal state(a,b,c) by addition. This combining step is the rest of the hashfunction, consuming the remaining 3n instructions. The onlypostprocessing is to choose c out of (a,b,c) to be the result.
Three tricks promote speed:
The golden ratio really is an arbitrary value. Its purpose isto avoid mapping all zeros to all zeros.
The most interesting requirement was that the hash must be betterthan its competition. What does it mean for a hash to be good forhash table lookup?
A good hash function distributes hash values uniformly. If youdon't know the keys before choosing the function, the best you can do ismap an equal number of possible keys to each hash value. If keys weredistributed uniformly, an excellent hash would be to choose the firstfew bytes of the key and use that as the hash value. Unfortunately,real keys aren't uniformly distributed. Choosing the first few bytesworks quite poorly in practice.
The real requirement then is that a good hash function shoulddistribute hash values uniformly for the keys that users actually use.
How do we test that? Let's look at some typical user data. (Since Iwork at Oracle, I'll use Oracle's standard example: the EMP table.)
| EMPNO | ENAME | JOB | MGR | HIREDATE | SAL | COMM | DEPTNO |
|---|---|---|---|---|---|---|---|
| 7369 | SMITH | CLERK | 7902 | 17-DEC-80 | 800 | 20 | |
| 7499 | ALLEN | SALESMAN | 7698 | 20-FEB-81 | 1600 | 300 | 30 |
| 7521 | WARD | SALESMAN | 7698 | 22-FEB-81 | 1250 | 500 | 30 |
| 7566 | JONES | MANAGER | 7839 | 02-APR-81 | 2975 | 20 | |
| 7654 | MARTIN | SALESMAN | 7898 | 28-SEP-81 | 1250 | 1400 | 30 |
| 7698 | BLAKE | MANAGER | 7539 | 01-MAY-81 | 2850 | 30 | |
| 7782 | CLARK | MANAGER | 7566 | 09-JUN-81 | 2450 | 10 | |
| 7788 | SCOTT | ANALYST | 7698 | 19-APR-87 | 3000 | 20 | |
| 7839 | KING | PRESIDENT | 17-NOV-81 | 5000 | 10 | ||
| 7844 | TURNER | SALESMAN | 7698 | 08-SEP-81 | 1500 | 30 | |
| 7876 | ADAMS | CLERK | 7788 | 23-MAY-87 | 1100 | 0 | 20 |
| 7900 | JAMES | CLERK | 7698 | 03-DEC-81 | 950 | 30 | |
| 7902 | FORD | ANALYST | 7566 | 03-DEC-81 | 3000 | 20 | |
| 7934 | MILLER | CLERK | 7782 | 23-JAN-82 | 1300 | 10 |
Consider each horizontal row to be a key. Some patterns appear.
Some patterns are easy to handle. If the length is included in thedata being hashed, then lengths are not a problem. If the hash doesnot treat text blocks commutatively, then substrings are not aproblem. Strings that are mostly zeros can be tested by listing allstrings with only one bit set and checking if that set of stringsproduces too many collisions.
The remaining pattern is that keys often differ in only a few bits.If a hash allows small sets of input bits to cancel each other out,and the user keys differ in only those bits, then all keys will map tothe same handful of hash values.
Usually, when a small set of input bits cancel each other out, it isbecause those input bits affect only a smaller set of bits in theinternal state.
Consider this hash function:
for (hash=0, i=0; i<hash; ++i) hash = ((hash<<5)^(hash>>27))^key[i]; return (hash % prime);This function maps the strings "EXXXXXB" and "AXXXXXC" to the same value.These keys differ in bit 3 of the first byte and bit 1 of the seventhbyte. After the seventh bit is combined, any further postprocessingwill do no good because the internal states are already the same.
Any time n input bits can only affect m output bits, and n > m, thenthe 2n keys that differ in those input bits can onlyproduce 2m distinct hash values. The same is true if ninput bits can only affect m bits of the internal state -- latermixing may make the 2m results look uniformly distributed,but there will still be only 2m results.
The function above has many sets of 2 bits that affect only 1 bitof the internal state. If there are n input bits, there are (n choose2)=(n*n/2 - n/2) pairs of input bits, only a few of whichmatch weaknesses in the function above. It is a common pattern for keys todiffer in only a few bits. If those bits match one of a hash's weaknesses,which is a rare but not negligible event, the hash will do extremelybad. In most cases, though, it will do just fine. (This allows afunction to slip through sanity checks, like hashing an Englishdictionary uniformly, while still frequently bombing on user data.)
In hashes built of repeated combine-mix steps, this is what usuallycauses this weakness:
The same thing can happen in reverse:
(If the mixing function is not a permutation of the internal state,it is not reversible. Instead, it loses information about theearlier blocks every time it is applied, so keys differing only in thefirst few input blocks are more likely to collide. The mixingfunction ought to be a permutation.)
It is easy to test whether this weakness exists: if the mixing stepcauses any bit of the internal state to affect fewer bits of theinternal state than there are output bits, the weakness exists. Thistest should be run on the reverse of the mixing function as well. Itcan also be run with all sets of 2 internal state bits, or all sets of3.
Another way this weakness can happen is if any bit in the finalinput block does not affect every bit of the output. (The user mightchoose to use only the unaffected output bit, then that's 1 input bitthat affects 0 output bits.)
We now have a new hash function and some theory for evaluating hashfunctions. Let's see how various hash functions stack up.
ub4 additive(char *key, ub4 len, ub4 prime){ ub4 hash, i; for (hash=len, i=0; i<len; ++i) hash += key[i]; return (hash % prime);}This takes 5n+3 instructions. There is no mixing step. Thecombining step handles one byte at a time. Input bytes commute. Thetable length must be prime, and can't be much bigger than one bytebecause the value of variablehash is never much bigger thanone byte.ub4 rotating(char *key, ub4 len, ub4 prime){ ub4 hash, i; for (hash=len, i=0; i<len; ++i) hash = (hash<<4)^(hash>>28)^key[i]; return (hash % prime);}This takes 8n+3 instructions. This is the same as the additivehash, except it has a mixing step (a circular shift by 4) and thecombining step is exclusive-or instead of addition. The table size isa prime, but the prime can be any size.char pearson(char *key, ub4 len, char tab[256]){ char hash; ub4 i; for (hash=len, i=0; i<len; ++i) hash=tab[hash^key[i]]; return (hash);}This preinitializestab[] to an arbitrary permutation of 0.. 255. It takes 6n+2 instructions, but produces only a 1-byteresult. Larger results can be made by running it several times withdifferent initial hash values.ub4 universal(char *key, ub4 len, ub4 mask, ub4 tab[MAXBITS]){ ub4 hash, i; for (hash=len, i=0; i<(len<<3); i+=8) { register char k = key[i>>3]; if (k&0x01) hash ^= tab[i+0]; if (k&0x02) hash ^= tab[i+1]; if (k&0x04) hash ^= tab[i+2]; if (k&0x08) hash ^= tab[i+3]; if (k&0x10) hash ^= tab[i+4]; if (k&0x20) hash ^= tab[i+5]; if (k&0x40) hash ^= tab[i+6]; if (k&0x80) hash ^= tab[i+7]; } return (hash & mask);}This takes 52n+3 instructions. The size of tab[] is themaximum number of input bits. Values in tab[] are chosen at random.Universal hashing can be implemented faster by a Zobrist hash withcarefully chosen table values.ub4 zobrist( char *key, ub4 len, ub4 mask, ub4 tab[MAXBYTES][256]){ ub4 hash, i; for (hash=len, i=0; i<len; ++i) hash ^= tab[i][key[i]]; return (hash & mask);}This takes 10n+3 instructions. Thesize of tab[][256] is the maximum number of input bytes. Values oftab[][256] are chosen at random. This can implement universalhashing, but is more general than universal hashing.Zobrist hashes are especially favored for chess, checkers, othello,and other situations where you have the hash for one state and youwant to compute the hash for a closely related state. You xor to theold hash the table values that you're removing from the state, thenxor the table values that you're adding. For chess, for example,that's 2 xors to get the hash for the next position given the hash ofthe current position.
This takes 6n+35 instructions.
The table below compares all these hash functions.
| NAME | SIZE-1000 | SPEED | INLINE | FUNNEL-15 | FUNNEL-100 | COLLIDE-32 | COLLIDE-1000 |
|---|---|---|---|---|---|---|---|
| Additive | 1009 | 5n+3 | n+2 | 15 into 2 | 100 into 2 | 37006 | +806.02 |
| Rotating | 1009 | 6n+3 | 2n+2 | 4 into 1 | 25 into 1 | 24 | +1.24 |
| One-at-a-Time | 1024 | 9n+9 | 5n+8 | none | none | 0 | -0.05 |
| Bernstein | 1024 | 7n+3 | 3n+2 | 3 into 2 | 3 into 2 | 4 | +1.69 |
| FNV | 1024 | ? | ? | ? | ? | ? | ? |
| Pearson | 1024 | 12n+5 | 4n+3 | none | none | 0 | +1.65 |
| CRC | 1024 | 9n+3 | 5n+2 | 2 into 1 | 11 into 10 | 0 | +0.07 |
| Generalized | 1024 | 9n+3 | 5n+2 | none | none | 0 | -1.83 |
| Universal | 1024 | 52n+3 | 48n+2 | 4 into 3 | 50 into 28 | 0 | +0.20 |
| Zobrist | 1024 | 10n+3 | 6n+2 | none | none | 1 | -0.03 |
| Paul Hsieh's | 1024 | 5n+17 | N/A | 3 into 2 | 3 into 2 | 1 | +1.12 |
| My Hash | 1024 | 6n+35 | N/A | none | none | 0 | +0.33 |
| lookup3.c | 1024 | 5n+20 | N/A | none | none | 0 | -0.08 |
| MD4 | 1024 | 9.5n+230 | N/A | none | none | 1 | +0.73 |
Testimonials: