NotificationsYou must be signed in to change notification settings
Fork5
Star27

Commit7ab7467

committed

I've attached a patch which implements Bob Jenkin's hash function for

PostgreSQL. This hash function replaces the one used by hash indexes andthe catalog cache. Hash joins use a different, relatively poor-qualityhash function, but I'll fix that later.As suggested by Tom Lane, this patch also changes the size of the fixedhash table used by the catalog cache to be a power-of-2 (instead of aprime: I chose 256 instead of 257). This allows the catcache to lookuphash buckets using a simple bitmask. This should improve the performanceof the catalog cache slightly, since the previous method (modulo aprime) was slow.In my tests, this improves the performance of hash indexes by between 4%and 8%; the performance when using btree indexes or seqscans isbasically unchanged.Neil Conway <neilconway@rogers.com>

1 parent5b5cef9 commit7ab7467Copy full SHA for 7ab7467

File tree

10 files changed

+127

-97

lines changed

src
- backend
  - access/hash
  - executor
    - nodeHash.c
  - utils/cache
    - catcache.c
- include
  - access
    - hash.h
  - utils
    - catcache.h

10 files changed

+127

-97

lines changed

`‎src/backend/access/hash/hash.c‎`

Lines changed: 1 addition & 5 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,7 +8,7 @@`
`8`	`8`	`*`
`9`	`9`	`*`
`10`	`10`	`* IDENTIFICATION`
`11`		`- * $Header: /cvsroot/pgsql/src/backend/access/hash/hash.c,v 1.54 2002/03/02 21:39:16 momjian Exp $`
	`11`	`+ * $Header: /cvsroot/pgsql/src/backend/access/hash/hash.c,v 1.55 2002/03/06 20:49:37 momjian Exp $`
`12`	`12`	`*`
`13`	`13`	`* NOTES`
`14`	`14`	`* This file contains only the public interface routines.`
`@@ -165,9 +165,6 @@ hashinsert(PG_FUNCTION_ARGS)`
`165`	`165`	`charnulls= (char)PG_GETARG_POINTER(2);`
`166`	`166`	`ItemPointerht_ctid= (ItemPointer)PG_GETARG_POINTER(3);`
`167`	`167`
`168`		`-#ifdefNOT_USED`
`169`		`-RelationheapRel= (Relation)PG_GETARG_POINTER(4);`
`170`		`-#endif`
`171`	`168`	`InsertIndexResultres;`
`172`	`169`	`HashItemhitem;`
`173`	`170`	`IndexTupleitup;`
`@@ -333,7 +330,6 @@ hashendscan(PG_FUNCTION_ARGS)`
`333`	`330`
`334`	`331`	`/*`
`335`	`332`	`*hashmarkpos() -- save current scan position`
`336`		`- *`
`337`	`333`	`*/`
`338`	`334`	`Datum`
`339`	`335`	`hashmarkpos(PG_FUNCTION_ARGS)`

`‎src/backend/access/hash/hashfunc.c‎`

Lines changed: 83 additions & 53 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,7 +8,7 @@`
`8`	`8`	`*`
`9`	`9`	`*`
`10`	`10`	`* IDENTIFICATION`
`11`		`- * $Header: /cvsroot/pgsql/src/backend/access/hash/hashfunc.c,v 1.31 2002/02/25 04:06:47 momjian Exp $`
	`11`	`+ * $Header: /cvsroot/pgsql/src/backend/access/hash/hashfunc.c,v 1.32 2002/03/06 20:49:38 momjian Exp $`
`12`	`12`	`*`
`13`	`13`	`* NOTES`
`14`	`14`	`* These functions are stored in pg_amproc.For each operator class`
`@@ -95,6 +95,8 @@ hashname(PG_FUNCTION_ARGS)`
`95`	`95`	`{`
`96`	`96`	`charkey=NameStr(PG_GETARG_NAME(0));`
`97`	`97`
	`98`	`+Assert(strlen(key) <=NAMEDATALEN);`
	`99`	`+`
`98`	`100`	`returnhash_any(key,strlen(key));`
`99`	`101`	`}`
`100`	`102`
`@@ -116,61 +118,89 @@ hashvarlena(PG_FUNCTION_ARGS)`
`116`	`118`	`returnresult;`
`117`	`119`	`}`
`118`	`120`
	`121`	`+/* This hash function was written by Bob Jenkins`
	`122`	`+ * (bob_jenkins@burtleburtle.net), and superficially adapted`
	`123`	`+ * for PostgreSQL by Neil Conway. For more information on this`
	`124`	`+ * hash function, see http://burtleburtle.net/bob/hash/doobs.html`
	`125`	`+ */`
`119`	`126`
`120`	`127`	`/*`
`121`		`- * hash_any --- compute a hash function for any specified chunk of memory`
`122`		`- *`
`123`		`- * This can be used as the underlying hash function for any pass-by-reference`
`124`		`- * data type in which there are no non-significant bits.`
`125`		`- *`
`126`		`- * (Comment from the original db3 hashing code: )`
`127`		`- *`
`128`		`- * This is INCREDIBLY ugly, but fast. We break the string up into 8 byte`
`129`		`- * units. On the first time through the loop we get the 'leftover bytes'`
`130`		`- * (strlen % 8). On every later iteration, we perform 8 HASHC's so we handle`
`131`		`- * all 8 bytes. Essentially, this saves us 7 cmp & branch instructions. If`
`132`		`- * this routine is heavily used enough, it's worth the ugly coding.`
`133`		`- *`
`134`		`- * "OZ's original sdbm hash"`
	`128`	`+ * mix -- mix 3 32-bit values reversibly.`
	`129`	`+ * For every delta with one or two bits set, and the deltas of all three`
	`130`	`+ * high bits or all three low bits, whether the original value of a,b,c`
	`131`	`+ * is almost all zero or is uniformly distributed,`
	`132`	`+ * - If mix() is run forward or backward, at least 32 bits in a,b,c`
	`133`	`+ * have at least 1/4 probability of changing.`
	`134`	`+ * - If mix() is run forward, every bit of c will change between 1/3 and`
	`135`	`+ * 2/3 of the time. (Well, 22/100 and 78/100 for some 2-bit deltas.)`
	`136`	`+ */`
	`137`	`+#definemix(a,b,c) \`
	`138`	`+{ \`
	`139`	`+ a -= b; a -= c; a ^= (c>>13); \`
	`140`	`+ b -= c; b -= a; b ^= (a<<8); \`
	`141`	`+ c -= a; c -= b; c ^= (b>>13); \`
	`142`	`+ a -= b; a -= c; a ^= (c>>12); \`
	`143`	`+ b -= c; b -= a; b ^= (a<<16); \`
	`144`	`+ c -= a; c -= b; c ^= (b>>5); \`
	`145`	`+ a -= b; a -= c; a ^= (c>>3); \`
	`146`	`+ b -= c; b -= a; b ^= (a<<10); \`
	`147`	`+ c -= a; c -= b; c ^= (b>>15); \`
	`148`	`+}`
	`149`	`+`
	`150`	`+/*`
	`151`	`+ * hash_any() -- hash a variable-length key into a 32-bit value`
	`152`	`+ * k : the key (the unaligned variable-length array of bytes)`
	`153`	`+ * len : the length of the key, counting by bytes`
	`154`	`+ * Returns a 32-bit value. Every bit of the key affects every bit of`
	`155`	`+ * the return value. Every 1-bit and 2-bit delta achieves avalanche.`
	`156`	`+ * About 6*len+35 instructions. The best hash table sizes are powers`
	`157`	`+ * of 2. There is no need to do mod a prime (mod is sooo slow!).`
	`158`	`+ * If you need less than 32 bits, use a bitmask.`
`135`	`159`	`*/`
`136`	`160`	`Datum`
`137`		`-hash_any(constchar*keydata,intkeylen)`
	`161`	`+hash_any(registerconstchar*k, registerintkeylen)`
`138`	`162`	`{`
`139`		`-uint32n;`
`140`		`-intloop;`
`141`		`-`
`142`		`-#defineHASHCn = keydata++ + 65599 n`
`143`		`-`
`144`		`-n=0;`
`145`		`-if (keylen>0)`
`146`		`-{`
`147`		`-loop= (keylen+8-1) >>3;`
`148`		`-`
`149`		`-switch (keylen& (8-1))`
`150`		`-{`
`151`		`-case0:`
`152`		`-do`
`153`		`-{/* All fall throughs */`
`154`		`-HASHC;`
`155`		`-case7:`
`156`		`-HASHC;`
`157`		`-case6:`
`158`		`-HASHC;`
`159`		`-case5:`
`160`		`-HASHC;`
`161`		`-case4:`
`162`		`-HASHC;`
`163`		`-case3:`
`164`		`-HASHC;`
`165`		`-case2:`
`166`		`-HASHC;`
`167`		`-case1:`
`168`		`-HASHC;`
`169`		`-}while (--loop);`
`170`		`-}`
`171`		`-}`
`172`		`-`
`173`		`-#undef HASHC`
`174`		`-`
`175`		`-PG_RETURN_UINT32(n);`
	`163`	`+ registerDatuma,b,c,len;`
	`164`	`+`
	`165`	`+/* Set up the internal state */`
	`166`	`+len=keylen;`
	`167`	`+a=b=0x9e3779b9;/* the golden ratio; an arbitrary value */`
	`168`	`+/* Another arbitrary value. If the hash function is called`
	`169`	`+ * multiple times, this could be the previously generated`
	`170`	`+ * hash value; however, the interface currently doesn't allow`
	`171`	`+ * this. AFAIK this isn't a big deal.`
	`172`	`+ */`
	`173`	`+c=3923095;`
	`174`	`+`
	`175`	`+/* handle most of the key */`
	`176`	`+while (len >=12)`
	`177`	`+ {`
	`178`	`+a+= (k[0]+((Datum)k[1]<<8)+((Datum)k[2]<<16)+((Datum)k[3]<<24));`
	`179`	`+b+= (k[4]+((Datum)k[5]<<8)+((Datum)k[6]<<16)+((Datum)k[7]<<24));`
	`180`	`+c+= (k[8]+((Datum)k[9]<<8)+((Datum)k[10]<<16)+((Datum)k[11]<<24));`
	`181`	`+mix(a,b,c);`
	`182`	`+k+=12;len-=12;`
	`183`	`+ }`
	`184`	`+`
	`185`	`+/* handle the last 11 bytes */`
	`186`	`+c+=keylen;`
	`187`	`+switch(len)/* all the case statements fall through */`
	`188`	`+ {`
	`189`	`+case11:c+=((Datum)k[10]<<24);`
	`190`	`+case10:c+=((Datum)k[9]<<16);`
	`191`	`+case9 :c+=((Datum)k[8]<<8);`
	`192`	`+/* the first byte of c is reserved for the length */`
	`193`	`+case8 :b+=((Datum)k[7]<<24);`
	`194`	`+case7 :b+=((Datum)k[6]<<16);`
	`195`	`+case6 :b+=((Datum)k[5]<<8);`
	`196`	`+case5 :b+=k[4];`
	`197`	`+case4 :a+=((Datum)k[3]<<24);`
	`198`	`+case3 :a+=((Datum)k[2]<<16);`
	`199`	`+case2 :a+=((Datum)k[1]<<8);`
	`200`	`+case1 :a+=k[0];`
	`201`	`+/* case 0: nothing left to add */`
	`202`	`+ }`
	`203`	`+mix(a,b,c);`
	`204`	`+/* report the result */`
	`205`	`+returnc;`
`176`	`206`	`}`

`‎src/backend/access/hash/hashinsert.c‎`

Lines changed: 3 additions & 3 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,7 +8,7 @@`
`8`	`8`	`*`
`9`	`9`	`*`
`10`	`10`	`* IDENTIFICATION`
`11`		`- * $Header: /cvsroot/pgsql/src/backend/access/hash/hashinsert.c,v 1.23 2001/10/25 05:49:21 momjian Exp $`
	`11`	`+ * $Header: /cvsroot/pgsql/src/backend/access/hash/hashinsert.c,v 1.24 2002/03/06 20:49:40 momjian Exp $`
`12`	`12`	`*`
`13`	`13`	`*-------------------------------------------------------------------------`
`14`	`14`	`*/`
`@@ -49,7 +49,7 @@ _hash_doinsert(Relation rel, HashItem hitem)`
`49`	`49`	`itup=&(hitem->hash_itup);`
`50`	`50`	`if ((natts=rel->rd_rel->relnatts)!=1)`
`51`	`51`	`elog(ERROR,"Hash indices valid for only one index key.");`
`52`		`-itup_scankey=_hash_mkscankey(rel,itup,metap);`
	`52`	`+itup_scankey=_hash_mkscankey(rel,itup);`
`53`	`53`
`54`	`54`	`/*`
`55`	`55`	`* find the first page in the bucket chain containing this key and`
`@@ -232,7 +232,7 @@ _hash_pgaddtup(Relation rel,`
`232`	`232`	`RelationGetRelationName(rel));`
`233`	`233`
`234`	`234`	`/* write the buffer, but hold our lock */`
`235`		`-_hash_wrtnorelbuf(rel,buf);`
	`235`	`+_hash_wrtnorelbuf(buf);`
`236`	`236`
`237`	`237`	`returnitup_off;`
`238`	`238`	`}`

`‎src/backend/access/hash/hashovfl.c‎`

Lines changed: 4 additions & 4 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,7 +8,7 @@`
`8`	`8`	`*`
`9`	`9`	`*`
`10`	`10`	`* IDENTIFICATION`
`11`		`- * $Header: /cvsroot/pgsql/src/backend/access/hash/hashovfl.c,v 1.31 2001/10/25 05:49:21 momjian Exp $`
	`11`	`+ * $Header: /cvsroot/pgsql/src/backend/access/hash/hashovfl.c,v 1.32 2002/03/06 20:49:41 momjian Exp $`
`12`	`12`	`*`
`13`	`13`	`* NOTES`
`14`	`14`	`* Overflow pages look like ordinary relation pages.`
`@@ -73,11 +73,11 @@ _hash_addovflpage(Relation rel, Buffer *metabufp, Buffer buf)`
`73`	`73`	`ovflopaque->hasho_flag=LH_OVERFLOW_PAGE;`
`74`	`74`	`ovflopaque->hasho_oaddr=oaddr;`
`75`	`75`	`ovflopaque->hasho_bucket=pageopaque->hasho_bucket;`
`76`		`-_hash_wrtnorelbuf(rel,ovflbuf);`
	`76`	`+_hash_wrtnorelbuf(ovflbuf);`
`77`	`77`
`78`	`78`	`/* logically chain overflow page to previous page */`
`79`	`79`	`pageopaque->hasho_nextblkno=ovflblkno;`
`80`		`-_hash_wrtnorelbuf(rel,buf);`
	`80`	`+_hash_wrtnorelbuf(buf);`
`81`	`81`	`returnovflbuf;`
`82`	`82`	`}`
`83`	`83`
`@@ -574,7 +574,7 @@ _hash_squeezebucket(Relation rel,`
`574`	`574`	`* the "next" ItemId.`
`575`	`575`	`*/`
`576`	`576`	`PageIndexTupleDelete(rpage,roffnum);`
`577`		`-_hash_wrtnorelbuf(rel,rbuf);`
	`577`	`+_hash_wrtnorelbuf(rbuf);`
`578`	`578`
`579`	`579`	`/*`
`580`	`580`	`* if the "read" page is now empty because of the deletion, free`

`‎src/backend/access/hash/hashpage.c‎`

Lines changed: 7 additions & 7 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,7 +8,7 @@`
`8`	`8`	`*`
`9`	`9`	`*`
`10`	`10`	`* IDENTIFICATION`
`11`		`- * $Header: /cvsroot/pgsql/src/backend/access/hash/hashpage.c,v 1.34 2002/01/15 22:14:16 tgl Exp $`
	`11`	`+ * $Header: /cvsroot/pgsql/src/backend/access/hash/hashpage.c,v 1.35 2002/03/06 20:49:42 momjian Exp $`
`12`	`12`	`*`
`13`	`13`	`* NOTES`
`14`	`14`	`* Postgres hash pages look like ordinary relation pages. The opaque`
`@@ -151,7 +151,7 @@ _hash_metapinit(Relation rel)`
`151`	`151`	`elog(ERROR,"Problem with _hash_initbitmap.");`
`152`	`152`
`153`	`153`	`/* all done */`
`154`		`-_hash_wrtnorelbuf(rel,metabuf);`
	`154`	`+_hash_wrtnorelbuf(metabuf);`
`155`	`155`
`156`	`156`	`/*`
`157`	`157`	`* initialize the first two buckets`
`@@ -260,7 +260,7 @@ _hash_wrtbuf(Relation rel, Buffer buf)`
`260`	`260`	`*or a reference to the buffer.`
`261`	`261`	`*/`
`262`	`262`	`void`
`263`		`-_hash_wrtnorelbuf(Relationrel,Bufferbuf)`
	`263`	`+_hash_wrtnorelbuf(Bufferbuf)`
`264`	`264`	`{`
`265`	`265`	`BlockNumberblkno;`
`266`	`266`
`@@ -383,7 +383,7 @@ _hash_pagedel(Relation rel, ItemPointer tid)`
`383`	`383`	`opaque= (HashPageOpaque)PageGetSpecialPointer(page);`
`384`	`384`
`385`	`385`	`PageIndexTupleDelete(page,offno);`
`386`		`-_hash_wrtnorelbuf(rel,buf);`
	`386`	`+_hash_wrtnorelbuf(buf);`
`387`	`387`
`388`	`388`	`if (PageIsEmpty(page)&& (opaque->hasho_flag&LH_OVERFLOW_PAGE))`
`389`	`389`	`{`
`@@ -505,7 +505,7 @@ _hash_splitpage(Relation rel,`
`505`	`505`	`nopaque->hasho_flag=LH_BUCKET_PAGE;`
`506`	`506`	`nopaque->hasho_oaddr=InvalidOvflAddress;`
`507`	`507`	`nopaque->hasho_bucket=nbucket;`
`508`		`-_hash_wrtnorelbuf(rel,nbuf);`
	`508`	`+_hash_wrtnorelbuf(nbuf);`
`509`	`509`
`510`	`510`	`/*`
`511`	`511`	`* make sure the old bucket isn't empty. advance 'opage' and friends`
`@@ -628,7 +628,7 @@ _hash_splitpage(Relation rel,`
`628`	`628`	`==InvalidOffsetNumber)`
`629`	`629`	`elog(ERROR,"_hash_splitpage: failed to add index item to %s",`
`630`	`630`	`RelationGetRelationName(rel));`
`631`		`-_hash_wrtnorelbuf(rel,nbuf);`
	`631`	`+_hash_wrtnorelbuf(nbuf);`
`632`	`632`
`633`	`633`	`/*`
`634`	`634`	`* now delete the tuple from the old bucket. after this`
`@@ -640,7 +640,7 @@ _hash_splitpage(Relation rel,`
`640`	`640`	`* instead of calling PageGetMaxOffsetNumber.`
`641`	`641`	`*/`
`642`	`642`	`PageIndexTupleDelete(opage,ooffnum);`
`643`		`-_hash_wrtnorelbuf(rel,obuf);`
	`643`	`+_hash_wrtnorelbuf(obuf);`
`644`	`644`	`omaxoffnum=OffsetNumberPrev(omaxoffnum);`
`645`	`645`
`646`	`646`	`/*`

`‎src/backend/access/hash/hashutil.c‎`

Lines changed: 4 additions & 4 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,14 +1,14 @@`
`1`	`1`	`/*-------------------------------------------------------------------------`
`2`	`2`	`*`
`3`		`- *btutils.c`
`4`		`- * Utility code for Postgresbtree implementation.`
	`3`	`+ *hashutil.c`
	`4`	`+ * Utility code for Postgreshash implementation.`
`5`	`5`	`*`
`6`	`6`	`* Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group`
`7`	`7`	`* Portions Copyright (c) 1994, Regents of the University of California`
`8`	`8`	`*`
`9`	`9`	`*`
`10`	`10`	`* IDENTIFICATION`
`11`		`- * $Header: /cvsroot/pgsql/src/backend/access/hash/hashutil.c,v 1.27 2001/10/0623:21:43tgl Exp $`
	`11`	`+ * $Header: /cvsroot/pgsql/src/backend/access/hash/hashutil.c,v 1.28 2002/03/0620:49:43momjian Exp $`
`12`	`12`	`*`
`13`	`13`	`*-------------------------------------------------------------------------`
`14`	`14`	`*/`
`@@ -21,7 +21,7 @@`
`21`	`21`
`22`	`22`
`23`	`23`	`ScanKey`
`24`		`-_hash_mkscankey(Relationrel,IndexTupleitup,HashMetaPagemetap)`
	`24`	`+_hash_mkscankey(Relationrel,IndexTupleitup)`
`25`	`25`	`{`
`26`	`26`	`ScanKeyskey;`
`27`	`27`	`TupleDescitupdesc;`

`‎src/backend/executor/nodeHash.c‎`

Lines changed: 2 additions & 3 deletions

Original file line number	Diff line number	Diff line change
`@@ -7,7 +7,7 @@`
`7`	`7`	`* Portions Copyright (c) 1994, Regents of the University of California`
`8`	`8`	`*`
`9`	`9`	`*`
`10`		`- *$Id: nodeHash.c,v 1.60 2001/10/25 05:49:28 momjian Exp $`
	`10`	`+ *$Id: nodeHash.c,v 1.61 2002/03/06 20:49:44 momjian Exp $`
`11`	`11`	`*`
`12`	`12`	`*-------------------------------------------------------------------------`
`13`	`13`	`*/`
`@@ -115,8 +115,7 @@ ExecInitHash(Hash node, EState estate, Plan *parent)`
`115`	`115`	`HashState*hashstate;`
`116`	`116`	`Plan*outerPlan;`
`117`	`117`
`118`		`-SO1_printf("ExecInitHash: %s\n",`
`119`		`-"initializing hash node");`
	`118`	`+SO_printf("ExecInitHash: initializing hash node\n");`
`120`	`119`
`121`	`120`	`/*`
`122`	`121`	`* assign the node's execution state`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commit7ab7467

File tree

10 files changed

10 files changed

`‎src/backend/access/hash/hash.c‎`

`‎src/backend/access/hash/hashfunc.c‎`

`‎src/backend/access/hash/hashinsert.c‎`

`‎src/backend/access/hash/hashovfl.c‎`

`‎src/backend/access/hash/hashpage.c‎`

`‎src/backend/access/hash/hashutil.c‎`

`‎src/backend/executor/nodeHash.c‎`

0 commit comments