Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit7ab7467

Browse files
committed
I've attached a patch which implements Bob Jenkin's hash function for
PostgreSQL. This hash function replaces the one used by hash indexes andthe catalog cache. Hash joins use a different, relatively poor-qualityhash function, but I'll fix that later.As suggested by Tom Lane, this patch also changes the size of the fixedhash table used by the catalog cache to be a power-of-2 (instead of aprime: I chose 256 instead of 257). This allows the catcache to lookuphash buckets using a simple bitmask. This should improve the performanceof the catalog cache slightly, since the previous method (modulo aprime) was slow.In my tests, this improves the performance of hash indexes by between 4%and 8%; the performance when using btree indexes or seqscans isbasically unchanged.Neil Conway <neilconway@rogers.com>
1 parent5b5cef9 commit7ab7467

File tree

10 files changed

+127
-97
lines changed

10 files changed

+127
-97
lines changed

‎src/backend/access/hash/hash.c

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/access/hash/hash.c,v 1.54 2002/03/02 21:39:16 momjian Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/access/hash/hash.c,v 1.55 2002/03/06 20:49:37 momjian Exp $
1212
*
1313
* NOTES
1414
* This file contains only the public interface routines.
@@ -165,9 +165,6 @@ hashinsert(PG_FUNCTION_ARGS)
165165
char*nulls= (char*)PG_GETARG_POINTER(2);
166166
ItemPointerht_ctid= (ItemPointer)PG_GETARG_POINTER(3);
167167

168-
#ifdefNOT_USED
169-
RelationheapRel= (Relation)PG_GETARG_POINTER(4);
170-
#endif
171168
InsertIndexResultres;
172169
HashItemhitem;
173170
IndexTupleitup;
@@ -333,7 +330,6 @@ hashendscan(PG_FUNCTION_ARGS)
333330

334331
/*
335332
*hashmarkpos() -- save current scan position
336-
*
337333
*/
338334
Datum
339335
hashmarkpos(PG_FUNCTION_ARGS)

‎src/backend/access/hash/hashfunc.c

Lines changed: 83 additions & 53 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashfunc.c,v 1.31 2002/02/25 04:06:47 momjian Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashfunc.c,v 1.32 2002/03/06 20:49:38 momjian Exp $
1212
*
1313
* NOTES
1414
* These functions are stored in pg_amproc.For each operator class
@@ -95,6 +95,8 @@ hashname(PG_FUNCTION_ARGS)
9595
{
9696
char*key=NameStr(*PG_GETARG_NAME(0));
9797

98+
Assert(strlen(key) <=NAMEDATALEN);
99+
98100
returnhash_any(key,strlen(key));
99101
}
100102

@@ -116,61 +118,89 @@ hashvarlena(PG_FUNCTION_ARGS)
116118
returnresult;
117119
}
118120

121+
/* This hash function was written by Bob Jenkins
122+
* (bob_jenkins@burtleburtle.net), and superficially adapted
123+
* for PostgreSQL by Neil Conway. For more information on this
124+
* hash function, see http://burtleburtle.net/bob/hash/doobs.html
125+
*/
119126

120127
/*
121-
* hash_any --- compute a hash function for any specified chunk of memory
122-
*
123-
* This can be used as the underlying hash function for any pass-by-reference
124-
* data type in which there are no non-significant bits.
125-
*
126-
* (Comment from the original db3 hashing code: )
127-
*
128-
* This is INCREDIBLY ugly, but fast. We break the string up into 8 byte
129-
* units. On the first time through the loop we get the 'leftover bytes'
130-
* (strlen % 8). On every later iteration, we perform 8 HASHC's so we handle
131-
* all 8 bytes. Essentially, this saves us 7 cmp & branch instructions. If
132-
* this routine is heavily used enough, it's worth the ugly coding.
133-
*
134-
* "OZ's original sdbm hash"
128+
* mix -- mix 3 32-bit values reversibly.
129+
* For every delta with one or two bits set, and the deltas of all three
130+
* high bits or all three low bits, whether the original value of a,b,c
131+
* is almost all zero or is uniformly distributed,
132+
* - If mix() is run forward or backward, at least 32 bits in a,b,c
133+
* have at least 1/4 probability of changing.
134+
* - If mix() is run forward, every bit of c will change between 1/3 and
135+
* 2/3 of the time. (Well, 22/100 and 78/100 for some 2-bit deltas.)
136+
*/
137+
#definemix(a,b,c) \
138+
{ \
139+
a -= b; a -= c; a ^= (c>>13); \
140+
b -= c; b -= a; b ^= (a<<8); \
141+
c -= a; c -= b; c ^= (b>>13); \
142+
a -= b; a -= c; a ^= (c>>12); \
143+
b -= c; b -= a; b ^= (a<<16); \
144+
c -= a; c -= b; c ^= (b>>5); \
145+
a -= b; a -= c; a ^= (c>>3); \
146+
b -= c; b -= a; b ^= (a<<10); \
147+
c -= a; c -= b; c ^= (b>>15); \
148+
}
149+
150+
/*
151+
* hash_any() -- hash a variable-length key into a 32-bit value
152+
* k : the key (the unaligned variable-length array of bytes)
153+
* len : the length of the key, counting by bytes
154+
* Returns a 32-bit value. Every bit of the key affects every bit of
155+
* the return value. Every 1-bit and 2-bit delta achieves avalanche.
156+
* About 6*len+35 instructions. The best hash table sizes are powers
157+
* of 2. There is no need to do mod a prime (mod is sooo slow!).
158+
* If you need less than 32 bits, use a bitmask.
135159
*/
136160
Datum
137-
hash_any(constchar*keydata,intkeylen)
161+
hash_any(registerconstchar*k, registerintkeylen)
138162
{
139-
uint32n;
140-
intloop;
141-
142-
#defineHASHCn = *keydata++ + 65599 * n
143-
144-
n=0;
145-
if (keylen>0)
146-
{
147-
loop= (keylen+8-1) >>3;
148-
149-
switch (keylen& (8-1))
150-
{
151-
case0:
152-
do
153-
{/* All fall throughs */
154-
HASHC;
155-
case7:
156-
HASHC;
157-
case6:
158-
HASHC;
159-
case5:
160-
HASHC;
161-
case4:
162-
HASHC;
163-
case3:
164-
HASHC;
165-
case2:
166-
HASHC;
167-
case1:
168-
HASHC;
169-
}while (--loop);
170-
}
171-
}
172-
173-
#undef HASHC
174-
175-
PG_RETURN_UINT32(n);
163+
registerDatuma,b,c,len;
164+
165+
/* Set up the internal state */
166+
len=keylen;
167+
a=b=0x9e3779b9;/* the golden ratio; an arbitrary value */
168+
/* Another arbitrary value. If the hash function is called
169+
* multiple times, this could be the previously generated
170+
* hash value; however, the interface currently doesn't allow
171+
* this. AFAIK this isn't a big deal.
172+
*/
173+
c=3923095;
174+
175+
/* handle most of the key */
176+
while (len >=12)
177+
{
178+
a+= (k[0]+((Datum)k[1]<<8)+((Datum)k[2]<<16)+((Datum)k[3]<<24));
179+
b+= (k[4]+((Datum)k[5]<<8)+((Datum)k[6]<<16)+((Datum)k[7]<<24));
180+
c+= (k[8]+((Datum)k[9]<<8)+((Datum)k[10]<<16)+((Datum)k[11]<<24));
181+
mix(a,b,c);
182+
k+=12;len-=12;
183+
}
184+
185+
/* handle the last 11 bytes */
186+
c+=keylen;
187+
switch(len)/* all the case statements fall through */
188+
{
189+
case11:c+=((Datum)k[10]<<24);
190+
case10:c+=((Datum)k[9]<<16);
191+
case9 :c+=((Datum)k[8]<<8);
192+
/* the first byte of c is reserved for the length */
193+
case8 :b+=((Datum)k[7]<<24);
194+
case7 :b+=((Datum)k[6]<<16);
195+
case6 :b+=((Datum)k[5]<<8);
196+
case5 :b+=k[4];
197+
case4 :a+=((Datum)k[3]<<24);
198+
case3 :a+=((Datum)k[2]<<16);
199+
case2 :a+=((Datum)k[1]<<8);
200+
case1 :a+=k[0];
201+
/* case 0: nothing left to add */
202+
}
203+
mix(a,b,c);
204+
/* report the result */
205+
returnc;
176206
}

‎src/backend/access/hash/hashinsert.c

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashinsert.c,v 1.23 2001/10/25 05:49:21 momjian Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashinsert.c,v 1.24 2002/03/06 20:49:40 momjian Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -49,7 +49,7 @@ _hash_doinsert(Relation rel, HashItem hitem)
4949
itup=&(hitem->hash_itup);
5050
if ((natts=rel->rd_rel->relnatts)!=1)
5151
elog(ERROR,"Hash indices valid for only one index key.");
52-
itup_scankey=_hash_mkscankey(rel,itup,metap);
52+
itup_scankey=_hash_mkscankey(rel,itup);
5353

5454
/*
5555
* find the first page in the bucket chain containing this key and
@@ -232,7 +232,7 @@ _hash_pgaddtup(Relation rel,
232232
RelationGetRelationName(rel));
233233

234234
/* write the buffer, but hold our lock */
235-
_hash_wrtnorelbuf(rel,buf);
235+
_hash_wrtnorelbuf(buf);
236236

237237
returnitup_off;
238238
}

‎src/backend/access/hash/hashovfl.c

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashovfl.c,v 1.31 2001/10/25 05:49:21 momjian Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashovfl.c,v 1.32 2002/03/06 20:49:41 momjian Exp $
1212
*
1313
* NOTES
1414
* Overflow pages look like ordinary relation pages.
@@ -73,11 +73,11 @@ _hash_addovflpage(Relation rel, Buffer *metabufp, Buffer buf)
7373
ovflopaque->hasho_flag=LH_OVERFLOW_PAGE;
7474
ovflopaque->hasho_oaddr=oaddr;
7575
ovflopaque->hasho_bucket=pageopaque->hasho_bucket;
76-
_hash_wrtnorelbuf(rel,ovflbuf);
76+
_hash_wrtnorelbuf(ovflbuf);
7777

7878
/* logically chain overflow page to previous page */
7979
pageopaque->hasho_nextblkno=ovflblkno;
80-
_hash_wrtnorelbuf(rel,buf);
80+
_hash_wrtnorelbuf(buf);
8181
returnovflbuf;
8282
}
8383

@@ -574,7 +574,7 @@ _hash_squeezebucket(Relation rel,
574574
* the "next" ItemId.
575575
*/
576576
PageIndexTupleDelete(rpage,roffnum);
577-
_hash_wrtnorelbuf(rel,rbuf);
577+
_hash_wrtnorelbuf(rbuf);
578578

579579
/*
580580
* if the "read" page is now empty because of the deletion, free

‎src/backend/access/hash/hashpage.c

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashpage.c,v 1.34 2002/01/15 22:14:16 tgl Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashpage.c,v 1.35 2002/03/06 20:49:42 momjian Exp $
1212
*
1313
* NOTES
1414
* Postgres hash pages look like ordinary relation pages. The opaque
@@ -151,7 +151,7 @@ _hash_metapinit(Relation rel)
151151
elog(ERROR,"Problem with _hash_initbitmap.");
152152

153153
/* all done */
154-
_hash_wrtnorelbuf(rel,metabuf);
154+
_hash_wrtnorelbuf(metabuf);
155155

156156
/*
157157
* initialize the first two buckets
@@ -260,7 +260,7 @@ _hash_wrtbuf(Relation rel, Buffer buf)
260260
*or a reference to the buffer.
261261
*/
262262
void
263-
_hash_wrtnorelbuf(Relationrel,Bufferbuf)
263+
_hash_wrtnorelbuf(Bufferbuf)
264264
{
265265
BlockNumberblkno;
266266

@@ -383,7 +383,7 @@ _hash_pagedel(Relation rel, ItemPointer tid)
383383
opaque= (HashPageOpaque)PageGetSpecialPointer(page);
384384

385385
PageIndexTupleDelete(page,offno);
386-
_hash_wrtnorelbuf(rel,buf);
386+
_hash_wrtnorelbuf(buf);
387387

388388
if (PageIsEmpty(page)&& (opaque->hasho_flag&LH_OVERFLOW_PAGE))
389389
{
@@ -505,7 +505,7 @@ _hash_splitpage(Relation rel,
505505
nopaque->hasho_flag=LH_BUCKET_PAGE;
506506
nopaque->hasho_oaddr=InvalidOvflAddress;
507507
nopaque->hasho_bucket=nbucket;
508-
_hash_wrtnorelbuf(rel,nbuf);
508+
_hash_wrtnorelbuf(nbuf);
509509

510510
/*
511511
* make sure the old bucket isn't empty. advance 'opage' and friends
@@ -628,7 +628,7 @@ _hash_splitpage(Relation rel,
628628
==InvalidOffsetNumber)
629629
elog(ERROR,"_hash_splitpage: failed to add index item to %s",
630630
RelationGetRelationName(rel));
631-
_hash_wrtnorelbuf(rel,nbuf);
631+
_hash_wrtnorelbuf(nbuf);
632632

633633
/*
634634
* now delete the tuple from the old bucket. after this
@@ -640,7 +640,7 @@ _hash_splitpage(Relation rel,
640640
* instead of calling PageGetMaxOffsetNumber.
641641
*/
642642
PageIndexTupleDelete(opage,ooffnum);
643-
_hash_wrtnorelbuf(rel,obuf);
643+
_hash_wrtnorelbuf(obuf);
644644
omaxoffnum=OffsetNumberPrev(omaxoffnum);
645645

646646
/*

‎src/backend/access/hash/hashutil.c

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,14 @@
11
/*-------------------------------------------------------------------------
22
*
3-
*btutils.c
4-
* Utility code for Postgresbtree implementation.
3+
*hashutil.c
4+
* Utility code for Postgreshash implementation.
55
*
66
* Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
77
* Portions Copyright (c) 1994, Regents of the University of California
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashutil.c,v 1.27 2001/10/0623:21:43tgl Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/access/hash/hashutil.c,v 1.28 2002/03/0620:49:43momjian Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -21,7 +21,7 @@
2121

2222

2323
ScanKey
24-
_hash_mkscankey(Relationrel,IndexTupleitup,HashMetaPagemetap)
24+
_hash_mkscankey(Relationrel,IndexTupleitup)
2525
{
2626
ScanKeyskey;
2727
TupleDescitupdesc;

‎src/backend/executor/nodeHash.c

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
* Portions Copyright (c) 1994, Regents of the University of California
88
*
99
*
10-
*$Id: nodeHash.c,v 1.60 2001/10/25 05:49:28 momjian Exp $
10+
*$Id: nodeHash.c,v 1.61 2002/03/06 20:49:44 momjian Exp $
1111
*
1212
*-------------------------------------------------------------------------
1313
*/
@@ -115,8 +115,7 @@ ExecInitHash(Hash *node, EState *estate, Plan *parent)
115115
HashState*hashstate;
116116
Plan*outerPlan;
117117

118-
SO1_printf("ExecInitHash: %s\n",
119-
"initializing hash node");
118+
SO_printf("ExecInitHash: initializing hash node\n");
120119

121120
/*
122121
* assign the node's execution state

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp