Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitb83033c

Browse files
committed
Further cosmetic review of hashfn_unstable.h
In follow-up toe97b672,* Flesh out comments explaining the incremental interface* Clarify detection of zero bytes when hashing aligned C stringsThe latter was suggested and reviewed by Jeff DavisDiscussion:https://postgr.es/m/48e8f8bbe0be9c789f98776c7438244ab7a7cc63.camel%40j-davis.com
1 parent9ed3ee5 commitb83033c

File tree

1 file changed

+49
-27
lines changed

1 file changed

+49
-27
lines changed

‎src/include/common/hashfn_unstable.h

Lines changed: 49 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -53,24 +53,40 @@
5353
* fasthash as implemented here has two interfaces:
5454
*
5555
* 1) Standalone functions, e.g. fasthash32() for a single value with a
56-
* known length.
56+
* known length. These return the same hash code as the original, at
57+
* least on little-endian machines.
5758
*
5859
* 2) Incremental interface. This can used for incorporating multiple
59-
* inputs. The standalone functions use this internally, so see fasthash64()
60-
* for an an example of how this works.
61-
*
62-
* The incremental interface is especially useful if any of the inputs
63-
* are NUL-terminated C strings, since the length is not needed ahead
64-
* of time. This avoids needing to call strlen(). This case is optimized
65-
* in fasthash_accum_cstring() :
60+
* inputs. First, initialize the hash state (here with a zero seed):
6661
*
6762
* fasthash_state hs;
6863
* fasthash_init(&hs, 0);
69-
* len = fasthash_accum_cstring(&hs, *str);
64+
*
65+
* If the inputs are of types that can be trivially cast to uint64, it's
66+
* sufficient to do:
67+
*
68+
* hs.accum = value1;
69+
* fasthash_combine(&hs);
70+
* hs.accum = value2;
71+
* fasthash_combine(&hs);
7072
* ...
71-
* return fasthash_final32(&hs, len);
7273
*
73-
* The length is computed on-the-fly. Experimentation has found that
74+
* For longer or variable-length input, fasthash_accum() is a more
75+
* flexible, but more verbose method. The standalone functions use this
76+
* internally, so see fasthash64() for an an example of this.
77+
*
78+
* After all inputs have been mixed in, finalize the hash:
79+
*
80+
* hashcode = fasthash_final32(&hs, 0);
81+
*
82+
* The incremental interface allows an optimization for NUL-terminated
83+
* C strings:
84+
*
85+
* len = fasthash_accum_cstring(&hs, str);
86+
* hashcode = fasthash_final32(&hs, len);
87+
*
88+
* By handling the terminator on-the-fly, we can avoid needing a strlen()
89+
* call to tell us how many bytes to hash. Experimentation has found that
7490
* SMHasher fails unless we incorporate the length, so it is passed to
7591
* the finalizer as a tweak.
7692
*/
@@ -204,26 +220,33 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
204220
{
205221
constchar*conststart=str;
206222
intremainder;
207-
uint64zero_bytes_le;
223+
uint64zero_byte_low;
208224

209225
Assert(PointerIsAligned(start,uint64));
226+
227+
/*
228+
* For every chunk of input, check for zero bytes before mixing into the
229+
* hash. The chunk with zeros must contain the NUL terminator. We arrange
230+
* so that zero_byte_low tells us not only that a zero exists, but also
231+
* where it is, so we can hash the remainder of the string.
232+
*
233+
* The haszero64 calculation will set bits corresponding to the lowest
234+
* byte where a zero exists, so that suffices for little-endian machines.
235+
* For big-endian machines, we would need bits set for the highest zero
236+
* byte in the chunk, since the trailing junk past the terminator could
237+
* contain additional zeros. haszero64 does not give us that, so we
238+
* byteswap the chunk first.
239+
*/
210240
for (;;)
211241
{
212242
uint64chunk=*(uint64*)str;
213243

214-
/*
215-
* With little-endian representation, we can use this calculation,
216-
* which sets bits in the first byte in the result word that
217-
* corresponds to a zero byte in the original word. The rest of the
218-
* bytes are indeterminate, so cannot be used on big-endian machines
219-
* without either swapping or a bytewise check.
220-
*/
221244
#ifdefWORDS_BIGENDIAN
222-
zero_bytes_le=haszero64(pg_bswap64(chunk));
245+
zero_byte_low=haszero64(pg_bswap64(chunk));
223246
#else
224-
zero_bytes_le=haszero64(chunk);
247+
zero_byte_low=haszero64(chunk);
225248
#endif
226-
if (zero_bytes_le)
249+
if (zero_byte_low)
227250
break;
228251

229252
hs->accum=chunk;
@@ -232,12 +255,11 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
232255
}
233256

234257
/*
235-
* For the last word, only use bytes up to the NUL for the hash. Bytes
236-
* with set bits will be 0x80, so calculate the first occurrence of a zero
237-
* byte within the input word by counting the number of trailing (because
238-
* little-endian) zeros and dividing the result by 8.
258+
* The byte corresponding to the NUL will be 0x80, so the rightmost bit
259+
* position will be in the range 7, 15, ..., 63. Turn this into byte
260+
* position by dividing by 8.
239261
*/
240-
remainder=pg_rightmost_one_pos64(zero_bytes_le) /BITS_PER_BYTE;
262+
remainder=pg_rightmost_one_pos64(zero_byte_low) /BITS_PER_BYTE;
241263
fasthash_accum(hs,str,remainder);
242264
str+=remainder;
243265

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp