Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.4k
gh-136599: Improve long_hash#136600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
gh-136599: Improve long_hash#136600
Conversation
Lib/test/test_long.py Outdated
assert hash(10) == 10 | ||
assert hash(-1) == -2 | ||
assert hash(-2**61) != -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
asserthash(10)==10 | |
asserthash(-1)==-2 | |
asserthash(-2**61)!=-1 | |
assertEqual(hash(10),10) | |
assertEqual(hash(-1),-2) | |
assertNotEqual(hash(-2**61),-1) |
Such methods are generally preferred (e.g. the above test) as they provide better errors.
--i; | ||
x = ((x << PyLong_SHIFT)); | ||
x += v->long_value.ob_digit[i]; | ||
assert(x < _PyHASH_MODULUS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
assert(x<_PyHASH_MODULUS); | |
if(x>=_PyHASH_MODULUS) | |
x-=_PyHASH_MODULUS; |
This assert will trigger for values likehash(2**31)
for 32bit targets which use the default#define PYLONG_BITS_IN_DIGIT 30
Lines 132 to 138 in0d4fd10
/* PYLONG_BITS_IN_DIGIT describes the number of bits per "digit" (limb) in the | |
* PyLongObject implementation (longintrepr.h). It's currently either 30 or 15, | |
* defaulting to 30. The 15-bit digit option may be removed in the future. | |
*/ | |
#ifndef PYLONG_BITS_IN_DIGIT | |
#definePYLONG_BITS_IN_DIGIT30 | |
#endif |
because
Line 170 in0d4fd10
typedef Py_ssize_t Py_hash_t; |
and thus
PyHASH_BITS
is 31cpython/Include/cpython/pyhash.h
Lines 12 to 18 in0d4fd10
#ifSIZEOF_VOID_P >=8 | |
# definePyHASH_BITS 61 | |
#else | |
# definePyHASH_BITS 31 | |
#endif | |
#definePyHASH_MODULUS (((size_t)1 << _PyHASH_BITS) - 1) |
I suggest to add more tests totest_long_hash
around these corner cases for such build configurations:
>>> hash(-2**31 - 2)-3>>> hash(-2**31 - 1)-2>>> hash(-2**31)-2>>> hash(2**31)1>>> hash(2**31 + 1)2>>> hash(2**31 + 2)3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I added a check thathash(-2**31)
is not-1
. For the others I has a bit hesitant as the values are implementation details (document as implementation details here:https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types). To add tests I would have to get the (private) value ofPyHASH_MODULUS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Interesting, your link also mentions thatPyHASH_MODULUS
is available viahttps://docs.python.org/3/library/sys.html#sys.hash_info.modulus, which is e.g. used in
_PyHASH_MODULUS=sys.hash_info.modulus |
and some other tests. But yeah, seems to be an implementation detail. Don't know for sure whether it's ok to use it in tests - but I think so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
PyHASH_MODULUS is not private anymore. And it was documented in the algorithm description before.
Lib/test/test_long.py Outdated
@@ -1693,5 +1693,11 @@ class MyInt(int): | |||
# GH-117195 -- This shouldn't crash | |||
object.__sizeof__(1) | |||
def test_long_hash(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
+1 on introducing hash tests intest_long.py
even though there are
cpython/Lib/test/test_builtin.py
Line 1165 in3dbe02c
deftest_hash(self): |
and
Lib/test/test_hash.py
There are special tests with respect to hashing in many other test modules, e.g.
cpython/Lib/test/test_float.py
Line 635 in9e5cebd
deftest_hash(self): |
so IMHO this is a good fit 👍
This may be not so on a build with 15-bit digits. In general, this should be conditional, depending on the size of the compact representation and the size of the digit, at compile time. |
Uh oh!
There was an error while loading.Please reload this page.
long_hash
: the final checklong_hash
by unrolling the first two digitsFor reviewers: the first commit adds a test, the second and third commit unroll the hash calculation for the digits. The performance gain is not a lot, so I would be fine with reverting the last two commits. A test scripts:
On my system I get a fairly consistent performance increase of 10% for the
set(ints)
test. For the other benchmarks my system (thermal throttled laptop) is not stable enough (the benchmarks are also a bit dominated by the creation of new integers).