Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-136599: Improve long_hash#136600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
eendebakpt wants to merge9 commits intopython:main
base:main
Choose a base branch
Loading
fromeendebakpt:long_hash

Conversation

eendebakpt
Copy link
Contributor

@eendebakpteendebakpt commentedJul 12, 2025
edited by bedevere-appbot
Loading

  • There is a corner case in the C code that not tested inlong_hash: the final check
    if (x == (Py_uhash_t)-1)        x = (Py_uhash_t)-2;
  • We can improve performance oflong_hash by unrolling the first two digits

For reviewers: the first commit adds a test, the second and third commit unroll the hash calculation for the digits. The performance gain is not a lot, so I would be fine with reverting the last two commits. A test scripts:

import pyperfsetup = """z = 2 << 30two_digit_ints = list(range(z, z + 2000))def long_hash_small_int():    for _ in range(4):        for ii in range(0,250):            hash(ii)def long_hash_one_digit():    z = 1000    for ii in range(z, z + 1000):        hash(ii)def long_hash_two_digit():    z = 2 << 30    for ii in range(z, z + 1000):        hash(ii)def long_hash_multi_digit():    z = 1 << 30 << 30 << 30 << 30    for ii in range(z, z + 1000):        hash(ii)"""runner = pyperf.Runner()runner.timeit(name="long_hash_small_int", stmt="long_hash_small_int()", setup=setup)runner.timeit(name="long_hash_one_digit", stmt="long_hash_one_digit()", setup=setup)runner.timeit(name="long_hash_two_digit", stmt="long_hash_two_digit()", setup=setup)runner.timeit(name="long_hash_multi_digit", stmt="long_hash_multi_digit()", setup=setup)runner.timeit(name="set(ints)", stmt="set(two_digit_ints)", setup=setup)

On my system I get a fairly consistent performance increase of 10% for theset(ints) test. For the other benchmarks my system (thermal throttled laptop) is not stable enough (the benchmarks are also a bit dominated by the creation of new integers).

Comment on lines 1698 to 1700
assert hash(10) == 10
assert hash(-1) == -2
assert hash(-2**61) != -1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
asserthash(10)==10
asserthash(-1)==-2
asserthash(-2**61)!=-1
assertEqual(hash(10),10)
assertEqual(hash(-1),-2)
assertNotEqual(hash(-2**61),-1)

Such methods are generally preferred (e.g. the above test) as they provide better errors.

chris-eibl reacted with thumbs up emoji
--i;
x = ((x << PyLong_SHIFT));
x += v->long_value.ob_digit[i];
assert(x < _PyHASH_MODULUS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
assert(x<_PyHASH_MODULUS);
if(x>=_PyHASH_MODULUS)
x-=_PyHASH_MODULUS;

This assert will trigger for values likehash(2**31) for 32bit targets which use the default#define PYLONG_BITS_IN_DIGIT 30

/* PYLONG_BITS_IN_DIGIT describes the number of bits per "digit" (limb) in the
* PyLongObject implementation (longintrepr.h). It's currently either 30 or 15,
* defaulting to 30. The 15-bit digit option may be removed in the future.
*/
#ifndef PYLONG_BITS_IN_DIGIT
#definePYLONG_BITS_IN_DIGIT30
#endif

because
typedef Py_ssize_t Py_hash_t;

and thusPyHASH_BITS is 31
#ifSIZEOF_VOID_P >=8
# definePyHASH_BITS 61
#else
# definePyHASH_BITS 31
#endif
#definePyHASH_MODULUS (((size_t)1 << _PyHASH_BITS) - 1)

I suggest to add more tests totest_long_hash around these corner cases for such build configurations:

>>> hash(-2**31 - 2)-3>>> hash(-2**31 - 1)-2>>> hash(-2**31)-2>>> hash(2**31)1>>> hash(2**31 + 1)2>>> hash(2**31 + 2)3

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I added a check thathash(-2**31) is not-1. For the others I has a bit hesitant as the values are implementation details (document as implementation details here:https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types). To add tests I would have to get the (private) value ofPyHASH_MODULUS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Interesting, your link also mentions thatPyHASH_MODULUS is available viahttps://docs.python.org/3/library/sys.html#sys.hash_info.modulus, which is e.g. used in

_PyHASH_MODULUS=sys.hash_info.modulus

and some other tests. But yeah, seems to be an implementation detail. Don't know for sure whether it's ok to use it in tests - but I think so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

PyHASH_MODULUS is not private anymore. And it was documented in the algorithm description before.

@@ -1693,5 +1693,11 @@ class MyInt(int):
# GH-117195 -- This shouldn't crash
object.__sizeof__(1)

def test_long_hash(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

+1 on introducing hash tests intest_long.py even though there are

deftest_hash(self):

andLib/test/test_hash.py

There are special tests with respect to hashing in many other test modules, e.g.

deftest_hash(self):

so IMHO this is a good fit 👍

@rhettingerrhettinger requested a review fromtim-oneJuly 14, 2025 03:14
@serhiy-storchaka
Copy link
Member

This may be not so on a build with 15-bit digits.

In general, this should be conditional, depending on the size of the compact representation and the size of the digit, at compile time.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@skirpichevskirpichevskirpichev left review comments

@StanFromIrelandStanFromIrelandStanFromIreland left review comments

@chris-eiblchris-eiblchris-eibl left review comments

@tim-onetim-oneAwaiting requested review from tim-one

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

5 participants
@eendebakpt@serhiy-storchaka@skirpichev@StanFromIreland@chris-eibl

[8]ページ先頭

©2009-2025 Movatter.jp