
gh-111545: Add Py_HashDouble() function#113115


Closed

vstinner wants to merge 2 commits into python:main from vstinner:hash_double4



Member

vstinner commented Dec 14, 2023 • edited by github-actions bot

Add tests: Modules/_testcapi/hash.c and
Lib/test/test_capi/test_hash.py.

Issue: Make _Py_HashDouble public again as "unstable" API #111545

📚 Documentation preview 📚: https://cpython-previews--113115.org.readthedocs.build/

gh-111545: Add Py_HashDouble() function

9b00e3e

Add tests: Modules/_testcapi/hash.c and Lib/test/test_capi/test_hash.py.

vstinner requested a review from tiran as a code owner

December 14, 2023 14:48

bedevere-app bot added the awaiting core review label

Dec 14, 2023

bedevere-app bot mentioned this pull request

Dec 14, 2023

Make _Py_HashDouble public again as "unstable" API #111545

Closed

vstinner mentioned this pull request

Dec 14, 2023

gh-111545: Add Py_HashDouble() function #112449

Closed

Fix Sphinx syntax

67b4eb8

serhiy-storchaka approved these changes

Dec 15, 2023


Python/pyhash.c

		@@ -84,17 +84,20 @@ static Py_ssize_t hashstats[Py_HASH_STATS_MAX + 1] = {0};
		 */

		 Py_hash_t
		-_Py_HashDouble(PyObject *inst, double v)
		+Py_HashDouble(double v)
		 {
		     int e, sign;
		     double m;
		     Py_uhash_t x, y;

		     if (!Py_IS_FINITE(v)) {

Member

What if we remove this and keep only the Py_IS_INFINITY(v) check?

Member, Author

I prefer to have deterministic behavior and always return the same hash value (0) if value is NaN. There are legitimate use cases for treating NaN as hash value 0.

With the following change, which only checks for infinity, Py_HashDouble() hangs (fails to exit the loop) if value is NaN.

		diff --git a/Python/pyhash.c b/Python/pyhash.c
		index f64edde4043..23aa2dac7cc 100644
		--- a/Python/pyhash.c
		+++ b/Python/pyhash.c
		@@ -90,14 +90,8 @@ Py_HashDouble(double v)
		     double m;
		     Py_uhash_t x, y;

		-    if (!Py_IS_FINITE(v)) {
		-        if (Py_IS_INFINITY(v)) {
		-            return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
		-        }
		-        else {
		-            assert(Py_IS_NAN(v));
		-            return 0;
		-        }
		+    if (Py_IS_INFINITY(v)) {
		+        return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
		     }

		     m = frexp(v, &e);
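The hang described above can be reproduced outside CPython. The following is a minimal standalone sketch, not the code from Python/pyhash.c: `mantissa_loop_iterations` and its `max_iter` cap are hypothetical names added for illustration, and the loop only mimics the shape of the mantissa loop (28 bits consumed per iteration). Because NaN compares unequal to zero and stays NaN after every arithmetic step, the `while (m)` condition never becomes false.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: counts iterations of a loop shaped like the
 * mantissa loop in Py_HashDouble(). The max_iter cap exists only so that
 * this demonstration cannot hang; the real loop has no such cap. */
static int mantissa_loop_iterations(double v, int max_iter)
{
    int e;
    double int_part;
    double m;
    int n = 0;

    assert(max_iter >= 0);

    m = frexp(v, &e);           /* returns NaN when v is NaN */
    if (m < 0) {
        m = -m;
    }

    /* NaN is "true" in a boolean context because NaN != 0. */
    while (m && n < max_iter) {
        m *= 268435456.0;       /* 2**28 */
        m = modf(m, &int_part); /* keep the fractional part; NaN stays NaN */
        n++;
    }
    return n;
}
```

A finite double has a 53-bit mantissa, so the loop finishes within two iterations for finite input, while NaN runs until the cap is reached.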

With the following change, Py_HashDouble() returns -_PyHASH_INF if value is NaN, since NaN > 0 is false:

		diff --git a/Python/pyhash.c b/Python/pyhash.c
		index f64edde4043..a853d6dad99 100644
		--- a/Python/pyhash.c
		+++ b/Python/pyhash.c
		@@ -91,13 +91,8 @@ Py_HashDouble(double v)
		     Py_uhash_t x, y;

		     if (!Py_IS_FINITE(v)) {
		-        if (Py_IS_INFINITY(v)) {
		-            return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
		-        }
		-        else {
		-            assert(Py_IS_NAN(v));
		-            return 0;
		-        }
		+        // v can be NaN
		+        return (v > 0 ? _PyHASH_INF : -_PyHASH_INF);
		     }

		     m = frexp(v, &e);
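The effect of that second change can be checked in isolation. This is a standalone sketch under stated assumptions: `DEMO_HASH_INF` is a hypothetical stand-in for `_PyHASH_INF` (314159 is assumed here for illustration), and `sign_based_hash` is an invented name for the single remaining return statement.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for _PyHASH_INF; the value is an assumption. */
#define DEMO_HASH_INF 314159L

/* Mirrors the body of the `if (!Py_IS_FINITE(v))` branch after the change:
 * the caller guarantees that v is infinity or NaN. */
static long sign_based_hash(double v)
{
    assert(!isfinite(v));
    /* Every ordered comparison with NaN is false, so NaN takes the
     * "negative" branch together with negative infinity. */
    return (v > 0 ? DEMO_HASH_INF : -DEMO_HASH_INF);
}
```

With this shape, NaN and negative infinity become indistinguishable by hash value, which is the collision the comment points out.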

Member

What if we use Py_IS_INFINITY() instead of !Py_IS_FINITE()?

Member, Author

My first attempt (first patch in my comment) leads to a hang if you pass NaN.

Why do you want to avoid the !Py_IS_FINITE() + Py_IS_INFINITY() check? Are you worried about performance?

Member, Author

Recipe from What's New in Python 3.13:

		Py_hash_t hash_double(PyObject *obj, double value)
		{
		    if (!Py_IS_NAN(value)) {
		        return Py_HashDouble(value);
		    }
		    else {
		        return Py_HashPointer(obj);
		    }
		}

Using this recipe and the current implementation, there are 3 code paths:

NaN: 1 test (Py_IS_NAN()), hash_double() calls Py_HashPointer().
infinity: 3 tests (!Py_IS_NAN(), !Py_IS_FINITE(), Py_IS_INFINITY()), return (v > 0 ? _PyHASH_INF : -_PyHASH_INF).
finite: 2 tests (!Py_IS_NAN(), Py_IS_FINITE()), the loop.

I don't think that it's a big deal to add 1 or 2 tests per floating-point number. I care more about the API, having deterministic behavior for the 3 cases.
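The three code paths listed above can be sketched as a standalone classifier. This is only an illustration using the standard `<math.h>` macros in place of the `Py_IS_*` macros; `hash_path` and `classify_for_hash` are hypothetical names that do not appear in the PR.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical labels for the three paths discussed in the comment. */
typedef enum {
    PATH_NAN,       /* recipe falls back to Py_HashPointer(obj) */
    PATH_INFINITY,  /* returns +/- _PyHASH_INF */
    PATH_FINITE     /* runs the mantissa loop */
} hash_path;

static hash_path classify_for_hash(double value)
{
    if (isnan(value)) {          /* test 1, done by the recipe */
        return PATH_NAN;
    }
    if (!isfinite(value)) {      /* test 2, done inside the hash function */
        assert(isinf(value));    /* test 3: only infinity is left here */
        return PATH_INFINITY;
    }
    return PATH_FINITE;
}
```

Each path is reached after at most three cheap floating-point classification tests, which matches the counts given in the comment.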

Member

I want to avoid any promises about NaN. It should be recommended to not use this function for NaN.

Doc/c-api/hash.rst

Comment on lines +55 to +66

		* If value is positive infinity, return :data:`sys.hash_info.inf
		  <sys.hash_info>`.
		* If value is negative infinity, return :data:`-sys.hash_info.inf
		  <sys.hash_info>`.
		* If value is not-a-number (NaN), return :data:`sys.hash_info.nan
		  <sys.hash_info>` (``0``).
		* Otherwise, return the hash value of the finite value number.

		.. note::
		   Return the hash value ``0`` for the floating point numbers ``-0.0`` and
		   ``+0.0``, and for not-a-number (NaN). ``Py_IS_NAN(value)`` can be used to
		   check if value is not-a-number.

Member

It exposes too many implementation details, which are already exposed in a different place. Why not simply say that it is equivalent to hash() of a Python float object if the value is not a NaN? And if it is a NaN, you should use another value to avoid collisions.

Member, Author

vstinner Dec 18, 2023

Are you talking about the note, or about describing the 3 cases and return values? I can just remove the note. My idea is to suggest using Py_IS_NAN() to treat NaN differently. But I'm not sure which implementation to suggest.

@zooba says that if you have a Python object, just call PyObject_Hash(obj) on it 😁

Member

serhiy-storchaka Dec 18, 2023

About describing all 3 cases: it should already be described elsewhere (documentation for sys.hash_info, float, or hash()), and if it is not described in detail there, then it is not necessary for users. You should only document that, for non-NaN values, it returns the same result as hash() for a Python float object.

bedevere-app bot added awaiting merge and removed awaiting core review labels

Dec 15, 2023

vstinner mentioned this pull request

Dec 16, 2023

Add Py_HashDouble() function capi-workgroup/decisions#2

Closed

4 tasks

Member, Author

vstinner commented Dec 20, 2023

I created PR #112095 more than 1 month ago. I spent time running benchmarks, implementing different APIs, trying to collect feedback on each API, and discussing at length the advantages and disadvantages of each API. Sadly, we failed to reach a consensus on the API. Now another API is being discussed. The API looks simple to me; I didn't expect to spend more than one month on a single function.

I need to take a break from that topic. I don't have the energy to dig into these discussions. I prefer to close the PR for now.