Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.4k
gh-111545: Add Py_HashDouble() function#112449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
@encukou@mdickinson@serhiy-storchaka: Would you mind to review this change? It's a new API for PyHash_Double() function: other APIs were discussed in PR#112095. |
I renamed the function from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The API looks fine to me. I'm wondering whether we want to consider switching the meaning of the return value, so that0
is used for non-NaNs and1
for NaNs; I want to think of the return value as essentially anis_nan
check, and from that perspective the current API has an extra negation to get my brain around.
But either way is probably fine (so long as it's documented).
Uh oh!
There was an error while loading.Please reload this page.
I rebased the PR on the main branch.
Ok, I modified the API to return 1 if the float is not-a-number (NaN), and return 0 otherwise. |
encukou commentedDec 4, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
In similar API, 1 means that you're getting a meaningful IMO, that's a good convention to establish (here, with |
Fair enough. That convention does feel somewhat in conflict with the common "zero-return = success", "non-zero-return = failure" API that we have elsewhere, but that's a wider issue. |
Agreed. |
If I'm reading the room correctly, we're reserving only |
I reverted my change to move back to the previous API:
@mdickinson@encukou: Would you mind to review/approve formally the PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good, thanks!
Other WG members might disagree, but I don't see anything too controversial any more.
I createdcapi-workgroup/decisions#2 for the C API Working Group. |
I merged my Py_HashPointer() PR. I rebased this PR on top on it. |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
I will update the PR to address the reviews, once the API will be approved by the C API Working Group. |
Uh oh!
There was an error while loading.Please reload this page.
* Add again the private _PyHASH_NAN constant.* Add tests: Modules/_testcapi/hash.c and Lib/test/test_capi/test_hash.py.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM, thank you!
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
I updated What's New in Python 3.13 to suggest a recipe replacing usage of the private
|
I would prefer a simpler API that does not treat NaN as special at all and leave this to the user:
Even if it is slightly slower. But it requires less mental effort to understand and use the API. |
What should an user pass in as |
@serhiy-storchaka proposes the API: I would be fine with this API. It was more or less proposed earlier. But it was said that having to call In terms of API,
|
I wrote PR#113115 which implements |
@serhiy-storchaka@mdickinson: So which API do you prefer?
|
I prefer your initial interface Unless there is really large performance advantage in using the second variant. I do not know how large should it be to justify more complex interface. 10% is not large. |
Ok.
If prefer to have a deterministic behavior and always return the same hash value (0) if value is NaN. There are legit use cases to treat NaN as hash value 0. Seemy comment. |
Ditto. I don't see a need for anything more complicated than this, and it feels wrong to me to allow minor performance differences to drive API design.
This sounds good to me. |
Ok. I proposed to change C API Working Group decision on this API:capi-workgroup/decisions#2 (comment) |
It creates a vulnerability. CPython itself never calls this API for a NaN. If a third-party code calls it for a NaN value, it will allow to easily create non-equal objects with the same hash. |
The simpler API,
|
Can this concern be resolved with better documentation? Explain in which case treating all NaN "as equal" can be an issue, or suggest a solution such as@serhiy-storchaka's recipe#112449 (comment) ? I'm not convinced that using the same hash value (0) for all NaN numbers is a big deal, since Python was doing that until 3.9. The hash value is just an optimization to avoid the slower comparison operator, it's more about performance than about correctness.
|
IMO, no. When doing a review, you don't check the docs of all functions you see. Just the surprising ones. |
It is a big deal. Try to create a dict with 10000 or 100000 of NaNs. d= {float('nan'):iforiinrange(100000)} |
Aha, I see: it's way faster in Python 3.10 where hash(nan) is not always 0.
But the proposed API is for For me, |
I'm talking about |
I'm also talking about the documentation. Users won't read the docs. If the function has surprising behaviour, adding a hint that you should look at the docs goes a long way. |
I created PR#112095 more than 1 month ago. I spent time to run benchmark, implement different APIs, try to collect feedback on each API, and discuss in length advantages and disadvantages of each API. Sadly, we failed to reach a consensus on the API. Nowanother API is being discussed. The API looks simple to me, I didn't expect to spend more than one month on a single function. I need to take a break from that topic. I don't have the energy to dig into these discussions. I prefer to close the PR for now. |
Uh oh!
There was an error while loading.Please reload this page.
_Py_HashDouble
public again as "unstable" API #111545📚 Documentation preview 📚:https://cpython-previews--112449.org.readthedocs.build/