Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-99540: Constant hash for _PyNone_Type to aid reproducibility#99541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
ethanfurman merged 2 commits intopython:mainfromyonillasky:none-constant-hash
Dec 16, 2022

Conversation

yonillasky
Copy link
Contributor

@yonillaskyyonillasky commentedNov 16, 2022
edited by bedevere-bot
Loading

@bedevere-bot
Copy link

Most changes to Pythonrequire a NEWS entry.

Please add it using theblurb_it web app or theblurb command-line tool.

@ghost
Copy link

ghost commentedNov 16, 2022
edited by ghost
Loading

All commit authors signed the Contributor License Agreement.
CLA signed

@ethanfurmanethanfurman changed the titlegh-99540: Constant hash for _PyNone_Typegh-99540: Constant hash for _PyNone_Type to aid reproducibilityDec 10, 2022
@ethanfurman
Copy link
Member

None, likeTrue andFalse, is a special object in Python, and should have either a constant hash, or one that can be set by specifying the random hash seed.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@yonillasky
Copy link
ContributorAuthor

yonillasky commentedDec 10, 2022
edited
Loading

There is some value in making it dependent on the hash secret (PYTHONHASHSEED):

  • for intentional fuzzing purposes
  • it addresses these nebulous concerns I've seen on the discuss forum, that people might start unwittingly depending on the specific constant value of this hash

The cost of doing that is an extra global variable and a single calculation done once upon setting the hash secret. I think that is a reasonable cost - I'll try writing it that way.

EDIT: now that I did, having second thoughts - it's a larger change to CPython, and perhaps people who run their code only on platforms with ASLR disabled today will be surprised by the "non-determinism by default" we'll be introducing here.

The hash secret was introduced to negate chosen input complexity attacks on builtin hash sets/maps, which is not relevant here, as None is a singleton. The "fuzzing" use case is secondary.

I'm willing to go with either solution. Personally, I will benefit more from the PYTHONHASHSEED dependent hash, but I don't know that it's the right thing to do for general use.

@bedevere-bot
Copy link

Most changes to Pythonrequire a NEWS entry.

Please add it using theblurb_it web app or theblurb command-line tool.

@netlify
Copy link

netlifybot commentedDec 10, 2022
edited
Loading

Deploy Preview forpython-cpython-preview canceled.

NameLink
🔨 Latest commita1cbd39
🔍 Latest deploy loghttps://app.netlify.com/sites/python-cpython-preview/deploys/6394e55392d2760008ffb6e2

@bedevere-bot
Copy link

Most changes to Pythonrequire a NEWS entry.

Please add it using theblurb_it web app or theblurb command-line tool.

@yonillasky
Copy link
ContributorAuthor

I'm waiting with the blurb update until we make a final decision whether to go with a hash-secret-dependent calculation or not.

@yonillasky
Copy link
ContributorAuthor

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@ethanfurman: please review the changes made to this pull request.

Copy link
Member

@ethanfurmanethanfurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Let's do a simple constant hash forNone.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Member

@methanemethane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

How 0xFCA86420 is chosen?
Is it better than random value?

@ethanfurman
Copy link
Member

The constant was chosen basically at whim (that particular value amused me); it's easier to implement as a constant because if we went with a random number we would have to make it contingent on the hash seed used to randomizestr andbyte hashes so that different runs of a program could be consistent (a virtual necessity for bug-hunting).

If it was a truly random value (which it currently is if Python was compiled with ASLR), then program results can vary from one run to the next with no way to force them to be the same (sinceNone's hash is not currently dependent on thePYTHONHASHSEED).

@methane
Copy link
Member

it's easier to implement as a constant because if we went with a random number we would have to make it contingent on the hash seed used to randomizestr andbyte hashes so that different runs of a program could be consistent (a virtual necessity for bug-hunting).

I just meant constant random number, like0xb1f921305a6be69e.

Since hashtable (dict and set) uses lower bits first, filling some lower bits can reduce hash collision against small constant ints.

But this is not a big issue because None is just one singleton.

@yonillasky
Copy link
ContributorAuthor

yonillasky commentedDec 12, 2022
edited
Loading

I just meant constant random number, like 0xb1f921305a6be69e.

Just noting that if a 64-bit literal is used, we'd presumably need to add a static check for 32-bit builds, and add a 32-bit "fallback" value for that case.

@ethanfurman
Copy link
Member

Good point. Let's just use a 32-bit number instead.

@ethanfurman
Copy link
Member

Actually, unless I'm missing something, that is a 32-bit number.

@ericvsmith
Copy link
Member

I think that was in response to Inada-san's suggestion of0xb1f921305a6be69e.

@ethanfurman
Copy link
Member

Ah, that's what I was missing. I wish we had a reply directly to a comment ability. :-(

@ethanfurmanethanfurman merged commit432117c intopython:mainDec 16, 2022
carljm added a commit to carljm/cpython that referenced this pull requestDec 16, 2022
* main:pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541)pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168)pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302)  "Compound statement" docs: Fix with-statement step indexing (python#100286)pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278)
shihai1991 added a commit to shihai1991/cpython that referenced this pull requestDec 18, 2022
* origin/main: (1306 commits)  CorrectCVE-2020-10735 documentation (python#100306)pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273)pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551)  Docs: Use `PY_VERSION_HEX` for version comparison (python#100179)pythongh-97909: Fix markup for `PyMethodDef` members (python#100089)pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205)pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469)pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541)pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168)pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302)  "Compound statement" docs: Fix with-statement step indexing (python#100286)pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278)  Improve stats presentation for calls. (pythonGH-100274)  Better stats for `LOAD_ATTR` and `STORE_ATTR` (pythonGH-100295)pythongh-81057: Move the Cached Parser Dummy Name to _PyRuntimeState (python#100277)  Document that zipfile's pwd parameter is a `bytes` object (python#100209)pythongh-99767: mark `PyTypeObject.tp_watched` as internal use only in table (python#100271)  Fix typo in introduction.rst (python#100266)  ...
@Lucas-C
Copy link
Contributor

Thank you for this, it allowed to make my project deterministic!

I dropped a comment about it there:https://stackoverflow.com/questions/7681786/how-is-hashnone-calculated/75296606#75296606

yonillasky reacted with thumbs up emoji

d-torrance added a commit to d-torrance/M2 that referenced this pull requestJun 26, 2023
This way, we don't need the "getPythonNone" function, which was onlycalled once, to create the object.  Beginning with Python 3.12, Nonewill have a constant hash anyway.  We use this new value.  Seepython/cpython#99541.
d-torrance added a commit to d-torrance/M2 that referenced this pull requestAug 29, 2023
This way, we don't need the "getPythonNone" function, which was onlycalled once, to create the object.  Beginning with Python 3.12, Nonewill have a constant hash anyway.  We use this new value.  Seepython/cpython#99541.
d-torrance added a commit to d-torrance/M2 that referenced this pull requestAug 30, 2023
This way, we don't need the "getPythonNone" function, which was onlycalled once, to create the object.  Beginning with Python 3.12, Nonewill have a constant hash anyway.  We use this new value.  Seepython/cpython#99541.
mikestillman pushed a commit to Macaulay2/M2 that referenced this pull requestSep 13, 2023
This way, we don't need the "getPythonNone" function, which was onlycalled once, to create the object.  Beginning with Python 3.12, Nonewill have a constant hash anyway.  We use this new value.  Seepython/cpython#99541.
@gsakkis
Copy link

Is this merged into 3.12? There's no mention in thechangelog.

@yonillaskyyonillasky deleted the none-constant-hash branchMarch 1, 2024 23:47
@yonillasky
Copy link
ContributorAuthor

yonillasky commentedMar 1, 2024
edited
Loading

Is this merged into 3.12? There's no mention in thechangelog.

The PR did update the blurb, and it appears ingit log when I check out branch 3.12.

What makes a given PR worthy of appearing in the changelog?
This one is not part of any PEP, nor was it planned work, or something that affects any language requirements. There is no contractual obligation that the hash will stay constant in the future, or that sets will continue to have iteration stability if used with deterministic hashes.

russellb added a commit to russellb/vllm that referenced this pull requestFeb 3, 2025
Prefix caching makes use of Python's built-in `hash()` function. As ofPython 3.12, the behavior of `hash(None)` has changed to be apredictable constant value. This makes it more feasible that someonecould try exploit hash collisions.The impact of a collision would be using cache that was generated usingdifferent content. Given knowledge of prompts in use and predictable hashingbehavior, someone could intentionally populate the cache using a promptknown to collide with another prompt in use. There doesn't seem to be muchvalue to an attacker in doing this, but it's certainly not ideal, and couldinterfere with the accuracy of results for another user.The invasiveness of this fix should be weighed against the severity of theissue to determine whether this is worth fixing.Using a hashing algorithm that is less prone to collision (like sha256, forexample) would be the best way to avoid the possibility of a collision.However, it would have an impact to both performance and memory footprint.An alternative is to continue to use `hash()`, but make it much more difficultto predict the hash value.What we want is that the starting hash value is randomized, which is thebehavior we got here prior to Python 3.12. An easy fix is to use astring. Here we use `'None'` to still make it clear we're starting fromnothing, but with a string we'll get a different hash value each timevllm runs. Note that within a given run, the value will remain the same.This restores the safer hashing behavior from before.Thank you very much to@kexinoh for reporting this concern privately sothat it could be evaluated for its severity prior to our decision to fixthis as a security enhancement.The commit that changed this behavior for Python 3.12 is here:-python/cpython@432117c-python/cpython#99541Signed-off-by: Russell Bryant <rbryant@redhat.com>
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@methanemethanemethane left review comments

@ethanfurmanethanfurmanethanfurman approved these changes

Assignees

@ethanfurmanethanfurman

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

8 participants
@yonillasky@bedevere-bot@ethanfurman@methane@ericvsmith@Lucas-C@gsakkis@rhettinger

[8]ページ先頭

©2009-2025 Movatter.jp