Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32k
gh-99540: Constant hash for _PyNone_Type to aid reproducibility#99541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
bedevere-bot commentedNov 16, 2022
Most changes to Pythonrequire a NEWS entry. Please add it using theblurb_it web app or theblurb command-line tool. |
ghost commentedNov 16, 2022 • edited by ghost
Loading Uh oh!
There was an error while loading.Please reload this page.
edited by ghost
Uh oh!
There was an error while loading.Please reload this page.
f216cd7
to0c08cac
Compare
|
Uh oh!
There was an error while loading.Please reload this page.
bedevere-bot commentedDec 10, 2022
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
yonillasky commentedDec 10, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There is some value in making it dependent on the hash secret (PYTHONHASHSEED):
The cost of doing that is an extra global variable and a single calculation done once upon setting the hash secret. I think that is a reasonable cost - I'll try writing it that way. EDIT: now that I did, having second thoughts - it's a larger change to CPython, and perhaps people who run their code only on platforms with ASLR disabled today will be surprised by the "non-determinism by default" we'll be introducing here. The hash secret was introduced to negate chosen input complexity attacks on builtin hash sets/maps, which is not relevant here, as None is a singleton. The "fuzzing" use case is secondary. I'm willing to go with either solution. Personally, I will benefit more from the PYTHONHASHSEED dependent hash, but I don't know that it's the right thing to do for general use. |
c12fc81
to961b41f
Comparebedevere-bot commentedDec 10, 2022
Most changes to Pythonrequire a NEWS entry. Please add it using theblurb_it web app or theblurb command-line tool. |
netlifybot commentedDec 10, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
✅ Deploy Preview forpython-cpython-preview canceled.
|
961b41f
tof154d65
Comparebedevere-bot commentedDec 10, 2022
Most changes to Pythonrequire a NEWS entry. Please add it using theblurb_it web app or theblurb command-line tool. |
Uh oh!
There was an error while loading.Please reload this page.
I'm waiting with the blurb update until we make a final decision whether to go with a hash-secret-dependent calculation or not. |
I have made the requested changes; please review again |
bedevere-bot commentedDec 10, 2022
Thanks for making the requested changes! @ethanfurman: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Let's do a simple constant hash forNone
.
bedevere-bot commentedDec 10, 2022
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
How 0xFCA86420 is chosen?
Is it better than random value?
The constant was chosen basically at whim (that particular value amused me); it's easier to implement as a constant because if we went with a random number we would have to make it contingent on the hash seed used to randomize If it was a truly random value (which it currently is if Python was compiled with ASLR), then program results can vary from one run to the next with no way to force them to be the same (since |
I just meant constant random number, like Since hashtable (dict and set) uses lower bits first, filling some lower bits can reduce hash collision against small constant ints. But this is not a big issue because None is just one singleton. |
yonillasky commentedDec 12, 2022 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Just noting that if a 64-bit literal is used, we'd presumably need to add a static check for 32-bit builds, and add a 32-bit "fallback" value for that case. |
Good point. Let's just use a 32-bit number instead. |
Actually, unless I'm missing something, that is a 32-bit number. |
I think that was in response to Inada-san's suggestion of |
Ah, that's what I was missing. I wish we had a reply directly to a comment ability. :-( |
* main:pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541)pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168)pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302) "Compound statement" docs: Fix with-statement step indexing (python#100286)pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278)
* origin/main: (1306 commits) CorrectCVE-2020-10735 documentation (python#100306)pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273)pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551) Docs: Use `PY_VERSION_HEX` for version comparison (python#100179)pythongh-97909: Fix markup for `PyMethodDef` members (python#100089)pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890)pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205)pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469)pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541)pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168)pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302) "Compound statement" docs: Fix with-statement step indexing (python#100286)pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278) Improve stats presentation for calls. (pythonGH-100274) Better stats for `LOAD_ATTR` and `STORE_ATTR` (pythonGH-100295)pythongh-81057: Move the Cached Parser Dummy Name to _PyRuntimeState (python#100277) Document that zipfile's pwd parameter is a `bytes` object (python#100209)pythongh-99767: mark `PyTypeObject.tp_watched` as internal use only in table (python#100271) Fix typo in introduction.rst (python#100266) ...
Thank you for this, it allowed to make my project deterministic! I dropped a comment about it there:https://stackoverflow.com/questions/7681786/how-is-hashnone-calculated/75296606#75296606 |
This way, we don't need the "getPythonNone" function, which was onlycalled once, to create the object. Beginning with Python 3.12, Nonewill have a constant hash anyway. We use this new value. Seepython/cpython#99541.
This way, we don't need the "getPythonNone" function, which was onlycalled once, to create the object. Beginning with Python 3.12, Nonewill have a constant hash anyway. We use this new value. Seepython/cpython#99541.
This way, we don't need the "getPythonNone" function, which was onlycalled once, to create the object. Beginning with Python 3.12, Nonewill have a constant hash anyway. We use this new value. Seepython/cpython#99541.
This way, we don't need the "getPythonNone" function, which was onlycalled once, to create the object. Beginning with Python 3.12, Nonewill have a constant hash anyway. We use this new value. Seepython/cpython#99541.
gsakkis commentedMar 1, 2024
Is this merged into 3.12? There's no mention in thechangelog. |
yonillasky commentedMar 1, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
The PR did update the blurb, and it appears in What makes a given PR worthy of appearing in the changelog? |
Prefix caching makes use of Python's built-in `hash()` function. As ofPython 3.12, the behavior of `hash(None)` has changed to be apredictable constant value. This makes it more feasible that someonecould try exploit hash collisions.The impact of a collision would be using cache that was generated usingdifferent content. Given knowledge of prompts in use and predictable hashingbehavior, someone could intentionally populate the cache using a promptknown to collide with another prompt in use. There doesn't seem to be muchvalue to an attacker in doing this, but it's certainly not ideal, and couldinterfere with the accuracy of results for another user.The invasiveness of this fix should be weighed against the severity of theissue to determine whether this is worth fixing.Using a hashing algorithm that is less prone to collision (like sha256, forexample) would be the best way to avoid the possibility of a collision.However, it would have an impact to both performance and memory footprint.An alternative is to continue to use `hash()`, but make it much more difficultto predict the hash value.What we want is that the starting hash value is randomized, which is thebehavior we got here prior to Python 3.12. An easy fix is to use astring. Here we use `'None'` to still make it clear we're starting fromnothing, but with a string we'll get a different hash value each timevllm runs. Note that within a given run, the value will remain the same.This restores the safer hashing behavior from before.Thank you very much to@kexinoh for reporting this concern privately sothat it could be evaluated for its severity prior to our decision to fixthis as a security enhancement.The commit that changed this behavior for Python 3.12 is here:-python/cpython@432117c-python/cpython#99541Signed-off-by: Russell Bryant <rbryant@redhat.com>
Uh oh!
There was an error while loading.Please reload this page.