gh-113993: Allow interned strings to be mortal, and fix related issues #120520


Merged
encukou merged 79 commits into python:main from encukou:immortal-interned
Jun 21, 2024

Conversation

@encukou (Member) commented Jun 14, 2024 (edited by bedevere-app bot)

I've spent too much time looking at this myself, it wants more eyes :)

I spent a week learning about the string interning mechanism, and wrote up how I think it should work in an InternalDocs file I'm adding here.

I found a bunch of... quirks, if not outright bugs. For example, we have duplicate singletons (e.g. `_Py_ID(a)` and the latin1 short string `a`). I don't think I can bring back mortal interned strings without getting my idea of the design in sync with the code, so this ended up being a big PR.


  • Add an InternalDocs file describing how interning should work and how to use it.
    (Please review this first!)

  • Add internal functions to *explicitly* request what kind of interning is done:

    • `_PyUnicode_InternMortal`
    • `_PyUnicode_InternImmortal`
    • `_PyUnicode_InternStatic`
  • Switch uses of `PyUnicode_InternInPlace` to those.

  • Disallow using `_Py_SetImmortal` on strings directly.
    You should use `_PyUnicode_InternImmortal` instead:

    • Strings should be interned before immortalization; otherwise you may end up
      immortalizing a copy rather than the interned string.
    • `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
      `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
      backports, as they are now part of public API and version-specific ABI.
  • Add a private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.

  • Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:

    • `_Py_ID`
    • `_Py_STR` (including the empty string)
    • one-character latin-1 singletons

    Now, when you intern a singleton, that exact singleton will be interned.

  • Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).

  • Intern `_Py_STR` singletons at startup.

    Try this in 3.12:

        import sys

        a = sys.intern('<module>')  # normal string
        print('a', id(a), sys.getrefcount(a))
        try:
            raise Exception()
        except Exception as err:
            b = err.__traceback__.tb_frame.f_code.co_name  # same string via _Py_STR
        assert sys.intern(a) is sys.intern(b)

    In 3.13 the reproducer doesn't work but I don't think the underlying unsoundness was fixed.

  • For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.

  • Beef up the tests. Cover internal details (marked with `@cpython_only`).

  • Add lots of assertions
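
From Python, the only public handle on this machinery is `sys.intern`. The following is a minimal sketch of the invariant the list above is protecting: interning equal strings yields one canonical object. The helper name `build` is made up for illustration, and nothing here distinguishes mortal from immortal interning.

```python
import sys


def build(parts):
    """Hypothetical helper: join at runtime so the result is a new str object."""
    return "".join(parts)


a = build(["immortal", "-", "interned"])
b = build(["immortal", "-", "interned"])

# Equal contents; in CPython these are typically two distinct objects.
print(a == b, a is b)

ia = sys.intern(a)
ib = sys.intern(b)

# After interning, both names refer to the same canonical object.
print(ia is ib)
```

The identity of `ia` and `ib` is what the interpreter relies on when it compares interned names by pointer; the duplicate-singleton quirks described in this PR are cases where that assumption could be violated internally.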

eduardo-elizondo reacted with rocket emoji
Also, the `PyUnicode_InternImmortal` function (with a public-looking name) is switched to use this. (I picked a relatively inconsequential module on purpose.)
AFAIUI, this is now handled by the `statically_allocated` flag.
…rned
Mortal interned strings don't count references from the interned_dict in their refcount. This means a -2 at interning time, a +2 in ClearInterned, and special handling in unicode_dealloc. Note that unicode_dealloc will currently immortalize an interned string. That means we shouldn't get refleaks *yet*.
It's marshal and the compiler that immortalize strings, not the code object.
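
The refcount bookkeeping described in the commit note above (-2 at interning time, +2 when clearing, special handling on deallocation) can be pictured with a toy model. This is only a sketch in pure Python: `ToyStr` and `ToyInternTable` are hypothetical names, refcounts are tracked by hand, and it models the intended end state rather than the intermediate commit in which `unicode_dealloc` still immortalizes the string. It is not CPython's implementation.

```python
class ToyStr:
    """Hypothetical stand-in for a str object with a hand-tracked refcount."""

    def __init__(self, value):
        self.value = value
        self.refcount = 1      # the reference held by the creator
        self.interned = False


class ToyInternTable:
    """Toy model of the interned dict; not CPython's real data structure."""

    def __init__(self):
        self.table = {}

    def intern_mortal(self, s):
        canonical = self.table.get(s.value)
        if canonical is not None:
            if canonical is not s:
                # An equal string is already interned: swap in the canonical
                # object and release the caller's reference to the copy.
                canonical.refcount += 1
                self.decref(s)
            return canonical
        self.table[s.value] = s
        s.refcount += 2        # the dict holds s as both key and value...
        s.refcount -= 2        # ...but those references are not counted
        s.interned = True
        return s

    def decref(self, s):
        s.refcount -= 1
        if s.refcount == 0 and s.interned:
            # Special handling on deallocation: remove the table entry,
            # which nothing else would ever release.
            del self.table[s.value]
            s.interned = False

    def clear_interned(self):
        for s in list(self.table.values()):
            s.refcount += 2    # make the hidden references real again
            s.interned = False
        for s in list(self.table.values()):
            s.refcount -= 2    # the dict drops its key and value references
        self.table.clear()


table = ToyInternTable()
first = table.intern_mortal(ToyStr("spam"))
duplicate = ToyStr("spam")
second = table.intern_mortal(duplicate)
print(first is second, first.refcount, duplicate.refcount)

table.decref(second)
table.decref(first)
print("spam" in table.table)
```

Because the table's own references are invisible to the count, the last external `decref` is what removes the entry; if they were counted, the string could never reach zero and would be immortal in practice.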
@encukou added the "needs backport to 3.12" (only security fixes) and "needs backport to 3.13" (bugs and security fixes) labels on Jun 21, 2024
@encukou (Member, Author)

The buildbot failures are unrelated; I'll merge.

Thank you for the reviews!

@encukou merged commit 6f1d448 into python:main on Jun 21, 2024
126 of 129 checks passed
@miss-islington-app

Thanks @encukou for the PR 🌮🎉. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@encukou deleted the immortal-interned branch on June 21, 2024 15:19
@miss-islington-app

Sorry, @encukou, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on the command line.

cherry_picker 6f1d448bc110633eda110310fd833bd46e7b30f2 3.13

@miss-islington-app

Sorry, @encukou, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on the command line.

cherry_picker 6f1d448bc110633eda110310fd833bd46e7b30f2 3.12

@ericsnowcurrently (Member)

Thanks for taking care of this, @encukou!

encukou added a commit to encukou/cpython that referenced this pull request on Jun 24, 2024
…related issues (pythonGH-120520)

* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
  - `_PyUnicode_InternMortal`
  - `_PyUnicode_InternImmortal`
  - `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
  You should use `_PyUnicode_InternImmortal` instead:
  - Strings should be interned before immortalization; otherwise you may end up
    immortalizing a copy rather than the interned string.
  - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
    `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
    backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
  - `_Py_ID`
  - `_Py_STR` (including the empty string)
  - one-character latin-1 singletons

  Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions

Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
@bedevere-app

GH-120945 is a backport of this pull request to the 3.13 branch.

@bedevere-app bot removed the "needs backport to 3.13" (bugs and security fixes) label on Jun 24, 2024
encukou added a commit that referenced this pull request on Jun 24, 2024
…d issues (GH-120520) (GH-120945)
mrahtz pushed a commit to mrahtz/cpython that referenced this pull request on Jun 30, 2024
… issues (pythonGH-120520)
@eduardo-elizondo (Contributor)

This is great! Quick follow-up @encukou, should we rebase this one now: #113601?

encukou reacted with thumbs up emoji

noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request on Jul 11, 2024
… issues (pythonGH-120520)
estyxx pushed a commit to estyxx/cpython that referenced this pull request on Jul 17, 2024
… issues (pythonGH-120520)
@encukou (Member, Author)

@eduardo-elizondo Yup, I've updated it, and left some notes about the docs.
If you want to delegate the writing, let me know and I'll take over the PR :)

@eduardo-elizondo (Contributor)

> @eduardo-elizondo Yup, I've updated it, and left some notes about the docs. If you want to delegate the writing, let me know and I'll take over the PR :)

Fixing them now! Let's close out in the other PR.

encukou added a commit to encukou/cpython that referenced this pull request on Aug 16, 2024
…related issues (pythonGH-120520)
Yhg1s pushed a commit that referenced this pull request on Sep 27, 2024
…H-121903,GH-122303) (#123065)

This backports several PRs for gh-113993, making interned strings mortal so they can be garbage-collected when no longer needed.

* Allow interned strings to be mortal, and fix related issues (GH-120520)
  * Add an InternalDocs file describing how interning should work and how to use it.
  * Add internal functions to *explicitly* request what kind of interning is done:
    - `_PyUnicode_InternMortal`
    - `_PyUnicode_InternImmortal`
    - `_PyUnicode_InternStatic`
  * Switch uses of `PyUnicode_InternInPlace` to those.
  * Disallow using `_Py_SetImmortal` on strings directly.
    You should use `_PyUnicode_InternImmortal` instead:
    - Strings should be interned before immortalization; otherwise you may end up
      immortalizing a copy rather than the interned string.
    - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
      `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
      backports, as they are now part of public API and version-specific ABI.
  * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
  * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
    - `_Py_ID`
    - `_Py_STR` (including the empty string)
    - one-character latin-1 singletons

    Now, when you intern a singleton, that exact singleton will be interned.
  * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
  * Intern `_Py_STR` singletons at startup.
  * Beef up the tests. Cover internal details (marked with `@cpython_only`).
  * Add lots of assertions
* Don't immortalize in PyUnicode_InternInPlace; keep immortalizing in other API (GH-121364)
  * Switch PyUnicode_InternInPlace to _PyUnicode_InternMortal, clarify docs
  * Document immortality in some functions that take `const char *`.
    This is PyUnicode_InternFromString;
    PyDict_SetItemString, PyObject_SetAttrString;
    PyObject_DelAttrString; PyUnicode_InternFromString;
    and the PyModule_Add convenience functions.
    Always point out a non-immortalizing alternative.
  * Don't immortalize user-provided attr names in _ctypes
* Immortalize names in code objects to avoid crash (GH-121903)
* Intern latin-1 one-byte strings at startup (GH-122303)

There are some 3.12-specific changes, mainly to allow statically allocated strings in deepfreeze. (In 3.13, deepfreeze switched to the general `_Py_ID`/`_Py_STR`.)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
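
The commit message above mentions `sys.getunicodeinternedsize` and its private `_only_immortal` argument. A rough sketch of how one might query both counts follows. It assumes a CPython build that exposes this function; `interned_count` is a hypothetical wrapper, and since the keyword is private and used by the refleak test machinery, the code falls back to `None` when either the function or the argument is missing.

```python
import sys


def interned_count(only_immortal=False):
    """Return the interned-string count, or None if this build cannot report it.

    Assumption: sys.getunicodeinternedsize() exists (CPython-specific), and
    newer builds accept the private keyword _only_immortal described above.
    """
    getter = getattr(sys, "getunicodeinternedsize", None)
    if getter is None:
        return None  # not CPython, or a version without this function
    if not only_immortal:
        return getter()
    try:
        return getter(_only_immortal=True)
    except TypeError:
        return None  # this build predates the private argument


total = interned_count()
immortal = interned_count(only_immortal=True)
print("interned:", total, "immortal only:", immortal)
```

Comparing the two numbers before and after a workload is one way to see whether strings interned during it were mortal; this is an observation aid, not a supported API.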
@@ -0,0 +1,122 @@
# String interning
Member

Would be good to have a link to this file from the index in InternalDocs/README.md.

encukou reacted with thumbs up emoji
Reviewers

@iritkatriel left review comments
@ericsnowcurrently approved these changes
@neonene left review comments
@erlend-aasland: awaiting requested review (code owner)
@berkerpeksag: awaiting requested review (code owner)
@pablogsal: awaiting requested review (code owner)
@lysnikolaou: awaiting requested review (code owner)
@kumaraditya303: awaiting requested review (code owner)
@isidentical: awaiting requested review
@markshannon: awaiting requested review (code owner)
@methane: awaiting requested review (code owner)

Assignees

@encukou


8 participants
@encukou, @neonene, @bedevere-bot, @mdboom, @ericsnowcurrently, @eduardo-elizondo, @iritkatriel, @hugovk
