Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-113190: Reenable non-debug interned string cleanup#113601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged

Conversation

@eduardo-elizondo
Copy link
Contributor

@eduardo-elizondoeduardo-elizondo commentedDec 31, 2023
edited
Loading

InGH-19474 we introduced immortalized interned strings. However, at the time, we didn't have a strict guarantee that a runtime shutdown (i.ePy_Finalize()) would completely cleanup all the allocated PyObjects (i.e a simple example of./python -X showrefcount -c 'import itertools' would show leaks).

Today, these leaks have been largely fixed and there have been stricter guarantees into how obmalloc and the interpreter state works after a runtime shutdown. Given this, it should be safe to re-enable the interned string cleanup and revert the structseq test that had issues with this leaks before. This test should show the correctness of this code now.

davidmerwin and tkieft reacted with heart emojierlend-aasland and furkanonder reacted with rocket emoji
@Eclips4Eclips4 changed the title[DONOTMETGE] gh-113190: Reenable non-debug interned string cleanup[DO NOT MERGE] gh-113190: Reenable non-debug interned string cleanupDec 31, 2023
@Eclips4
Copy link
Member

Eclips4 commentedDec 31, 2023
edited
Loading

I addedDO-NOT-MERGE label for you :)

@Eclips4Eclips4 changed the title[DO NOT MERGE] gh-113190: Reenable non-debug interned string cleanupgh-113190: Reenable non-debug interned string cleanupDec 31, 2023
@eduardo-elizondo
Copy link
ContributorAuthor

eduardo-elizondo commentedDec 31, 2023
edited
Loading

I addedDO-NOT-MERGE label for you :)

Thanks!! Will ping you once all the tests are passing to move this into safe-to-merge mode 🙂

@eduardo-elizondo
Copy link
ContributorAuthor

eduardo-elizondo commentedDec 31, 2023
edited
Loading

@Eclips4 tests are passing! Could you help me by adding the "skip news" label (since it's probably not required) and removing the "DO-NOT-MERGE" label?

Thanks!!

Eclips4 reacted with thumbs up emoji

@Eclips4Eclips4 added skip news interpreter-core(Objects, Python, Grammar, and Parser dirs) needs backport to 3.12only security fixes and removed DO-NOT-MERGE labelsDec 31, 2023
@eduardo-elizondo
Copy link
ContributorAuthor

cc@nascheme to take a look when you have a chance!

@nascheme
Copy link
Member

It would be nice to do a better cleanup but I'm concerned about how safe this actually is. Since we don't have an accurate refcnt for these strings, we are assuming that no one is actually still using them. It seems possible that some extensions or programs embedding Python could hang on to interned strings afterPy_Finalize(). That would risk introducing "use-after-free" bugs. In the worst case, that could turn into an arbitrary code execution security bug.

Maybe I'm being too paranoid? I don't know how we could actually fix this though. Keeping hold of and using memory allocated by the Python runtime afterPy_Finalize() surely puts you an very shaky ground. Previously, the obmalloc state was global to the process and you could keep using memory allocated from there after finalizing. With sub-interpeters potentially having their own obmalloc state, that's likely not okay.

My gut feeling is we should proceed cautiously, even though we would like to fix the leak. Maybe we could enable this for alphas and then turn it off again for release? Maybe we need to wait until we getpython -X showrefcount showing zero for all or nearly all built-in and well known extensions? Seems like a tall order though.

eduardo-elizondo reacted with thumbs up emoji

@eduardo-elizondo
Copy link
ContributorAuthor

eduardo-elizondo commentedJan 5, 2024
edited
Loading

@nascheme I agree with what you mentioned above and I was paranoid of that scenario too, hence the disabling of the code.

Purely going off from the docs, I don't think there's anywhere where Python provides an explicit expectation of what the behavior is of memory that survives a call to Py_Finalize. Furthermore, in the C-API it is specified thatPy_Finalize will: "Undo all initializations made byPy_Initialize() and subsequent use of Python/C API functions". This gives an an implicit assumption that holding onto these interned strings after Py_Finalize is an undefined behavior.

Perhaps we should be more explicit about this in the docs (maybe in the "Bugs and Caveats" section of Py_Finalize)? i.e python can't guarantee correctness for any memory allocated during a python initialization and used in any subsequent initializations. Thoughts?

Regardless, I completely agree that we should still be careful about putting this out there. Stating with alphas sounds like a good idea to get some early signal. Let me know what's the best way to only enable this only for alphas.

@eduardo-elizondo
Copy link
ContributorAuthor

eduardo-elizondo commentedJan 5, 2024
edited
Loading

Maybe we need to wait until we get python -X showrefcount showing zero for all or nearly all built-in and well known extensions?

I think we are already there? Here's a quick test program that I ran:

./python -X showrefcount -c "import sysimport importlibfor mod in sys.modules.keys():  importlib.import_module(mod)"[0 refs, 0 blocks]

@vstinner
Copy link
Member

vstinner commentedJan 8, 2024
edited
Loading

Maybe we need to wait until we get python -X showrefcount showing zero for all or nearly all built-in and well known extensions?

It's mostly done for one year:https://mail.python.org/archives/list/python-dev@python.org/thread/E4C6TDNVDPDNNP73HTGHN5W42LGAE22F/ "Mostly" means: some remaining stdlib C extensions still implement their own static types which are not cleared by_PyTypes_FiniTypes(), and/or don't use the multi-phase initialization API yet.

test_embed checks thatpython -c pass does not leak memory at Python exit: seeMiscTests.test_no_memleak().

Purely going off from the docs, I don't think there's anywhere where Python provides an explicit expectation of what the behavior is of memory that survives a call to Py_Finalize.

This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13:

  • Explain how to qualify if a crash is related to this issue,
  • Document how to update affected code.

In short, a program must not keep references to any Python object betweenPy_Finalize() andPy_Initialize(). IMO the general rule is callingPy_Finalize() converts all Python object pointers to dangling pointers. There are some very specific exceptions which should be listed with conditions, such as "only on CPython and it's an implementation detail which can change between two Python versions".

In general, I suggest to reset all pointers to Python objects at each Py_Initialize() call. Or said differently: don't storeany pointer to Python objects after Py_Finalize() :-)

Examples of exceptions:

  • Immortal objects
  • Static types (built-in immortal objects are marked as immortal)
  • Singletons (built-in singleton objects are marked as immortal) such as: empty string, empty tuple, small integers (-5..256).
eduardo-elizondo and erlend-aasland reacted with thumbs up emoji

@eduardo-elizondo
Copy link
ContributorAuthor

eduardo-elizondo commentedJan 8, 2024
edited by vstinner
Loading

This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13

For sure, I can update the PR with the details with what you mentioned here!

It's mostly done for one year

Given this guarantee, I think it's safe to assume that if we have any Python object leak it's a user induced leak, right? i.e. the runtime itself will not cause incorrect behavior. I will include some wording around this so that users can isolate the surface are when looking for issues (i.e look at the third-party extensions not at the runtime).

In short, a program must not keep references to any Python object between Py_Finalize() and Py_Initialize()

That's my take too - though it's not specified in the docs! Any thoughts of also updating the Py_Finalize/Py_Initialize documentation as part of this PR to include stricter language like this?

erlend-aasland reacted with thumbs up emoji

@vstinner
Copy link
Member

This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13

I looked again at the code. It has a complicated history:

  • In Python 3.9 and older, interned strings werenever deleted. Only special build for Valgrind and Purity cleared them at exit.
  • In Python 3.10 and 3.11, interned strings arealways deleted at exit.
  • In Python 3.12, interned strings aredeleted only if Python is built in debug mode: they are not deleted in release mode.

If there are applications relying on interned strings to survive between multiple Py_Initialize()/Py_Finalize() cycles, they should already be affected by Python 3.10 and 3.11.

By the way, Python 3.11 added _PyTypes_FiniTypes() to "clear" static types in Py_Finalize().

@encukouencukou added the 🔨 test-with-refleak-buildbotsTest PR w/ refleak buildbots; report in status section labelAug 12, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@encukou for commit6aee7f5 🤖

If you want to schedule another build, you need to add the🔨 test-with-refleak-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-refleak-buildbotsTest PR w/ refleak buildbots; report in status section labelAug 12, 2024
@encukouencukouenabled auto-merge (squash)August 15, 2024 11:28
@encukouencukou merged commit3203a74 intopython:mainAug 15, 2024
@miss-islington-app
Copy link

Thanks@eduardo-elizondo for the PR, and@encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry,@eduardo-elizondo and@encukou, I could not cleanly backport this to3.13 due to a conflict.
Please backport usingcherry_picker on command line.

cherry_picker 3203a7412977b8da3aba2770308136a37f48c927 3.13

@miss-islington-app
Copy link

Sorry,@eduardo-elizondo and@encukou, I could not cleanly backport this to3.12 due to a conflict.
Please backport usingcherry_picker on command line.

cherry_picker 3203a7412977b8da3aba2770308136a37f48c927 3.12

@eduardo-elizondo
Copy link
ContributorAuthor

Thanks@encukou !!

ericsnowcurrently reacted with heart emoji

@neonene
Copy link
Contributor

Is there any chance this will be backported to 3.13.1?

hugovk pushed a commit to hugovk/cpython that referenced this pull requestFeb 26, 2025
…ythonGH-113601)(cherry picked from commit3203a74)Co-authored-by: Eddie Elizondo <eelizondo@meta.com>
@bedevere-app
Copy link

GH-130581 is a backport of this pull request to the3.13 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.13bugs and security fixes labelFeb 26, 2025
hugovk pushed a commit to hugovk/cpython that referenced this pull requestFeb 26, 2025
…ythonGH-113601)(cherry picked from commit3203a74)Co-authored-by: Eddie Elizondo <eelizondo@meta.com>
@bedevere-app
Copy link

GH-130582 is a backport of this pull request to the3.12 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.12only security fixes labelFeb 26, 2025
@hugovk
Copy link
Member

The conflict preventing Miss Islington from backporting#113601 was the addition toDoc/whatsnew/3.14.rst.

I've made the backport PRs. Do we need any other docs in them?

@encukou
Copy link
Member

Is there any chance this will be backported to 3.13.1?

Sorry, I missed the notification!

See this paragraph in the added docs:

Note that Python will do a best effort at freeing all memory allocated by the Python
interpreter. Therefore, any C-Extension should make sure to correctly clean up all
of the preveiously allocated PyObjects before using them in subsequent calls to
:c:func:Py_Initialize. Otherwise it could introduce vulnerabilities and incorrect
behavior.

In other words, it's possible that some extensions will start crashing afterPy_Finalize &Py_Initialize.
It would only be extensions that used the API incorreclty, but, that's little consolation for affected users.

I don't think we should backport to bugfix branches, unfortunately.

@hugovk
Copy link
Member

I don't think we should backport to bugfix branches, unfortunately.

Okay, I'll close the backports.

jcfr added a commit to jcfr/MeVisLab-pythonqt that referenced this pull requestSep 4, 2025
…) in sanitizer runsUnder ASan/LSan with Python 3.12 on Ubuntu 22.04/24.04, the cleanup testsreport direct leaks rooted in CPython’s unicode interning path (frames viaPyUnicode_New) during module import in our embedded init.This is a known CPython issue (python/cpython#113190). Interned-unicodecleanup was re-enabled in 3.13 (python/cpython#113601) but not backportedto 3.12. To keep CI signal meaningful while we still test against 3.12,enable a narrow LSan suppression for PyUnicode_New only, and only on 3.12.Notes:- detect_leaks=1 remains enabled; the suppression is limited to PyUnicode_New  and won’t mask leaks in PythonQt.- Remove once CI moves off Python 3.12.Refs:python/cpython#113190,python/cpython#113601
mrbean-bremen pushed a commit to MeVisLab/pythonqt that referenced this pull requestSep 4, 2025
…) in sanitizer runsUnder ASan/LSan with Python 3.12 on Ubuntu 22.04/24.04, the cleanup testsreport direct leaks rooted in CPython’s unicode interning path (frames viaPyUnicode_New) during module import in our embedded init.This is a known CPython issue (python/cpython#113190). Interned-unicodecleanup was re-enabled in 3.13 (python/cpython#113601) but not backportedto 3.12. To keep CI signal meaningful while we still test against 3.12,enable a narrow LSan suppression for PyUnicode_New only, and only on 3.12.Notes:- detect_leaks=1 remains enabled; the suppression is limited to PyUnicode_New  and won’t mask leaks in PythonQt.- Remove once CI moves off Python 3.12.Refs:python/cpython#113190,python/cpython#113601
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@naschemenaschemenascheme left review comments

@encukouencukouencukou approved these changes

Assignees

@encukouencukou

Labels

interpreter-core(Objects, Python, Grammar, and Parser dirs)skip news

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

10 participants

@eduardo-elizondo@Eclips4@nascheme@vstinner@neonene@ericsnowcurrently@encukou@bedevere-bot@hugovk@serhiy-storchaka

[8]ページ先頭

©2009-2025 Movatter.jp