Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-135552: Make the GC clear weakrefs later.#136189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
nascheme wants to merge14 commits intopython:main
base:main
Choose a base branch
Loading
fromnascheme:gh-135552-wr-clear-later

Conversation

nascheme
Copy link
Member

@naschemenascheme commentedJul 1, 2025
edited
Loading

Fix two bugs related to the interaction of weakrefs and the garbage
collector. The weakrefs in thetp_subclasses dictionary are needed in
order to correctly invalidate type caches (for example, by calling
PyType_Modified()). Clearing weakrefs before calling finalizers causes
the caches to not be correctly invalidated. That can cause crashes since the
caches can refer to invalid objects. This is fixed by deferring the
clearing of weakrefs to classes and without callbacks until after finalizers
are executed.

The second bug is caused by weakrefs created while running finalizers. Those
weakrefs can be outside the set of unreachable garbage and therefore survive
thedelete_garbage() phase (wheretp_clear() is called on objects).
Those weakrefs can expose to Python-level code objects that have had
tp_clear() called on them. SeeGH-91636 as an example of this kind of
bug. This is fixed be clearing all weakrefs to unreachable objects after
finalizers have been executed.

Clear the weakrefs to unreachable objects after finalizers are called.
@neonene
Copy link
Contributor

I can confirm this PR fixes thegh-132413 issue as well.

nascheme and efimov-mikhail reacted with thumbs up emoji

@nascheme
Copy link
MemberAuthor

I think this fixes (or mostly fixes)gh-91636 as well.

@naschemenascheme requested a review frompablogsalJuly 1, 2025 23:33
@naschemenascheme added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelJul 1, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@nascheme for commit12f0b5c 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelJul 1, 2025
@nascheme
Copy link
MemberAuthor

This introduces refleaks, it seems. One of the leaking tests:
test.test_concurrent_futures.test_shutdown.ProcessPoolSpawnProcessPoolShutdownTest.test_shutdown_gh_132969_case_1
My unconfirmed suspicion is that a finalizer is now resurrecting an object via a weakref. Previously that weakref would be cleared before the finalizer is run. The multiprocessing finalizer logic seems very complicated. :-/

@nascheme
Copy link
MemberAuthor

This is the smallest leaking example I could make so far. Something withProcessPoolExecutor leaking maybe.

class LeakTest(unittest.TestCase):    @classmethod    def _fail_task(cls, n):        raise ValueError("failing task")    def test_leak_case(self):        # this leaks references        executor = futures.ProcessPoolExecutor(                max_workers=1,                max_tasks_per_child=1,                )        f2 = executor.submit(LeakTest._fail_task, 0)        try:            f2.result()        except ValueError:            pass        # Ensure that the executor cleans up after called        # shutdown with wait=False        executor_manager_thread = executor._executor_manager_thread        executor.shutdown(wait=False)        time.sleep(0.2)        executor_manager_thread.join()    def test_leak_case2(self):        # this does not leak        with futures.ProcessPoolExecutor(                max_workers=1,                max_tasks_per_child=1,                ) as executor:            f2 = executor.submit(LeakTest._fail_task, 0)            try:                f2.result()            except ValueError:                pass

@neonene
Copy link
Contributor

neonene commentedJul 2, 2025
edited
Loading

Other leaking examples (on Windows):

1. test_logging:
importloggingimportlogging.configimportlogging.handlersfrommultiprocessingimportQueue,ManagerclassConfigDictTest(unittest.TestCase):deftest_multiprocessing_queues_XXX(self):config= {'version':1,'handlers' : {'spam' : {'class':'logging.handlers.QueueHandler','queue':Manager().Queue()  ,# Leak# 'queue': Manager().JoinableQueue()  # Leak# 'queue': Queue(),                   # No leak                },            },'root': {'handlers': ['spam']}        }logger=logging.getLogger()logging.config.dictConfig(config)whilelogger.handlers:h=logger.handlers[0]logger.removeHandler(h)h.close()
2. test_interpreters.test_api:
importcontextlibimportthreadingimporttypesfromconcurrentimportinterpretersdeffunc():raiseException('spam!')@contextlib.contextmanagerdefcaptured_thread_exception():ctx=types.SimpleNamespace(caught=None)defexcepthook(args):ctx.caught=argsorig_excepthook=threading.excepthookthreading.excepthook=excepthooktry:yieldctxfinally:threading.excepthook=orig_excepthookclassTestInterpreterCall(unittest.TestCase):deftest_call_in_thread_XXX(self):interp=interpreters.create()call= (interp._call,interp.call)[1]# 0: No leak, 1: Leakwithcaptured_thread_exception()as_:t=threading.Thread(target=call,args=(interp,func, (), {}))t.start()t.join()

UPDATE:

importweakref,_interpreterswd=weakref.WeakValueDictionary()classTestInterpreterCall(unittest.TestCase):deftest_call_in_thread_XXX(self):id=_interpreters.create()wd[id]=type("", (), {})_interpreters.destroy(id)
efimov-mikhail reacted with thumbs up emoji

@nascheme
Copy link
MemberAuthor

The majority (maybe all) of these leaks are caused by theWeakValueDictionary used asmultiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the.data dict works but it not too elegant.

sergey-miryanov reacted with eyes emoji

This avoids breaking tests while fixing the issue with tp_subclasses. Inthe long term, it would be better to defer the clear of all weakrefs,not just the ones referring to classes.  However, that is a moredistruptive change and would seem to have a higher chance of breakinguser code.  So, it would not be something to do in a bugfix release.
@nascheme
Copy link
MemberAuthor

The majority (maybe all) of these leaks are caused by the WeakValueDictionary used as multiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the .data dict works but it not too elegant.

Nope, that doesn't fix all the leaks. And having to explicitly clean the weakrefs from the WeakValueDictionary really shouldn't be needed, I think. TheKeyedRef class uses a callback and so they should be cleaned from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

For the purposes of having a fix that we can backport (should probably be backported to all maintained Python versions), a less disruptive fix would be better. To that end, I've changed this PR to only defer clearing weakrefs to class objects. That fixes thetp_subclasses bug but should be less likely to break currently working code.

sergey-miryanov reacted with thumbs up emoji

@naschemenascheme added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelJul 3, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@nascheme for commit2f3daba 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelJul 3, 2025
We need to clear those before executing the callback.  Since thisensures they can't run a second time, we don't need_PyGC_SET_FINALIZED().  Revise comments.
@nascheme
Copy link
MemberAuthor

The KeyedRef class uses a callback and so they should be cleaned from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

Ah, theKeyedRef callback requires that the weakref is cleared when it is called, otherwise it was not deleting the item from the weak dictionary. So we need to clear the weakrefs with callbacks before executing them. That fixes the refleaks, I believe.

I revised this PR to be something that is potentially suitable for backporting. To minimize the behaviour change, I'm only deferring the clear of weakrefs that refer to types (in order to allow tp_subclasses to work) or with callbacks. I still have an extra clearing pass that gets done after the finalizers are called. That avoids bugs like#91636.

If this gets merged, I think we should consider deferring clearing all weakrefs (without callbacks) until after finalizers are executed. I think that's more correct since it gives the finalizers a better opportunity to do their job.

efimov-mikhail and sergey-miryanov reacted with thumbs up emoji

* *not* be cleared so that caches based on the type version are correctly
* invalidated (see GH-135552 as a bug caused by this).
*/
clear_weakrefs(&final_unreachable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ough! Nice trick withfinal_unreachable!

@pablogsalpablogsal self-assigned thisJul 3, 2025
This is a bit trickly because the wrlist can be changing as weiterate over it.
@sergey-miryanov
Copy link
Contributor

TheKeyedRef class uses a callback and so they should be cleaned from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

IIUC, there maybe cases whenselfref orself_weakref already cleared byWeakValueDictionary or_WeakValueDictionary still have values should should be cleared to. But they will not be cleared because of conditionif selfref() is not None andif self_weakref() is not None.

IIUC, becausePEP-442 landed we can remove weakrefs here

defremove(wr,selfref=ref(self),_atomic_removal=_remove_dead_weakref):

and there

self_weakref=_weakref.ref(self)

and use a strong reference that will outlive the KeyedRefs in the dictionaries. I replaced the weakref with a strong reference in the _bootstrap.py and checked with refleak buildbot (#136345). No leaks were found.

@nascheme WDYT?

@neonene
Copy link
Contributor

Weakrefwith a callback needs to be cleared early (as-is), accounting for the 3rd-party applications that have similar logic toWeakValueDictionary.

@sergey-miryanov
Copy link
Contributor

Weakrefwith a callback needs to be cleared early (as-is)

I'm not against this. But IIUC (feel free to correct me) we can face a situation in which weakref to theWeakValueDictionary is cleared in onegc collect call andKeyedRefs are cleared in another. And strong ref should handle this situation. We can teach users about this (I suppose).

@neonene
Copy link
Contributor

we can face a situation

The experiment#136345 seems currently invalid, which is based on main and not effective on this PR where I omitted calling_PyWeakref_ClearRef inhandle_weakref_callbacks.

@sergey-miryanov
Copy link
Contributor

The experiment#136345 seems currently invalid, which is based on main and not effective on this PR where I omitted calling_PyWeakref_ClearRef inhandle_weakref_callbacks.

If we use a strong reference, then it doesn't matter if we omit_PyWeakref_ClearRef or not. Am I wrong?

@neonene
Copy link
Contributor

Before discussing, have you checked on this PR? I can see the leaks.

@sergey-miryanov
Copy link
Contributor

sergey-miryanov commentedJul 7, 2025
edited
Loading

Results
# clean test-weak-value-dict➜ .\python.bat -m test test_logging -R :Running Debug|x64 interpreter...Using random seed: 30907501910:00:00 Run 1 test sequentially in a single process0:00:00 [1/1] test_loggingbeginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)12345:6789XX... ....0:06:26 [1/1] test_logging passed in 6 min 26 sec== Tests result: SUCCESS ==1 test OK.Total duration: 6 min 26 secTotal tests: run=272 skipped=13Total test files: run=1/1Result: SUCCESS# clean gh-135552-wr-clear-later➜ .\python.bat -m test test_logging -R :             Running Debug|x64 interpreter...Using random seed: 25208066900:00:00 Run 1 test sequentially in a single process0:00:00 [1/1] test_loggingbeginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)12345:6789XX... ...2test_logging leaked [0, 0, 0, 2] memory blocks, sum=2 (this is fine)0:06:19 [1/1] test_logging passed in 6 min 19 sec== Tests result: SUCCESS ==1 test OK.Total duration: 6 min 19 secTotal tests: run=272 skipped=13Total test files: run=1/1Result: SUCCESS# gh-135552-wr-clear-later + weakref removed from WeakValueDictionary➜ .\python.bat -m test test_logging -R :Running Debug|x64 interpreter...Using random seed: 10967972890:00:00 Run 1 test sequentially in a single process0:00:00 [1/1] test_loggingbeginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)12345:6789XX.2. ....0:06:24 [1/1] test_logging passed in 6 min 24 sec== Tests result: SUCCESS ==1 test OK.Total duration: 6 min 24 secTotal tests: run=272 skipped=13Total test files: run=1/1Result: SUCCESS# gh-135552-wr-clear-later + weakref removed from WeakValueDictionary + call _PyWeakref_ClearRef only for wr with callbacks➜ .\python.bat -m test test_logging -R :Running Debug|x64 interpreter...Using random seed: 25993190690:00:00 Run 1 test sequentially in a single process0:00:00 [1/1] test_loggingbeginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)12345:6789XX.2. ....0:06:32 [1/1] test_logging passed in 6 min 32 sec== Tests result: SUCCESS ==1 test OK.Total duration: 6 min 32 secTotal tests: run=272 skipped=13Total test files: run=1/1Result: SUCCESS# gh-135552-wr-clear-later + call _PyWeakref_ClearRef only for wr with callbacks➜ .\python.bat -m test test_logging -R :Running Debug|x64 interpreter...Using random seed: 14934248240:00:00 Run 1 test sequentially in a single process0:00:00 [1/1] test_loggingbeginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)12345:6789XX2.. ....0:06:29 [1/1] test_logging passed in 6 min 29 sec== Tests result: SUCCESS ==1 test OK.Total duration: 6 min 29 secTotal tests: run=272 skipped=13Total test files: run=1/1Result: SUCCESS

@neonene
Copy link
Contributor

Let's move on to#136345. (The result above looks long here.)

@nascheme
Copy link
MemberAuthor

GH-136401 was merged. I think it should be backported as well since it should be pretty low risk.

The main branch has been merged into this PR. I've updated the PR to defer clearing all weakrefs without callbacks (not just weakrefs to classes). This seems more correct to me (rather than special casing weakrefs to classes) and seems the best way to fixGH-132413.

I thinkGH-132413 is actually more likely to be encountered in real code. For example, if you have some logging function in a__del__ method that uses the datetime module. So, maybe this PR should be backported to 3.13 and 3.14 as well.

neonene, sergey-miryanov, and efimov-mikhail reacted with thumbs up emoji

@sergey-miryanov
Copy link
Contributor

Looks good to me. Thanks for detailed comments!

@nascheme
Copy link
MemberAuthor

@pablogsal are you planning to take another look at this? If so, no problem, there is no great rush. If not, I think I'll do another review pass myself and then merge.

@pablogsal
Copy link
Member

pablogsal commentedJul 14, 2025
edited
Loading

Yeah i have it pending for this week (sorry fixing a lot of bugs for 3.14 currently) but if you feel I am taking too much time I am happy if you go ahead I don't want to block it on this

@sergey-miryanov
Copy link
Contributor

@nascheme I see you updated comment forhandle_weakrefs and removed mentions ofGC_REACHABLE andGC_TENTATIVELY_UNREACHABLE.

Would you mind also removing another one fromsubtract_refs comment?

cpython/Python/gc.c

Lines 572 to 576 in9363703

/* Subtract internal references from gc_refs. After this, gc_refs is >= 0
* for all objects in containers, and is GC_REACHABLE for all tracked gc
* objects not in containers. The ones with gc_refs > 0 are directly
* reachable from outside containers, and so can't be collected.
*/

@efimov-mikhail
Copy link
Contributor

Do we want to merge this PR w/o tests on subclasses destruction from#135552?
Or maybe it will be better to place them here?

@nascheme
Copy link
MemberAuthor

Do we want to merge this PR w/o tests on subclasses destruction from#135552?
Or maybe it will be better to place them here?

Sergey has tests in another PR and I'll merge that right after I merge this one.

efimov-mikhail reacted with thumbs up emoji

self.assertIs(wr(), None)
# This used to be None because weakrefs were cleared before
# calling finalizers. Now they are cleared after.
self.assertIsNot(wr(), None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
self.assertIsNot(wr(),None)
self.assertIsNotNone(wr())

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@sergey-miryanovsergey-miryanovsergey-miryanov left review comments

@neoneneneoneneneonene left review comments

@kumaraditya303kumaraditya303kumaraditya303 left review comments

@pablogsalpablogsalAwaiting requested review from pablogsal

@pgansslepganssleAwaiting requested review from pgansslepganssle is a code owner

@abalkinabalkinAwaiting requested review from abalkinabalkin is a code owner

Assignees

@pablogsalpablogsal

Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

7 participants
@nascheme@neonene@bedevere-bot@sergey-miryanov@efimov-mikhail@pablogsal@kumaraditya303

[8]ページ先頭

©2009-2025 Movatter.jp